[v6,1/2] eal/ticketlock: ticket based to improve fairness

Message ID 1552632988-80787-2-git-send-email-joyce.kong@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series ticketlock: implement ticketlock and add test case

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Performance-Testing fail build patch failure
ci/Intel-compilation fail Compilation issues

Commit Message

Joyce Kong March 15, 2019, 6:56 a.m. UTC
  The spinlock implementation is unfair: some threads may take locks
aggressively while leaving the other threads starving for a long time.

This patch introduces the ticketlock, which gives each waiting thread a
ticket so that threads take the lock one by one, first come, first served.
This avoids prolonged starvation and makes lock acquisition more predictable.
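
For context, a minimal usage sketch of the API this patch adds (names as
introduced in the diff below, assuming the header is installed as
<rte_ticketlock.h>):

	#include <rte_ticketlock.h>

	static rte_ticketlock_t tl = RTE_TICKETLOCK_INITIALIZER;

	static void
	critical_section(void)
	{
		rte_ticketlock_lock(&tl);	/* waiters acquire in FIFO order */
		/* ... update shared state ... */
		rte_ticketlock_unlock(&tl);
	}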

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 MAINTAINERS                                        |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/arch/arm/rte_ticketlock.h       |  64 +++++
 .../common/include/generic/rte_ticketlock.h        | 308 +++++++++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 6 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_ticketlock.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_ticketlock.h
  

Comments

Ananyev, Konstantin March 15, 2019, 12:55 p.m. UTC | #1
Hi,

> diff --git a/lib/librte_eal/common/include/generic/rte_ticketlock.h b/lib/librte_eal/common/include/generic/rte_ticketlock.h
> new file mode 100644
> index 0000000..d63aaaa
> --- /dev/null
> +++ b/lib/librte_eal/common/include/generic/rte_ticketlock.h
> @@ -0,0 +1,308 @@
> [...]
> +/**
> + * Try to take the lock.
> + *
> + * @param tl
> + *   A pointer to the ticketlock.
> + * @return
> + *   1 if the lock is successfully taken; 0 otherwise.
> + */
> +static inline __rte_experimental int
> +rte_ticketlock_trylock(rte_ticketlock_t *tl)
> +{
> +	uint16_t next = __atomic_load_n(&tl->next, __ATOMIC_RELAXED);
> +	uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
> +	if (next == cur) {

Probably a naïve one:
Suppose next==cur==1 here, and this thread then experiences a really long context switch,
so that by the time it resumes execution the tl->next value has wrapped around and is 1 again, while tl->current==0 (lock held).
I suppose this function will then set tl->next=2 and return success?
Wouldn't it be better, here and in _is_locked_, to load/store the next/current values in one go
(using a 32-bit op)?
Konstantin

> +		if (__atomic_compare_exchange_n(&tl->next, &next, next+1,
> +		    0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * Test if the lock is taken.
> + *
> + * @param tl
> + *   A pointer to the ticketlock.
> + * @return
> + *   1 if the lock is currently taken; 0 otherwise.
> + */
> +static inline __rte_experimental int
> +rte_ticketlock_is_locked(rte_ticketlock_t *tl)
> +{
> +	return (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) !=
> +		__atomic_load_n(&tl->next, __ATOMIC_ACQUIRE));
> +}
> +
  
Gavin Hu March 19, 2019, 9:44 a.m. UTC | #2
Hi Konstantin, 

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Friday, March 15, 2019 8:56 PM
> To: Joyce Kong (Arm Technology China) <Joyce.Kong@arm.com>;
> dev@dpdk.org
> Cc: nd <nd@arm.com>; stephen@networkplumber.org;
> jerin.jacob@caviumnetworks.com; thomas@monjalon.net; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm
> Technology China) <Gavin.Hu@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v6 1/2] eal/ticketlock: ticket based to
> improve fairness
> 
> Hi,
> 
> > [...]
> > +static inline __rte_experimental int
> > +rte_ticketlock_trylock(rte_ticketlock_t *tl)
> > +{
> > +	uint16_t next = __atomic_load_n(&tl->next, __ATOMIC_RELAXED);
> > +	uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
> > +	if (next == cur) {
> 
> Probably a naïve one:
> Suppose next==cur==1 here, then this thread will experience really long
> context switch,

By "context switch", do you mean that on reaching this point, the thread runs out of CPU time and starves for CPU?

> so next time it continues its execution tl->next value will wrap-up and will
> be 1 again, and tl->current==0 (lock held).
> I suppose this function will set tl->next=2 and will return a success?

If this thread was swapped out and another thread took/attempted to take the lock, then yes, tl->next == 2 here.
But as the local next == 1 is unchanged, it would not return success.

> Wouldn't be better here and in _is_locked_ to do load/store for
> next/current values in one go
> (using 32bit op)?
> Konstantin

Loading both in one go is feasible, but not necessary, as we need to compare them.
We don't need to store both, as in this function tl->current is read-only;
tl->next is read-update-store. I once thought of combining the two if-statements into a single
__atomic_compare_exchange_n(&tl->next, &tl->current, tl->next+1, ...),
but tl->next+1 is computed outside the atomic operation, so it may be a stale value and corrupt the ticket lock waiting chain.

The current code works OK, except that it may fail spuriously (if, during a context switch, the lock was taken and released by other threads,
moving tl->next forward; in that case the lock is available but not obtained by this trylock). Anyway, as the name suggests, it is a try/attempt, so a spurious failure is not a big deal?
And in most cases, with DPDK running on dedicated cores, the context switch will not happen at all.
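
To illustrate the spurious failure with a hypothetical interleaving (the ticket values are invented for the example):

	/* Thread A in trylock(): loads next = 5, cur = 5 (lock free), then is
	 * preempted before the compare-exchange.
	 * Thread B: rte_ticketlock_lock()   -> tl->next = 6 (B holds the lock).
	 * Thread B: rte_ticketlock_unlock() -> tl->current = 6 (lock free again).
	 * Thread A resumes: the CAS expects tl->next == 5 but sees 6, so it
	 * fails and trylock() returns 0 even though the lock is free right now.
	 */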

Any more comments are welcome!
> > [...]
  
Ananyev, Konstantin March 19, 2019, 10:15 a.m. UTC | #3
Hi Gavin,

> > > [...]
> > > +static inline __rte_experimental int
> > > +rte_ticketlock_trylock(rte_ticketlock_t *tl)
> > > +{
> > > +	uint16_t next = __atomic_load_n(&tl->next, __ATOMIC_RELAXED);
> > > +	uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
> > > +	if (next == cur) {
> >
> > Probably a naïve one:
> > Suppose next==cur==1 here, then this thread will experience really long
> > context switch,
> 
> By "context switch", do you mean that on reaching this point, the thread runs out of CPU time and starves for CPU?

Yes.

> 
> > so next time it continues its execution tl->next value will wrap-up and will
> > be 1 again, and tl->current==0 (lock held).
> > I suppose this function will set tl->next=2 and will return a success?
> 
> If this thread was swapped out and another thread took/attempted to take the lock, then yes, tl->next == 2 here.
> But as the local next == 1 is unchanged, it would not return success.

I am not talking about the situation when tl->next==2 and tl->current==1 (just one lock() executed by a different thread).
I am talking about the situation when this thread was off the cpu for a significant number of cycles,
and in that period tl->next and tl->current wrapped around (they both reached UINT16_MAX, then 0),
i.e. UINT16_MAX lock/unlock operations were executed while this thread was away from the cpu.
After that, another thread just did a successful lock(), so tl->next==1 and tl->current==0.
Now this thread wakes up and continues with:
__atomic_compare_exchange_n(&tl->next, &next, next+1, ...)
As both tl->next==1 and next==1, it will succeed.
So we have 2 threads assuming they grabbed the lock successfully.
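
Spelled out as a timeline with those values:

	/* T0: thread A, inside trylock(), loads next = 1 and cur = 1
	 *     (lock free), then is preempted.
	 * T1: other threads run UINT16_MAX lock/unlock pairs; tl->next and
	 *     tl->current both advance 65535 steps and wrap back to 0.
	 * T2: thread B calls lock(): tl->next = 1, tl->current = 0 (B holds it).
	 * T3: thread A resumes: tl->next (1) == its stale next (1), the CAS
	 *     succeeds, tl->next becomes 2 and trylock() returns 1.
	 * Result: threads A and B both believe they hold the lock.
	 */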
Konstantin

> [...]
  
Gavin Hu March 20, 2019, 5:11 a.m. UTC | #4
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Tuesday, March 19, 2019 6:15 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; stephen@networkplumber.org;
> jerin.jacob@caviumnetworks.com; thomas@monjalon.net; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Joyce Kong (Arm Technology
> China) <Joyce.Kong@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v6 1/2] eal/ticketlock: ticket based to improve
> fairness
> 
> 
> Hi Gavin,
> 
> > > [...]
> > > I am not talking about the situation when tl->next==2 and tl->current==1
> > > (just one lock() executed by a different thread).
> > > I am talking about the situation when this thread was off the cpu for a
> > > significant number of cycles, and in that period tl->next and tl->current
> > > wrapped around (they both reached UINT16_MAX, then 0), i.e. UINT16_MAX
> > > lock/unlock operations were executed while this thread was away from the cpu.
> > > After that, another thread just did a successful lock(), so tl->next==1
> > > and tl->current==0.
> > > Now this thread wakes up and continues with:
> > > __atomic_compare_exchange_n(&tl->next, &next, next+1, ...)
> > > As both tl->next==1 and next==1, it will succeed.
> > > So we have 2 threads assuming they grabbed the lock successfully.
> > > Konstantin
> 
Now I understand your point, but I am not sure whether it is a rare or even impossible case for this thread to stall off the CPU while, during that time, the other threads take the lock 2^16 times so the counters wrap around.

Anyway, I made a patch, currently in internal review, to fix this issue. The basic idea is to compare not only next but also current, and to update next (+1 and take the lock) only if both of them are unchanged (or have wrapped around and the lock was released).
I will submit the patch after the internal review is approved. Please let me know if you have more comments.
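
A sketch of that idea (the shape of the fix, not necessarily the exact v7 code): keep the two counters in a union so they can be snapshotted and compare-exchanged as a single 32-bit word, so any movement of either counter, including a full wrap-around, makes the CAS fail:

	typedef union {
		uint32_t tickets;
		struct {
			uint16_t current;
			uint16_t next;
		} s;
	} rte_ticketlock_t;

	static inline int
	rte_ticketlock_trylock(rte_ticketlock_t *tl)
	{
		rte_ticketlock_t old, new;

		old.tickets = __atomic_load_n(&tl->tickets, __ATOMIC_RELAXED);
		new.tickets = old.tickets;
		new.s.next++;
		if (old.s.next == old.s.current) {
			/* The CAS covers both halves: if either counter moved
			 * since the snapshot, the 32-bit compare fails. */
			if (__atomic_compare_exchange_n(&tl->tickets,
			    &old.tickets, new.tickets, 0,
			    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
				return 1;
		}
		return 0;
	}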

> > [...]
  
Ananyev, Konstantin March 20, 2019, 9:47 a.m. UTC | #5
Hi Gavin,
> > [...]
> Now I understand your point, but I am not sure whether it is a rare or even impossible case for this thread to stall off the CPU while,
> during that time, the other threads take the lock 2^16 times so the counters wrap around.

I agree it should be very rare, but I am not sure it is impossible.
Let's say a thread is doing lock/unlock in a loop, with one iteration taking ~100 cycles.
Then the counters would wrap around in ~6.5M cycles (~3ms on modern cpus).
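
(For reference, the arithmetic behind that estimate: the 16-bit counters wrap after 2^16 = 65536 iterations; at ~100 cycles each that is 65536 * 100 ≈ 6.55M cycles, i.e. about 3 ms at a ~2.2 GHz clock.)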

> 
> Anyway, I made a patch, currently in internal review, to fix this issue. The basic idea is to compare not only next but also current, and
> to update next (+1 and take the lock) only if both of them are unchanged (or have wrapped around and the lock was released).
> I will submit the patch after the internal review is approved. Please let me know if you have more comments.

Ok, thanks
Konstantin
  
Gavin Hu March 22, 2019, 2:04 a.m. UTC | #6
Hi Konstantin, 

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Wednesday, March 20, 2019 5:47 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; stephen@networkplumber.org;
> jerin.jacob@caviumnetworks.com; thomas@monjalon.net; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Joyce Kong (Arm Technology
> China) <Joyce.Kong@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v6 1/2] eal/ticketlock: ticket based to improve
> fairness
> 
> Hi Gavin,
> > [...]
> > Now I understand your point, but I am not sure whether it is a rare or even impossible case for this thread to stall off the CPU while,
> > during that time, the other threads take the lock 2^16 times so the counters wrap around.
> 
> I agree it should be very rare, but I am not sure it is impossible.
> Let's say a thread is doing lock/unlock in a loop, with one iteration taking ~100 cycles.
> Then the counters would wrap around in ~6.5M cycles (~3ms on modern cpus).
> 

Thanks, agreed; your quantitative analysis helped me understand the issue.

> >
> > Anyway, I made a patch, currently in internal review, to fix this issue. The
> > basic idea is to compare not only next but also current, and to update next
> > (+1 and take the lock) only if both of them are unchanged (or have wrapped
> > around and the lock was released).
> > I will submit the patch after the internal review is approved. Please let me
> > know if you have more comments.
> 
> Ok, thanks
> Konstantin

The fix was submitted in v7, patch 1/3; could you help review?

Thanks!
Gavin
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb..7d87e25 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -210,6 +210,11 @@  M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
 F: lib/librte_eal/common/include/rte_bitmap.h
 F: app/test/test_bitmap.c
 
+Ticketlock
+M: Joyce Kong <joyce.kong@arm.com>
+F: lib/librte_eal/common/include/generic/rte_ticketlock.h
+F: lib/librte_eal/common/include/arch/arm/rte_ticketlock.h
+
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
 M: Gavin Hu <gavin.hu@arm.com>
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index d95ad56..aacc66b 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -65,6 +65,7 @@  The public API headers are grouped by topics:
   [atomic]             (@ref rte_atomic.h),
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h)
+  [ticketlock]         (@ref rte_ticketlock.h)
 
 - **CPU arch**:
   [branch prediction]  (@ref rte_branch_prediction.h),
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index c487201..ac3305c 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -20,7 +20,7 @@  INC += rte_bitmap.h rte_vfio.h rte_hypervisor.h rte_test.h
 INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
 
 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
-GENERIC_INC += rte_spinlock.h rte_memcpy.h rte_cpuflags.h rte_rwlock.h
+GENERIC_INC += rte_spinlock.h rte_memcpy.h rte_cpuflags.h rte_rwlock.h rte_ticketlock.h
 GENERIC_INC += rte_vect.h rte_pause.h rte_io.h
 
 # defined in mk/arch/$(RTE_ARCH)/rte.vars.mk
diff --git a/lib/librte_eal/common/include/arch/arm/rte_ticketlock.h b/lib/librte_eal/common/include/arch/arm/rte_ticketlock.h
new file mode 100644
index 0000000..57deb0b
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_ticketlock.h
@@ -0,0 +1,64 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_TICKETLOCK_ARM_H_
+#define _RTE_TICKETLOCK_ARM_H_
+
+#ifndef RTE_FORCE_INTRINSICS
+#  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_common.h>
+#include "generic/rte_ticketlock.h"
+
+static inline int rte_tm_supported(void)
+{
+	return 0;
+}
+
+static inline void
+rte_ticketlock_lock_tm(rte_ticketlock_t *tl)
+{
+	rte_ticketlock_lock(tl); /* fall-back */
+}
+
+static inline int
+rte_ticketlock_trylock_tm(rte_ticketlock_t *tl)
+{
+	return rte_ticketlock_trylock(tl);
+}
+
+static inline void
+rte_ticketlock_unlock_tm(rte_ticketlock_t *tl)
+{
+	rte_ticketlock_unlock(tl);
+}
+
+static inline void
+rte_ticketlock_recursive_lock_tm(rte_ticketlock_recursive_t *tlr)
+{
+	rte_ticketlock_recursive_lock(tlr); /* fall-back */
+}
+
+static inline void
+rte_ticketlock_recursive_unlock_tm(rte_ticketlock_recursive_t *tlr)
+{
+	rte_ticketlock_recursive_unlock(tlr);
+}
+
+static inline int
+rte_ticketlock_recursive_trylock_tm(rte_ticketlock_recursive_t *tlr)
+{
+	return rte_ticketlock_recursive_trylock(tlr);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_TICKETLOCK_ARM_H_ */
diff --git a/lib/librte_eal/common/include/generic/rte_ticketlock.h b/lib/librte_eal/common/include/generic/rte_ticketlock.h
new file mode 100644
index 0000000..d63aaaa
--- /dev/null
+++ b/lib/librte_eal/common/include/generic/rte_ticketlock.h
@@ -0,0 +1,308 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_TICKETLOCK_H_
+#define _RTE_TICKETLOCK_H_
+
+/**
+ * @file
+ *
+ * RTE ticket locks
+ *
+ * This file defines an API for ticket locks, which give each waiting
+ * thread a ticket and let them take the lock one by one, first come,
+ * first served.
+ *
+ * All locks must be initialised before use, and only initialised once.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_common.h>
+#include <rte_lcore.h>
+#include <rte_pause.h>
+
+/**
+ * The rte_ticketlock_t type.
+ */
+typedef struct {
+	uint16_t current;
+	uint16_t next;
+} rte_ticketlock_t;
+
+/**
+ * A static ticketlock initializer.
+ */
+#define RTE_TICKETLOCK_INITIALIZER { 0 }
+
+/**
+ * Initialize the ticketlock to an unlocked state.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ */
+static inline __rte_experimental void
+rte_ticketlock_init(rte_ticketlock_t *tl)
+{
+	__atomic_store_n(&tl->current, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tl->next, 0, __ATOMIC_RELAXED);
+}
+
+/**
+ * Take the ticketlock.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ */
+static inline __rte_experimental void
+rte_ticketlock_lock(rte_ticketlock_t *tl)
+{
+	uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);
+	while (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) != me)
+		rte_pause();
+}
+
+/**
+ * Release the ticketlock.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ */
+static inline __rte_experimental void
+rte_ticketlock_unlock(rte_ticketlock_t *tl)
+{
+	uint16_t i = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
+	__atomic_store_n(&tl->current, i+1, __ATOMIC_RELEASE);
+}
+
+/**
+ * Try to take the lock.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_ticketlock_trylock(rte_ticketlock_t *tl)
+{
+	uint16_t next = __atomic_load_n(&tl->next, __ATOMIC_RELAXED);
+	uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
+	if (next == cur) {
+		if (__atomic_compare_exchange_n(&tl->next, &next, next+1,
+		    0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+			return 1;
+	}
+
+	return 0;
+}
+
+/**
+ * Test if the lock is taken.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ * @return
+ *   1 if the lock is currently taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_ticketlock_is_locked(rte_ticketlock_t *tl)
+{
+	return (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) !=
+		__atomic_load_n(&tl->next, __ATOMIC_ACQUIRE));
+}
+
+/**
+ * Test if hardware transactional memory (lock elision) is supported
+ *
+ * @return
+ *   1 if the hardware transactional memory is supported; 0 otherwise.
+ */
+static inline int rte_tm_supported(void);
+
+/**
+ * Try to execute critical section in a hardware memory transaction,
+ * if it fails or is not available, take the ticketlock.
+ *
+ * NOTE: An attempt to perform a HW I/O operation inside a hardware memory
+ * transaction always aborts the transaction since the CPU is not able to
+ * roll-back should the transaction fail. Therefore, hardware transactional
+ * locks are not advised to be used around rte_eth_rx_burst() and
+ * rte_eth_tx_burst() calls.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ */
+static inline void
+rte_ticketlock_lock_tm(rte_ticketlock_t *tl);
+
+/**
+ * Commit hardware memory transaction or release the ticketlock if
+ * the ticketlock is used as a fall-back.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ */
+static inline void
+rte_ticketlock_unlock_tm(rte_ticketlock_t *tl);
+
+/**
+ * Try to execute critical section in a hardware memory transaction,
+ * if it fails or is not available, try to take the lock.
+ *
+ * NOTE: An attempt to perform a HW I/O operation inside a hardware memory
+ * transaction always aborts the transaction since the CPU is not able to
+ * roll-back should the transaction fail. Therefore, hardware transactional
+ * locks are not advised to be used around rte_eth_rx_burst() and
+ * rte_eth_tx_burst() calls.
+ *
+ * @param tl
+ *   A pointer to the ticketlock.
+ * @return
+ *   1 if the hardware memory transaction is successfully started
+ *   or lock is successfully taken; 0 otherwise.
+ */
+static inline int
+rte_ticketlock_trylock_tm(rte_ticketlock_t *tl);
+
+/**
+ * The rte_ticketlock_recursive_t type.
+ */
+#define TICKET_LOCK_INVALID_ID -1
+
+typedef struct {
+	rte_ticketlock_t tl; /**< the actual ticketlock */
+	int user; /**< core id using lock, TICKET_LOCK_INVALID_ID for unused */
+	unsigned int count; /**< count of times this lock has been called */
+} rte_ticketlock_recursive_t;
+
+/**
+ * A static recursive ticketlock initializer.
+ */
+#define RTE_TICKETLOCK_RECURSIVE_INITIALIZER {RTE_TICKETLOCK_INITIALIZER, \
+				TICKET_LOCK_INVALID_ID, 0}
+
+/**
+ * Initialize the recursive ticketlock to an unlocked state.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ */
+static inline __rte_experimental void
+rte_ticketlock_recursive_init(rte_ticketlock_recursive_t *tlr)
+{
+	rte_ticketlock_init(&tlr->tl);
+	__atomic_store_n(&tlr->user, TICKET_LOCK_INVALID_ID, __ATOMIC_RELAXED);
+	tlr->count = 0;
+}
+
+/**
+ * Take the recursive ticketlock.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ */
+static inline __rte_experimental void
+rte_ticketlock_recursive_lock(rte_ticketlock_recursive_t *tlr)
+{
+	int id = rte_gettid();
+
+	if (__atomic_load_n(&tlr->user, __ATOMIC_RELAXED) != id) {
+		rte_ticketlock_lock(&tlr->tl);
+		__atomic_store_n(&tlr->user, id, __ATOMIC_RELAXED);
+	}
+	tlr->count++;
+}
+
+/**
+ * Release the recursive ticketlock.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ */
+static inline __rte_experimental void
+rte_ticketlock_recursive_unlock(rte_ticketlock_recursive_t *tlr)
+{
+	if (--(tlr->count) == 0) {
+		__atomic_store_n(&tlr->user, TICKET_LOCK_INVALID_ID,
+				 __ATOMIC_RELAXED);
+		rte_ticketlock_unlock(&tlr->tl);
+	}
+}
+
+/**
+ * Try to take the recursive lock.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_ticketlock_recursive_trylock(rte_ticketlock_recursive_t *tlr)
+{
+	int id = rte_gettid();
+
+	if (__atomic_load_n(&tlr->user, __ATOMIC_RELAXED) != id) {
+		if (rte_ticketlock_trylock(&tlr->tl) == 0)
+			return 0;
+		__atomic_store_n(&tlr->user, id, __ATOMIC_RELAXED);
+	}
+	tlr->count++;
+	return 1;
+}
+
+/**
+ * Try to execute critical section in a hardware memory transaction,
+ * if it fails or is not available, take the recursive ticketlock.
+ *
+ * NOTE: An attempt to perform a HW I/O operation inside a hardware memory
+ * transaction always aborts the transaction since the CPU is not able to
+ * roll-back should the transaction fail. Therefore, hardware transactional
+ * locks are not advised to be used around rte_eth_rx_burst() and
+ * rte_eth_tx_burst() calls.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ */
+static inline void
+rte_ticketlock_recursive_lock_tm(rte_ticketlock_recursive_t *tlr);
+
+/**
+ * Commit hardware memory transaction or release the recursive ticketlock
+ * if the recursive ticketlock is used as a fall-back.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ */
+static inline void
+rte_ticketlock_recursive_unlock_tm(rte_ticketlock_recursive_t *tlr);
+
+/**
+ * Try to execute critical section in a hardware memory transaction,
+ * if it fails or is not available, try to take the recursive lock.
+ *
+ * NOTE: An attempt to perform a HW I/O operation inside a hardware memory
+ * transaction always aborts the transaction since the CPU is not able to
+ * roll-back should the transaction fail. Therefore, hardware transactional
+ * locks are not advised to be used around rte_eth_rx_burst() and
+ * rte_eth_tx_burst() calls.
+ *
+ * @param tlr
+ *   A pointer to the recursive ticketlock.
+ * @return
+ *   1 if the hardware memory transaction is successfully started
+ *   or lock is successfully taken; 0 otherwise.
+ */
+static inline int
+rte_ticketlock_recursive_trylock_tm(rte_ticketlock_recursive_t *tlr);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_TICKETLOCK_H_ */
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index 5ecae0b..0670e41 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -99,6 +99,7 @@  generic_headers = files(
 	'include/generic/rte_prefetch.h',
 	'include/generic/rte_rwlock.h',
 	'include/generic/rte_spinlock.h',
+	'include/generic/rte_ticketlock.h',
 	'include/generic/rte_vect.h')
 install_headers(generic_headers, subdir: 'generic')
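
For completeness, a minimal usage sketch of the recursive ticketlock added above (hypothetical caller code, not part of the patch); the same thread may re-acquire the lock, and only the outermost unlock releases it:

	static rte_ticketlock_recursive_t rtl =
		RTE_TICKETLOCK_RECURSIVE_INITIALIZER;

	static void
	inner(void)
	{
		/* Same thread id: only the internal count is incremented,
		 * so re-entry does not deadlock. */
		rte_ticketlock_recursive_lock(&rtl);
		/* ... */
		rte_ticketlock_recursive_unlock(&rtl);
	}

	static void
	outer(void)
	{
		rte_ticketlock_recursive_lock(&rtl);   /* takes the underlying lock */
		inner();
		rte_ticketlock_recursive_unlock(&rtl); /* count hits 0: released */
	}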