[v6,1/6] lib/eal: implement the family of rte bit operation APIs

Message ID 1576648808-24765-2-git-send-email-joyce.kong@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series implement common rte bit operation APIs in PMDs

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-intel-Performance fail Performance Testing issues
ci/iol-testing fail Testing issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation fail Compilation issues

Commit Message

Joyce Kong Dec. 18, 2019, 6 a.m. UTC
There are a lot of bit operation functions scattered and
duplicated across PMDs; consolidating them into a common API
family is necessary. Furthermore, when a bit operation
is applied to IO devices, use __ATOMIC_ACQ_REL to
ensure the ordering of the IO bit operation.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 MAINTAINERS                                |   5 +
 doc/api/doxy-api-index.md                  |   5 +-
 lib/librte_eal/common/Makefile             |   1 +
 lib/librte_eal/common/include/rte_bitops.h | 474 +++++++++++++++++++++++++++++
 lib/librte_eal/common/meson.build          |   3 +-
 5 files changed, 485 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_bitops.h
  

Comments

Honnappa Nagarahalli Dec. 20, 2019, 6:52 a.m. UTC | #1
Hi Joyce,
	These APIs seem to be written with the PMD requirements in mind. Is there a need to expose them to applications (external to DPDK)?

> -----Original Message-----
> From: Joyce Kong <joyce.kong@arm.com>
> Sent: Wednesday, December 18, 2019 12:00 AM
> To: thomas@monjalon.net; stephen@networkplumber.org;
> david.marchand@redhat.com; mb@smartsharesystems.com;
> jerinj@marvell.com; bruce.richardson@intel.com; ravi1.kumar@amd.com;
> rmody@marvell.com; shshaikh@marvell.com; xuanziyang2@huawei.com;
> cloud.wangxiaoyun@huawei.com; zhouguoyang@huawei.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Phil Yang
> <Phil.Yang@arm.com>; Gavin Hu <Gavin.Hu@arm.com>
> Cc: nd <nd@arm.com>; dev@dpdk.org
> Subject: [PATCH v6 1/6] lib/eal: implement the family of rte bit operation
> APIs
> 
> There are a lot functions of bit operations scattered and duplicated in PMDs,
> consolidating them into a common API family is necessary. Furthermore,
> when the bit operation is applied to the IO devices, use __ATOMIC_ACQ_REL
> to ensure the ordering for io bit operation.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  MAINTAINERS                                |   5 +
>  doc/api/doxy-api-index.md                  |   5 +-
>  lib/librte_eal/common/Makefile             |   1 +
>  lib/librte_eal/common/include/rte_bitops.h | 474
> +++++++++++++++++++++++++++++
>  lib/librte_eal/common/meson.build          |   3 +-
>  5 files changed, 485 insertions(+), 3 deletions(-)  create mode 100644
> lib/librte_eal/common/include/rte_bitops.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4395d8d..d2a29a2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -236,6 +236,11 @@ M: Cristian Dumitrescu
> <cristian.dumitrescu@intel.com>
>  F: lib/librte_eal/common/include/rte_bitmap.h
>  F: app/test/test_bitmap.c
> 
> +Bitops
> +M: Joyce Kong <joyce.kong@arm.com>
> +F: lib/librte_eal/common/include/rte_bitops.h
> +F: app/test/test_bitops.c
> +
>  MCSlock - EXPERIMENTAL
>  M: Phil Yang <phil.yang@arm.com>
>  F: lib/librte_eal/common/include/generic/rte_mcslock.h
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index
> dff496b..ade7c01 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -133,12 +133,13 @@ The public API headers are grouped by topics:
>    [BPF]                (@ref rte_bpf.h)
> 
>  - **containers**:
> +  [bitmap]             (@ref rte_bitmap.h),
> +  [bitops]             (@ref rte_bitops.h),
>    [mbuf]               (@ref rte_mbuf.h),
>    [mbuf pool ops]      (@ref rte_mbuf_pool_ops.h),
>    [ring]               (@ref rte_ring.h),
>    [stack]              (@ref rte_stack.h),
> -  [tailq]              (@ref rte_tailq.h),
> -  [bitmap]             (@ref rte_bitmap.h)
> +  [tailq]              (@ref rte_tailq.h)
> 
>  - **packet framework**:
>    * [port]             (@ref rte_port.h):
> diff --git a/lib/librte_eal/common/Makefile
> b/lib/librte_eal/common/Makefile index c2c6d92..dd025c1 100644
> --- a/lib/librte_eal/common/Makefile
> +++ b/lib/librte_eal/common/Makefile
> @@ -19,6 +19,7 @@ INC += rte_malloc.h rte_keepalive.h rte_time.h  INC +=
> rte_service.h rte_service_component.h  INC += rte_bitmap.h rte_vfio.h
> rte_hypervisor.h rte_test.h  INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
> +INC += rte_bitops.h
> 
>  GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
> GENERIC_INC += rte_memcpy.h rte_cpuflags.h diff --git
> a/lib/librte_eal/common/include/rte_bitops.h
> b/lib/librte_eal/common/include/rte_bitops.h
> new file mode 100644
> index 0000000..34158d1
> --- /dev/null
> +++ b/lib/librte_eal/common/include/rte_bitops.h
> @@ -0,0 +1,474 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_BITOPS_H_
> +#define _RTE_BITOPS_H_
> +
> +/**
> + * @file
> + * Bit Operations
> + *
> + * This file defines a API for bit operations without/with memory ordering.
> + */
> +
> +#include <stdint.h>
> +#include <rte_debug.h>
> +#include <rte_compat.h>
> +
> +/*---------------------------- 32 bit operations
> +----------------------------*/
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Get the target bit from a 32-bit value without memory ordering.
> + *
> + * @param nr
> + *   The target bit to get.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The target bit.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_get_bit32_relaxed(unsigned int nr, uint32_t *addr) {
Why not pass the memory order as a parameter? It would reduce the number of API calls by half.

> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	return __atomic_load_n(addr, __ATOMIC_RELAXED) & mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Set the target bit in a 32-bit value to 1 without memory ordering.
> + *
> + * @param nr
> + *   The target bit to set.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_set_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	__atomic_fetch_or(addr, mask, __ATOMIC_RELAXED); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Clear the target bit in a 32-bit value to 0 without memory ordering.
> + *
> + * @param nr
> + *   The target bit to clear.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_clear_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	__atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 32-bit value, then set it to 1
> +without
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and set.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_test_and_set_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	return __atomic_fetch_or(addr, mask, __ATOMIC_RELAXED) & mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 32-bit value, then clear it to 0
> +without
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and clear.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_test_and_clear_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	return __atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED) &
> mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Get the target bit from a 32-bit value with memory ordering.
> + *
> + * @param nr
> + *   The target bit to get.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The target bit.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_get_bit32(unsigned int nr, uint32_t *addr) {
__atomic_load_n accepts memory orders other than relaxed and acquire. Should the API name change to indicate the acquire memory order here?

> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	return __atomic_load_n(addr, __ATOMIC_ACQUIRE) & mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Set the target bit in a 32-bit value to 1 with memory ordering.
> + *
> + * @param nr
> + *   The target bit to set.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_set_bit32(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	__atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL); }
Why not use just '__ATOMIC_RELEASE' here? The full barrier might not be required in all use cases.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Clear the target bit in a 32-bit value to 0 with memory ordering.
> + *
> + * @param nr
> + *   The target bit to clear.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_clear_bit32(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	__atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL); }
Why not use just '__ATOMIC_RELEASE' here? The full barrier might not be required in all use cases. I see a similar issue in the other APIs below.
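
To illustrate the suggestion, here is a minimal sketch of a release-ordered set paired with an acquire-ordered get. The names example_set_bit32_release and example_get_bit32_acquire are hypothetical and not part of the patch; assert() stands in for RTE_ASSERT so the sketch builds outside DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: set one bit with
 * release ordering only, instead of __ATOMIC_ACQ_REL. */
static inline void
example_set_bit32_release(unsigned int nr, uint32_t *addr)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	__atomic_fetch_or(addr, mask, __ATOMIC_RELEASE);
}

/* Hypothetical counterpart: read the bit with acquire ordering, so
 * that it pairs with the release-ordered set above. */
static inline uint32_t
example_get_bit32_acquire(unsigned int nr, uint32_t *addr)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	return __atomic_load_n(addr, __ATOMIC_ACQUIRE) & mask;
}
```

Whether release alone is enough depends on the caller; the sketch only shows the shape of the pairing.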

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 32-bit value, then set it to 1 with
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and set.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_test_and_set_bit32(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	return __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL) &
> mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 32-bit value, then clear it to 0 with
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and clear.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_test_and_clear_bit32(unsigned int nr, uint32_t *addr) {
> +	RTE_ASSERT(nr < 32);
> +
> +	uint32_t mask = UINT32_C(1) << nr;
> +	return __atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL) &
> mask; }
> +
> +/*---------------------------- 64 bit operations
> +----------------------------*/
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Get the target bit from a 64-bit value without memory ordering.
> + *
> + * @param nr
> + *   The target bit to get.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The target bit.
> + */
> +__rte_experimental
> +static inline uint64_t
> +rte_get_bit64_relaxed(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	return __atomic_load_n(addr, __ATOMIC_RELAXED) & mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Set the target bit in a 64-bit value to 1 without memory ordering.
> + *
> + * @param nr
> + *   The target bit to set.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_set_bit64_relaxed(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	__atomic_fetch_or(addr, mask, __ATOMIC_RELAXED); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Clear the target bit in a 64-bit value to 0 without memory ordering.
> + *
> + * @param nr
> + *   The target bit to clear.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_clear_bit64_relaxed(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	__atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 64-bit value, then set it to 1
> +without
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and set.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint64_t
> +rte_test_and_set_bit64_relaxed(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	return __atomic_fetch_or(addr, mask, __ATOMIC_RELAXED) & mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 64-bit value, then clear it to 0
> +without
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and clear.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint64_t
> +rte_test_and_clear_bit64_relaxed(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	return __atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED) &
> mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Get the target bit from a 64-bit value with memory ordering.
> + *
> + * @param nr
> + *   The target bit to get.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The target bit.
> + */
> +__rte_experimental
> +static inline uint64_t
> +rte_get_bit64(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	return __atomic_load_n(addr, __ATOMIC_ACQUIRE) & mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Set the target bit in a 64-bit value to 1 with memory ordering.
> + *
> + * @param nr
> + *   The target bit to set.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_set_bit64(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	__atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Clear the target bit in a 64-bit value to 0 with memory ordering.
> + *
> + * @param nr
> + *   The target bit to clear.
> + * @param addr
> + *   The address holding the bit.
> + */
> +__rte_experimental
> +static inline void
> +rte_clear_bit64(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	__atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 64-bit value, then set it to 1 with
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and set.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint64_t
> +rte_test_and_set_bit64(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	return __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL) &
> mask; }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Return the original bit from a 64-bit value, then clear it to 0 with
> + * memory ordering.
> + *
> + * @param nr
> + *   The target bit to get and clear.
> + * @param addr
> + *   The address holding the bit.
> + * @return
> + *   The original bit.
> + */
> +__rte_experimental
> +static inline uint64_t
> +rte_test_and_clear_bit64(unsigned int nr, uint64_t *addr) {
> +	RTE_ASSERT(nr < 64);
> +
> +	uint64_t mask = UINT64_C(1) << nr;
> +	return __atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL) &
> mask; }
> +#endif /* _RTE_BITOPS_H_ */
> diff --git a/lib/librte_eal/common/meson.build
> b/lib/librte_eal/common/meson.build
> index 2b97715..766edbd 100644
> --- a/lib/librte_eal/common/meson.build
> +++ b/lib/librte_eal/common/meson.build
> @@ -50,9 +50,10 @@ common_objs += eal_common_arch_objs
> 
>  common_headers = files(
>  	'include/rte_alarm.h',
> +	'include/rte_bitmap.h',
> +	'include/rte_bitops.h',
>  	'include/rte_branch_prediction.h',
>  	'include/rte_bus.h',
> -	'include/rte_bitmap.h',
>  	'include/rte_class.h',
>  	'include/rte_common.h',
>  	'include/rte_compat.h',
> --
> 2.7.4
  
Honnappa Nagarahalli Dec. 21, 2019, 4:07 p.m. UTC | #2
<snip>

> > Subject: [PATCH v6 1/6] lib/eal: implement the family of rte bit
> > operation APIs
> >
> > There are a lot functions of bit operations scattered and duplicated
> > in PMDs, consolidating them into a common API family is necessary.
> > Furthermore, when the bit operation is applied to the IO devices, use
> > __ATOMIC_ACQ_REL to ensure the ordering for io bit operation.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  MAINTAINERS                                |   5 +
> >  doc/api/doxy-api-index.md                  |   5 +-
> >  lib/librte_eal/common/Makefile             |   1 +
> >  lib/librte_eal/common/include/rte_bitops.h | 474
> > +++++++++++++++++++++++++++++
> >  lib/librte_eal/common/meson.build          |   3 +-
> >  5 files changed, 485 insertions(+), 3 deletions(-)  create mode
> > 100644 lib/librte_eal/common/include/rte_bitops.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS index 4395d8d..d2a29a2 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -236,6 +236,11 @@ M: Cristian Dumitrescu
> > <cristian.dumitrescu@intel.com>
> >  F: lib/librte_eal/common/include/rte_bitmap.h
> >  F: app/test/test_bitmap.c
> >
> > +Bitops
> > +M: Joyce Kong <joyce.kong@arm.com>
> > +F: lib/librte_eal/common/include/rte_bitops.h
> > +F: app/test/test_bitops.c
> > +
> >  MCSlock - EXPERIMENTAL
> >  M: Phil Yang <phil.yang@arm.com>
> >  F: lib/librte_eal/common/include/generic/rte_mcslock.h
> > diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> > index
> > dff496b..ade7c01 100644
> > --- a/doc/api/doxy-api-index.md
> > +++ b/doc/api/doxy-api-index.md
> > @@ -133,12 +133,13 @@ The public API headers are grouped by topics:
> >    [BPF]                (@ref rte_bpf.h)
> >
> >  - **containers**:
> > +  [bitmap]             (@ref rte_bitmap.h),
> > +  [bitops]             (@ref rte_bitops.h),
> >    [mbuf]               (@ref rte_mbuf.h),
> >    [mbuf pool ops]      (@ref rte_mbuf_pool_ops.h),
> >    [ring]               (@ref rte_ring.h),
> >    [stack]              (@ref rte_stack.h),
> > -  [tailq]              (@ref rte_tailq.h),
> > -  [bitmap]             (@ref rte_bitmap.h)
> > +  [tailq]              (@ref rte_tailq.h)
> >
> >  - **packet framework**:
> >    * [port]             (@ref rte_port.h):
> > diff --git a/lib/librte_eal/common/Makefile
> > b/lib/librte_eal/common/Makefile index c2c6d92..dd025c1 100644
> > --- a/lib/librte_eal/common/Makefile
> > +++ b/lib/librte_eal/common/Makefile
> > @@ -19,6 +19,7 @@ INC += rte_malloc.h rte_keepalive.h rte_time.h  INC
> > += rte_service.h rte_service_component.h  INC += rte_bitmap.h
> > rte_vfio.h rte_hypervisor.h rte_test.h  INC += rte_reciprocal.h
> > rte_fbarray.h rte_uuid.h
> > +INC += rte_bitops.h
> >
> >  GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h
> > rte_prefetch.h GENERIC_INC += rte_memcpy.h rte_cpuflags.h diff --git
> > a/lib/librte_eal/common/include/rte_bitops.h
> > b/lib/librte_eal/common/include/rte_bitops.h
> > new file mode 100644
> > index 0000000..34158d1
> > --- /dev/null
> > +++ b/lib/librte_eal/common/include/rte_bitops.h
> > @@ -0,0 +1,474 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_BITOPS_H_
> > +#define _RTE_BITOPS_H_
> > +
> > +/**
> > + * @file
> > + * Bit Operations
> > + *
> > + * This file defines a API for bit operations without/with memory ordering.
> > + */
> > +
> > +#include <stdint.h>
> > +#include <rte_debug.h>
> > +#include <rte_compat.h>
> > +
> > +/*---------------------------- 32 bit operations
> > +----------------------------*/
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Get the target bit from a 32-bit value without memory ordering.
> > + *
> > + * @param nr
> > + *   The target bit to get.
> > + * @param addr
> > + *   The address holding the bit.
> > + * @return
> > + *   The target bit.
> > + */
> > +__rte_experimental
> > +static inline uint32_t
> > +rte_get_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> Why not pass the memory order as a parameter? It would reduce the number
> of API calls by half.
I think these APIs should be modelled according to C11 __atomic_xxx APIs. Otherwise, the programmers have to understand another interface. It will also help reduce the number of APIs.
Converting these into macros will help remove the size based duplication of APIs. I came up with the following macro:

#define RTE_GET_BIT(nr, var, ret, memorder) \
({ \
    if (sizeof(var) == sizeof(uint32_t)) { \
        uint32_t mask1 = 1U << (nr)%32; \
        ret = __atomic_load_n(&var, (memorder)) & mask1;\
    } \
    else {\
        uint64_t mask2 = 1UL << (nr)%64;\
        ret = __atomic_load_n(&var, (memorder)) & mask2;\
    } \
})

The '%' is required to avoid a compiler warning/error. But the '%' operation will get removed by the compiler since 'nr' is a constant.
IMO, the macro itself is not complex and should not be a pain for debugging.

Currently, we have 20 APIs in this patch (possibly more coming in the future and creating an explosion with memory order/size combinations). The above macro will reduce this to 5 macros without further explosion in number of combinations.

Any thoughts? What do others think?
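
For comparison with the macro, the memory-order-as-parameter idea can also be sketched with plain inline functions. This is only a sketch under assumptions: the names example_get_bit32 and example_set_bit32 are hypothetical and not part of the patch, and assert() stands in for RTE_ASSERT so it builds outside DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: the caller picks the
 * memory order, as with the __atomic_xxx builtins themselves. */
static inline uint32_t
example_get_bit32(unsigned int nr, uint32_t *addr, int memorder)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	return __atomic_load_n(addr, memorder) & mask;
}

/* Hypothetical helper: set one bit with a caller-chosen memory order. */
static inline void
example_set_bit32(unsigned int nr, uint32_t *addr, int memorder)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	__atomic_fetch_or(addr, mask, memorder);
}
```

This halves the relaxed/ordered duplication but keeps one function per width, which is the part the macro is meant to remove.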
  
Stephen Hemminger Dec. 21, 2019, 6:07 p.m. UTC | #3
On Sat, 21 Dec 2019 16:07:23 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> Converting these into macros will help remove the size based duplication of APIs. I came up with the following macro:
> 
> #define RTE_GET_BIT(nr, var, ret, memorder) \
> ({ \
>     if (sizeof(var) == sizeof(uint32_t)) { \
>         uint32_t mask1 = 1U << (nr)%32; \
>         ret = __atomic_load_n(&var, (memorder)) & mask1;\
>     } \
>     else {\
>         uint64_t mask2 = 1UL << (nr)%64;\
>         ret = __atomic_load_n(&var, (memorder)) & mask2;\
>     } \
> })

Macros are more error-prone, especially because this is in an exposed header file.
  
Stephen Hemminger Dec. 21, 2019, 6:08 p.m. UTC | #4
On Sat, 21 Dec 2019 16:07:23 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> Converting these into macros will help remove the size based duplication of APIs. I came up with the following macro:
> 
> #define RTE_GET_BIT(nr, var, ret, memorder) \
> ({ \
>     if (sizeof(var) == sizeof(uint32_t)) { \
>         uint32_t mask1 = 1U << (nr)%32; \
>         ret = __atomic_load_n(&var, (memorder)) & mask1;\
>     } \
>     else {\
>         uint64_t mask2 = 1UL << (nr)%64;\
>         ret = __atomic_load_n(&var, (memorder)) & mask2;\
>     } \
> })


Follow-on: if you want to do it as macros, then use typeof() to make the
mask any size.
  
Honnappa Nagarahalli Dec. 23, 2019, 5:04 a.m. UTC | #5
<snip>

> 
> On Sat, 21 Dec 2019 16:07:23 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> > Converting these into macros will help remove the size based duplication of
> APIs. I came up with the following macro:
> >
> > #define RTE_GET_BIT(nr, var, ret, memorder) \ ({ \
> >     if (sizeof(var) == sizeof(uint32_t)) { \
> >         uint32_t mask1 = 1U << (nr)%32; \
> >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> >     } \
> >     else {\
> >         uint64_t mask2 = 1UL << (nr)%64;\
> >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> >     } \
> > })
> 
> Macros are more error prone. Especially because this is in exposed header file
That's another question I have. Why do we need to have these APIs in a public header file? These will add to the ABI burden as well. These APIs should be in a common-but-not-public header file. I am also not sure how helpful these APIs are for applications as these APIs seem to have considered requirements only from the PMDs.
  
Honnappa Nagarahalli Dec. 23, 2019, 5:45 a.m. UTC | #6
> 
> On Sat, 21 Dec 2019 16:07:23 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> > Converting these into macros will help remove the size based duplication of
> APIs. I came up with the following macro:
> >
> > #define RTE_GET_BIT(nr, var, ret, memorder) \ ({ \
> >     if (sizeof(var) == sizeof(uint32_t)) { \
> >         uint32_t mask1 = 1U << (nr)%32; \
> >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> >     } \
> >     else {\
> >         uint64_t mask2 = 1UL << (nr)%64;\
> >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> >     } \
> > })
> 
> 
> Follow on if you want to do it as macros, then use typeof() to make the mask
> any size.
Yes, that makes it much simpler:
#define RTE_GET_BIT(nr, var, ret, memorder) \
({ \
     typeof(var) mask; \
     if (sizeof(var) == sizeof(uint32_t)) { \
         mask = 1U << (nr)%32; \
     } else {\
         mask = 1UL << (nr)%64;\
     } \
     ret = __atomic_load_n(&var, (memorder)) & mask;\
})
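
A possible further simplification, offered as a sketch rather than a proposal for the patch: derive the constant from the variable's own type, so the sizeof branch goes away and the macro does not depend on the width of 'unsigned long' (which is 32 bits on ILP32 and LLP64 targets). EXAMPLE_GET_BIT is a hypothetical name, and this variant yields the result as the value of the expression instead of taking a 'ret' argument. It relies on the GNU C __typeof__ and statement-expression extensions.

```c
#include <stdint.h>

/* Hypothetical variant, not from the thread. The mask is built in the
 * type of 'var', so the same macro covers 32-bit and 64-bit variables.
 * The '%' keeps the shift count in range for that type. */
#define EXAMPLE_GET_BIT(nr, var, memorder) \
({ \
	__typeof__(var) mask_ = \
		(__typeof__(var))1 << ((nr) % (sizeof(var) * 8)); \
	__atomic_load_n(&(var), (memorder)) & mask_; \
})
```

Usage would look like 'if (EXAMPLE_GET_BIT(3, status, __ATOMIC_ACQUIRE)) ...', where 'status' is any uint32_t or uint64_t lvalue.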
  
Jerin Jacob Dec. 23, 2019, 8:59 a.m. UTC | #7
On Sat, Dec 21, 2019 at 9:37 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:

> > > +__rte_experimental
> > > +static inline uint32_t
> > > +rte_get_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> > Why not pass the memory order as a parameter? It would reduce the number
> > of API calls by half.
> I think these APIs should be modelled according to C11 __atomic_xxx APIs. Otherwise, the programmers have to understand another interface. It will also help reduce the number of APIs.
> Converting these into macros will help remove the size based duplication of APIs. I came up with the following macro:
>
> #define RTE_GET_BIT(nr, var, ret, memorder) \
> ({ \
>     if (sizeof(var) == sizeof(uint32_t)) { \
>         uint32_t mask1 = 1U << (nr)%32; \
>         ret = __atomic_load_n(&var, (memorder)) & mask1;\
>     } \
>     else {\
>         uint64_t mask2 = 1UL << (nr)%64;\
>         ret = __atomic_load_n(&var, (memorder)) & mask2;\
>     } \
> })
>
> The '%' is required to avoid a compiler warning/error. But the '%' operation will get removed by the compiler since 'nr' is a constant.
> IMO, the macro itself is not complex and should not be a pain for debugging.
>
> Currently, we have 20 APIs in this patch (possibly more coming in the future and creating an explosion with memory order/size combinations). The above macro will reduce this to 5 macros without further explosion in number of combinations.
>
> Any thoughts? What do others think?

# I think the most common use case for register manipulation is
getting/setting "fields" (sets of consecutive bits). IMO, the Linux
kernel bit manipulation APIs make more sense.
At least have an implementation similar to FIELD_GET() and FIELD_SET().
https://github.com/torvalds/linux/blob/master/include/linux/bitfield.h

# FIELD_GET would be a superset API. A single bit can be read with width = 1.

# I think it would be good to have two versions of the macro/API: RTE_FIELD_GET
(without atomics) and RTE_FIELD_GET_ATOMIC (with C11 atomics).
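
A rough sketch of what such field accessors could look like. The names example_field_get32 and example_field_get32_atomic are hypothetical, the mask is assumed to be non-zero and contiguous, and this is not the kernel's implementation, only the same idea written with the GCC/Clang __builtin_ctz and __atomic_load_n builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the field selected by 'mask' from 'reg'
 * and shift it down to bit 0. 'mask' must be non-zero and contiguous. */
static inline uint32_t
example_field_get32(uint32_t mask, uint32_t reg)
{
	assert(mask != 0);

	return (reg & mask) >> __builtin_ctz(mask);
}

/* Hypothetical atomic flavour: load the word with the given memory
 * order, then extract the field from the loaded value. */
static inline uint32_t
example_field_get32_atomic(uint32_t mask, uint32_t *addr, int memorder)
{
	return example_field_get32(mask, __atomic_load_n(addr, memorder));
}
```

A single-bit get is then the special case where the mask has exactly one bit set.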
  
Stephen Hemminger Dec. 23, 2019, 4:36 p.m. UTC | #8
On Mon, 23 Dec 2019 05:04:12 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> <snip>
> 
> > 
> > On Sat, 21 Dec 2019 16:07:23 +0000
> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> >   
> > > Converting these into macros will help remove the size based duplication of  
> > APIs. I came up with the following macro:  
> > >
> > > #define RTE_GET_BIT(nr, var, ret, memorder) \ ({ \
> > >     if (sizeof(var) == sizeof(uint32_t)) { \
> > >         uint32_t mask1 = 1U << (nr)%32; \
> > >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > >     } \
> > >     else {\
> > >         uint64_t mask2 = 1UL << (nr)%64;\
> > >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > >     } \
> > > })  
> > 
> > Macros are more error prone. Especially because this is in exposed header file  
> That's another question I have. Why do we need to have these APIs in a public header file? These will add to the ABI burden as well. These APIs should be in a common-but-not-public header file. I am also not sure how helpful these APIs are for applications as these APIs seem to have considered requirements only from the PMDs.

Why do we have to wrap every C atomic builtin? What value is there in that?
  
Gavin Hu Dec. 30, 2019, 3:02 a.m. UTC | #9
Hi Stephen, Honnappa,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, December 24, 2019 12:37 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Joyce Kong <Joyce.Kong@arm.com>; thomas@monjalon.net;
> david.marchand@redhat.com; mb@smartsharesystems.com;
> jerinj@marvell.com; bruce.richardson@intel.com; ravi1.kumar@amd.com;
> rmody@marvell.com; shshaikh@marvell.com; xuanziyang2@huawei.com;
> cloud.wangxiaoyun@huawei.com; zhouguoyang@huawei.com; Phil Yang
> <Phil.Yang@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; nd <nd@arm.com>;
> dev@dpdk.org
> Subject: Re: [PATCH v6 1/6] lib/eal: implement the family of rte bit operation
> APIs
> 
> On Mon, 23 Dec 2019 05:04:12 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> > <snip>
> >
> > >
> > > On Sat, 21 Dec 2019 16:07:23 +0000
> > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> > >
> > > > Converting these into macros will help remove the size based duplication
> of
> > > APIs. I came up with the following macro:
> > > >
> > > > > #define RTE_GET_BIT(nr, var, ret, memorder) \
> > > > > ({ \
> > > >     if (sizeof(var) == sizeof(uint32_t)) { \
> > > >         uint32_t mask1 = 1U << (nr)%32; \
> > > >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > > >     } \
> > > >     else {\
> > > >         uint64_t mask2 = 1UL << (nr)%64;\
> > > >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > > >     } \
> > > > })
> > >
> > > Macros are more error prone. Especially because this is in exposed header
> file
> > That's another question I have. Why do we need to have these APIs in a
> public header file? These will add to the ABI burden as well. These APIs should
> be in a common-but-not-public header file. I am also not sure how helpful
> these APIs are for applications as these APIs seem to have considered
> requirements only from the PMDs.
> 
> Why do we have to wrap every C atomic builtin? What value is there in that?

The wrapping is aimed at reducing code duplication: on average, 3 lines are cut down to 1 line for a single core.
Overall, I think these bitops APIs should be targeted at PMDs only; applications can use C11 freely.
The new APIs grew out of the idea of consolidating the bit operations scattered across the PMDs. Extending them to applications or libraries would be unwise, because those require different memory orderings, which adds complexity.

If the use cases are limited to PMDs, a 'volatile' access or a compiler barrier is sufficient, so the number of APIs can be cut in half.
http://inbox.dpdk.org/dev/VI1PR08MB53766C30B5CDA00FB9FCE9678F2E0@VI1PR08MB5376.eurprd08.prod.outlook.com/

Any thoughts and comments are welcome!
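
To make the "3 lines cut down to 1" point concrete, here is a minimal sketch. It is not copied from any PMD: the names demo_open_coded_set_bit and demo_set_bit32_relaxed are hypothetical, and the helper only mirrors the body of rte_set_bit32_relaxed() from this patch (RTE_ASSERT left out so it builds without DPDK headers).

```c
#include <stdint.h>

/* Open-coded form, roughly what a driver might write without a shared
 * helper (hypothetical example). Note that this is a separate load and
 * store, so it is NOT an atomic read-modify-write. */
static inline void
demo_open_coded_set_bit(uint32_t *reg, unsigned int nr)
{
	uint32_t val = __atomic_load_n(reg, __ATOMIC_RELAXED);
	val |= UINT32_C(1) << nr;
	__atomic_store_n(reg, val, __ATOMIC_RELAXED);
}

/* Helper with the same body as rte_set_bit32_relaxed() in the patch:
 * a single atomic OR with relaxed ordering. */
static inline void
demo_set_bit32_relaxed(unsigned int nr, uint32_t *addr)
{
	uint32_t mask = UINT32_C(1) << nr;
	__atomic_fetch_or(addr, mask, __ATOMIC_RELAXED);
}
```

With the helper, a call site shrinks to one line such as demo_set_bit32_relaxed(5, &reg), and it also becomes a real atomic OR instead of a load followed by a store.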
  
Honnappa Nagarahalli Jan. 7, 2020, 12:41 a.m. UTC | #10
<snip>
> >
> > >
> > > On Sat, 21 Dec 2019 16:07:23 +0000
> > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> > >
> > > > Converting these into macros will help remove the size based
> duplication of
> > > APIs. I came up with the following macro:
> > > >
> > > > > #define RTE_GET_BIT(nr, var, ret, memorder) \
> > > > > ({ \
> > > >     if (sizeof(var) == sizeof(uint32_t)) { \
> > > >         uint32_t mask1 = 1U << (nr)%32; \
> > > >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > > >     } \
> > > >     else {\
> > > >         uint64_t mask2 = 1UL << (nr)%64;\
> > > >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > > >     } \
> > > > })
> > >
> > > Macros are more error prone. Especially because this is in exposed header
> file
> > That's another question I have. Why do we need to have these APIs in a
> public header file? These will add to the ABI burden as well. These APIs
> should be in a common-but-not-public header file. I am also not sure how
> helpful these APIs are for applications as these APIs seem to have considered
> requirements only from the PMDs.
> 
> Why do we have to wrap every C atomic builtin? What value is there in that?
As long as we stick to the requirements of the PMDs, we do not need to worry about every atomic builtin. However, we seem to be making these APIs public, which requires us to keep them generic to allow for possible future requirements.
  
Honnappa Nagarahalli Jan. 7, 2020, 12:44 a.m. UTC | #11
<snip>

> > >
> > > >
> > > > On Sat, 21 Dec 2019 16:07:23 +0000 Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com> wrote:
> > > >
> > > > > Converting these into macros will help remove the size based
> > > > > duplication
> > of
> > > > APIs. I came up with the following macro:
> > > > >
> > > > > > #define RTE_GET_BIT(nr, var, ret, memorder) \
> > > > > > ({ \
> > > > >     if (sizeof(var) == sizeof(uint32_t)) { \
> > > > >         uint32_t mask1 = 1U << (nr)%32; \
> > > > >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > > > >     } \
> > > > >     else {\
> > > > >         uint64_t mask2 = 1UL << (nr)%64;\
> > > > >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > > > >     } \
> > > > > })
> > > >
> > > > Macros are more error prone. Especially because this is in exposed
> > > > header
> > file
> > > That's another question I have. Why do we need to have these APIs in
> > > a
> > public header file? These will add to the ABI burden as well. These
> > APIs should be in a common-but-not-public header file. I am also not
> > sure how helpful these APIs are for applications as these APIs seem to
> > have considered requirements only from the PMDs.
> >
> > Why do we have to wrap every C atomic builtin? What value is there in that?
> 
> The wrapping is aimed to reduce code duplication, on average 3 lines cut
> down to 1 line for a single core.
> Overall I am thinking this bitops APIs are targeted for use by PMDs only,
> applications can use C11 freely.
> The initial thought for the new APIs came from the idea of consolidating the
> scattered bit operations all over the PMDs. It is unwise to expanding to
> applications or libraries, as different memory orderings are required and
> complexity generate.
> 
> If the use cases are limited to PMDs, a 'volatile' or a compiler barrier is
> sufficient therefore the number of APIs can be saved by half.
> http://inbox.dpdk.org/dev/VI1PR08MB53766C30B5CDA00FB9FCE9678F2E0
> @VI1PR08MB5376.eurprd08.prod.outlook.com/
> 
> Any thoughts and comments are welcome!
I would prefer that the APIs/macros address only the PMDs' requirements. They should also be kept private (through naming conventions?). Given that the current PMDs are not using C11, we can skip using C11 atomics in these APIs.
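
A rough sketch of what a 'volatile'-only variant could look like, assuming PMD-only use. The demo_*_volatile names are hypothetical and nothing here is taken from the patch. The accesses are plain loads and stores that the compiler may not elide or merge; they give no atomicity for the read-modify-write and no ordering against other threads.

```c
#include <stdint.h>

/* Read one bit through a volatile pointer; returns the bit in place
 * (nonzero if set, 0 otherwise). */
static inline uint32_t
demo_get_bit32_volatile(unsigned int nr, volatile uint32_t *addr)
{
	uint32_t mask = UINT32_C(1) << nr;
	return *addr & mask;
}

/* Set one bit: a volatile load followed by a volatile store.
 * This is not an atomic read-modify-write. */
static inline void
demo_set_bit32_volatile(unsigned int nr, volatile uint32_t *addr)
{
	uint32_t mask = UINT32_C(1) << nr;
	*addr = *addr | mask;
}

/* Clear one bit, same caveats as the set variant. */
static inline void
demo_clear_bit32_volatile(unsigned int nr, volatile uint32_t *addr)
{
	uint32_t mask = UINT32_C(1) << nr;
	*addr = *addr & ~mask;
}
```

Whether this is enough depends on the device and on who else touches the word; the sketch only shows the shape of the API, not a claim that it fits every PMD.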

>
  
Stephen Hemminger Jan. 7, 2020, 1:26 a.m. UTC | #12
On Tue, 7 Jan 2020 00:44:51 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> <snip>
> 
> > > >  
> > > > >
> > > > > On Sat, 21 Dec 2019 16:07:23 +0000 Honnappa Nagarahalli
> > > > > <Honnappa.Nagarahalli@arm.com> wrote:
> > > > >  
> > > > > > Converting these into macros will help remove the size based
> > > > > > duplication  
> > > of  
> > > > > APIs. I came up with the following macro:  
> > > > > >
> > > > > > > #define RTE_GET_BIT(nr, var, ret, memorder) \
> > > > > > > ({ \
> > > > > >     if (sizeof(var) == sizeof(uint32_t)) { \
> > > > > >         uint32_t mask1 = 1U << (nr)%32; \
> > > > > >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > > > > >     } \
> > > > > >     else {\
> > > > > >         uint64_t mask2 = 1UL << (nr)%64;\
> > > > > >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > > > > >     } \
> > > > > > })  
> > > > >
> > > > > Macros are more error prone. Especially because this is in exposed
> > > > > header  
> > > file  
> > > > That's another question I have. Why do we need to have these APIs in
> > > > a  
> > > public header file? These will add to the ABI burden as well. These
> > > APIs should be in a common-but-not-public header file. I am also not
> > > sure how helpful these APIs are for applications as these APIs seem to
> > > have considered requirements only from the PMDs.
> > >
> > > Why do we have to wrap every C atomic builtin? What value is there in that?  
> > 
> > The wrapping is aimed to reduce code duplication, on average 3 lines cut
> > down to 1 line for a single core.
> > Overall I am thinking this bitops APIs are targeted for use by PMDs only,
> > applications can use C11 freely.
> > The initial thought for the new APIs came from the idea of consolidating the
> > scattered bit operations all over the PMDs. It is unwise to expanding to
> > applications or libraries, as different memory orderings are required and
> > complexity generate.
> > 
> > If the use cases are limited to PMDs, a 'volatile' or a compiler barrier is
> > sufficient therefore the number of APIs can be saved by half.
> > http://inbox.dpdk.org/dev/VI1PR08MB53766C30B5CDA00FB9FCE9678F2E0
> > @VI1PR08MB5376.eurprd08.prod.outlook.com/
> > 
> > Any thoughts and comments are welcome!  
> I would prefer that the APIs/Macros just address PMD's requirements. These also should be kept private (through naming conventions?). Given that the current PMDs are not using C11, we can skip using C11 atomics in these APIs.

Not in favor. Just use the existing GCC/Clang/ICC atomics instead of creating
unnecessary wrappers that only add bloat.
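
For comparison, a minimal sketch of calling the compiler builtins directly, where each call site picks its own memory order instead of the fixed __ATOMIC_ACQ_REL used by the wrappers in this patch. The slot bitmap and the demo_* names are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Try to claim a slot in a 64-bit bitmap. Acquire ordering is enough on
 * the claiming side. Returns true if this caller set the bit. */
static bool
demo_try_claim_slot(uint64_t *bitmap, unsigned int slot)
{
	uint64_t mask = UINT64_C(1) << slot;
	uint64_t old = __atomic_fetch_or(bitmap, mask, __ATOMIC_ACQUIRE);

	return (old & mask) == 0;
}

/* Give the slot back. Release ordering publishes earlier writes to the
 * next caller that claims the slot. */
static void
demo_release_slot(uint64_t *bitmap, unsigned int slot)
{
	uint64_t mask = UINT64_C(1) << slot;

	__atomic_fetch_and(bitmap, ~mask, __ATOMIC_RELEASE);
}
```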
  
Honnappa Nagarahalli Jan. 7, 2020, 4:41 a.m. UTC | #13
<snip>
> >
> > > > >
> > > > > >
> > > > > > On Sat, 21 Dec 2019 16:07:23 +0000 Honnappa Nagarahalli
> > > > > > <Honnappa.Nagarahalli@arm.com> wrote:
> > > > > >
> > > > > > > Converting these into macros will help remove the size based
> > > > > > > duplication
> > > > of
> > > > > > APIs. I came up with the following macro:
> > > > > > >
> > > > > > > > #define RTE_GET_BIT(nr, var, ret, memorder) \
> > > > > > > > ({ \
> > > > > > >     if (sizeof(var) == sizeof(uint32_t)) { \
> > > > > > >         uint32_t mask1 = 1U << (nr)%32; \
> > > > > > >         ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > > > > > >     } \
> > > > > > >     else {\
> > > > > > >         uint64_t mask2 = 1UL << (nr)%64;\
> > > > > > >         ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > > > > > >     } \
> > > > > > > })
> > > > > >
> > > > > > Macros are more error prone. Especially because this is in
> > > > > > exposed header
> > > > file
> > > > > That's another question I have. Why do we need to have these
> > > > > APIs in a
> > > > public header file? These will add to the ABI burden as well.
> > > > These APIs should be in a common-but-not-public header file. I am
> > > > also not sure how helpful these APIs are for applications as these
> > > > APIs seem to have considered requirements only from the PMDs.
> > > >
> > > > Why do we have to wrap every C atomic builtin? What value is there in
> that?
> > >
> > > The wrapping is aimed to reduce code duplication, on average 3 lines
> > > cut down to 1 line for a single core.
> > > Overall I am thinking this bitops APIs are targeted for use by PMDs
> > > only, applications can use C11 freely.
> > > The initial thought for the new APIs came from the idea of
> > > consolidating the scattered bit operations all over the PMDs. It is
> > > unwise to expanding to applications or libraries, as different
> > > memory orderings are required and complexity generate.
> > >
> > > If the use cases are limited to PMDs, a 'volatile' or a compiler
> > > barrier is sufficient therefore the number of APIs can be saved by half.
> > >
> http://inbox.dpdk.org/dev/VI1PR08MB53766C30B5CDA00FB9FCE9678F2E0
> > > @VI1PR08MB5376.eurprd08.prod.outlook.com/
> > >
> > > Any thoughts and comments are welcome!
> > I would prefer that the APIs/Macros just address PMD's requirements.
> These also should be kept private (through naming conventions?). Given that
> the current PMDs are not using C11, we can skip using C11 atomics in these
> APIs.
> 
> Not in favor, just use existing Gcc/clang/icc atomics instead of creating
> unnecessary bloat wrappers.
I thought you had blessed this patch [1].

[1] http://mails.dpdk.org/archives/dev/2019-October/147297.html
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 4395d8d..d2a29a2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -236,6 +236,11 @@  M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
 F: lib/librte_eal/common/include/rte_bitmap.h
 F: app/test/test_bitmap.c
 
+Bitops
+M: Joyce Kong <joyce.kong@arm.com>
+F: lib/librte_eal/common/include/rte_bitops.h
+F: app/test/test_bitops.c
+
 MCSlock - EXPERIMENTAL
 M: Phil Yang <phil.yang@arm.com>
 F: lib/librte_eal/common/include/generic/rte_mcslock.h
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index dff496b..ade7c01 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -133,12 +133,13 @@  The public API headers are grouped by topics:
   [BPF]                (@ref rte_bpf.h)
 
 - **containers**:
+  [bitmap]             (@ref rte_bitmap.h),
+  [bitops]             (@ref rte_bitops.h),
   [mbuf]               (@ref rte_mbuf.h),
   [mbuf pool ops]      (@ref rte_mbuf_pool_ops.h),
   [ring]               (@ref rte_ring.h),
   [stack]              (@ref rte_stack.h),
-  [tailq]              (@ref rte_tailq.h),
-  [bitmap]             (@ref rte_bitmap.h)
+  [tailq]              (@ref rte_tailq.h)
 
 - **packet framework**:
   * [port]             (@ref rte_port.h):
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index c2c6d92..dd025c1 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -19,6 +19,7 @@  INC += rte_malloc.h rte_keepalive.h rte_time.h
 INC += rte_service.h rte_service_component.h
 INC += rte_bitmap.h rte_vfio.h rte_hypervisor.h rte_test.h
 INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
+INC += rte_bitops.h
 
 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_memcpy.h rte_cpuflags.h
diff --git a/lib/librte_eal/common/include/rte_bitops.h b/lib/librte_eal/common/include/rte_bitops.h
new file mode 100644
index 0000000..34158d1
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_bitops.h
@@ -0,0 +1,474 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_BITOPS_H_
+#define _RTE_BITOPS_H_
+
+/**
+ * @file
+ * Bit Operations
+ *
+ * This file defines an API for bit operations with and without memory ordering.
+ */
+
+#include <stdint.h>
+#include <rte_debug.h>
+#include <rte_compat.h>
+
+/*---------------------------- 32 bit operations ----------------------------*/
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the target bit from a 32-bit value without memory ordering.
+ *
+ * @param nr
+ *   The target bit to get.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The target bit.
+ */
+__rte_experimental
+static inline uint32_t
+rte_get_bit32_relaxed(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	return __atomic_load_n(addr, __ATOMIC_RELAXED) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set the target bit in a 32-bit value to 1 without memory ordering.
+ *
+ * @param nr
+ *   The target bit to set.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_set_bit32_relaxed(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	__atomic_fetch_or(addr, mask, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Clear the target bit in a 32-bit value to 0 without memory ordering.
+ *
+ * @param nr
+ *   The target bit to clear.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_clear_bit32_relaxed(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	__atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 32-bit value, then set it to 1 without
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and set.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint32_t
+rte_test_and_set_bit32_relaxed(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	return __atomic_fetch_or(addr, mask, __ATOMIC_RELAXED) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 32-bit value, then clear it to 0 without
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and clear.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint32_t
+rte_test_and_clear_bit32_relaxed(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	return __atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the target bit from a 32-bit value with memory ordering.
+ *
+ * @param nr
+ *   The target bit to get.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The target bit.
+ */
+__rte_experimental
+static inline uint32_t
+rte_get_bit32(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	return __atomic_load_n(addr, __ATOMIC_ACQUIRE) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set the target bit in a 32-bit value to 1 with memory ordering.
+ *
+ * @param nr
+ *   The target bit to set.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_set_bit32(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	__atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Clear the target bit in a 32-bit value to 0 with memory ordering.
+ *
+ * @param nr
+ *   The target bit to clear.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_clear_bit32(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	__atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 32-bit value, then set it to 1 with
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and set.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint32_t
+rte_test_and_set_bit32(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	return __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 32-bit value, then clear it to 0 with
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and clear.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint32_t
+rte_test_and_clear_bit32(unsigned int nr, uint32_t *addr)
+{
+	RTE_ASSERT(nr < 32);
+
+	uint32_t mask = UINT32_C(1) << nr;
+	return __atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL) & mask;
+}
+
+/*---------------------------- 64 bit operations ----------------------------*/
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the target bit from a 64-bit value without memory ordering.
+ *
+ * @param nr
+ *   The target bit to get.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The target bit.
+ */
+__rte_experimental
+static inline uint64_t
+rte_get_bit64_relaxed(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	return __atomic_load_n(addr, __ATOMIC_RELAXED) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set the target bit in a 64-bit value to 1 without memory ordering.
+ *
+ * @param nr
+ *   The target bit to set.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_set_bit64_relaxed(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	__atomic_fetch_or(addr, mask, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Clear the target bit in a 64-bit value to 0 without memory ordering.
+ *
+ * @param nr
+ *   The target bit to clear.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_clear_bit64_relaxed(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	__atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 64-bit value, then set it to 1 without
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and set.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint64_t
+rte_test_and_set_bit64_relaxed(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	return __atomic_fetch_or(addr, mask, __ATOMIC_RELAXED) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 64-bit value, then clear it to 0 without
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and clear.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint64_t
+rte_test_and_clear_bit64_relaxed(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	return __atomic_fetch_and(addr, ~mask, __ATOMIC_RELAXED) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the target bit from a 64-bit value with memory ordering.
+ *
+ * @param nr
+ *   The target bit to get.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The target bit.
+ */
+__rte_experimental
+static inline uint64_t
+rte_get_bit64(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	return __atomic_load_n(addr, __ATOMIC_ACQUIRE) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set the target bit in a 64-bit value to 1 with memory ordering.
+ *
+ * @param nr
+ *   The target bit to set.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_set_bit64(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	__atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Clear the target bit in a 64-bit value to 0 with memory ordering.
+ *
+ * @param nr
+ *   The target bit to clear.
+ * @param addr
+ *   The address holding the bit.
+ */
+__rte_experimental
+static inline void
+rte_clear_bit64(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	__atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 64-bit value, then set it to 1 with
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and set.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint64_t
+rte_test_and_set_bit64(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	return __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL) & mask;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Return the original bit from a 64-bit value, then clear it to 0 with
+ * memory ordering.
+ *
+ * @param nr
+ *   The target bit to get and clear.
+ * @param addr
+ *   The address holding the bit.
+ * @return
+ *   The original bit.
+ */
+__rte_experimental
+static inline uint64_t
+rte_test_and_clear_bit64(unsigned int nr, uint64_t *addr)
+{
+	RTE_ASSERT(nr < 64);
+
+	uint64_t mask = UINT64_C(1) << nr;
+	return __atomic_fetch_and(addr, ~mask, __ATOMIC_ACQ_REL) & mask;
+}
+#endif /* _RTE_BITOPS_H_ */
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index 2b97715..766edbd 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -50,9 +50,10 @@  common_objs += eal_common_arch_objs
 
 common_headers = files(
 	'include/rte_alarm.h',
+	'include/rte_bitmap.h',
+	'include/rte_bitops.h',
 	'include/rte_branch_prediction.h',
 	'include/rte_bus.h',
-	'include/rte_bitmap.h',
 	'include/rte_class.h',
 	'include/rte_common.h',
 	'include/rte_compat.h',
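
One detail of the header above that callers may trip over: the get and test-and-* functions return the masked word (old & mask), so a set bit comes back as 1 << nr rather than 1. The sketch below uses local stand-ins with the same bodies as rte_test_and_set_bit32() and rte_get_bit64() from the patch, minus RTE_ASSERT so it builds without DPDK headers. The demo_* names and the demo_bit_is_set64() wrapper are hypothetical.

```c
#include <stdint.h>

/* Same body as rte_test_and_set_bit32() in the patch. */
static inline uint32_t
demo_test_and_set_bit32(unsigned int nr, uint32_t *addr)
{
	uint32_t mask = UINT32_C(1) << nr;
	return __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL) & mask;
}

/* Same body as rte_get_bit64() in the patch. */
static inline uint64_t
demo_get_bit64(unsigned int nr, uint64_t *addr)
{
	uint64_t mask = UINT64_C(1) << nr;
	return __atomic_load_n(addr, __ATOMIC_ACQUIRE) & mask;
}

/* Hypothetical caller-side wrapper: normalize to 0/1 before the result
 * is stored in a narrower type such as int. */
static inline int
demo_bit_is_set64(unsigned int nr, uint64_t *addr)
{
	return demo_get_bit64(nr, addr) != 0;
}
```

Comparing the result against 0 is safe for every bit position; assigning a 64-bit result for a high bit directly to a narrower integer may lose it.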