[RFC,v2,5/6] eal: add atomic bit operations

Message ID 20240425085853.97888-6-mattias.ronnblom@ericsson.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Series Improve EAL bit operations API

Commit Message

Mattias Rönnblom April 25, 2024, 8:58 a.m. UTC
  Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
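
A minimal usage sketch (not part of the patch; handle_event() is a
hypothetical consumer):

#include <rte_bitops.h>
#include <rte_stdatomic.h>

extern void handle_event(void); /* hypothetical consumer */

static uint64_t flags;

static void
example(void)
{
	/* Publish bit 3 with release semantics. */
	rte_bit_atomic_set(&flags, 3, rte_memory_order_release);

	/* Observe bit 3 with acquire semantics. */
	if (rte_bit_atomic_test(&flags, 3, rte_memory_order_acquire))
		handle_event();

	/* Claim bit 5; the bit's previous value is returned. */
	if (!rte_bit_atomic_test_and_set(&flags, 5, rte_memory_order_acq_rel))
		handle_event();
}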

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 297 +++++++++++++++++++++++++++++++++++
 1 file changed, 297 insertions(+)
  

Comments

Morten Brørup April 25, 2024, 10:25 a.m. UTC | #1
> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_atomic_test32,			\
> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)

I wonder if these should have RTE_ATOMIC qualifier:

+		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
+		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)


> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> +	static inline bool						\
> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\

I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:

+	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\

instead of casting into a_addr.

> +				      unsigned int nr, int memory_order) \
> +	{								\
> +		RTE_ASSERT(nr < size);					\
> +									\
> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> +	}


Similar considerations regarding volatile qualifier for the "once" operations.
  
Mattias Rönnblom April 25, 2024, 2:36 p.m. UTC | #2
On 2024-04-25 12:25, Morten Brørup wrote:
>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>> +	_Generic((addr),						\
>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> 
> I wonder if these should have RTE_ATOMIC qualifier:
> 
> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> 
> 
>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
>> +	static inline bool						\
>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> 
> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> 
> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> 
> instead of casting into a_addr.
> 

Check the cover letter for the rationale for the cast.

Where I'm at now is that I think C11 _Atomic is rather poor design. The 
assumption that an object which allows for atomic access should always 
require all operations upon it to be atomic, regardless of where it is 
in its lifetime and which thread is accessing it, does not hold in the 
general case.

The only reason for _Atomic being as it is, as far as I can see, is to 
accommodate ISAs which do not have the appropriate atomic machine 
instructions, and thus require a lock or some other data associated with 
the actual user-data-carrying bits. Neither GCC nor DPDK supports any 
such ISAs, to my knowledge. I suspect neither ever will. So the cast 
will continue to work.
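
To illustrate with a contrived sketch (not from the patch): an array
initialized by a single thread before being shared forces every
initializing store to be atomic if the element type is _Atomic.

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t counters[1024];

static void
init_counters(void)
{
	/* Runs before any other thread can see the array, so plain
	 * stores would suffice, yet with _Atomic every store below is
	 * compiled as an atomic access, which the compiler may not
	 * combine or reorder freely. */
	for (int i = 0; i < 1024; i++)
		counters[i] = 0;
}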

>> +				      unsigned int nr, int memory_order) \
>> +	{								\
>> +		RTE_ASSERT(nr < size);					\
>> +									\
>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>> +	}
> 
> 
> Similar considerations regarding volatile qualifier for the "once" operations.
>
  
Morten Brørup April 25, 2024, 4:18 p.m. UTC | #3
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Thursday, 25 April 2024 16.36
> 
> On 2024-04-25 12:25, Morten Brørup wrote:
> >> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >> +	_Generic((addr),						\
> >> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >
> > I wonder if these should have RTE_ATOMIC qualifier:
> >
> > +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,		\
> > +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >
> >
> >> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> >> +	static inline bool						\
> >> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> >
> > I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >
> > +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> >
> > instead of casting into a_addr.
> >
> 
> Check the cover letter for the rationale for the cast.

Thanks, that clarifies it. Then...
For the series:
Acked-by: Morten Brørup <mb@smartsharesystems.com>

> 
> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> assumption that an object which allows for atomic access should always
> require all operations upon it to be atomic, regardless of where it is
> in its lifetime and which thread is accessing it, does not hold in the
> general case.

It might be slow, but I suppose the C11 standard prioritizes correctness over performance.

It seems locks are automatically added to _Atomic types larger than what is natively supported by the architecture.
E.g. MSVC adds locks to _Atomic types larger than 8 bytes. [1]

[1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
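
This can be observed with C11's atomic_is_lock_free(); a sketch, with
the results being implementation-defined:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct big {
	uint64_t words[4]; /* 32 bytes: beyond the native atomic width */
};

int
main(void)
{
	_Atomic struct big big_obj;
	_Atomic uint64_t word;

	/* On a typical x86-64 toolchain the struct is not lock-free
	 * (the implementation falls back to a lock), while the 8-byte
	 * word is. */
	printf("struct lock-free: %d\n", atomic_is_lock_free(&big_obj));
	printf("word lock-free: %d\n", atomic_is_lock_free(&word));

	return 0;
}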

> 
> The only reason for _Atomic being as it is, as far as I can see, is to
> accommodate ISAs which do not have the appropriate atomic machine
> instructions, and thus require a lock or some other data associated with
> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> such ISAs, to my knowledge. I suspect neither ever will. So the cast
> will continue to work.

I tend to agree with you on this.

We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
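
Roughly along these lines (a hypothetical sketch, not a concrete
proposal), relying on the _Atomic and plain types having identical size
and alignment on the supported targets:

#include <stdatomic.h>
#include <stdint.h>

typedef union {
	uint64_t plain;
	_Atomic uint64_t atomic;
} sometimes_atomic_u64;

static void
example(sometimes_atomic_u64 *w)
{
	/* Single-threaded initialization phase: plain store. */
	w->plain = 0;

	/* Shared phase: atomic read-modify-write. */
	atomic_fetch_or_explicit(&w->atomic, UINT64_C(1) << 3,
				 memory_order_release);
}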

> 
> >> +				      unsigned int nr, int memory_order) \
> >> +	{								\
> >> +		RTE_ASSERT(nr < size);					\
> >> +									\
> >> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >> +	}
> >
> >
> > Similar considerations regarding volatile qualifier for the "once" operations.
> >
  
Mattias Rönnblom April 26, 2024, 9:39 a.m. UTC | #4
On 2024-04-25 18:18, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Thursday, 25 April 2024 16.36
>>
>> On 2024-04-25 12:25, Morten Brørup wrote:
>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>>>> +	_Generic((addr),						\
>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
>>>
>>> I wonder if these should have RTE_ATOMIC qualifier:
>>>
>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,		\
>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
>>>
>>>
>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
>>>> +	static inline bool						\
>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
>>>
>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
>>>
>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
>>>
>>> instead of casting into a_addr.
>>>
>>
>> Check the cover letter for the rationale for the cast.
> 
> Thanks, that clarifies it. Then...
> For the series:
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
>>
>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
>> assumption that an object which allows for atomic access should always
>> require all operations upon it to be atomic, regardless of where it is
>> in its lifetime and which thread is accessing it, does not hold in the
>> general case.
> 
> It might be slow, but I suppose the C11 standard prioritizes correctness over performance.
> 

That's a false dichotomy, in this case. You can have both.

> It seems locks are automatically added to _Atomic types larger than what is natively supported by the architecture.
> E.g. MSVC adds locks to _Atomic types larger than 8 bytes. [1]
> 
> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
> 
>>
>> The only reason for _Atomic being as it is, as far as I can see, is to
>> accommodate ISAs which do not have the appropriate atomic machine
>> instructions, and thus require a lock or some other data associated with
>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
>> such ISAs, to my knowledge. I suspect neither ever will. So the cast
>> will continue to work.
> 
> I tend to agree with you on this.
> 
> We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
> 

I think this is a subject which needs to be further explored.

Objects that can be accessed both atomically and non-atomically should 
be without _Atomic. With my current understanding of this issue, that 
seems like the best option.

You could turn it around as well, and have such marked _Atomic and have 
explicit casts to their non-_Atomic cousins when operated upon by 
non-atomic functions. Not sure how realistic that is, since 
non-atomicity is the norm. All generic selection-based "functions" must 
take this into account.

>>
>>>> +				      unsigned int nr, int memory_order) \
>>>> +	{								\
>>>> +		RTE_ASSERT(nr < size);					\
>>>> +									\
>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>>>> +	}
>>>
>>>
>>> Similar considerations regarding volatile qualifier for the "once" operations.
>>>
  
Morten Brørup April 26, 2024, noon UTC | #5
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Friday, 26 April 2024 11.39
> 
> On 2024-04-25 18:18, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Thursday, 25 April 2024 16.36
> >>
> >> On 2024-04-25 12:25, Morten Brørup wrote:
> >>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >>>> +	_Generic((addr),						\
> >>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >>>
> >>> I wonder if these should have RTE_ATOMIC qualifier:
> >>>
> >>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,		\
> >>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >>>
> >>>
> >>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> >>>> +	static inline bool						\
> >>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> >>>
> >>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >>>
> >>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> >>>
> >>> instead of casting into a_addr.
> >>>
> >>
> >> Check the cover letter for the rationale for the cast.
> >
> > Thanks, that clarifies it. Then...
> > For the series:
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >
> >>
> >> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> >> assumption that an object which allows for atomic access should always
> >> require all operations upon it to be atomic, regardless of where it is
> >> in its lifetime and which thread is accessing it, does not hold in the
> >> general case.
> >
> > It might be slow, but I suppose the C11 standard prioritizes correctness
> over performance.
> >
> 
> That's a false dichotomy, in this case. You can have both.

In theory you shouldn't need non-atomic access to atomic variables.
In reality, we want it anyway, because real CPUs are faster at non-atomic operations.

> 
> > It seems locks are automatically added to _Atomic types larger than what is
> > natively supported by the architecture.
> > E.g. MSVC adds locks to _Atomic types larger than 8 bytes. [1]
> >
> > [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
> >
> >>
> >> The only reason for _Atomic being as it is, as far as I can see, is to
> >> accommodate ISAs which do not have the appropriate atomic machine
> >> instructions, and thus require a lock or some other data associated with
> >> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >> such ISAs, to my knowledge. I suspect neither ever will. So the cast
> >> will continue to work.
> >
> > I tend to agree with you on this.
> >
> > We should officially decide that DPDK treats RTE_ATOMIC types as a union of
> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic
> and non-atomic.
> >
> 
> I think this is a subject which needs to be further explored.

Yes. It's easier exploring and deciding now, when our options are open, than after we have locked down the affected APIs.

> 
> Objects that can be accessed both atomically and non-atomically should
> be without _Atomic. With my current understanding of this issue, that
> seems like the best option.

Agree.

The alternative described below is certainly no good!

It would be nice if they were marked as sometimes-atomic by a qualifier or special type, like rte_be32_t marks the network byte order variant of a uint32_t.

Furthermore, large atomic objects need the _Atomic qualifier for the compiler to add (and use) the associated lock.
Alternatively, we could specify that sometimes-atomic objects cannot be larger than 8 bytes, which is what MSVC can handle without locking.

> 
> You could turn it around as well, and have such marked _Atomic and have
> explicit casts to their non-_Atomic cousins when operated upon by
> non-atomic functions. Not sure how realistic that is, since
> non-atomicity is the norm. All generic selection-based "functions" must
> take this into account.
> 
> >>
> >>>> +				      unsigned int nr, int memory_order) \
> >>>> +	{								\
> >>>> +		RTE_ASSERT(nr < size);					\
> >>>> +									\
> >>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >>>> +	}
> >>>
> >>>
> >>> Similar considerations regarding volatile qualifier for the "once" operations.
> >>>
  
Mattias Rönnblom April 28, 2024, 3:37 p.m. UTC | #6
On 2024-04-26 14:00, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Friday, 26 April 2024 11.39
>>
>> On 2024-04-25 18:18, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Thursday, 25 April 2024 16.36
>>>>
>>>> On 2024-04-25 12:25, Morten Brørup wrote:
>>>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>>>>>> +	_Generic((addr),						\
>>>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
>>>>>
>>>>> I wonder if these should have RTE_ATOMIC qualifier:
>>>>>
>>>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,		\
>>>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
>>>>>
>>>>>
>>>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
>>>>>> +	static inline bool						\
>>>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
>>>>>
>>>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
>>>>>
>>>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
>>>>>
>>>>> instead of casting into a_addr.
>>>>>
>>>>
>>>> Check the cover letter for the rationale for the cast.
>>>
>>> Thanks, that clarifies it. Then...
>>> For the series:
>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>
>>>>
>>>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
>>>> assumption that an object which allows for atomic access should always
>>>> require all operations upon it to be atomic, regardless of where it is
>>>> in its lifetime and which thread is accessing it, does not hold, in the
>>>> general case.
>>>
>>> It might be slow, but I suppose the C11 standard prioritizes correctness
>> over performance.
>>>
>>
>> That's a false dichotomy, in this case. You can have both.
> 
> In theory you shouldn't need non-atomic access to atomic variables.
> In reality, we want it anyway, because real CPUs are faster at non-atomic operations.
> 
>>
> >>> It seems locks are automatically added to _Atomic types larger than what is
> >>> natively supported by the architecture.
> >>> E.g. MSVC adds locks to _Atomic types larger than 8 bytes. [1]
> >>>
> >>> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
>>>
>>>>
>>>> The only reason for _Atomic being as it is, as far as I can see, is to
>>>> accommodate ISAs which do not have the appropriate atomic machine
>>>> instructions, and thus require a lock or some other data associated with
>>>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
>>>> such ISAs, to my knowledge. I suspect neither ever will. So the cast
>>>> will continue to work.
>>>
>>> I tend to agree with you on this.
>>>
>>> We should officially decide that DPDK treats RTE_ATOMIC types as a union of
>> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic
>> and non-atomic.
>>>
>>
>> I think this is a subject which needs to be further explored.
> 
> Yes. It's easier exploring and deciding now, when our options are open, than after we have locked down the affected APIs.
> 
>>
>> Objects that can be accessed both atomically and non-atomically should
>> be without _Atomic. With my current understanding of this issue, that
>> seems like the best option.
> 
> Agree.
> 
> The alternative described below is certainly no good!
> 
> It would be nice if they were marked as sometimes-atomic by a qualifier or special type, like rte_be32_t marks the network byte order variant of a uint32_t.
> 
> Furthermore, large atomic objects need the _Atomic qualifier for the compiler to add (and use) the associated lock.

If you have larger objects than the ISA can handle, you wouldn't want to 
leave the choice of synchronization primitive to the compiler. I don't 
see how it could possibly know which one is the most appropriate, 
especially in a DPDK context. It would for example need to know whether 
the contending threads are preemptable or not.

In some situations a sequence lock may well be your best option. Will 
the compiler generate one for you?

If "lock" means std::mutex, it would be a disaster, performance-wise.

> Alternatively, we could specify that sometimes-atomic objects cannot be larger than 8 bytes, which is what MSVC can handle without locking.
> 
>>
>> You could turn it around as well, and have such marked _Atomic and have
>> explicit casts to their non-_Atomic cousins when operated upon by
>> non-atomic functions. Not sure how realistic that is, since
>> non-atomicity is the norm. All generic selection-based "functions" must
>> take this into account.
>>
>>>>
>>>>>> +				      unsigned int nr, int memory_order) \
>>>>>> +	{								\
>>>>>> +		RTE_ASSERT(nr < size);					\
>>>>>> +									\
>>>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>>>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>>>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>>>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>>>>>> +	}
>>>>>
>>>>>
>>>>> Similar considerations regarding volatile qualifier for the "once" operations.
>>>>>
  
Morten Brørup April 29, 2024, 7:24 a.m. UTC | #7
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Sunday, 28 April 2024 17.38
> 
> On 2024-04-26 14:00, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Friday, 26 April 2024 11.39
> >>
> >> On 2024-04-25 18:18, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >>>> Sent: Thursday, 25 April 2024 16.36
> >>>>
> >>>> On 2024-04-25 12:25, Morten Brørup wrote:
> >>>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >>>>>> +	_Generic((addr),						\
> >>>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >>>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >>>>>
> >>>>> I wonder if these should have RTE_ATOMIC qualifier:
> >>>>>
> >>>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,		\
> >>>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >>>>>
> >>>>>
> >>>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> >>>>>> +	static inline bool						\
> >>>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> >>>>>
> >>>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >>>>>
> >>>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> >>>>>
> >>>>> instead of casting into a_addr.
> >>>>>
> >>>>
> >>>> Check the cover letter for the rationale for the cast.
> >>>
> >>> Thanks, that clarifies it. Then...
> >>> For the series:
> >>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>
> >>>>
> >>>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> >>>> assumption that an object which allows for atomic access should always
> >>>> require all operations upon it to be atomic, regardless of where it is
> >>>> in its lifetime and which thread is accessing it, does not hold in the
> >>>> general case.
> >>>
> >>> It might be slow, but I suppose the C11 standard prioritizes correctness
> >> over performance.
> >>>
> >>
> >> That's a false dichotomy, in this case. You can have both.
> >
> > In theory you shouldn't need non-atomic access to atomic variables.
> > In reality, we want it anyway, because real CPUs are faster at non-atomic
> operations.
> >
> >>
> >>> It seems locks are automatically added to _Atomic types larger than what is
> >>> natively supported by the architecture.
> >>> E.g. MSVC adds locks to _Atomic types larger than 8 bytes. [1]
> >>>
> >>> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
> >>>
> >>>>
> >>>> The only reason for _Atomic being as it is, as far as I can see, is to
> >>>> accommodate ISAs which do not have the appropriate atomic machine
> >>>> instructions, and thus require a lock or some other data associated with
> >>>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>>> such ISAs, to my knowledge. I suspect neither ever will. So the cast
> >>>> will continue to work.
> >>>
> >>> I tend to agree with you on this.
> >>>
> >>> We should officially decide that DPDK treats RTE_ATOMIC types as a union of
> >>> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both
> >>> atomic and non-atomic.
> >>>
> >>
> >> I think this is a subject which needs to be further explored.
> >
> > Yes. It's easier exploring and deciding now, when our options are open, than
> after we have locked down the affected APIs.
> >
> >>
> >> Objects that can be accessed both atomically and non-atomically should
> >> be without _Atomic. With my current understanding of this issue, that
> >> seems like the best option.
> >
> > Agree.
> >
> > The alternative described below is certainly no good!
> >
> > It would be nice if they were marked as sometimes-atomic by a qualifier or
> > special type, like rte_be32_t marks the network byte order variant of a
> > uint32_t.
> >
> > Furthermore, large atomic objects need the _Atomic qualifier for the
> compiler to add (and use) the associated lock.
> 
> If you have larger objects than the ISA can handle, you wouldn't want to
> leave the choice of synchronization primitive to the compiler. I don't
> see how it could possibly know which one is the most appropriate,
> especially in a DPDK context. It would for example need to know whether
> the contending threads are preemptable or not.
> 
> In some situations a sequence lock may well be your best option. Will
> the compiler generate one for you?
> 
> If "lock" means std::mutex, it would be a disaster, performance-wise.

Considering that the atomic functions without the _explicit(..., memory_order) suffix, e.g. atomic_fetch_add(), imply memory_order_seq_cst, I think it does. This makes it relatively straightforward to use atomic types, at the cost of performance.

There's a good description here:
https://en.cppreference.com/w/c/language/atomic

Note that accessing members of an _Atomic struct/union is undefined behavior.
For those, you need a non-atomic type, used as the "value" parameter to void atomic_store( volatile _Atomic struct mytype * obj, const struct mytype value ), and as the return value from atomic_load( const volatile _Atomic struct mytype * obj ).

In other words, for structs/unions, _Atomic variables are only accessed through accessor functions taking pointers to them, and thereby transformed from/to values of similar non-atomic type.
I think that this concept also supports your suggestion above: Objects that can be accessed both atomically and non-atomically should be without _Atomic.
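
In code, that accessor pattern looks roughly like this (a sketch based
on the page above; note that the load-modify-store sequence shown is
not atomic as a unit):

#include <stdatomic.h>

struct mytype {
	int a;
	int b;
};

static void
bump_a(volatile _Atomic struct mytype *obj)
{
	/* Whole-object load into a plain, non-atomic value... */
	struct mytype tmp = atomic_load(obj);

	/* ...whose members can be accessed directly... */
	tmp.a++;

	/* ...and which is stored back as a whole. A compare-exchange
	 * loop would be needed to make the update atomic. */
	atomic_store(obj, tmp);
}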

But I still think it would be a good idea to mark them as sometimes-atomic, for source code readability/review purposes.

E.g. the mbuf's refcnt field is of the type RTE_ATOMIC(uint16_t). If it is not only accessed through atomic_ accessor functions, should it still be marked RTE_ATOMIC()?

In the future, compilers might warn or error when an _Atomic variable (of any type) is being accessed directly.
The extreme solution would be not to mix atomic and non-atomic access to variables. But that seems unrealistic (at this time).

If we truly want to support C11 atomics, we need to understand and follow the concepts in the standard.

> 
> > Alternatively, we could specify that sometimes-atomic objects cannot be
> > larger than 8 bytes, which is what MSVC can handle without locking.
> >
> >>
> >> You could turn it around as well, and have such marked _Atomic and have
> >> explicit casts to their non-_Atomic cousins when operated upon by
> >> non-atomic functions. Not sure how realistic that is, since
> >> non-atomicity is the norm. All generic selection-based "functions" must
> >> take this into account.
> >>
> >>>>
> >>>>>> +				      unsigned int nr, int memory_order) \
> >>>>>> +	{								\
> >>>>>> +		RTE_ASSERT(nr < size);					\
> >>>>>> +									\
> >>>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >>>>>> +	}
> >>>>>
> >>>>>
> >>>>> Similar considerations regarding volatile qualifier for the "once" operations.
> >>>>>
  
Tyler Retzlaff April 30, 2024, 4:52 p.m. UTC | #8
On Fri, Apr 26, 2024 at 11:39:17AM +0200, Mattias Rönnblom wrote:

[ ... ]

> >
> >>
> >>The only reason for _Atomic being as it is, as far as I can see, is to
> >>accommodate ISAs which do not have the appropriate atomic machine
> >>instructions, and thus require a lock or some other data associated with
> >>the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>such ISAs, to my knowledge. I suspect neither ever will. So the cast
> >>will continue to work.
> >
> >I tend to agree with you on this.
> >
> >We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
> >
> 
> I think this is a subject which needs to be further explored.
> 
> Objects that can be accessed both atomically and non-atomically
> should be without _Atomic. With my current understanding of this
> issue, that seems like the best option.

I've been distracted by other work, and while not in the scope of this
series, I want to say +1 to having this discussion. Utilizing a union for
this atomic vs non-atomic access that appears in practice is a good idea.

> 
> You could turn it around as well, and have such marked _Atomic and
> have explicit casts to their non-_Atomic cousins when operated upon
> by non-atomic functions. Not sure how realistic that is, since
> non-atomicity is the norm. All generic selection-based "functions"
> must take this into account.

The problem with casts is that they are actually different types and may
have different size and/or alignment relative to their non-atomic
counterparts. For current non-locking atomics the implementations happen
to be the same (presumably because it was practical), but the union is
definitely a cleaner approach.
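
A build-time check along these lines could at least make that
assumption explicit (a sketch, not part of the series):

#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* The cast in the patch is only sound if the _Atomic and plain types
 * share size and alignment; these checks document that assumption. */
static_assert(sizeof(_Atomic uint64_t) == sizeof(uint64_t),
	      "size mismatch between atomic and plain uint64_t");
static_assert(_Alignof(_Atomic uint64_t) == _Alignof(uint64_t),
	      "alignment mismatch between atomic and plain uint64_t");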

> 
> >>
> >>>>+				      unsigned int nr, int memory_order) \
> >>>>+	{								\
> >>>>+		RTE_ASSERT(nr < size);					\
> >>>>+									\
> >>>>+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>>+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>>+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>>+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >>>>+	}
> >>>
> >>>
> >>>Similar considerations regarding volatile qualifier for the "once" operations.
> >>>
  

Patch

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index a2746e657f..8c38a1ac03 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@ 
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -342,6 +343,177 @@  extern "C" {
 		 uint32_t *: __rte_bit_once_assign32,			\
 		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set before the operation, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set before the operation, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set before the operation, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -429,6 +601,131 @@  __rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_once_clear64(addr, nr);
 }
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**