[v13,1/7] eal: add static per-lcore memory allocation facility
Commit Message
Introduce DPDK per-lcore id variables, or lcore variables for short.
An lcore variable has one value for every current and future lcore
id-equipped thread.
The primary <rte_lcore_var.h> use case is for statically allocating
small, frequently-accessed data structures, for which one instance
should exist for each lcore.
Lcore variables are similar to thread-local storage (TLS, e.g., C11
_Thread_local), but decouple the values' lifetime from that of the
threads.
Lcore variables are also similar, in terms of functionality, to the
FreeBSD kernel's DPCPU_*() family of macros and the associated
build-time machinery. DPCPU uses linker scripts, which effectively
prevents the reuse of its otherwise seemingly viable approach.
The currently prevailing way to solve the same problem as lcore
variables is to keep a module's per-lcore data as an RTE_MAX_LCORE-sized
array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
lcore variables over this approach is that data related to the same
lcore is now close (spatially, in memory), rather than data used by
the same module. This in turn avoids excessive use of padding, which
pollutes caches with unused data.
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
--
PATCH v13:
* Remove _VALUE() suffix from value lookup and iterator macros.
(Morten Brørup and Thomas Monjalon)
* Remove the _ptr() suffix from the value lookup function.
PATCH v12:
* Replace RTE_ASSERT() with RTE_VERIFY(), since performance is not
a concern. (Morten Brørup)
* Fix issue (introduced in v11) where aligned_malloc() was provided
an object size which wasn't an integer multiple of the alignment.
(Stephen Hemminger)
PATCH v11:
* Add a note in the API docs on lcore variables and huge page memory.
(Stephen Hemminger)
* Free lcore var buffers at EAL cleanup. (Thomas Monjalon)
* Tweak naming and include short lcore var buffer use overview
in eal_common_lcore_var.c.
PATCH v10:
* Improve documentation grammar and spelling. (Stephen Hemminger,
Thomas Monjalon)
* Add version.map DPDK version comment. (Thomas Monjalon)
PATCH v9:
* Fixed merge conflicts in release notes.
PATCH v8:
* Work around missing max_align_t definition in MSVC. (Morten Brørup)
PATCH v7:
* Add () to the FOREACH lcore id macro parameter, to allow an arbitrary
expression, not just a simple variable name, to be passed.
(Konstantin Ananyev)
PATCH v6:
* Have API user provide the loop variable in the FOREACH macro, to
avoid subtle bugs where the loop variable name clashes with some
other user-defined variable. (Konstantin Ananyev)
PATCH v5:
* Update EAL programming guide.
PATCH v2:
* Add Windows support. (Morten Brørup)
* Fix lcore variables API index reference. (Morten Brørup)
* Various improvements of the API documentation. (Morten Brørup)
* Elimination of unused symbol in version.map. (Morten Brørup)
PATCH:
* Update MAINTAINERS and release notes.
* Stop covering included files in extern "C" {}.
RFC v6:
* Include <stdlib.h> to get aligned_alloc().
* Tweak documentation (grammar).
* Provide API-level guarantees that lcore variable values take on an
initial value of zero.
* Fix misplaced __rte_cache_aligned in the API doc example.
RFC v5:
* In Doxygen, consistently use @<cmd> (and not \<cmd>).
* The RTE_LCORE_VAR_GET() and SET() convenience access macros
covered an uncommon use case, where the lcore value is of a
primitive type, rather than a struct, and are thus eliminated
from the API. (Morten Brørup)
* In the wake of the GET()/SET() removal, rename RTE_LCORE_VAR_PTR()
to RTE_LCORE_VAR_VALUE().
* The underscores are removed from __rte_lcore_var_lcore_ptr() to
signal that this function is a part of the public API.
* Macro arguments are documented.
RFC v4:
* Replace large static array with libc heap-allocated memory. One
implication of this change is that there no longer exists a fixed upper
bound for the total amount of memory used by lcore variables.
RTE_MAX_LCORE_VAR has changed meaning, and now represents the
maximum size of any individual lcore variable value.
* Fix issues in example. (Morten Brørup)
* Improve access macro type checking. (Morten Brørup)
* Refer to the lcore variable handle as "handle" and not "name" in
various macros.
* Document lack of thread safety in rte_lcore_var_alloc().
* Provide API-level assurance that the lcore variable handle is
always non-NULL, to allow applications to use NULL to mean
"not yet allocated".
* Note zero-sized allocations are not allowed.
* Give API-level guarantee the lcore variable values are zeroed.
RFC v3:
* Replace use of GCC-specific alignof(<expression>) with alignof(<type>).
* Update example to reflect FOREACH macro name change (in RFC v2).
RFC v2:
* Use alignof to derive alignment requirements. (Morten Brørup)
* Change name of FOREACH to make it distinct from <rte_lcore.h>'s
*per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup)
* Allow user-specified alignment, but limit max to cache line size.
---
MAINTAINERS | 6 +
config/rte_config.h | 1 +
doc/api/doxy-api-index.md | 1 +
.../prog_guide/env_abstraction_layer.rst | 43 +-
doc/guides/rel_notes/release_24_11.rst | 14 +
lib/eal/common/eal_common_lcore_var.c | 138 +++++++
lib/eal/common/eal_lcore_var.h | 11 +
lib/eal/common/meson.build | 1 +
lib/eal/freebsd/eal.c | 2 +
lib/eal/include/meson.build | 1 +
lib/eal/include/rte_lcore_var.h | 391 ++++++++++++++++++
lib/eal/linux/eal.c | 2 +
lib/eal/version.map | 3 +
13 files changed, 608 insertions(+), 6 deletions(-)
create mode 100644 lib/eal/common/eal_common_lcore_var.c
create mode 100644 lib/eal/common/eal_lcore_var.h
create mode 100644 lib/eal/include/rte_lcore_var.h
Comments
> +void *
> +rte_lcore_var_alloc(size_t size, size_t align)
> +{
> + /* Having the per-lcore buffer size aligned on cache lines
> + * assures as well as having the base pointer aligned on cache
> + * size assures that aligned offsets also translate to alipgned
> + * pointers across all values.
> + */
> + RTE_BUILD_BUG_ON(RTE_MAX_LCORE_VAR % RTE_CACHE_LINE_SIZE != 0);
> + RTE_VERIFY(align <= RTE_CACHE_LINE_SIZE);
> + RTE_VERIFY(size <= RTE_MAX_LCORE_VAR);
> +
> + /* '0' means asking for worst-case alignment requirements */
> + if (align == 0)
> +#ifdef RTE_TOOLCHAIN_MSVC
> + /* MSVC <stddef.h> is missing the max_align_t typedef */
> + align = alignof(double);
> +#else
> + align = alignof(max_align_t);
> +#endif
Do we need worst-case alignment, or does automatic alignment suffice:
/* '0' means asking for automatic alignment requirements */
if (align == 0) {
#ifdef RTE_ARCH_64
align = rte_align64pow2(size);
#else
align = rte_align32pow2(size);
#endif
#ifdef RTE_TOOLCHAIN_MSVC
/* MSVC <stddef.h> is missing the max_align_t typedef */
align = RTE_MIN(align, alignof(double));
#else
align = RTE_MIN(align, alignof(max_align_t));
#endif
}
It will pack small-size lcore variables even tighter.
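To make the packing difference concrete, here is a minimal, self-contained sketch. It does not use DPDK: next_pow2(), auto_align(), place() and footprint() are hypothetical stand-ins (for rte_align64pow2(), the proposed align == 0 branch, and a simple bump allocator), and it assumes a C11 toolchain that provides max_align_t and that variables are placed at the next suitably aligned offset in a per-lcore buffer.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_align64pow2(): round up to a power of two. */
static size_t
next_pow2(size_t v)
{
	size_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Proposed rule: min(size rounded up to a power of two, worst-case alignment). */
static size_t
auto_align(size_t size)
{
	size_t align = next_pow2(size);
	size_t worst = alignof(max_align_t);

	return align < worst ? align : worst;
}

/*
 * Bump allocation: round the current offset up to 'align' (a power of
 * two), reserve 'size' bytes, and return the offset of the new variable.
 */
static size_t
place(size_t *offset, size_t size, size_t align)
{
	size_t start = (*offset + align - 1) & ~(align - 1);

	*offset = start + size;
	return start;
}

/* Bytes consumed by 'n' back-to-back variables of 'size' bytes each. */
static size_t
footprint(size_t n, size_t size, int use_auto)
{
	size_t offset = 0;
	size_t i;

	for (i = 0; i < n; i++)
		place(&offset, size,
		      use_auto ? auto_align(size) : alignof(max_align_t));
	return offset;
}
```

Under these assumptions, footprint(4, 1, 1) is 4 bytes, while footprint(4, 1, 0) is 3 * alignof(max_align_t) + 1 bytes, since every one-byte variable starts on a worst-case aligned boundary.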
On 2024-10-15 12:13, Morten Brørup wrote:
>> +void *
>> +rte_lcore_var_alloc(size_t size, size_t align)
>> +{
>> + /* Having the per-lcore buffer size aligned on cache lines
>> + * assures as well as having the base pointer aligned on cache
>> + * size assures that aligned offsets also translate to alipgned
>> + * pointers across all values.
>> + */
>> + RTE_BUILD_BUG_ON(RTE_MAX_LCORE_VAR % RTE_CACHE_LINE_SIZE != 0);
>> + RTE_VERIFY(align <= RTE_CACHE_LINE_SIZE);
>> + RTE_VERIFY(size <= RTE_MAX_LCORE_VAR);
>> +
>> + /* '0' means asking for worst-case alignment requirements */
>> + if (align == 0)
>> +#ifdef RTE_TOOLCHAIN_MSVC
>> + /* MSVC <stddef.h> is missing the max_align_t typedef */
>> + align = alignof(double);
>> +#else
>> + align = alignof(max_align_t);
>> +#endif
>
> Do we need worst-case alignment, or does automatic alignment suffice:
>
I think the term is "natural alignment." As I think I mentioned at some
point, I don't really have an opinion.
Worst case (max_align_t) alignment is the same as malloc(), so
potentially what the user may expect. On the other hand, I can't see why
natural alignment (or alignof(max_align_t), whichever is smallest) would
not always suffice. It is a bit harder to explain in the API docs what
alignment you actually get in case you don't go for worst-case alignment.
I think it doesn't matter much, because the user will very likely use
the typed macros (and get whatever alignment the compiler deems
appropriate for that type).
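As a rough illustration of why the align == 0 path is rarely hit through the typed macros, consider the sketch below. The names demo_var_alloc() and DEMO_VAR_ALLOC() are hypothetical stand-ins, not the patch's actual definitions, and __typeof__ is a GCC/Clang extension. The point is only that a typed wrapper can derive both size and alignment from the handle's pointed-to type.

```c
#include <stdalign.h>
#include <stddef.h>

/* Records what the most recent allocation asked for (illustration only). */
static size_t last_size;
static size_t last_align;

/* Hypothetical stand-in for the allocator; it only records its arguments. */
static void *
demo_var_alloc(size_t size, size_t align)
{
	static char token;

	last_size = size;
	last_align = align;
	return &token; /* non-NULL opaque handle */
}

/*
 * Hypothetical typed wrapper: size and alignment come from the type the
 * handle points to, so 'align' is never 0 on this path.
 */
#define DEMO_VAR_ALLOC(handle) \
	((handle) = demo_var_alloc(sizeof(*(handle)), \
				   alignof(__typeof__(*(handle)))))

struct foo_lcore_state {
	int a;
	long b;
};

static struct foo_lcore_state *lcore_states;
```

After DEMO_VAR_ALLOC(lcore_states), the recorded alignment is alignof(struct foo_lcore_state), i.e., whatever the compiler requires for that type.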
> /* '0' means asking for automatic alignment requirements */
> if (align == 0) {
> #ifdef RTE_ARCH_64
> align = rte_align64pow2(size);
> #else
> align = rte_align32pow2(size);
> #endif
> #ifdef RTE_TOOLCHAIN_MSVC
> /* MSVC <stddef.h> is missing the max_align_t typedef */
> align = RTE_MIN(align, alignof(double));
> #else
> align = RTE_MIN(align, alignof(max_align_t));
> #endif
> }
>
> It will pack small-size lcore variables even tighter.
>
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Tuesday, 15 October 2024 21.03
>
> On 2024-10-15 12:13, Morten Brørup wrote:
> >> +void *
> >> +rte_lcore_var_alloc(size_t size, size_t align)
> >> +{
> >> + /* Having the per-lcore buffer size aligned on cache lines
> >> + * assures as well as having the base pointer aligned on cache
> >> + * size assures that aligned offsets also translate to alipgned
> >> + * pointers across all values.
> >> + */
> >> + RTE_BUILD_BUG_ON(RTE_MAX_LCORE_VAR % RTE_CACHE_LINE_SIZE != 0);
> >> + RTE_VERIFY(align <= RTE_CACHE_LINE_SIZE);
> >> + RTE_VERIFY(size <= RTE_MAX_LCORE_VAR);
> >> +
> >> + /* '0' means asking for worst-case alignment requirements */
> >> + if (align == 0)
> >> +#ifdef RTE_TOOLCHAIN_MSVC
> >> + /* MSVC <stddef.h> is missing the max_align_t typedef */
> >> + align = alignof(double);
> >> +#else
> >> + align = alignof(max_align_t);
> >> +#endif
> >
> > Do we need worst-case alignment, or does automatic alignment suffice:
> >
>
> I think the term is "natural alignment." As I think I mentioned at some
> point, I don't really have an opinion.
Exactly; "natural alignment" was the term I was looking for.
>
> Worst case (max_align_t) alignment is the same as malloc(), so
> potentially what the user may expect.
For this type of variable, which is more like a "static" variable, I don't think the user expects malloc()-like alignment; I think the user expects natural alignment.
And if the user requires any special alignment, the user will specify it explicitly.
> On the other hand, I can't see why
> natural alignment (or alignof(max_align_t), whichever is smallest)
> would
> not always suffice.
Yes, that was exactly my point.
> It is a bit harder to explain in the API docs what
> alignment you actually get in case you don't go for worst-case
> alignment.
Yeah... using "natural alignment" instead of "worst-case alignment" doesn't really cut it; e.g. if the lcore variable is a struct of two uint16_t, the natural alignment is 2 bytes, but it will be 4-byte aligned due to the size.
Maybe "automatic alignment" could be used here... with an explanation that it is the smaller of the size rounded up to a power of two and alignof(max_align_t).
Anyway, in case of doubt, the developer can look at the implementation - it's one of the benefits of having the source code available. :-)
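The struct-of-two-uint16_t case can be checked with a few lines of plain C11. This is only a sketch: round_up_pow2() and automatic_alignment() are hypothetical helpers mirroring the rule described above, not DPDK functions, and the alignment of uint16_t is assumed to be 2 (true on mainstream ABIs).

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

struct pair16 {
	uint16_t a;
	uint16_t b;
};

/* Hypothetical helper: round up to the nearest power of two. */
static size_t
round_up_pow2(size_t v)
{
	size_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* "Automatic" alignment: min(pow2(size), alignof(max_align_t)). */
static size_t
automatic_alignment(size_t size)
{
	size_t by_size = round_up_pow2(size);
	size_t worst = alignof(max_align_t);

	return by_size < worst ? by_size : worst;
}
```

Here alignof(struct pair16) is the natural alignment, while automatic_alignment(sizeof(struct pair16)) is derived from the 4-byte size alone, so the two differ.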
>
> I think it doesn't matter much, because the user will very likely use
> the typed macros (and get whatever alignment the compiler deems
> appropriate for that type).
Probably.
But the function allowing alignment=0 should still behave 1) as expected by its users, and 2) optimally.
I hope this library is going to be a widely used core component in DPDK, and getting all the small details right will improve the probability of success.
>
> > /* '0' means asking for automatic alignment requirements */
> > if (align == 0) {
> > #ifdef RTE_ARCH_64
> > align = rte_align64pow2(size);
> > #else
> > align = rte_align32pow2(size);
> > #endif
> > #ifdef RTE_TOOLCHAIN_MSVC
> > /* MSVC <stddef.h> is missing the max_align_t typedef */
> > align = RTE_MIN(align, alignof(double));
> > #else
> > align = RTE_MIN(align, alignof(max_align_t));
> > #endif
> > }
> >
> > It will pack small-size lcore variables even tighter.
> >
On Tue, 15 Oct 2024 11:33:38 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> + * Lcore variables
> + *
> + * This API provides a mechanism to create and access per-lcore id
> + * variables in a space- and cycle-efficient manner.
> + *
> + * A per-lcore id variable (or lcore variable for short) holds a
> + * unique value for each EAL thread and registered non-EAL
> + * thread. There is one instance for each current and future lcore
> + * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
> + * value of the lcore variable for one lcore id is independent from
> + * the values assigned to other lcore ids within the same variable.
> + *
> + * In order to access the values of an lcore variable, a handle is
> + * used. The type of the handle is a pointer to the value's type
> + * (e.g., for an @c uint32_t lcore variable, the handle is a
> + * <code>uint32_t *</code>). The handle type is used to inform the
> + * access macros of the type of the values. A handle may be passed
> + * between modules and threads just like any pointer, but its value
> + * must be treated as an opaque identifier. An allocated handle never
> + * has the value NULL.
> + *
> + * @b Creation
> + *
> + * An lcore variable is created in two steps:
> + * 1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
> + * 2. Allocate lcore variable storage and initialize the handle with
> + * a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
> + * @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
> + * of module initialization, but may be done at any time.
> + *
> + * The lifetime of an lcore variable is not tied to the thread that
> + * created it. Its per lcore id values (up to @c RTE_MAX_LCORE) are
> + * available from the moment the lcore variable is created and
> + * continue to exist throughout the entire lifetime of the EAL,
> + * whether or not the lcore id is currently in use.
> + *
> + * Lcore variables cannot and need not be freed.
> + *
> + * @b Access
> + *
> + * The value of any lcore variable for any lcore id may be accessed
> + * from any thread (including unregistered threads), but it should
> + * only be *frequently* read from or written to by the owner.
> + *
> + * Values of the same lcore variable, associated with different lcore
> + * ids may be frequently read or written by their respective owners
> + * without risking false sharing.
> + *
> + * An appropriate synchronization mechanism (e.g., atomic loads and
> + * stores) should be employed to prevent data races between the owning
> + * thread and any other thread accessing the same value instance.
> + *
> + * The value of the lcore variable for a particular lcore id is
> + * accessed using @ref RTE_LCORE_VAR_LCORE.
> + *
> + * A common pattern is for an EAL thread or a registered non-EAL
> + * thread to access its own lcore variable value. For this purpose, a
> + * shorthand exists as @ref RTE_LCORE_VAR.
> + *
> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
> + * pointer with the same type as the value, it may not be directly
> + * dereferenced and must be treated as an opaque identifier.
> + *
> + * Lcore variable handles and value pointers may be freely passed
> + * between different threads.
> + *
> + * @b Storage
> + *
> + * An lcore variable's values may be of a primitive type like @c int,
> + * but would more typically be a @c struct.
> + *
> + * The lcore variable handle introduces a per-variable (not
> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
> + * there are some memory footprint gains to be made by organizing all
> + * per-lcore id data for a particular module as one lcore variable
> + * (e.g., as a struct).
> + *
> + * An application may define an lcore variable handle without ever
> + * allocating it.
> + *
> + * The size of an lcore variable's value must be less than the DPDK
> + * build-time constant @c RTE_MAX_LCORE_VAR.
> + *
> + * Lcore variables are stored in a series of lcore buffers, which are
> + * allocated from the libc heap. Heap allocation failures are treated
> + * as fatal.
> + *
> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
> + * of these constructs are designed to avoid false sharing. In the
> + * case of an lcore variable instance, the thread most recently
> + * accessing nearby data structures should almost-always be the lcore
> + * variable's owner. Adding padding will increase the effective memory
> + * working set size, potentially reducing performance.
> + *
> + * Lcore variable values are initialized to zero by default.
> + *
> + * Lcore variables are not stored in huge page memory.
> + *
> + * @b Example
> + *
> + * Below is an example of the use of an lcore variable:
> + *
> + * @code{.c}
> + * struct foo_lcore_state {
> + * int a;
> + * long b;
> + * };
> + *
> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
> + *
> + * long foo_get_a_plus_b(void)
> + * {
> + * struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
> + *
> + * return state->a + state->b;
> + * }
> + *
> + * RTE_INIT(rte_foo_init)
> + * {
> + * RTE_LCORE_VAR_ALLOC(lcore_states);
> + *
> + * unsigned int lcore_id;
> + * struct foo_lcore_state *state;
> + * RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
> + * (initialize 'state')
> + * }
> + *
> + * (other initialization)
> + * }
> + * @endcode
> + *
> + *
> + * @b Alternatives
> + *
> + * Lcore variables are designed to replace a pattern exemplified below:
> + * @code{.c}
> + * struct __rte_cache_aligned foo_lcore_state {
> + * int a;
> + * long b;
> + * RTE_CACHE_GUARD;
> + * };
> + *
> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
> + * @endcode
> + *
> + * This scheme is simple and effective, but has one drawback: the data
> + * is organized so that objects related to all lcores for a particular
> + * module are kept close in memory. At a bare minimum, this requires
> + * sizing data structures (e.g., using `__rte_cache_aligned`) to an
> + * even number of cache lines to avoid false sharing. With CPU
> + * hardware prefetching and memory loads resulting from speculative
> + * execution (functions which seemingly are getting more eager faster
> + * than they are getting more intelligent), one or more "guard" cache
> + * lines may be required to separate one lcore's data from another's
> + * and prevent false sharing.
> + *
> + * Lcore variables offer the advantage of working with, rather than
> + * against, the CPU's assumptions. A next-line hardware prefetcher,
> + * for example, may function as intended (i.e., to the benefit, not
> + * detriment, of system performance).
> + *
> + * Another alternative to @ref rte_lcore_var.h is the @ref
> + * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
> + * e.g., GCC __thread or C11 _Thread_local). The main differences
> + * between by using the various forms of TLS (e.g., @ref
> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
> + * variables are:
> + *
> + * * The lifecycle of a thread-local variable instance is tied to
> + * that of the thread. The data cannot be accessed before the
> + * thread has been created, nor after it has exited. As a result,
> + * thread-local variables must be initialized in a "lazy" manner
> + * (e.g., at the point of thread creation). Lcore variables may be
> + * accessed immediately after having been allocated (which may occur
> + * before any thread beyond the main thread is running).
> + * * A thread-local variable is duplicated across all threads in the
> + * process, including unregistered non-EAL threads (i.e.,
> + * "regular" threads). For DPDK applications heavily relying on
> + * multi-threading (in conjunction to DPDK's "one thread per core"
> + * pattern), either by having many concurrent threads or
> + * creating/destroying threads at a high rate, an excessive use of
> + * thread-local variables may cause inefficiencies (e.g.,
> + * increased thread creation overhead due to thread-local storage
> + * initialization or increased total RAM footprint usage). Lcore
> + * variables *only* exist for threads with an lcore id.
> + * * If data in thread-local storage may be shared between threads
> + * (i.e., can a pointer to a thread-local variable be passed to
> + * and successfully dereferenced by non-owning thread) depends on
> + * the specifics of the TLS implementation. With GCC __thread and
> + * GCC _Thread_local, data sharing between threads is supported.
> + * In the C11 standard, accessing another thread's _Thread_local
> + * object is implementation-defined. Lcore variable instances may
> + * be accessed reliably by any thread.
> + */
For me, this comment is too wordy for code and belongs in the documentation instead.
It could also be reduced to more precise, succinct language.
On Tue, 15 Oct 2024 11:33:38 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> +/**
> + * Allocate space in the per-lcore id buffers for an lcore variable.
> + *
> + * The pointer returned is only an opaque identifier of the variable. To
> + * get an actual pointer to a particular instance of the variable use
> + * @ref RTE_LCORE_VAR or @ref RTE_LCORE_VAR_LCORE.
> + *
> + * The lcore variable values' memory is set to zero.
> + *
> + * The allocation is always successful, barring a fatal exhaustion of
> + * the per-lcore id buffer space.
> + *
> + * rte_lcore_var_alloc() is not multi-thread safe.
> + *
> + * @param size
> + * The size (in bytes) of the variable's per-lcore id value. Must be > 0.
> + * @param align
> + * If 0, the values will be suitably aligned for any kind of type
> + * (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
> + * on a multiple of *align*, which must be a power of 2 and equal or
> + * less than @c RTE_CACHE_LINE_SIZE.
> + * @return
> + * The variable's handle, stored in a void pointer value. The value
> + * is always non-NULL.
> + */
> +__rte_experimental
> +void *
> +rte_lcore_var_alloc(size_t size, size_t align);
This should have function attributes similar to those rte_malloc now has,
telling the compiler the size, alignment, and aliasing.
Also, there should be a mention that there is no free function.
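For reference, a minimal sketch of what such annotations can look like using the raw GCC/Clang attributes. DEMO_ALLOC_ATTRS and demo_alloc() are hypothetical names for illustration; this is neither the rte_malloc() declaration nor the lcore variable allocator, and it assumes 'align' is a non-zero power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * malloc:         the returned pointer does not alias any other pointer.
 * alloc_size(1):  the returned object is at least 'size' bytes.
 * alloc_align(2): the returned pointer is aligned to 'align'.
 */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_ALLOC_ATTRS __attribute__((malloc, alloc_size(1), alloc_align(2)))
#else
#define DEMO_ALLOC_ATTRS
#endif

static void *demo_alloc(size_t size, size_t align) DEMO_ALLOC_ATTRS;

/* Hypothetical allocator returning zeroed, 'align'-aligned memory. */
static void *
demo_alloc(size_t size, size_t align)
{
	/* aligned_alloc() expects the size to be a multiple of the alignment. */
	size_t padded = (size + align - 1) & ~(align - 1);
	void *ptr = aligned_alloc(align, padded);

	if (ptr != NULL)
		memset(ptr, 0, padded);
	return ptr;
}
```

A caller such as demo_alloc(24, 64) then gets a 64-byte aligned, zeroed block, and the compiler may use the declared size and alignment for diagnostics and optimization. Unlike this demo, whose memory is released with free(), the lcore variable API has no free function.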
On 2024-10-16 00:33, Stephen Hemminger wrote:
> On Tue, 15 Oct 2024 11:33:38 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> + * Lcore variables
>> + *
>> + * This API provides a mechanism to create and access per-lcore id
>> + * variables in a space- and cycle-efficient manner.
>> + *
>> + * A per-lcore id variable (or lcore variable for short) holds a
>> + * unique value for each EAL thread and registered non-EAL
>> + * thread. There is one instance for each current and future lcore
>> + * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
>> + * value of the lcore variable for one lcore id is independent from
>> + * the values assigned to other lcore ids within the same variable.
>> + *
>> + * In order to access the values of an lcore variable, a handle is
>> + * used. The type of the handle is a pointer to the value's type
>> + * (e.g., for an @c uint32_t lcore variable, the handle is a
>> + * <code>uint32_t *</code>). The handle type is used to inform the
>> + * access macros of the type of the values. A handle may be passed
>> + * between modules and threads just like any pointer, but its value
>> + * must be treated as an opaque identifier. An allocated handle never
>> + * has the value NULL.
>> + *
>> + * @b Creation
>> + *
>> + * An lcore variable is created in two steps:
>> + * 1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
>> + * 2. Allocate lcore variable storage and initialize the handle with
>> + * a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
>> + * @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
>> + * of module initialization, but may be done at any time.
>> + *
>> + * The lifetime of an lcore variable is not tied to the thread that
>> + * created it. Its per lcore id values (up to @c RTE_MAX_LCORE) are
>> + * available from the moment the lcore variable is created and
>> + * continue to exist throughout the entire lifetime of the EAL,
>> + * whether or not the lcore id is currently in use.
>> + *
>> + * Lcore variables cannot and need not be freed.
>> + *
>> + * @b Access
>> + *
>> + * The value of any lcore variable for any lcore id may be accessed
>> + * from any thread (including unregistered threads), but it should
>> + * only be *frequently* read from or written to by the owner.
>> + *
>> + * Values of the same lcore variable, associated with different lcore
>> + * ids may be frequently read or written by their respective owners
>> + * without risking false sharing.
>> + *
>> + * An appropriate synchronization mechanism (e.g., atomic loads and
>> + * stores) should be employed to prevent data races between the owning
>> + * thread and any other thread accessing the same value instance.
>> + *
>> + * The value of the lcore variable for a particular lcore id is
>> + * accessed using @ref RTE_LCORE_VAR_LCORE.
>> + *
>> + * A common pattern is for an EAL thread or a registered non-EAL
>> + * thread to access its own lcore variable value. For this purpose, a
>> + * shorthand exists as @ref RTE_LCORE_VAR.
>> + *
>> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
>> + * pointer with the same type as the value, it may not be directly
>> + * dereferenced and must be treated as an opaque identifier.
>> + *
>> + * Lcore variable handles and value pointers may be freely passed
>> + * between different threads.
>> + *
>> + * @b Storage
>> + *
>> + * An lcore variable's values may be of a primitive type like @c int,
>> + * but would more typically be a @c struct.
>> + *
>> + * The lcore variable handle introduces a per-variable (not
>> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
>> + * there are some memory footprint gains to be made by organizing all
>> + * per-lcore id data for a particular module as one lcore variable
>> + * (e.g., as a struct).
>> + *
>> + * An application may define an lcore variable handle without ever
>> + * allocating it.
>> + *
>> + * The size of an lcore variable's value must be less than the DPDK
>> + * build-time constant @c RTE_MAX_LCORE_VAR.
>> + *
>> + * Lcore variables are stored in a series of lcore buffers, which are
>> + * allocated from the libc heap. Heap allocation failures are treated
>> + * as fatal.
>> + *
>> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
>> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
>> + * of these constructs are designed to avoid false sharing. In the
>> + * case of an lcore variable instance, the thread most recently
>> + * accessing nearby data structures should almost-always be the lcore
>> + * variable's owner. Adding padding will increase the effective memory
>> + * working set size, potentially reducing performance.
>> + *
>> + * Lcore variable values are initialized to zero by default.
>> + *
>> + * Lcore variables are not stored in huge page memory.
>> + *
>> + * @b Example
>> + *
>> + * Below is an example of the use of an lcore variable:
>> + *
>> + * @code{.c}
>> + * struct foo_lcore_state {
>> + * int a;
>> + * long b;
>> + * };
>> + *
>> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
>> + *
>> + * long foo_get_a_plus_b(void)
>> + * {
>> + * struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
>> + *
>> + * return state->a + state->b;
>> + * }
>> + *
>> + * RTE_INIT(rte_foo_init)
>> + * {
>> + * RTE_LCORE_VAR_ALLOC(lcore_states);
>> + *
>> + * unsigned int lcore_id;
>> + * struct foo_lcore_state *state;
>> + * RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
>> + * (initialize 'state')
>> + * }
>> + *
>> + * (other initialization)
>> + * }
>> + * @endcode
>> + *
>> + *
>> + * @b Alternatives
>> + *
>> + * Lcore variables are designed to replace a pattern exemplified below:
>> + * @code{.c}
>> + * struct __rte_cache_aligned foo_lcore_state {
>> + * int a;
>> + * long b;
>> + * RTE_CACHE_GUARD;
>> + * };
>> + *
>> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
>> + * @endcode
>> + *
>> + * This scheme is simple and effective, but has one drawback: the data
>> + * is organized so that objects related to all lcores for a particular
>> + * module are kept close in memory. At a bare minimum, this requires
>> + * sizing data structures (e.g., using `__rte_cache_aligned`) to an
>> + * even number of cache lines to avoid false sharing. With CPU
>> + * hardware prefetching and memory loads resulting from speculative
>> + * execution (functions which seemingly are getting more eager faster
>> + * than they are getting more intelligent), one or more "guard" cache
>> + * lines may be required to separate one lcore's data from another's
>> + * and prevent false sharing.
>> + *
>> + * Lcore variables offer the advantage of working with, rather than
>> + * against, the CPU's assumptions. A next-line hardware prefetcher,
>> + * for example, may function as intended (i.e., to the benefit, not
>> + * detriment, of system performance).
>> + *
>> + * Another alternative to @ref rte_lcore_var.h is the @ref
>> + * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
>> + * e.g., GCC __thread or C11 _Thread_local). The main differences
>> + * between by using the various forms of TLS (e.g., @ref
>> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
>> + * variables are:
>> + *
>> + * * The lifecycle of a thread-local variable instance is tied to
>> + * that of the thread. The data cannot be accessed before the
>> + * thread has been created, nor after it has exited. As a result,
>> + * thread-local variables must be initialized in a "lazy" manner
>> + * (e.g., at the point of thread creation). Lcore variables may be
>> + * accessed immediately after having been allocated (which may occur
>> + * before any thread beyond the main thread is running).
>> + * * A thread-local variable is duplicated across all threads in the
>> + * process, including unregistered non-EAL threads (i.e.,
>> + * "regular" threads). For DPDK applications heavily relying on
>> + * multi-threading (in conjunction to DPDK's "one thread per core"
>> + * pattern), either by having many concurrent threads or
>> + * creating/destroying threads at a high rate, an excessive use of
>> + * thread-local variables may cause inefficiencies (e.g.,
>> + * increased thread creation overhead due to thread-local storage
>> + * initialization or increased total RAM footprint usage). Lcore
>> + * variables *only* exist for threads with an lcore id.
>> + * * If data in thread-local storage may be shared between threads
>> + * (i.e., can a pointer to a thread-local variable be passed to
>> + * and successfully dereferenced by non-owning thread) depends on
>> + * the specifics of the TLS implementation. With GCC __thread and
>> + * GCC _Thread_local, data sharing between threads is supported.
>> + * In the C11 standard, accessing another thread's _Thread_local
>> + * object is implementation-defined. Lcore variable instances may
>> + * be accessed reliably by any thread.
>> + */
>
> For me this comment too wordy for code and belongs in the documentation instead.
> Could also be reduced to more precise succinct language.
>
>
Provided this makes it into RC1, I can move most of this, and some of the
information in the eal_common_lcore_var.c comments, into "the documentation"
as an RC2 patch.
If "the documentation" is the EAL programmer's guide, a description of
lcore variables (with pictures!) in sufficient detail (both API and
implementation) would make up a large fraction of it. That would look
silly and get in the way of more important things. Lcore variables are just a
tiny bit of infrastructure. Other, more central EAL features, like the
RTE spinlock, have no mention at all in the EAL docs.
Another option, I suppose, is to document it separately from the
"main" EAL programmer's guide, but - correct me if I'm wrong here -
there seems to be no precedent for doing this.
On 2024-10-16 00:35, Stephen Hemminger wrote:
> On Tue, 15 Oct 2024 11:33:38 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> +/**
>> + * Allocate space in the per-lcore id buffers for an lcore variable.
>> + *
>> + * The pointer returned is only an opaque identifier of the variable. To
>> + * get an actual pointer to a particular instance of the variable use
>> + * @ref RTE_LCORE_VAR or @ref RTE_LCORE_VAR_LCORE.
>> + *
>> + * The lcore variable values' memory is set to zero.
>> + *
>> + * The allocation is always successful, barring a fatal exhaustion of
>> + * the per-lcore id buffer space.
>> + *
>> + * rte_lcore_var_alloc() is not multi-thread safe.
>> + *
>> + * @param size
>> + * The size (in bytes) of the variable's per-lcore id value. Must be > 0.
>> + * @param align
>> + * If 0, the values will be suitably aligned for any kind of type
>> + * (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
>> + * on a multiple of *align*, which must be a power of 2 and equal or
>> + * less than @c RTE_CACHE_LINE_SIZE.
>> + * @return
>> + * The variable's handle, stored in a void pointer value. The value
>> + * is always non-NULL.
>> + */
>> +__rte_experimental
>> +void *
>> +rte_lcore_var_alloc(size_t size, size_t align);
>
> This should have the similar function attributes as rte_malloc now does
> where it tells the compiler the size, alignment, and aliasing.
>
> Also there should be mention that there is no free function.
OK, both fixed. Thanks.
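As an illustration of the attributes raised in the review comment, here is a minimal sketch, assuming GCC or Clang, of how `malloc`, `alloc_size` and `alloc_align` can be attached to an allocator declaration. `demo_var_alloc()` and `DEMO_ALLOC_ATTRS` are hypothetical names used only for this example; this is not the prototype from the patch, and unlike `rte_lcore_var_alloc()` the stand-in does not accept `align == 0`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical macro: expands to nothing on compilers without these attributes. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_ALLOC_ATTRS __attribute__((malloc, alloc_size(1), alloc_align(2)))
#else
#define DEMO_ALLOC_ATTRS
#endif

/*
 * The attributes tell the compiler that the result does not alias other
 * objects, that it is 'size' bytes large (argument 1), and that it is
 * aligned to 'align' (argument 2).
 */
static void *demo_var_alloc(size_t size, size_t align) DEMO_ALLOC_ATTRS;

static void *
demo_var_alloc(size_t size, size_t align)
{
	size_t rounded;
	void *p;

	/* The stand-in requires a non-zero power-of-two alignment. */
	assert(size > 0);
	assert(align != 0 && (align & (align - 1)) == 0);

	/* aligned_alloc() expects the size to be a multiple of the alignment. */
	rounded = (size + align - 1) / align * align;

	p = aligned_alloc(align, rounded);
	if (p == NULL)
		abort(); /* allocation failure is treated as fatal */

	assert((uintptr_t)p % align == 0);

	return p;
}
```

The stand-in returns heap memory that the caller may `free()`. The allocator under review has no free function, which is the second point of the comment above.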
16/10/2024 06:13, Mattias Rönnblom:
>
> On 2024-10-16 00:33, Stephen Hemminger wrote:
> > On Tue, 15 Oct 2024 11:33:38 +0200
> > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >
> >> + * Lcore variables
> >> + *
> >> + * This API provides a mechanism to create and access per-lcore id
> >> + * variables in a space- and cycle-efficient manner.
> >> + *
> >> + * A per-lcore id variable (or lcore variable for short) holds a
> >> + * unique value for each EAL thread and registered non-EAL
> >> + * thread. There is one instance for each current and future lcore
> >> + * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
> >> + * value of the lcore variable for one lcore id is independent from
> >> + * the values assigned to other lcore ids within the same variable.
> >> + *
> >> + * In order to access the values of an lcore variable, a handle is
> >> + * used. The type of the handle is a pointer to the value's type
> >> + * (e.g., for an @c uint32_t lcore variable, the handle is a
> >> + * <code>uint32_t *</code>). The handle type is used to inform the
> >> + * access macros of the type of the values. A handle may be passed
> >> + * between modules and threads just like any pointer, but its value
> >> + * must be treated as an opaque identifier. An allocated handle never
> >> + * has the value NULL.
> >> + *
> >> + * @b Creation
> >> + *
> >> + * An lcore variable is created in two steps:
> >> + * 1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
> >> + * 2. Allocate lcore variable storage and initialize the handle with
> >> + * a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
> >> + * @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
> >> + * of module initialization, but may be done at any time.
> >> + *
> >> + * The lifetime of an lcore variable is not tied to the thread that
> >> + * created it. Its per lcore id values (up to @c RTE_MAX_LCORE) are
> >> + * available from the moment the lcore variable is created and
> >> + * continue to exist throughout the entire lifetime of the EAL,
> >> + * whether or not the lcore id is currently in use.
> >> + *
> >> + * Lcore variables cannot and need not be freed.
> >> + *
> >> + * @b Access
> >> + *
> >> + * The value of any lcore variable for any lcore id may be accessed
> >> + * from any thread (including unregistered threads), but it should
> >> + * only be *frequently* read from or written to by the owner.
> >> + *
> >> + * Values of the same lcore variable, associated with different lcore
> >> + * ids may be frequently read or written by their respective owners
> >> + * without risking false sharing.
> >> + *
> >> + * An appropriate synchronization mechanism (e.g., atomic loads and
> >> + * stores) should be employed to prevent data races between the owning
> >> + * thread and any other thread accessing the same value instance.
> >> + *
> >> + * The value of the lcore variable for a particular lcore id is
> >> + * accessed using @ref RTE_LCORE_VAR_LCORE.
> >> + *
> >> + * A common pattern is for an EAL thread or a registered non-EAL
> >> + * thread to access its own lcore variable value. For this purpose, a
> >> + * shorthand exists as @ref RTE_LCORE_VAR.
> >> + *
> >> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
> >> + * pointer with the same type as the value, it may not be directly
> >> + * dereferenced and must be treated as an opaque identifier.
> >> + *
> >> + * Lcore variable handles and value pointers may be freely passed
> >> + * between different threads.
> >> + *
> >> + * @b Storage
> >> + *
> >> + * An lcore variable's values may be of a primitive type like @c int,
> >> + * but would more typically be a @c struct.
> >> + *
> >> + * The lcore variable handle introduces a per-variable (not
> >> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
> >> + * there are some memory footprint gains to be made by organizing all
> >> + * per-lcore id data for a particular module as one lcore variable
> >> + * (e.g., as a struct).
> >> + *
> >> + * An application may define an lcore variable handle without ever
> >> + * allocating it.
> >> + *
> >> + * The size of an lcore variable's value must be less than the DPDK
> >> + * build-time constant @c RTE_MAX_LCORE_VAR.
> >> + *
> >> + * Lcore variables are stored in a series of lcore buffers, which are
> >> + * allocated from the libc heap. Heap allocation failures are treated
> >> + * as fatal.
> >> + *
> >> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
> >> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
> >> + * of these constructs are designed to avoid false sharing. In the
> >> + * case of an lcore variable instance, the thread most recently
> >> + * accessing nearby data structures should almost-always be the lcore
> >> + * variable's owner. Adding padding will increase the effective memory
> >> + * working set size, potentially reducing performance.
> >> + *
> >> + * Lcore variable values are initialized to zero by default.
> >> + *
> >> + * Lcore variables are not stored in huge page memory.
> >> + *
> >> + * @b Example
> >> + *
> >> + * Below is an example of the use of an lcore variable:
> >> + *
> >> + * @code{.c}
> >> + * struct foo_lcore_state {
> >> + * int a;
> >> + * long b;
> >> + * };
> >> + *
> >> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
> >> + *
> >> + * long foo_get_a_plus_b(void)
> >> + * {
> >> + * struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
> >> + *
> >> + * return state->a + state->b;
> >> + * }
> >> + *
> >> + * RTE_INIT(rte_foo_init)
> >> + * {
> >> + * RTE_LCORE_VAR_ALLOC(lcore_states);
> >> + *
> >> + * unsigned int lcore_id;
> >> + * struct foo_lcore_state *state;
> >> + * RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
> >> + * (initialize 'state')
> >> + * }
> >> + *
> >> + * (other initialization)
> >> + * }
> >> + * @endcode
> >> + *
> >> + *
> >> + * @b Alternatives
> >> + *
> >> + * Lcore variables are designed to replace a pattern exemplified below:
> >> + * @code{.c}
> >> + * struct __rte_cache_aligned foo_lcore_state {
> >> + * int a;
> >> + * long b;
> >> + * RTE_CACHE_GUARD;
> >> + * };
> >> + *
> >> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
> >> + * @endcode
> >> + *
> >> + * This scheme is simple and effective, but has one drawback: the data
> >> + * is organized so that objects related to all lcores for a particular
> >> + * module are kept close in memory. At a bare minimum, this requires
> >> + * sizing data structures (e.g., using `__rte_cache_aligned`) to an
> >> + * even number of cache lines to avoid false sharing. With CPU
> >> + * hardware prefetching and memory loads resulting from speculative
> >> + * execution (functions which seemingly are getting more eager faster
> >> + * than they are getting more intelligent), one or more "guard" cache
> >> + * lines may be required to separate one lcore's data from another's
> >> + * and prevent false sharing.
> >> + *
> >> + * Lcore variables offer the advantage of working with, rather than
> >> + * against, the CPU's assumptions. A next-line hardware prefetcher,
> >> + * for example, may function as intended (i.e., to the benefit, not
> >> + * detriment, of system performance).
> >> + *
> >> + * Another alternative to @ref rte_lcore_var.h is the @ref
> >> + * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
> >> + * e.g., GCC __thread or C11 _Thread_local). The main differences
> >> + * between using the various forms of TLS (e.g., @ref
> >> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
> >> + * variables are:
> >> + *
> >> + * * The lifecycle of a thread-local variable instance is tied to
> >> + * that of the thread. The data cannot be accessed before the
> >> + * thread has been created, nor after it has exited. As a result,
> >> + * thread-local variables must be initialized in a "lazy" manner
> >> + * (e.g., at the point of thread creation). Lcore variables may be
> >> + * accessed immediately after having been allocated (which may occur
> >> + * before any thread beyond the main thread is running).
> >> + * * A thread-local variable is duplicated across all threads in the
> >> + * process, including unregistered non-EAL threads (i.e.,
> >> + * "regular" threads). For DPDK applications heavily relying on
> >> + * multi-threading (in conjunction with DPDK's "one thread per core"
> >> + * pattern), either by having many concurrent threads or
> >> + * creating/destroying threads at a high rate, an excessive use of
> >> + * thread-local variables may cause inefficiencies (e.g.,
> >> + * increased thread creation overhead due to thread-local storage
> >> + * initialization or increased total RAM footprint usage). Lcore
> >> + * variables *only* exist for threads with an lcore id.
> >> + * * If data in thread-local storage may be shared between threads
> >> + * (i.e., can a pointer to a thread-local variable be passed to
> >> + * and successfully dereferenced by non-owning thread) depends on
> >> + * the specifics of the TLS implementation. With GCC __thread and
> >> + * GCC _Thread_local, data sharing between threads is supported.
> >> + * In the C11 standard, accessing another thread's _Thread_local
> >> + * object is implementation-defined. Lcore variable instances may
> >> + * be accessed reliably by any thread.
> >> + */
> >
> > For me this comment is too wordy for code and belongs in the documentation instead.
> > Could also be reduced to more precise succinct language.
I agree, this is what I was asking for.
> Provided this makes it into RC1, I can move most of this and some of the
> information in eal_common_lcore_var.c comments into "the documentation"
> as a RC2 patch.
>
> If "the documentation" is the EAL programmer's guide, a description of
> lcore variables (with pictures!) in sufficient detail (both API and
> implementation) would make up a large fraction of it. That would look
> silly and get in the way of more important things. Lcore variables are
> just a tiny bit of infrastructure. Other, more central EAL features, like
> the RTE spinlock, have no mention at all in the EAL docs.
Please don't take what exists and what does not exist as an absolute model.
We must improve the doc, split it better and fill the gaps.
In the meantime we want new features like this one to be properly documented.
> Another option, I suppose, is to document it separately from the
> "main" EAL programmer's guide, but - correct me if I'm wrong here -
> there seems to be no precedent for doing this.
For instance, the services cores are a separate chapter of the prog guide.
The lcore variables should be a separate chapter as well.
On 2024-10-16 10:17, Thomas Monjalon wrote:
> 16/10/2024 06:13, Mattias Rönnblom:
>>
>> On 2024-10-16 00:33, Stephen Hemminger wrote:
>>> On Tue, 15 Oct 2024 11:33:38 +0200
>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>>>
>>>> + * Lcore variables
>>>> + *
>>>> + * This API provides a mechanism to create and access per-lcore id
>>>> + * variables in a space- and cycle-efficient manner.
>>>> + *
>>>> + * A per-lcore id variable (or lcore variable for short) holds a
>>>> + * unique value for each EAL thread and registered non-EAL
>>>> + * thread. There is one instance for each current and future lcore
>>>> + * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
>>>> + * value of the lcore variable for one lcore id is independent from
>>>> + * the values assigned to other lcore ids within the same variable.
>>>> + *
>>>> + * In order to access the values of an lcore variable, a handle is
>>>> + * used. The type of the handle is a pointer to the value's type
>>>> + * (e.g., for an @c uint32_t lcore variable, the handle is a
>>>> + * <code>uint32_t *</code>). The handle type is used to inform the
>>>> + * access macros of the type of the values. A handle may be passed
>>>> + * between modules and threads just like any pointer, but its value
>>>> + * must be treated as an opaque identifier. An allocated handle never
>>>> + * has the value NULL.
>>>> + *
>>>> + * @b Creation
>>>> + *
>>>> + * An lcore variable is created in two steps:
>>>> + * 1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
>>>> + * 2. Allocate lcore variable storage and initialize the handle with
>>>> + * a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
>>>> + * @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
>>>> + * of module initialization, but may be done at any time.
>>>> + *
>>>> + * The lifetime of an lcore variable is not tied to the thread that
>>>> + * created it. Its per lcore id values (up to @c RTE_MAX_LCORE) are
>>>> + * available from the moment the lcore variable is created and
>>>> + * continue to exist throughout the entire lifetime of the EAL,
>>>> + * whether or not the lcore id is currently in use.
>>>> + *
>>>> + * Lcore variables cannot and need not be freed.
>>>> + *
>>>> + * @b Access
>>>> + *
>>>> + * The value of any lcore variable for any lcore id may be accessed
>>>> + * from any thread (including unregistered threads), but it should
>>>> + * only be *frequently* read from or written to by the owner.
>>>> + *
>>>> + * Values of the same lcore variable, associated with different lcore
>>>> + * ids may be frequently read or written by their respective owners
>>>> + * without risking false sharing.
>>>> + *
>>>> + * An appropriate synchronization mechanism (e.g., atomic loads and
>>>> + * stores) should be employed to prevent data races between the owning
>>>> + * thread and any other thread accessing the same value instance.
>>>> + *
>>>> + * The value of the lcore variable for a particular lcore id is
>>>> + * accessed using @ref RTE_LCORE_VAR_LCORE.
>>>> + *
>>>> + * A common pattern is for an EAL thread or a registered non-EAL
>>>> + * thread to access its own lcore variable value. For this purpose, a
>>>> + * shorthand exists as @ref RTE_LCORE_VAR.
>>>> + *
>>>> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
>>>> + * pointer with the same type as the value, it may not be directly
>>>> + * dereferenced and must be treated as an opaque identifier.
>>>> + *
>>>> + * Lcore variable handles and value pointers may be freely passed
>>>> + * between different threads.
>>>> + *
>>>> + * @b Storage
>>>> + *
>>>> + * An lcore variable's values may be of a primitive type like @c int,
>>>> + * but would more typically be a @c struct.
>>>> + *
>>>> + * The lcore variable handle introduces a per-variable (not
>>>> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
>>>> + * there are some memory footprint gains to be made by organizing all
>>>> + * per-lcore id data for a particular module as one lcore variable
>>>> + * (e.g., as a struct).
>>>> + *
>>>> + * An application may define an lcore variable handle without ever
>>>> + * allocating it.
>>>> + *
>>>> + * The size of an lcore variable's value must be less than the DPDK
>>>> + * build-time constant @c RTE_MAX_LCORE_VAR.
>>>> + *
>>>> + * Lcore variables are stored in a series of lcore buffers, which are
>>>> + * allocated from the libc heap. Heap allocation failures are treated
>>>> + * as fatal.
>>>> + *
>>>> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
>>>> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
>>>> + * of these constructs are designed to avoid false sharing. In the
>>>> + * case of an lcore variable instance, the thread most recently
>>>> + * accessing nearby data structures should almost-always be the lcore
>>>> + * variable's owner. Adding padding will increase the effective memory
>>>> + * working set size, potentially reducing performance.
>>>> + *
>>>> + * Lcore variable values are initialized to zero by default.
>>>> + *
>>>> + * Lcore variables are not stored in huge page memory.
>>>> + *
>>>> + * @b Example
>>>> + *
>>>> + * Below is an example of the use of an lcore variable:
>>>> + *
>>>> + * @code{.c}
>>>> + * struct foo_lcore_state {
>>>> + * int a;
>>>> + * long b;
>>>> + * };
>>>> + *
>>>> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
>>>> + *
>>>> + * long foo_get_a_plus_b(void)
>>>> + * {
>>>> + * struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
>>>> + *
>>>> + * return state->a + state->b;
>>>> + * }
>>>> + *
>>>> + * RTE_INIT(rte_foo_init)
>>>> + * {
>>>> + * RTE_LCORE_VAR_ALLOC(lcore_states);
>>>> + *
>>>> + * unsigned int lcore_id;
>>>> + * struct foo_lcore_state *state;
>>>> + * RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
>>>> + * (initialize 'state')
>>>> + * }
>>>> + *
>>>> + * (other initialization)
>>>> + * }
>>>> + * @endcode
>>>> + *
>>>> + *
>>>> + * @b Alternatives
>>>> + *
>>>> + * Lcore variables are designed to replace a pattern exemplified below:
>>>> + * @code{.c}
>>>> + * struct __rte_cache_aligned foo_lcore_state {
>>>> + * int a;
>>>> + * long b;
>>>> + * RTE_CACHE_GUARD;
>>>> + * };
>>>> + *
>>>> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
>>>> + * @endcode
>>>> + *
>>>> + * This scheme is simple and effective, but has one drawback: the data
>>>> + * is organized so that objects related to all lcores for a particular
>>>> + * module are kept close in memory. At a bare minimum, this requires
>>>> + * sizing data structures (e.g., using `__rte_cache_aligned`) to an
>>>> + * even number of cache lines to avoid false sharing. With CPU
>>>> + * hardware prefetching and memory loads resulting from speculative
>>>> + * execution (functions which seemingly are getting more eager faster
>>>> + * than they are getting more intelligent), one or more "guard" cache
>>>> + * lines may be required to separate one lcore's data from another's
>>>> + * and prevent false sharing.
>>>> + *
>>>> + * Lcore variables offer the advantage of working with, rather than
>>>> + * against, the CPU's assumptions. A next-line hardware prefetcher,
>>>> + * for example, may function as intended (i.e., to the benefit, not
>>>> + * detriment, of system performance).
>>>> + *
>>>> + * Another alternative to @ref rte_lcore_var.h is the @ref
>>>> + * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
>>>> + * e.g., GCC __thread or C11 _Thread_local). The main differences
>>>> + * between using the various forms of TLS (e.g., @ref
>>>> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
>>>> + * variables are:
>>>> + *
>>>> + * * The lifecycle of a thread-local variable instance is tied to
>>>> + * that of the thread. The data cannot be accessed before the
>>>> + * thread has been created, nor after it has exited. As a result,
>>>> + * thread-local variables must be initialized in a "lazy" manner
>>>> + * (e.g., at the point of thread creation). Lcore variables may be
>>>> + * accessed immediately after having been allocated (which may occur
>>>> + * before any thread beyond the main thread is running).
>>>> + * * A thread-local variable is duplicated across all threads in the
>>>> + * process, including unregistered non-EAL threads (i.e.,
>>>> + * "regular" threads). For DPDK applications heavily relying on
>>>> + * multi-threading (in conjunction with DPDK's "one thread per core"
>>>> + * pattern), either by having many concurrent threads or
>>>> + * creating/destroying threads at a high rate, an excessive use of
>>>> + * thread-local variables may cause inefficiencies (e.g.,
>>>> + * increased thread creation overhead due to thread-local storage
>>>> + * initialization or increased total RAM footprint usage). Lcore
>>>> + * variables *only* exist for threads with an lcore id.
>>>> + * * If data in thread-local storage may be shared between threads
>>>> + * (i.e., can a pointer to a thread-local variable be passed to
>>>> + * and successfully dereferenced by non-owning thread) depends on
>>>> + * the specifics of the TLS implementation. With GCC __thread and
>>>> + * GCC _Thread_local, data sharing between threads is supported.
>>>> + * In the C11 standard, accessing another thread's _Thread_local
>>>> + * object is implementation-defined. Lcore variable instances may
>>>> + * be accessed reliably by any thread.
>>>> + */
>>>
>>> For me this comment is too wordy for code and belongs in the documentation instead.
>>> Could also be reduced to more precise succinct language.
>
> I agree, this is what I was asking for.
>
>
>> Provided this makes it into RC1, I can move most of this and some of the
>> information in eal_common_lcore_var.c comments into "the documentation"
>> as a RC2 patch.
>>
>> If "the documentation" is the EAL programmer's guide, a description of
>> lcore variables (with pictures!) in sufficient detail (both API and
>> implementation) would make up a large fraction of it. That would look
>> silly and get in the way of more important things. Lcore variables are
>> just a tiny bit of infrastructure. Other, more central EAL features, like
>> the RTE spinlock, have no mention at all in the EAL docs.
>
> Please don't take what exists and what does not exist as an absolute model.
> We must improve the doc, split it better and fill the gaps.
> In the meantime we want new features like this one to be properly documented.
>
I don't have an issue with raising the bar for new features.
>
>> Another option, I suppose, is to document it separately from the
>> "main" EAL programmer's guide, but - correct me if I'm wrong here -
>> there seems to be no precedent for doing this.
>
> For instance, the services cores are a separate chapter of the prog guide.
Right, forgot about the service cores. I will follow that model.
> The lcore variables should be a separate chapter as well.
>
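The Access section of the quoted API comment asks for "an appropriate synchronization mechanism (e.g., atomic loads and stores)" when a non-owning thread touches a value. A minimal sketch of that single-writer pattern with C11 atomics follows. It does not use the lcore variable API: the array `demo_pkts`, the constant `DEMO_MAX_LCORE` and both functions are hypothetical stand-ins for an lcore variable holding one counter per lcore id.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_MAX_LCORE 4 /* stand-in for RTE_MAX_LCORE */

/* One counter per lcore id; stand-in for the values of an lcore variable. */
static _Atomic uint64_t demo_pkts[DEMO_MAX_LCORE];

/*
 * Called only by the thread owning 'lcore_id'. Since there is a single
 * writer per value, a relaxed load followed by a relaxed store is enough;
 * no atomic read-modify-write operation is needed.
 */
static void
demo_count(unsigned int lcore_id, uint64_t n)
{
	uint64_t cur;

	assert(lcore_id < DEMO_MAX_LCORE);

	cur = atomic_load_explicit(&demo_pkts[lcore_id], memory_order_relaxed);
	atomic_store_explicit(&demo_pkts[lcore_id], cur + n, memory_order_relaxed);
}

/* May be called from any thread; reads every lcore id's value atomically. */
static uint64_t
demo_total(void)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < DEMO_MAX_LCORE; i++)
		sum += atomic_load_explicit(&demo_pkts[i], memory_order_relaxed);

	return sum;
}
```

The reader gets a value for each lcore id that is free of data races, but the sum is not a consistent snapshot across lcore ids. Whether that is acceptable depends on the use case.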
On Wed, 16 Oct 2024 15:19:09 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> This patch set introduces a new API <rte_lcore_var.h> for static
> per-lcore id data allocation.
>
> Please refer to the <rte_lcore_var.h> API documentation for both a
> rationale for this new API, and a comparison to the alternatives
> available.
>
> The question on how to best allocate static per-lcore memory has been
> up several times on the dev mailing list, for example in the thread on
> "random: use per lcore state" RFC by Stephen Hemminger.
>
> Lcore variables are surely not the answer to all your per-lcore-data
> needs, since it only allows for more-or-less static allocation. In the
> author's opinion, it does however provide a reasonably simple and
> clean and seemingly very much performant solution to a real problem.
>
> Mattias Rönnblom (7):
> eal: add static per-lcore memory allocation facility
> eal: add lcore variable functional tests
> eal: add lcore variable performance test
> random: keep PRNG state in lcore variable
> power: keep per-lcore state in lcore variable
> service: keep per-lcore state in lcore variable
> eal: keep per-lcore power intrinsics state in lcore variable
Still too wordy. Would you mind if I have a try at summarizing and
running the text through an editor tool?
On 2024-10-16 16:58, Stephen Hemminger wrote:
> On Wed, 16 Oct 2024 15:19:09 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> This patch set introduces a new API <rte_lcore_var.h> for static
>> per-lcore id data allocation.
>>
>> Please refer to the <rte_lcore_var.h> API documentation for both a
>> rationale for this new API, and a comparison to the alternatives
>> available.
>>
>> The question on how to best allocate static per-lcore memory has been
>> up several times on the dev mailing list, for example in the thread on
>> "random: use per lcore state" RFC by Stephen Hemminger.
>>
>> Lcore variables are surely not the answer to all your per-lcore-data
>> needs, since it only allows for more-or-less static allocation. In the
>> author's opinion, it does however provide a reasonably simple and
>> clean and seemingly very much performant solution to a real problem.
>>
>> Mattias Rönnblom (7):
>> eal: add static per-lcore memory allocation facility
>> eal: add lcore variable functional tests
>> eal: add lcore variable performance test
>> random: keep PRNG state in lcore variable
>> power: keep per-lcore state in lcore variable
>> service: keep per-lcore state in lcore variable
>> eal: keep per-lcore power intrinsics state in lcore variable
>
> Still too wordy. Would you mind if I have a try at summarizing and
> running the text through an editor tool?
I think you need to be a little more wordy here. What text? The cover
text? That won't survive anyway.
@@ -282,6 +282,12 @@ F: lib/eal/include/rte_random.h
F: lib/eal/common/rte_random.c
F: app/test/test_rand_perf.c
+Lcore Variables
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_lcore_var.h
+F: lib/eal/common/eal_common_lcore_var.c
+F: app/test/test_lcore_var.c
+
ARM v7
M: Wathsala Vithanage <wathsala.vithanage@arm.com>
F: config/arm/
@@ -41,6 +41,7 @@
/* EAL defines */
#define RTE_CACHE_GUARD_LINES 1
#define RTE_MAX_HEAPS 32
+#define RTE_MAX_LCORE_VAR 1048576
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
@@ -99,6 +99,7 @@ The public API headers are grouped by topics:
[interrupts](@ref rte_interrupts.h),
[launch](@ref rte_launch.h),
[lcore](@ref rte_lcore.h),
+ [lcore variables](@ref rte_lcore_var.h),
[per-lcore](@ref rte_per_lcore.h),
[service cores](@ref rte_service.h),
[keepalive](@ref rte_keepalive.h),
@@ -429,12 +429,43 @@ with them once they're registered.
Per-lcore and Shared Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. note::
-
- lcore refers to a logical execution unit of the processor, sometimes called a hardware *thread*.
-
-Shared variables are the default behavior.
-Per-lcore variables are implemented using *Thread Local Storage* (TLS) to provide per-thread local storage.
+By default, static variables, memory blocks allocated on the DPDK
+heap, and other types of memory are shared by all DPDK threads.
+
+An application, a DPDK library, or a PMD may opt to keep per-thread state.
+
+Per-thread data can be maintained using either *lcore variables* (see
+``rte_lcore_var.h``), *thread-local storage (TLS)* (see
+``rte_per_lcore.h``), or a static array of ``RTE_MAX_LCORE`` elements,
+indexed by ``rte_lcore_id()``. These methods allow per-lcore data to be
+largely internal to the module and not directly exposed in its
+API. Another approach is to explicitly handle per-thread aspects in
+the API (e.g., the ports in the Eventdev API).
+
+Lcore variables are suitable for small objects that are statically
+allocated at the time of module or application initialization. An
+lcore variable takes on one value for each lcore ID-equipped thread
+(i.e., for both EAL threads and registered non-EAL threads, in total
+``RTE_MAX_LCORE`` instances). The lifetime of lcore variables is
+independent of the owning threads, so they can be initialized
+before the threads are created.
+
+Variables with thread-local storage are allocated when the thread is
+created and exist until the thread terminates. These are applicable
+for every thread in the process. Only very small objects should be
+allocated in TLS, as large TLS objects can significantly slow down
+thread creation and may unnecessarily increase the memory footprint of
+applications that extensively use unregistered threads.
+
+A common but now largely obsolete DPDK pattern is to use a static
+array sized according to the maximum number of lcore ID-equipped
+threads (i.e., with ``RTE_MAX_LCORE`` elements). To avoid *false
+sharing*, each element must be both cache-aligned and include an
+``RTE_CACHE_GUARD``. This extensive use of padding causes internal
+fragmentation (i.e., unused space) and reduces cache hit rates.
+
+For more discussions on per-lcore state, refer to the
+``rte_lcore_var.h`` API documentation.
Logs
~~~~
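The padding cost of the static-array pattern described in the guide text above can be made concrete with a small calculation. The sketch below assumes a 64-byte cache line, one guard line and 128 lcore ids; the `DEMO_*` constants and struct names are hypothetical and use plain C11 `_Alignas` instead of the DPDK macros. It only counts the bytes of the array itself and says nothing about how lcore variable buffers are laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_CACHE_LINE 64 /* assumed cache line size in bytes */
#define DEMO_MAX_LCORE 128 /* assumed number of lcore ids */

/* Payload only: what the module actually needs per lcore id. */
struct demo_state {
	int a;
	long b;
};

/* Static-array pattern: cache-aligned element plus one guard cache line. */
struct demo_padded_state {
	_Alignas(DEMO_CACHE_LINE) int a;
	long b;
	_Alignas(DEMO_CACHE_LINE) char guard[DEMO_CACHE_LINE];
};

/* Bytes of useful per-lcore data across all lcore ids. */
static size_t
demo_payload_bytes(void)
{
	return sizeof(struct demo_state) * DEMO_MAX_LCORE;
}

/* Bytes occupied by the padded static array. */
static size_t
demo_padded_bytes(void)
{
	return sizeof(struct demo_padded_state) * DEMO_MAX_LCORE;
}

static void
demo_print_footprint(void)
{
	printf("payload: %zu bytes, padded array: %zu bytes\n",
	       demo_payload_bytes(), demo_padded_bytes());
}
```

Each padded element occupies two cache lines regardless of how small the payload is, which is the internal fragmentation the paragraph refers to.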
@@ -113,6 +113,20 @@ New Features
* Added independent enqueue feature.
+* **Added EAL per-lcore static memory allocation facility.**
+
+ Added EAL API <rte_lcore_var.h> for statically allocating small,
+ frequently-accessed data structures, for which one instance should
+ exist for each EAL thread and registered non-EAL thread.
+
+ With lcore variables, data is organized spatially on a per-lcore id
+ basis, rather than per library or PMD, avoiding the need for cache
+ aligning (or RTE_CACHE_GUARDing) data structures, which in turn
+ reduces CPU cache internal fragmentation, improving performance.
+
+ Lcore variables are similar to thread-local storage (TLS, e.g.,
+ C11 _Thread_local), but decouple the values' lifetime from that
+ of the threads.
Removed Items
-------------
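The implementation comment in eal_common_lcore_var.c gives the value address as `buffer->data + offset + lcore_id * RTE_MAX_LCORE_VAR`. The toy model below is a sketch of that layout under simplifying assumptions: a single statically allocated buffer, tiny stand-in constants, and hypothetical names (`demo_alloc()`, `demo_value()`, `DEMO_MAX_LCORE`, `DEMO_MAX_LCORE_VAR`). It is not the patch's code, but it shows why values belonging to the same lcore id end up next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_LCORE 4       /* stand-in for RTE_MAX_LCORE */
#define DEMO_MAX_LCORE_VAR 256 /* stand-in for RTE_MAX_LCORE_VAR */

/* One buffer made up of DEMO_MAX_LCORE consecutive per-lcore-id regions. */
static _Alignas(64) char demo_data[DEMO_MAX_LCORE_VAR * DEMO_MAX_LCORE];

/* Number of bytes already handed out in each per-lcore-id region. */
static size_t demo_offset;

/* Translate a handle (address of the lcore id 0 value) to a value pointer. */
static void *
demo_value(void *handle, unsigned int lcore_id)
{
	assert(lcore_id < DEMO_MAX_LCORE);

	return (char *)handle + (size_t)lcore_id * DEMO_MAX_LCORE_VAR;
}

/* Reserve 'size' bytes at the same offset in every per-lcore-id region. */
static void *
demo_alloc(size_t size, size_t align)
{
	size_t start;
	unsigned int lcore_id;
	void *handle;

	assert(align != 0 && (align & (align - 1)) == 0);

	start = (demo_offset + align - 1) & ~(align - 1);
	assert(start + size <= DEMO_MAX_LCORE_VAR); /* toy: one buffer only */
	demo_offset = start + size;

	handle = &demo_data[start];

	for (lcore_id = 0; lcore_id < DEMO_MAX_LCORE; lcore_id++)
		memset(demo_value(handle, lcore_id), 0, size);

	return handle;
}
```

Allocating a `uint32_t` followed by a `uint64_t` places the two values for any given lcore id 8 bytes apart, while the values of one variable for neighboring lcore ids are `DEMO_MAX_LCORE_VAR` bytes apart.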
new file mode 100644
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#include <inttypes.h>
+#include <stdlib.h>
+
+#ifdef RTE_EXEC_ENV_WINDOWS
+#include <malloc.h>
+#endif
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include <rte_lcore_var.h>
+
+#include "eal_private.h"
+#include "eal_lcore_var.h"
+
+/*
+ * An lcore var buffer stores at a minimum one, but usually many,
+ * lcore variables. The value instances for all lcore ids are stored
+ * in the same buffer.
+ *
+ * The address of the value of a particular lcore variable associated
+ * with a particular lcore id is:
+ * buffer->data + offset + lcore_id * RTE_MAX_LCORE_VAR.
+ *
+ * In this way, the values associated with a particular lcore id are
+ * grouped spatially close (in the data array), and no padding is
+ * required to prevent false sharing.
+ *
+ * The (buffer->data + offset) base pointer is what is being returned
+ * to the API user as an opaque handle. The handle is a pointer to the
+ * value for lcore id 0, for that lcore variable.
+ *
+ * The implementation maintains a current lcore var buffer (being
+ * allocated from), and an offset representing the amount of data
+ * already allocated (in bytes) in that buffer.
+ *
+ * The offset is progressively incremented (by the size of the
+ * just-allocated lcore variable), as lcore variables are being
+ * allocated.
+ *
+ * When one lcore var buffer is full, a new one is allocated off the heap.
+ *
+ * The lcore var buffers are arranged in a singly-linked list, to allow
+ * freeing them at the point of rte_eal_cleanup(), and thereby avoid
+ * false positives from tools like valgrind memcheck.
+ */
+struct lcore_var_buffer {
+ char data[RTE_MAX_LCORE_VAR * RTE_MAX_LCORE];
+ struct lcore_var_buffer *prev;
+};
+
+static struct lcore_var_buffer *current_buffer;
+
+/* initialized to trigger buffer allocation on first allocation */
+static size_t offset = RTE_MAX_LCORE_VAR;
+
+static void *
+lcore_var_alloc(size_t size, size_t align)
+{
+ void *handle;
+ unsigned int lcore_id;
+ void *value;
+
+ offset = RTE_ALIGN_CEIL(offset, align);
+
+ if (offset + size > RTE_MAX_LCORE_VAR) {
+ struct lcore_var_buffer *prev = current_buffer;
+ size_t alloc_size =
+ RTE_ALIGN_CEIL(sizeof(struct lcore_var_buffer),
+ RTE_CACHE_LINE_SIZE);
+#ifdef RTE_EXEC_ENV_WINDOWS
+ current_buffer = _aligned_malloc(alloc_size, RTE_CACHE_LINE_SIZE);
+#else
+ current_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE, alloc_size);
+
+#endif
+ RTE_VERIFY(current_buffer != NULL);
+
+ current_buffer->prev = prev;
+
+ offset = 0;
+ }
+
+ handle = &current_buffer->data[offset];
+
+ offset += size;
+
+ RTE_LCORE_VAR_FOREACH(lcore_id, value, handle)
+ memset(value, 0, size);
+
+ EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
+ "%"PRIuPTR"-byte alignment", size, align);
+
+ return handle;
+}
+
+void *
+rte_lcore_var_alloc(size_t size, size_t align)
+{
+ /* Having the per-lcore buffer size aligned on cache lines, as
+ * well as having the base pointer aligned on cache line size,
+ * assures that aligned offsets also translate to aligned
+ * pointers across all values.
+ */
+ RTE_BUILD_BUG_ON(RTE_MAX_LCORE_VAR % RTE_CACHE_LINE_SIZE != 0);
+ RTE_VERIFY(align <= RTE_CACHE_LINE_SIZE);
+ RTE_VERIFY(size <= RTE_MAX_LCORE_VAR);
+
+ /* '0' means asking for worst-case alignment requirements */
+ if (align == 0)
+#ifdef RTE_TOOLCHAIN_MSVC
+ /* MSVC <stddef.h> is missing the max_align_t typedef */
+ align = alignof(double);
+#else
+ align = alignof(max_align_t);
+#endif
+
+ RTE_VERIFY(rte_is_power_of_2(align));
+
+ return lcore_var_alloc(size, align);
+}
+
+void
+eal_lcore_var_cleanup(void)
+{
+ while (current_buffer != NULL) {
+ struct lcore_var_buffer *prev = current_buffer->prev;
+
+ free(current_buffer);
+
+ current_buffer = prev;
+ }
+}
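For readers who want to check the address arithmetic described in the comment at the top of this file, the following is a minimal standalone sketch of the same bump-allocation scheme, without any DPDK dependencies. All sketch_* names and the SKETCH_* constants are hypothetical stand-ins (for RTE_MAX_LCORE, RTE_MAX_LCORE_VAR and RTE_CACHE_LINE_SIZE). It assumes a C11 libc providing aligned_alloc() and uses assert() where the patch uses RTE_VERIFY().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately small stand-ins for the DPDK constants. */
#define SKETCH_MAX_LCORE 4
#define SKETCH_MAX_LCORE_VAR 256
#define SKETCH_CACHE_LINE 64

struct sketch_buffer {
	char data[SKETCH_MAX_LCORE_VAR * SKETCH_MAX_LCORE];
	struct sketch_buffer *prev;
};

static struct sketch_buffer *sketch_current;
/* Starts "full", so that the first allocation creates a buffer. */
static size_t sketch_offset = SKETCH_MAX_LCORE_VAR;

/* Round v up to a multiple of align (align must be a power of 2). */
static size_t
sketch_align_ceil(size_t v, size_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/* Value address: handle + lcore_id * per-lcore slice size. */
static void *
sketch_value(unsigned int lcore_id, void *handle)
{
	return (char *)handle + (size_t)lcore_id * SKETCH_MAX_LCORE_VAR;
}

static void *
sketch_alloc(size_t size, size_t align)
{
	unsigned int lcore_id;
	void *handle;

	assert(size > 0 && size <= SKETCH_MAX_LCORE_VAR);
	assert(align != 0 && (align & (align - 1)) == 0);
	assert(align <= SKETCH_CACHE_LINE);

	sketch_offset = sketch_align_ceil(sketch_offset, align);

	if (sketch_offset + size > SKETCH_MAX_LCORE_VAR) {
		/* aligned_alloc() wants a size that is a multiple of
		 * the alignment.
		 */
		size_t alloc_size = sketch_align_ceil(
			sizeof(struct sketch_buffer), SKETCH_CACHE_LINE);
		struct sketch_buffer *buf =
			aligned_alloc(SKETCH_CACHE_LINE, alloc_size);

		assert(buf != NULL);
		buf->prev = sketch_current;
		sketch_current = buf;
		sketch_offset = 0;
	}

	handle = &sketch_current->data[sketch_offset];
	sketch_offset += size;

	/* Zero the value of every lcore id. */
	for (lcore_id = 0; lcore_id < SKETCH_MAX_LCORE; lcore_id++)
		memset(sketch_value(lcore_id, handle), 0, size);

	return handle;
}

/* Walk the singly-linked list and release every buffer. */
static void
sketch_cleanup(void)
{
	while (sketch_current != NULL) {
		struct sketch_buffer *prev = sketch_current->prev;

		free(sketch_current);
		sketch_current = prev;
	}
	sketch_offset = SKETCH_MAX_LCORE_VAR;
}
```

Because the per-lcore slice size is a multiple of the cache line size and the buffer base is cache-line aligned, an offset aligned for lcore id 0 stays aligned for every other lcore id. This is the property the RTE_BUILD_BUG_ON() in rte_lcore_var_alloc() protects.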
new file mode 100644
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#ifndef EAL_LCORE_VAR_H
+#define EAL_LCORE_VAR_H
+
+void
+eal_lcore_var_cleanup(void);
+
+#endif
@@ -18,6 +18,7 @@ sources += files(
'eal_common_interrupts.c',
'eal_common_launch.c',
'eal_common_lcore.c',
+ 'eal_common_lcore_var.c',
'eal_common_mcfg.c',
'eal_common_memalloc.c',
'eal_common_memory.c',
@@ -47,6 +47,7 @@
#include "eal_private.h"
#include "eal_thread.h"
+#include "eal_lcore_var.h"
#include "eal_internal_cfg.h"
#include "eal_filesystem.h"
#include "eal_hugepages.h"
@@ -941,6 +942,7 @@ rte_eal_cleanup(void)
/* after this point, any DPDK pointers will become dangling */
rte_eal_memory_detach();
eal_cleanup_config(internal_conf);
+ eal_lcore_var_cleanup();
return 0;
}
@@ -27,6 +27,7 @@ headers += files(
'rte_keepalive.h',
'rte_launch.h',
'rte_lcore.h',
+ 'rte_lcore_var.h',
'rte_lock_annotations.h',
'rte_malloc.h',
'rte_mcslock.h',
new file mode 100644
@@ -0,0 +1,391 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#ifndef _RTE_LCORE_VAR_H_
+#define _RTE_LCORE_VAR_H_
+
+/**
+ * @file
+ *
+ * Lcore variables
+ *
+ * This API provides a mechanism to create and access per-lcore id
+ * variables in a space- and cycle-efficient manner.
+ *
+ * A per-lcore id variable (or lcore variable for short) holds a
+ * unique value for each EAL thread and registered non-EAL
+ * thread. There is one instance for each current and future lcore
+ * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
+ * value of the lcore variable for one lcore id is independent from
+ * the values assigned to other lcore ids within the same variable.
+ *
+ * In order to access the values of an lcore variable, a handle is
+ * used. The type of the handle is a pointer to the value's type
+ * (e.g., for an @c uint32_t lcore variable, the handle is a
+ * <code>uint32_t *</code>). The handle type is used to inform the
+ * access macros of the type of the values. A handle may be passed
+ * between modules and threads just like any pointer, but its value
+ * must be treated as an opaque identifier. An allocated handle never
+ * has the value NULL.
+ *
+ * @b Creation
+ *
+ * An lcore variable is created in two steps:
+ * 1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
+ * 2. Allocate lcore variable storage and initialize the handle with
+ * a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
+ * @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
+ * of module initialization, but may be done at any time.
+ *
+ * The lifetime of an lcore variable is not tied to the thread that
+ * created it. Its per-lcore id values (up to @c RTE_MAX_LCORE) are
+ * available from the moment the lcore variable is created and
+ * continue to exist throughout the entire lifetime of the EAL,
+ * whether or not the lcore id is currently in use.
+ *
+ * Lcore variables cannot and need not be freed.
+ *
+ * @b Access
+ *
+ * The value of any lcore variable for any lcore id may be accessed
+ * from any thread (including unregistered threads), but it should
+ * only be *frequently* read from or written to by the owner.
+ *
+ * Values of the same lcore variable, associated with different lcore
+ * ids, may be frequently read or written by their respective owners
+ * without risking false sharing.
+ *
+ * An appropriate synchronization mechanism (e.g., atomic loads and
+ * stores) should be employed to prevent data races between the owning
+ * thread and any other thread accessing the same value instance.
+ *
+ * The value of the lcore variable for a particular lcore id is
+ * accessed using @ref RTE_LCORE_VAR_LCORE.
+ *
+ * A common pattern is for an EAL thread or a registered non-EAL
+ * thread to access its own lcore variable value. For this purpose, a
+ * shorthand exists as @ref RTE_LCORE_VAR.
+ *
+ * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
+ * pointer with the same type as the value, it may not be directly
+ * dereferenced and must be treated as an opaque identifier.
+ *
+ * Lcore variable handles and value pointers may be freely passed
+ * between different threads.
+ *
+ * @b Storage
+ *
+ * An lcore variable's values may be of a primitive type like @c int,
+ * but would more typically be a @c struct.
+ *
+ * The lcore variable handle introduces a per-variable (not
+ * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
+ * there are some memory footprint gains to be made by organizing all
+ * per-lcore id data for a particular module as one lcore variable
+ * (e.g., as a struct).
+ *
+ * An application may define an lcore variable handle without ever
+ * allocating it.
+ *
+ * The size of an lcore variable's value must not exceed the DPDK
+ * build-time constant @c RTE_MAX_LCORE_VAR.
+ *
+ * Lcore variables are stored in a series of lcore buffers, which are
+ * allocated from the libc heap. Heap allocation failures are treated
+ * as fatal.
+ *
+ * Lcore variables should generally *not* be @ref __rte_cache_aligned
+ * and need *not* include a @ref RTE_CACHE_GUARD field, since these
+ * constructs are designed to avoid false sharing. In the
+ * case of an lcore variable instance, the thread most recently
+ * accessing nearby data structures should almost-always be the lcore
+ * variable's owner. Adding padding will increase the effective memory
+ * working set size, potentially reducing performance.
+ *
+ * Lcore variable values are initialized to zero by default.
+ *
+ * Lcore variables are not stored in huge page memory.
+ *
+ * @b Example
+ *
+ * Below is an example of the use of an lcore variable:
+ *
+ * @code{.c}
+ * struct foo_lcore_state {
+ * int a;
+ * long b;
+ * };
+ *
+ * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
+ *
+ * long foo_get_a_plus_b(void)
+ * {
+ * struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
+ *
+ * return state->a + state->b;
+ * }
+ *
+ * RTE_INIT(rte_foo_init)
+ * {
+ * RTE_LCORE_VAR_ALLOC(lcore_states);
+ *
+ * unsigned int lcore_id;
+ * struct foo_lcore_state *state;
+ * RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
+ * (initialize 'state')
+ * }
+ *
+ * (other initialization)
+ * }
+ * @endcode
+ *
+ *
+ * @b Alternatives
+ *
+ * Lcore variables are designed to replace a pattern exemplified below:
+ * @code{.c}
+ * struct __rte_cache_aligned foo_lcore_state {
+ * int a;
+ * long b;
+ * RTE_CACHE_GUARD;
+ * };
+ *
+ * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
+ * @endcode
+ *
+ * This scheme is simple and effective, but has one drawback: the data
+ * is organized so that objects related to all lcores for a particular
+ * module are kept close in memory. At a bare minimum, this requires
+ * sizing data structures (e.g., using `__rte_cache_aligned`) to a
+ * whole number of cache lines to avoid false sharing. With CPU
+ * hardware prefetching and memory loads resulting from speculative
+ * execution (functions which seemingly are getting more eager faster
+ * than they are getting more intelligent), one or more "guard" cache
+ * lines may be required to separate one lcore's data from another's
+ * and prevent false sharing.
+ *
+ * Lcore variables offer the advantage of working with, rather than
+ * against, the CPU's assumptions. A next-line hardware prefetcher,
+ * for example, may function as intended (i.e., to the benefit, not
+ * detriment, of system performance).
+ *
+ * Another alternative to @ref rte_lcore_var.h is the @ref
+ * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
+ * e.g., GCC __thread or C11 _Thread_local). The main differences
+ * between using the various forms of TLS (e.g., @ref
+ * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
+ * variables are:
+ *
+ * * The lifecycle of a thread-local variable instance is tied to
+ * that of the thread. The data cannot be accessed before the
+ * thread has been created, nor after it has exited. As a result,
+ * thread-local variables must be initialized in a "lazy" manner
+ * (e.g., at the point of thread creation). Lcore variables may be
+ * accessed immediately after having been allocated (which may occur
+ * before any thread beyond the main thread is running).
+ * * A thread-local variable is duplicated across all threads in the
+ * process, including unregistered non-EAL threads (i.e.,
+ * "regular" threads). For DPDK applications heavily relying on
+ * multi-threading (in conjunction with DPDK's "one thread per core"
+ * pattern), either by having many concurrent threads or
+ * creating/destroying threads at a high rate, an excessive use of
+ * thread-local variables may cause inefficiencies (e.g.,
+ * increased thread creation overhead due to thread-local storage
+ * initialization or increased total RAM footprint). Lcore
+ * variables *only* exist for threads with an lcore id.
+ * * Whether data in thread-local storage may be shared between threads
+ * (i.e., whether a pointer to a thread-local variable can be passed
+ * to and successfully dereferenced by a non-owning thread) depends on
+ * the specifics of the TLS implementation. With GCC __thread and
+ * GCC _Thread_local, data sharing between threads is supported.
+ * In the C11 standard, accessing another thread's _Thread_local
+ * object is implementation-defined. Lcore variable instances may
+ * be accessed reliably by any thread.
+ */
+
+#include <stddef.h>
+#include <stdalign.h>
+
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_lcore.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Given the lcore variable type, produces the type of the lcore
+ * variable handle.
+ */
+#define RTE_LCORE_VAR_HANDLE_TYPE(type) \
+ type *
+
+/**
+ * Define an lcore variable handle.
+ *
+ * This macro defines a variable which is used as a handle to access
+ * the various instances of a per-lcore id variable.
+ *
+ * This macro clarifies that the declaration is an lcore handle, not a
+ * regular pointer.
+ *
+ * Add @b static as a prefix in case the lcore variable is only to be
+ * accessed from a particular translation unit.
+ */
+#define RTE_LCORE_VAR_HANDLE(type, name) \
+ RTE_LCORE_VAR_HANDLE_TYPE(type) name
+
+/**
+ * Allocate space for an lcore variable, and initialize its handle.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, align) \
+ handle = rte_lcore_var_alloc(size, align)
+
+/**
+ * Allocate space for an lcore variable, and initialize its handle,
+ * with values aligned for any type of object.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_ALLOC_SIZE(handle, size) \
+ RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, 0)
+
+/**
+ * Allocate space for an lcore variable of the size and alignment requirements
+ * suggested by the handle pointer type, and initialize its handle.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_ALLOC(handle) \
+ RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, sizeof(*(handle)), \
+ alignof(typeof(*(handle))))
+
+/**
+ * Allocate an explicitly-sized, explicitly-aligned lcore variable by
+ * means of a @ref RTE_INIT constructor.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, align) \
+ RTE_INIT(rte_lcore_var_init_ ## name) \
+ { \
+ RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(name, size, align); \
+ }
+
+/**
+ * Allocate an explicitly-sized lcore variable by means of a @ref
+ * RTE_INIT constructor.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_INIT_SIZE(name, size) \
+ RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, 0)
+
+/**
+ * Allocate an lcore variable by means of a @ref RTE_INIT constructor.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_INIT(name) \
+ RTE_INIT(rte_lcore_var_init_ ## name) \
+ { \
+ RTE_LCORE_VAR_ALLOC(name); \
+ }
+
+/**
+ * Get void pointer to lcore variable instance with the specified
+ * lcore id.
+ *
+ * @param lcore_id
+ * The lcore id specifying which of the @c RTE_MAX_LCORE value
+ * instances should be accessed. The lcore id need not be valid
+ * (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
+ * is also not valid (and thus should not be dereferenced).
+ * @param handle
+ * The lcore variable handle.
+ */
+static inline void *
+rte_lcore_var_lcore(unsigned int lcore_id, void *handle)
+{
+ return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
+}
+
+/**
+ * Get pointer to lcore variable instance with the specified lcore id.
+ *
+ * @param lcore_id
+ * The lcore id specifying which of the @c RTE_MAX_LCORE value
+ * instances should be accessed. The lcore id need not be valid
+ * (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
+ * is also not valid (and thus should not be dereferenced).
+ * @param handle
+ * The lcore variable handle.
+ */
+#define RTE_LCORE_VAR_LCORE(lcore_id, handle) \
+ ((typeof(handle))rte_lcore_var_lcore(lcore_id, handle))
+
+/**
+ * Get pointer to lcore variable instance of the current thread.
+ *
+ * May only be used by EAL threads and registered non-EAL threads.
+ */
+#define RTE_LCORE_VAR(handle) \
+ RTE_LCORE_VAR_LCORE(rte_lcore_id(), handle)
+
+/**
+ * Iterate over each lcore id's value for an lcore variable.
+ *
+ * @param lcore_id
+ * An <code>unsigned int</code> variable successively set to every
+ * valid lcore id (up to @c RTE_MAX_LCORE).
+ * @param value
+ * A pointer variable successively set to point to the lcore variable
+ * value instance of the current lcore id being processed.
+ * @param handle
+ * The lcore variable handle.
+ */
+#define RTE_LCORE_VAR_FOREACH(lcore_id, value, handle) \
+ for ((lcore_id) = \
+ (((value) = RTE_LCORE_VAR_LCORE(0, handle)), 0); \
+ (lcore_id) < RTE_MAX_LCORE; \
+ (lcore_id)++, (value) = RTE_LCORE_VAR_LCORE(lcore_id, \
+ handle))
+
+/**
+ * Allocate space in the per-lcore id buffers for an lcore variable.
+ *
+ * The pointer returned is only an opaque identifier of the variable. To
+ * get an actual pointer to a particular instance of the variable use
+ * @ref RTE_LCORE_VAR or @ref RTE_LCORE_VAR_LCORE.
+ *
+ * The lcore variable values' memory is set to zero.
+ *
+ * The allocation is always successful, barring a fatal exhaustion of
+ * the per-lcore id buffer space.
+ *
+ * rte_lcore_var_alloc() is not multi-thread safe.
+ *
+ * @param size
+ * The size (in bytes) of the variable's per-lcore id value. Must be > 0.
+ * @param align
+ * If 0, the values will be suitably aligned for any kind of type
+ * (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
+ * on a multiple of *align*, which must be a power of 2 and less than
+ * or equal to @c RTE_CACHE_LINE_SIZE.
+ * @return
+ * The variable's handle, stored in a void pointer value. The value
+ * is always non-NULL.
+ */
+__rte_experimental
+void *
+rte_lcore_var_alloc(size_t size, size_t align);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_LCORE_VAR_H_ */
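As a way to try out the semantics of the lookup and iterator macros without an EAL, here is a minimal standalone sketch mirroring RTE_LCORE_VAR_LCORE and RTE_LCORE_VAR_FOREACH. All SKETCH_* and sketch_* names, foo_init() and foo_sum_all() are hypothetical. The storage comes from plain calloc() rather than rte_lcore_var_alloc(), and __typeof__ is used on the assumption of a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for RTE_MAX_LCORE and RTE_MAX_LCORE_VAR. */
#define SKETCH_MAX_LCORE 4
#define SKETCH_MAX_LCORE_VAR 64

/* Untyped lookup: step one per-lcore slice per lcore id. */
static inline void *
sketch_lcore_var_lcore(unsigned int lcore_id, void *handle)
{
	return (char *)handle + (size_t)lcore_id * SKETCH_MAX_LCORE_VAR;
}

/* Typed lookup: the handle's pointer type carries the value type. */
#define SKETCH_LCORE_VAR_LCORE(lcore_id, handle) \
	((__typeof__(handle))sketch_lcore_var_lcore(lcore_id, handle))

/* Visit the value of every lcore id, in lcore id order. */
#define SKETCH_LCORE_VAR_FOREACH(lcore_id, value, handle) \
	for ((lcore_id) = \
		(((value) = SKETCH_LCORE_VAR_LCORE(0, handle)), 0); \
	     (lcore_id) < SKETCH_MAX_LCORE; \
	     (lcore_id)++, \
		(value) = SKETCH_LCORE_VAR_LCORE(lcore_id, handle))

struct foo_lcore_state {
	int a;
	long b;
};

/* The handle: points to the lcore id 0 value once initialized. */
static struct foo_lcore_state *foo_states;

/* Stand-in for RTE_LCORE_VAR_ALLOC(): zeroed storage, one slice
 * per lcore id.
 */
static void
foo_init(void)
{
	foo_states = calloc(SKETCH_MAX_LCORE, SKETCH_MAX_LCORE_VAR);
	assert(foo_states != NULL);
}

/* Sum a + b over the values of all lcore ids. */
static long
foo_sum_all(void)
{
	unsigned int lcore_id;
	struct foo_lcore_state *state;
	long sum = 0;

	SKETCH_LCORE_VAR_FOREACH(lcore_id, state, foo_states)
		sum += state->a + state->b;

	return sum;
}
```

Note that consecutive values of the same variable are SKETCH_MAX_LCORE_VAR bytes apart, not sizeof(struct foo_lcore_state) bytes apart, which is why the handle must not be indexed like an ordinary array.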
@@ -45,6 +45,7 @@
#include <telemetry_internal.h>
#include "eal_private.h"
#include "eal_thread.h"
+#include "eal_lcore_var.h"
#include "eal_internal_cfg.h"
#include "eal_filesystem.h"
#include "eal_hugepages.h"
@@ -1387,6 +1388,7 @@ rte_eal_cleanup(void)
rte_eal_malloc_heap_cleanup();
eal_cleanup_config(internal_conf);
rte_eal_log_cleanup();
+ eal_lcore_var_cleanup();
return 0;
}
@@ -396,6 +396,9 @@ EXPERIMENTAL {
# added in 24.03
rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
+
+ # added in 24.11
+ rte_lcore_var_alloc;
};
INTERNAL {