[7/7] mempool/stack: add non-blocking stack mempool handler
Commit Message
This commit adds support for a non-blocking (linked-list-based) stack mempool
handler.
In mempool_perf_autotest the lock-based stack outperforms the
non-blocking handler for certain lcore/alloc count/free count
combinations*, however:
- For applications with preemptible pthreads, a lock-based stack's
worst-case performance (i.e. one thread being preempted while
holding the spinlock) is much worse than the non-blocking stack's.
- Using per-thread mempool caches will largely mitigate the performance
difference.
*Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
running on isolcpus cores with a tickless scheduler. The lock-based stack's
rate_persec was 0.6x-3.5x the non-blocking stack's.
Signed-off-by: Gage Eads <gage.eads@intel.com>
---
doc/guides/prog_guide/env_abstraction_layer.rst | 5 +++++
doc/guides/rel_notes/release_19_05.rst | 5 +++++
drivers/mempool/stack/rte_mempool_stack.c | 26 +++++++++++++++++++++++--
3 files changed, 34 insertions(+), 2 deletions(-)
Comments
On Fri, Feb 22, 2019 at 10:06:55AM -0600, Gage Eads wrote:
> This commit adds support for a non-blocking (linked-list-based) stack mempool
> handler.
>
> In mempool_perf_autotest the lock-based stack outperforms the
> non-blocking handler for certain lcore/alloc count/free count
> combinations*, however:
> - For applications with preemptible pthreads, a lock-based stack's
> worst-case performance (i.e. one thread being preempted while
> holding the spinlock) is much worse than the non-blocking stack's.
> - Using per-thread mempool caches will largely mitigate the performance
> difference.
>
> *Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
> running on isolcpus cores with a tickless scheduler. The lock-based stack's
> rate_persec was 0.6x-3.5x the non-blocking stack's.
>
> Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
@@ -541,6 +541,11 @@ Known Issues
5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+ Alternatively, applications can use the non-blocking stack mempool handler. When considering this handler, note that:
+
+ - it is currently limited to the x86_64 platform, because it uses an instruction (16-byte compare-and-swap) that is not yet available on other platforms.
+ - it has worse average-case performance than the non-preemptive rte_ring, but software caching (e.g. the mempool cache) can mitigate this by reducing the number of stack accesses.
+
+ rte_timer
Running ``rte_timer_manage()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
@@ -74,6 +74,11 @@ New Features
The library supports two stack implementations: lock-based and non-blocking.
The non-blocking implementation is currently limited to x86-64 platforms.
+* **Added Non-blocking Stack Mempool Handler.**
+
+ Added a new non-blocking stack handler, which uses the newly added stack
+ library.
+
Removed Items
-------------
@@ -7,7 +7,7 @@
#include <rte_stack.h>
static int
-stack_alloc(struct rte_mempool *mp)
+__stack_alloc(struct rte_mempool *mp, uint32_t flags)
{
char name[RTE_STACK_NAMESIZE];
struct rte_stack *s;
@@ -20,7 +20,7 @@ stack_alloc(struct rte_mempool *mp)
return -rte_errno;
}
- s = rte_stack_create(name, mp->size, mp->socket_id, 0);
+ s = rte_stack_create(name, mp->size, mp->socket_id, flags);
if (s == NULL)
return -rte_errno;
@@ -30,6 +30,18 @@ stack_alloc(struct rte_mempool *mp)
}
static int
+stack_alloc(struct rte_mempool *mp)
+{
+ return __stack_alloc(mp, 0);
+}
+
+static int
+nb_stack_alloc(struct rte_mempool *mp)
+{
+ return __stack_alloc(mp, STACK_F_NB);
+}
+
+static int
stack_enqueue(struct rte_mempool *mp, void * const *obj_table,
unsigned int n)
{
@@ -72,4 +84,14 @@ static struct rte_mempool_ops ops_stack = {
.get_count = stack_get_count
};
+static struct rte_mempool_ops ops_nb_stack = {
+ .name = "nb_stack",
+ .alloc = nb_stack_alloc,
+ .free = stack_free,
+ .enqueue = stack_enqueue,
+ .dequeue = stack_dequeue,
+ .get_count = stack_get_count
+};
+
MEMPOOL_REGISTER_OPS(ops_stack);
+MEMPOOL_REGISTER_OPS(ops_nb_stack);