[v3] doc: add stack mempool guide
Commit Message
This guide describes the two stack modes, their tradeoffs, and (via a
reference to the mempool guide) how to enable them.
Signed-off-by: Gage Eads <gage.eads@intel.com>
---
v3: Fixed "Title underline too short" warning
v2: Added commit description
doc/guides/mempool/index.rst | 1 +
doc/guides/mempool/stack.rst | 38 +++++++++++++++++++++++++++++++++++
doc/guides/prog_guide/mempool_lib.rst | 2 ++
doc/guides/prog_guide/stack_lib.rst | 4 ++++
4 files changed, 45 insertions(+)
create mode 100644 doc/guides/mempool/stack.rst
Comments
Hi Gage,
On Mon, Sep 14, 2020 at 04:11:53PM -0500, Gage Eads wrote:
> This guide describes the two stack modes, their tradeoffs, and (via a
> reference to the mempool guide) how to enable them.
>
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
> v3: Fixed "Title underline too short" warning
>
> v2: Added commit description
>
> doc/guides/mempool/index.rst | 1 +
> doc/guides/mempool/stack.rst | 38 +++++++++++++++++++++++++++++++++++
> doc/guides/prog_guide/mempool_lib.rst | 2 ++
> doc/guides/prog_guide/stack_lib.rst | 4 ++++
> 4 files changed, 45 insertions(+)
> create mode 100644 doc/guides/mempool/stack.rst
>
> diff --git a/doc/guides/mempool/index.rst b/doc/guides/mempool/index.rst
> index bbd02fd81..a0e55467e 100644
> --- a/doc/guides/mempool/index.rst
> +++ b/doc/guides/mempool/index.rst
> @@ -14,3 +14,4 @@ application through the mempool API.
> octeontx
> octeontx2
> ring
> + stack
> diff --git a/doc/guides/mempool/stack.rst b/doc/guides/mempool/stack.rst
> new file mode 100644
> index 000000000..bdf19cf04
> --- /dev/null
> +++ b/doc/guides/mempool/stack.rst
> @@ -0,0 +1,38 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2020 Intel Corporation.
> +
> +Stack Mempool Driver
> +====================
> +
> +**rte_mempool_stack** is a pure software mempool driver based on the
> +``rte_stack`` DPDK library. A stack-based mempool is often better suited to
> +packet-processing workloads than a ring-based mempool, since its LIFO behavior
> +results in better temporal locality and a minimal memory footprint even if the
> +mempool is over-provisioned.
Would it make sense to give an example of a use-case where the stack
driver should be used in place of the standard ring-based one?
In most run-to-completion applications, the mbufs stay in per-core
caches, so changing the mempool driver won't have a big impact. However,
I suspect that for applications using a pipeline model (ex: rx on core0,
tx on core1), the stack model would be more efficient. Is it something
that you measured? If yes, it would be useful to explain this in the
documentation.
> +
> +The following modes of operation are available for the stack mempool driver and
> +can be selected as described in :ref:`Mempool_Handlers`:
> +
> +- ``stack``
> +
> + The underlying **rte_stack** operates in standard (lock-based) mode.
> + For more information please refer to :ref:`Stack_Library_Std_Stack`.
> +
> +- ``lf_stack``
> +
> + The underlying **rte_stack** operates in lock-free mode. For more
> + information please refer to :ref:`Stack_Library_LF_Stack`.
> +
> +The standard stack outperforms the lock-free stack on average; however, the
> +standard stack is non-preemptive: if a mempool user is preempted while holding
> +the stack lock, that thread will block all other mempool accesses until it
> +returns and releases the lock. As a result, an application using the standard
> +stack whose threads can be preempted can suffer from brief, infrequent
> +performance hiccups.
> +
> +The lock-free stack, by design, is not susceptible to this problem; one thread can
> +be preempted at any point during a push or pop operation and will not impede
> +the progress of any other thread.
> +
> +For a more detailed description of the stack implementations, please refer to
> +:doc:`../prog_guide/stack_lib`.
> diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
> index e3e1f940b..6f3c0067f 100644
> --- a/doc/guides/prog_guide/mempool_lib.rst
> +++ b/doc/guides/prog_guide/mempool_lib.rst
> @@ -105,6 +105,8 @@ These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`
> The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
> In contrast to the default caches, user-owned caches can be used by unregistered non-EAL threads too.
>
> +.. _Mempool_Handlers:
> +
> Mempool Handlers
> ------------------------
>
> diff --git a/doc/guides/prog_guide/stack_lib.rst b/doc/guides/prog_guide/stack_lib.rst
> index 8fe8804e3..3097cab0c 100644
> --- a/doc/guides/prog_guide/stack_lib.rst
> +++ b/doc/guides/prog_guide/stack_lib.rst
> @@ -28,6 +28,8 @@ Implementation
> The library supports two types of stacks: standard (lock-based) and lock-free.
> Both types use the same set of interfaces, but their implementations differ.
>
> +.. _Stack_Library_Std_Stack:
> +
> Lock-based Stack
> ----------------
>
> @@ -35,6 +37,8 @@ The lock-based stack consists of a contiguous array of pointers, a current
> index, and a spinlock. Accesses to the stack are made multi-thread safe by the
> spinlock.
>
> +.. _Stack_Library_LF_Stack:
> +
> Lock-free Stack
> ------------------
>
> --
> 2.13.6
>
Hi Olivier,
<snip>
> > +Stack Mempool Driver
> > +====================
> > +
> > +**rte_mempool_stack** is a pure software mempool driver based on the
> > +``rte_stack`` DPDK library. A stack-based mempool is often better suited to
> > +packet-processing workloads than a ring-based mempool, since its LIFO
> behavior
> > +results in better temporal locality and a minimal memory footprint even if the
> > +mempool is over-provisioned.
>
> Would it make sense to give an example of a use-case where the stack
> driver should be used in place of the standard ring-based one?
>
> In most run-to-completion applications, the mbufs stay in per-core
> caches, so changing the mempool driver won't have a big impact. However,
> I suspect that for applications using a pipeline model (ex: rx on core0,
> tx on core1), the stack model would be more efficient. Is it something
> that you measured? If yes, it would be useful to explain this in the
> documentation.
>
Good point, I was overlooking the impact of the per-core caches. I've seen data showing
better overall packet throughput with the stack mempool, and indeed that was a pipelined
application. How about this re-write?
"
**rte_mempool_stack** is a pure software mempool driver based on the
``rte_stack`` DPDK library. For run-to-completion workloads with sufficiently
large per-lcore caches, the mbufs will likely stay in the per-lcore caches and the
mempool type (ring, stack, etc.) will have a negligible impact on performance. However,
a stack-based mempool is often better suited to pipelined packet-processing workloads
(which allocate and free mbufs on different lcores) than a ring-based mempool, since its
LIFO behavior results in better temporal locality and a minimal memory footprint even
if the mempool is over-provisioned. Users are encouraged to benchmark with multiple
mempool types to determine which works best for their specific application.
"
Thanks,
Gage
Hi Gage,
On Mon, Sep 21, 2020 at 03:42:28PM +0000, Eads, Gage wrote:
> Hi Olivier,
>
> <snip>
>
> > > +Stack Mempool Driver
> > > +====================
> > > +
> > > +**rte_mempool_stack** is a pure software mempool driver based on the
> > > +``rte_stack`` DPDK library. A stack-based mempool is often better suited to
> > > +packet-processing workloads than a ring-based mempool, since its LIFO
> > behavior
> > > +results in better temporal locality and a minimal memory footprint even if the
> > > +mempool is over-provisioned.
> >
> > Would it make sense to give an example of a use-case where the stack
> > driver should be used in place of the standard ring-based one?
> >
> > In most run-to-completion applications, the mbufs stay in per-core
> > caches, so changing the mempool driver won't have a big impact. However,
> > I suspect that for applications using a pipeline model (ex: rx on core0,
> > tx on core1), the stack model would be more efficient. Is it something
> > that you measured? If yes, it would be useful to explain this in the
> > documentation.
> >
>
> Good point, I was overlooking the impact of the per-core caches. I've seen data showing
> better overall packet throughput with the stack mempool, and indeed that was a pipelined
> application. How about this re-write?
>
> "
> **rte_mempool_stack** is a pure software mempool driver based on the
> ``rte_stack`` DPDK library. For run-to-completion workloads with sufficiently
> large per-lcore caches, the mbufs will likely stay in the per-lcore caches and the
> mempool type (ring, stack, etc.) will have a negligible impact on performance. However,
> a stack-based mempool is often better suited to pipelined packet-processing workloads
> (which allocate and free mbufs on different lcores) than a ring-based mempool, since its
> LIFO behavior results in better temporal locality and a minimal memory footprint even
> if the mempool is over-provisioned. Users are encouraged to benchmark with multiple
> mempool types to determine which works best for their specific application.
> "
Yes, this is clear, thanks!
Olivier