[v3] doc: add stack mempool guide

Message ID 20200914211153.6725-1-gage.eads@intel.com
State Superseded
Delegated to: David Marchand
Series
  • [v3] doc: add stack mempool guide

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/travis-robot success Travis build: passed
ci/checkpatch success coding style OK

Commit Message

Eads, Gage Sept. 14, 2020, 9:11 p.m. UTC
This guide describes the two stack modes, their tradeoffs, and (via a
reference to the mempool guide) how to enable them.

Signed-off-by: Gage Eads <gage.eads@intel.com>
---
v3: Fixed "Title underline too short" warning

v2: Added commit description

 doc/guides/mempool/index.rst          |  1 +
 doc/guides/mempool/stack.rst          | 38 +++++++++++++++++++++++++++++++++++
 doc/guides/prog_guide/mempool_lib.rst |  2 ++
 doc/guides/prog_guide/stack_lib.rst   |  4 ++++
 4 files changed, 45 insertions(+)
 create mode 100644 doc/guides/mempool/stack.rst

Comments

Olivier Matz Sept. 17, 2020, 2:25 p.m. UTC | #1
Hi Gage,

On Mon, Sep 14, 2020 at 04:11:53PM -0500, Gage Eads wrote:
> This guide describes the two stack modes, their tradeoffs, and (via a
> reference to the mempool guide) how to enable them.
> 
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
> v3: Fixed "Title underline too short" warning
> 
> v2: Added commit description
> 
>  doc/guides/mempool/index.rst          |  1 +
>  doc/guides/mempool/stack.rst          | 38 +++++++++++++++++++++++++++++++++++
>  doc/guides/prog_guide/mempool_lib.rst |  2 ++
>  doc/guides/prog_guide/stack_lib.rst   |  4 ++++
>  4 files changed, 45 insertions(+)
>  create mode 100644 doc/guides/mempool/stack.rst
> 
> diff --git a/doc/guides/mempool/index.rst b/doc/guides/mempool/index.rst
> index bbd02fd81..a0e55467e 100644
> --- a/doc/guides/mempool/index.rst
> +++ b/doc/guides/mempool/index.rst
> @@ -14,3 +14,4 @@ application through the mempool API.
>      octeontx
>      octeontx2
>      ring
> +    stack
> diff --git a/doc/guides/mempool/stack.rst b/doc/guides/mempool/stack.rst
> new file mode 100644
> index 000000000..bdf19cf04
> --- /dev/null
> +++ b/doc/guides/mempool/stack.rst
> @@ -0,0 +1,38 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2020 Intel Corporation.
> +
> +Stack Mempool Driver
> +====================
> +
> +**rte_mempool_stack** is a pure software mempool driver based on the
> +``rte_stack`` DPDK library. A stack-based mempool is often better suited to
> +packet-processing workloads than a ring-based mempool, since its LIFO behavior
> +results in better temporal locality and a minimal memory footprint even if the
> +mempool is over-provisioned.

Would it make sense to give an example of a use-case where the stack
driver should be used in place of the standard ring-based one?

In most run-to-completion applications, the mbufs stay in per-core
caches, so changing the mempool driver won't have a big impact. However,
I suspect that for applications using a pipeline model (ex: rx on core0,
tx on core1), the stack model would be more efficient. Is it something
that you measured? If yes, it would be useful to explain this in the
documentation.


> +
> +The following modes of operation are available for the stack mempool driver and
> +can be selected as described in :ref:`Mempool_Handlers`:
> +
> +- ``stack``
> +
> +  The underlying **rte_stack** operates in standard (lock-based) mode.
> +  For more information please refer to :ref:`Stack_Library_Std_Stack`.
> +
> +- ``lf_stack``
> +
> +  The underlying **rte_stack** operates in lock-free mode. For more
> +  information please refer to :ref:`Stack_Library_LF_Stack`.
> +
> +The standard stack outperforms the lock-free stack on average, however the
> +standard stack is non-preemptive: if a mempool user is preempted while holding
> +the stack lock, that thread will block all other mempool accesses until it
> +returns and releases the lock. As a result, an application using the standard
> +stack whose threads can be preempted can suffer from brief, infrequent
> +performance hiccups.
> +
> +The lock-free stack, by design, is not susceptible to this problem; one thread can
> +be preempted at any point during a push or pop operation and will not impede
> +the progress of any other thread.
> +
> +For a more detailed description of the stack implementations, please refer to
> +:doc:`../prog_guide/stack_lib`.
> diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
> index e3e1f940b..6f3c0067f 100644
> --- a/doc/guides/prog_guide/mempool_lib.rst
> +++ b/doc/guides/prog_guide/mempool_lib.rst
> @@ -105,6 +105,8 @@ These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`
>  The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
>  In contrast to the default caches, user-owned caches can be used by unregistered non-EAL threads too.
>  
> +.. _Mempool_Handlers:
> +
>  Mempool Handlers
>  ------------------------
>  
> diff --git a/doc/guides/prog_guide/stack_lib.rst b/doc/guides/prog_guide/stack_lib.rst
> index 8fe8804e3..3097cab0c 100644
> --- a/doc/guides/prog_guide/stack_lib.rst
> +++ b/doc/guides/prog_guide/stack_lib.rst
> @@ -28,6 +28,8 @@ Implementation
>  The library supports two types of stacks: standard (lock-based) and lock-free.
>  Both types use the same set of interfaces, but their implementations differ.
>  
> +.. _Stack_Library_Std_Stack:
> +
>  Lock-based Stack
>  ----------------
>  
> @@ -35,6 +37,8 @@ The lock-based stack consists of a contiguous array of pointers, a current
>  index, and a spinlock. Accesses to the stack are made multi-thread safe by the
>  spinlock.
>  
> +.. _Stack_Library_LF_Stack:
> +
>  Lock-free Stack
>  ------------------
>  
> -- 
> 2.13.6
>
Eads, Gage Sept. 21, 2020, 3:42 p.m. UTC | #2
Hi Olivier,

<snip>

> > +Stack Mempool Driver
> > +====================
> > +
> > +**rte_mempool_stack** is a pure software mempool driver based on the
> > +``rte_stack`` DPDK library. A stack-based mempool is often better suited to
> > +packet-processing workloads than a ring-based mempool, since its LIFO
> behavior
> > +results in better temporal locality and a minimal memory footprint even if the
> > +mempool is over-provisioned.
> 
> Would it make sense to give an example of a use-case where the stack
> driver should be used in place of the standard ring-based one?
> 
> In most run-to-completion applications, the mbufs stay in per-core
> caches, so changing the mempool driver won't have a big impact. However,
> I suspect that for applications using a pipeline model (ex: rx on core0,
> tx on core1), the stack model would be more efficient. Is it something
> that you measured? If yes, it would be useful to explain this in the
> documentation.
> 

Good point, I was overlooking the impact of the per-core caches. I've seen data showing
better overall packet throughput with the stack mempool, and indeed that was a pipelined
application. How about this re-write?

"
**rte_mempool_stack** is a pure software mempool driver based on the
``rte_stack`` DPDK library. For run-to-completion workloads with sufficiently
large per-lcore caches, the mbufs will likely stay in the per-lcore caches and the
mempool type (ring, stack, etc.) will have a negligible impact on performance. However,
a stack-based mempool is often better suited to pipelined packet-processing workloads
(which allocate and free mbufs on different lcores) than a ring-based mempool, since its
LIFO behavior results in better temporal locality and a minimal memory footprint even
if the mempool is over-provisioned. Users are encouraged to benchmark with multiple
mempool types to determine which works best for their specific application.
"

Thanks,
Gage
Olivier Matz Oct. 7, 2020, 8:43 a.m. UTC | #3
Hi Gage,

On Mon, Sep 21, 2020 at 03:42:28PM +0000, Eads, Gage wrote:
> Hi Olivier,
> 
> <snip>
> 
> > > +Stack Mempool Driver
> > > +====================
> > > +
> > > +**rte_mempool_stack** is a pure software mempool driver based on the
> > > +``rte_stack`` DPDK library. A stack-based mempool is often better suited to
> > > +packet-processing workloads than a ring-based mempool, since its LIFO
> > behavior
> > > +results in better temporal locality and a minimal memory footprint even if the
> > > +mempool is over-provisioned.
> > 
> > Would it make sense to give an example of a use-case where the stack
> > driver should be used in place of the standard ring-based one?
> > 
> > In most run-to-completion applications, the mbufs stay in per-core
> > caches, so changing the mempool driver won't have a big impact. However,
> > I suspect that for applications using a pipeline model (ex: rx on core0,
> > tx on core1), the stack model would be more efficient. Is it something
> > that you measured? If yes, it would be useful to explain this in the
> > documentation.
> > 
> 
> Good point, I was overlooking the impact of the per-core caches. I've seen data showing
> better overall packet throughput with the stack mempool, and indeed that was a pipelined
> application. How about this re-write?
> 
> "
> **rte_mempool_stack** is a pure software mempool driver based on the
> ``rte_stack`` DPDK library. For run-to-completion workloads with sufficiently
> large per-lcore caches, the mbufs will likely stay in the per-lcore caches and the
> mempool type (ring, stack, etc.) will have a negligible impact on performance. However,
> a stack-based mempool is often better suited to pipelined packet-processing workloads
> (which allocate and free mbufs on different lcores) than a ring-based mempool, since its
> LIFO behavior results in better temporal locality and a minimal memory footprint even
> if the mempool is over-provisioned. Users are encouraged to benchmark with multiple
> mempool types to determine which works best for their specific application.
> "

Yes, this is clear, thanks!

Olivier
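
[Editor's note: concretely, the handler selection referenced via :ref:`Mempool_Handlers` comes down to a single ops name. The following is a sketch against the DPDK mempool API; the pool name and sizes are arbitrary examples, and it compiles only in a DPDK build environment.]

```c
#include <rte_mempool.h>

/* Create an empty mempool, bind the "stack" handler (use "lf_stack"
 * for the lock-free variant), then populate it.  Counts and sizes
 * below are arbitrary examples. */
static struct rte_mempool *create_stack_pool(void)
{
    struct rte_mempool *mp;

    mp = rte_mempool_create_empty("example_pool",
                                  4096,  /* number of elements */
                                  2048,  /* element size */
                                  256,   /* per-lcore cache size */
                                  0,     /* private data size */
                                  rte_socket_id(), 0);
    if (mp == NULL)
        return NULL;

    if (rte_mempool_set_ops_byname(mp, "stack", NULL) < 0 ||
        rte_mempool_populate_default(mp) < 0) {
        rte_mempool_free(mp);
        return NULL;
    }
    return mp;
}
```

For mbuf pools specifically, ``rte_pktmbuf_pool_create_by_ops()`` offers the same handler selection in a single call.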

Patch

diff --git a/doc/guides/mempool/index.rst b/doc/guides/mempool/index.rst
index bbd02fd81..a0e55467e 100644
--- a/doc/guides/mempool/index.rst
+++ b/doc/guides/mempool/index.rst
@@ -14,3 +14,4 @@ application through the mempool API.
     octeontx
     octeontx2
     ring
+    stack
diff --git a/doc/guides/mempool/stack.rst b/doc/guides/mempool/stack.rst
new file mode 100644
index 000000000..bdf19cf04
--- /dev/null
+++ b/doc/guides/mempool/stack.rst
@@ -0,0 +1,38 @@ 
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Intel Corporation.
+
+Stack Mempool Driver
+====================
+
+**rte_mempool_stack** is a pure software mempool driver based on the
+``rte_stack`` DPDK library. A stack-based mempool is often better suited to
+packet-processing workloads than a ring-based mempool, since its LIFO behavior
+results in better temporal locality and a minimal memory footprint even if the
+mempool is over-provisioned.
+
+The following modes of operation are available for the stack mempool driver and
+can be selected as described in :ref:`Mempool_Handlers`:
+
+- ``stack``
+
+  The underlying **rte_stack** operates in standard (lock-based) mode.
+  For more information please refer to :ref:`Stack_Library_Std_Stack`.
+
+- ``lf_stack``
+
+  The underlying **rte_stack** operates in lock-free mode. For more
+  information please refer to :ref:`Stack_Library_LF_Stack`.
+
+The standard stack outperforms the lock-free stack on average, however the
+standard stack is non-preemptive: if a mempool user is preempted while holding
+the stack lock, that thread will block all other mempool accesses until it
+returns and releases the lock. As a result, an application using the standard
+stack whose threads can be preempted can suffer from brief, infrequent
+performance hiccups.
+
+The lock-free stack, by design, is not susceptible to this problem; one thread can
+be preempted at any point during a push or pop operation and will not impede
+the progress of any other thread.
+
+For a more detailed description of the stack implementations, please refer to
+:doc:`../prog_guide/stack_lib`.
diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index e3e1f940b..6f3c0067f 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -105,6 +105,8 @@ These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`
 The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
 In contrast to the default caches, user-owned caches can be used by unregistered non-EAL threads too.
 
+.. _Mempool_Handlers:
+
 Mempool Handlers
 ------------------------
 
diff --git a/doc/guides/prog_guide/stack_lib.rst b/doc/guides/prog_guide/stack_lib.rst
index 8fe8804e3..3097cab0c 100644
--- a/doc/guides/prog_guide/stack_lib.rst
+++ b/doc/guides/prog_guide/stack_lib.rst
@@ -28,6 +28,8 @@ Implementation
 The library supports two types of stacks: standard (lock-based) and lock-free.
 Both types use the same set of interfaces, but their implementations differ.
 
+.. _Stack_Library_Std_Stack:
+
 Lock-based Stack
 ----------------
 
@@ -35,6 +37,8 @@ The lock-based stack consists of a contiguous array of pointers, a current
 index, and a spinlock. Accesses to the stack are made multi-thread safe by the
 spinlock.
 
+.. _Stack_Library_LF_Stack:
+
 Lock-free Stack
 ------------------