[v6,10/10] doc: update ring guide
Commit Message
Changed the rte_ring chapter in programmer's guide to reflect
the addition of new sync modes and peek style API.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
doc/guides/prog_guide/ring_lib.rst | 95 ++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
Comments
On Mon, Apr 20, 2020 at 2:12 PM Konstantin Ananyev
<konstantin.ananyev@intel.com> wrote:
>
> Changed the rte_ring chapter in programmer's guide to reflect
> the addition of new sync modes and peek style API.
I'd like to split this as follows, see below.
I have a couple of typos too.
If you are fine with it, I'll proceed and squash when merging.
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> doc/guides/prog_guide/ring_lib.rst | 95 ++++++++++++++++++++++++++++++
> 1 file changed, 95 insertions(+)
>
> diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
> index 8cb2b2dd4..668e67ecb 100644
> --- a/doc/guides/prog_guide/ring_lib.rst
> +++ b/doc/guides/prog_guide/ring_lib.rst
> @@ -349,6 +349,101 @@ even if only the first term of subtraction has overflowed:
> uint32_t entries = (prod_tail - cons_head);
> uint32_t free_entries = (mask + cons_tail -prod_head);
>
From here, this first part would go to patch2 "ring: prepare ring to
allow new sync schemes".
> +Producer/consumer synchronization modes
> +---------------------------------------
> +
> +rte_ring supports different synchronization modes for porducer and consumers.
producers*
> +These modes can be specified at ring creation/init time via ``flags`` parameter.
> +That should help user to configure ring in way most suitable for his
double space to remove.
users?
> +specific usage scenarios.
> +Currently supported modes:
> +
> +MP/MC (default one)
> +~~~~~~~~~~~~~~~~~~~
> +
> +Multi-producer (/multi-consumer) mode. This is a default enqueue (/dequeue)
> +mode for the ring. In this mode multiple threads can enqueue (/dequeue)
> +objects to (/from) the ring. For 'classic' DPDK deployments (with one thread
> +per core) this is usually most suitable and fastest synchronization mode.
the most*
> +As a well known limitaion - it can perform quite pure on some overcommitted
limitation*
> +scenarios.
> +
> +SP/SC
> +~~~~~
> +Single-producer (/single-consumer) mode. In this mode only one thread at a time
> +is allowed to enqueue (/dequeue) objects to (/from) the ring.
End of first part.
Then the second part that would go to patch3 "ring: introduce RTS ring mode".
> +
> +MP_RTS/MC_RTS
> +~~~~~~~~~~~~~
> +
> +Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
> +The main difference from original MP/MC algorithm is that
from the original*
> +tail value is increased not by every thread that finished enqueue/dequeue,
> +but only by the last one.
> +That allows threads to avoid spinning on ring tail value,
> +leaving actual tail value change to the last thread at a given instance.
> +That technique helps to avoid Lock-Waiter-Preemtion (LWP) problem on tail
the Lock-Waiter-Preemption*
> +update and improves average enqueue/dequeue times on overcommitted systems.
> +To achieve that RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> +one for head update, second for tail update.
> +In comparison original MP/MC algorithm requires one 32-bit CAS
the original*
> +for head update and waiting/spinning on tail value.
> +
End of second part.
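For readers following the RTS paragraph quoted above, the "only the last thread moves the tail" rule can be sketched in plain C11 atomics. This is an illustration under stated assumptions, not the DPDK code: it assumes head and tail each pack a position and an update counter into one 64-bit word, and the names rts_poscnt, rts_headtail, rts_move_head, rts_update_tail and rts_tail_pos are hypothetical. Capacity checks and the object copy are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: position plus update counter in one 64-bit word. */
union rts_poscnt {
    uint64_t raw;
    struct {
        uint32_t cnt; /* number of updates applied so far */
        uint32_t pos; /* ring position */
    } val;
};

struct rts_headtail {
    _Atomic uint64_t head;
    _Atomic uint64_t tail;
};

/* First 64-bit CAS: claim n slots, bump the head counter.
 * Returns the position where the caller's slots start. */
static uint32_t
rts_move_head(struct rts_headtail *ht, uint32_t n)
{
    union rts_poscnt oh, nh;

    oh.raw = atomic_load_explicit(&ht->head, memory_order_acquire);
    do {
        nh.val.pos = oh.val.pos + n;
        nh.val.cnt = oh.val.cnt + 1;
    } while (!atomic_compare_exchange_weak_explicit(&ht->head, &oh.raw,
            nh.raw, memory_order_acquire, memory_order_acquire));

    return oh.val.pos;
}

/* Second 64-bit CAS: bump the tail counter. Only the thread whose
 * increment makes the counters equal (the last one in flight)
 * publishes the head position as the new tail position. */
static void
rts_update_tail(struct rts_headtail *ht)
{
    union rts_poscnt h, ot, nt;

    ot.raw = atomic_load_explicit(&ht->tail, memory_order_acquire);
    do {
        h.raw = atomic_load_explicit(&ht->head, memory_order_relaxed);
        nt.raw = ot.raw;
        if (++nt.val.cnt == h.val.cnt)
            nt.val.pos = h.val.pos;
    } while (!atomic_compare_exchange_weak_explicit(&ht->tail, &ot.raw,
            nt.raw, memory_order_acq_rel, memory_order_acquire));
}

/* Read the currently published tail position. */
static uint32_t
rts_tail_pos(struct rts_headtail *ht)
{
    union rts_poscnt t;

    t.raw = atomic_load_explicit(&ht->tail, memory_order_acquire);
    return t.val.pos;
}
```

With two operations in flight, the first rts_update_tail() call only increments the counter and returns without spinning; the second one moves the tail position past both reservations.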
Third part that would go to patch 5 "ring: introduce HTS ring mode".
> +MP_HTS/MC_HTS
> +~~~~~~~~~~~~~
> +
> +Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
> +In that mode enqueue/dequeue operation is fully serialized:
> +at any given moment only one enqueue/dequeue operation can proceed.
> +This is achieved by allowing a thread to proceed with changing ``head.value``
> +only when ``head.value == tail.value``.
> +Both head and tail values are updated atomically (as one 64-bit value).
> +To achieve that 64-bit CAS is used by head update routine.
> +That technique also avoids Lock-Waiter-Preemtion (LWP) problem on tail
the Lock-Waiter-Preemption*
> +update and helps to improve ring enqueue/dequeue behavior in overcommitted
> +scenarios. Another advantage of fully serialized producer/consumer -
> +it provides ability to implement MT safe peek API for rte_ring.
it provides the ability*
End of 3rd part.
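The HTS rule quoted above (a thread may change the head only when head equals tail, with both values living in one 64-bit word) can be sketched the same way. This is a simplified model in C11 atomics, not the DPDK implementation: hts_pos, hts_headtail, hts_try_move_head and hts_update_tail are hypothetical names, the try function returns instead of spinning, and the tail is published with one 64-bit store as a simplification for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: head and tail packed into one 64-bit word. */
union hts_pos {
    uint64_t raw;
    struct {
        uint32_t head;
        uint32_t tail;
    } pos;
};

struct hts_headtail {
    _Atomic uint64_t ht;
};

/* Try to claim n slots with a single 64-bit CAS.
 * Returns 1 and stores the old head in *old_head on success.
 * Returns 0 if another operation is still in flight (head != tail);
 * a real implementation would wait here instead of giving up. */
static int
hts_try_move_head(struct hts_headtail *d, uint32_t n, uint32_t *old_head)
{
    union hts_pos op, np;

    op.raw = atomic_load_explicit(&d->ht, memory_order_acquire);
    do {
        if (op.pos.head != op.pos.tail)
            return 0;
        np.raw = op.raw;
        np.pos.head = op.pos.head + n;
    } while (!atomic_compare_exchange_weak_explicit(&d->ht, &op.raw,
            np.raw, memory_order_acquire, memory_order_acquire));

    *old_head = op.pos.head;
    return 1;
}

/* Finish the operation: make head and tail equal again.
 * Only the thread that moved the head can get here, because every
 * other thread is blocked by the head != tail check above. */
static void
hts_update_tail(struct hts_headtail *d, uint32_t old_head, uint32_t n)
{
    union hts_pos np;

    np.pos.head = old_head + n;
    np.pos.tail = old_head + n;
    atomic_store_explicit(&d->ht, np.raw, memory_order_release);
}
```

Because a second caller is refused until the tail catches up with the head, at most one enqueue (or dequeue) is ever in progress, which is the property the peek API relies on.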
Last part would go to patch 7 "ring: introduce peek style API".
> +
> +
> +Ring Peek API
> +-------------
> +
> +For ring with serialized producer/consumer (HTS sync mode) it is possible
double space.
> +to split public enqueue/dequeue API into two phases:
> +
> +* enqueue/dequeue start
> +
> +* enqueue/dequeue finish
> +
> +That allows user to inspect objects in the ring without removing them
> +from it (aka MT safe peek) and reserve space for the objects in the ring
> +before actual enqueue.
> +Note that this API is available only for two sync modes:
> +
> +* Single Producer/Single Consumer (SP/SC)
> +
> +* Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
> +
> +It is a user responsibility to create/init ring with appropriate sync modes
> +selected. As an example of usage:
> +
> +.. code-block:: c
> +
> +    /* read 1 elem from the ring: */
> +    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
> +    if (n != 0) {
> +        /* examine object */
> +        if (object_examine(obj) == KEEP)
> +            /* decided to keep it in the ring. */
> +            rte_ring_dequeue_finish(ring, 0);
> +        else
> +            /* decided to remove it from the ring. */
> +            rte_ring_dequeue_finish(ring, n);
> +    }
> +
> +Note that between ``_start_`` and ``_finish_`` none other thread can proceed
> +with enqueue(/dequeue) operation till ``_finish_`` completes.
> +
>
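As a side note on the start/finish split in the quoted Ring Peek API section, the semantics of "finish with 0 keeps the object, finish with n removes it" can be modelled with a toy single-threaded ring. This is only a sketch to illustrate the two phases; toy_ring, toy_enqueue, toy_dequeue_bulk_start and toy_dequeue_finish are hypothetical names and are not part of rte_ring, and there is no synchronization at all.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* must be a power of two */

struct toy_ring {
    void *slots[TOY_RING_SIZE];
    uint32_t prod_tail; /* next slot to fill */
    uint32_t cons_head; /* advanced by dequeue start */
    uint32_t cons_tail; /* advanced by dequeue finish */
};

/* Plain one-object enqueue; returns 1 on success, 0 if the ring is full. */
static unsigned int
toy_enqueue(struct toy_ring *r, void *obj)
{
    if (r->prod_tail - r->cons_tail == TOY_RING_SIZE)
        return 0;
    r->slots[r->prod_tail & (TOY_RING_SIZE - 1)] = obj;
    r->prod_tail++;
    return 1;
}

/* Phase 1: copy out exactly n objects without releasing their slots.
 * Returns n on success, 0 if fewer than n objects are available or
 * a previous start has not been finished yet. */
static unsigned int
toy_dequeue_bulk_start(struct toy_ring *r, void **objs, unsigned int n)
{
    uint32_t i;

    if (r->cons_head != r->cons_tail)
        return 0;
    if (r->prod_tail - r->cons_head < n)
        return 0;
    for (i = 0; i < n; i++)
        objs[i] = r->slots[(r->cons_head + i) & (TOY_RING_SIZE - 1)];
    r->cons_head += n;
    return n;
}

/* Phase 2: actually remove the first n of the objects handed out by
 * start (n may be 0 to leave everything in the ring). */
static void
toy_dequeue_finish(struct toy_ring *r, unsigned int n)
{
    uint32_t reserved = r->cons_head - r->cons_tail;

    if (n > reserved)
        n = reserved;
    r->cons_tail += n;
    r->cons_head = r->cons_tail;
}
```

Calling start, then finish with 0, then start again hands back the same object, which is the "peek" behaviour; finishing with the count returned by start behaves like an ordinary dequeue.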
> On Mon, Apr 20, 2020 at 2:12 PM Konstantin Ananyev
> <konstantin.ananyev@intel.com> wrote:
> >
> > Changed the rte_ring chapter in programmer's guide to reflect
> > the addition of new sync modes and peek style API.
>
> I'd like to split this as follows, see below.
> I have a couple of typos too.
>
>
> If you are fine with it, I'll proceed and squash when merging.
Yes, I am.
Thanks
Konstantin
> --
> David Marchand
@@ -349,6 +349,101 @@ even if only the first term of subtraction has overflowed:
uint32_t entries = (prod_tail - cons_head);
uint32_t free_entries = (mask + cons_tail -prod_head);
+Producer/consumer synchronization modes
+---------------------------------------
+
+rte_ring supports different synchronization modes for producers and consumers.
+These modes can be specified at ring creation/init time via the ``flags``
+parameter. That should help users configure the ring in the way most suitable
+for their specific usage scenarios.
+Currently supported modes:
+
+MP/MC (default one)
+~~~~~~~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) mode. This is the default enqueue (/dequeue)
+mode for the ring. In this mode multiple threads can enqueue (/dequeue)
+objects to (/from) the ring. For 'classic' DPDK deployments (with one thread
+per core) this is usually the most suitable and fastest synchronization mode.
+As a well-known limitation, it can perform quite poorly in some overcommitted
+scenarios.
+
+SP/SC
+~~~~~
+Single-producer (/single-consumer) mode. In this mode only one thread at a time
+is allowed to enqueue (/dequeue) objects to (/from) the ring.
+
+MP_RTS/MC_RTS
+~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
+The main difference from the original MP/MC algorithm is that
+the tail value is increased not by every thread that finishes enqueue/dequeue,
+but only by the last one.
+That allows threads to avoid spinning on the ring tail value,
+leaving the actual tail value change to the last thread at a given instant.
+That technique helps to avoid the Lock-Waiter-Preemption (LWP) problem on tail
+update and improves average enqueue/dequeue times on overcommitted systems.
+To achieve that, RTS requires two 64-bit CAS for each enqueue (/dequeue)
+operation: one for the head update, a second for the tail update.
+In comparison, the original MP/MC algorithm requires one 32-bit CAS
+for the head update and waiting/spinning on the tail value.
+
+MP_HTS/MC_HTS
+~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
+In that mode the enqueue/dequeue operation is fully serialized:
+at any given moment only one enqueue/dequeue operation can proceed.
+This is achieved by allowing a thread to proceed with changing ``head.value``
+only when ``head.value == tail.value``.
+Both head and tail values are updated atomically (as one 64-bit value).
+To achieve that, a 64-bit CAS is used by the head update routine.
+That technique also avoids the Lock-Waiter-Preemption (LWP) problem on tail
+update and helps to improve ring enqueue/dequeue behavior in overcommitted
+scenarios. Another advantage of a fully serialized producer/consumer is that
+it makes it possible to implement an MT-safe peek API for rte_ring.
+
+
+Ring Peek API
+-------------
+
+For a ring with a serialized producer/consumer (HTS sync mode) it is possible
+to split the public enqueue/dequeue API into two phases:
+
+* enqueue/dequeue start
+
+* enqueue/dequeue finish
+
+That allows the user to inspect objects in the ring without removing them
+from it (aka MT-safe peek) and to reserve space for objects in the ring
+before the actual enqueue.
+Note that this API is available only for two sync modes:
+
+* Single Producer/Single Consumer (SP/SC)
+
+* Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
+
+It is the user's responsibility to create/init the ring with the appropriate
+sync modes selected. As an example of usage:
+
+.. code-block:: c
+
+    /* read 1 elem from the ring: */
+    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+    if (n != 0) {
+        /* examine object */
+        if (object_examine(obj) == KEEP)
+            /* decided to keep it in the ring. */
+            rte_ring_dequeue_finish(ring, 0);
+        else
+            /* decided to remove it from the ring. */
+            rte_ring_dequeue_finish(ring, n);
+    }
+
+Note that between ``_start_`` and ``_finish_`` no other thread can proceed
+with an enqueue (/dequeue) operation until ``_finish_`` completes.
+
References
----------