[v3,1/3] lib/ring: add peek API

Message ID 20191001062917.35578-2-honnappa.nagarahalli@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series Add RCU reclamation APIs

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-compilation success Compile Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/Intel-compilation fail Compilation issues
ci/iol-mellanox-Performance fail Performance Testing issues

Commit Message

Honnappa Nagarahalli Oct. 1, 2019, 6:29 a.m. UTC
  From: Ruifeng Wang <ruifeng.wang@arm.com>

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
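
For illustration only (editor's sketch, not part of the patch): in a single-consumer context, a peek followed by a value-dependent dequeue could look like the fragment below. 'r' is assumed to be an existing single-consumer ring; object_is_ready() and consume() are hypothetical application placeholders.

  void *obj;

  /* Look at the next object without removing it from the ring. */
  if (rte_ring_peek(r, &obj) == 0) {
          /* Dequeue only if the object's value says it is ready. */
          if (object_is_ready(obj)) {
                  rte_ring_sc_dequeue(r, &obj);
                  consume(obj);
          }
  }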
  

Comments

Ananyev, Konstantin Oct. 2, 2019, 6:42 p.m. UTC | #1
> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> Sent: Tuesday, October 1, 2019 7:29 AM
> To: honnappa.nagarahalli@arm.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org;
> paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; ruifeng.wang@arm.com;
> dharmik.thakkar@arm.com; dev@dpdk.org; nd@arm.com
> Subject: [PATCH v3 1/3] lib/ring: add peek API
> 
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> The peek API allows fetching the next available object in the ring
> without dequeuing it. This helps in scenarios where dequeuing of
> objects depend on their value.
> 
> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 2a9f768a1..d3d0d5e18 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  				r->cons.single, available);
>  }
> 
> +/**
> + * Peek one object from a ring.
> + *
> + * The peek API allows fetching the next available object in the ring
> + * without dequeuing it. This API is not multi-thread safe with respect
> + * to other consumer threads.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, object available
> + *   - -ENOENT: Not enough entries in the ring.
> + */
> +__rte_experimental
> +static __rte_always_inline int
> +rte_ring_peek(struct rte_ring *r, void **obj_p)

As it is not MT safe, I think we need _sc_ in the name,
to follow the naming conventions of other rte_ring functions
(rte_ring_sc_peek() or so).

As a better alternative, what do you think about introducing
serialized versions of the DPDK rte_ring dequeue functions?
Something like this:

/* same as original ring dequeue, but:
 * 1) move cons.head only if cons.head == cons.tail
 * 2) don't update cons.tail
 */
unsigned int
rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
                unsigned int *available);

/* sets both cons.head and cons.tail to cons.head + num */
void rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);

/* resets cons.head to cons.tail value */
void rte_ring_serial_dequeue_abort(struct rte_ring *r);

Then your dq_reclaim cycle function would look like this:

const uint32_t nb_elt = dq->elt_size / 8 + 1;
uint32_t avl, n;
uintptr_t elt[nb_elt];
...

do {

  /* read next elem from the queue */
  n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
  if (n == 0)
      break;

  /* wrong period, keep elem in the queue */
  if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
      rte_ring_serial_dequeue_abort(dq->r);
      break;
  }

  /* can reclaim, remove elem from the queue */
  rte_ring_serial_dequeue_finish(dq->r, nb_elt);

  /* call reclaim function */
  dq->f(dq->p, elt);

} while (avl >= nb_elt);

That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe,
as long as the actual reclamation callback itself is MT safe, of course.

> +{
> +	uint32_t prod_tail = r->prod.tail;
> +	uint32_t cons_head = r->cons.head;
> +	uint32_t count = (prod_tail - cons_head) & r->mask;
> +	unsigned int n = 1;
> +	if (count) {
> +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> +		return 0;
> +	}
> +	return -ENOENT;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.17.1
  
Honnappa Nagarahalli Oct. 3, 2019, 7:49 p.m. UTC | #2
> > Subject: [PATCH v3 1/3] lib/ring: add peek API
> >
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > The peek API allows fetching the next available object in the ring
> > without dequeuing it. This helps in scenarios where dequeuing of
> > objects depend on their value.
> >
> > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 2a9f768a1..d3d0d5e18 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> **obj_table,
> >  				r->cons.single, available);
> >  }
> >
> > +/**
> > + * Peek one object from a ring.
> > + *
> > + * The peek API allows fetching the next available object in the ring
> > + * without dequeuing it. This API is not multi-thread safe with
> > +respect
> > + * to other consumer threads.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, object available
> > + *   - -ENOENT: Not enough entries in the ring.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline int
> > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> 
> As it is not MT safe, then I think we need _sc_ in the name, to follow other
> rte_ring functions naming conventions
> (rte_ring_sc_peek() or so).
Agree

> 
> As a better alternative what do you think about introducing a serialized
> versions of DPDK rte_ring dequeue functions?
> Something like that:
> 
> /* same as original ring dequeue, but:
>   * 1) move cons.head only if cons.head == const.tail
>   * 2) don't update cons.tail
>   */
> unsigned int
> rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> int n,
>                 unsigned int *available);
> 
> /* sets both cons.head and cons.tail to cons.head + num */ void
> rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> 
> /* resets cons.head to const.tail value */ void
> rte_ring_serial_dequeue_abort(struct rte_ring *r);
> 
> Then your dq_reclaim cycle function will look like that:
> 
> const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl, n; uintptr_t
> elt[nb_elt]; ...
> 
> do {
> 
>   /* read next elem from the queue */
>   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
>   if (n == 0)
>       break;
> 
>  /* wrong period, keep elem in the queue */  if (rte_rcu_qsbr_check(dr->v,
> elt[0]) != 1) {
>      rte_ring_serial_dequeue_abort(dq->r);
>      break;
>   }
> 
>   /* can reclaim, remove elem from the queue */
>   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> 
>    /*call reclaim function */
>   dr->f(dr->p, elt);
> 
> } while (avl >= nb_elt);
> 
> That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> As long as actual reclamation callback itself is MT safe of course.

I think it is a great idea. The other writers would still poll for the current writer to update the tail or the head, which makes it a blocking solution.

We can make the other threads not poll, i.e. they will quit reclaiming if they see that other writers are dequeuing from the queue. The other way is to use per-thread queues.

The other requirement I see is to support unbounded-size data structures, wherein the data structures do not have a pre-determined number of entries. Also, currently the defer queue size is equal to the total number of entries in a given data structure. There are plans to support a dynamically resizable defer queue. This means memory allocation, which will affect the lock-free-ness of the solution.

So, IMO:
1) The API should provide the capability to support different algorithms - maybe through some flags?
2) The requirements for the ring are pretty unique to the problem we have here (for ex: move cons.head only if cons.tail is also the same, skip polling). So, we should probably implement a ring within the RCU library?

From the timeline perspective, adding all these capabilities would be difficult to get done within the 19.11 timeline. What I have here satisfies my current needs. I suggest that we make provisions in the APIs now to support all these features, but do the implementation in the coming releases. Does this sound ok to you?

> 
> > +{
> > +	uint32_t prod_tail = r->prod.tail;
> > +	uint32_t cons_head = r->cons.head;
> > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > +	unsigned int n = 1;
> > +	if (count) {
> > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > +		return 0;
> > +	}
> > +	return -ENOENT;
> > +}
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > --
> > 2.17.1
  
Ananyev, Konstantin Oct. 7, 2019, 9:01 a.m. UTC | #3
> 
> > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > >
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > >
> > > The peek API allows fetching the next available object in the ring
> > > without dequeuing it. This helps in scenarios where dequeuing of
> > > objects depend on their value.
> > >
> > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > >  1 file changed, 30 insertions(+)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 2a9f768a1..d3d0d5e18 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> > **obj_table,
> > >  				r->cons.single, available);
> > >  }
> > >
> > > +/**
> > > + * Peek one object from a ring.
> > > + *
> > > + * The peek API allows fetching the next available object in the ring
> > > + * without dequeuing it. This API is not multi-thread safe with
> > > +respect
> > > + * to other consumer threads.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_p
> > > + *   A pointer to a void * pointer (object) that will be filled.
> > > + * @return
> > > + *   - 0: Success, object available
> > > + *   - -ENOENT: Not enough entries in the ring.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline int
> > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> >
> > As it is not MT safe, then I think we need _sc_ in the name, to follow other
> > rte_ring functions naming conventions
> > (rte_ring_sc_peek() or so).
> Agree
> 
> >
> > As a better alternative what do you think about introducing a serialized
> > versions of DPDK rte_ring dequeue functions?
> > Something like that:
> >
> > /* same as original ring dequeue, but:
> >   * 1) move cons.head only if cons.head == const.tail
> >   * 2) don't update cons.tail
> >   */
> > unsigned int
> > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> > int n,
> >                 unsigned int *available);
> >
> > /* sets both cons.head and cons.tail to cons.head + num */ void
> > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> >
> > /* resets cons.head to const.tail value */ void
> > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> >
> > Then your dq_reclaim cycle function will look like that:
> >
> > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl, n; uintptr_t
> > elt[nb_elt]; ...
> >
> > do {
> >
> >   /* read next elem from the queue */
> >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> >   if (n == 0)
> >       break;
> >
> >  /* wrong period, keep elem in the queue */  if (rte_rcu_qsbr_check(dr->v,
> > elt[0]) != 1) {
> >      rte_ring_serial_dequeue_abort(dq->r);
> >      break;
> >   }
> >
> >   /* can reclaim, remove elem from the queue */
> >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> >
> >    /*call reclaim function */
> >   dr->f(dr->p, elt);
> >
> > } while (avl >= nb_elt);
> >
> > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > As long as actual reclamation callback itself is MT safe of course.
> 
> I think it is a great idea. The other writers would still be polling for the current writer to update the tail or update the head. This makes it a
> blocking solution.

Yep, it is a blocking one.

> We can make the other threads not poll i.e. they will quit reclaiming if they see that other writers are dequeuing from the queue. 

Actually, I didn't think about that possibility, but yes, it should be possible to have _try_ semantics too.

>The other  way is to use per thread queues.
> 
> The other requirement I see is to support unbounded-size data structures where in the data structures do not have a pre-determined
> number of entries. Also, currently the defer queue size is equal to the total number of entries in a given data structure. There are plans to
> support dynamically resizable defer queue. This means, memory allocation which will affect the lock-free-ness of the solution.
> 
> So, IMO:
> 1) The API should provide the capability to support different algorithms - may be through some flags?
> 2) The requirements for the ring are pretty unique to the problem we have here (for ex: move the cons-head only if cons-tail is also the
> same, skip polling). So, we should probably implement a ring with-in the RCU library?

Personally, I think such a serialization ring API would be useful for other cases too.
There are a few cases where the user needs to read the contents of the queue without removing elements from it.
Let's say we use a similar approach inside TLDK to implement the TCP transmit queue.
If such an API existed in DPDK we could just use it straight away, without maintaining a separate one.
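
For instance (editor's sketch, assuming the serialized dequeue functions proposed earlier in this thread; 'r' is an existing ring and inspect() is a hypothetical placeholder), reading a batch of entries without consuming them could look like this:

  void *objs[8];
  unsigned int avail, n;

  /* Reserve a batch of entries (bulk semantics: all or nothing)
   * without moving cons.tail.
   */
  n = rte_ring_serial_dequeue_bulk(r, objs, RTE_DIM(objs), &avail);
  if (n != 0) {
          inspect(objs, n);                  /* read-only processing */
          rte_ring_serial_dequeue_abort(r);  /* keep the entries in the ring */
  }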

> 
> From the timeline perspective, adding all these capabilities would be difficult to get done with in 19.11 timeline. What I have here satisfies
> my current needs. I suggest that we make provisions in APIs now to support all these features, but do the implementation in the coming
> releases. Does this sound ok for you?

Not sure I understand your suggestion here...
Could you explain it a bit more - how will the new API look, and what would be left for the future?

> 
> >
> > > +{
> > > +	uint32_t prod_tail = r->prod.tail;
> > > +	uint32_t cons_head = r->cons.head;
> > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > +	unsigned int n = 1;
> > > +	if (count) {
> > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > +		return 0;
> > > +	}
> > > +	return -ENOENT;
> > > +}
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > --
> > > 2.17.1
  
Honnappa Nagarahalli Oct. 9, 2019, 4:25 a.m. UTC | #4
<snip>

> 
> >
> > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > >
> > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >
> > > > The peek API allows fetching the next available object in the ring
> > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > objects depend on their value.
> > > >
> > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > ---
> > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 30 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r,
> > > > void
> > > **obj_table,
> > > >  				r->cons.single, available);
> > > >  }
> > > >
> > > > +/**
> > > > + * Peek one object from a ring.
> > > > + *
> > > > + * The peek API allows fetching the next available object in the
> > > > +ring
> > > > + * without dequeuing it. This API is not multi-thread safe with
> > > > +respect
> > > > + * to other consumer threads.
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param obj_p
> > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > + * @return
> > > > + *   - 0: Success, object available
> > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline int
> > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > >
> > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > follow other rte_ring functions naming conventions
> > > (rte_ring_sc_peek() or so).
> > Agree
> >
> > >
> > > As a better alternative what do you think about introducing a
> > > serialized versions of DPDK rte_ring dequeue functions?
> > > Something like that:
> > >
> > > /* same as original ring dequeue, but:
> > >   * 1) move cons.head only if cons.head == const.tail
> > >   * 2) don't update cons.tail
> > >   */
> > > unsigned int
> > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > unsigned int n,
> > >                 unsigned int *available);
> > >
> > > /* sets both cons.head and cons.tail to cons.head + num */ void
> > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > >
> > > /* resets cons.head to const.tail value */ void
> > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > >
> > > Then your dq_reclaim cycle function will look like that:
> > >
> > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl, n;
> > > uintptr_t elt[nb_elt]; ...
> > >
> > > do {
> > >
> > >   /* read next elem from the queue */
> > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > >   if (n == 0)
> > >       break;
> > >
> > >  /* wrong period, keep elem in the queue */  if
> > > (rte_rcu_qsbr_check(dr->v,
> > > elt[0]) != 1) {
> > >      rte_ring_serial_dequeue_abort(dq->r);
> > >      break;
> > >   }
> > >
> > >   /* can reclaim, remove elem from the queue */
> > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > >
> > >    /*call reclaim function */
> > >   dr->f(dr->p, elt);
> > >
> > > } while (avl >= nb_elt);
> > >
> > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > As long as actual reclamation callback itself is MT safe of course.
> >
> > I think it is a great idea. The other writers would still be polling
> > for the current writer to update the tail or update the head. This makes it a
> blocking solution.
> 
> Yep, it is a blocking one.
> 
> > We can make the other threads not poll i.e. they will quit reclaiming if they
> see that other writers are dequeuing from the queue.
> 
> Actually didn't think about that possibility, but yes should be possible to have
> _try_ semantics too.
> 
> >The other  way is to use per thread queues.
> >
> > The other requirement I see is to support unbounded-size data
> > structures where in the data structures do not have a pre-determined
> > number of entries. Also, currently the defer queue size is equal to the total
> number of entries in a given data structure. There are plans to support
> dynamically resizable defer queue. This means, memory allocation which will
> affect the lock-free-ness of the solution.
> >
> > So, IMO:
> > 1) The API should provide the capability to support different algorithms -
> may be through some flags?
> > 2) The requirements for the ring are pretty unique to the problem we
> > have here (for ex: move the cons-head only if cons-tail is also the same, skip
> polling). So, we should probably implement a ring with-in the RCU library?
> 
> Personally, I think such serialization ring API would be useful for other cases
> too.
> There are few cases when user need to read contents of the queue without
> removing elements from it.
> Let say we do use similar approach inside TLDK to implement TCP transmit
> queue.
> If such API would exist in DPDK we can just use it straightway, without
> maintaining a separate one.
ok

> 
> >
> > From the timeline perspective, adding all these capabilities would be
> > difficult to get done with in 19.11 timeline. What I have here
> > satisfies my current needs. I suggest that we make provisions in APIs now to
> support all these features, but do the implementation in the coming releases.
> Does this sound ok for you?
> 
> Not sure I understand your suggestion here...
> Could you explain it a bit more - how new API will look like and what would
> be left for the future.
For this patch, I suggest we do not add any more complexity. If someone wants a lock-free/block-free mechanism, it is available by creating per-thread defer queues.

We push the following to the future:
1) A dynamically size-adjustable defer queue. IMO, with this, lock-free/block-free reclamation will not be available (memory allocation requires locking). The memory for the defer queue will be allocated/freed in chunks of 'size' elements as the queue grows/shrinks.

2) A constant-size defer queue with lock-free and block-free reclamation (single option). The defer queue will be of fixed length 'size'. If the queue gets full, an error is returned. The user could provide a 'size' equal to the number of elements in the data structure to ensure the queue never gets full.

I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide 2 #defines, one for the dynamically variable-size defer queue and the other for the constant-size defer queue.
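
A rough sketch of that idea (editor's illustration only; none of these names exist in the RCU library, and the final definitions were left for a later release):

  /* Hypothetical flag values selecting the defer-queue algorithm. */
  #define RCU_DQ_F_RESIZABLE   (1u << 0) /* queue may grow; needs allocation */
  #define RCU_DQ_F_FIXED_SIZE  (1u << 1) /* constant size, lock/block-free reclaim */

  /* Stand-in for the proposed rte_rcu_qsbr_dq_parameters extension. */
  struct dq_params_sketch {
          uint32_t size;  /* number of defer-queue entries */
          uint32_t flags; /* one of the flags above */
  };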

However, IMO, using per-thread defer queues is a much simpler way to achieve 2. It does not add any significant burden to the user either.

> 
> >
> > >
> > > > +{
> > > > +	uint32_t prod_tail = r->prod.tail;
> > > > +	uint32_t cons_head = r->cons.head;
> > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > +	unsigned int n = 1;
> > > > +	if (count) {
> > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > +		return 0;
> > > > +	}
> > > > +	return -ENOENT;
> > > > +}
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.17.1
  
Ananyev, Konstantin Oct. 10, 2019, 3:09 p.m. UTC | #5
> <snip>
> 
> >
> > >
> > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > >
> > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >
> > > > > The peek API allows fetching the next available object in the ring
> > > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > > objects depend on their value.
> > > > >
> > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > >  1 file changed, 30 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r,
> > > > > void
> > > > **obj_table,
> > > > >  				r->cons.single, available);
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * Peek one object from a ring.
> > > > > + *
> > > > > + * The peek API allows fetching the next available object in the
> > > > > +ring
> > > > > + * without dequeuing it. This API is not multi-thread safe with
> > > > > +respect
> > > > > + * to other consumer threads.
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_p
> > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > + * @return
> > > > > + *   - 0: Success, object available
> > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline int
> > > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > >
> > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > follow other rte_ring functions naming conventions
> > > > (rte_ring_sc_peek() or so).
> > > Agree
> > >
> > > >
> > > > As a better alternative what do you think about introducing a
> > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > Something like that:
> > > >
> > > > /* same as original ring dequeue, but:
> > > >   * 1) move cons.head only if cons.head == const.tail
> > > >   * 2) don't update cons.tail
> > > >   */
> > > > unsigned int
> > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n,
> > > >                 unsigned int *available);
> > > >
> > > > /* sets both cons.head and cons.tail to cons.head + num */ void
> > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > >
> > > > /* resets cons.head to const.tail value */ void
> > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > >
> > > > Then your dq_reclaim cycle function will look like that:
> > > >
> > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl, n;
> > > > uintptr_t elt[nb_elt]; ...
> > > >
> > > > do {
> > > >
> > > >   /* read next elem from the queue */
> > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > >   if (n == 0)
> > > >       break;
> > > >
> > > >  /* wrong period, keep elem in the queue */  if
> > > > (rte_rcu_qsbr_check(dr->v,
> > > > elt[0]) != 1) {
> > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > >      break;
> > > >   }
> > > >
> > > >   /* can reclaim, remove elem from the queue */
> > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > >
> > > >    /*call reclaim function */
> > > >   dr->f(dr->p, elt);
> > > >
> > > > } while (avl >= nb_elt);
> > > >
> > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > As long as actual reclamation callback itself is MT safe of course.
> > >
> > > I think it is a great idea. The other writers would still be polling
> > > for the current writer to update the tail or update the head. This makes it a
> > blocking solution.
> >
> > Yep, it is a blocking one.
> >
> > > We can make the other threads not poll i.e. they will quit reclaiming if they
> > see that other writers are dequeuing from the queue.
> >
> > Actually didn't think about that possibility, but yes should be possible to have
> > _try_ semantics too.
> >
> > >The other  way is to use per thread queues.
> > >
> > > The other requirement I see is to support unbounded-size data
> > > structures where in the data structures do not have a pre-determined
> > > number of entries. Also, currently the defer queue size is equal to the total
> > number of entries in a given data structure. There are plans to support
> > dynamically resizable defer queue. This means, memory allocation which will
> > affect the lock-free-ness of the solution.
> > >
> > > So, IMO:
> > > 1) The API should provide the capability to support different algorithms -
> > may be through some flags?
> > > 2) The requirements for the ring are pretty unique to the problem we
> > > have here (for ex: move the cons-head only if cons-tail is also the same, skip
> > polling). So, we should probably implement a ring with-in the RCU library?
> >
> > Personally, I think such serialization ring API would be useful for other cases
> > too.
> > There are few cases when user need to read contents of the queue without
> > removing elements from it.
> > Let say we do use similar approach inside TLDK to implement TCP transmit
> > queue.
> > If such API would exist in DPDK we can just use it straightway, without
> > maintaining a separate one.
> ok
> 
> >
> > >
> > > From the timeline perspective, adding all these capabilities would be
> > > difficult to get done with in 19.11 timeline. What I have here
> > > satisfies my current needs. I suggest that we make provisions in APIs now to
> > support all these features, but do the implementation in the coming releases.
> > Does this sound ok for you?
> >
> > Not sure I understand your suggestion here...
> > Could you explain it a bit more - how new API will look like and what would
> > be left for the future.
> For this patch, I suggest we do not add any more complexity. If someone wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
> 
> We push the following to the future:
> 1) Dynamically size adjustable defer queue. IMO, with this, the lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed in chunks of 'size' elements as the queue grows/shrinks.

That one is fine by me.
In fact, I don't know whether there would be a real use-case for a dynamic defer queue for an rcu var...
But I suppose that's a subject for another discussion.

> 
> 2) Constant size defer queue with lock-free and block-free reclamation (single option). The defer queue will be of fixed length 'size'. If the
> queue gets full an error is returned. The user could provide a 'size' equal to the number of elements in a data structure to ensure queue
> never gets full.

Ok, so for 19.11 which enqueue/dequeue model do you plan to support?
- MP/MC
- MP/SC
- SP/SC
- non MT at all (only same single thread can do enqueue and dequeue)

And a related question:
Which additional rte_ring API do you plan to introduce in that case?
- None
- rte_ring_sc_peek()
- rte_ring_serial_dequeue()

> 
> I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide 2 #defines, one for dynamically variable size defer queue and the
> other for constant size defer queue.
> 
> However, IMO, using per thread defer queue is a much simpler way to achieve 2. It does not add any significant burden to the user either.
> 
> >
> > >
> > > >
> > > > > +{
> > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > +	uint32_t cons_head = r->cons.head;
> > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > +	unsigned int n = 1;
> > > > > +	if (count) {
> > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > +		return 0;
> > > > > +	}
> > > > > +	return -ENOENT;
> > > > > +}
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > --
> > > > > 2.17.1
  
Honnappa Nagarahalli Oct. 11, 2019, 5:03 a.m. UTC | #6
> 
> > <snip>
> >
> > >
> > > >
> > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > >
> > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > >
> > > > > > The peek API allows fetching the next available object in the
> > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > dequeuing of objects depend on their value.
> > > > > >
> > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > ++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 30 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring
> > > > > > *r, void
> > > > > **obj_table,
> > > > > >  				r->cons.single, available);  }
> > > > > >
> > > > > > +/**
> > > > > > + * Peek one object from a ring.
> > > > > > + *
> > > > > > + * The peek API allows fetching the next available object in
> > > > > > +the ring
> > > > > > + * without dequeuing it. This API is not multi-thread safe
> > > > > > +with respect
> > > > > > + * to other consumer threads.
> > > > > > + *
> > > > > > + * @param r
> > > > > > + *   A pointer to the ring structure.
> > > > > > + * @param obj_p
> > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > + * @return
> > > > > > + *   - 0: Success, object available
> > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring
> > > > > > +*r, void **obj_p)
> > > > >
> > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > follow other rte_ring functions naming conventions
> > > > > (rte_ring_sc_peek() or so).
> > > > Agree
> > > >
> > > > >
> > > > > As a better alternative what do you think about introducing a
> > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > Something like that:
> > > > >
> > > > > /* same as original ring dequeue, but:
> > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > >   * 2) don't update cons.tail
> > > > >   */
> > > > > unsigned int
> > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > **obj_table, unsigned int n,
> > > > >                 unsigned int *available);
> > > > >
> > > > > /* sets both cons.head and cons.tail to cons.head + num */ void
> > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t
> > > > > num);
> > > > >
> > > > > /* resets cons.head to const.tail value */ void
> > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > >
> > > > > Then your dq_reclaim cycle function will look like that:
> > > > >
> > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl, n;
> > > > > uintptr_t elt[nb_elt]; ...
> > > > >
> > > > > do {
> > > > >
> > > > >   /* read next elem from the queue */
> > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > >   if (n == 0)
> > > > >       break;
> > > > >
> > > > >  /* wrong period, keep elem in the queue */  if
> > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > elt[0]) != 1) {
> > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > >      break;
> > > > >   }
> > > > >
> > > > >   /* can reclaim, remove elem from the queue */
> > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > >
> > > > >    /*call reclaim function */
> > > > >   dr->f(dr->p, elt);
> > > > >
> > > > > } while (avl >= nb_elt);
> > > > >
> > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > As long as actual reclamation callback itself is MT safe of course.
> > > >
> > > > I think it is a great idea. The other writers would still be
> > > > polling for the current writer to update the tail or update the
> > > > head. This makes it a
> > > blocking solution.
> > >
> > > Yep, it is a blocking one.
> > >
> > > > We can make the other threads not poll i.e. they will quit
> > > > reclaiming if they
> > > see that other writers are dequeuing from the queue.
> > >
> > > Actually didn't think about that possibility, but yes should be
> > > possible to have _try_ semantics too.
> > >
> > > >The other  way is to use per thread queues.
> > > >
> > > > The other requirement I see is to support unbounded-size data
> > > > structures where in the data structures do not have a
> > > > pre-determined number of entries. Also, currently the defer queue
> > > > size is equal to the total
> > > number of entries in a given data structure. There are plans to
> > > support dynamically resizable defer queue. This means, memory
> > > allocation which will affect the lock-free-ness of the solution.
> > > >
> > > > So, IMO:
> > > > 1) The API should provide the capability to support different
> > > > algorithms -
> > > may be through some flags?
> > > > 2) The requirements for the ring are pretty unique to the problem
> > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > the same, skip
> > > polling). So, we should probably implement a ring with-in the RCU library?
> > >
> > > Personally, I think such serialization ring API would be useful for
> > > other cases too.
> > > There are few cases when user need to read contents of the queue
> > > without removing elements from it.
> > > Let say we do use similar approach inside TLDK to implement TCP
> > > transmit queue.
> > > If such API would exist in DPDK we can just use it straightway,
> > > without maintaining a separate one.
> > ok
> >
> > >
> > > >
> > > > From the timeline perspective, adding all these capabilities would
> > > > be difficult to get done with in 19.11 timeline. What I have here
> > > > satisfies my current needs. I suggest that we make provisions in
> > > > APIs now to
> > > support all these features, but do the implementation in the coming
> releases.
> > > Does this sound ok for you?
> > >
> > > Not sure I understand your suggestion here...
> > > Could you explain it a bit more - how new API will look like and
> > > what would be left for the future.
> > For this patch, I suggest we do not add any more complexity. If
> > someone wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
> >
> > We push the following to the future:
> > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed in
> chunks of 'size' elements as the queue grows/shrinks.
> 
> That one is fine by me.
> In fact I don't know would be there a real use-case for dynamic defer queue
> for rcu var...
> But I suppose that's subject for another discussion.
Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary, as the reclamation is done regularly.
If a smaller queue size is used, the queue might get full (even after reclamation), in which case the queue size should be increased.

> 
> >
> > 2) Constant size defer queue with lock-free and block-free reclamation
> > (single option). The defer queue will be of fixed length 'size'. If
> > the queue gets full an error is returned. The user could provide a 'size' equal
> to the number of elements in a data structure to ensure queue never gets full.
> 
> Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> - MP/MC
> - MP/SC
> - SP/SC
Just SP/SC

> - non MT at all (only same single thread can do enqueue and dequeue)
If MT safety is required, one should use 1 defer queue per thread for now.

> 
> And related question:
> What additional rte_ring API you plan to introduce in that case?
> - None
> - rte_ring_sc_peek()
rte_ring_peek will be changed to rte_ring_sc_peek

> - rte_ring_serial_dequeue()
> 
> >
> > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > 2 #defines, one for dynamically variable size defer queue and the other for
> constant size defer queue.
> >
> > However, IMO, using per thread defer queue is a much simpler way to
> achieve 2. It does not add any significant burden to the user either.
> >
> > >
> > > >
> > > > >
> > > > > > +{
> > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > +	unsigned int n = 1;
> > > > > > +	if (count) {
> > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +	return -ENOENT;
> > > > > > +}
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > --
> > > > > > 2.17.1
  
Ananyev, Konstantin Oct. 11, 2019, 2:41 p.m. UTC | #7
> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Friday, October 11, 2019 6:04 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; Ruifeng Wang (Arm Technology
> China) <Ruifeng.Wang@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; dev@dpdk.org; nd <nd@arm.com>; nd
> <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 1/3] lib/ring: add peek API
> 
> >
> > > <snip>
> > >
> > > >
> > > > >
> > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > >
> > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > >
> > > > > > > The peek API allows fetching the next available object in the
> > > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > > dequeuing of objects depend on their value.
> > > > > > >
> > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > ---
> > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > ++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 30 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring
> > > > > > > *r, void
> > > > > > **obj_table,
> > > > > > >  				r->cons.single, available);  }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Peek one object from a ring.
> > > > > > > + *
> > > > > > > + * The peek API allows fetching the next available object in
> > > > > > > +the ring
> > > > > > > + * without dequeuing it. This API is not multi-thread safe
> > > > > > > +with respect
> > > > > > > + * to other consumer threads.
> > > > > > > + *
> > > > > > > + * @param r
> > > > > > > + *   A pointer to the ring structure.
> > > > > > > + * @param obj_p
> > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > + * @return
> > > > > > > + *   - 0: Success, object available
> > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring
> > > > > > > +*r, void **obj_p)
> > > > > >
> > > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > > follow other rte_ring functions naming conventions
> > > > > > (rte_ring_sc_peek() or so).
> > > > > Agree
> > > > >
> > > > > >
> > > > > > As a better alternative what do you think about introducing a
> > > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > > Something like that:
> > > > > >
> > > > > > /* same as original ring dequeue, but:
> > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > >   * 2) don't update cons.tail
> > > > > >   */
> > > > > > unsigned int
> > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > **obj_table, unsigned int n,
> > > > > >                 unsigned int *available);
> > > > > >
> > > > > > /* sets both cons.head and cons.tail to cons.head + num */ void
> > > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t
> > > > > > num);
> > > > > >
> > > > > > /* resets cons.head to const.tail value */ void
> > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > >
> > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > >
> > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl, n;
> > > > > > uintptr_t elt[nb_elt]; ...
> > > > > >
> > > > > > do {
> > > > > >
> > > > > >   /* read next elem from the queue */
> > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > >   if (n == 0)
> > > > > >       break;
> > > > > >
> > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > elt[0]) != 1) {
> > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > >      break;
> > > > > >   }
> > > > > >
> > > > > >   /* can reclaim, remove elem from the queue */
> > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > >
> > > > > >    /*call reclaim function */
> > > > > >   dr->f(dr->p, elt);
> > > > > >
> > > > > > } while (avl >= nb_elt);
> > > > > >
> > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > >
> > > > > I think it is a great idea. The other writers would still be
> > > > > polling for the current writer to update the tail or update the
> > > > > head. This makes it a
> > > > blocking solution.
> > > >
> > > > Yep, it is a blocking one.
> > > >
> > > > > We can make the other threads not poll i.e. they will quit
> > > > > reclaiming if they
> > > > see that other writers are dequeuing from the queue.
> > > >
> > > > Actually didn't think about that possibility, but yes should be
> > > > possible to have _try_ semantics too.
> > > >
> > > > >The other  way is to use per thread queues.
> > > > >
> > > > > The other requirement I see is to support unbounded-size data
> > > > > structures where in the data structures do not have a
> > > > > pre-determined number of entries. Also, currently the defer queue
> > > > > size is equal to the total
> > > > number of entries in a given data structure. There are plans to
> > > > support dynamically resizable defer queue. This means, memory
> > > > allocation which will affect the lock-free-ness of the solution.
> > > > >
> > > > > So, IMO:
> > > > > 1) The API should provide the capability to support different
> > > > > algorithms -
> > > > may be through some flags?
> > > > > 2) The requirements for the ring are pretty unique to the problem
> > > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > > the same, skip
> > > > polling). So, we should probably implement a ring with-in the RCU library?
> > > >
> > > > Personally, I think such serialization ring API would be useful for
> > > > other cases too.
> > > > There are few cases when user need to read contents of the queue
> > > > without removing elements from it.
> > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > transmit queue.
> > > > If such API would exist in DPDK we can just use it straightway,
> > > > without maintaining a separate one.
> > > ok
> > >
> > > >
> > > > >
> > > > > From the timeline perspective, adding all these capabilities would
> > > > > be difficult to get done with in 19.11 timeline. What I have here
> > > > > satisfies my current needs. I suggest that we make provisions in
> > > > > APIs now to
> > > > support all these features, but do the implementation in the coming
> > releases.
> > > > Does this sound ok for you?
> > > >
> > > > Not sure I understand your suggestion here...
> > > > Could you explain it a bit more - how new API will look like and
> > > > what would be left for the future.
> > > For this patch, I suggest we do not add any more complexity. If
> > > someone wants a lock-free/block-free mechanism, it is available by creating
> > per thread defer queues.
> > >
> > > We push the following to the future:
> > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > lock-free/block-free reclamation will not be available (memory allocation
> > requires locking). The memory for the defer queue will be allocated/freed in
> > chunks of 'size' elements as the queue grows/shrinks.
> >
> > That one is fine by me.
> > In fact I don't know would be there a real use-case for dynamic defer queue
> > for rcu var...
> > But I suppose that's subject for another discussion.
> Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary as the reclamation is done
> regularly.
> If a smaller queue size is used, the queue might get full (even after reclamation), in which case, the queue size should be increased.

I understand the intention.
Though I am not very happy with an approach where, to free one resource, we first have to allocate another one.
It sounds like a source of deadlocks and, for that case, probably an unnecessary complication.
But again, as it is not for 19.11, we don't have to discuss it now.
 
> >
> > >
> > > 2) Constant size defer queue with lock-free and block-free reclamation
> > > (single option). The defer queue will be of fixed length 'size'. If
> > > the queue gets full an error is returned. The user could provide a 'size' equal
> > to the number of elements in a data structure to ensure queue never gets full.
> >
> > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > - MP/MC
> > - MP/SC
> > - SP/SC
> Just SP/SC

Ok, just to confirm we are on the same page:
there would be a possibility for one thread to do dq_enqueue() while a second one does dq_reclaim() simultaneously
(of course, only if the actual reclamation function is thread safe)?
 
> > - non MT at all (only same single thread can do enqueue and dequeue)
> If MT safe is required, one should use 1 defer queue per thread for now.
> 
> >
> > And related question:
> > What additional rte_ring API you plan to introduce in that case?
> > - None
> > - rte_ring_sc_peek()
> rte_ring_peek will be changed to rte_ring_sc_peek
> 
> > - rte_ring_serial_dequeue()
> >
> > >
> > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > > 2 #defines, one for dynamically variable size defer queue and the other for
> > constant size defer queue.
> > >
> > > However, IMO, using per thread defer queue is a much simpler way to
> > achieve 2. It does not add any significant burden to the user either.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > +	unsigned int n = 1;
> > > > > > > +	if (count) {
> > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > +		return 0;
> > > > > > > +	}
> > > > > > > +	return -ENOENT;
> > > > > > > +}
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.17.1
  
Honnappa Nagarahalli Oct. 11, 2019, 6:28 p.m. UTC | #8
<snip>

> > > >
> > > > >
> > > > > >
> > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > >
> > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > >
> > > > > > > > The peek API allows fetching the next available object in
> > > > > > > > the ring without dequeuing it. This helps in scenarios
> > > > > > > > where dequeuing of objects depend on their value.
> > > > > > > >
> > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > ---
> > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18
> > > > > > > > 100644
> > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > rte_ring *r, void
> > > > > > > **obj_table,
> > > > > > > >  				r->cons.single, available);  }
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * Peek one object from a ring.
> > > > > > > > + *
> > > > > > > > + * The peek API allows fetching the next available object
> > > > > > > > +in the ring
> > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > +safe with respect
> > > > > > > > + * to other consumer threads.
> > > > > > > > + *
> > > > > > > > + * @param r
> > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > + * @param obj_p
> > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > + * @return
> > > > > > > > + *   - 0: Success, object available
> > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > >
> > > > > > > As it is not MT safe, then I think we need _sc_ in the name,
> > > > > > > to follow other rte_ring functions naming conventions
> > > > > > > (rte_ring_sc_peek() or so).
> > > > > > Agree
> > > > > >
> > > > > > >
> > > > > > > As a better alternative what do you think about introducing
> > > > > > > a serialized versions of DPDK rte_ring dequeue functions?
> > > > > > > Something like that:
> > > > > > >
> > > > > > > /* same as original ring dequeue, but:
> > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > >   * 2) don't update cons.tail
> > > > > > >   */
> > > > > > > unsigned int
> > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > **obj_table, unsigned int n,
> > > > > > >                 unsigned int *available);
> > > > > > >
> > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r,
> > > > > > > uint32_t num);
> > > > > > >
> > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > >
> > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > >
> > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl,
> > > > > > > n; uintptr_t elt[nb_elt]; ...
> > > > > > >
> > > > > > > do {
> > > > > > >
> > > > > > >   /* read next elem from the queue */
> > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > >   if (n == 0)
> > > > > > >       break;
> > > > > > >
> > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > elt[0]) != 1) {
> > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > >      break;
> > > > > > >   }
> > > > > > >
> > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > >
> > > > > > >    /*call reclaim function */
> > > > > > >   dr->f(dr->p, elt);
> > > > > > >
> > > > > > > } while (avl >= nb_elt);
> > > > > > >
> > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > > >
> > > > > > I think it is a great idea. The other writers would still be
> > > > > > polling for the current writer to update the tail or update
> > > > > > the head. This makes it a
> > > > > blocking solution.
> > > > >
> > > > > Yep, it is a blocking one.
> > > > >
> > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > reclaiming if they
> > > > > see that other writers are dequeuing from the queue.
> > > > >
> > > > > Actually didn't think about that possibility, but yes should be
> > > > > possible to have _try_ semantics too.
> > > > >
> > > > > >The other  way is to use per thread queues.
> > > > > >
> > > > > > The other requirement I see is to support unbounded-size data
> > > > > > structures where in the data structures do not have a
> > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > queue size is equal to the total
> > > > > number of entries in a given data structure. There are plans to
> > > > > support dynamically resizable defer queue. This means, memory
> > > > > allocation which will affect the lock-free-ness of the solution.
> > > > > >
> > > > > > So, IMO:
> > > > > > 1) The API should provide the capability to support different
> > > > > > algorithms -
> > > > > may be through some flags?
> > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > cons-tail is also the same, skip
> > > > > polling). So, we should probably implement a ring with-in the RCU
> library?
> > > > >
> > > > > Personally, I think such serialization ring API would be useful
> > > > > for other cases too.
> > > > > There are few cases when user need to read contents of the queue
> > > > > without removing elements from it.
> > > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > > transmit queue.
> > > > > If such API would exist in DPDK we can just use it straightway,
> > > > > without maintaining a separate one.
> > > > ok
> > > >
> > > > >
> > > > > >
> > > > > > From the timeline perspective, adding all these capabilities
> > > > > > would be difficult to get done with in 19.11 timeline. What I
> > > > > > have here satisfies my current needs. I suggest that we make
> > > > > > provisions in APIs now to
> > > > > support all these features, but do the implementation in the
> > > > > coming
> > > releases.
> > > > > Does this sound ok for you?
> > > > >
> > > > > Not sure I understand your suggestion here...
> > > > > Could you explain it a bit more - how new API will look like and
> > > > > what would be left for the future.
> > > > For this patch, I suggest we do not add any more complexity. If
> > > > someone wants a lock-free/block-free mechanism, it is available by
> > > > creating
> > > per thread defer queues.
> > > >
> > > > We push the following to the future:
> > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > lock-free/block-free reclamation will not be available (memory
> > > > allocation
> > > requires locking). The memory for the defer queue will be
> > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> > >
> > > That one is fine by me.
> > > In fact I don't know would be there a real use-case for dynamic
> > > defer queue for rcu var...
> > > But I suppose that's subject for another discussion.
> > Currently, the defer queue size is equal to the number of resources in
> > the data structure. This is unnecessary as the reclamation is done regularly.
> > If a smaller queue size is used, the queue might get full (even after
> reclamation), in which case, the queue size should be increased.
> 
> I understand the intention.
> Though I am not very happy with approach where to free one resource we first
> have to allocate another one.
> Sounds like a source of deadlocks and for that case probably unnecessary
> complication.
It depends on the use case. For some use cases lock-free reader-writer concurrency is enough (in which case there is no need to have a queue large enough to hold all the resources) and some would require lock-free reader-writer and writer-writer concurrency (where, theoretically, a queue large enough to hold all the resources would be required).

> But again, as it is not for 19.11 we don't have to discuss it now.
> 
> > >
> > > >
> > > > 2) Constant size defer queue with lock-free and block-free
> > > > reclamation (single option). The defer queue will be of fixed
> > > > length 'size'. If the queue gets full an error is returned. The
> > > > user could provide a 'size' equal
> > > to the number of elements in a data structure to ensure queue never gets
> full.
> > >
> > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > - MP/MC
> > > - MP/SC
> > > - SP/SC
> > Just SP/SC
> 
> Ok, just to confirm we are on the same page:
> there would be a possibility for one thread do dq_enqueue(), second one do
> dq_reclaim() simultaneously (of course if actual reclamation function is thread
> safe)?
Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.

> 
> > > - non MT at all (only same single thread can do enqueue and dequeue)
> > If MT safe is required, one should use 1 defer queue per thread for now.
> >
> > >
> > > And related question:
> > > What additional rte_ring API you plan to introduce in that case?
> > > - None
> > > - rte_ring_sc_peek()
> > rte_ring_peek will be changed to rte_ring_sc_peek
> >
> > > - rte_ring_serial_dequeue()
> > >
> > > >
> > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > provide
> > > > 2 #defines, one for dynamically variable size defer queue and the
> > > > other for
> > > constant size defer queue.
> > > >
> > > > However, IMO, using per thread defer queue is a much simpler way
> > > > to
> > > achieve 2. It does not add any significant burden to the user either.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > +{
> > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > +	unsigned int n = 1;
> > > > > > > > +	if (count) {
> > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > +		return 0;
> > > > > > > > +	}
> > > > > > > > +	return -ENOENT;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  #ifdef __cplusplus
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.17.1
  
Ananyev, Konstantin Oct. 13, 2019, 8:09 p.m. UTC | #9
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > >
> > > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > >
> > > > > > > > > The peek API allows fetching the next available object in
> > > > > > > > > the ring without dequeuing it. This helps in scenarios
> > > > > > > > > where dequeuing of objects depend on their value.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > > ---
> > > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18
> > > > > > > > > 100644
> > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > > rte_ring *r, void
> > > > > > > > **obj_table,
> > > > > > > > >  				r->cons.single, available);  }
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > + *
> > > > > > > > > + * The peek API allows fetching the next available object
> > > > > > > > > +in the ring
> > > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > > +safe with respect
> > > > > > > > > + * to other consumer threads.
> > > > > > > > > + *
> > > > > > > > > + * @param r
> > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > + * @param obj_p
> > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > + * @return
> > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > + */
> > > > > > > > > +__rte_experimental
> > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > > >
> > > > > > > > As it is not MT safe, then I think we need _sc_ in the name,
> > > > > > > > to follow other rte_ring functions naming conventions
> > > > > > > > (rte_ring_sc_peek() or so).
> > > > > > > Agree
> > > > > > >
> > > > > > > >
> > > > > > > > As a better alternative what do you think about introducing
> > > > > > > > a serialized versions of DPDK rte_ring dequeue functions?
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > /* same as original ring dequeue, but:
> > > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > > >   * 2) don't update cons.tail
> > > > > > > >   */
> > > > > > > > unsigned int
> > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > > **obj_table, unsigned int n,
> > > > > > > >                 unsigned int *available);
> > > > > > > >
> > > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r,
> > > > > > > > uint32_t num);
> > > > > > > >
> > > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > >
> > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > >
> > > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl,
> > > > > > > > n; uintptr_t elt[nb_elt]; ...
> > > > > > > >
> > > > > > > > do {
> > > > > > > >
> > > > > > > >   /* read next elem from the queue */
> > > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > >   if (n == 0)
> > > > > > > >       break;
> > > > > > > >
> > > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > > elt[0]) != 1) {
> > > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > >      break;
> > > > > > > >   }
> > > > > > > >
> > > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > > >
> > > > > > > >    /*call reclaim function */
> > > > > > > >   dr->f(dr->p, elt);
> > > > > > > >
> > > > > > > > } while (avl >= nb_elt);
> > > > > > > >
> > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > > > >
> > > > > > > I think it is a great idea. The other writers would still be
> > > > > > > polling for the current writer to update the tail or update
> > > > > > > the head. This makes it a
> > > > > > blocking solution.
> > > > > >
> > > > > > Yep, it is a blocking one.
> > > > > >
> > > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > > reclaiming if they
> > > > > > see that other writers are dequeuing from the queue.
> > > > > >
> > > > > > Actually didn't think about that possibility, but yes should be
> > > > > > possible to have _try_ semantics too.
> > > > > >
> > > > > > >The other  way is to use per thread queues.
> > > > > > >
> > > > > > > The other requirement I see is to support unbounded-size data
> > > > > > > structures where in the data structures do not have a
> > > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > > queue size is equal to the total
> > > > > > number of entries in a given data structure. There are plans to
> > > > > > support dynamically resizable defer queue. This means, memory
> > > > > > allocation which will affect the lock-free-ness of the solution.
> > > > > > >
> > > > > > > So, IMO:
> > > > > > > 1) The API should provide the capability to support different
> > > > > > > algorithms -
> > > > > > may be through some flags?
> > > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > > cons-tail is also the same, skip
> > > > > > polling). So, we should probably implement a ring with-in the RCU
> > library?
> > > > > >
> > > > > > Personally, I think such serialization ring API would be useful
> > > > > > for other cases too.
> > > > > > There are few cases when user need to read contents of the queue
> > > > > > without removing elements from it.
> > > > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > > > transmit queue.
> > > > > > If such API would exist in DPDK we can just use it straightway,
> > > > > > without maintaining a separate one.
> > > > > ok
> > > > >
> > > > > >
> > > > > > >
> > > > > > > From the timeline perspective, adding all these capabilities
> > > > > > > would be difficult to get done with in 19.11 timeline. What I
> > > > > > > have here satisfies my current needs. I suggest that we make
> > > > > > > provisions in APIs now to
> > > > > > support all these features, but do the implementation in the
> > > > > > coming
> > > > releases.
> > > > > > Does this sound ok for you?
> > > > > >
> > > > > > Not sure I understand your suggestion here...
> > > > > > Could you explain it a bit more - how new API will look like and
> > > > > > what would be left for the future.
> > > > > For this patch, I suggest we do not add any more complexity. If
> > > > > someone wants a lock-free/block-free mechanism, it is available by
> > > > > creating
> > > > per thread defer queues.
> > > > >
> > > > > We push the following to the future:
> > > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > > lock-free/block-free reclamation will not be available (memory
> > > > > allocation
> > > > requires locking). The memory for the defer queue will be
> > > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> > > >
> > > > That one is fine by me.
> > > > In fact I don't know would be there a real use-case for dynamic
> > > > defer queue for rcu var...
> > > > But I suppose that's subject for another discussion.
> > > Currently, the defer queue size is equal to the number of resources in
> > > the data structure. This is unnecessary as the reclamation is done regularly.
> > > If a smaller queue size is used, the queue might get full (even after
> > reclamation), in which case, the queue size should be increased.
> >
> > I understand the intention.
> > Though I am not very happy with approach where to free one resource we first
> > have to allocate another one.
> > Sounds like a source of deadlocks and for that case probably unnecessary
> > complication.
> It depends on the use case. For some use cases lock-free reader-writer concurrency is enough (in which case there is no need to have a
> queue large enough to hold all the resources) and some would require lock-free reader-writer and writer-writer concurrency (where,
> theoretically, a queue large enough to hold all the resources would be required).
> 
> > But again, as it is not for 19.11 we don't have to discuss it now.
> >
> > > >
> > > > >
> > > > > 2) Constant size defer queue with lock-free and block-free
> > > > > reclamation (single option). The defer queue will be of fixed
> > > > > length 'size'. If the queue gets full an error is returned. The
> > > > > user could provide a 'size' equal
> > > > to the number of elements in a data structure to ensure queue never gets
> > full.
> > > >
> > > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > - MP/MC
> > > > - MP/SC
> > > > - SP/SC
> > > Just SP/SC
> >
> > Ok, just to confirm we are on the same page:
> > there would be a possibility for one thread do dq_enqueue(), second one do
> > dq_reclaim() simultaneously (of course if actual reclamation function is thread
> > safe)?
> Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.

Ok, and that is probably due to the nature of ring_sc_peek(), right?
But the user can set the reclaim threshold higher than the number of elems in the defer queue,
and that should help to prevent dq_reclaim() from inside dq_enqueue(), correct?
If so, I have no objections in general to the proposed plan.
Konstantin

> 
> >
> > > > - non MT at all (only same single thread can do enqueue and dequeue)
> > > If MT safe is required, one should use 1 defer queue per thread for now.
> > >
> > > >
> > > > And related question:
> > > > What additional rte_ring API you plan to introduce in that case?
> > > > - None
> > > > - rte_ring_sc_peek()
> > > rte_ring_peek will be changed to rte_ring_sc_peek
> > >
> > > > - rte_ring_serial_dequeue()
> > > >
> > > > >
> > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > > provide
> > > > > 2 #defines, one for dynamically variable size defer queue and the
> > > > > other for
> > > > constant size defer queue.
> > > > >
> > > > > However, IMO, using per thread defer queue is a much simpler way
> > > > > to
> > > > achieve 2. It does not add any significant burden to the user either.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > +{
> > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > +	if (count) {
> > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > > +		return 0;
> > > > > > > > > +	}
> > > > > > > > > +	return -ENOENT;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  #ifdef __cplusplus
> > > > > > > > >  }
> > > > > > > > >  #endif
> > > > > > > > > --
> > > > > > > > > 2.17.1
  
Honnappa Nagarahalli Oct. 14, 2019, 4:11 a.m. UTC | #10
<snip>

> > > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > > >
> > > > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > >
> > > > > > > > > > The peek API allows fetching the next available object
> > > > > > > > > > in the ring without dequeuing it. This helps in
> > > > > > > > > > scenarios where dequeuing of objects depend on their value.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Dharmik Thakkar
> > > > > > > > > > <dharmik.thakkar@arm.com>
> > > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > > > ---
> > > > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > > b/lib/librte_ring/rte_ring.h index
> > > > > > > > > > 2a9f768a1..d3d0d5e18
> > > > > > > > > > 100644
> > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > > > rte_ring *r, void
> > > > > > > > > **obj_table,
> > > > > > > > > >  				r->cons.single, available);  }
> > > > > > > > > >
> > > > > > > > > > +/**
> > > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > > + *
> > > > > > > > > > + * The peek API allows fetching the next available
> > > > > > > > > > +object in the ring
> > > > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > > > +safe with respect
> > > > > > > > > > + * to other consumer threads.
> > > > > > > > > > + *
> > > > > > > > > > + * @param r
> > > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > > + * @param obj_p
> > > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > > + * @return
> > > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > > + */
> > > > > > > > > > +__rte_experimental
> > > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > > > >
> > > > > > > > > As it is not MT safe, then I think we need _sc_ in the
> > > > > > > > > name, to follow other rte_ring functions naming
> > > > > > > > > conventions
> > > > > > > > > (rte_ring_sc_peek() or so).
> > > > > > > > Agree
> > > > > > > >
> > > > > > > > >
> > > > > > > > > As a better alternative what do you think about
> > > > > > > > > introducing a serialized versions of DPDK rte_ring dequeue
> functions?
> > > > > > > > > Something like that:
> > > > > > > > >
> > > > > > > > > /* same as original ring dequeue, but:
> > > > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > > > >   * 2) don't update cons.tail
> > > > > > > > >   */
> > > > > > > > > unsigned int
> > > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > > > **obj_table, unsigned int n,
> > > > > > > > >                 unsigned int *available);
> > > > > > > > >
> > > > > > > > > /* sets both cons.head and cons.tail to cons.head + num
> > > > > > > > > */ void rte_ring_serial_dequeue_finish(struct rte_ring
> > > > > > > > > *r, uint32_t num);
> > > > > > > > >
> > > > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > > >
> > > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > > >
> > > > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t
> > > > > > > > > avl, n; uintptr_t elt[nb_elt]; ...
> > > > > > > > >
> > > > > > > > > do {
> > > > > > > > >
> > > > > > > > >   /* read next elem from the queue */
> > > > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > > >   if (n == 0)
> > > > > > > > >       break;
> > > > > > > > >
> > > > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > > > elt[0]) != 1) {
> > > > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > > >      break;
> > > > > > > > >   }
> > > > > > > > >
> > > > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > > > >
> > > > > > > > >    /*call reclaim function */
> > > > > > > > >   dr->f(dr->p, elt);
> > > > > > > > >
> > > > > > > > > } while (avl >= nb_elt);
> > > > > > > > >
> > > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT
> safe.
> > > > > > > > > As long as actual reclamation callback itself is MT safe of
> course.
> > > > > > > >
> > > > > > > > I think it is a great idea. The other writers would still
> > > > > > > > be polling for the current writer to update the tail or
> > > > > > > > update the head. This makes it a
> > > > > > > blocking solution.
> > > > > > >
> > > > > > > Yep, it is a blocking one.
> > > > > > >
> > > > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > > > reclaiming if they
> > > > > > > see that other writers are dequeuing from the queue.
> > > > > > >
> > > > > > > Actually didn't think about that possibility, but yes should
> > > > > > > be possible to have _try_ semantics too.
> > > > > > >
> > > > > > > >The other  way is to use per thread queues.
> > > > > > > >
> > > > > > > > The other requirement I see is to support unbounded-size
> > > > > > > > data structures where in the data structures do not have a
> > > > > > > > pre-determined number of entries. Also, currently the
> > > > > > > > defer queue size is equal to the total
> > > > > > > number of entries in a given data structure. There are plans
> > > > > > > to support dynamically resizable defer queue. This means,
> > > > > > > memory allocation which will affect the lock-free-ness of the
> solution.
> > > > > > > >
> > > > > > > > So, IMO:
> > > > > > > > 1) The API should provide the capability to support
> > > > > > > > different algorithms -
> > > > > > > may be through some flags?
> > > > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > > > cons-tail is also the same, skip
> > > > > > > polling). So, we should probably implement a ring with-in
> > > > > > > the RCU
> > > library?
> > > > > > >
> > > > > > > Personally, I think such serialization ring API would be
> > > > > > > useful for other cases too.
> > > > > > > There are few cases when user need to read contents of the
> > > > > > > queue without removing elements from it.
> > > > > > > Let say we do use similar approach inside TLDK to implement
> > > > > > > TCP transmit queue.
> > > > > > > If such API would exist in DPDK we can just use it
> > > > > > > straightway, without maintaining a separate one.
> > > > > > ok
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > From the timeline perspective, adding all these
> > > > > > > > capabilities would be difficult to get done with in 19.11
> > > > > > > > timeline. What I have here satisfies my current needs. I
> > > > > > > > suggest that we make provisions in APIs now to
> > > > > > > support all these features, but do the implementation in the
> > > > > > > coming
> > > > > releases.
> > > > > > > Does this sound ok for you?
> > > > > > >
> > > > > > > Not sure I understand your suggestion here...
> > > > > > > Could you explain it a bit more - how new API will look like
> > > > > > > and what would be left for the future.
> > > > > > For this patch, I suggest we do not add any more complexity.
> > > > > > If someone wants a lock-free/block-free mechanism, it is
> > > > > > available by creating
> > > > > per thread defer queues.
> > > > > >
> > > > > > We push the following to the future:
> > > > > > 1) Dynamically size adjustable defer queue. IMO, with this,
> > > > > > the lock-free/block-free reclamation will not be available
> > > > > > (memory allocation
> > > > > requires locking). The memory for the defer queue will be
> > > > > allocated/freed in chunks of 'size' elements as the queue
> grows/shrinks.
> > > > >
> > > > > That one is fine by me.
> > > > > In fact I don't know would be there a real use-case for dynamic
> > > > > defer queue for rcu var...
> > > > > But I suppose that's subject for another discussion.
> > > > Currently, the defer queue size is equal to the number of
> > > > resources in the data structure. This is unnecessary as the reclamation is
> done regularly.
> > > > If a smaller queue size is used, the queue might get full (even
> > > > after
> > > reclamation), in which case, the queue size should be increased.
> > >
> > > I understand the intention.
> > > Though I am not very happy with approach where to free one resource
> > > we first have to allocate another one.
> > > Sounds like a source of deadlocks and for that case probably
> > > unnecessary complication.
> > It depends on the use case. For some use cases lock-free reader-writer
> > concurrency is enough (in which case there is no need to have a queue
> > large enough to hold all the resources) and some would require lock-free
> reader-writer and writer-writer concurrency (where, theoretically, a queue
> large enough to hold all the resources would be required).
> >
> > > But again, as it is not for 19.11 we don't have to discuss it now.
> > >
> > > > >
> > > > > >
> > > > > > 2) Constant size defer queue with lock-free and block-free
> > > > > > reclamation (single option). The defer queue will be of fixed
> > > > > > length 'size'. If the queue gets full an error is returned.
> > > > > > The user could provide a 'size' equal
> > > > > to the number of elements in a data structure to ensure queue
> > > > > never gets
> > > full.
> > > > >
> > > > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > > - MP/MC
> > > > > - MP/SC
> > > > > - SP/SC
> > > > Just SP/SC
> > >
> > > Ok, just to confirm we are on the same page:
> > > there would be a possibility for one thread do dq_enqueue(), second
> > > one do
> > > dq_reclaim() simultaneously (of course if actual reclamation
> > > function is thread safe)?
> > Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.
This is not completely correct (as you have pointed out below), as dq_enqueue will end up calling dq_reclaim.
> 
> Ok, and that is probably due to the nature of ring_sc_peek(), right?
> But the user can set the reclaim threshold higher than the number of elems in the defer
> queue, and that should help to prevent dq_reclaim() from inside
> dq_enqueue(), correct?
Yes, that is possible.
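
For illustration, a minimal sketch of the pattern discussed above: keep the (hypothetical)
reclaim trigger above the defer queue capacity so that dq_enqueue() never reclaims
internally, and serialize explicit reclaim calls. The rte_rcu_qsbr_dq_* names come from
this series, but the simplified signatures and the serialization helper are assumptions,
not the final API.

#include <rte_spinlock.h>
#include <rte_rcu_qsbr.h>

static rte_spinlock_t reclaim_lock = RTE_SPINLOCK_INITIALIZER;

/* Single writer thread (SP/SC model targeted for 19.11). With the reclaim
 * trigger configured above the queue size, dq_enqueue() does not reclaim
 * internally and can run while another thread reclaims.
 */
static inline void
writer_delete(struct rte_rcu_qsbr_dq *dq, void *elt)
{
	rte_rcu_qsbr_dq_enqueue(dq, elt);
}

/* Reclamation may run concurrently with enqueue, but concurrent reclaimers
 * must exclude each other.
 */
static inline void
reclaim_serialized(struct rte_rcu_qsbr_dq *dq)
{
	rte_spinlock_lock(&reclaim_lock);
	rte_rcu_qsbr_dq_reclaim(dq);
	rte_spinlock_unlock(&reclaim_lock);
}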

> If so, I have no objections in general to the proposed plan.
> Konstantin
> 
> >
> > >
> > > > > - non MT at all (only same single thread can do enqueue and
> > > > > dequeue)
> > > > If MT safe is required, one should use 1 defer queue per thread for now.
> > > >
> > > > >
> > > > > And related question:
> > > > > What additional rte_ring API you plan to introduce in that case?
> > > > > - None
> > > > > - rte_ring_sc_peek()
> > > > rte_ring_peek will be changed to rte_ring_sc_peek
> > > >
> > > > > - rte_ring_serial_dequeue()
> > > > >
> > > > > >
> > > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > > > provide
> > > > > > 2 #defines, one for dynamically variable size defer queue and
> > > > > > the other for
> > > > > constant size defer queue.
> > > > > >
> > > > > > However, IMO, using per thread defer queue is a much simpler
> > > > > > way to
> > > > > achieve 2. It does not add any significant burden to the user either.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +{
> > > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > > +	if (count) {
> > > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n,
> void *);
> > > > > > > > > > +		return 0;
> > > > > > > > > > +	}
> > > > > > > > > > +	return -ENOENT;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  #ifdef __cplusplus
> > > > > > > > > >  }
> > > > > > > > > >  #endif
> > > > > > > > > > --
> > > > > > > > > > 2.17.1
  

Patch

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@  rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
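
For reference, a minimal usage sketch of the peek API from this patch (to be renamed
rte_ring_sc_peek() per the discussion above). obj_is_ready() and process() are
hypothetical application hooks; the peek and the dequeue must be issued by the same,
single consumer thread.

#include <rte_ring.h>

/* Hypothetical application callbacks, declared here only to keep the
 * sketch self-contained.
 */
extern int obj_is_ready(void *obj);
extern void process(void *obj);

static void
consume_ready(struct rte_ring *r)
{
	void *obj;

	/* Peek the head object and dequeue it only when it is ready. */
	while (rte_ring_peek(r, &obj) == 0) {
		if (!obj_is_ready(obj))
			break;	/* leave it in the ring for later */
		/* Safe: single consumer, so the head cannot move under us. */
		rte_ring_dequeue(r, &obj);
		process(obj);
	}
}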