[dpdk-dev] Re: [PATCH] Add user defined tag calculation callback to librte_distributor.

Message ID CAHVfvh7ggGB_q1Rs1c3-9PRwDr_GKA+etaMXRSeKCfUKoUx8hQ@mail.gmail.com (mailing list archive)
State Not Applicable, archived

Commit Message

Qinglai Xiao Nov. 7, 2014, 2:31 p.m. UTC
  Hi Bruce,

Pls have a quick look at the diff to see if this is exactly what you mean
about the bitmask.
I just wrote it without even compiling, just to express the idea. So it may
leave some places unpatched.
If this is agreed, I will make a decent test to verify it before sending
the patch for RFC.

        union rte_distributor_buffer bufs[RTE_MAX_LCORE];
@@ -188,6 +190,7 @@ static inline void
 handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 {
        d->in_flight_tags[wkr] = 0;
+       d->in_flight_bitmask &= ~(1 << wkr);
        d->bufs[wkr].bufptr64 = 0;
        if (unlikely(d->backlog[wkr].count != 0)) {
                /* On return of a packet, we need to move the
@@ -241,6 +244,7 @@ process_returns(struct rte_distributor *d)
                        else {
                                d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
                                d->in_flight_tags[wkr] = 0;
+                               d->in_flight_bitmask &= ~(1 << wkr);
                        }
                        oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
                } else if (data & RTE_DISTRIB_RETURN_BUF) {
@@ -282,12 +286,13 @@ rte_distributor_process(struct rte_distributor *d,
                        next_mb = mbufs[next_idx++];
                        next_value = (((int64_t)(uintptr_t)next_mb)
                                        << RTE_DISTRIB_FLAG_BITS);
-                       new_tag = (next_mb->hash.rss | 1);
+                       new_tag = next_mb->hash.rss;

                        uint32_t match = 0;
                        unsigned i;
                        for (i = 0; i < d->num_workers; i++)
-                               match |= (!(d->in_flight_tags[i] ^ new_tag)
+                               match |= (((!(d->in_flight_tags[i] ^ new_tag)) &
+                                               (d->in_flight_bitmask >> i))
                                        << i);

                        if (match) {
@@ -309,6 +314,7 @@ rte_distributor_process(struct rte_distributor *d,
                        else {
                                d->bufs[wkr].bufptr64 = next_value;
                                d->in_flight_tags[wkr] = new_tag;
+                               d->in_flight_bitmask |= 1 << wkr;
                                next_mb = NULL;
                        }
                        oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
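
The bookkeeping that the diff above spreads over three functions can be sketched in isolation as follows. This is only an illustration, not the library code: `struct sketch_distributor`, `SKETCH_MAX_WORKERS`, `sketch_assign` and `sketch_release` are hypothetical names, and the sketch assumes at most 32 workers so that one `uint32_t` holds one bit per worker. It also shifts an unsigned constant rather than a plain `1`, since shifting a signed `int` into bit 31 is not well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_WORKERS 32  /* assumption: one bit per worker in a uint32_t */

/* Hypothetical stand-in for the relevant fields of struct rte_distributor. */
struct sketch_distributor {
	uint32_t in_flight_tags[SKETCH_MAX_WORKERS];
	uint32_t in_flight_bitmask; /* bit i set => worker i holds a packet */
};

/* Record that worker wkr is now processing a packet with the given tag. */
static void sketch_assign(struct sketch_distributor *d, unsigned wkr,
			  uint32_t tag)
{
	assert(wkr < SKETCH_MAX_WORKERS);
	d->in_flight_tags[wkr] = tag;
	d->in_flight_bitmask |= UINT32_C(1) << wkr;
}

/* Record that worker wkr has returned its packet or has shut down. */
static void sketch_release(struct sketch_distributor *d, unsigned wkr)
{
	assert(wkr < SKETCH_MAX_WORKERS);
	d->in_flight_tags[wkr] = 0;
	d->in_flight_bitmask &= ~(UINT32_C(1) << wkr);
}
```

With this in place a tag value of zero is an ordinary tag: whether a slot is occupied is decided by the bitmask, not by the tag being non-zero.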



On Fri, Nov 7, 2014 at 3:53 PM, Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Fri, Nov 07, 2014 at 02:38:13PM +0200, jigsaw wrote:
> > Hi Bruce,
> >
> > >> If a tag value of zero is ever passed in, then it will start matching
> > >> against cores which are not doing any processing.
> >
> > Yes, this is true according to current bookkeeping of inflight tags.
> >
> > But if the slot in in_flight_tags is not a uint32_t but a struct which has
> > a field as indication of "on/off", and also with corresponding changes in
> > looking for a matched tag, then the need for 1 bit mask can be eliminated.
> > Of course this change requires a little bit more, O(n), memory space and
> > costs O(n) more branch misses. But the benefit is a more free interface to
> > user app.
> >
> > This is just another trade-off. Since I am in need of such freedom, I am
> > more interested in the free use of 32bits.
>
> If you do implement such a change, I would suggest you simply add a bitmask
> to the distributor indicating valid workers. Then when we do the check
> for tag matches, we just need an extra "and" instruction to eliminate invalid
> workers from the match.
>
> /Bruce
>
> >
> > thx &
> > rgds,
> > -qinglai
> >
> >
> > On Fri, Nov 7, 2014 at 11:45 AM, Bruce Richardson <
> > bruce.richardson@intel.com> wrote:
> >
> > > On Thu, Nov 06, 2014 at 09:52:25PM +0200, jigsaw wrote:
> > > > Hi Bruce,
> > > >
> > > > Actually IMHO it is good to leave the freedom to user to decide how to
> > > > interpret the tag value, i.e. remove the OR 1 bit.
> > > > If the tag value is zero, then we assume the programmer knows what he is
> > > > doing. Of course this shall be clearly documented in the comment/doxygen.
> > > >
> > > >
> > > > thx &
> > > > rgds,
> > > > -qinglai
> > >
> > > I don't believe that will work. If a tag value of zero is ever passed
> > > in, then it will start matching against cores which are not doing any
> > > processing. Then it will get queued up to get sent to those cores, and so
> > > never get processed.
> > > We need a bit somewhere inside the tag to permanently set - though it can
> > > be configurable.
> > >
> > > /Bruce
> > >
> > > >
> > > > On Thu, Nov 6, 2014 at 8:01 PM, jigsaw <jigsaw@gmail.com> wrote:
> > > >
> > > > > Hi Bruce,
> > > > >
> > > > > > In my use case, unfortunately the tag is not a hash. And the tag can be
> > > > > > on either low or high bits, depending on configuration.
> > > > > > I wonder if it is possible to let the user decide which bit to mask,
> > > > > > i.e. to add another param to rte_distributor_create to define the mask.
> > > > >
> > > > > thx &
> > > > > rgds,
> > > > > -qinglai
> > > > >
> > > > > On Thu, Nov 6, 2014 at 3:59 PM, Bruce Richardson <
> > > > > bruce.richardson@intel.com> wrote:
> > > > >
> > > > >> On Thu, Nov 06, 2014 at 02:36:09PM +0200, Qinglai Xiao wrote:
> > > > >> > Hi Bruce,
> > > > >> >
> > > > > >> > There is a subtle case in which tag values are 2 and 3, respectively.
> > > > > >> > Then these two tags cannot be distinguished. There should be a better
> > > > > >> > way to handle this situation.
> > > > >>
> > > > > >> It's not just in that case, it's in any case where a pair of tags
> > > > > >> differ by only a single bit. I've been assuming that the tag is likely
> > > > > >> to be a hash value in most cases - given that it's only 32-bit - in
> > > > > >> which case it just doesn't matter which bit we choose to permanently
> > > > > >> set to 1, but if there are scenarios where it's likely that the low
> > > > > >> bits are used but the high ones not so, we can look to change which
> > > > > >> bit is set to 1. Either way, the distributor just uses a 31-bit tag
> > > > > >> rather than a 32-bit one.
> > > > >>
> > > > >> /Bruce
> > > > >>
> > > > >> >
> > > > >> > thx &
> > > > >> > rgds
> > > > >> > -qinglai
> > > > >> >
> > > > > >> > -----Original Message-----
> > > > > >> > From: "Thomas Monjalon" <thomas.monjalon@6wind.com>
> > > > > >> > Sent: 2014/11/6 12:36
> > > > > >> > To: "Bruce Richardson" <bruce.richardson@intel.com>
> > > > > >> > Cc: "dev@dpdk.org" <dev@dpdk.org>; "jigsaw" <jigsaw@gmail.com>
> > > > > >> > Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation
> > > > > >> > callback to librte_distributor.
> > > > >> >
> > > > >> > 2014-11-06 09:22, Bruce Richardson:
> > > > >> > > On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > > > >> > > >
> > > > >>
> > >
> http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> > > > >> > > >
> > > > >> > > >         new_tag = (next_mb->hash.rss | 1);
> > > > >> > > >
> > > > >> > > > Why the logical OR is needed?
> > > > >> > >
> > > > > >> > > That's needed to ensure that we never track a tag with an actual
> > > > > >> > > value of zero.
> > > > > >> > > We instead always force the low bit to be 1, so that we can use
> > > > > >> > > zero as an "empty" value.
> > > > >> >
> > > > > >> > Bruce, could you check how this code may be better commented please?
> > > > > >> > This discussion shows that the distributor library probably needs
> > > > > >> > more explanations in the code or doxygen.
> > > > >> >
> > > > >> > Thanks
> > > > >> > --
> > > > >> > Thomas
> > > > >>
> > > > >
> > > > >
> > >
>
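
The two problems raised in the quoted discussion above can be reproduced with a few lines of C. This is a minimal sketch rather than the distributor code: `sketch_forced_tag` and `sketch_tags_match` are hypothetical helpers that mirror the `| 1` on the tag and the XOR comparison from the matching loop.

```c
#include <stdint.h>

/* Hypothetical helper: the tag that is tracked when bit 0 is forced on. */
static uint32_t sketch_forced_tag(uint32_t tag)
{
	return tag | 1;
}

/* Hypothetical helper: XOR-based equality test, as in the matching loop. */
static int sketch_tags_match(uint32_t in_flight, uint32_t new_tag)
{
	return !(in_flight ^ new_tag);
}
```

Tags 2 and 3 differ only in bit 0, so `sketch_forced_tag(2)` and `sketch_forced_tag(3)` are both 3 and the two flows become indistinguishable. Conversely, if the forced bit is dropped and zero still marks an empty slot, `sketch_tags_match(0, 0)` is true, which is the false match against idle workers that Bruce describes.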
  

Comments

Bruce Richardson Nov. 7, 2014, 2:44 p.m. UTC | #1
On Fri, Nov 07, 2014 at 04:31:18PM +0200, jigsaw wrote:
> Hi Bruce,
> 
> Pls have a quick look at the diff to see if this is exactly what you mean
> about the bitmask.
> I just wrote it without even compiling, just to express the idea. So it may
> leave some places unpatched.
> If this is agreed, I will make a decent test to verify it before sending
> the patch for RFC.
> 
> diff --git a/lib/librte_distributor/rte_distributor.c
> b/lib/librte_distributor/rte_di
> index 585ff88..d606bcf 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -92,6 +92,8 @@ struct rte_distributor {
>         unsigned num_workers;                 /**< Number of workers polling */
> 
>         uint32_t in_flight_tags[RTE_MAX_LCORE];
> +       uint32_t in_flight_bitmask;
> +
>         struct rte_distributor_backlog backlog[RTE_MAX_LCORE];
> 
>         union rte_distributor_buffer bufs[RTE_MAX_LCORE];
> @@ -188,6 +190,7 @@ static inline void
>  handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
>  {
>         d->in_flight_tags[wkr] = 0;
> +       d->in_flight_bitmask &= ~(1 << wkr);
>         d->bufs[wkr].bufptr64 = 0;
>         if (unlikely(d->backlog[wkr].count != 0)) {
>                 /* On return of a packet, we need to move the
> @@ -241,6 +244,7 @@ process_returns(struct rte_distributor *d)
>                         else {
>                                 d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
>                                 d->in_flight_tags[wkr] = 0;
> +                               d->in_flight_bitmask &= ~(1 << wkr);
>                         }
>                         oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
>                 } else if (data & RTE_DISTRIB_RETURN_BUF) {
> @@ -282,12 +286,13 @@ rte_distributor_process(struct rte_distributor *d,
>                         next_mb = mbufs[next_idx++];
>                         next_value = (((int64_t)(uintptr_t)next_mb)
>                                         << RTE_DISTRIB_FLAG_BITS);
> -                       new_tag = (next_mb->hash.rss | 1);
> +                       new_tag = next_mb->hash.rss;
> 
>                         uint32_t match = 0;
>                         unsigned i;
>                         for (i = 0; i < d->num_workers; i++)
> -                               match |= (!(d->in_flight_tags[i] ^ new_tag)
> +                               match |= (((!(d->in_flight_tags[i] ^ new_tag)) &
> +                                               (d->in_flight_bitmask >> i))

I would not do the bitmask comparison here, as that's an extra instruction in the
loop. Instead, because it's a bitmask, build up the match variable as it was
before, and then just do a single AND operation afterwards, outside the loop
body.

/Bruce

>                                         << i);
> 
>                         if (match) {
> @@ -309,6 +314,7 @@ rte_distributor_process(struct rte_distributor *d,
>                         else {
>                                 d->bufs[wkr].bufptr64 = next_value;
>                                 d->in_flight_tags[wkr] = new_tag;
> +                               d->in_flight_bitmask |= 1 << wkr;
>                                 next_mb = NULL;
>                         }
>                         oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
> 
> 
>
  
Qinglai Xiao Nov. 7, 2014, 2:52 p.m. UTC | #2
Yeah that's better. As below, right?

@@ -290,6 +294,7 @@ rte_distributor_process(struct rte_distributor *d,
                                match |= (!(d->in_flight_tags[i] ^ new_tag)
                                        << i);

+                       match &= d->in_flight_bitmask;
                        if (match) {
                                next_mb = NULL;
                                unsigned worker = __builtin_ctz(match);
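
Pulled out of the surrounding function, the agreed approach might look like the sketch below. The helper name `sketch_find_worker` and its parameter list are assumptions made for illustration; only the loop body, the single AND after the loop, and the use of `__builtin_ctz` (a GCC/Clang builtin) come from the thread. The sketch assumes no more than 32 workers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of a busy worker that is already
 * processing new_tag, or -1 if there is none. Assumes num_workers <= 32.
 */
static int sketch_find_worker(const uint32_t *in_flight_tags,
			      uint32_t in_flight_bitmask,
			      unsigned num_workers, uint32_t new_tag)
{
	uint32_t match = 0;
	unsigned i;

	assert(num_workers <= 32);

	/* Bit i of match is set when slot i holds the same tag value. */
	for (i = 0; i < num_workers; i++)
		match |= (uint32_t)!(in_flight_tags[i] ^ new_tag) << i;

	/* One AND outside the loop discards slots of idle workers. */
	match &= in_flight_bitmask;

	if (match == 0)
		return -1;
	return __builtin_ctz(match); /* lowest-numbered matching worker */
}
```

Because idle slots are filtered by the mask, a new tag of zero no longer matches workers whose slot merely contains the cleared value.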


  
Bruce Richardson Nov. 7, 2014, 3:04 p.m. UTC | #3
On Fri, Nov 07, 2014 at 04:52:46PM +0200, jigsaw wrote:
> Yeah that's better. As below, right?

Yep.

> 
> @@ -290,6 +294,7 @@ rte_distributor_process(struct rte_distributor *d,
>                                 match |= (!(d->in_flight_tags[i] ^ new_tag)
>                                         << i);
> 
> +                       match &= d->in_flight_bitmask;
>                         if (match) {
>                                 next_mb = NULL;
>                                 unsigned worker = __builtin_ctz(match);
  
Qinglai Xiao Nov. 7, 2014, 3:18 p.m. UTC | #4
OK thanks Bruce. I will get the patch done in coming week. -qinglai

  

Patch

diff --git a/lib/librte_distributor/rte_distributor.c
b/lib/librte_distributor/rte_di
index 585ff88..d606bcf 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -92,6 +92,8 @@  struct rte_distributor {
        unsigned num_workers;                 /**< Number of workers polling */

        uint32_t in_flight_tags[RTE_MAX_LCORE];
+       uint32_t in_flight_bitmask;
+
        struct rte_distributor_backlog backlog[RTE_MAX_LCORE];