[1/2] bitmap: add create bitmap with all bits set

Message ID 1583828479-204084-2-git-send-email-suanmingm@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series bitmap: add create bitmap with all bits set


Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-testing               success  Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/Intel-compilation         success  Compilation OK

Commit Message

Suanming Mou March 10, 2020, 8:21 a.m. UTC
Currently, when a bitmap is used as a resource allocator, all of its
bits must be set after creation to mark every resource as available.
Each allocation then searches for a set bit and clears it to mark that
resource as in use.

Add a new rte_bitmap_init_with_all_set() function that quickly fills
all the bitmap bits.

Compared with creating an empty bitmap and setting the bits one by one,
the new function costs fewer cycles.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
 lib/librte_eal/common/include/rte_bitmap.h | 32 ++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
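
The allocator pattern described in the commit message can be sketched in
plain C. This is only an illustration under stated assumptions: toy_pool,
its size and its three functions are hypothetical names, the bitmap is a
flat array rather than the two-level rte_bitmap, and __builtin_ctzll() is
a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS  128u               /* hypothetical pool size */
#define TOY_WORDS (TOY_BITS / 64u)

struct toy_pool {
	uint64_t free_map[TOY_WORDS];    /* bit set = resource is available */
};

/* Start with every resource available: all bits set. */
static void
toy_pool_init(struct toy_pool *p)
{
	memset(p->free_map, 0xff, sizeof(p->free_map));
}

/* Allocate: find a set bit, clear it, return its index (-1 if exhausted). */
static int
toy_pool_alloc(struct toy_pool *p)
{
	uint32_t w;

	for (w = 0; w < TOY_WORDS; w++) {
		if (p->free_map[w] != 0) {
			int bit = __builtin_ctzll(p->free_map[w]);

			p->free_map[w] &= ~(UINT64_C(1) << bit);
			return (int)(w * 64u) + bit;
		}
	}
	return -1;
}

/* Release: set the bit again so the index can be handed out later. */
static void
toy_pool_free(struct toy_pool *p, uint32_t idx)
{
	assert(idx < TOY_BITS);
	p->free_map[idx / 64u] |= UINT64_C(1) << (idx % 64u);
}
```

With an empty-at-creation bitmap, toy_pool_init() would instead have to
set the bits one at a time, which is the cost the patch aims to avoid.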


Andrzej Ostruszka April 3, 2020, 2:49 p.m. UTC | #1
Hello Suanming

Please find my comments below.  However, please note that so far I have
never used DPDK bitmaps, so I might not be the best person to comment.
This patch needs some attention, so I spent some time on it.

Overall I'm fine with the changes; however, since this is a performance
enhancement, I've added some remarks/comments.

On 3/10/20 9:21 AM, Suanming Mou wrote:
> diff --git a/lib/librte_eal/common/include/rte_bitmap.h b/lib/librte_eal/common/include/rte_bitmap.h
> index 6b846f2..36b32e4 100644
> --- a/lib/librte_eal/common/include/rte_bitmap.h
> +++ b/lib/librte_eal/common/include/rte_bitmap.h
> @@ -483,6 +483,38 @@ struct rte_bitmap {
>  	return 0;
>  }
> +/**
> + * Bitmap initialization with all bits set
> + *
> + * @param n_bits
> + *   Number of pre-allocated bits in array2.
> + * @param mem
> + *   Base address of array1 and array2.
> + * @param mem_size
> + *   Minimum expected size of bitmap.
> + * @return
> + *   Handle to bitmap instance.
> + */
> +static inline struct rte_bitmap *
> +rte_bitmap_init_with_all_set(uint32_t n_bits, uint8_t *mem, uint32_t mem_size)
> +{
> +	uint32_t i;
> +	uint32_t slabs = n_bits / RTE_BITMAP_SLAB_BIT_SIZE;
> +	struct rte_bitmap *bmp = rte_bitmap_init(n_bits, mem, mem_size);
> +
> +	if (!bmp)
> +		return NULL;
> +	/* Fill the arry2 byte aligned bits. */
> +	memset(bmp->array2, 0xff, slabs * sizeof(bmp->array2[0]));

In rte_bitmap_init() we clear the memory to 0 and now we set it to 1s.
Maybe it would be better to separate the configuration from the actual
initialization?  That is, call a common __rte_bitmap_init() first, then
zero the memory in rte_bitmap_init() and set it to 1s here.

> +	/* Fill the arry1 bits. */
> +	for (i = 0; i < n_bits; i += RTE_BITMAP_CL_BIT_SIZE)
> +		rte_bitmap_set(bmp, i);

Maybe you could also compute here the number of array1 bytes that can be
set to FF, use memset() for them, and use rte_bitmap_set() for the
remainder?  Right now you are also touching array2 memory that was
already set above.

> +	/* Fill the arry2 left not byte aligned bits. */
> +	for (i = slabs * RTE_BITMAP_SLAB_BIT_SIZE; i < n_bits; i++)
> +		rte_bitmap_set(bmp, i);
> +	return bmp;
> +}
> +

With regards
Andrzej Ostruszka
Suanming Mou April 7, 2020, 6:19 a.m. UTC | #2
On 4/3/2020 10:49 PM, Andrzej Ostruszka wrote:
> Hello Suanming
> Please find my comments below.  However please note that so far I have
> never used DPDK bitmaps so I might not be the best person to comment -
> this patch needs some attention so I spent some time on it.
> Overall I'm fine with the changes however since this is a performance
> enhancement I've added some remarks/comments.
Hi Andrzej, thanks for your suggestions.
> On 3/10/20 9:21 AM, Suanming Mou wrote:
>> +	/* Fill the arry2 byte aligned bits. */
>> +	memset(bmp->array2, 0xff, slabs * sizeof(bmp->array2[0]));
> In rte_bitmap_init() we clear memory with 0 and now we set it with 1s.
> Maybe separating the configuration from the actual initialization would
> be better?  So that you call __rte_bitmap_init() and later zero in
> rte_bitmap_init() and set to 1s here.

Good idea. In fact, the first proposal was to add a new function that
sets all n_bits of an existing bitmap.

But since the bitmap struct currently does not store n_bits,
rte_bitmap_init_with_all_set() was introduced instead.

>> +	/* Fill the arry1 bits. */
>> +	for (i = 0; i < n_bits; i += RTE_BITMAP_CL_BIT_SIZE)
>> +		rte_bitmap_set(bmp, i);
> Maybe you could also compute here the number of array1 bytes that can be
> set to FF, use memset() for them, and use rte_bitmap_set() for the
> remainder?  Right now you are also touching array2 memory that was
> already set above.

RTE_BITMAP_CL_BIT_SIZE is 512 with a 64-byte cache line, so each array1
bit covers 512 array2 bits and a full array1 byte needs about 4K bits.
Most bitmaps are probably created with fewer than 4K bits, so in those
cases the memset would have nothing to fill.

Anyway, will add it.
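
The arithmetic behind the 4K figure can be checked with a small sketch.
It assumes a 64-byte cache line, as in the reply above, and that an
array1 bit is set whenever its cache line of array2 holds at least one
set bit. The helper names are hypothetical and not part of rte_bitmap.h.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u                      /* assumed cache line size */
#define CL_BIT_SIZE      (CACHE_LINE_BYTES * 8u)  /* 512 array2 bits per array1 bit */

/* Number of array1 bits that end up set for a bitmap of n_bits. */
static uint32_t
array1_bits(uint32_t n_bits)
{
	return (n_bits + CL_BIT_SIZE - 1u) / CL_BIT_SIZE;
}

/* Number of array1 bytes that are entirely 0xff and could be memset. */
static uint32_t
array1_full_bytes(uint32_t n_bits)
{
	return array1_bits(n_bits) / 8u;
}
```

Under these assumptions the first full array1 byte appears once n_bits
exceeds 7 * 512 = 3584, which is in line with the rough 4K estimate.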

>> +	/* Fill the arry2 left not byte aligned bits. */
>> +	for (i = slabs * RTE_BITMAP_SLAB_BIT_SIZE; i < n_bits; i++)
>> +		rte_bitmap_set(bmp, i);
>> +	return bmp;
>> +}
>> +
> With regards
> Andrzej Ostruszka
Suanming Mou April 7, 2020, 3 p.m. UTC | #3
Hi guys,

Since we are all curious about which implementation performs best, I
ran some tests on my server.

There are 3 implementations:
1. Clear all the array1 and array2 bits first, then set the bits we
need (the current implementation in the patch).
2. Set all the bits in array1 and array2 first, then clear the bits
that are not needed.
3. Set only the needed bits in array1 and array2, then clear the
remaining ones. (Because the allocated memory is rounded up for
alignment, the unneeded bits must be cleared in any case.)

Let's call the 3 implementations Cs, Sc and sc:
Capital 'C' means clear all bits.
Lowercase 'c' means clear only the bits that are not needed.
Capital 'S' means set all bits.
Lowercase 's' means set only the needed bits.

I added some test code to the bitmap_test code. Here are the cycle
counts for different bitmap sizes with each implementation:
Set bits   Cs     Sc     sc
63         1018   1089   1078
126        972    1082   1048
252        918    1039   1029
504        861    986    957
1008       802    882    851
2016       618    646    625
4032       272    215    209
8064       537    392    391
16128      1083   786    798

As we can see, from 4K bits upward the Cs case is at a disadvantage,
while below 4K bits it is somewhat faster.
Since the cycle counts below 4K bits do not differ significantly, it
should be OK to use either the Sc or the sc variant.
The sc code is probably the better choice.
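
To make the three orderings concrete, here is a minimal sketch on a flat
array of 64-bit words. It is not the rte_bitmap code: the function names
are hypothetical, there is only one level, and the caller is assumed to
pass n_bits <= n_words * 64. All three variants leave exactly the first
n_bits set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Set the low (n_bits % 64) bits of the first word that is not full. */
static void
set_partial_word(uint64_t *w, uint32_t n_bits)
{
	uint32_t full = n_bits / 64u;
	uint32_t rem = n_bits % 64u;

	if (rem != 0)
		w[full] |= (UINT64_C(1) << rem) - 1u;
}

/* Cs: clear all words, then set the needed ones. */
static void
fill_clear_all_then_set(uint64_t *w, uint32_t n_words, uint32_t n_bits)
{
	assert(n_bits <= n_words * 64u);
	memset(w, 0, n_words * sizeof(w[0]));
	memset(w, 0xff, (n_bits / 64u) * sizeof(w[0]));
	set_partial_word(w, n_bits);
}

/* Sc: set all words, then clear the ones that are not needed. */
static void
fill_set_all_then_clear(uint64_t *w, uint32_t n_words, uint32_t n_bits)
{
	uint32_t full = n_bits / 64u;

	assert(n_bits <= n_words * 64u);
	memset(w, 0xff, n_words * sizeof(w[0]));
	memset(&w[full], 0, (n_words - full) * sizeof(w[0]));
	set_partial_word(w, n_bits);
}

/* sc: set only the needed words, clear only the rest. */
static void
fill_set_needed_clear_rest(uint64_t *w, uint32_t n_words, uint32_t n_bits)
{
	uint32_t full = n_bits / 64u;

	assert(n_bits <= n_words * 64u);
	memset(w, 0xff, full * sizeof(w[0]));
	memset(&w[full], 0, (n_words - full) * sizeof(w[0]));
	set_partial_word(w, n_bits);
}
```

The sc variant writes each word once, whereas Cs and Sc write part of the
array twice; whether that matters in practice depends on the bitmap size,
as the measurements above suggest.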

Testing code as below:
#include <inttypes.h> /* PRIu64 */

static void
test_tsc(uint32_t n_bits)
{
         void *mem;
         uint32_t i;
         uint64_t start, cost, cost2, cost3;
         uint32_t bmp_size;
         struct rte_bitmap *bmp;

         /*
          * The right-hand side is missing in the archived mail; the
          * footprint query is assumed here.
          */
         bmp_size = rte_bitmap_get_memory_footprint(n_bits);

         mem = rte_zmalloc("test_bmap", bmp_size, RTE_CACHE_LINE_SIZE);
         if (mem == NULL) {
                 printf("Failed to allocate memory for bitmap\n");
                 return;
         }
         /* Make the memory hot. */
         bmp = rte_bitmap_init_with_all_set(n_bits, mem, bmp_size);
         if (bmp == NULL) {
                 printf("Failed to init bitmap\n");
                 rte_free(mem);
                 return;
         }
         /* Cs: clear all bits first, set needed. */
         start = rte_rdtsc();
         for (i = 0; i < 1000; i++)
                 rte_bitmap_init_with_all_set(n_bits, mem, bmp_size);
         cost = (rte_rdtsc() - start) / 1000;
         /* Sc: set all bits first, clear not needed. */
         start = rte_rdtsc();
         for (i = 0; i < 1000; i++)
                 rte_bitmap_init_with_all_set2(n_bits, mem, bmp_size);
         cost2 = (rte_rdtsc() - start) / 1000;
         /* sc: set needed bits, clear the rest. */
         start = rte_rdtsc();
         for (i = 0; i < 1000; i++)
                 rte_bitmap_init_with_all_set3(n_bits, mem, bmp_size);
         cost3 = (rte_rdtsc() - start) / 1000;

         printf("Set bits:%u\nCs   Sc   sc\n", n_bits);
         printf("%-4" PRIu64 " %-4" PRIu64 " %-4" PRIu64 "\n\n",
                cost, cost2, cost3);
         rte_free(mem);
}

/*
 * Driver loop. The enclosing function's name is missing in the archived
 * mail, so test_tsc_all() is a placeholder.
 */
static void
test_tsc_all(void)
{
         uint32_t i;

         for (i = 63; i < (63 << 9); i <<= 1)
                 test_tsc(i);
}

Sc code as below:
static inline struct rte_bitmap *
rte_bitmap_init_with_all_set2(uint32_t n_bits, uint8_t *mem,
                              uint32_t mem_size)
{
         uint32_t i;
         uint32_t slabs;
         struct rte_bitmap *bmp;

         bmp = __rte_bitmap_init(n_bits, mem, mem_size);
         if (!bmp)
                 return NULL;
         /* Set all bits in both arrays first. */
         memset(bmp->array1, 0xff, bmp->array1_size * sizeof(uint64_t));
         memset(bmp->array2, 0xff, bmp->array2_size * sizeof(uint64_t));
         /*
          * Number of fully set array1 slabs. The archived mail cuts the
          * shift expression off after '+'; RTE_BITMAP_CL_BIT_SIZE_LOG2 is
          * the assumed second term (one array1 bit per array2 cache line).
          */
         slabs = n_bits >> (RTE_BITMAP_SLAB_BIT_SIZE_LOG2 +
                            RTE_BITMAP_CL_BIT_SIZE_LOG2);
         /* Clear the remaining array1 slabs (size operand assumed). */
         memset(&bmp->array1[slabs], 0, (bmp->array1_size - slabs) *
                sizeof(bmp->array1[0]));
         /* Fill the partially set array1 slab. */
         i = slabs << (RTE_BITMAP_SLAB_BIT_SIZE_LOG2 +
                       RTE_BITMAP_CL_BIT_SIZE_LOG2);
         for (; i < n_bits; i += RTE_BITMAP_CL_BIT_SIZE)
                 rte_bitmap_set(bmp, i);
         /* Clear the remaining array2 slabs (size operand assumed). */
         slabs = n_bits >> RTE_BITMAP_SLAB_BIT_SIZE_LOG2;
         memset(&bmp->array2[slabs], 0, (bmp->array2_size - slabs) *
                sizeof(bmp->array2[0]));
         /* Fill the partially set array2 slab. */
         for (i = slabs * RTE_BITMAP_SLAB_BIT_SIZE; i < n_bits; i++)
                 rte_bitmap_set(bmp, i);
         return bmp;
}

sc code as below:
static inline struct rte_bitmap *
rte_bitmap_init_with_all_set3(uint32_t n_bits, uint8_t *mem,
                              uint32_t mem_size)
{
         uint32_t i;
         uint32_t slabs;
         struct rte_bitmap *bmp;

         bmp = __rte_bitmap_init(n_bits, mem, mem_size);
         if (!bmp)
                 return NULL;
         /*
          * Fill the fully set array1 slabs. The archived mail cuts the
          * shift expression off after '+'; RTE_BITMAP_CL_BIT_SIZE_LOG2 is
          * the assumed second term (one array1 bit per array2 cache line).
          */
         slabs = n_bits >> (RTE_BITMAP_SLAB_BIT_SIZE_LOG2 +
                            RTE_BITMAP_CL_BIT_SIZE_LOG2);
         memset(bmp->array1, 0xff, slabs * sizeof(bmp->array1[0]));
         /* Clear the remaining array1 slabs (size operand assumed). */
         memset(&bmp->array1[slabs], 0, (bmp->array1_size - slabs) *
                sizeof(bmp->array1[0]));
         /* Fill the partially set array1 slab. */
         i = slabs << (RTE_BITMAP_SLAB_BIT_SIZE_LOG2 +
                       RTE_BITMAP_CL_BIT_SIZE_LOG2);
         for (; i < n_bits; i += RTE_BITMAP_CL_BIT_SIZE)
                 rte_bitmap_set(bmp, i);
         /* Fill the fully set array2 slabs. */
         slabs = n_bits >> RTE_BITMAP_SLAB_BIT_SIZE_LOG2;
         memset(bmp->array2, 0xff, slabs * sizeof(bmp->array2[0]));
         /* Clear the remaining array2 slabs (size operand assumed). */
         memset(&bmp->array2[slabs], 0, (bmp->array2_size - slabs) *
                sizeof(bmp->array2[0]));
         /* Fill the partially set array2 slab. */
         for (i = slabs * RTE_BITMAP_SLAB_BIT_SIZE; i < n_bits; i++)
                 rte_bitmap_set(bmp, i);
         return bmp;
}

Any comments or suggestions?

Cristian Dumitrescu April 7, 2020, 5:48 p.m. UTC | #4
Hi Suanming,

I agree that starting with all bits set could be very useful for some apps.

I agree that a customized implementation for starting with all bits set, as opposed to simply starting with all bits cleared and calling the API in a loop to set each bit, could be useful, as it could reduce the initialization time.

What I don't understand is your implementation: why does it still call the API in a loop to set bits? If we are to add this, I suggest we create a fully customized implementation that sets the fields of struct rte_bitmap to the right values. Makes sense?
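
One way to read the "fully customized" suggestion is to write both
levels directly, with no per-bit API calls. The sketch below does that
for a toy two-level bitmap. It is an illustration only: struct
toy_bitmap, its fixed capacity and the helper names are hypothetical,
and the layout is much simpler than the real struct rte_bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORD_BITS 64u
#define LINE_BITS 512u   /* one array1 bit summarizes 512 array2 bits */
#define A2_WORDS  64u    /* toy capacity: 64 * 64 = 4096 bits */
#define A1_WORDS  1u     /* 4096 / 512 = 8 summary bits fit in one word */

struct toy_bitmap {
	uint64_t array1[A1_WORDS];   /* summary level */
	uint64_t array2[A2_WORDS];   /* the bits themselves */
};

/* Set the first n_set bits of w[0..n_words) and clear everything after. */
static void
fill_prefix(uint64_t *w, uint32_t n_words, uint32_t n_set)
{
	uint32_t full = n_set / WORD_BITS;
	uint32_t rem = n_set % WORD_BITS;

	assert(n_set <= n_words * WORD_BITS);
	memset(w, 0xff, full * sizeof(w[0]));
	memset(&w[full], 0, (n_words - full) * sizeof(w[0]));
	if (rem != 0)
		w[full] = (UINT64_C(1) << rem) - 1u;
}

/* Initialize with bits [0, n_bits) set, writing each level exactly once. */
static void
toy_bitmap_init_all_set(struct toy_bitmap *b, uint32_t n_bits)
{
	/* A summary bit is set when its line holds at least one set bit. */
	uint32_t n_lines = (n_bits + LINE_BITS - 1u) / LINE_BITS;

	fill_prefix(b->array2, A2_WORDS, n_bits);
	fill_prefix(b->array1, A1_WORDS, n_lines);
}
```

The partial word is written with a mask instead of a loop over single
bits, so the cost no longer depends on how many bits fall in the tail.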

Suanming Mou April 8, 2020, 2:57 a.m. UTC | #5

> I agree that starting with all bits set could be very useful for some apps.
> I agree that a customized implementation for starting with all bits set,
> as opposed to simply starting with all bits cleared and calling the API in
> a loop to set each bit, could be useful, as it could reduce the
> initialization time.
> What I don't understand is your implementation: why does it still call the
> API in a loop to set bits? If we are to add this, I suggest we create a
> fully customized implementation that sets the fields of struct rte_bitmap
> to the right values.
> Makes sense?
Thanks for the suggestion. Will update.
> Thanks,
> Cristian


diff --git a/lib/librte_eal/common/include/rte_bitmap.h b/lib/librte_eal/common/include/rte_bitmap.h
index 6b846f2..36b32e4 100644
--- a/lib/librte_eal/common/include/rte_bitmap.h
+++ b/lib/librte_eal/common/include/rte_bitmap.h
@@ -483,6 +483,38 @@ struct rte_bitmap {
 	return 0;
 }
 
+/**
+ * Bitmap initialization with all bits set
+ *
+ * @param n_bits
+ *   Number of pre-allocated bits in array2.
+ * @param mem
+ *   Base address of array1 and array2.
+ * @param mem_size
+ *   Minimum expected size of bitmap.
+ * @return
+ *   Handle to bitmap instance.
+ */
+static inline struct rte_bitmap *
+rte_bitmap_init_with_all_set(uint32_t n_bits, uint8_t *mem, uint32_t mem_size)
+{
+	uint32_t i;
+	uint32_t slabs = n_bits / RTE_BITMAP_SLAB_BIT_SIZE;
+	struct rte_bitmap *bmp = rte_bitmap_init(n_bits, mem, mem_size);
+
+	if (!bmp)
+		return NULL;
+	/* Fill the arry2 byte aligned bits. */
+	memset(bmp->array2, 0xff, slabs * sizeof(bmp->array2[0]));
+	/* Fill the arry1 bits. */
+	for (i = 0; i < n_bits; i += RTE_BITMAP_CL_BIT_SIZE)
+		rte_bitmap_set(bmp, i);
+	/* Fill the arry2 left not byte aligned bits. */
+	for (i = slabs * RTE_BITMAP_SLAB_BIT_SIZE; i < n_bits; i++)
+		rte_bitmap_set(bmp, i);
+	return bmp;
+}
+
 #ifdef __cplusplus
 }
 #endif