[dpdk-dev,RFC,2/6] mempool: implement clustered object allocation

Message ID 1511539591-20966-3-git-send-email-arybchenko@solarflare.com
State Superseded, archived
Delegated to: Thomas Monjalon
Headers show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Andrew Rybchenko Nov. 24, 2017, 4:06 p.m.
From: "Artem V. Andreev" <Artem.Andreev@oktetlabs.ru>

Clustered allocation is required to simplify packing objects into
buckets and finding the bucket control structure from an object.

Signed-off-by: Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
 lib/librte_mempool/rte_mempool.c | 39 +++++++++++++++++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.h | 23 +++++++++++++++++++++--
 test/test/test_mempool.c         |  2 +-
 3 files changed, 57 insertions(+), 7 deletions(-)

Comments

Olivier Matz Dec. 14, 2017, 1:37 p.m. | #1
On Fri, Nov 24, 2017 at 04:06:27PM +0000, Andrew Rybchenko wrote:
> From: "Artem V. Andreev" <Artem.Andreev@oktetlabs.ru>
> 
> Clustered allocation is required to simplify packing objects into
> buckets and finding the bucket control structure from an object.
> 
> Signed-off-by: Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>  lib/librte_mempool/rte_mempool.c | 39 +++++++++++++++++++++++++++++++++++----
>  lib/librte_mempool/rte_mempool.h | 23 +++++++++++++++++++++--
>  test/test/test_mempool.c         |  2 +-
>  3 files changed, 57 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index d50dba4..43455a3 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>   */
>  size_t
>  rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
> -		      unsigned int flags)
> +		      unsigned int flags,
> +		      const struct rte_mempool_info *info)
>  {
>  	size_t obj_per_page, pg_num, pg_sz;
>  	unsigned int mask;
> @@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>  	if (total_elt_sz == 0)
>  		return 0;
>  
> +	if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
> +		unsigned int align_shift =
> +			rte_bsf32(
> +				rte_align32pow2(total_elt_sz *
> +						info->cluster_size));
> +		if (pg_shift < align_shift) {
> +			return ((elt_num / info->cluster_size) + 2)
> +				<< align_shift;
> +		}
> +	}
> +

+Cc Santosh for this

To be honest, that was my fear when introducing
MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG: that we
would see more and more driver-specific flags in generic code.

I feel that the hidden meaning of these flags is really "if driver == foo",
which shows that something is wrong in the current design.

We have to think about another way to do this. Let me try to propose
something (to be deepened).

The standard way to create a mempool is:

  mp = create_empty(...)
  set_ops_by_name(mp, "my-driver")    // optional
  populate_default(mp)                // or populate_*()
  obj_iter(mp, callback, arg)         // optional, to init objects
  // and optional local func to init mempool priv
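In terms of the current API, that sequence looks roughly like the sketch below (error handling abbreviated; "my-driver" and my_obj_init are placeholders, not real handler or callback names):

```c
#include <rte_lcore.h>
#include <rte_mempool.h>

static struct rte_mempool *
create_my_pool(unsigned int n, unsigned int elt_size,
	       rte_mempool_obj_cb_t *my_obj_init)
{
	struct rte_mempool *mp;

	mp = rte_mempool_create_empty("my_pool", n, elt_size,
				      0 /* cache */, 0 /* priv size */,
				      rte_socket_id(), 0 /* flags */);
	if (mp == NULL)
		return NULL;

	/* optional: pick a specific mempool handler */
	rte_mempool_set_ops_byname(mp, "my-driver", NULL);

	if (rte_mempool_populate_default(mp) < 0) {
		rte_mempool_free(mp);
		return NULL;
	}

	/* optional: run a callback on each object */
	if (my_obj_init != NULL)
		rte_mempool_obj_iter(mp, my_obj_init, NULL);

	return mp;
}
```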

First, we can consider deprecating some APIs like:
 - rte_mempool_xmem_create()
 - rte_mempool_xmem_size()
 - rte_mempool_xmem_usage()
 - rte_mempool_populate_iova_tab()

These functions were introduced for xen, which was recently
removed. They are complex to use, and are not used anywhere else in
DPDK.

Then, instead of having flags (quite hard to understand without knowing
the underlying driver), we can let the mempool drivers do the
populate_default() operation. For that we can add a populate_default
field in mempool ops. Same for populate_virt(), populate_anon(), and
populate_phys() which can return -ENOTSUP if this is not
implemented/implementable on a specific driver, or if flags
(NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function
pointer is NULL, use the generic function.

Thanks to this, the generic code would remain understandable and won't
have to care about how memory should be allocated for a specific driver.

Thoughts?


[...]

> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 3c59d36..9bcb8b7 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -220,7 +220,10 @@ struct rte_mempool_memhdr {
>  /*
>   * Additional information about the mempool
>   */
> -struct rte_mempool_info;
> +struct rte_mempool_info {
> +	/** Number of objects in a cluster */
> +	unsigned int cluster_size;
> +};

I think what I'm proposing would also prevent to introduce this
structure, which is generic but only applies to this driver.
Andrew Rybchenko Jan. 17, 2018, 3:03 p.m. | #2
On 12/14/2017 04:37 PM, Olivier MATZ wrote:
> On Fri, Nov 24, 2017 at 04:06:27PM +0000, Andrew Rybchenko wrote:
>> From: "Artem V. Andreev" <Artem.Andreev@oktetlabs.ru>
>>
>> Clustered allocation is required to simplify packing objects into
>> buckets and finding the bucket control structure from an object.
>>
>> Signed-off-by: Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>>   lib/librte_mempool/rte_mempool.c | 39 +++++++++++++++++++++++++++++++++++----
>>   lib/librte_mempool/rte_mempool.h | 23 +++++++++++++++++++++--
>>   test/test/test_mempool.c         |  2 +-
>>   3 files changed, 57 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>> index d50dba4..43455a3 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>>    */
>>   size_t
>>   rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>> -		      unsigned int flags)
>> +		      unsigned int flags,
>> +		      const struct rte_mempool_info *info)
>>   {
>>   	size_t obj_per_page, pg_num, pg_sz;
>>   	unsigned int mask;
>> @@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>   	if (total_elt_sz == 0)
>>   		return 0;
>>   
>> +	if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
>> +		unsigned int align_shift =
>> +			rte_bsf32(
>> +				rte_align32pow2(total_elt_sz *
>> +						info->cluster_size));
>> +		if (pg_shift < align_shift) {
>> +			return ((elt_num / info->cluster_size) + 2)
>> +				<< align_shift;
>> +		}
>> +	}
>> +
> +Cc Santosh for this
>
> To be honest, that was my fear when introducing
> MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG: that we
> would see more and more driver-specific flags in generic code.
>
> I feel that the hidden meaning of these flags is really "if driver == foo",
> which shows that something is wrong in the current design.
>
> We have to think about another way to do this. Let me try to propose
> something (to be deepened).
>
> The standard way to create a mempool is:
>
>    mp = create_empty(...)
>    set_ops_by_name(mp, "my-driver")    // optional
>    populate_default(mp)                // or populate_*()
>    obj_iter(mp, callback, arg)         // optional, to init objects
>    // and optional local func to init mempool priv
>
> First, we can consider deprecating some APIs like:
>   - rte_mempool_xmem_create()
>   - rte_mempool_xmem_size()
>   - rte_mempool_xmem_usage()
>   - rte_mempool_populate_iova_tab()
>
> These functions were introduced for xen, which was recently
> removed. They are complex to use, and are not used anywhere else in
> DPDK.
>
> Then, instead of having flags (quite hard to understand without knowing
> the underlying driver), we can let the mempool drivers do the
> populate_default() operation. For that we can add a populate_default
> field in mempool ops. Same for populate_virt(), populate_anon(), and
> populate_phys() which can return -ENOTSUP if this is not
> implemented/implementable on a specific driver, or if flags
> (NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function
> pointer is NULL, use the generic function.
>
> Thanks to this, the generic code would remain understandable and won't
> have to care about how memory should be allocated for a specific driver.
>
> Thoughts?

Yes, I agree. This week we'll provide an updated version of the RFC which
covers it, including the transition of mempool/octeontx. I think it is
sufficient to introduce two new ops:
  1. To calculate the memory space required to store a specified number of
     objects.
  2. To populate objects in the provided memory chunk (the op will be called
     from rte_mempool_populate_iova(), which is a leaf function for all
     rte_mempool_populate_*() calls).
It will allow us to avoid duplication and keep memchunk housekeeping inside
the mempool library.
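Those two ops might look roughly as follows (a sketch of possible signatures for the v2; the names, parameters, and out-arguments are assumptions, not a committed API):

```c
/* Sketch only -- hypothetical op signatures, subject to change in v2. */

/* 1. Calculate the memory space needed for obj_num objects; may also
 *    report the minimum chunk size and required address alignment. */
typedef ssize_t (*mempool_calc_mem_size_t)(const struct rte_mempool *mp,
					   uint32_t obj_num, uint32_t pg_shift,
					   size_t *min_chunk_size,
					   size_t *align);

/* 2. Lay out objects in the provided memory chunk; called from
 *    rte_mempool_populate_iova(), the leaf of all populate variants. */
typedef int (*mempool_populate_t)(struct rte_mempool *mp,
				  unsigned int max_objs, void *vaddr,
				  rte_iova_t iova, size_t len,
				  rte_mempool_obj_cb_t *obj_cb,
				  void *obj_cb_arg);
```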

> [...]
>
>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>> index 3c59d36..9bcb8b7 100644
>> --- a/lib/librte_mempool/rte_mempool.h
>> +++ b/lib/librte_mempool/rte_mempool.h
>> @@ -220,7 +220,10 @@ struct rte_mempool_memhdr {
>>   /*
>>    * Additional information about the mempool
>>    */
>> -struct rte_mempool_info;
>> +struct rte_mempool_info {
>> +	/** Number of objects in a cluster */
>> +	unsigned int cluster_size;
>> +};
> I think what I'm proposing would also prevent to introduce this
> structure, which is generic but only applies to this driver.

Yes
Santosh Shukla Jan. 17, 2018, 3:55 p.m. | #3
On Wednesday 17 January 2018 08:33 PM, Andrew Rybchenko wrote:
> On 12/14/2017 04:37 PM, Olivier MATZ wrote:
>> On Fri, Nov 24, 2017 at 04:06:27PM +0000, Andrew Rybchenko wrote:
>>> From: "Artem V. Andreev" <Artem.Andreev@oktetlabs.ru>
>>>
>>> Clustered allocation is required to simplify packing objects into
>>> buckets and finding the bucket control structure from an object.
>>>
>>> Signed-off-by: Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
>>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>> ---
>>>   lib/librte_mempool/rte_mempool.c | 39 +++++++++++++++++++++++++++++++++++----
>>>   lib/librte_mempool/rte_mempool.h | 23 +++++++++++++++++++++--
>>>   test/test/test_mempool.c         |  2 +-
>>>   3 files changed, 57 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>>> index d50dba4..43455a3 100644
>>> --- a/lib/librte_mempool/rte_mempool.c
>>> +++ b/lib/librte_mempool/rte_mempool.c
>>> @@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>>>    */
>>>   size_t
>>>   rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>> -              unsigned int flags)
>>> +              unsigned int flags,
>>> +              const struct rte_mempool_info *info)
>>>   {
>>>       size_t obj_per_page, pg_num, pg_sz;
>>>       unsigned int mask;
>>> @@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>>       if (total_elt_sz == 0)
>>>           return 0;
>>>   +    if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
>>> +        unsigned int align_shift =
>>> +            rte_bsf32(
>>> +                rte_align32pow2(total_elt_sz *
>>> +                        info->cluster_size));
>>> +        if (pg_shift < align_shift) {
>>> +            return ((elt_num / info->cluster_size) + 2)
>>> +                << align_shift;
>>> +        }
>>> +    }
>>> +
>> +Cc Santosh for this
>>
>> To be honest, that was my fear when introducing
>> MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG: that we
>> would see more and more driver-specific flags in generic code.
>>
>> I feel that the hidden meaning of these flags is really "if driver == foo",
>> which shows that something is wrong in the current design.
>>
>> We have to think about another way to do this. Let me try to propose
>> something (to be deepened).
>>
>> The standard way to create a mempool is:
>>
>>    mp = create_empty(...)
>>    set_ops_by_name(mp, "my-driver")    // optional
>>    populate_default(mp)                // or populate_*()
>>    obj_iter(mp, callback, arg)         // optional, to init objects
>>    // and optional local func to init mempool priv
>>
>> First, we can consider deprecating some APIs like:
>>   - rte_mempool_xmem_create()
>>   - rte_mempool_xmem_size()
>>   - rte_mempool_xmem_usage()
>>   - rte_mempool_populate_iova_tab()
>>
>> These functions were introduced for xen, which was recently
>> removed. They are complex to use, and are not used anywhere else in
>> DPDK.
>>
>> Then, instead of having flags (quite hard to understand without knowing
>> the underlying driver), we can let the mempool drivers do the
>> populate_default() operation. For that we can add a populate_default
>> field in mempool ops. Same for populate_virt(), populate_anon(), and
>> populate_phys() which can return -ENOTSUP if this is not
>> implemented/implementable on a specific driver, or if flags
>> (NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function
>> pointer is NULL, use the generic function.
>>
>> Thanks to this, the generic code would remain understandable and won't
>> have to care about how memory should be allocated for a specific driver.
>>
>> Thoughts?
>
> Yes, I agree. This week we'll provide an updated version of the RFC which
> covers it, including the transition of mempool/octeontx. I think it is
> sufficient to introduce two new ops:
>  1. To calculate the memory space required to store a specified number of objects.
>  2. To populate objects in the provided memory chunk (the op will be called
>      from rte_mempool_populate_iova(), which is a leaf function for all
>      rte_mempool_populate_*() calls).
> It will allow us to avoid duplication and keep memchunk housekeeping inside
> the mempool library.
>
There is also a downside to letting the mempool driver populate, which was raised in another thread.
http://dpdk.org/dev/patchwork/patch/31943/

Thanks.
Andrew Rybchenko Jan. 17, 2018, 4:37 p.m. | #4
On 01/17/2018 06:55 PM, santosh wrote:
> On Wednesday 17 January 2018 08:33 PM, Andrew Rybchenko wrote:
>> On 12/14/2017 04:37 PM, Olivier MATZ wrote:
>>> On Fri, Nov 24, 2017 at 04:06:27PM +0000, Andrew Rybchenko wrote:
>>>> From: "Artem V. Andreev" <Artem.Andreev@oktetlabs.ru>
>>>>
>>>> Clustered allocation is required to simplify packing objects into
>>>> buckets and finding the bucket control structure from an object.
>>>>
>>>> Signed-off-by: Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
>>>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> ---
>>>>    lib/librte_mempool/rte_mempool.c | 39 +++++++++++++++++++++++++++++++++++----
>>>>    lib/librte_mempool/rte_mempool.h | 23 +++++++++++++++++++++--
>>>>    test/test/test_mempool.c         |  2 +-
>>>>    3 files changed, 57 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>>>> index d50dba4..43455a3 100644
>>>> --- a/lib/librte_mempool/rte_mempool.c
>>>> +++ b/lib/librte_mempool/rte_mempool.c
>>>> @@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>>>>     */
>>>>    size_t
>>>>    rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>>> -              unsigned int flags)
>>>> +              unsigned int flags,
>>>> +              const struct rte_mempool_info *info)
>>>>    {
>>>>        size_t obj_per_page, pg_num, pg_sz;
>>>>        unsigned int mask;
>>>> @@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>>>        if (total_elt_sz == 0)
>>>>            return 0;
>>>>    +    if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
>>>> +        unsigned int align_shift =
>>>> +            rte_bsf32(
>>>> +                rte_align32pow2(total_elt_sz *
>>>> +                        info->cluster_size));
>>>> +        if (pg_shift < align_shift) {
>>>> +            return ((elt_num / info->cluster_size) + 2)
>>>> +                << align_shift;
>>>> +        }
>>>> +    }
>>>> +
>>> +Cc Santosh for this
>>>
>>> To be honest, that was my fear when introducing
>>> MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG: that we
>>> would see more and more driver-specific flags in generic code.
>>>
>>> I feel that the hidden meaning of these flags is really "if driver == foo",
>>> which shows that something is wrong in the current design.
>>>
>>> We have to think about another way to do this. Let me try to propose
>>> something (to be deepened).
>>>
>>> The standard way to create a mempool is:
>>>
>>>     mp = create_empty(...)
>>>     set_ops_by_name(mp, "my-driver")    // optional
>>>     populate_default(mp)                // or populate_*()
>>>     obj_iter(mp, callback, arg)         // optional, to init objects
>>>     // and optional local func to init mempool priv
>>>
>>> First, we can consider deprecating some APIs like:
>>>    - rte_mempool_xmem_create()
>>>    - rte_mempool_xmem_size()
>>>    - rte_mempool_xmem_usage()
>>>    - rte_mempool_populate_iova_tab()
>>>
>>> These functions were introduced for xen, which was recently
>>> removed. They are complex to use, and are not used anywhere else in
>>> DPDK.
>>>
>>> Then, instead of having flags (quite hard to understand without knowing
>>> the underlying driver), we can let the mempool drivers do the
>>> populate_default() operation. For that we can add a populate_default
>>> field in mempool ops. Same for populate_virt(), populate_anon(), and
>>> populate_phys() which can return -ENOTSUP if this is not
>>> implemented/implementable on a specific driver, or if flags
>>> (NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function
>>> pointer is NULL, use the generic function.
>>>
>>> Thanks to this, the generic code would remain understandable and won't
>>> have to care about how memory should be allocated for a specific driver.
>>>
>>> Thoughts?
>> Yes, I agree. This week we'll provide an updated version of the RFC which
>> covers it, including the transition of mempool/octeontx. I think it is
>> sufficient to introduce two new ops:
>>   1. To calculate the memory space required to store a specified number of objects.
>>   2. To populate objects in the provided memory chunk (the op will be called
>>       from rte_mempool_populate_iova(), which is a leaf function for all
>>       rte_mempool_populate_*() calls).
>> It will allow us to avoid duplication and keep memchunk housekeeping inside
>> the mempool library.
>>
> There is also a downside to letting the mempool driver populate, which was raised in another thread.
> http://dpdk.org/dev/patchwork/patch/31943/

I've seen the note about code duplication. Let's discuss it when v2 is sent.
I think our approach minimizes it and allows keeping only driver-specific
code in the callback.

Patch

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d50dba4..43455a3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -239,7 +239,8 @@  rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      unsigned int flags,
+		      const struct rte_mempool_info *info)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 	unsigned int mask;
@@ -252,6 +253,17 @@  rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 	if (total_elt_sz == 0)
 		return 0;
 
+	if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
+		unsigned int align_shift =
+			rte_bsf32(
+				rte_align32pow2(total_elt_sz *
+						info->cluster_size));
+		if (pg_shift < align_shift) {
+			return ((elt_num / info->cluster_size) + 2)
+				<< align_shift;
+		}
+	}
+
 	if (pg_shift == 0)
 		return total_elt_sz * elt_num;
 
@@ -362,6 +374,7 @@  rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	void *opaque)
 {
 	unsigned total_elt_sz;
+	unsigned int page_align_size = 0;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -407,7 +420,11 @@  rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp->flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
+	if (mp->flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
+		page_align_size = rte_align32pow2(total_elt_sz *
+						  mp->info.cluster_size);
+		off = RTE_PTR_ALIGN_CEIL(vaddr, page_align_size) - vaddr;
+	} else if (mp->flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
 		/* align object start address to a multiple of total_elt_sz */
 		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
 	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
@@ -424,6 +441,10 @@  rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
 		off += mp->elt_size + mp->trailer_size;
 		i++;
+		if ((mp->flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) &&
+		    (i % mp->info.cluster_size) == 0)
+			off = RTE_PTR_ALIGN_CEIL((char *)vaddr + off,
+						 page_align_size) - vaddr;
 	}
 
 	/* not enough room to store one object */
@@ -579,6 +600,16 @@  rte_mempool_populate_default(struct rte_mempool *mp)
 	if ((ret < 0) && (ret != -ENOTSUP))
 		return ret;
 
+	ret = rte_mempool_ops_get_info(mp, &mp->info);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+	if (ret == -ENOTSUP)
+		mp->info.cluster_size = 0;
+
+	if ((mp->info.cluster_size == 0) &&
+	    (mp_flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS))
+		return -EINVAL;
+
 	/* update mempool capabilities */
 	mp->flags |= mp_flags;
 
@@ -595,7 +626,7 @@  rte_mempool_populate_default(struct rte_mempool *mp)
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
 		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+					     mp->flags, &mp->info);
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -653,7 +684,7 @@  get_anon_size(const struct rte_mempool *mp)
 	pg_shift = rte_bsf32(pg_sz);
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+				       mp->flags, &mp->info);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 3c59d36..9bcb8b7 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -220,7 +220,10 @@  struct rte_mempool_memhdr {
 /*
  * Additional information about the mempool
  */
-struct rte_mempool_info;
+struct rte_mempool_info {
+	/** Number of objects in a cluster */
+	unsigned int cluster_size;
+};
 
 /**
  * The RTE mempool structure.
@@ -265,6 +268,7 @@  struct rte_mempool {
 	struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
 	uint32_t nb_mem_chunks;          /**< Number of memory chunks */
 	struct rte_mempool_memhdr_list mem_list; /**< List of memory chunks */
+	struct rte_mempool_info info; /**< Additional mempool info */
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 	/** Per-lcore statistics. */
@@ -298,6 +302,17 @@  struct rte_mempool {
 #define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
+ * This capability flag is advertised by a mempool handler. Used for a case
+ * where mempool driver wants clusters of objects start at a power-of-two
+ * boundary
+ *
+ * Note:
+ * - This flag should not be passed by application.
+ *   Flag used for mempool driver only.
+ */
+#define MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS 0x0100
+
+/**
  * @internal When debug is enabled, store some statistics.
  *
  * @param mp
@@ -1605,11 +1620,15 @@  uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
  * @param flags
  *  The mempool flags.
+ * @param info
+ *  A pointer to the mempool's additional info (may be NULL unless
+ *  MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS is set in @arg flags)
  * @return
  *   Required memory size aligned at page boundary.
  */
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
-	uint32_t pg_shift, unsigned int flags);
+			     uint32_t pg_shift, unsigned int flags,
+			     const struct rte_mempool_info *info);
 
 /**
  * Get the size of memory required to store mempool elements.
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 37ead50..f4bb9a9 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -485,7 +485,7 @@  test_mempool_xmem_misc(void)
 	elt_num = MAX_KEEP;
 	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
 	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
+				   0, NULL);
 
 	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
 		MEMPOOL_PG_SHIFT_MAX, 0);