[RFC,4/4] mempool: prevent objects from being across pages

Message ID 20190719133845.32432-5-olivier.matz@6wind.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series mempool: avoid objects allocations across pages

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Olivier Matz July 19, 2019, 1:38 p.m. UTC
  When using IOVA-contiguous memory and objects smaller than the page size,
ensure that objects are not located across several pages.

Signed-off-by: Vamsi Krishna Attunuru <vattunuru@marvell.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mempool/rte_mempool_ops_default.c | 39 ++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)
  

Comments

Burakov, Anatoly July 19, 2019, 2:03 p.m. UTC | #1
On 19-Jul-19 2:38 PM, Olivier Matz wrote:
> When using IOVA-contiguous memory and objects smaller than the page size,
> ensure that objects are not located across several pages.
> 
> Signed-off-by: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>   lib/librte_mempool/rte_mempool_ops_default.c | 39 ++++++++++++++++++++++++++--
>   1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> index 4e2bfc82d..2bbd67367 100644
> --- a/lib/librte_mempool/rte_mempool_ops_default.c
> +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> @@ -45,19 +45,54 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
>   	return mem_size;
>   }
>   
> +/* Returns -1 if object falls on a page boundary, else returns 0 */
> +static inline int
> +mempool_check_obj_bounds(void *obj, uint64_t pg_sz, size_t elt_sz)
> +{
> +	uintptr_t page_end, elt_addr = (uintptr_t)obj;
> +	uint32_t pg_shift;
> +	uint64_t page_mask;
> +
> +	if (pg_sz == 0)
> +		return 0;
> +	if (elt_sz > pg_sz)
> +		return 0;
> +
> +	pg_shift = rte_bsf32(pg_sz);
> +	page_mask =  ~((1ull << pg_shift) - 1);
> +	page_end = (elt_addr & page_mask) + pg_sz;

This looks like RTE_PTR_ALIGN should do this without the magic? E.g.

page_end = RTE_PTR_ALIGN(elt_addr, pg_sz)

would that not be equivalent?
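
For reference, a minimal sketch of the same check written with the
rte_common.h pointer macros (illustrative only, not the code that was
eventually merged). The FLOOR variant is used so that an object starting
exactly on a page boundary keeps the semantics of the original check;
RTE_PTR_ALIGN() aligns upwards and would return the object address itself
in that case.

#include <stdint.h>
#include <stddef.h>
#include <rte_common.h>

/* Same check as mempool_check_obj_bounds(), without the open-coded
 * shift/mask arithmetic. Returns -1 if the object crosses a page
 * boundary, else 0. */
static inline int
obj_crosses_page(void *obj, uint64_t pg_sz, size_t elt_sz)
{
	char *page_end;

	if (pg_sz == 0 || elt_sz > pg_sz)
		return 0;

	page_end = RTE_PTR_ADD(RTE_PTR_ALIGN_FLOOR(obj, pg_sz), pg_sz);
	return ((char *)obj + elt_sz > page_end) ? -1 : 0;
}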
  
Burakov, Anatoly July 19, 2019, 2:11 p.m. UTC | #2
On 19-Jul-19 2:38 PM, Olivier Matz wrote:
> When using IOVA-contiguous memory and objects smaller than the page size,
> ensure that objects are not located across several pages.
> 
> Signed-off-by: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---

<snip>

>   	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>   
> -	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
> +	for (off = 0, i = 0; i < max_objs; i++) {
> +		/* align offset to next page start if required */
> +		if (mempool_check_obj_bounds((char *)vaddr + off,
> +						pg_sz, total_elt_sz) < 0) {
> +			off += RTE_PTR_ALIGN_CEIL((char *)vaddr + off, pg_sz) -
> +				((char *)vaddr + off);

Same here, PTR_ALIGN plus PTR_SUB. It's perhaps not as cool as a 
one-liner, but this is not a hot path :)
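
For illustration, a sketch of that offset bump written with the existing
helpers (RTE_PTR_DIFF stands in for the PTR_SUB mentioned above, since the
result here is an offset rather than a pointer; this is a sketch, not
necessarily what the next revision does):

#include <stddef.h>
#include <rte_common.h>

/* Return 'off' advanced to the next pg_sz boundary of the chunk starting
 * at vaddr; unchanged if the current position is already page-aligned. */
static inline size_t
align_off_to_next_page(void *vaddr, size_t off, size_t pg_sz)
{
	void *cur = RTE_PTR_ADD(vaddr, off);

	return off + RTE_PTR_DIFF(RTE_PTR_ALIGN_CEIL(cur, pg_sz), cur);
}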
  
Andrew Rybchenko Aug. 7, 2019, 3:21 p.m. UTC | #3
On 7/19/19 4:38 PM, Olivier Matz wrote:
> When using IOVA-contiguous memory and objects smaller than the page size,
> ensure that objects are not located across several pages.

It looks like an attempt to turn an exception into the generic rule, and
I think it is not a good idea.

mempool has a notion of IOVA-contiguous/non-contiguous objects,
depending on whether IOVA is PA or VA. rte_mempool_op_populate_default()
gets a memory chunk which is contiguous in VA and, if the IOVA is valid,
IOVA-contiguous as well. The patch always enforces page boundaries even
when it is not required. For example, if the memory chunk is IOVA=PA
contiguous, the patch could result in holes and extra memory usage.

> Signed-off-by: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>   lib/librte_mempool/rte_mempool_ops_default.c | 39 ++++++++++++++++++++++++++--
>   1 file changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> index 4e2bfc82d..2bbd67367 100644
> --- a/lib/librte_mempool/rte_mempool_ops_default.c
> +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> @@ -45,19 +45,54 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
>   	return mem_size;
>   }
>   
> +/* Returns -1 if object falls on a page boundary, else returns 0 */
> +static inline int
> +mempool_check_obj_bounds(void *obj, uint64_t pg_sz, size_t elt_sz)
> +{
> +	uintptr_t page_end, elt_addr = (uintptr_t)obj;
> +	uint32_t pg_shift;
> +	uint64_t page_mask;
> +
> +	if (pg_sz == 0)
> +		return 0;
> +	if (elt_sz > pg_sz)
> +		return 0;
> +
> +	pg_shift = rte_bsf32(pg_sz);
> +	page_mask =  ~((1ull << pg_shift) - 1);
> +	page_end = (elt_addr & page_mask) + pg_sz;
> +
> +	if (elt_addr + elt_sz > page_end)
> +		return -1;
> +
> +	return 0;
> +}
> +
>   int
>   rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
>   		void *vaddr, rte_iova_t iova, size_t len,
>   		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
>   {
> -	size_t total_elt_sz;
> +	size_t total_elt_sz, pg_sz;
>   	size_t off;
>   	unsigned int i;
>   	void *obj;
>   
> +	rte_mempool_get_page_size(mp, &pg_sz);
> +
>   	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>   
> -	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
> +	for (off = 0, i = 0; i < max_objs; i++) {
> +		/* align offset to next page start if required */
> +		if (mempool_check_obj_bounds((char *)vaddr + off,
> +						pg_sz, total_elt_sz) < 0) {
> +			off += RTE_PTR_ALIGN_CEIL((char *)vaddr + off, pg_sz) -
> +				((char *)vaddr + off);
> +		}
> +
> +		if (off + total_elt_sz > len)
> +			break;
> +
>   		off += mp->header_size;
>   		obj = (char *)vaddr + off;
>   		obj_cb(mp, obj_cb_arg, obj,
  
Olivier Matz Oct. 28, 2019, 2:07 p.m. UTC | #4
On Wed, Aug 07, 2019 at 06:21:58PM +0300, Andrew Rybchenko wrote:
> On 7/19/19 4:38 PM, Olivier Matz wrote:
> > When using IOVA-contiguous memory and objects smaller than the page size,
> > ensure that objects are not located across several pages.
> 
> It looks like an attempt to turn an exception into the generic rule, and
> I think it is not a good idea.
> 
> mempool has a notion of IOVA-contiguous/non-contiguous objects,
> depending on whether IOVA is PA or VA. rte_mempool_op_populate_default()
> gets a memory chunk which is contiguous in VA and, if the IOVA is valid,
> IOVA-contiguous as well. The patch always enforces page boundaries even
> when it is not required. For example, if the memory chunk is IOVA=PA
> contiguous, the patch could result in holes and extra memory usage.

Yes, it may increase memory usage, but the amount should be limited.
On the other hand, the new patchset provides enhancements that will
reduce the memory consumption.
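
For illustration, a back-of-the-envelope check of that bound (the element
size below is hypothetical, not taken from the thread): the padding added
at the end of each page is at most one object's worth, so it is negligible
with hugepages and only becomes significant with small pages and large
objects.

#include <stdio.h>
#include <stddef.h>

/* Worst-case padding per page introduced by the page-boundary rule:
 * pg_sz % total_elt_sz bytes, i.e. less than one object. */
int main(void)
{
	const size_t pg_sz[] = { 4096, 2 * 1024 * 1024 };
	const size_t total_elt_sz = 2176; /* hypothetical element size */
	size_t i;

	for (i = 0; i < 2; i++) {
		size_t objs = pg_sz[i] / total_elt_sz;
		size_t pad = pg_sz[i] % total_elt_sz;

		printf("pg_sz=%zu: %zu objs/page, %zu B padding (%.2f%%)\n",
		       pg_sz[i], objs, pad, 100.0 * pad / pg_sz[i]);
	}
	return 0;
}

With those numbers this prints about 46.88% padding for 4 KiB pages and
about 0.08% for 2 MiB hugepages.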

More importantly, it will fix the KNI + IOVA=VA issue. I also wonder
whether this problem couldn't happen in the IOVA=PA case as well. Are
there any guarantees that, on all architectures, a PA-contiguous area is
always VA-contiguous in the kernel?



> 
> > Signed-off-by: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >   lib/librte_mempool/rte_mempool_ops_default.c | 39 ++++++++++++++++++++++++++--
> >   1 file changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> > index 4e2bfc82d..2bbd67367 100644
> > --- a/lib/librte_mempool/rte_mempool_ops_default.c
> > +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> > @@ -45,19 +45,54 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
> >   	return mem_size;
> >   }
> > +/* Returns -1 if object falls on a page boundary, else returns 0 */
> > +static inline int
> > +mempool_check_obj_bounds(void *obj, uint64_t pg_sz, size_t elt_sz)
> > +{
> > +	uintptr_t page_end, elt_addr = (uintptr_t)obj;
> > +	uint32_t pg_shift;
> > +	uint64_t page_mask;
> > +
> > +	if (pg_sz == 0)
> > +		return 0;
> > +	if (elt_sz > pg_sz)
> > +		return 0;
> > +
> > +	pg_shift = rte_bsf32(pg_sz);
> > +	page_mask =  ~((1ull << pg_shift) - 1);
> > +	page_end = (elt_addr & page_mask) + pg_sz;
> > +
> > +	if (elt_addr + elt_sz > page_end)
> > +		return -1;
> > +
> > +	return 0;
> > +}
> > +
> >   int
> >   rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
> >   		void *vaddr, rte_iova_t iova, size_t len,
> >   		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
> >   {
> > -	size_t total_elt_sz;
> > +	size_t total_elt_sz, pg_sz;
> >   	size_t off;
> >   	unsigned int i;
> >   	void *obj;
> > +	rte_mempool_get_page_size(mp, &pg_sz);
> > +
> >   	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> > -	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
> > +	for (off = 0, i = 0; i < max_objs; i++) {
> > +		/* align offset to next page start if required */
> > +		if (mempool_check_obj_bounds((char *)vaddr + off,
> > +						pg_sz, total_elt_sz) < 0) {
> > +			off += RTE_PTR_ALIGN_CEIL((char *)vaddr + off, pg_sz) -
> > +				((char *)vaddr + off);
> > +		}
> > +
> > +		if (off + total_elt_sz > len)
> > +			break;
> > +
> >   		off += mp->header_size;
> >   		obj = (char *)vaddr + off;
> >   		obj_cb(mp, obj_cb_arg, obj,
>
  
Olivier Matz Oct. 28, 2019, 2:07 p.m. UTC | #5
Hi Anatoly,

On Fri, Jul 19, 2019 at 03:03:29PM +0100, Burakov, Anatoly wrote:
> On 19-Jul-19 2:38 PM, Olivier Matz wrote:
> > When using IOVA-contiguous memory and objects smaller than the page size,
> > ensure that objects are not located across several pages.
> > 
> > Signed-off-by: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >   lib/librte_mempool/rte_mempool_ops_default.c | 39 ++++++++++++++++++++++++++--
> >   1 file changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> > index 4e2bfc82d..2bbd67367 100644
> > --- a/lib/librte_mempool/rte_mempool_ops_default.c
> > +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> > @@ -45,19 +45,54 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
> >   	return mem_size;
> >   }
> > +/* Returns -1 if object falls on a page boundary, else returns 0 */
> > +static inline int
> > +mempool_check_obj_bounds(void *obj, uint64_t pg_sz, size_t elt_sz)
> > +{
> > +	uintptr_t page_end, elt_addr = (uintptr_t)obj;
> > +	uint32_t pg_shift;
> > +	uint64_t page_mask;
> > +
> > +	if (pg_sz == 0)
> > +		return 0;
> > +	if (elt_sz > pg_sz)
> > +		return 0;
> > +
> > +	pg_shift = rte_bsf32(pg_sz);
> > +	page_mask =  ~((1ull << pg_shift) - 1);
> > +	page_end = (elt_addr & page_mask) + pg_sz;
> 
> This looks like RTE_PTR_ALIGN should do this without the magic? E.g.
> 
> page_end = RTE_PTR_ALIGN(elt_addr, pg_sz)
> 
> would that not be equivalent?

Yes, I simplified this part in the new version, thanks.

Olivier
  
Andrew Rybchenko Oct. 29, 2019, 11:03 a.m. UTC | #6
On 10/28/19 5:07 PM, Olivier Matz wrote:
> On Wed, Aug 07, 2019 at 06:21:58PM +0300, Andrew Rybchenko wrote:
>> On 7/19/19 4:38 PM, Olivier Matz wrote:
>>> When using IOVA-contiguous memory and objects smaller than the page size,
>>> ensure that objects are not located across several pages.
>> It looks like an attempt to turn an exception into the generic rule, and
>> I think it is not a good idea.
>>
>> mempool has a notion of IOVA-contiguous/non-contiguous objects,
>> depending on whether IOVA is PA or VA. rte_mempool_op_populate_default()
>> gets a memory chunk which is contiguous in VA and, if the IOVA is valid,
>> IOVA-contiguous as well. The patch always enforces page boundaries even
>> when it is not required. For example, if the memory chunk is IOVA=PA
>> contiguous, the patch could result in holes and extra memory usage.
> Yes, it may increase memory usage, but the amount should be limited.
> On the other hand, the new patchset provides enhancements that will
> reduce the memory consumption.
>
> More importantly, it will fix the KNI + IOVA=VA issue. I also wonder
> whether this problem couldn't happen in the IOVA=PA case as well. Are
> there any guarantees that, on all architectures, a PA-contiguous area is
> always VA-contiguous in the kernel?

I have no idea; the question is really interesting. I hope it is true,
otherwise we're in trouble, since it is assumed in the code as far as
I know.
  

Patch

diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 4e2bfc82d..2bbd67367 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -45,19 +45,54 @@  rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 	return mem_size;
 }
 
+/* Returns -1 if object falls on a page boundary, else returns 0 */
+static inline int
+mempool_check_obj_bounds(void *obj, uint64_t pg_sz, size_t elt_sz)
+{
+	uintptr_t page_end, elt_addr = (uintptr_t)obj;
+	uint32_t pg_shift;
+	uint64_t page_mask;
+
+	if (pg_sz == 0)
+		return 0;
+	if (elt_sz > pg_sz)
+		return 0;
+
+	pg_shift = rte_bsf32(pg_sz);
+	page_mask =  ~((1ull << pg_shift) - 1);
+	page_end = (elt_addr & page_mask) + pg_sz;
+
+	if (elt_addr + elt_sz > page_end)
+		return -1;
+
+	return 0;
+}
+
 int
 rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
 		void *vaddr, rte_iova_t iova, size_t len,
 		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
 {
-	size_t total_elt_sz;
+	size_t total_elt_sz, pg_sz;
 	size_t off;
 	unsigned int i;
 	void *obj;
 
+	rte_mempool_get_page_size(mp, &pg_sz);
+
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+	for (off = 0, i = 0; i < max_objs; i++) {
+		/* align offset to next page start if required */
+		if (mempool_check_obj_bounds((char *)vaddr + off,
+						pg_sz, total_elt_sz) < 0) {
+			off += RTE_PTR_ALIGN_CEIL((char *)vaddr + off, pg_sz) -
+				((char *)vaddr + off);
+		}
+
+		if (off + total_elt_sz > len)
+			break;
+
 		off += mp->header_size;
 		obj = (char *)vaddr + off;
 		obj_cb(mp, obj_cb_arg, obj,
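
For context, a sketch of the kind of simplification mentioned earlier in
the thread, where the test becomes "the first and last byte of the object
must land on the same page" (illustrative only; the actual follow-up
series may differ):

#include <stddef.h>
#include <rte_common.h>

/* Returns -1 if the object crosses a page boundary, else returns 0. */
static inline int
obj_crosses_page_boundary(void *obj, size_t pg_sz, size_t elt_sz)
{
	if (pg_sz == 0 || elt_sz > pg_sz)
		return 0;

	if (RTE_PTR_ALIGN_FLOOR(obj, pg_sz) !=
	    RTE_PTR_ALIGN_FLOOR(RTE_PTR_ADD(obj, elt_sz - 1), pg_sz))
		return -1;

	return 0;
}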