[1/2] mempool: use actual IOVA addresses when populating

Message ID 825d02ef7f7b6ab65a36d9fa4719847228537384.1573739893.git.anatoly.burakov@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series [1/2] mempool: use actual IOVA addresses when populating

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-compilation success Compile Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Burakov, Anatoly Nov. 14, 2019, 1:58 p.m. UTC
  Currently, when mempool is being populated, we get IOVA address
of every segment using rte_mem_virt2iova(). This works for internal
memory, but does not really work for external memory, and does not
work on platforms which return RTE_BAD_IOVA as a result of this
call (such as FreeBSD). Moreover, even when it works, the function
in question will do unnecessary pagewalks in IOVA as PA mode, as
it falls back to rte_mem_virt2phy() instead of just doing a lookup in
internal memseg table.

To fix it, replace the call to first attempt to look through the
internal memseg table (this takes care of internal and external memory),
and fall back to rte_mem_virt2iova() when unable to perform VA->IOVA
translation via memseg table.

Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_mempool/rte_mempool.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
  

Comments

Olivier Matz Nov. 15, 2019, 8:46 a.m. UTC | #1
On Thu, Nov 14, 2019 at 01:58:20PM +0000, Anatoly Burakov wrote:
> Currently, when mempool is being populated, we get IOVA address
> of every segment using rte_mem_virt2iova(). This works for internal
> memory, but does not really work for external memory, and does not
> work on platforms which return RTE_BAD_IOVA as a result of this
> call (such as FreeBSD). Moreover, even when it works, the function
> in question will do unnecessary pagewalks in IOVA as PA mode, as
> it falls back to rte_mem_virt2phy() instead of just doing a lookup in
> internal memseg table.
> 
> To fix it, replace the call to first attempt to look through the
> internal memseg table (this takes care of internal and external memory),
> and fall back to rte_mem_virt2iova() when unable to perform VA->IOVA
> translation via memseg table.
> 
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")

Shouldn't we also add
Fixes: 035ee5bea5ef ("mempool: remove optimistic IOVA-contiguous allocation")
?

From what I understand, even if the problem existed in populate_virt()
before, this is the commit that makes the problem visible in most cases.

> Cc: stable@dpdk.org
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>
  
Burakov, Anatoly Nov. 15, 2019, 10:12 a.m. UTC | #2
On 15-Nov-19 8:46 AM, Olivier Matz wrote:
> On Thu, Nov 14, 2019 at 01:58:20PM +0000, Anatoly Burakov wrote:
>> Currently, when mempool is being populated, we get IOVA address
>> of every segment using rte_mem_virt2iova(). This works for internal
>> memory, but does not really work for external memory, and does not
>> work on platforms which return RTE_BAD_IOVA as a result of this
>> call (such as FreeBSD). Moreover, even when it works, the function
>> in question will do unnecessary pagewalks in IOVA as PA mode, as
>> it falls back to rte_mem_virt2phy() instead of just doing a lookup in
>> internal memseg table.
>>
>> To fix it, replace the call to first attempt to look through the
>> internal memseg table (this takes care of internal and external memory),
>> and fall back to rte_mem_virt2iova() when unable to perform VA->IOVA
>> translation via memseg table.
>>
>> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> 
> Shouldn't we also add
> Fixes: 035ee5bea5ef ("mempool: remove optimistic IOVA-contiguous allocation")
> ?
> 
> From what I understand, even if the problem existed in populate_virt()
> before, this is the commit that makes the problem visible in most cases.

The commit didn't introduce the issue - only revealed it. So the commit 
itself isn't buggy, and there's nothing to fix :) However, I'm not 
opposed to sharing the blame!

> 
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
>
  
Li, WenjieX A Nov. 19, 2019, 1:45 a.m. UTC | #3
Tested-by: Chen, BoX C <box.c.chen@intel.com>

> -----Original Message-----
> From: stable [mailto:stable-bounces@dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Thursday, November 14, 2019 9:58 PM
> To: dev@dpdk.org
> Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; david.marchand@redhat.com; stable@dpdk.org
> Subject: [dpdk-stable] [PATCH 1/2] mempool: use actual IOVA addresses when
> populating
> 
> Currently, when mempool is being populated, we get IOVA address of every
> segment using rte_mem_virt2iova(). This works for internal memory, but does
> not really work for external memory, and does not work on platforms which
> return RTE_BAD_IOVA as a result of this call (such as FreeBSD). Moreover, even
> when it works, the function in question will do unnecessary pagewalks in IOVA
> as PA mode, as it falls back to rte_mem_virt2phy() instead of just doing a lookup
> in internal memseg table.
> 
> To fix it, replace the call to first attempt to look through the internal memseg
> table (this takes care of internal and external memory), and fall back to
> rte_mem_virt2iova() when unable to perform VA->IOVA translation via memseg
> table.
> 
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_mempool/rte_mempool.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 40cae3eb67..8da2e471c7 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -356,6 +356,19 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
>  	return ret;
>  }
> 
> +static rte_iova_t
> +get_iova(void *addr)
> +{
> +	struct rte_memseg *ms;
> +
> +	/* try registered memory first */
> +	ms = rte_mem_virt2memseg(addr, NULL);
> +	if (ms == NULL || ms->iova == RTE_BAD_IOVA)
> +		/* fall back to actual physical address */
> +		return rte_mem_virt2iova(addr);
> +	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
> +}
> +
>  /* Populate the mempool with a virtual area. Return the number of
>   * objects added, or a negative value on error.
>   */
> @@ -375,7 +388,7 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
>  	for (off = 0; off < len &&
>  		     mp->populated_size < mp->size; off += phys_len) {
> 
> -		iova = rte_mem_virt2iova(addr + off);
> +		iova = get_iova(addr + off);
> 
>  		if (iova == RTE_BAD_IOVA && rte_eal_has_hugepages()) {
>  			ret = -EINVAL;
> @@ -391,7 +404,7 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
>  		     phys_len = RTE_MIN(phys_len + pg_sz, len - off)) {
>  			rte_iova_t iova_tmp;
> 
> -			iova_tmp = rte_mem_virt2iova(addr + off + phys_len);
> +			iova_tmp = get_iova(addr + off + phys_len);
> 
>  			if (iova_tmp == RTE_BAD_IOVA ||
>  					iova_tmp != iova + phys_len)
> --
> 2.17.1
  
David Marchand Nov. 19, 2019, 8:56 p.m. UTC | #4
On Thu, Nov 14, 2019 at 2:58 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> Currently, when mempool is being populated, we get IOVA address
> of every segment using rte_mem_virt2iova(). This works for internal
> memory, but does not really work for external memory, and does not
> work on platforms which return RTE_BAD_IOVA as a result of this
> call (such as FreeBSD). Moreover, even when it works, the function
> in question will do unnecessary pagewalks in IOVA as PA mode, as
> it falls back to rte_mem_virt2phy() instead of just doing a lookup in
> internal memseg table.
>
> To fix it, replace the call to first attempt to look through the
> internal memseg table (this takes care of internal and external memory),
> and fall back to rte_mem_virt2iova() when unable to perform VA->IOVA
> translation via memseg table.
>
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Cc: stable@dpdk.org
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>

Series applied, thanks.

--
David Marchand
  

Patch

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 40cae3eb67..8da2e471c7 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -356,6 +356,19 @@  rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	return ret;
 }
 
+static rte_iova_t
+get_iova(void *addr)
+{
+	struct rte_memseg *ms;
+
+	/* try registered memory first */
+	ms = rte_mem_virt2memseg(addr, NULL);
+	if (ms == NULL || ms->iova == RTE_BAD_IOVA)
+		/* fall back to actual physical address */
+		return rte_mem_virt2iova(addr);
+	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
+}
+
 /* Populate the mempool with a virtual area. Return the number of
  * objects added, or a negative value on error.
  */
@@ -375,7 +388,7 @@  rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
 	for (off = 0; off < len &&
 		     mp->populated_size < mp->size; off += phys_len) {
 
-		iova = rte_mem_virt2iova(addr + off);
+		iova = get_iova(addr + off);
 
 		if (iova == RTE_BAD_IOVA && rte_eal_has_hugepages()) {
 			ret = -EINVAL;
@@ -391,7 +404,7 @@  rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
 		     phys_len = RTE_MIN(phys_len + pg_sz, len - off)) {
 			rte_iova_t iova_tmp;
 
-			iova_tmp = rte_mem_virt2iova(addr + off + phys_len);
+			iova_tmp = get_iova(addr + off + phys_len);
 
 			if (iova_tmp == RTE_BAD_IOVA ||
 					iova_tmp != iova + phys_len)