vfio: map contiguous areas in one go

Message ID c6550c8e66dbc7a54a0b495ecda58b0eb79c07ca.1582637079.git.anatoly.burakov@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series vfio: map contiguous areas in one go |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/travis-robot success Travis build: passed

Commit Message

Burakov, Anatoly Feb. 25, 2020, 1:24 p.m. UTC
  Currently, when we are creating DMA mappings for memory that's
either external or is backed by hugepages in IOVA as PA mode, we
assume that each page is necessarily discontiguous. This may not
actually be the case, especially for external memory, where the
user is able to create their own IOVA table and make it
contiguous. This is a problem because VFIO has a limited number
of DMA mappings, and it does not appear to concatenate them and
treats each mapping as separate, even when they cover adjacent
areas.

Fix this so that we always map contiguous memory in a single
chunk, as opposed to mapping each segment separately.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linux/eal/eal_vfio.c | 59 +++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 8 deletions(-)
  

Comments

Ray Kinsella Feb. 25, 2020, 1:49 p.m. UTC | #1
Hi Anatoly,

On 25/02/2020 13:24, Anatoly Burakov wrote:
> Currently, when we are creating DMA mappings for memory that's
> either external or is backed by hugepages in IOVA as PA mode, we
> assume that each page is necessarily discontiguous. This may not
> actually be the case, especially for external memory, where the
> user is able to create their own IOVA table and make it
> contiguous. This is a problem because VFIO has a limited number
> of DMA mappings, and it does not appear to concatenate them and
> treats each mapping as separate, even when they cover adjacent
> areas.
> > Fix this so that we always map contiguous memory in a single
> chunk, as opposed to mapping each segment separately.

Can I confirm my understanding.

We are essentially correcting user errant behavior,
trading off startup/mapping time to save IOMMU resources?

> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/linux/eal/eal_vfio.c | 59 +++++++++++++++++++++++++----
>  1 file changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
> index 01b5ef3f42..4502aefed3 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
> @@ -514,9 +514,11 @@ static void
>  vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>  		void *arg __rte_unused)
>  {
> +	rte_iova_t iova_start, iova_expected;
>  	struct rte_memseg_list *msl;
>  	struct rte_memseg *ms;
>  	size_t cur_len = 0;
> +	uint64_t va_start;
>  
>  	msl = rte_mem_virt2memseg_list(addr);
>  
> @@ -545,22 +547,63 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>  #endif
>  	/* memsegs are contiguous in memory */
>  	ms = rte_mem_virt2memseg(addr, msl);
> +
> +	/*
> +	 * This memory is not guaranteed to be contiguous, but it still could
> +	 * be, or it could have some small contiguous chunks. Since the number
> +	 * of VFIO mappings is limited, and VFIO appears to not concatenate
> +	 * adjacent mappings, we have to do this ourselves.
> +	 *
> +	 * So, find contiguous chunks, then map them.
> +	 */
> +	va_start = ms->addr_64;
> +	iova_start = iova_expected = ms->iova;
>  	while (cur_len < len) {
> +		bool new_contig_area = ms->iova != iova_expected;
> +		bool last_seg = (len - cur_len) == ms->len;
> +		bool skip_last = false;
> +
> +		/* only do mappings when current contiguous area ends */
> +		if (new_contig_area) {
> +			if (type == RTE_MEM_EVENT_ALLOC)
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 1);
> +			else
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 0);
> +			va_start = ms->addr_64;
> +			iova_start = ms->iova;
> +		}
>  		/* some memory segments may have invalid IOVA */
>  		if (ms->iova == RTE_BAD_IOVA) {
>  			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
>  					ms->addr);
> -			goto next;
> +			skip_last = true;
>  		}
> -		if (type == RTE_MEM_EVENT_ALLOC)
> -			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
> -					ms->iova, ms->len, 1);
> -		else
> -			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
> -					ms->iova, ms->len, 0);
> -next:
> +		iova_expected = ms->iova + ms->len;
>  		cur_len += ms->len;
>  		++ms;
> +
> +		/*
> +		 * don't count previous segment, and don't attempt to
> +		 * dereference a potentially invalid pointer.
> +		 */
> +		if (skip_last && !last_seg) {
> +			iova_expected = iova_start = ms->iova;
> +			va_start = ms->addr_64;
> +		} else if (!skip_last && last_seg) {
> +			/* this is the last segment and we're not skipping */
> +			if (type == RTE_MEM_EVENT_ALLOC)
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 1);
> +			else
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 0);
> +		}
>  	}
>  #ifdef RTE_ARCH_PPC_64
>  	cur_len = 0;
>
  
Burakov, Anatoly Feb. 25, 2020, 2:21 p.m. UTC | #2
On 25-Feb-20 1:49 PM, Ray Kinsella wrote:
> Hi Anatoly,
> 
> On 25/02/2020 13:24, Anatoly Burakov wrote:
>> Currently, when we are creating DMA mappings for memory that's
>> either external or is backed by hugepages in IOVA as PA mode, we
>> assume that each page is necessarily discontiguous. This may not
>> actually be the case, especially for external memory, where the
>> user is able to create their own IOVA table and make it
>> contiguous. This is a problem because VFIO has a limited number
>> of DMA mappings, and it does not appear to concatenate them and
>> treats each mapping as separate, even when they cover adjacent
>> areas.
>>> Fix this so that we always map contiguous memory in a single
>> chunk, as opposed to mapping each segment separately.
> 
> Can I confirm my understanding.
> 
> We are essentially correcting user errant behavior,
> trading off startup/mapping time to save IOMMU resources?
> 

That's not quite what we're doing.

First of all, in terms of "trading startup/mapping time", i think this 
will actually be faster because the DMA map is the more 
resource-intensive part of this loop by far, and we're doing _less_ of 
those (because we're concatenating). We're also doing the same number of 
loop iterations as before.

To be perfectly clear: we're not reordering segments here - segments 
have to be VA- and IOVA-contiguous in the first place, otherwise we're 
breaking them up. It's just that previously, we were also breaking up 
contiguous segments into separate, per-page mappings, but now we 
concatenate them.
  
David Marchand March 27, 2020, 10:13 a.m. UTC | #3
On Tue, Feb 25, 2020 at 2:25 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> Currently, when we are creating DMA mappings for memory that's
> either external or is backed by hugepages in IOVA as PA mode, we
> assume that each page is necessarily discontiguous. This may not
> actually be the case, especially for external memory, where the
> user is able to create their own IOVA table and make it
> contiguous. This is a problem because VFIO has a limited number
> of DMA mappings, and it does not appear to concatenate them and
> treats each mapping as separate, even when they cover adjacent
> areas.
>
> Fix this so that we always map contiguous memory in a single
> chunk, as opposed to mapping each segment separately.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks.
  

Patch

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 01b5ef3f42..4502aefed3 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -514,9 +514,11 @@  static void
 vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 		void *arg __rte_unused)
 {
+	rte_iova_t iova_start, iova_expected;
 	struct rte_memseg_list *msl;
 	struct rte_memseg *ms;
 	size_t cur_len = 0;
+	uint64_t va_start;
 
 	msl = rte_mem_virt2memseg_list(addr);
 
@@ -545,22 +547,63 @@  vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 #endif
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
+
+	/*
+	 * This memory is not guaranteed to be contiguous, but it still could
+	 * be, or it could have some small contiguous chunks. Since the number
+	 * of VFIO mappings is limited, and VFIO appears to not concatenate
+	 * adjacent mappings, we have to do this ourselves.
+	 *
+	 * So, find contiguous chunks, then map them.
+	 */
+	va_start = ms->addr_64;
+	iova_start = iova_expected = ms->iova;
 	while (cur_len < len) {
+		bool new_contig_area = ms->iova != iova_expected;
+		bool last_seg = (len - cur_len) == ms->len;
+		bool skip_last = false;
+
+		/* only do mappings when current contiguous area ends */
+		if (new_contig_area) {
+			if (type == RTE_MEM_EVENT_ALLOC)
+				vfio_dma_mem_map(default_vfio_cfg, va_start,
+						iova_start,
+						iova_expected - iova_start, 1);
+			else
+				vfio_dma_mem_map(default_vfio_cfg, va_start,
+						iova_start,
+						iova_expected - iova_start, 0);
+			va_start = ms->addr_64;
+			iova_start = ms->iova;
+		}
 		/* some memory segments may have invalid IOVA */
 		if (ms->iova == RTE_BAD_IOVA) {
 			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
 					ms->addr);
-			goto next;
+			skip_last = true;
 		}
-		if (type == RTE_MEM_EVENT_ALLOC)
-			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
-					ms->iova, ms->len, 1);
-		else
-			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
-					ms->iova, ms->len, 0);
-next:
+		iova_expected = ms->iova + ms->len;
 		cur_len += ms->len;
 		++ms;
+
+		/*
+		 * don't count previous segment, and don't attempt to
+		 * dereference a potentially invalid pointer.
+		 */
+		if (skip_last && !last_seg) {
+			iova_expected = iova_start = ms->iova;
+			va_start = ms->addr_64;
+		} else if (!skip_last && last_seg) {
+			/* this is the last segment and we're not skipping */
+			if (type == RTE_MEM_EVENT_ALLOC)
+				vfio_dma_mem_map(default_vfio_cfg, va_start,
+						iova_start,
+						iova_expected - iova_start, 1);
+			else
+				vfio_dma_mem_map(default_vfio_cfg, va_start,
+						iova_start,
+						iova_expected - iova_start, 0);
+		}
 	}
 #ifdef RTE_ARCH_PPC_64
 	cur_len = 0;