[v2,2/3] vfio: fix DMA mapping granularity for type1 iova as va

Message ID 20201105090423.11954-3-ndabilpuram@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series fix issue with partial DMA unmap

Checks

Context        Check     Description
ci/checkpatch  success   coding style OK

Commit Message

Nithin Dabilpuram Nov. 5, 2020, 9:04 a.m. UTC
  Partial unmapping is not supported by the kernel for VFIO IOMMU
type1. Even though the unmap ioctl returns zero, the unmapped size
it reports can be smaller than what was requested. So check the
reported unmap size and return an error if it does not match.

For IOVA as PA, DMA mapping is already done at memseg size
granularity. Do the same for IOVA as VA mode: since DMA map/unmap
is triggered by heap allocations, keep the granularity at the
memseg page size so that heap expansion and contraction do not hit
this issue.

For user-requested DMA map/unmap, disallow partial unmapping for
VFIO type1.

Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: anatoly.burakov@intel.com
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------
 lib/librte_eal/linux/eal_vfio.h |  1 +
 2 files changed, 29 insertions(+), 6 deletions(-)
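
To illustrate the first point outside of EAL: VFIO_IOMMU_UNMAP_DMA writes
the number of bytes it actually unmapped back into the ioctl argument, so
on type1 a request that would split an existing mapping can return 0 while
removing less than was asked for. A minimal standalone sketch of the check
the patch adds (type1_dma_unmap_checked is a made-up helper name, not part
of the patch):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Unmap [iova, iova + len) from a VFIO container and verify that the
 * kernel removed the whole range, not just a part of it.
 */
static int
type1_dma_unmap_checked(int container_fd, uint64_t iova, uint64_t len)
{
	struct vfio_iommu_type1_dma_unmap dma_unmap;

	memset(&dma_unmap, 0, sizeof(dma_unmap));
	dma_unmap.argsz = sizeof(dma_unmap);
	dma_unmap.iova = iova;
	dma_unmap.size = len;

	if (ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &dma_unmap) != 0) {
		fprintf(stderr, "cannot clear DMA remapping: %s\n",
			strerror(errno));
		return -1;
	}
	/* The kernel reports the size actually unmapped back in .size. */
	if (dma_unmap.size != len) {
		fprintf(stderr, "unmapped %llu bytes instead of %llu\n",
			(unsigned long long)dma_unmap.size,
			(unsigned long long)len);
		return -1;
	}
	return 0;
}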
  

Comments

Burakov, Anatoly Nov. 10, 2020, 2:04 p.m. UTC | #1
On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote:
> Partial unmapping is not supported for VFIO IOMMU type1
> by kernel. Though kernel gives return as zero, the unmapped size
> returned will not be same as expected. So check for
> returned unmap size and return error.
> 
> For IOVA as PA, DMA mapping is already at memseg size
> granularity. Do the same even for IOVA as VA mode as
> DMA map/unmap triggered by heap allocations,
> maintain granularity of memseg page size so that heap
> expansion and contraction does not have this issue.
> 
> For user requested DMA map/unmap disallow partial unmapping
> for VFIO type1.
> 
> Fixes: 73a639085938 ("vfio: allow to map other memory regions")
> Cc: anatoly.burakov@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> ---

Maybe i just didn't have enough coffee today, but i still don't see why 
this "partial unmap" thing exists.

We are already mapping the addresses page-by-page, so surely "partial" 
unmaps can't even exist in the first place?
  
Burakov, Anatoly Nov. 10, 2020, 2:17 p.m. UTC | #2
On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote:
> Partial unmapping is not supported for VFIO IOMMU type1
> by kernel. Though kernel gives return as zero, the unmapped size
> returned will not be same as expected. So check for
> returned unmap size and return error.
> 
> For IOVA as PA, DMA mapping is already at memseg size
> granularity. Do the same even for IOVA as VA mode as
> DMA map/unmap triggered by heap allocations,
> maintain granularity of memseg page size so that heap
> expansion and contraction does not have this issue.
> 
> For user requested DMA map/unmap disallow partial unmapping
> for VFIO type1.
> 
> Fixes: 73a639085938 ("vfio: allow to map other memory regions")
> Cc: anatoly.burakov@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> ---

<snip>

> @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>   	/* for IOVA as VA mode, no need to care for IOVA addresses */
>   	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
>   		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
> -		if (type == RTE_MEM_EVENT_ALLOC)
> -			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
> -					len, 1);
> -		else
> -			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
> -					len, 0);
> +		uint64_t page_sz = msl->page_sz;
> +
> +		/* Maintain granularity of DMA map/unmap to memseg size */
> +		for (; cur_len < len; cur_len += page_sz) {
> +			if (type == RTE_MEM_EVENT_ALLOC)
> +				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
> +						 vfio_va, page_sz, 1);
> +			else
> +				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
> +						 vfio_va, page_sz, 0);

I think you're mapping the same address here, over and over. Perhaps you 
meant `vfio_va + cur_len` for the mapping addresses?
  
Burakov, Anatoly Nov. 10, 2020, 2:22 p.m. UTC | #3
On 10-Nov-20 2:04 PM, Burakov, Anatoly wrote:
> On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote:
>> Partial unmapping is not supported for VFIO IOMMU type1
>> by kernel. Though kernel gives return as zero, the unmapped size
>> returned will not be same as expected. So check for
>> returned unmap size and return error.
>>
>> For IOVA as PA, DMA mapping is already at memseg size
>> granularity. Do the same even for IOVA as VA mode as
>> DMA map/unmap triggered by heap allocations,
>> maintain granularity of memseg page size so that heap
>> expansion and contraction does not have this issue.
>>
>> For user requested DMA map/unmap disallow partial unmapping
>> for VFIO type1.
>>
>> Fixes: 73a639085938 ("vfio: allow to map other memory regions")
>> Cc: anatoly.burakov@intel.com
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
>> ---
> 
> Maybe i just didn't have enough coffee today, but i still don't see why 
> this "partial unmap" thing exists.

Oh, right, this is for *user* mapped memory. Disregard this email.

> 
> We are already mapping the addresses page-by-page, so surely "partial" 
> unmaps can't even exist in the first place?
>
  
Nithin Dabilpuram Nov. 11, 2020, 5:08 a.m. UTC | #4
On Tue, Nov 10, 2020 at 02:17:39PM +0000, Burakov, Anatoly wrote:
> On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote:
> > Partial unmapping is not supported for VFIO IOMMU type1
> > by kernel. Though kernel gives return as zero, the unmapped size
> > returned will not be same as expected. So check for
> > returned unmap size and return error.
> > 
> > For IOVA as PA, DMA mapping is already at memseg size
> > granularity. Do the same even for IOVA as VA mode as
> > DMA map/unmap triggered by heap allocations,
> > maintain granularity of memseg page size so that heap
> > expansion and contraction does not have this issue.
> > 
> > For user requested DMA map/unmap disallow partial unmapping
> > for VFIO type1.
> > 
> > Fixes: 73a639085938 ("vfio: allow to map other memory regions")
> > Cc: anatoly.burakov@intel.com
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > ---
> 
> <snip>
> 
> > @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
> >   	/* for IOVA as VA mode, no need to care for IOVA addresses */
> >   	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
> >   		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
> > -		if (type == RTE_MEM_EVENT_ALLOC)
> > -			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
> > -					len, 1);
> > -		else
> > -			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
> > -					len, 0);
> > +		uint64_t page_sz = msl->page_sz;
> > +
> > +		/* Maintain granularity of DMA map/unmap to memseg size */
> > +		for (; cur_len < len; cur_len += page_sz) {
> > +			if (type == RTE_MEM_EVENT_ALLOC)
> > +				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
> > +						 vfio_va, page_sz, 1);
> > +			else
> > +				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
> > +						 vfio_va, page_sz, 0);
> 
> I think you're mapping the same address here, over and over. Perhaps you
> meant `vfio_va + cur_len` for the mapping addresses?

There is a 'vfio_va += page_sz;' on the next line, right?
> 
> -- 
> Thanks,
> Anatoly
  
Burakov, Anatoly Nov. 11, 2020, 10 a.m. UTC | #5
On 11-Nov-20 5:08 AM, Nithin Dabilpuram wrote:
> On Tue, Nov 10, 2020 at 02:17:39PM +0000, Burakov, Anatoly wrote:
>> On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote:
>>> Partial unmapping is not supported for VFIO IOMMU type1
>>> by kernel. Though kernel gives return as zero, the unmapped size
>>> returned will not be same as expected. So check for
>>> returned unmap size and return error.
>>>
>>> For IOVA as PA, DMA mapping is already at memseg size
>>> granularity. Do the same even for IOVA as VA mode as
>>> DMA map/unmap triggered by heap allocations,
>>> maintain granularity of memseg page size so that heap
>>> expansion and contraction does not have this issue.
>>>
>>> For user requested DMA map/unmap disallow partial unmapping
>>> for VFIO type1.
>>>
>>> Fixes: 73a639085938 ("vfio: allow to map other memory regions")
>>> Cc: anatoly.burakov@intel.com
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
>>> ---
>>
>> <snip>
>>
>>> @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>>>    	/* for IOVA as VA mode, no need to care for IOVA addresses */
>>>    	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
>>>    		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
>>> -		if (type == RTE_MEM_EVENT_ALLOC)
>>> -			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
>>> -					len, 1);
>>> -		else
>>> -			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
>>> -					len, 0);
>>> +		uint64_t page_sz = msl->page_sz;
>>> +
>>> +		/* Maintain granularity of DMA map/unmap to memseg size */
>>> +		for (; cur_len < len; cur_len += page_sz) {
>>> +			if (type == RTE_MEM_EVENT_ALLOC)
>>> +				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
>>> +						 vfio_va, page_sz, 1);
>>> +			else
>>> +				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
>>> +						 vfio_va, page_sz, 0);
>>
>> I think you're mapping the same address here, over and over. Perhaps you
>> meant `vfio_va + cur_len` for the mapping addresses?
> 
> There is a 'vfio_va += page_sz;' on the next line, right?
>>
>> -- 
>> Thanks,
>> Anatoly

Oh, right, my apologies. I did need more coffee :D

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
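
For reference, a rough sketch of what the user-facing change means for
callers of the public API on a type1 container (remap_example, va, iova
and the 8 MB / 2 MB sizes are illustrative assumptions, not part of the
patch): a partial unmap of a user-mapped region now fails up front with
rte_errno set to ENOTSUP, so the region has to be released at the
granularity it was mapped with.

#include <errno.h>
#include <stdint.h>

#include <rte_errno.h>
#include <rte_vfio.h>

/* `va` must point to DMA-safe (e.g. hugepage-backed) memory and `iova`
 * must be its IOVA (equal to `va` in IOVA-as-VA mode).
 */
static int
remap_example(uint64_t va, uint64_t iova)
{
	const uint64_t map_len = 8 << 20;	/* whole region: 8 MB */
	const uint64_t part_len = 2 << 20;	/* attempted partial unmap */

	if (rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
				       va, iova, map_len) < 0)
		return -1;

	/* On VFIO type1 this now fails cleanly instead of leaving a
	 * stale IOMMU mapping behind.
	 */
	if (rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
					 va, iova, part_len) < 0 &&
	    rte_errno == ENOTSUP) {
		/* Release the region at the granularity it was mapped. */
		return rte_vfio_container_dma_unmap(
				RTE_VFIO_DEFAULT_CONTAINER_FD,
				va, iova, map_len);
	}
	return 0;
}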
  

Patch

diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
index dbefcba..b4f9c33 100644
--- a/lib/librte_eal/linux/eal_vfio.c
+++ b/lib/librte_eal/linux/eal_vfio.c
@@ -69,6 +69,7 @@  static const struct vfio_iommu_type iommu_types[] = {
 	{
 		.type_id = RTE_VFIO_TYPE1,
 		.name = "Type 1",
+		.partial_unmap = false,
 		.dma_map_func = &vfio_type1_dma_map,
 		.dma_user_map_func = &vfio_type1_dma_mem_map
 	},
@@ -76,6 +77,7 @@  static const struct vfio_iommu_type iommu_types[] = {
 	{
 		.type_id = RTE_VFIO_SPAPR,
 		.name = "sPAPR",
+		.partial_unmap = true,
 		.dma_map_func = &vfio_spapr_dma_map,
 		.dma_user_map_func = &vfio_spapr_dma_mem_map
 	},
@@ -83,6 +85,7 @@  static const struct vfio_iommu_type iommu_types[] = {
 	{
 		.type_id = RTE_VFIO_NOIOMMU,
 		.name = "No-IOMMU",
+		.partial_unmap = true,
 		.dma_map_func = &vfio_noiommu_dma_map,
 		.dma_user_map_func = &vfio_noiommu_dma_mem_map
 	},
@@ -525,12 +528,19 @@  vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	/* for IOVA as VA mode, no need to care for IOVA addresses */
 	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
 		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
-		if (type == RTE_MEM_EVENT_ALLOC)
-			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
-					len, 1);
-		else
-			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
-					len, 0);
+		uint64_t page_sz = msl->page_sz;
+
+		/* Maintain granularity of DMA map/unmap to memseg size */
+		for (; cur_len < len; cur_len += page_sz) {
+			if (type == RTE_MEM_EVENT_ALLOC)
+				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
+						 vfio_va, page_sz, 1);
+			else
+				vfio_dma_mem_map(default_vfio_cfg, vfio_va,
+						 vfio_va, page_sz, 0);
+			vfio_va += page_sz;
+		}
+
 		return;
 	}
 
@@ -1369,6 +1379,12 @@  vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 			RTE_LOG(ERR, EAL, "  cannot clear DMA remapping, error %i (%s)\n",
 					errno, strerror(errno));
 			return -1;
+		} else if (dma_unmap.size != len) {
+			RTE_LOG(ERR, EAL, "  unexpected size %"PRIu64" of DMA "
+				"remapping cleared instead of %"PRIu64"\n",
+				(uint64_t)dma_unmap.size, len);
+			rte_errno = EIO;
+			return -1;
 		}
 	}
 
@@ -1839,6 +1855,12 @@  container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 		/* we're partially unmapping a previously mapped region, so we
 		 * need to split entry into two.
 		 */
+		if (!vfio_cfg->vfio_iommu_type->partial_unmap) {
+			RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n");
+			rte_errno = ENOTSUP;
+			ret = -1;
+			goto out;
+		}
 		if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) {
 			RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n");
 			rte_errno = ENOMEM;
diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h
index cb2d35f..6ebaca6 100644
--- a/lib/librte_eal/linux/eal_vfio.h
+++ b/lib/librte_eal/linux/eal_vfio.h
@@ -113,6 +113,7 @@  typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova,
 struct vfio_iommu_type {
 	int type_id;
 	const char *name;
+	bool partial_unmap;
 	vfio_dma_user_func_t dma_user_map_func;
 	vfio_dma_func_t dma_map_func;
 };