[v5] vfio: fix workaround of BAR mapping

Message ID 20180720081347.6123-1-t.yoshimura8869@gmail.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series [v5] vfio: fix workaround of BAR mapping

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Takeshi Yoshimura July 20, 2018, 8:13 a.m. UTC
Currently, VFIO tries to map around the MSI-X table in a BAR. When the
(page-aligned) size of the MSI-X table equals the (page-aligned) size of
the BAR, VFIO just skips the BAR.

Recent kernel versions allow VFIO to map the entire BAR containing the
MSI-X table (*), so instead of mapping around the MSI-X table, or
skipping the BAR entirely when that is not possible, we can now try
mapping the entire BAR first. If mapping the entire BAR fails, fall
back to the old behavior of mapping around the MSI-X table or skipping
the BAR.

(*): "vfio-pci: Allow mapping MSIX BAR",
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")

Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Thanks, Anatoly.
I updated the code with munmap in an error path.
I also fixed the message and the wrong link.

Regards,
Takeshi

 drivers/bus/pci/linux/pci_vfio.c | 93 ++++++++++++++++++++++------------------
 1 file changed, 52 insertions(+), 41 deletions(-)
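
For readers following the commit message, the "map around the MSI-X table" fallback can be sketched as a small standalone helper. This is only an illustration, not the code in pci_vfio.c: `struct region`, `split_around_msix()` and the explicit `page_size` parameter are hypothetical names, and the clamp of `table_end` to the BAR size is an addition that the patch itself does not have.

```c
#include <stdint.h>

/* Hypothetical stand-in for the offset/size pairs the patch keeps in memreg[]. */
struct region {
	uint64_t offset;
	uint64_t size;
};

/*
 * Compute the two regions on either side of the page-aligned MSI-X table.
 * Returns 0 when the aligned table covers the whole BAR (nothing left to
 * map, so the BAR has to be skipped), 1 otherwise.
 * page_size is assumed to be a power of two.
 */
static int
split_around_msix(uint64_t bar_offset, uint64_t bar_size,
		uint32_t table_offset, uint32_t table_size,
		uint64_t page_size, struct region regs[2])
{
	uint64_t page_mask = ~(page_size - 1);
	/* round the table start down and the table end up to page boundaries */
	uint64_t table_start = table_offset & page_mask;
	uint64_t table_end =
		((uint64_t)table_offset + table_size + page_size - 1) & page_mask;

	if (table_start == 0 && table_end >= bar_size)
		return 0; /* table occupies the entire BAR */

	/* assumption of this sketch: never let the second region underflow */
	if (table_end > bar_size)
		table_end = bar_size;

	regs[0].offset = bar_offset;
	regs[0].size = table_start;
	regs[1].offset = bar_offset + table_end;
	regs[1].size = bar_size - table_end;
	return 1;
}
```

A region whose size comes out as 0 would simply not be mapped, which matches the `memreg[0].size` and `memreg[1].size` checks in the patch.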
  

Comments

Thomas Monjalon July 26, 2018, 9:35 a.m. UTC | #1
20/07/2018 10:13, Takeshi Yoshimura:
> Currently, VFIO will try to map around MSI-X table in the BARs. When
> MSI-X table (page-aligned) size is equal to (page-aligned) size of BAR,
> VFIO will just skip the BAR.
> 
> Recent kernel versions will allow VFIO to map the entire BAR containing
> MSI-X tables (*), so instead of trying to map around the MSI-X vector
> or skipping the BAR entirely if it's not possible, we can now try
> mapping the entire BAR first. If mapping the entire BAR doesn't
> succeed, fall back to the old behavior of mapping around MSI-X table or
> skipping the BAR.
> 
> (*): "vfio-pci: Allow mapping MSIX BAR",
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
> 
> Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
> 
> Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks
  
Jerin Jacob July 29, 2018, 8:44 a.m. UTC | #2
-----Original Message-----
> Date: Thu, 26 Jul 2018 11:35:43 +0200
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> Cc: dev@dpdk.org, Anatoly Burakov <anatoly.burakov@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v5] vfio: fix workaround of BAR mapping
> 
> 
> 20/07/2018 10:13, Takeshi Yoshimura:
> > Currently, VFIO will try to map around MSI-X table in the BARs. When
> > MSI-X table (page-aligned) size is equal to (page-aligned) size of BAR,
> > VFIO will just skip the BAR.
> >
> > Recent kernel versions will allow VFIO to map the entire BAR containing
> > MSI-X tables (*), so instead of trying to map around the MSI-X vector
> > or skipping the BAR entirely if it's not possible, we can now try
> > mapping the entire BAR first. If mapping the entire BAR doesn't
> > succeed, fall back to the old behavior of mapping around MSI-X table or
> > skipping the BAR.
> >
> > (*): "vfio-pci: Allow mapping MSIX BAR",
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
> >
> > Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
> >
> > Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> > Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>


This changeset breaks the thunderx/octeontx platform with the following error (tested with a 4.9.0 kernel):

EAL:   probe driver: 177d:a034 net_thunderx
EAL:   using IOMMU type 1 (Type 1)
EAL: pci_map_resource(): cannot mmap(44, 0xffff60200000, 0x200000, 0x40000000000): Invalid argument (0xffffffffffffffff)
EAL: PCI device 0001:01:00.2 on NUMA socket 0
EAL:   probe driver: 177d:a034 net_thunderx
EAL: pci_map_resource(): cannot mmap(47, 0xffff60600000, 0x200000, 0x40000000000): Invalid argument (0xffffffffffffffff)

According to the Linux kernel change, a user space application is supposed to use
the VFIO_REGION_INFO_CAP_MSIX_MAPPABLE capability to detect this feature, so that it
still works on kernels < 4.15. Right? If so, why are we using this retry-based logic?
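
To make the capability-based detection concrete, here is a minimal sketch of walking the capability chain that the kernel appends to the region info buffer. It deliberately avoids <linux/vfio.h> so that it builds anywhere: `struct cap_header` is a local mirror of what I understand `struct vfio_info_cap_header` to look like, `CAP_MSIX_MAPPABLE` is assumed to carry the same value (3) as VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, and `region_has_cap()` is a hypothetical helper name. Please check these against your kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the kernel's capability header layout (assumption). */
struct cap_header {
	uint16_t id;      /* capability identifier */
	uint16_t version; /* capability version */
	uint32_t next;    /* offset of next capability from buffer start, 0 = end */
};

/* Assumed to equal VFIO_REGION_INFO_CAP_MSIX_MAPPABLE in <linux/vfio.h>. */
#define CAP_MSIX_MAPPABLE 3

/*
 * Walk the capability chain inside a region info buffer and report whether
 * a capability with the given id is present. Offsets are relative to the
 * start of the buffer. Malformed chains (out of bounds, cycles) yield false.
 */
static bool
region_has_cap(const uint8_t *info, size_t info_len,
		uint32_t cap_offset, uint16_t cap_id)
{
	size_t steps = 0;

	while (cap_offset != 0) {
		struct cap_header hdr;

		if (cap_offset > info_len ||
				info_len - cap_offset < sizeof(hdr))
			return false; /* header would run past the buffer */
		if (++steps > info_len / sizeof(hdr))
			return false; /* more hops than headers fit: a cycle */

		memcpy(&hdr, info + cap_offset, sizeof(hdr));
		if (hdr.id == cap_id)
			return true;
		cap_offset = hdr.next;
	}
	return false;
}
```

In a real caller the buffer would come from the VFIO_DEVICE_GET_REGION_INFO ioctl, and the walk would only start if the returned flags contain VFIO_REGION_INFO_FLAG_CAPS, using the returned cap_offset as the starting point.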





  
Burakov, Anatoly July 30, 2018, 8:51 a.m. UTC | #3
On 29-Jul-18 9:44 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Thu, 26 Jul 2018 11:35:43 +0200
>> From: Thomas Monjalon <thomas@monjalon.net>
>> To: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
>> Cc: dev@dpdk.org, Anatoly Burakov <anatoly.burakov@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v5] vfio: fix workaround of BAR mapping
>>
>>
>> 20/07/2018 10:13, Takeshi Yoshimura:
>>> Currently, VFIO will try to map around MSI-X table in the BARs. When
>>> MSI-X table (page-aligned) size is equal to (page-aligned) size of BAR,
>>> VFIO will just skip the BAR.
>>>
>>> Recent kernel versions will allow VFIO to map the entire BAR containing
>>> MSI-X tables (*), so instead of trying to map around the MSI-X vector
>>> or skipping the BAR entirely if it's not possible, we can now try
>>> mapping the entire BAR first. If mapping the entire BAR doesn't
>>> succeed, fall back to the old behavior of mapping around MSI-X table or
>>> skipping the BAR.
>>>
>>> (*): "vfio-pci: Allow mapping MSIX BAR",
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
>>> commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
>>>
>>> Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
>>>
>>> Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> 
> This change set breaks thunderx/octeontx platform with following error.(Tested with 4.9.0 kernel)
> 
> EAL:   probe driver: 177d:a034 net_thunderx
> EAL:   using IOMMU type 1 (Type 1)
> EAL: pci_map_resource(): cannot mmap(44, 0xffff60200000, 0x200000, 0x40000000000): Invalid argument (0xffffffffffffffff)
> EAL: PCI device 0001:01:00.2 on NUMA socket 0
> EAL:   probe driver: 177d:a034 net_thunderx
> EAL: pci_map_resource(): cannot mmap(47, 0xffff60600000, 0x200000, 0x40000000000): Invalid argument (0xffffffffffffffff)
> 
> According Linux kernel change, user space application suppose to use VFIO_REGION_INFO_CAP_MSIX_MAPPABLE
> capability to detect this feature to work < 4.15 kernel. Right? if so, Why we
> are doing this retry based logic?

I don't think anything's broken there - just a gratuitous error message.

But yes, I seem to have missed the region info flag. It was my
suggestion to use the retry logic. I'll submit a patch fixing this.
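
As a rough idea of what deciding up front could look like, instead of trying the full mapping and retrying on failure, here is a small sketch. It is not the follow-up patch: `enum bar_plan` and `plan_bar_mapping()` are hypothetical names, and `msix_mappable` is assumed to have been obtained from the region info capability beforehand. `table_start` and `table_end` are the page-aligned bounds of the MSI-X table within the BAR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the up-front decision. */
enum bar_plan {
	BAR_MAP_WHOLE,        /* one mmap covering the entire BAR */
	BAR_MAP_AROUND_TABLE, /* map the parts before and after the table */
	BAR_SKIP              /* table covers the BAR, nothing to map */
};

/*
 * Pick a mapping strategy without issuing an mmap that is expected to
 * fail, so that older kernels do not log a spurious error.
 */
static enum bar_plan
plan_bar_mapping(bool bar_has_msix_table, bool msix_mappable,
		uint64_t table_start, uint64_t table_end, uint64_t bar_size)
{
	/* no table in this BAR, or the kernel allows mapping it */
	if (!bar_has_msix_table || msix_mappable)
		return BAR_MAP_WHOLE;

	/* old behavior: skip if the aligned table spans the whole BAR */
	if (table_start == 0 && table_end >= bar_size)
		return BAR_SKIP;

	return BAR_MAP_AROUND_TABLE;
}
```

With this shape, the full-BAR mmap is only attempted when it is expected to succeed, so the "cannot mmap" message from pci_map_resource() would point at a real failure.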

  
Burakov, Anatoly July 30, 2018, 10:03 a.m. UTC | #4
On 30-Jul-18 9:51 AM, Burakov, Anatoly wrote:
> On 29-Jul-18 9:44 AM, Jerin Jacob wrote:
>> -----Original Message-----
>>> Date: Thu, 26 Jul 2018 11:35:43 +0200
>>> From: Thomas Monjalon <thomas@monjalon.net>
>>> To: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
>>> Cc: dev@dpdk.org, Anatoly Burakov <anatoly.burakov@intel.com>
>>> Subject: Re: [dpdk-dev] [PATCH v5] vfio: fix workaround of BAR mapping
>>>
>>>
>>> 20/07/2018 10:13, Takeshi Yoshimura:
>>>> Currently, VFIO will try to map around MSI-X table in the BARs. When
>>>> MSI-X table (page-aligned) size is equal to (page-aligned) size of BAR,
>>>> VFIO will just skip the BAR.
>>>>
>>>> Recent kernel versions will allow VFIO to map the entire BAR containing
>>>> MSI-X tables (*), so instead of trying to map around the MSI-X vector
>>>> or skipping the BAR entirely if it's not possible, we can now try
>>>> mapping the entire BAR first. If mapping the entire BAR doesn't
>>>> succeed, fall back to the old behavior of mapping around MSI-X table or
>>>> skipping the BAR.
>>>>
>>>> (*): "vfio-pci: Allow mapping MSIX BAR",
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
>>>> commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
>>>>
>>>> Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
>>>>
>>>> Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>
>>
>> This change set breaks thunderx/octeontx platform with following 
>> error.(Tested with 4.9.0 kernel)
>>
>> EAL:   probe driver: 177d:a034 net_thunderx
>> EAL:   using IOMMU type 1 (Type 1)
>> EAL: pci_map_resource(): cannot mmap(44, 0xffff60200000, 0x200000, 
>> 0x40000000000): Invalid argument (0xffffffffffffffff)
>> EAL: PCI device 0001:01:00.2 on NUMA socket 0
>> EAL:   probe driver: 177d:a034 net_thunderx
>> EAL: pci_map_resource(): cannot mmap(47, 0xffff60600000, 0x200000, 
>> 0x40000000000): Invalid argument (0xffffffffffffffff)
>>
>> According Linux kernel change, user space application suppose to use 
>> VFIO_REGION_INFO_CAP_MSIX_MAPPABLE
>> capability to detect this feature to work < 4.15 kernel. Right? if so, 
>> Why we
>> are doing this retry based logic?
> 
> I don't think anything's broken there - just a gratuitous error message.
> 
> But yes, i seem to have missed the region info flag. It was my 
> suggestion to use the retry logic. I'll submit a patch fixing this.
> 

The patch to fix it involves way more work than I am comfortable
submitting for rc3, so I believe we should revert this patch and postpone
the change to 18.11.
  

Patch

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index aeeaa9ed8..07188c071 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -332,50 +332,59 @@  pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 	void *bar_addr;
 	struct pci_msix_table *msix_table = &vfio_res->msix_table;
 	struct pci_map *bar = &vfio_res->maps[bar_index];
+	bool again = false;
 
 	if (bar->size == 0)
 		/* Skip this BAR */
 		return 0;
 
-	if (msix_table->bar_index == bar_index) {
-		/*
-		 * VFIO will not let us map the MSI-X table,
-		 * but we can map around it.
-		 */
-		uint32_t table_start = msix_table->offset;
-		uint32_t table_end = table_start + msix_table->size;
-		table_end = (table_end + ~PAGE_MASK) & PAGE_MASK;
-		table_start &= PAGE_MASK;
-
-		if (table_start == 0 && table_end >= bar->size) {
-			/* Cannot map this BAR */
-			RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
-			bar->size = 0;
-			bar->addr = 0;
-			return 0;
-		}
-
-		memreg[0].offset = bar->offset;
-		memreg[0].size = table_start;
-		memreg[1].offset = bar->offset + table_end;
-		memreg[1].size = bar->size - table_end;
-
-		RTE_LOG(DEBUG, EAL,
-			"Trying to map BAR%d that contains the MSI-X "
-			"table. Trying offsets: "
-			"0x%04lx:0x%04lx, 0x%04lx:0x%04lx\n", bar_index,
-			memreg[0].offset, memreg[0].size,
-			memreg[1].offset, memreg[1].size);
-	} else {
-		memreg[0].offset = bar->offset;
-		memreg[0].size = bar->size;
-	}
-
 	/* reserve the address using an inaccessible mapping */
 	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
 			MAP_ANONYMOUS | additional_flags, -1, 0);
-	if (bar_addr != MAP_FAILED) {
+	if (bar_addr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL,
+			"Failed to create inaccessible mapping for BAR%d\n",
+			bar_index);
+		return -1;
+	}
+
+	memreg[0].offset = bar->offset;
+	memreg[0].size = bar->size;
+	do {
 		void *map_addr = NULL;
+		if (again) {
+			/*
+			 * VFIO did not let us map the MSI-X table,
+			 * but we can map around it.
+			 */
+			uint32_t table_start = msix_table->offset;
+			uint32_t table_end = table_start + msix_table->size;
+			table_end = (table_end + ~PAGE_MASK) & PAGE_MASK;
+			table_start &= PAGE_MASK;
+
+			if (table_start == 0 && table_end >= bar->size) {
+				/* Cannot map this BAR */
+				RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n",
+						bar_index);
+				munmap(bar_addr, bar->size);
+				bar->size = 0;
+				bar->addr = 0;
+				return 0;
+			}
+
+			memreg[0].offset = bar->offset;
+			memreg[0].size = table_start;
+			memreg[1].offset = bar->offset + table_end;
+			memreg[1].size = bar->size - table_end;
+
+			RTE_LOG(DEBUG, EAL,
+				"Trying to map BAR%d that contains the MSI-X "
+				"table. Trying offsets: "
+				"0x%04lx:0x%04lx, 0x%04lx:0x%04lx\n", bar_index,
+				memreg[0].offset, memreg[0].size,
+				memreg[1].offset, memreg[1].size);
+		}
+
 		if (memreg[0].size) {
 			/* actual map of first part */
 			map_addr = pci_map_resource(bar_addr, vfio_dev_fd,
@@ -384,6 +393,12 @@  pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 							MAP_FIXED);
 		}
 
+		if (map_addr == MAP_FAILED &&
+			msix_table->bar_index == bar_index && !again) {
+			again = true;
+			continue;
+		}
+
 		/* if there's a second part, try to map it */
 		if (map_addr != MAP_FAILED
 			&& memreg[1].offset && memreg[1].size) {
@@ -404,12 +419,8 @@  pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 					bar_index);
 			return -1;
 		}
-	} else {
-		RTE_LOG(ERR, EAL,
-				"Failed to create inaccessible mapping for BAR%d\n",
-				bar_index);
-		return -1;
-	}
+		break;
+	} while (again);
 
 	bar->addr = bar_addr;
 	return 0;