[dpdk-dev] VFIO: Avoid to enable vfio while the module not loaded

Message ID 1417664219-19679-1-git-send-email-michael.qiu@intel.com (mailing list archive)
State Superseded, archived

Commit Message

Michael Qiu Dec. 4, 2014, 3:36 a.m. UTC
  When the kernel supports VFIO but the vfio module is not loaded,
the routine still tries to open the container to get a file
descriptor.

This action is not safe, and of course produces error messages:

EAL: Detected 40 lcore(s)
EAL:   unsupported IOMMU type!
EAL: VFIO support could not be initialized
EAL: Setting up memory...

This may confuse users; this patch makes the behavior reasonable
and much smoother for the user.

Signed-off-by: Michael Qiu <michael.qiu@intel.com>
---
 lib/librte_eal/common/include/rte_common.h | 37 ++++++++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 23 +++++++++++++------
 2 files changed, 53 insertions(+), 7 deletions(-)
  

Comments

Burakov, Anatoly Dec. 4, 2014, 1:12 p.m. UTC | #1
Hi Michael

> When the kernel supports VFIO but the vfio module is not loaded, the
> routine still tries to open the container to get a file descriptor.
> 
> This action is not safe, and of course produces error messages:
> 
> EAL: Detected 40 lcore(s)
> EAL:   unsupported IOMMU type!
> EAL: VFIO support could not be initialized
> EAL: Setting up memory...
> 
> This may confuse users; this patch makes the behavior reasonable and
> much smoother for the user.

Not sure I agree with the premise of this patch.

First of all, if the VFIO driver is not enabled, the container file would not be present and you would get a different error (namely, "cannot open VFIO container", in pci_vfio_get_container_fd()). If you have a container file, that means the VFIO driver is loaded, so I'm not sure why you get the "unsupported IOMMU type" error. I suppose it could happen when vfio is loaded but vfio_iommu_type1 isn't?

And even then, this error is harmless and doesn't do anything, so I'm not sure what this patch is supposed to fix. The error message tells the user exactly what happens.
 
Thanks,
Anatoly
  
Michael Qiu Dec. 4, 2014, 1:47 p.m. UTC | #3
On 12/4/2014 9:12 PM, Burakov, Anatoly wrote:
> Hi Michael
>
>> When the kernel supports VFIO but the vfio module is not loaded, the
>> routine still tries to open the container to get a file descriptor.
>>
>> This action is not safe, and of course produces error messages:
>>
>> EAL: Detected 40 lcore(s)
>> EAL:   unsupported IOMMU type!
>> EAL: VFIO support could not be initialized
>> EAL: Setting up memory...
>>
>> This may confuse users; this patch makes the behavior reasonable and
>> much smoother for the user.
> Not sure I agree with the premise of this patch.
>
> First of all, if the VFIO driver is not enabled, the container file would not be present and you would get a different error (namely, "cannot open VFIO container", in pci_vfio_get_container_fd()). If you have a container file, that means the VFIO driver is loaded, so I'm not sure why you get the "unsupported IOMMU type" error. I suppose it could happen when vfio is loaded but vfio_iommu_type1 isn't?

But indeed, when I unload both vfio and vfio_iommu_type1,
/dev/vfio/vfio is still there. I'm surprised too.

My environment is Fedora 20, kernel version 3.6.7-200 x86_64.

Believe it or not, you can try it yourself; it seems to be a kernel issue.

When you unload both modules and then open /dev/vfio/vfio, you will
find it can be opened with no errors (but at that point both modules
are loaded automatically, strangely enough).

You can also use ioctl to get the API version. But when you try to get
the IOMMU type, it returns 0, not the expected value of 1.
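
The sequence described above (open the container node, query the API version, then ask whether the Type1 IOMMU is supported) could be reproduced outside DPDK with a small probe. This is only a sketch under assumptions: `probe_vfio_container` is a hypothetical helper name that is not part of DPDK, and the code assumes the Linux UAPI header `linux/vfio.h` is installed. The ioctls `VFIO_GET_API_VERSION` and `VFIO_CHECK_EXTENSION` and the constant `VFIO_TYPE1_IOMMU` come from that header.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not part of DPDK): open a VFIO container node and
 * ask the kernel whether the Type1 IOMMU backend is available.
 *
 * Returns -1 if the node cannot be opened or the API version query fails.
 * Otherwise returns the result of VFIO_CHECK_EXTENSION for VFIO_TYPE1_IOMMU:
 * 1 if supported, 0 if not, negative if that ioctl itself fails.
 * If api_version is not NULL it receives the reported API version.
 */
static int
probe_vfio_container(const char *path, int *api_version)
{
	int fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	int version = ioctl(fd, VFIO_GET_API_VERSION);
	if (version < 0) {
		close(fd);
		return -1;
	}
	if (api_version != NULL)
		*api_version = version;

	int type1 = ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
	close(fd);
	return type1;
}
```

Calling `probe_vfio_container("/dev/vfio/vfio", &version)` right after unloading both modules would, if the report above holds on a given kernel, open the node successfully and return 0 on the first call.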

Then you can shut down DPDK and start it again (test-pmd, for example), and everything works fine :)

I will take a deeper look at the kernel side to find out why this happens.

Thanks,
Michael
> And even then, this error is harmless and doesn't do anything, so I'm not sure what this patch is supposed to fix. The error message tells the user exactly what happens.
>  
> Thanks,
> Anatoly 
>
  
Burakov, Anatoly Dec. 4, 2014, 4:31 p.m. UTC | #4
Hi Michael

> But indeed, when I unload both vfio and vfio_iommu_type1,
> /dev/vfio/vfio is still there. I'm surprised too.
> 
> My environment is Fedora 20, kernel version 3.6.7-200 x86_64.
> 
> Believe it or not, you can try it yourself; it seems to be a kernel issue.
> 
> When you unload both modules and then open /dev/vfio/vfio, you will find
> it can be opened with no errors (but at that point both modules are loaded
> automatically, strangely enough).
> 

Thanks to Sergio, we found the most likely cause for this. This patch to the Linux kernel by Alex Williamson of Red Hat:

https://lkml.org/lkml/2013/12/12/421

It seems, however, that it has been merged into 3.14. Your kernel, by your own admission, is 3.6. Are you sure this is the right kernel version? Because my own machine has Fedora 18 with a 3.11 kernel, and it (correctly) does not display this behavior. So unless Fedora 20 backported those changes to kernel 3.6, this shouldn't happen on your setup. (But it doesn't really matter, just FYI - the patch still should be fixed and resubmitted, just as we discussed.)

Thanks,
Anatoly
  
Michael Qiu Dec. 5, 2014, 4:01 a.m. UTC | #5
On 12/5/2014 12:31 AM, Burakov, Anatoly wrote:
> Hi Michael
>
>> But indeed, when I unload both vfio and vfio_iommu_type1,
>> /dev/vfio/vfio is still there. I'm surprised too.
>>
>> My environment is Fedora 20, kernel version 3.6.7-200 x86_64.
>>
>> Believe it or not, you can try it yourself; it seems to be a kernel issue.
>>
>> When you unload both modules and then open /dev/vfio/vfio, you will find
>> it can be opened with no errors (but at that point both modules are loaded
>> automatically, strangely enough).
>>
> Thanks to Sergio, we found the most likely cause for this. This patch to the Linux kernel by Alex Williamson of Red Hat:
>
> https://lkml.org/lkml/2013/12/12/421
>
> It seems, however, that it has been merged into 3.14. Your kernel, by your own admission, is 3.6. Are you sure this is the right kernel version? Because my own machine has Fedora 18 with a 3.11 kernel, and it (correctly) does not display this behavior. So unless Fedora 20 backported those changes to kernel 3.6, this shouldn't happen on your setup. (But it doesn't really matter, just FYI - the patch still should be fixed and resubmitted, just as we discussed.)
>
> Thanks,
> Anatoly
>

Sorry, the kernel version is 3.16 :), I just made a mistake :)

Thanks,
Michael
  

Patch

diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h
index 921b91f..333aa6b 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -50,6 +50,8 @@  extern "C" {
 #include <ctype.h>
 #include <errno.h>
 #include <limits.h>
+#include <string.h>
+#include <stdio.h>
 
 /*********** Macros to eliminate unused variable warnings ********/
 
@@ -382,6 +384,41 @@  rte_exit(int exit_code, const char *format, ...)
 	__attribute__((noreturn))
 	__attribute__((format(printf, 2, 3)));
 
+/**
+ * Check whether a kernel module (for example vfio or vfio_iommu_type1)
+ * is loaded.
+ *
+ * @param module_name
+ *	Name of the module to check
+ *
+ * @return
+ *	-1 on error (NULL pointer or open failure)
+ *	0  if the module is not loaded
+ *	1  if the module is loaded
+ */
+static inline int
+check_module(const char *module_name)
+{
+	char mod_name[64]; /* must match the %63s field width below */
+	int ret = 0;
+
+	if (NULL == module_name)
+		return -1;
+	FILE *fp = fopen("/proc/modules", "r");
+	if (fp == NULL)
+		return -1;
+	/* the first field of each line is the module name; skip the rest */
+	while (fscanf(fp, "%63s%*[^\n]", mod_name) == 1) {
+		if (!strcmp(mod_name, module_name)) {
+			ret = 1;
+			break;
+		}
+	}
+	fclose(fp);
+
+	return ret;
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c1246e8..a11cc4b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -44,6 +44,7 @@ 
 #include <rte_tailq.h>
 #include <rte_eal_memconfig.h>
 #include <rte_malloc.h>
+#include <rte_common.h>
 
 #include "eal_filesystem.h"
 #include "eal_pci_init.h"
@@ -342,7 +343,8 @@  pci_vfio_get_container_fd(void)
 				RTE_LOG(ERR, EAL, "  could not get IOMMU type, "
 						"error %i (%s)\n", errno, strerror(errno));
 			else
-				RTE_LOG(ERR, EAL, "  unsupported IOMMU type!\n");
+				RTE_LOG(ERR, EAL, "  unsupported IOMMU type! "
+					"expected: 1, actual: %d\n", ret);
 			close(vfio_container_fd);
 			return -1;
 		}
@@ -788,13 +790,20 @@  pci_vfio_enable(void)
 		vfio_cfg.vfio_groups[i].fd = -1;
 		vfio_cfg.vfio_groups[i].group_no = -1;
 	}
-	vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd();
 
-	/* check if we have VFIO driver enabled */
-	if (vfio_cfg.vfio_container_fd != -1)
-		vfio_cfg.vfio_enabled = 1;
-	else
-		RTE_LOG(INFO, EAL, "VFIO support could not be initialized\n");
+	if (check_module("vfio") == 1 &&
+	    check_module("vfio_iommu_type1") == 1) {
+		vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd();
+
+		/* check if we have VFIO driver enabled */
+		if (vfio_cfg.vfio_container_fd != -1)
+			vfio_cfg.vfio_enabled = 1;
+		else
+			RTE_LOG(INFO, EAL, "VFIO support could not be"
+				" initialized\n");
+	} else
+		RTE_LOG(INFO, EAL, "VFIO modules are not all loaded,"
+			" skip VFIO support ...\n");
 
 	return 0;
 }
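
For experimenting with the module lookup outside the EAL, a standalone variant that takes the file to scan as a parameter can be exercised against a fixture file instead of the live /proc/modules. This is a sketch only: `module_listed` and its `path` parameter are hypothetical names that do not appear in the patch, and it reads line by line with `fgets` rather than using the patch's `fscanf` loop.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical standalone lookup (names are not from the patch).
 * Returns 1 if module_name is the first field of some line in the file
 * at path, 0 if it is not, and -1 on a NULL argument or open failure.
 */
static int
module_listed(const char *path, const char *module_name)
{
	char line[256];
	char name[64];
	int at_line_start = 1;
	int found = 0;

	if (path == NULL || module_name == NULL)
		return -1;

	FILE *fp = fopen(path, "r");
	if (fp == NULL)
		return -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		/* only compare the first field of a real line, not the
		 * continuation chunk of a line longer than the buffer */
		if (at_line_start && sscanf(line, "%63s", name) == 1 &&
		    strcmp(name, module_name) == 0) {
			found = 1;
			break;
		}
		at_line_start = (strchr(line, '\n') != NULL);
	}
	fclose(fp);

	return found;
}
```

A caller would pass "/proc/modules" as the path to check the running system, for example `module_listed("/proc/modules", "vfio")`.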