[dpdk-dev,v2] eal:Map rte cfg and uio at the end of hugepage mem

Message ID 1445068109-8943-1-git-send-email-nissimn@radware.com (mailing list archive)
State Changes Requested, archived
Headers

Commit Message

Nissim Nisimov Oct. 17, 2015, 7:48 a.m. UTC
  * this patch removes unnecessary call to rte_eal_memory_init() introduced in the first version.

Problem:
In DPDK Primary/Secondary module we assume mapping same regions of virtual memory addresses for Primary process and Secondary.
An issue may occur when the Primary and secondary processes are not symmetric in such way that the code is not the same (for example, Primary process is a traffic distributer and secondary is a worker). The result may be that specific virtual address region in the first process won't be available in the second process.

Changes done at eal init:
map all related rte configuration and uio sections close to the end of huge pages memory (that mean rte_eal_memory_init() should be called before rte_config_init() in primary process)
According to our observations there will be more probability to success when allocating rte_config and uio memzones after huge pages sections (actually uio is already allocated after the huge pages area)

Signed-off-by: Nissim Nisimov <nissimn@radware.com>
---
 lib/librte_eal/linuxapp/eal/eal.c         | 28 +++++++++++++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 10 +++++++---
 2 files changed, 28 insertions(+), 10 deletions(-)
  

Comments

Bruce Richardson Oct. 19, 2015, 1:07 p.m. UTC | #1
On Sat, Oct 17, 2015 at 10:48:29AM +0300, Nissim Nisimov wrote:
> * this patch removes unnecessary call to rte_eal_memory_init() introduced in the first version.
> 
> Problem:
> In DPDK Primary/Secondary module we assume mapping same regions of virtual memory addresses for Primary process and Secondary.
> An issue may occur when the Primary and secondary processes are not symmetric in such way that the code is not the same (for example, Primary process is a traffic distributer and secondary is a worker). The result may be that specific virtual address region in the first process won't be available in the second process.
> 
> Changes done at eal init:
> map all related rte configuration and uio sections close to the end of huge pages memory (that mean rte_eal_memory_init() should be called before rte_config_init() in primary process)
> According to our observations there will be more probability to success when allocating rte_config and uio memzones after huge pages sections (actually uio is already allocated after the huge pages area)
> 
> Signed-off-by: Nissim Nisimov <nissimn@radware.com>

Hi Nissim,

thanks for the V2. Some review comments below. Also, when applying there were
some whitespace errors reported from git:

git apply dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch
dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch:64: trailing whitespace.

dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch:111: space before tab in indent.
                        pci_map_addr = pci_find_max_end_va();
dpdk-dev-v2-eal-Map-rte-cfg-and-uio-at-the-end-of-hugepage-mem.patch:113: space before tab in indent.
                        pci_map_addr = (void*)RTE_PTR_ALIGN(((char*)rte_eal_get_configuration()->mem_config->mem_cfg_addr + sizeof(struct rte_mem_config)),sysconf(_SC_PAGE_SIZE));
warning: 3 lines add whitespace errors.


> ---
>  lib/librte_eal/linuxapp/eal/eal.c         | 28 +++++++++++++++++++++-------
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 10 +++++++---
>  2 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 33e1067..f85eb63 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -87,6 +87,7 @@
>  
>  #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
>  
> +void *pci_find_max_end_va(void);

Probably want a blank line after the prototype, since it's unrelated to the line
afterward. Maybe mark it extern too. 
[The alternative to this function prototype here is obviously to include
eal_pci_init.h to get the function prototype - not sure which solution is better]

>  /* Allow the application to print its usage message too if set */
>  static rte_usage_hook_t	rte_application_usage_hook = NULL;
>  
> @@ -189,12 +190,15 @@ rte_eal_config_create(void)
>  		return;
>  
>  	/* map the config before hugepage address so that we don't waste a page */
> -	if (internal_config.base_virtaddr != 0)
> +	if (internal_config.base_virtaddr != 0){
>  		rte_mem_cfg_addr = (void *)
>  			RTE_ALIGN_FLOOR(internal_config.base_virtaddr -
>  			sizeof(struct rte_mem_config), sysconf(_SC_PAGE_SIZE));
> -	else
> -		rte_mem_cfg_addr = NULL;
> +        }
> +	else{
> +		rte_mem_cfg_addr = pci_find_max_end_va();

Would it work as well putting the config immediately before the hugepages
rather than at the end, as is done in the case of having a cmdline-specified virtual
address? If so, it should remove the need to change the eal_pci_uio.c file.

> +                RTE_LOG(INFO, EAL, "rte_mem_cfg_addr =  0x%llx PType=%s\n",(unsigned long long)rte_mem_cfg_addr,rte_config.process_type == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
> +        }
> 

I would suggest making the RTE_LOG statment unconditional. Even having it
printing when the value is specified can be useful to confirm the parameter was
parsed correctly. [It would also reduce the diff as you won't have to add in the
braces to the "if/else" statements].
Also:
* use tabs, not spaces for indentation.
* I would suggest making the LOG statement be at the debug, rather than info
level, as it's not something that normally needs to be printed.

>  	if (mem_cfg_fd < 0){
>  		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
> @@ -227,7 +231,7 @@ rte_eal_config_create(void)
>  	/* store address of the config in the config itself so that secondary
>  	 * processes could later map the config into this exact location */
>  	rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
> -
> +        
>  }
Remove this whitespace change from diff.

>  
>  /* attach to an existing shared memory config */
> @@ -784,6 +788,13 @@ rte_eal_init(int argc, char **argv)
>  
>  	rte_srand(rte_rdtsc());
>  
> +        /* Primary process should allocate hugepages before configuration */
> +	if(internal_config.process_type == RTE_PROC_PRIMARY){
> +		RTE_LOG(INFO, EAL, "Calling rte_eal_memory_init as =%s\n",(rte_config.process_type == RTE_PROC_PRIMARY) ? "PRIMARY" : "SECONDARY");

Split long lines like this after the comma. However, in this case, the process
type is already known, so just hardcode the printing.
Q: Is the printout actually needed - even when debugging? Are there not already
statements in the code to tell if it's a primary or secondary process? [Without
the printout, the two if statments can be collapsed into one.]

> +		if (rte_eal_memory_init() < 0)
> +			rte_panic("Cannot init memory\n");
> +	}
> +
>  	rte_config_init();
>  
>  	if (rte_eal_pci_init() < 0)
> @@ -793,9 +804,12 @@ rte_eal_init(int argc, char **argv)
>  	if (rte_eal_ivshmem_init() < 0)
>  		rte_panic("Cannot init IVSHMEM\n");
>  #endif
> -
> -	if (rte_eal_memory_init() < 0)
> -		rte_panic("Cannot init memory\n");
> +        /* secondary process will call memory init only after calling to rte_config_init() */
> +	if(internal_config.process_type == RTE_PROC_SECONDARY){
> +		RTE_LOG(INFO, EAL, "Calling rte_eal_memory_init as =%s\n",rte_config.process_type == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
> +		if (rte_eal_memory_init() < 0)
> +			rte_panic("Cannot init memory\n");
> +	}

Same comment as above.

>  
>  	/* the directories are locked during eal_hugepage_info_init */
>  	eal_hugedirs_unlock();
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index ac50e13..6812c37 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -338,9 +338,13 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
>  	}
>  
>  	/* try mapping somewhere close to the end of hugepages */
> -	if (pci_map_addr == NULL)
> -		pci_map_addr = pci_find_max_end_va();
> -
> +	if (pci_map_addr == NULL){
> +		if (internal_config.base_virtaddr != 0){
> +                	pci_map_addr = pci_find_max_end_va();
> +                } else{
> +                	pci_map_addr = (void*)RTE_PTR_ALIGN(((char*)rte_eal_get_configuration()->mem_config->mem_cfg_addr + sizeof(struct rte_mem_config)),sysconf(_SC_PAGE_SIZE));
> +                }
> +        }
>  	mapaddr = pci_map_resource(pci_map_addr, fd, 0,
>  			(size_t)dev->mem_resource[res_idx].len, 0);
>  	close(fd);
> -- 
> 2.6.1
>
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 33e1067..f85eb63 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -87,6 +87,7 @@ 
 
 #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
 
+void *pci_find_max_end_va(void);
 /* Allow the application to print its usage message too if set */
 static rte_usage_hook_t	rte_application_usage_hook = NULL;
 
@@ -189,12 +190,15 @@  rte_eal_config_create(void)
 		return;
 
 	/* map the config before hugepage address so that we don't waste a page */
-	if (internal_config.base_virtaddr != 0)
+	if (internal_config.base_virtaddr != 0){
 		rte_mem_cfg_addr = (void *)
 			RTE_ALIGN_FLOOR(internal_config.base_virtaddr -
 			sizeof(struct rte_mem_config), sysconf(_SC_PAGE_SIZE));
-	else
-		rte_mem_cfg_addr = NULL;
+        }
+	else{
+		rte_mem_cfg_addr = pci_find_max_end_va();
+                RTE_LOG(INFO, EAL, "rte_mem_cfg_addr =  0x%llx PType=%s\n",(unsigned long long)rte_mem_cfg_addr,rte_config.process_type == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
+        }
 
 	if (mem_cfg_fd < 0){
 		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
@@ -227,7 +231,7 @@  rte_eal_config_create(void)
 	/* store address of the config in the config itself so that secondary
 	 * processes could later map the config into this exact location */
 	rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
-
+        
 }
 
 /* attach to an existing shared memory config */
@@ -784,6 +788,13 @@  rte_eal_init(int argc, char **argv)
 
 	rte_srand(rte_rdtsc());
 
+        /* Primary process should allocate hugepages before configuration */
+	if(internal_config.process_type == RTE_PROC_PRIMARY){
+		RTE_LOG(INFO, EAL, "Calling rte_eal_memory_init as =%s\n",(rte_config.process_type == RTE_PROC_PRIMARY) ? "PRIMARY" : "SECONDARY");
+		if (rte_eal_memory_init() < 0)
+			rte_panic("Cannot init memory\n");
+	}
+
 	rte_config_init();
 
 	if (rte_eal_pci_init() < 0)
@@ -793,9 +804,12 @@  rte_eal_init(int argc, char **argv)
 	if (rte_eal_ivshmem_init() < 0)
 		rte_panic("Cannot init IVSHMEM\n");
 #endif
-
-	if (rte_eal_memory_init() < 0)
-		rte_panic("Cannot init memory\n");
+        /* secondary process will call memory init only after calling to rte_config_init() */
+	if(internal_config.process_type == RTE_PROC_SECONDARY){
+		RTE_LOG(INFO, EAL, "Calling rte_eal_memory_init as =%s\n",rte_config.process_type == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
+		if (rte_eal_memory_init() < 0)
+			rte_panic("Cannot init memory\n");
+	}
 
 	/* the directories are locked during eal_hugepage_info_init */
 	eal_hugedirs_unlock();
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ac50e13..6812c37 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -338,9 +338,13 @@  pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 	}
 
 	/* try mapping somewhere close to the end of hugepages */
-	if (pci_map_addr == NULL)
-		pci_map_addr = pci_find_max_end_va();
-
+	if (pci_map_addr == NULL){
+		if (internal_config.base_virtaddr != 0){
+                	pci_map_addr = pci_find_max_end_va();
+                } else{
+                	pci_map_addr = (void*)RTE_PTR_ALIGN(((char*)rte_eal_get_configuration()->mem_config->mem_cfg_addr + sizeof(struct rte_mem_config)),sysconf(_SC_PAGE_SIZE));
+                }
+        }
 	mapaddr = pci_map_resource(pci_map_addr, fd, 0,
 			(size_t)dev->mem_resource[res_idx].len, 0);
 	close(fd);