[dpdk-dev,2/2] lib/librte_eal: Remove unnecessary hugepage zero-filling

Message ID 1448219615-63746-3-git-send-email-zhihong.wang@intel.com (mailing list archive)
State Accepted, archived
Headers

Commit Message

Zhihong Wang Nov. 22, 2015, 7:13 p.m. UTC
  The kernel fills new allocated (huge) pages with zeros.
DPDK just has to populate page tables to trigger the allocation.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)
  

Comments

Stephen Hemminger Nov. 23, 2015, 2:28 a.m. UTC | #1
On Sun, 22 Nov 2015 14:13:35 -0500
Zhihong Wang <zhihong.wang@intel.com> wrote:

> The kernel fills new allocated (huge) pages with zeros.
> DPDK just has to populate page tables to trigger the allocation.
> 
> Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 0de75cd..21a5146 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -399,8 +399,10 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>  			return -1;
>  		}
>  
> +		/* map the segment, and populate page tables,
> +		 * the kernel fills this segment with zeros */
>  		virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> -				MAP_SHARED, fd, 0);
> +				MAP_SHARED | MAP_POPULATE, fd, 0);
>  		if (virtaddr == MAP_FAILED) {
>  			RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__,
>  					strerror(errno));
> @@ -410,7 +412,6 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>  
>  		if (orig) {
>  			hugepg_tbl[i].orig_va = virtaddr;
> -			memset(virtaddr, 0, hugepage_sz);
>  		}
>  		else {
>  			hugepg_tbl[i].final_va = virtaddr;
> @@ -529,22 +530,16 @@ remap_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
>  
>  			old_addr = vma_addr;
>  
> -			/* map new, bigger segment */
> +			/* map new, bigger segment, and populate page tables,
> +			 * the kernel fills this segment with zeros */
>  			vma_addr = mmap(vma_addr, total_size,
> -					PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +					PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, 0);
>  
>  			if (vma_addr == MAP_FAILED || vma_addr != old_addr) {
>  				RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__, strerror(errno));
>  				close(fd);
>  				return -1;
>  			}
> -
> -			/* touch the page. this is needed because kernel postpones mapping
> -			 * creation until the first page fault. with this, we pin down
> -			 * the page and it is marked as used and gets into process' pagemap.
> -			 */
> -			for (offset = 0; offset < total_size; offset += hugepage_sz)
> -				*((volatile uint8_t*) RTE_PTR_ADD(vma_addr, offset));
>  		}
>  
>  		/* set shared flock on the file. */
> @@ -592,9 +587,6 @@ remap_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
>  			}
>  		}
>  
> -		/* zero out the whole segment */
> -		memset(hugepg_tbl[page_idx].final_va, 0, total_size);
> -
>  		page_idx++;
>  	}
>  

Nice, especially on slow machines or with large memory.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
  
Thomas Monjalon Nov. 24, 2015, 9:13 p.m. UTC | #2
2015-11-22 18:28, Stephen Hemminger:
> On Sun, 22 Nov 2015 14:13:35 -0500
> Zhihong Wang <zhihong.wang@intel.com> wrote:
> 
> > The kernel fills new allocated (huge) pages with zeros.
> > DPDK just has to populate page tables to trigger the allocation.
> > 
> > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> 
> Nice, especially on slow machines or with large memory.
> 
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>

Yes very nice.
I think it's too late to integrate this change which can have some
unpredictable side effects.
Do you agree to wait for 2.3?
  
Stephen Hemminger Nov. 24, 2015, 10:44 p.m. UTC | #3
On Tue, 24 Nov 2015 22:13:28 +0100
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-11-22 18:28, Stephen Hemminger:
> > On Sun, 22 Nov 2015 14:13:35 -0500
> > Zhihong Wang <zhihong.wang@intel.com> wrote:
> > 
> > > The kernel fills new allocated (huge) pages with zeros.
> > > DPDK just has to populate page tables to trigger the allocation.
> > > 
> > > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > 
> > Nice, especially on slow machines or with large memory.
> > 
> > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> Yes very nice.
> I think it's too late to integrate this change which can have some
> unpredictable side effects.
> Do you agree to wait for 2.3?

What side effects? Either it is zero or it is not.
Only some broken architecture would have an issue.
  
Thomas Monjalon Nov. 24, 2015, 11:04 p.m. UTC | #4
2015-11-24 14:44, Stephen Hemminger:
> On Tue, 24 Nov 2015 22:13:28 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > 2015-11-22 18:28, Stephen Hemminger:
> > > On Sun, 22 Nov 2015 14:13:35 -0500
> > > Zhihong Wang <zhihong.wang@intel.com> wrote:
> > > 
> > > > The kernel fills new allocated (huge) pages with zeros.
> > > > DPDK just has to populate page tables to trigger the allocation.
> > > > 
> > > > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > > 
> > > Nice, especially on slow machines or with large memory.
> > > 
> > > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> > 
> > Yes very nice.
> > I think it's too late to integrate this change which can have some
> > unpredictable side effects.
> > Do you agree to wait for 2.3?
> 
> What side effects? Either it is zero or it is not.
> Only some broken architecture would have an issue.

I mean it changes the memory allocator behaviour. It's not something we
want to discover a new bug just before the release.
This kind of important change must be integrated at the beginning of the
release cycle.
I'm asking for opinions because it would be really nice to have.
  
Yuanhan Liu Nov. 25, 2015, 1:57 a.m. UTC | #5
On Wed, Nov 25, 2015 at 12:04:16AM +0100, Thomas Monjalon wrote:
> 2015-11-24 14:44, Stephen Hemminger:
> > On Tue, 24 Nov 2015 22:13:28 +0100
> > Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 
> > > 2015-11-22 18:28, Stephen Hemminger:
> > > > On Sun, 22 Nov 2015 14:13:35 -0500
> > > > Zhihong Wang <zhihong.wang@intel.com> wrote:
> > > > 
> > > > > The kernel fills new allocated (huge) pages with zeros.
> > > > > DPDK just has to populate page tables to trigger the allocation.
> > > > > 
> > > > > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > > > 
> > > > Nice, especially on slow machines or with large memory.
> > > > 
> > > > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> > > 
> > > Yes very nice.
> > > I think it's too late to integrate this change which can have some
> > > unpredictable side effects.
> > > Do you agree to wait for 2.3?
> > 
> > What side effects? Either it is zero or it is not.
> > Only some broken architecture would have an issue.
> 
> I mean it changes the memory allocator behaviour. It's not something we
> want to discover a new bug just before the release.
> This kind of important change must be integrated at the beginning of the
> release cycle.

+ 1

And it could be a new feature (or highlight) of 2.3: reduced dpdk
startup time by ... :)

	--yliu
  
Zhihong Wang Nov. 25, 2015, 1:59 a.m. UTC | #6
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, November 25, 2015 7:04 AM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> 2015-11-24 14:44, Stephen Hemminger:
> > On Tue, 24 Nov 2015 22:13:28 +0100
> > Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> >
> > > 2015-11-22 18:28, Stephen Hemminger:
> > > > On Sun, 22 Nov 2015 14:13:35 -0500 Zhihong Wang
> > > > <zhihong.wang@intel.com> wrote:
> > > >
> > > > > The kernel fills new allocated (huge) pages with zeros.
> > > > > DPDK just has to populate page tables to trigger the allocation.
> > > > >
> > > > > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > > >
> > > > Nice, especially on slow machines or with large memory.
> > > >
> > > > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> > >
> > > Yes very nice.
> > > I think it's too late to integrate this change which can have some
> > > unpredictable side effects.
> > > Do you agree to wait for 2.3?
> >
> > What side effects? Either it is zero or it is not.
> > Only some broken architecture would have an issue.
> 
> I mean it changes the memory allocator behaviour. It's not something we want to
> discover a new bug just before the release.
> This kind of important change must be integrated at the beginning of the release
> cycle.
> I'm asking for opinions because it would be really nice to have.

Literally this patch doesn't change anything, it just keeps DPDK from zero-filling pages again which have just been zero-filled.
It would be nice to have this patch in DPDK 2.2 since it can reduce the startup time nearly by half for hugepage cases.
But I understand longer merge/test window make it safer for a release.
It makes sense either way.
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 0de75cd..21a5146 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -399,8 +399,10 @@  map_all_hugepages(struct hugepage_file *hugepg_tbl,
 			return -1;
 		}
 
+		/* map the segment, and populate page tables,
+		 * the kernel fills this segment with zeros */
 		virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
-				MAP_SHARED, fd, 0);
+				MAP_SHARED | MAP_POPULATE, fd, 0);
 		if (virtaddr == MAP_FAILED) {
 			RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__,
 					strerror(errno));
@@ -410,7 +412,6 @@  map_all_hugepages(struct hugepage_file *hugepg_tbl,
 
 		if (orig) {
 			hugepg_tbl[i].orig_va = virtaddr;
-			memset(virtaddr, 0, hugepage_sz);
 		}
 		else {
 			hugepg_tbl[i].final_va = virtaddr;
@@ -529,22 +530,16 @@  remap_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 
 			old_addr = vma_addr;
 
-			/* map new, bigger segment */
+			/* map new, bigger segment, and populate page tables,
+			 * the kernel fills this segment with zeros */
 			vma_addr = mmap(vma_addr, total_size,
-					PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+					PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, 0);
 
 			if (vma_addr == MAP_FAILED || vma_addr != old_addr) {
 				RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__, strerror(errno));
 				close(fd);
 				return -1;
 			}
-
-			/* touch the page. this is needed because kernel postpones mapping
-			 * creation until the first page fault. with this, we pin down
-			 * the page and it is marked as used and gets into process' pagemap.
-			 */
-			for (offset = 0; offset < total_size; offset += hugepage_sz)
-				*((volatile uint8_t*) RTE_PTR_ADD(vma_addr, offset));
 		}
 
 		/* set shared flock on the file. */
@@ -592,9 +587,6 @@  remap_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 			}
 		}
 
-		/* zero out the whole segment */
-		memset(hugepg_tbl[page_idx].final_va, 0, total_size);
-
 		page_idx++;
 	}