[dpdk-dev,RFC,2/2] lib/librte_eal: Remove unnecessary hugepage zero-filling

Message ID 1447817231-10510-3-git-send-email-zhihong.wang@intel.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Zhihong Wang Nov. 18, 2015, 3:27 a.m. UTC
  The kernel fills new allocated (huge) pages with zeros.
DPDK just has to touch the pages to trigger the allocation.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
  

Comments

John McNamara Nov. 18, 2015, 10:39 a.m. UTC | #1
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
> Sent: Wednesday, November 18, 2015 3:27 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> The kernel fills new allocated (huge) pages with zeros.
> DPDK just has to touch the pages to trigger the allocation.
> 
> ...
>  		if (orig) {
>  			hugepg_tbl[i].orig_va = virtaddr;
> -			memset(virtaddr, 0, hugepage_sz);
> +			memset(virtaddr, 0, 8);
>  		}

Probably worth adding a one or two line comment here to avoid someone thinking that it is a bug at some later stage. The text in the commit message above is suitable.

John.
--
  
Zhihong Wang Nov. 18, 2015, 10:44 a.m. UTC | #2
> -----Original Message-----
> From: Mcnamara, John
> Sent: Wednesday, November 18, 2015 6:40 PM
> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
> > Sent: Wednesday, November 18, 2015 3:27 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> > hugepage zero-filling
> >
> > The kernel fills new allocated (huge) pages with zeros.
> > DPDK just has to touch the pages to trigger the allocation.
> >
> > ...
> >  		if (orig) {
> >  			hugepg_tbl[i].orig_va = virtaddr;
> > -			memset(virtaddr, 0, hugepage_sz);
> > +			memset(virtaddr, 0, 8);
> >  		}
> 
> Probably worth adding a one or two line comment here to avoid someone
> thinking that it is a bug at some later stage. The text in the commit message
> above is suitable.
> 

Good suggestion! Will add it :)

> John.
> --
  
Huawei Xie Nov. 18, 2015, 12:07 p.m. UTC | #3
On 11/18/2015 6:45 PM, Wang, Zhihong wrote:
>> -----Original Message-----
>> From: Mcnamara, John
>> Sent: Wednesday, November 18, 2015 6:40 PM
>> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
>> Subject: RE: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>> hugepage zero-filling
>>
>>
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
>>> Sent: Wednesday, November 18, 2015 3:27 AM
>>> To: dev@dpdk.org
>>> Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>>> hugepage zero-filling
>>>
>>> The kernel fills new allocated (huge) pages with zeros.
>>> DPDK just has to touch the pages to trigger the allocation.
I think we shouldn't reply on the assumption that kernel has zeroed the
memory. Kernel zeroes the memory mostly to avoid information leakage.It
could also achieve this by setting each bit to 1.
What we indeed need to check is later DPDK initialization code doesn't
assume the memory has been zeroed. Otherwise zero only that part of the
memory. Does this makes sense?

>>> ...
>>>  		if (orig) {
>>>  			hugepg_tbl[i].orig_va = virtaddr;
>>> -			memset(virtaddr, 0, hugepage_sz);
>>> +			memset(virtaddr, 0, 8);
>>>  		}
>> Probably worth adding a one or two line comment here to avoid someone
>> thinking that it is a bug at some later stage. The text in the commit message
>> above is suitable.
>>
> Good suggestion! Will add it :)
>
>> John.
>> --
>
  
Stephen Hemminger Nov. 18, 2015, 4 p.m. UTC | #4
On Wed, 18 Nov 2015 12:07:54 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:

> >>> The kernel fills new allocated (huge) pages with zeros.
> >>> DPDK just has to touch the pages to trigger the allocation.  
> I think we shouldn't reply on the assumption that kernel has zeroed the
> memory. Kernel zeroes the memory mostly to avoid information leakage.It
> could also achieve this by setting each bit to 1.
> What we indeed need to check is later DPDK initialization code doesn't
> assume the memory has been zeroed. Otherwise zero only that part of the
> memory. Does this makes sense?

If all new pages are zero, why does DPDK have to pre-touch the pages
at all?

I thought there as some optimization to initialize hugepages since
Oracle has same problem with their Shared Global Area which was why
hugpages were invented anyway
  
Bruce Richardson Nov. 18, 2015, 4:13 p.m. UTC | #5
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, November 18, 2015 4:00 PM
> To: Xie, Huawei <huawei.xie@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> On Wed, 18 Nov 2015 12:07:54 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
> 
> > >>> The kernel fills new allocated (huge) pages with zeros.
> > >>> DPDK just has to touch the pages to trigger the allocation.
> > I think we shouldn't reply on the assumption that kernel has zeroed
> > the memory. Kernel zeroes the memory mostly to avoid information
> > leakage.It could also achieve this by setting each bit to 1.
> > What we indeed need to check is later DPDK initialization code doesn't
> > assume the memory has been zeroed. Otherwise zero only that part of
> > the memory. Does this makes sense?
> 
> If all new pages are zero, why does DPDK have to pre-touch the pages at
> all?

The pages won't actually be mapped into the processes address space until accessed. 

/Bruce
  
Stephen Hemminger Nov. 18, 2015, 7:09 p.m. UTC | #6
On Wed, 18 Nov 2015 16:13:32 +0000
"Richardson, Bruce" <bruce.richardson@intel.com> wrote:

> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Wednesday, November 18, 2015 4:00 PM
> > To: Xie, Huawei <huawei.xie@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> > hugepage zero-filling
> > 
> > On Wed, 18 Nov 2015 12:07:54 +0000
> > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> > 
> > > >>> The kernel fills new allocated (huge) pages with zeros.
> > > >>> DPDK just has to touch the pages to trigger the allocation.
> > > I think we shouldn't reply on the assumption that kernel has zeroed
> > > the memory. Kernel zeroes the memory mostly to avoid information
> > > leakage.It could also achieve this by setting each bit to 1.
> > > What we indeed need to check is later DPDK initialization code doesn't
> > > assume the memory has been zeroed. Otherwise zero only that part of
> > > the memory. Does this makes sense?
> > 
> > If all new pages are zero, why does DPDK have to pre-touch the pages at
> > all?
> 
> The pages won't actually be mapped into the processes address space until accessed. 
> 
> /Bruce

Isn't that what mmap MAP_POPULATE flag (not currently used) will do.
  
Zhihong Wang Nov. 19, 2015, 2:15 a.m. UTC | #7
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Thursday, November 19, 2015 3:09 AM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> On Wed, 18 Nov 2015 16:13:32 +0000
> "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> > > Hemminger
> > > Sent: Wednesday, November 18, 2015 4:00 PM
> > > To: Xie, Huawei <huawei.xie@intel.com>
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
> > > unnecessary hugepage zero-filling
> > >
> > > On Wed, 18 Nov 2015 12:07:54 +0000
> > > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> > >
> > > > >>> The kernel fills new allocated (huge) pages with zeros.
> > > > >>> DPDK just has to touch the pages to trigger the allocation.
> > > > I think we shouldn't reply on the assumption that kernel has
> > > > zeroed the memory. Kernel zeroes the memory mostly to avoid
> > > > information leakage.It could also achieve this by setting each bit to 1.
> > > > What we indeed need to check is later DPDK initialization code
> > > > doesn't assume the memory has been zeroed. Otherwise zero only
> > > > that part of the memory. Does this makes sense?
> > >
> > > If all new pages are zero, why does DPDK have to pre-touch the pages
> > > at all?
> >
> > The pages won't actually be mapped into the processes address space until
> accessed.
> >
> > /Bruce
> 
> Isn't that what mmap MAP_POPULATE flag (not currently used) will do.

Yes, the MAP_POPULATE does literally the same thing.
This flag is implemented since Linux 2.5.46 according to Linux man page, guess that's why DPDK fault the page tables manually in the first place. :)

I think we can use this flag since it makes the code clearer.

/Zhihong
  
Zhihong Wang Nov. 19, 2015, 3:54 a.m. UTC | #8
> -----Original Message-----
> From: Xie, Huawei
> Sent: Wednesday, November 18, 2015 8:08 PM
> To: Wang, Zhihong <zhihong.wang@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> On 11/18/2015 6:45 PM, Wang, Zhihong wrote:
> >> -----Original Message-----
> >> From: Mcnamara, John
> >> Sent: Wednesday, November 18, 2015 6:40 PM
> >> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> >> Subject: RE: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
> >> unnecessary hugepage zero-filling
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
> >>> Sent: Wednesday, November 18, 2015 3:27 AM
> >>> To: dev@dpdk.org
> >>> Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
> >>> unnecessary hugepage zero-filling
> >>>
> >>> The kernel fills new allocated (huge) pages with zeros.
> >>> DPDK just has to touch the pages to trigger the allocation.
> I think we shouldn't reply on the assumption that kernel has zeroed the memory.

I understand the concern.
In my opinion application shouldn't assume malloced memory to be zero-filled. So it should be okay for DPDK even if the kernel doesn't zero the page at all.

I agree that we should check if any code accidentally make that assumption. Currently there's rte_pktmbuf_init() for packet mbuf initialization.

/Zhihong


> Kernel zeroes the memory mostly to avoid information leakage.It could also
> achieve this by setting each bit to 1.
> What we indeed need to check is later DPDK initialization code doesn't assume
> the memory has been zeroed. Otherwise zero only that part of the memory.
> Does this makes sense?
> 
> >>> ...
> >>>  		if (orig) {
> >>>  			hugepg_tbl[i].orig_va = virtaddr;
> >>> -			memset(virtaddr, 0, hugepage_sz);
> >>> +			memset(virtaddr, 0, 8);
> >>>  		}
> >> Probably worth adding a one or two line comment here to avoid someone
> >> thinking that it is a bug at some later stage. The text in the commit
> >> message above is suitable.
> >>
> > Good suggestion! Will add it :)
> >
> >> John.
> >> --
> >
  
Huawei Xie Nov. 19, 2015, 6:04 a.m. UTC | #9
On 11/19/2015 10:16 AM, Wang, Zhihong wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
>> Sent: Thursday, November 19, 2015 3:09 AM
>> To: Richardson, Bruce <bruce.richardson@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>> hugepage zero-filling
>>
>> On Wed, 18 Nov 2015 16:13:32 +0000
>> "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
>>
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
>>>> Hemminger
>>>> Sent: Wednesday, November 18, 2015 4:00 PM
>>>> To: Xie, Huawei <huawei.xie@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>> unnecessary hugepage zero-filling
>>>>
>>>> On Wed, 18 Nov 2015 12:07:54 +0000
>>>> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>>>>
>>>>>>>> The kernel fills new allocated (huge) pages with zeros.
>>>>>>>> DPDK just has to touch the pages to trigger the allocation.
>>>>> I think we shouldn't reply on the assumption that kernel has
>>>>> zeroed the memory. Kernel zeroes the memory mostly to avoid
>>>>> information leakage.It could also achieve this by setting each bit to 1.
>>>>> What we indeed need to check is later DPDK initialization code
>>>>> doesn't assume the memory has been zeroed. Otherwise zero only
>>>>> that part of the memory. Does this makes sense?
>>>> If all new pages are zero, why does DPDK have to pre-touch the pages
>>>> at all?
>>> The pages won't actually be mapped into the processes address space until
>> accessed.
>>> /Bruce
>> Isn't that what mmap MAP_POPULATE flag (not currently used) will do.
> Yes, the MAP_POPULATE does literally the same thing.
> This flag is implemented since Linux 2.5.46 according to Linux man page, guess that's why DPDK fault the page tables manually in the first place. :)
>
> I think we can use this flag since it makes the code clearer.
The manual says MAP_POPULATE is only supported for private mappings
since Linux 2.6.23.
>
> /Zhihong
>
>
  
Huawei Xie Nov. 19, 2015, 6:09 a.m. UTC | #10
On 11/19/2015 11:54 AM, Wang, Zhihong wrote:
>
>> -----Original Message-----
>> From: Xie, Huawei
>> Sent: Wednesday, November 18, 2015 8:08 PM
>> To: Wang, Zhihong <zhihong.wang@intel.com>; Mcnamara, John
>> <john.mcnamara@intel.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>> hugepage zero-filling
>>
>> On 11/18/2015 6:45 PM, Wang, Zhihong wrote:
>>>> -----Original Message-----
>>>> From: Mcnamara, John
>>>> Sent: Wednesday, November 18, 2015 6:40 PM
>>>> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
>>>> Subject: RE: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>> unnecessary hugepage zero-filling
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
>>>>> Sent: Wednesday, November 18, 2015 3:27 AM
>>>>> To: dev@dpdk.org
>>>>> Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>>> unnecessary hugepage zero-filling
>>>>>
>>>>> The kernel fills new allocated (huge) pages with zeros.
>>>>> DPDK just has to touch the pages to trigger the allocation.
>> I think we shouldn't reply on the assumption that kernel has zeroed the memory.
> I understand the concern.
> In my opinion application shouldn't assume malloced memory to be zero-filled. So it should be okay for DPDK even if the kernel doesn't zero the page at all.
For malloc, we shouldn't make this assumption because it might allocate
just freed memory from the heap. Hugetlbfs is different. Let us listen
to other people's opinion.
It will make life much easier if we could make this assumption.
>
> I agree that we should check if any code accidentally make that assumption. Currently there's rte_pktmbuf_init() for packet mbuf initialization.
>
> /Zhihong
>
>
>> Kernel zeroes the memory mostly to avoid information leakage.It could also
>> achieve this by setting each bit to 1.
>> What we indeed need to check is later DPDK initialization code doesn't assume
>> the memory has been zeroed. Otherwise zero only that part of the memory.
>> Does this makes sense?
>>
>>>>> ...
>>>>>  		if (orig) {
>>>>>  			hugepg_tbl[i].orig_va = virtaddr;
>>>>> -			memset(virtaddr, 0, hugepage_sz);
>>>>> +			memset(virtaddr, 0, 8);
>>>>>  		}
>>>> Probably worth adding a one or two line comment here to avoid someone
>>>> thinking that it is a bug at some later stage. The text in the commit
>>>> message above is suitable.
>>>>
>>> Good suggestion! Will add it :)
>>>
>>>> John.
>>>> --
>
  
Zhihong Wang Nov. 19, 2015, 6:32 a.m. UTC | #11
> -----Original Message-----
> From: Xie, Huawei
> Sent: Thursday, November 19, 2015 2:05 PM
> To: Wang, Zhihong <zhihong.wang@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> hugepage zero-filling
> 
> On 11/19/2015 10:16 AM, Wang, Zhihong wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> >> Hemminger
> >> Sent: Thursday, November 19, 2015 3:09 AM
> >> To: Richardson, Bruce <bruce.richardson@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
> >> unnecessary hugepage zero-filling
> >>
> >> On Wed, 18 Nov 2015 16:13:32 +0000
> >> "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
> >>
> >>>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> >>>> Hemminger
> >>>> Sent: Wednesday, November 18, 2015 4:00 PM
> >>>> To: Xie, Huawei <huawei.xie@intel.com>
> >>>> Cc: dev@dpdk.org
> >>>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
> >>>> unnecessary hugepage zero-filling
> >>>>
> >>>> On Wed, 18 Nov 2015 12:07:54 +0000
> >>>> "Xie, Huawei" <huawei.xie@intel.com> wrote:
> >>>>
> >>>>>>>> The kernel fills new allocated (huge) pages with zeros.
> >>>>>>>> DPDK just has to touch the pages to trigger the allocation.
> >>>>> I think we shouldn't reply on the assumption that kernel has
> >>>>> zeroed the memory. Kernel zeroes the memory mostly to avoid
> >>>>> information leakage.It could also achieve this by setting each bit to 1.
> >>>>> What we indeed need to check is later DPDK initialization code
> >>>>> doesn't assume the memory has been zeroed. Otherwise zero only
> >>>>> that part of the memory. Does this makes sense?
> >>>> If all new pages are zero, why does DPDK have to pre-touch the
> >>>> pages at all?
> >>> The pages won't actually be mapped into the processes address space
> >>> until
> >> accessed.
> >>> /Bruce
> >> Isn't that what mmap MAP_POPULATE flag (not currently used) will do.
> > Yes, the MAP_POPULATE does literally the same thing.
> > This flag is implemented since Linux 2.5.46 according to Linux man
> > page, guess that's why DPDK fault the page tables manually in the
> > first place. :)
> >
> > I think we can use this flag since it makes the code clearer.
> The manual says MAP_POPULATE is only supported for private mappings since
> Linux 2.6.23.

I've done check before and MAP_SHARED | MAP_POPULATE worked together correctly. Is there any implicit complication here?


> >
> > /Zhihong
> >
> >
  
Sergio Gonzalez Monroy Nov. 19, 2015, 9:14 a.m. UTC | #12
On 18/11/2015 12:07, Xie, Huawei wrote:
> On 11/18/2015 6:45 PM, Wang, Zhihong wrote:
>>> -----Original Message-----
>>> From: Mcnamara, John
>>> Sent: Wednesday, November 18, 2015 6:40 PM
>>> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
>>> Subject: RE: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>>> hugepage zero-filling
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
>>>> Sent: Wednesday, November 18, 2015 3:27 AM
>>>> To: dev@dpdk.org
>>>> Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>>>> hugepage zero-filling
>>>>
>>>> The kernel fills new allocated (huge) pages with zeros.
>>>> DPDK just has to touch the pages to trigger the allocation.
> I think we shouldn't reply on the assumption that kernel has zeroed the
> memory. Kernel zeroes the memory mostly to avoid information leakage.It
> could also achieve this by setting each bit to 1.
> What we indeed need to check is later DPDK initialization code doesn't
> assume the memory has been zeroed. Otherwise zero only that part of the
> memory. Does this makes sense?

Why cannot we rely on the kernel zeroing the memory ?
If that behavior were to change, then we can zero out the memory ourselves.

Bruce pointed out to me that the semantics have changed a bit since we 
introduced
rte_memzone_free.
Before that, all memzone reserve were zero out by default.
Is there code relying on that? I'm not sure, but it still is good 
practice to do it.

I submitted an RFC regarding this:
http://dpdk.org/ml/archives/dev/2015-November/028416.html

The idea would be to keep the available memory we are managing zeroed at 
all times.

Sergio
>>>> ...
>>>>   		if (orig) {
>>>>   			hugepg_tbl[i].orig_va = virtaddr;
>>>> -			memset(virtaddr, 0, hugepage_sz);
>>>> +			memset(virtaddr, 0, 8);
>>>>   		}
>>> Probably worth adding a one or two line comment here to avoid someone
>>> thinking that it is a bug at some later stage. The text in the commit message
>>> above is suitable.
>>>
>> Good suggestion! Will add it :)
>>
>>> John.
>>> --
  
Sergio Gonzalez Monroy Nov. 19, 2015, 9:18 a.m. UTC | #13
On 19/11/2015 06:32, Wang, Zhihong wrote:
>> -----Original Message-----
>> From: Xie, Huawei
>> Sent: Thursday, November 19, 2015 2:05 PM
>> To: Wang, Zhihong <zhihong.wang@intel.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Richardson, Bruce
>> <bruce.richardson@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>> hugepage zero-filling
>>
>> On 11/19/2015 10:16 AM, Wang, Zhihong wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
>>>> Hemminger
>>>> Sent: Thursday, November 19, 2015 3:09 AM
>>>> To: Richardson, Bruce <bruce.richardson@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>> unnecessary hugepage zero-filling
>>>>
>>>> On Wed, 18 Nov 2015 16:13:32 +0000
>>>> "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
>>>>>> Hemminger
>>>>>> Sent: Wednesday, November 18, 2015 4:00 PM
>>>>>> To: Xie, Huawei <huawei.xie@intel.com>
>>>>>> Cc: dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>>>> unnecessary hugepage zero-filling
>>>>>>
>>>>>> On Wed, 18 Nov 2015 12:07:54 +0000
>>>>>> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>>>>>>
>>>>>>>>>> The kernel fills new allocated (huge) pages with zeros.
>>>>>>>>>> DPDK just has to touch the pages to trigger the allocation.
>>>>>>> I think we shouldn't reply on the assumption that kernel has
>>>>>>> zeroed the memory. Kernel zeroes the memory mostly to avoid
>>>>>>> information leakage.It could also achieve this by setting each bit to 1.
>>>>>>> What we indeed need to check is later DPDK initialization code
>>>>>>> doesn't assume the memory has been zeroed. Otherwise zero only
>>>>>>> that part of the memory. Does this makes sense?
>>>>>> If all new pages are zero, why does DPDK have to pre-touch the
>>>>>> pages at all?
>>>>> The pages won't actually be mapped into the processes address space
>>>>> until
>>>> accessed.
>>>>> /Bruce
>>>> Isn't that what mmap MAP_POPULATE flag (not currently used) will do.
>>> Yes, the MAP_POPULATE does literally the same thing.
>>> This flag is implemented since Linux 2.5.46 according to Linux man
>>> page, guess that's why DPDK fault the page tables manually in the
>>> first place. :)
>>>
>>> I think we can use this flag since it makes the code clearer.
>> The manual says MAP_POPULATE is only supported for private mappings since
>> Linux 2.6.23.
> I've done check before and MAP_SHARED | MAP_POPULATE worked together correctly. Is there any implicit complication here?
>

None that I can see.

Sergio
>>> /Zhihong
>>>
>>>
  
Bruce Richardson Nov. 20, 2015, 12:15 p.m. UTC | #14
On Wed, Nov 18, 2015 at 11:09:06AM -0800, Stephen Hemminger wrote:
> On Wed, 18 Nov 2015 16:13:32 +0000
> "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
> 
> > 
> > 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > > Sent: Wednesday, November 18, 2015 4:00 PM
> > > To: Xie, Huawei <huawei.xie@intel.com>
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
> > > hugepage zero-filling
> > > 
> > > On Wed, 18 Nov 2015 12:07:54 +0000
> > > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> > > 
> > > > >>> The kernel fills new allocated (huge) pages with zeros.
> > > > >>> DPDK just has to touch the pages to trigger the allocation.
> > > > I think we shouldn't reply on the assumption that kernel has zeroed
> > > > the memory. Kernel zeroes the memory mostly to avoid information
> > > > leakage.It could also achieve this by setting each bit to 1.
> > > > What we indeed need to check is later DPDK initialization code doesn't
> > > > assume the memory has been zeroed. Otherwise zero only that part of
> > > > the memory. Does this makes sense?
> > > 
> > > If all new pages are zero, why does DPDK have to pre-touch the pages at
> > > all?
> > 
> > The pages won't actually be mapped into the processes address space until accessed. 
> > 
> > /Bruce
> 
> Isn't that what mmap MAP_POPULATE flag (not currently used) will do.

Hadn't seen that flag before, but it does indeed look to do exactly what we
want.

/Bruce
  
Huawei Xie Nov. 23, 2015, 2:54 a.m. UTC | #15
On 11/19/2015 2:32 PM, Wang, Zhihong wrote:
>> -----Original Message-----
>> From: Xie, Huawei
>> Sent: Thursday, November 19, 2015 2:05 PM
>> To: Wang, Zhihong <zhihong.wang@intel.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Richardson, Bruce
>> <bruce.richardson@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary
>> hugepage zero-filling
>>
>> On 11/19/2015 10:16 AM, Wang, Zhihong wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
>>>> Hemminger
>>>> Sent: Thursday, November 19, 2015 3:09 AM
>>>> To: Richardson, Bruce <bruce.richardson@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>> unnecessary hugepage zero-filling
>>>>
>>>> On Wed, 18 Nov 2015 16:13:32 +0000
>>>> "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
>>>>>> Hemminger
>>>>>> Sent: Wednesday, November 18, 2015 4:00 PM
>>>>>> To: Xie, Huawei <huawei.xie@intel.com>
>>>>>> Cc: dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>>>> unnecessary hugepage zero-filling
>>>>>>
>>>>>> On Wed, 18 Nov 2015 12:07:54 +0000
>>>>>> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>>>>>>
>>>>>>>>>> The kernel fills new allocated (huge) pages with zeros.
>>>>>>>>>> DPDK just has to touch the pages to trigger the allocation.
>>>>>>> I think we shouldn't reply on the assumption that kernel has
>>>>>>> zeroed the memory. Kernel zeroes the memory mostly to avoid
>>>>>>> information leakage.It could also achieve this by setting each bit to 1.
>>>>>>> What we indeed need to check is later DPDK initialization code
>>>>>>> doesn't assume the memory has been zeroed. Otherwise zero only
>>>>>>> that part of the memory. Does this makes sense?
>>>>>> If all new pages are zero, why does DPDK have to pre-touch the
>>>>>> pages at all?
>>>>> The pages won't actually be mapped into the processes address space
>>>>> until
>>>> accessed.
>>>>> /Bruce
>>>> Isn't that what mmap MAP_POPULATE flag (not currently used) will do.
>>> Yes, the MAP_POPULATE does literally the same thing.
>>> This flag is implemented since Linux 2.5.46 according to Linux man
>>> page, guess that's why DPDK fault the page tables manually in the
>>> first place. :)
>>>
>>> I think we can use this flag since it makes the code clearer.
>> The manual says MAP_POPULATE is only supported for private mappings since
>> Linux 2.6.23.
> I've done check before and MAP_SHARED | MAP_POPULATE worked together correctly. Is there any implicit complication here?
I checked shared mapping with MAP_POPULATE between two processes. It
works. Then i did some search, find the manual is also ambiguous to
others and thus have been changed, :).
Before: MAP_POPULATE is only supported for private mappings since Linux
2.6.23.
After: MAP_POPULATE is supported for private mappings only since Linux
2.6.23.
http://stackoverflow.com/questions/23502361/linux-mmap-with-map-populate-man-page-seems-to-give-wrong-info
>
>>> /Zhihong
>>>
>>>
>
  
Huawei Xie Nov. 23, 2015, 3:46 a.m. UTC | #16
On 11/19/2015 5:15 PM, Gonzalez Monroy, Sergio wrote:
> On 18/11/2015 12:07, Xie, Huawei wrote:
>> On 11/18/2015 6:45 PM, Wang, Zhihong wrote:
>>>> -----Original Message-----
>>>> From: Mcnamara, John
>>>> Sent: Wednesday, November 18, 2015 6:40 PM
>>>> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
>>>> Subject: RE: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>> unnecessary
>>>> hugepage zero-filling
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
>>>>> Sent: Wednesday, November 18, 2015 3:27 AM
>>>>> To: dev@dpdk.org
>>>>> Subject: [dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove
>>>>> unnecessary
>>>>> hugepage zero-filling
>>>>>
>>>>> The kernel fills new allocated (huge) pages with zeros.
>>>>> DPDK just has to touch the pages to trigger the allocation.
>> I think we shouldn't reply on the assumption that kernel has zeroed the
>> memory. Kernel zeroes the memory mostly to avoid information leakage.It
>> could also achieve this by setting each bit to 1.
>> What we indeed need to check is later DPDK initialization code doesn't
>> assume the memory has been zeroed. Otherwise zero only that part of the
>> memory. Does this makes sense?
>
> Why cannot we rely on the kernel zeroing the memory ?
> If that behavior were to change, then we can zero out the memory
> ourselves.
It is undocumented kernel behavior. My opinion is if not a big burden,
zero out the needed memory ourselves, otherwise resort to this kernel
behavior.

>
> Bruce pointed out to me that the semantics have changed a bit since we
> introduced
> rte_memzone_free.
> Before that, all memzone reserve were zero out by default.
> Is there code relying on that? I'm not sure, but it still is good
> practice to do it.
>
> I submitted an RFC regarding this:
> http://dpdk.org/ml/archives/dev/2015-November/028416.html
>
> The idea would be to keep the available memory we are managing zeroed
> at all times.
>
> Sergio
>>>>> ...
>>>>>           if (orig) {
>>>>>               hugepg_tbl[i].orig_va = virtaddr;
>>>>> -            memset(virtaddr, 0, hugepage_sz);
>>>>> +            memset(virtaddr, 0, 8);
>>>>>           }
>>>> Probably worth adding a one or two line comment here to avoid someone
>>>> thinking that it is a bug at some later stage. The text in the
>>>> commit message
>>>> above is suitable.
>>>>
>>> Good suggestion! Will add it :)
>>>
>>>> John.
>>>> -- 
>
>
  
Stephen Hemminger Nov. 23, 2015, 4:07 a.m. UTC | #17
On Mon, 23 Nov 2015 03:46:31 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:

> >
> > Why cannot we rely on the kernel zeroing the memory ?
> > If that behavior were to change, then we can zero out the memory
> > ourselves.  
> It is undocumented kernel behavior. My opinion is if not a big burden,
> zero out the needed memory ourselves, otherwise resort to this kernel
> behavior.

Really, I think it is more an oversight of missing documentation,
the kernel has always (and will continue) to zero out memory that is given
to a process. If it didn't it would be a massive security hole.
  
Huawei Xie Nov. 23, 2015, 5:05 a.m. UTC | #18
On 11/23/2015 12:07 PM, Stephen Hemminger wrote:
> On Mon, 23 Nov 2015 03:46:31 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>>> Why cannot we rely on the kernel zeroing the memory ?
>>> If that behavior were to change, then we can zero out the memory
>>> ourselves.  
>> It is undocumented kernel behavior. My opinion is if not a big burden,
>> zero out the needed memory ourselves, otherwise resort to this kernel
>> behavior.
> Really, I think it is more an oversight of missing documentation,
> the kernel has always (and will continue) to zero out memory that is given
> to a process. If it didn't it would be a massive security hole.
Agree. I believe this behavior will not change in future. For the
security issue, kernel could also set all bits like to 1. Just wonder if
this is best practice and whether there are other user space programs
rely on this behavior.
  
Stephen Hemminger Nov. 23, 2015, 6:52 a.m. UTC | #19
On Mon, 23 Nov 2015 05:05:21 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:

> On 11/23/2015 12:07 PM, Stephen Hemminger wrote:
> > On Mon, 23 Nov 2015 03:46:31 +0000
> > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> >
> >>> Why cannot we rely on the kernel zeroing the memory ?
> >>> If that behavior were to change, then we can zero out the memory
> >>> ourselves.  
> >> It is undocumented kernel behavior. My opinion is if not a big burden,
> >> zero out the needed memory ourselves, otherwise resort to this kernel
> >> behavior.
> > Really, I think it is more an oversight of missing documentation,
> > the kernel has always (and will continue) to zero out memory that is given
> > to a process. If it didn't it would be a massive security hole.
> Agree. I believe this behavior will not change in future. For the
> security issue, kernel could also set all bits like to 1. Just wonder if
> this is best practice and whether there are other user space programs
> rely on this behavior.
> 

Glibc almost certainly depends on this, because heap is grown by mmaping addtional
memory.
  
Bruce Richardson Nov. 23, 2015, 10:18 a.m. UTC | #20
On Mon, Nov 23, 2015 at 02:54:54AM +0000, Xie, Huawei wrote:
 I checked shared mapping with MAP_POPULATE between two processes. It
> works. Then i did some search, find the manual is also ambiguous to
> others and thus have been changed, :).
> Before: MAP_POPULATE is only supported for private mappings since Linux
> 2.6.23.
> After: MAP_POPULATE is supported for private mappings only since Linux
> 2.6.23.
> http://stackoverflow.com/questions/23502361/linux-mmap-with-map-populate-man-page-seems-to-give-wrong-info
> >

Doesn't matter much for us, as we use shared rather than private mappings for the
hugepages.

/Bruce
  
Huawei Xie Nov. 25, 2015, 6:24 p.m. UTC | #21
On 11/23/2015 2:52 PM, Stephen Hemminger wrote:
> On Mon, 23 Nov 2015 05:05:21 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>> On 11/23/2015 12:07 PM, Stephen Hemminger wrote:
>>> On Mon, 23 Nov 2015 03:46:31 +0000
>>> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>>>
>>>>> Why cannot we rely on the kernel zeroing the memory ?
>>>>> If that behavior were to change, then we can zero out the memory
>>>>> ourselves.  
>>>> It is undocumented kernel behavior. My opinion is if not a big burden,
>>>> zero out the needed memory ourselves, otherwise resort to this kernel
>>>> behavior.
>>> Really, I think it is more an oversight of missing documentation,
>>> the kernel has always (and will continue) to zero out memory that is given
>>> to a process. If it didn't it would be a massive security hole.
>> Agree. I believe this behavior will not change in future. For the
>> security issue, kernel could also set all bits like to 1. Just wonder if
>> this is best practice and whether there are other user space programs
>> rely on this behavior.
>>
> Glibc almost certainly depends on this, because heap is grown by mmaping addtional
> memory.
Thanks. It is OK, but still don't feel good, :). It is like that we
require the clear_user_page(and other unix, windows equivalent) is
always memset(ptr, 0 , PAGE_SIZE). To me, memset(ptr, 1, PAGE_SIZE)
doesn't make difference.
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 0de75cd..af823dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -410,7 +410,7 @@  map_all_hugepages(struct hugepage_file *hugepg_tbl,
 
 		if (orig) {
 			hugepg_tbl[i].orig_va = virtaddr;
-			memset(virtaddr, 0, hugepage_sz);
+			memset(virtaddr, 0, 8);
 		}
 		else {
 			hugepg_tbl[i].final_va = virtaddr;
@@ -592,9 +592,6 @@  remap_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 			}
 		}
 
-		/* zero out the whole segment */
-		memset(hugepg_tbl[page_idx].final_va, 0, total_size);
-
 		page_idx++;
 	}