eal: add option to not store segment fd's

Message ID 07f664c33ddedaa5dcfe82ecb97d931e68b7e33a.1550855529.git.anatoly.burakov@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context                           Check    Description
ci/checkpatch                     warning  coding style issues
ci/Intel-compilation              success  Compilation OK
ci/intel-Performance-Testing      success  Performance Testing PASS
ci/mellanox-Performance-Testing   success  Performance Testing PASS

Commit Message

Burakov, Anatoly Feb. 22, 2019, 5:12 p.m. UTC
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which leaves
user applications unable to use system calls such as select().

While the problem can be worked around using the --single-file-segments
option, that option does not work if --legacy-mem mode is also used. Add
(yet another) EAL flag to disable storing fd's internally. This
will sacrifice compatibility with Virtio with vhost-user backend, but
at least select() and friends will work.

[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/linux_gsg/linux_eal_parameters.rst |  4 ++++
 .../prog_guide/env_abstraction_layer.rst      | 19 +++++++++++++++++++
 lib/librte_eal/common/eal_internal_cfg.h      |  4 ++++
 lib/librte_eal/common/eal_options.h           |  2 ++
 lib/librte_eal/linuxapp/eal/eal.c             |  4 ++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 19 ++++++++++++++++++-
 6 files changed, 51 insertions(+), 1 deletion(-)
  

Comments

David Marchand March 29, 2019, 9:50 a.m. UTC | #1
On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Due to internal glibc limitations [1], DPDK may exhaust internal
> file descriptor limits when using smaller page sizes, which results
> in inability to use system calls such as select() by user
> applications.
>
> While the problem can be worked around using --single-file-segments
> option, it does not work if --legacy-mem mode is also used. Add a
> (yet another) EAL flag to disable storing fd's internally. This
> will sacrifice compability with Virtio with vhost-backend, but
> at least select() and friends will work.
>
> [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html


Sorry, I am a bit lost and I never took the time to look in the new memory
allocation system.
This gives the impression that we are accumulating workarounds, between
legacy-mem, single-file-segments, now no-seg-fds.

Iiuc, everything revolves around the need for per page locks.
Can you summarize why we need them?

Thanks.
  
Burakov, Anatoly March 29, 2019, 10:33 a.m. UTC | #2
On 29-Mar-19 9:50 AM, David Marchand wrote:
> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     Due to internal glibc limitations [1], DPDK may exhaust internal
>     file descriptor limits when using smaller page sizes, which results
>     in inability to use system calls such as select() by user
>     applications.
> 
>     While the problem can be worked around using --single-file-segments
>     option, it does not work if --legacy-mem mode is also used. Add a
>     (yet another) EAL flag to disable storing fd's internally. This
>     will sacrifice compability with Virtio with vhost-backend, but
>     at least select() and friends will work.
> 
>     [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
> 
> 
> Sorry, I am a bit lost and I never took the time to look in the new 
> memory allocation system.
> This gives the impression that we are accumulating workarounds, between 
> legacy-mem, single-file-segments, now no-seg-fds.

Yep. I don't like this any more than you do, but I think there are users
of all of these, so we can't just drop them willy-nilly. My great hope
was that by now everyone would have moved on to VFIO so legacy mem
wouldn't be needed (the only reason it exists is to provide
compatibility for use cases where lots of IOVA-contiguous memory is
required, and VFIO cannot be used), but apparently that is too much to
ask :/

> 
> Iiuc, everything revolves around the need for per page locks.
> Can you summarize why we need them?

The short answer is multiprocess. We have to be able to map and unmap
pages individually, and for that we need to be sure that we can, in
fact, remove a page because no one else uses it. We also need to store
fd's because virtio with a vhost-user backend relies on them to share
memory between processes.

> 
> Thanks.
> 
> -- 
> David Marchand
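
The per-page locking described in the reply above can be sketched with flock(): every user of a page holds a shared lock on that page's descriptor, and the page may be removed only if an exclusive lock can be taken. This is a minimal illustration under that assumption, not the EAL implementation; all function names are hypothetical, and two independent open() calls on one temporary file stand in for two processes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* A user of the page keeps a shared lock on its fd while the page is mapped. */
static int page_mark_in_use(int fd)
{
	assert(fd >= 0);
	return flock(fd, LOCK_SH | LOCK_NB);
}

/* True if nobody else holds a lock, i.e. the exclusive lock was obtained.
 * On success the caller keeps holding the exclusive lock. */
static bool page_can_be_removed(int fd)
{
	assert(fd >= 0);
	return flock(fd, LOCK_EX | LOCK_NB) == 0;
}

/* Demo: returns 0 if the page is reported busy while the second
 * descriptor holds its shared lock, and removable once it is released. */
static int demo_page_lock(void)
{
	char path[] = "/tmp/page_lock_demo_XXXXXX";
	bool busy = false, freed = false;
	int user = -1;
	int owner = mkstemp(path);

	if (owner < 0)
		return -1;
	user = open(path, O_RDWR);
	if (user >= 0 && page_mark_in_use(user) == 0) {
		busy = !page_can_be_removed(owner);
		flock(user, LOCK_UN);
		freed = page_can_be_removed(owner);
	}
	if (user >= 0)
		close(user);
	close(owner);
	unlink(path);
	return (busy && freed) ? 0 : -1;
}
```

A scheme like this needs one open descriptor per lockable page, which is why the number of fd's grows with the number of pages.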
  
Thomas Monjalon March 29, 2019, 11:34 a.m. UTC | #3
29/03/2019 11:33, Burakov, Anatoly:
> On 29-Mar-19 9:50 AM, David Marchand wrote:
> > On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov 
> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> > 
> >     Due to internal glibc limitations [1], DPDK may exhaust internal
> >     file descriptor limits when using smaller page sizes, which results
> >     in inability to use system calls such as select() by user
> >     applications.
> > 
> >     While the problem can be worked around using --single-file-segments
> >     option, it does not work if --legacy-mem mode is also used. Add a
> >     (yet another) EAL flag to disable storing fd's internally. This
> >     will sacrifice compability with Virtio with vhost-backend, but
> >     at least select() and friends will work.
> > 
> >     [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
> > 
> > 
> > Sorry, I am a bit lost and I never took the time to look in the new 
> > memory allocation system.
> > This gives the impression that we are accumulating workarounds, between 
> > legacy-mem, single-file-segments, now no-seg-fds.
> 
> Yep. I don't like this any more than you do, but i think there are users 
> of all of these, so we can't just drop them willy-nilly. My great hope 
> was that by now everyone would move on to use VFIO so legacy mem 
> wouldn't be needed (the only reason it exists is to provide 
> compatibility for use cases where lots of IOVA-contiguous memory is 
> required, and VFIO cannot be used), but apparently that is too much to 
> ask :/
> 
> > 
> > Iiuc, everything revolves around the need for per page locks.
> > Can you summarize why we need them?
> 
> The short answer is multiprocess. We have to be able to map and unmap 
> pages individually, and for that we need to be sure that we can, in 
> fact, remove a page because no one else uses it. We also need to store 
> fd's because virtio with vhost-user backend needs them to work, because 
> it relies on sharing memory between processes using fd's.

It's a pity to add an option to work around a limitation of a corner case.
It adds complexity that we will have to support forever,
and it's not even perfect because of vhost.

Might there be another solution?
  
Burakov, Anatoly March 29, 2019, 12:05 p.m. UTC | #4
On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
> 29/03/2019 11:33, Burakov, Anatoly:
>> On 29-Mar-19 9:50 AM, David Marchand wrote:
>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
>>> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>>
>>>      Due to internal glibc limitations [1], DPDK may exhaust internal
>>>      file descriptor limits when using smaller page sizes, which results
>>>      in inability to use system calls such as select() by user
>>>      applications.
>>>
>>>      While the problem can be worked around using --single-file-segments
>>>      option, it does not work if --legacy-mem mode is also used. Add a
>>>      (yet another) EAL flag to disable storing fd's internally. This
>>>      will sacrifice compability with Virtio with vhost-backend, but
>>>      at least select() and friends will work.
>>>
>>>      [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
>>>
>>>
>>> Sorry, I am a bit lost and I never took the time to look in the new
>>> memory allocation system.
>>> This gives the impression that we are accumulating workarounds, between
>>> legacy-mem, single-file-segments, now no-seg-fds.
>>
>> Yep. I don't like this any more than you do, but i think there are users
>> of all of these, so we can't just drop them willy-nilly. My great hope
>> was that by now everyone would move on to use VFIO so legacy mem
>> wouldn't be needed (the only reason it exists is to provide
>> compatibility for use cases where lots of IOVA-contiguous memory is
>> required, and VFIO cannot be used), but apparently that is too much to
>> ask :/
>>
>>>
>>> Iiuc, everything revolves around the need for per page locks.
>>> Can you summarize why we need them?
>>
>> The short answer is multiprocess. We have to be able to map and unmap
>> pages individually, and for that we need to be sure that we can, in
>> fact, remove a page because no one else uses it. We also need to store
>> fd's because virtio with vhost-user backend needs them to work, because
>> it relies on sharing memory between processes using fd's.
> 
> It's a pity adding an option to workaround a limitation of a corner case.
> It adds complexity that we will have to support forever,
> and it's even not perfect because of vhost.
> 
> Might there be another solution?
> 

If there is one, I'm all ears. I don't see any solutions aside from
adding limitations.

For example, we could drop the single/multi file segments mode and just 
make single file segments a default and the only available mode, but 
this has certain risks because older kernels do not support fallocate() 
on hugetlbfs.

We could further draw a line in the sand, and say that, for example, 
19.11 (or 20.11) will not have legacy mem mode, and everyone should use 
VFIO by now and if you don't it's your own fault.

We could also cut down on the number of fd's we use in single-file 
segments mode by not using locks and simply deleting pages in the 
primary, but yanking out hugepages from under secondaries' feet makes me 
feel uneasy, even if technically by the time that happens, they're not 
supposed to be used anyway. This could mean that the patch is no longer 
necessary because we don't use that many fd's any more.

However, if we are to support all that we support now, the only option 
here is to pile on more workarounds.
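
The fallocate() dependency mentioned above can be illustrated as follows. The sketch assumes that single-file-segments mode frees an individual page by punching a hole in the shared backing file; that is an assumption about the mechanism, and the helper names are hypothetical. The demo uses a regular temporary file with 4096-byte "pages" rather than hugetlbfs, so it only shows the call pattern.

```c
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_* */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Release the backing storage for [offset, offset + len) without
 * changing the file size. Returns 0 on success or -errno on failure. */
static int segment_free_page(int fd, off_t offset, off_t len)
{
	assert(fd >= 0 && len > 0);
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			offset, len) != 0)
		return -errno;
	return 0;
}

/* Demo: create a file of four pages and free the second one.
 * Returns the file size afterwards, or -1 if setup failed. */
static long demo_punch_hole(int *punch_result)
{
	char path[] = "/tmp/seg_demo_XXXXXX";
	const off_t page = 4096;
	struct stat st;
	long size = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, 4 * page) == 0) {
		*punch_result = segment_free_page(fd, page, page);
		if (fstat(fd, &st) == 0)
			size = (long)st.st_size;
	}
	close(fd);
	unlink(path);
	return size;
}
```

On a filesystem or kernel without hole-punching support the call fails with an error such as EOPNOTSUPP, which is the risk raised above for older kernels and hugetlbfs.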
  
Thomas Monjalon March 29, 2019, 12:40 p.m. UTC | #5
29/03/2019 13:05, Burakov, Anatoly:
> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
> > 29/03/2019 11:33, Burakov, Anatoly:
> >> On 29-Mar-19 9:50 AM, David Marchand wrote:
> >>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
> >>> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >>>
> >>>      Due to internal glibc limitations [1], DPDK may exhaust internal
> >>>      file descriptor limits when using smaller page sizes, which results
> >>>      in inability to use system calls such as select() by user
> >>>      applications.
> >>>
> >>>      While the problem can be worked around using --single-file-segments
> >>>      option, it does not work if --legacy-mem mode is also used. Add a
> >>>      (yet another) EAL flag to disable storing fd's internally. This
> >>>      will sacrifice compability with Virtio with vhost-backend, but
> >>>      at least select() and friends will work.
> >>>
> >>>      [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
> >>>
> >>>
> >>> Sorry, I am a bit lost and I never took the time to look in the new
> >>> memory allocation system.
> >>> This gives the impression that we are accumulating workarounds, between
> >>> legacy-mem, single-file-segments, now no-seg-fds.
> >>
> >> Yep. I don't like this any more than you do, but i think there are users
> >> of all of these, so we can't just drop them willy-nilly. My great hope
> >> was that by now everyone would move on to use VFIO so legacy mem
> >> wouldn't be needed (the only reason it exists is to provide
> >> compatibility for use cases where lots of IOVA-contiguous memory is
> >> required, and VFIO cannot be used), but apparently that is too much to
> >> ask :/
> >>
> >>>
> >>> Iiuc, everything revolves around the need for per page locks.
> >>> Can you summarize why we need them?
> >>
> >> The short answer is multiprocess. We have to be able to map and unmap
> >> pages individually, and for that we need to be sure that we can, in
> >> fact, remove a page because no one else uses it. We also need to store
> >> fd's because virtio with vhost-user backend needs them to work, because
> >> it relies on sharing memory between processes using fd's.
> > 
> > It's a pity adding an option to workaround a limitation of a corner case.
> > It adds complexity that we will have to support forever,
> > and it's even not perfect because of vhost.
> > 
> > Might there be another solution?
> > 
> 
> If there is one, i'm all ears. I don't see any solutions aside from 
> adding limitations.
> 
> For example, we could drop the single/multi file segments mode and just 
> make single file segments a default and the only available mode, but 
> this has certain risks because older kernels do not support fallocate() 
> on hugetlbfs.
> 
> We could further draw a line in the sand, and say that, for example, 
> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use 
> VFIO by now and if you don't it's your own fault.
> 
> We could also cut down on the number of fd's we use in single-file 
> segments mode by not using locks and simply deleting pages in the 
> primary, but yanking out hugepages from under secondaries' feet makes me 
> feel uneasy, even if technically by the time that happens, they're not 
> supposed to be used anyway. This could mean that the patch is no longer 
> necessary because we don't use that many fd's any more.

This last option is interesting. Is it realistic?
  
Burakov, Anatoly March 29, 2019, 1:24 p.m. UTC | #6
On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
> 29/03/2019 13:05, Burakov, Anatoly:
>> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
>>> 29/03/2019 11:33, Burakov, Anatoly:
>>>> On 29-Mar-19 9:50 AM, David Marchand wrote:
>>>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
>>>>> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>>>>
>>>>>       Due to internal glibc limitations [1], DPDK may exhaust internal
>>>>>       file descriptor limits when using smaller page sizes, which results
>>>>>       in inability to use system calls such as select() by user
>>>>>       applications.
>>>>>
>>>>>       While the problem can be worked around using --single-file-segments
>>>>>       option, it does not work if --legacy-mem mode is also used. Add a
>>>>>       (yet another) EAL flag to disable storing fd's internally. This
>>>>>       will sacrifice compability with Virtio with vhost-backend, but
>>>>>       at least select() and friends will work.
>>>>>
>>>>>       [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
>>>>>
>>>>>
>>>>> Sorry, I am a bit lost and I never took the time to look in the new
>>>>> memory allocation system.
>>>>> This gives the impression that we are accumulating workarounds, between
>>>>> legacy-mem, single-file-segments, now no-seg-fds.
>>>>
>>>> Yep. I don't like this any more than you do, but i think there are users
>>>> of all of these, so we can't just drop them willy-nilly. My great hope
>>>> was that by now everyone would move on to use VFIO so legacy mem
>>>> wouldn't be needed (the only reason it exists is to provide
>>>> compatibility for use cases where lots of IOVA-contiguous memory is
>>>> required, and VFIO cannot be used), but apparently that is too much to
>>>> ask :/
>>>>
>>>>>
>>>>> Iiuc, everything revolves around the need for per page locks.
>>>>> Can you summarize why we need them?
>>>>
>>>> The short answer is multiprocess. We have to be able to map and unmap
>>>> pages individually, and for that we need to be sure that we can, in
>>>> fact, remove a page because no one else uses it. We also need to store
>>>> fd's because virtio with vhost-user backend needs them to work, because
>>>> it relies on sharing memory between processes using fd's.
>>>
>>> It's a pity adding an option to workaround a limitation of a corner case.
>>> It adds complexity that we will have to support forever,
>>> and it's even not perfect because of vhost.
>>>
>>> Might there be another solution?
>>>
>>
>> If there is one, i'm all ears. I don't see any solutions aside from
>> adding limitations.
>>
>> For example, we could drop the single/multi file segments mode and just
>> make single file segments a default and the only available mode, but
>> this has certain risks because older kernels do not support fallocate()
>> on hugetlbfs.
>>
>> We could further draw a line in the sand, and say that, for example,
>> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use
>> VFIO by now and if you don't it's your own fault.
>>
>> We could also cut down on the number of fd's we use in single-file
>> segments mode by not using locks and simply deleting pages in the
>> primary, but yanking out hugepages from under secondaries' feet makes me
>> feel uneasy, even if technically by the time that happens, they're not
>> supposed to be used anyway. This could mean that the patch is no longer
>> necessary because we don't use that many fd's any more.
> 
> This last option is interesting. Is it realistic?
> 

I can do it in the current release cycle, but I'm not sure if it's too late
to make such changes. I guess it's OK since the validation cycle is just
starting? I'll throw something together and see if it crashes and burns.
  
Thomas Monjalon March 29, 2019, 1:34 p.m. UTC | #7
29/03/2019 14:24, Burakov, Anatoly:
> On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
> > 29/03/2019 13:05, Burakov, Anatoly:
> >> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
> >>> 29/03/2019 11:33, Burakov, Anatoly:
> >>>> On 29-Mar-19 9:50 AM, David Marchand wrote:
> >>>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
> >>>>> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >>>>>
> >>>>>       Due to internal glibc limitations [1], DPDK may exhaust internal
> >>>>>       file descriptor limits when using smaller page sizes, which results
> >>>>>       in inability to use system calls such as select() by user
> >>>>>       applications.
> >>>>>
> >>>>>       While the problem can be worked around using --single-file-segments
> >>>>>       option, it does not work if --legacy-mem mode is also used. Add a
> >>>>>       (yet another) EAL flag to disable storing fd's internally. This
> >>>>>       will sacrifice compability with Virtio with vhost-backend, but
> >>>>>       at least select() and friends will work.
> >>>>>
> >>>>>       [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
> >>>>>
> >>>>>
> >>>>> Sorry, I am a bit lost and I never took the time to look in the new
> >>>>> memory allocation system.
> >>>>> This gives the impression that we are accumulating workarounds, between
> >>>>> legacy-mem, single-file-segments, now no-seg-fds.
> >>>>
> >>>> Yep. I don't like this any more than you do, but i think there are users
> >>>> of all of these, so we can't just drop them willy-nilly. My great hope
> >>>> was that by now everyone would move on to use VFIO so legacy mem
> >>>> wouldn't be needed (the only reason it exists is to provide
> >>>> compatibility for use cases where lots of IOVA-contiguous memory is
> >>>> required, and VFIO cannot be used), but apparently that is too much to
> >>>> ask :/
> >>>>
> >>>>>
> >>>>> Iiuc, everything revolves around the need for per page locks.
> >>>>> Can you summarize why we need them?
> >>>>
> >>>> The short answer is multiprocess. We have to be able to map and unmap
> >>>> pages individually, and for that we need to be sure that we can, in
> >>>> fact, remove a page because no one else uses it. We also need to store
> >>>> fd's because virtio with vhost-user backend needs them to work, because
> >>>> it relies on sharing memory between processes using fd's.
> >>>
> >>> It's a pity adding an option to workaround a limitation of a corner case.
> >>> It adds complexity that we will have to support forever,
> >>> and it's even not perfect because of vhost.
> >>>
> >>> Might there be another solution?
> >>>
> >>
> >> If there is one, i'm all ears. I don't see any solutions aside from
> >> adding limitations.
> >>
> >> For example, we could drop the single/multi file segments mode and just
> >> make single file segments a default and the only available mode, but
> >> this has certain risks because older kernels do not support fallocate()
> >> on hugetlbfs.
> >>
> >> We could further draw a line in the sand, and say that, for example,
> >> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use
> >> VFIO by now and if you don't it's your own fault.
> >>
> >> We could also cut down on the number of fd's we use in single-file
> >> segments mode by not using locks and simply deleting pages in the
> >> primary, but yanking out hugepages from under secondaries' feet makes me
> >> feel uneasy, even if technically by the time that happens, they're not
> >> supposed to be used anyway. This could mean that the patch is no longer
> >> necessary because we don't use that many fd's any more.
> > 
> > This last option is interesting. Is it realistic?
> > 
> 
> I can do it in current release cycle, but i'm not sure if it's too late 
> to do such changes. I guess it's OK since the validation cycle is just 
> starting? I'll throw something together and see if it crashes and burns.

OK let's try that.
  
Maxime Coquelin March 29, 2019, 1:35 p.m. UTC | #8
On 3/29/19 2:24 PM, Burakov, Anatoly wrote:
> On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
>> 29/03/2019 13:05, Burakov, Anatoly:
>>> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
>>>> 29/03/2019 11:33, Burakov, Anatoly:
>>>>> On 29-Mar-19 9:50 AM, David Marchand wrote:
>>>>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
>>>>>> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>>>>>
>>>>>>       Due to internal glibc limitations [1], DPDK may exhaust 
>>>>>> internal
>>>>>>       file descriptor limits when using smaller page sizes, which 
>>>>>> results
>>>>>>       in inability to use system calls such as select() by user
>>>>>>       applications.
>>>>>>
>>>>>>       While the problem can be worked around using 
>>>>>> --single-file-segments
>>>>>>       option, it does not work if --legacy-mem mode is also used. 
>>>>>> Add a
>>>>>>       (yet another) EAL flag to disable storing fd's internally. This
>>>>>>       will sacrifice compability with Virtio with vhost-backend, but
>>>>>>       at least select() and friends will work.
>>>>>>
>>>>>>       [1] 
>>>>>> https://mails.dpdk.org/archives/dev/2019-February/124386.html
>>>>>>
>>>>>>
>>>>>> Sorry, I am a bit lost and I never took the time to look in the new
>>>>>> memory allocation system.
>>>>>> This gives the impression that we are accumulating workarounds, 
>>>>>> between
>>>>>> legacy-mem, single-file-segments, now no-seg-fds.
>>>>>
>>>>> Yep. I don't like this any more than you do, but i think there are 
>>>>> users
>>>>> of all of these, so we can't just drop them willy-nilly. My great hope
>>>>> was that by now everyone would move on to use VFIO so legacy mem
>>>>> wouldn't be needed (the only reason it exists is to provide
>>>>> compatibility for use cases where lots of IOVA-contiguous memory is
>>>>> required, and VFIO cannot be used), but apparently that is too much to
>>>>> ask :/
>>>>>
>>>>>>
>>>>>> Iiuc, everything revolves around the need for per page locks.
>>>>>> Can you summarize why we need them?
>>>>>
>>>>> The short answer is multiprocess. We have to be able to map and unmap
>>>>> pages individually, and for that we need to be sure that we can, in
>>>>> fact, remove a page because no one else uses it. We also need to store
>>>>> fd's because virtio with vhost-user backend needs them to work, 
>>>>> because
>>>>> it relies on sharing memory between processes using fd's.

I guess you mean virtio-user.
Have you looked at how QEMU shares guest memory with an external
process such as a vhost-user backend? It works quite well with 2MB pages,
even with large VMs.

>>>>
>>>> It's a pity adding an option to workaround a limitation of a corner 
>>>> case.
>>>> It adds complexity that we will have to support forever,
>>>> and it's even not perfect because of vhost.
>>>>
>>>> Might there be another solution?
>>>>
>>>
>>> If there is one, i'm all ears. I don't see any solutions aside from
>>> adding limitations.
>>>
>>> For example, we could drop the single/multi file segments mode and just
>>> make single file segments a default and the only available mode, but
>>> this has certain risks because older kernels do not support fallocate()
>>> on hugetlbfs.
>>>
>>> We could further draw a line in the sand, and say that, for example,
>>> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use
>>> VFIO by now and if you don't it's your own fault.
>>>
>>> We could also cut down on the number of fd's we use in single-file
>>> segments mode by not using locks and simply deleting pages in the
>>> primary, but yanking out hugepages from under secondaries' feet makes me
>>> feel uneasy, even if technically by the time that happens, they're not
>>> supposed to be used anyway. This could mean that the patch is no longer
>>> necessary because we don't use that many fd's any more.
>>
>> This last option is interesting. Is it realistic?
>>
> 
> I can do it in current release cycle, but i'm not sure if it's too late 
> to do such changes. I guess it's OK since the validation cycle is just 
> starting? I'll throw something together and see if it crashes and burns.
> 

Reducing the number of FDs is really important IMHO, as the application
using the DPDK library could also need several FDs for other purposes.
  
Burakov, Anatoly March 29, 2019, 2:21 p.m. UTC | #9
On 29-Mar-19 1:34 PM, Thomas Monjalon wrote:
> 29/03/2019 14:24, Burakov, Anatoly:
>> On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
>>> 29/03/2019 13:05, Burakov, Anatoly:
>>>> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
>>>>> 29/03/2019 11:33, Burakov, Anatoly:
>>>>>> On 29-Mar-19 9:50 AM, David Marchand wrote:
>>>>>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
>>>>>>> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>>>>>>
>>>>>>>        Due to internal glibc limitations [1], DPDK may exhaust internal
>>>>>>>        file descriptor limits when using smaller page sizes, which results
>>>>>>>        in inability to use system calls such as select() by user
>>>>>>>        applications.
>>>>>>>
>>>>>>>        While the problem can be worked around using --single-file-segments
>>>>>>>        option, it does not work if --legacy-mem mode is also used. Add a
>>>>>>>        (yet another) EAL flag to disable storing fd's internally. This
>>>>>>>        will sacrifice compability with Virtio with vhost-backend, but
>>>>>>>        at least select() and friends will work.
>>>>>>>
>>>>>>>        [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
>>>>>>>
>>>>>>>
>>>>>>> Sorry, I am a bit lost and I never took the time to look in the new
>>>>>>> memory allocation system.
>>>>>>> This gives the impression that we are accumulating workarounds, between
>>>>>>> legacy-mem, single-file-segments, now no-seg-fds.
>>>>>>
>>>>>> Yep. I don't like this any more than you do, but i think there are users
>>>>>> of all of these, so we can't just drop them willy-nilly. My great hope
>>>>>> was that by now everyone would move on to use VFIO so legacy mem
>>>>>> wouldn't be needed (the only reason it exists is to provide
>>>>>> compatibility for use cases where lots of IOVA-contiguous memory is
>>>>>> required, and VFIO cannot be used), but apparently that is too much to
>>>>>> ask :/
>>>>>>
>>>>>>>
>>>>>>> Iiuc, everything revolves around the need for per page locks.
>>>>>>> Can you summarize why we need them?
>>>>>>
>>>>>> The short answer is multiprocess. We have to be able to map and unmap
>>>>>> pages individually, and for that we need to be sure that we can, in
>>>>>> fact, remove a page because no one else uses it. We also need to store
>>>>>> fd's because virtio with vhost-user backend needs them to work, because
>>>>>> it relies on sharing memory between processes using fd's.
>>>>>
>>>>> It's a pity adding an option to workaround a limitation of a corner case.
>>>>> It adds complexity that we will have to support forever,
>>>>> and it's even not perfect because of vhost.
>>>>>
>>>>> Might there be another solution?
>>>>>
>>>>
>>>> If there is one, i'm all ears. I don't see any solutions aside from
>>>> adding limitations.
>>>>
>>>> For example, we could drop the single/multi file segments mode and just
>>>> make single file segments a default and the only available mode, but
>>>> this has certain risks because older kernels do not support fallocate()
>>>> on hugetlbfs.
>>>>
>>>> We could further draw a line in the sand, and say that, for example,
>>>> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use
>>>> VFIO by now and if you don't it's your own fault.
>>>>
>>>> We could also cut down on the number of fd's we use in single-file
>>>> segments mode by not using locks and simply deleting pages in the
>>>> primary, but yanking out hugepages from under secondaries' feet makes me
>>>> feel uneasy, even if technically by the time that happens, they're not
>>>> supposed to be used anyway. This could mean that the patch is no longer
>>>> necessary because we don't use that many fd's any more.
>>>
>>> This last option is interesting. Is it realistic?
>>>
>>
>> I can do it in current release cycle, but i'm not sure if it's too late
>> to do such changes. I guess it's OK since the validation cycle is just
>> starting? I'll throw something together and see if it crashes and burns.
> 
> OK let's try that.
> 

Bear in mind though that this will not work for legacy mem mode, because 
it cannot use single file segments mode without significant rework of 
page allocation code. So, legacy mem mode will still have this issue, 
unless we make it non-compatible with virtio with vhost-user backend.
  

Patch

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index c63f0f49a..d50a7067e 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -94,6 +94,10 @@  Memory-related options
 
     Free hugepages back to system exactly as they were originally allocated.
 
+*   ``--no-seg-fds``
+
+    Do not store segment file descriptors in EAL.
+
 Other options
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba..ad540f158 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -214,6 +214,25 @@  Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
++ Segment file descriptors
+
+On Linux, in most cases, EAL will store segment file descriptors in EAL. This
+can become a problem when using smaller page sizes due to underlying limitations
+of ``glibc`` library. For example, Linux API calls such as ``select()`` may not
+work correctly because ``glibc`` does not support more than certain number of
+file descriptors.
+
+There are several possible workarounds for this issue. One is to use
+``--single-file-segments`` mode, as that mode will not use a file descriptor
+per page. This is the recommended way of solving this issue, as it keeps
+compatibility with Virtio with vhost-user backend. This option is not available
+when using ``--legacy-mem`` mode.
+
+The other option is to use the ``--no-seg-fds`` command-line parameter
+to prevent EAL from storing any page file descriptors. This will break
+compatibility with Virtio with vhost-user backend, but this option will work
+with ``--legacy-mem`` mode.
+
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 60eaead8f..96596c6b6 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -63,6 +63,10 @@  struct internal_config {
 	/**< true if storing all pages within single files (per-page-size,
 	 * per-node) non-legacy mode only.
 	 */
+	volatile unsigned no_seg_fds;
+	/**< true if no segment file descriptors are to be stored internally
+	 * by EAL.
+	 */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
 	/** default interrupt mode for VFIO */
 	volatile enum rte_intr_mode vfio_intr_mode;
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 58ee9ae33..94e39aed8 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -67,6 +67,8 @@  enum {
 	OPT_IOVA_MODE_NUM,
 #define OPT_MATCH_ALLOCATIONS  "match-allocations"
 	OPT_MATCH_ALLOCATIONS_NUM,
+#define OPT_NO_SEG_FDS         "no-seg-fds"
+	OPT_NO_SEG_FDS_NUM,
 	OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 13f401684..e8a98c505 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -519,6 +519,7 @@  eal_usage(const char *prgname)
 	       "  --"OPT_LEGACY_MEM"        Legacy memory mode (no dynamic allocation, contiguous segments)\n"
 	       "  --"OPT_SINGLE_FILE_SEGMENTS" Put all hugepage memory in single files\n"
 	       "  --"OPT_MATCH_ALLOCATIONS" Free hugepages exactly as allocated\n"
+	       "  --"OPT_NO_SEG_FDS"        Do not store segment file descriptors in EAL\n"
 	       "\n");
 	/* Allow the application to print its usage message too if hook is set */
 	if ( rte_application_usage_hook ) {
@@ -815,6 +816,9 @@  eal_parse_args(int argc, char **argv)
 		case OPT_MATCH_ALLOCATIONS_NUM:
 			internal_config.match_allocations = 1;
 			break;
+		case OPT_NO_SEG_FDS_NUM:
+			internal_config.no_seg_fds = 1;
+			break;
 
 		default:
 			if (opt < OPT_LONG_MIN_NUM && isprint(opt)) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b6fb183db..420f82a54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1518,6 +1518,10 @@  eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
 	if (internal_config.single_file_segments)
 		return -ENOTSUP;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	/* if list is not allocated, allocate it */
 	if (fd_list[list_idx].len == 0) {
 		int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1539,6 +1543,10 @@  eal_memalloc_set_seg_list_fd(int list_idx, int fd)
 	if (!internal_config.single_file_segments)
 		return -ENOTSUP;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	/* if list is not allocated, allocate it */
 	if (fd_list[list_idx].len == 0) {
 		int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1557,6 +1565,10 @@  eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
 {
 	int fd;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	if (internal_config.in_memory || internal_config.no_hugetlbfs) {
 #ifndef MEMFD_SUPPORTED
 		/* in in-memory or no-huge mode, we rely on memfd support */
@@ -1614,6 +1626,10 @@  eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	if (internal_config.in_memory || internal_config.no_hugetlbfs) {
 #ifndef MEMFD_SUPPORTED
 		/* in in-memory or no-huge mode, we rely on memfd support */
@@ -1679,7 +1695,8 @@  eal_memalloc_init(void)
 	}
 
 	/* initialize all of the fd lists */
-	if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+	if (!internal_config.no_seg_fds &&
+			rte_memseg_list_walk(fd_list_create_walk, NULL))
 		return -1;
 	return 0;
 }
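
The hunks above make every segment fd accessor return -ENOTSUP when the flag is set, so any consumer that needs an fd has to treat that value as "sharing not possible". A minimal standalone sketch of that pattern follows. The `sketch_*` names are hypothetical stand-ins, not the real EAL structures or functions, and the fd lookup is replaced by a dummy value:

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for the relevant field of struct internal_config (assumption). */
struct sketch_config {
	unsigned int no_seg_fds;
};

/* Stand-in for eal_memalloc_get_seg_fd(): mirrors only the early return
 * added by the patch. The real lookup is replaced by a dummy number. */
static int
sketch_get_seg_fd(const struct sketch_config *cfg, int seg_idx)
{
	if (cfg->no_seg_fds)
		return -ENOTSUP;
	return 100 + seg_idx; /* dummy fd number, illustration only */
}

/* Hypothetical consumer that needs a segment fd to share memory with a
 * peer. Returns 0 if it can proceed, a negative errno value otherwise. */
static int
sketch_share_segment(const struct sketch_config *cfg, int seg_idx)
{
	int fd = sketch_get_seg_fd(cfg, seg_idx);

	if (fd == -ENOTSUP) {
		fprintf(stderr, "segment %d: fds not stored, cannot share\n",
			seg_idx);
		return -ENOTSUP;
	}
	if (fd < 0)
		return fd;
	return 0;
}
```

This is the trade-off the commit message describes: with the flag set, a backend that depends on segment fds (such as Virtio with a vhost-user backend) has nothing to pass to its peer.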