X-Patchwork-Id: 50469
From: Anatoly Burakov
To: dev@dpdk.org
Cc: John McNamara, Marko Kovacevic, iain.barker@oracle.com, edwin.leung@oracle.com
Date: Fri, 22 Feb 2019 17:12:41 +0000
Message-Id: <07f664c33ddedaa5dcfe82ecb97d931e68b7e33a.1550855529.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH] eal: add option to not store segment fd's
Due to internal glibc limitations [1], DPDK may exhaust internal file
descriptor limits when using smaller page sizes, as EAL keeps one file
descriptor open per page. This leaves user applications unable to use
system calls such as select(), because glibc's fd_set only supports
descriptors numbered below FD_SETSIZE (1024).

While the problem can be worked around with the --single-file-segments
option, that option cannot be used together with --legacy-mem mode.

Add yet another EAL flag to disable storing segment fd's internally. This
sacrifices compatibility with Virtio with vhost-user backend, but at least
select() and friends will work.

[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html

Signed-off-by: Anatoly Burakov
---
 doc/guides/linux_gsg/linux_eal_parameters.rst |  4 ++++
 .../prog_guide/env_abstraction_layer.rst      | 19 +++++++++++++++++++
 lib/librte_eal/common/eal_internal_cfg.h      |  4 ++++
 lib/librte_eal/common/eal_options.h           |  2 ++
 lib/librte_eal/linuxapp/eal/eal.c             |  4 ++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 19 ++++++++++++++++++-
 6 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index c63f0f49a..d50a7067e 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -94,6 +94,10 @@ Memory-related options
 
     Free hugepages back to system exactly as they were originally allocated.
 
+*   ``--no-seg-fds``
+
+    Do not store segment file descriptors in EAL.
+
 Other options
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba..ad540f158 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -214,6 +214,25 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
++ Segment file descriptors
+
+On Linux, in most cases, EAL stores segment file descriptors internally. This
+can become a problem when using smaller page sizes due to limitations of the
+underlying ``glibc`` library. For example, Linux API calls such as ``select()``
+may not work correctly, because ``glibc`` does not support file descriptor
+numbers at or above ``FD_SETSIZE`` (1024).
+
+There are two possible workarounds for this issue. One is to use
+``--single-file-segments`` mode, as that mode does not use a file descriptor
+per page. This is the recommended solution, as it keeps compatibility with
+Virtio with vhost-user backend. However, this option is not available when
+using ``--legacy-mem`` mode.
+
+The other option is to use the ``--no-seg-fds`` command-line parameter to
+prevent EAL from storing any page file descriptors. This breaks
+compatibility with Virtio with vhost-user backend, but unlike
+``--single-file-segments``, it works with ``--legacy-mem`` mode.
+
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 60eaead8f..96596c6b6 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -63,6 +63,10 @@ struct internal_config {
 	/**< true if storing all pages within single files (per-page-size,
 	 * per-node) non-legacy mode only.
 	 */
+	volatile unsigned no_seg_fds;
+	/**< true if no segment file descriptors are to be stored internally
+	 * by EAL.
+	 */
 	volatile int syslog_facility; /**< facility passed to openlog() */
 	/** default interrupt mode for VFIO */
 	volatile enum rte_intr_mode vfio_intr_mode;
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 58ee9ae33..94e39aed8 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -67,6 +67,8 @@ enum {
 	OPT_IOVA_MODE_NUM,
 #define OPT_MATCH_ALLOCATIONS "match-allocations"
 	OPT_MATCH_ALLOCATIONS_NUM,
+#define OPT_NO_SEG_FDS "no-seg-fds"
+	OPT_NO_SEG_FDS_NUM,
 	OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 13f401684..e8a98c505 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -519,6 +519,7 @@ eal_usage(const char *prgname)
 	       "  --"OPT_LEGACY_MEM"           Legacy memory mode (no dynamic allocation, contiguous segments)\n"
 	       "  --"OPT_SINGLE_FILE_SEGMENTS" Put all hugepage memory in single files\n"
 	       "  --"OPT_MATCH_ALLOCATIONS"    Free hugepages exactly as allocated\n"
+	       "  --"OPT_NO_SEG_FDS"           Do not store segment file descriptors in EAL\n"
 	       "\n");
 	/* Allow the application to print its usage message too if hook is set */
 	if ( rte_application_usage_hook ) {
@@ -815,6 +816,9 @@ eal_parse_args(int argc, char **argv)
 		case OPT_MATCH_ALLOCATIONS_NUM:
 			internal_config.match_allocations = 1;
 			break;
+		case OPT_NO_SEG_FDS_NUM:
+			internal_config.no_seg_fds = 1;
+			break;
 
 		default:
 			if (opt < OPT_LONG_MIN_NUM && isprint(opt)) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b6fb183db..420f82a54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1518,6 +1518,10 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
 	if (internal_config.single_file_segments)
 		return -ENOTSUP;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	/* if list is not allocated, allocate it */
 	if (fd_list[list_idx].len == 0) {
 		int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1539,6 +1543,10 @@ eal_memalloc_set_seg_list_fd(int list_idx, int fd)
 	if (!internal_config.single_file_segments)
 		return -ENOTSUP;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	/* if list is not allocated, allocate it */
 	if (fd_list[list_idx].len == 0) {
 		int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1557,6 +1565,10 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
 {
 	int fd;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	if (internal_config.in_memory || internal_config.no_hugetlbfs) {
 #ifndef MEMFD_SUPPORTED
 		/* in in-memory or no-huge mode, we rely on memfd support */
@@ -1614,6 +1626,10 @@ eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	if (internal_config.in_memory || internal_config.no_hugetlbfs) {
 #ifndef MEMFD_SUPPORTED
 		/* in in-memory or no-huge mode, we rely on memfd support */
@@ -1679,7 +1695,8 @@ eal_memalloc_init(void)
 	}
 
 	/* initialize all of the fd lists */
-	if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+	if (!internal_config.no_seg_fds &&
+			rte_memseg_list_walk(fd_list_create_walk, NULL))
 		return -1;
 	return 0;
 }
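The select() failure mode described in the commit message can be illustrated outside DPDK with a short standalone sketch (it is not part of this patch and uses no EAL APIs). glibc's fd_set is a fixed-size bitmap of FD_SETSIZE (1024) bits, so once EAL holds many per-page segment fds, descriptors the application opens later may be numbered too high for FD_SET(). The helper names fd_usable_with_select() and wait_readable() are hypothetical, and the sketch assumes the application is free to fall back to poll(), which takes an array of descriptors and has no FD_SETSIZE limit.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: glibc's fd_set is a fixed-size bitmap, and
 * FD_SET() on a negative descriptor or one numbered FD_SETSIZE (1024)
 * or higher is undefined behaviour.
 */
int fd_usable_with_select(int fd)
{
	return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become ready for
 * reading. Returns >0 if ready, 0 on timeout, -1 on error (errno set).
 * select() is used only when fd fits in an fd_set; otherwise poll(),
 * which is not bounded by FD_SETSIZE, is used instead.
 */
int wait_readable(int fd, int timeout_ms)
{
	if (fd_usable_with_select(fd)) {
		fd_set rfds;
		struct timeval tv;

		tv.tv_sec = timeout_ms / 1000;
		tv.tv_usec = (timeout_ms % 1000) * 1000;
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		return select(fd + 1, &rfds, NULL, NULL, &tv);
	}

	/* descriptor too large for fd_set: poll() has no such limit */
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	return poll(&pfd, 1, timeout_ms);
}
```

For example, wait_readable(fd, 100) works whether fd is numbered 5 or 2000. Calling poll() (or epoll) unconditionally would be simpler; the select() branch is kept only to show where the FD_SETSIZE limit applies.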