From patchwork Fri Jul 13 10:27:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43014 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 5F6A44D27; Fri, 13 Jul 2018 12:27:24 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 0342A4F90 for ; Fri, 13 Jul 2018 12:27:19 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="56238911" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga007.jf.intel.com with ESMTP; 13 Jul 2018 03:27:16 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARFP2007517; Fri, 13 Jul 2018 11:27:15 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARFP4027994; Fri, 13 Jul 2018 11:27:15 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARFgd027990; Fri, 13 Jul 2018 11:27:15 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:07 +0100 Message-Id: <53d64ed31c0cc2f570dd78e0226c3fcd986184e0.1531477505.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 1/9] fbarray: support no-shconf mode 
X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When using --no-shconf option, the expectation is that no multiprocess will be supported as no shared files are created. However, fbarray still creates some shared files that prevent multiple processes with the same prefix from starting. Fix this by avoiding creating shared files whenever noshconf option is specified. Since virtual areas we get from eal_get_virtual_area() are read-only, remap them as writable. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++--------- 1 file changed, 42 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 977174c4f..43caf3ced 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -705,39 +705,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, if (data == NULL) goto fail; - eal_get_fbarray_path(path, sizeof(path), name); + if (internal_config.no_shconf) { + /* remap virtual area as writable */ + void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE, + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (new_data == MAP_FAILED) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n", + __func__, strerror(errno)); + goto fail; + } + } else { + eal_get_fbarray_path(path, sizeof(path), name); - /* - * Each fbarray is unique to process namespace, i.e. the filename - * depends on process prefix. Try to take out a lock and see if we - * succeed. If we don't, someone else is using it already. 
- */ - fd = open(path, O_CREAT | O_RDWR, 0600); - if (fd < 0) { - RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__, - path, strerror(errno)); - rte_errno = errno; - goto fail; - } else if (flock(fd, LOCK_EX | LOCK_NB)) { - RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__, - path, strerror(errno)); - rte_errno = EBUSY; - goto fail; - } + /* + * Each fbarray is unique to process namespace, i.e. the + * filename depends on process prefix. Try to take out a lock + * and see if we succeed. If we don't, someone else is using it + * already. + */ + fd = open(path, O_CREAT | O_RDWR, 0600); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", + __func__, path, strerror(errno)); + rte_errno = errno; + goto fail; + } else if (flock(fd, LOCK_EX | LOCK_NB)) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", + __func__, path, strerror(errno)); + rte_errno = EBUSY; + goto fail; + } - /* take out a non-exclusive lock, so that other processes could still - * attach to it, but no other process could reinitialize it. - */ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; - goto fail; - } + /* take out a non-exclusive lock, so that other processes could + * still attach to it, but no other process could reinitialize + * it. 
+ */ + if (flock(fd, LOCK_SH | LOCK_NB)) { + rte_errno = errno; + goto fail; + } - if (resize_and_map(fd, data, mmap_len)) - goto fail; + if (resize_and_map(fd, data, mmap_len)) + goto fail; - /* we've mmap'ed the file, we can now close the fd */ - close(fd); + /* we've mmap'ed the file, we can now close the fd */ + close(fd); + } /* initialize the data */ memset(data, 0, mmap_len); From patchwork Fri Jul 13 10:27:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43015 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 60EFF5592; Fri, 13 Jul 2018 12:27:26 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 34EA54F90 for ; Fri, 13 Jul 2018 12:27:21 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="64427675" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by FMSMGA003.fm.intel.com with ESMTP; 13 Jul 2018 03:27:16 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARFoh007520; Fri, 13 Jul 2018 11:27:15 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARFGZ028001; Fri, 13 Jul 2018 11:27:15 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARFNB027997; Fri, 13 Jul 2018 11:27:15 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, 
bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:08 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 2/9] ipc: add support for no-shconf mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" IPC is an inter-process communication mechanism. Since no secondaries can ever be expected to run in no-shconf mode, IPC will be useless, so do not enable it in the first place. In the interests of API usage convenience, we will still allow registering callbacks, but obviously they won't ever be triggered. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index f010ef59e..c19b4b406 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -626,6 +626,14 @@ rte_mp_channel_init(void) int dir_fd; pthread_t mp_handle_tid, async_reply_handle_tid; + /* in no shared files mode, we do not have secondary processes support, + * so no need to initialize IPC. 
+ */ + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n"); + return 0; + } + /* create filter path */ create_socket_path("*", path, sizeof(path)); strlcpy(mp_filter, basename(path), sizeof(mp_filter)); @@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply, if (check_input(req) == false) return -1; + + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + if (gettimeofday(&now, NULL) < 0) { RTE_LOG(ERR, EAL, "Faile to get current time\n"); rte_errno = errno; @@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, if (check_input(req) == false) return -1; + + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + if (gettimeofday(&now, NULL) < 0) { RTE_LOG(ERR, EAL, "Faile to get current time\n"); rte_errno = errno; @@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer) return -1; } + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + return mp_send(msg, peer, MP_REP); } From patchwork Fri Jul 13 10:27:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43020 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 47ABC5F1A; Fri, 13 Jul 2018 12:27:35 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 7B10B4F90 for ; Fri, 13 Jul 2018 12:27:22 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga104.jf.intel.com with 
ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="215716965" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga004.jf.intel.com with ESMTP; 13 Jul 2018 03:27:16 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARG5Y007523; Fri, 13 Jul 2018 11:27:16 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARFML028008; Fri, 13 Jul 2018 11:27:15 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARFow028004; Fri, 13 Jul 2018 11:27:15 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:09 +0100 Message-Id: <1657ee590afbaddc3fb66c3dbcddd767bf45b6e5.1531477505.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 3/9] eal: add support for no-shconf for hugepage info X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Do not create any shared hugepage size info files if we were asked to not create any shared files. 
Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 ++++ lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c index 836feb672..1e8f5df23 100644 --- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c @@ -101,6 +101,10 @@ eal_hugepage_info_init(void) hpi->num_pages[0] = num_buffers; hpi->lock_descriptor = fd; + /* for no shared files mode, do not create shared memory config */ + if (internal_config.no_shconf) + return 0; + tmp_hpi = create_shared_memory(eal_hugepage_info_path(), sizeof(internal_config.hugepage_info)); if (tmp_hpi == NULL ) { diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 7eca711ba..7f8e2fd9c 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -446,6 +446,10 @@ eal_hugepage_info_init(void) if (hugepage_info_init() < 0) return -1; + /* for no shared files mode, we're done */ + if (internal_config.no_shconf) + return 0; + hpi = &internal_config.hugepage_info[0]; tmp_hpi = create_shared_memory(eal_hugepage_info_path(), From patchwork Fri Jul 13 10:27:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43017 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 601F75B14; Fri, 13 Jul 2018 12:27:30 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id CD0B54F9C for ; Fri, 13 Jul 2018 12:27:21 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False 
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="66664259" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga002.fm.intel.com with ESMTP; 13 Jul 2018 03:27:16 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARGoA007526; Fri, 13 Jul 2018 11:27:16 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARGjW028019; Fri, 13 Jul 2018 11:27:16 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARGJh028015; Fri, 13 Jul 2018 11:27:16 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:10 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 4/9] eal: add support for no-shconf in hugepage data file X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Do not create a shared hugepage data file if we were asked to not create any shared files. 
Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 5d3c8831b..ddfa8b133 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -521,7 +521,18 @@ static void * create_shared_memory(const char *filename, const size_t mem_size) { void *retval; - int fd = open(filename, O_CREAT | O_RDWR, 0666); + int fd; + + /* if no shared files mode is used, create anonymous memory instead */ + if (internal_config.no_shconf) { + retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (retval == MAP_FAILED) + return NULL; + return retval; + } + + fd = open(filename, O_CREAT | O_RDWR, 0666); if (fd < 0) return NULL; if (ftruncate(fd, mem_size) < 0) { From patchwork Fri Jul 13 10:27:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43022 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7E86D5F0D; Fri, 13 Jul 2018 12:28:05 +0200 (CEST) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 5BFB85B3C for ; Fri, 13 Jul 2018 12:28:03 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:28:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="54756313" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga008.fm.intel.com with ESMTP; 13 Jul 2018 03:27:16 -0700 Received: from 
sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARGZq007529; Fri, 13 Jul 2018 11:27:16 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARG45028027; Fri, 13 Jul 2018 11:27:16 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARG0S028023; Fri, 13 Jul 2018 11:27:16 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:11 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 5/9] eal: do not create runtime dir in no-shconf mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Now that the rest of the EAL is adjusted to not create any shared files, prevent runtime directory from ever being created. 
Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/bsdapp/eal/eal.c | 3 ++- lib/librte_eal/linuxapp/eal/eal.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index dc279542d..13b6f8ae1 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv) } /* create runtime data directory */ - if (eal_create_runtime_dir() < 0) { + if (internal_config.no_shconf == 0 && + eal_create_runtime_dir() < 0) { rte_eal_init_alert("Cannot create runtime directory\n"); rte_errno = EACCES; return -1; diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index ec7cea55d..191960caa 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -832,7 +832,8 @@ rte_eal_init(int argc, char **argv) } /* create runtime data directory */ - if (eal_create_runtime_dir() < 0) { + if (internal_config.no_shconf == 0 && + eal_create_runtime_dir() < 0) { rte_eal_init_alert("Cannot create runtime directory\n"); rte_errno = EACCES; return -1; From patchwork Fri Jul 13 10:27:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43019 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E48515B3E; Fri, 13 Jul 2018 12:27:33 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 295714CE4 for ; Fri, 13 Jul 2018 12:27:21 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:18 -0700 X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="64427677" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by FMSMGA003.fm.intel.com with ESMTP; 13 Jul 2018 03:27:16 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARGST007532; Fri, 13 Jul 2018 11:27:16 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARGXJ028034; Fri, 13 Jul 2018 11:27:16 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARG3k028030; Fri, 13 Jul 2018 11:27:16 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:12 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 6/9] mem: add support for hugepage-unlink mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Unlink hugepages after creating them, to honor the hugepage-unlink mode. We cannot resize non-existing files, so make single file segments explicitly unsupported. 
Signed-off-by: Anatoly Burakov --- Notes: v1->v2: - Move check for hugepage unlink into this patch, to be consistent with commit message RFC->v1: - Use --huge-unlink only lib/librte_eal/common/eal_common_options.c | 6 ++++++ lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 45ea01a8b..df5d53648 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1332,6 +1332,12 @@ eal_check_common_options(struct internal_config *internal_cfg) " is only supported in non-legacy memory mode\n"); return -1; } + if (internal_cfg->single_file_segments && + internal_cfg->hugepage_unlink) { + RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " + "not compatible with --"OPT_HUGE_UNLINK"\n"); + return -1; + } return 0; } diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 69604f823..d610923b8 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -489,6 +489,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, __func__, strerror(errno)); goto resized; } + if (internal_config.hugepage_unlink) { + if (unlink(path)) { + RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + __func__, strerror(errno)); + goto resized; + } + } } /* @@ -587,7 +594,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, /* ignore failure, can't make it any worse */ } else { /* only remove file if we can take out a write lock */ - if (lock(fd, LOCK_EX) == 1) + if (internal_config.hugepage_unlink == 0 && + lock(fd, LOCK_EX) == 1) unlink(path); close(fd); } @@ -612,6 +620,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi, return -1; } + /* if we've already unlinked the page, nothing needs to be done */ + if
(internal_config.hugepage_unlink) { + memset(ms, 0, sizeof(*ms)); + return 0; + } + /* if we are not in single file segments mode, we're going to unmap the * segment and thus drop the lock on original fd, but hugepage dir is * now locked so we can take out another one without races. From patchwork Fri Jul 13 10:27:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43021 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A39E95F25; Fri, 13 Jul 2018 12:27:36 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 1E4204CE4 for ; Fri, 13 Jul 2018 12:27:22 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="215716977" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga004.jf.intel.com with ESMTP; 13 Jul 2018 03:27:17 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARGJU007535; Fri, 13 Jul 2018 11:27:16 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARG25028044; Fri, 13 Jul 2018 11:27:16 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARGEf028040; Fri, 13 Jul 2018 11:27:16 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net 
Date: Fri, 13 Jul 2018 11:27:13 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 7/9] eal: add --in-memory option X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This command-line option will cause DPDK to operate entirely in memory and not create any shared files at runtime, including any shared configuration or hugetlbfs files. This is useful for debug purposes, as well as for certain use cases like containers or automatic memory cleanup. Currently, this option acts as a strict superset of the --no-shconf and --huge-unlink options. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Do not deprecate old options, instead just co-opt them lib/librte_eal/common/eal_common_options.c | 18 ++++++++++++++---- lib/librte_eal/common/eal_internal_cfg.h | 4 ++++ lib/librte_eal/common/eal_options.h | 2 ++ 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index df5d53648..f308b57c3 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -66,6 +66,7 @@ eal_long_options[] = { {OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM }, {OPT_NO_PCI, 0, NULL, OPT_NO_PCI_NUM }, {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM }, + {OPT_IN_MEMORY, 0, NULL, OPT_IN_MEMORY_NUM }, {OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM }, {OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM }, {OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM }, @@ -1170,6 +1171,13 @@ eal_parse_common_option(int opt, const char *optarg, conf->no_shconf = 1; break; + case OPT_IN_MEMORY_NUM: + conf->in_memory = 1; + /* in-memory is a superset of noshconf and huge-unlink */ + conf->no_shconf = 1; + conf->hugepage_unlink = 1; + break; + case
OPT_PROC_TYPE_NUM: conf->process_type = eal_parse_proc_type(optarg); break; @@ -1321,8 +1329,8 @@ eal_check_common_options(struct internal_config *internal_cfg) "be specified together with --"OPT_NO_HUGE"\n"); return -1; } - - if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) { + if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink && + !internal_cfg->in_memory) { RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot " "be specified together with --"OPT_NO_HUGE"\n"); return -1; @@ -1330,12 +1338,12 @@ eal_check_common_options(struct internal_config *internal_cfg) if (internal_config.force_socket_limits && internal_config.legacy_mem) { RTE_LOG(ERR, EAL, "Option --"OPT_SOCKET_LIMIT " is only supported in non-legacy memory mode\n"); - return -1; } if (internal_cfg->single_file_segments && internal_cfg->hugepage_unlink) { RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " - "not compatible with --"OPT_HUGE_UNLINK"\n"); + "not compatible with neither --"OPT_IN_MEMORY" nor " + "--"OPT_HUGE_UNLINK"\n"); return -1; } @@ -1386,6 +1394,8 @@ eal_common_usage(void) " Set specific log level\n" " -v Display version information on startup\n" " -h, --help This help\n" + " --"OPT_IN_MEMORY" Operate entirely in memory. 
This will \n" + " disable secondary process support\n" "\nEAL options for DEBUG use only:\n" " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n" " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n" diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index d66cd0313..00ee6e06e 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ -41,6 +41,10 @@ struct internal_config { volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping * instead of native TSC */ volatile unsigned no_shconf; /**< true if there is no shared config */ + volatile unsigned in_memory; + /**< true if DPDK should operate entirely in-memory and not create any + * shared files or runtime data. + */ volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */ volatile enum rte_proc_type_t process_type; /**< multi-process proc type */ /** true to try allocating memory on specific sockets */ diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h index 6d92f64a8..96e166787 100644 --- a/lib/librte_eal/common/eal_options.h +++ b/lib/librte_eal/common/eal_options.h @@ -45,6 +45,8 @@ enum { OPT_NO_PCI_NUM, #define OPT_NO_SHCONF "no-shconf" OPT_NO_SHCONF_NUM, +#define OPT_IN_MEMORY "in-memory" + OPT_IN_MEMORY_NUM, #define OPT_SOCKET_MEM "socket-mem" OPT_SOCKET_MEM_NUM, #define OPT_SOCKET_LIMIT "socket-limit" From patchwork Fri Jul 13 10:27:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43018 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4853E5B1E; Fri, 13 Jul 2018 12:27:32 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id CD1905323 for ; Fri, 13 Jul 2018 
12:27:21 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="245411941" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga006.fm.intel.com with ESMTP; 13 Jul 2018 03:27:17 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DARGuu007538; Fri, 13 Jul 2018 11:27:16 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARGN9028051; Fri, 13 Jul 2018 11:27:16 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARGKA028047; Fri, 13 Jul 2018 11:27:16 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Neil Horman , John McNamara , Marko Kovacevic , ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:14 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 8/9] doc: add deprecation notice for EAL command line options X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Options --no-shconf and --huge-unlink will be removed and replaced with the --in-memory option, which will be a superset of these two and an officially supported method to run DPDK entirely in memory.
Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Add this patch doc/guides/rel_notes/deprecation.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 5de59833d..dd1b5c5d8 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -8,6 +8,11 @@ API and ABI deprecation notices are to be posted here. Deprecation Notices ------------------- +* eal: command-line options ``--no-shconf`` and ``--huge-unlink`` will be + removed, and replaced with a single option ``--in-memory``, which will + enable DPDK to operate entirely in memory, without creating any files on any + filesystems. + * eal: DPDK runtime configuration file (located at ``/var/run/._config``) will be moved. The new path will be as follows: From patchwork Fri Jul 13 10:27:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 43016 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 767B258C4; Fri, 13 Jul 2018 12:27:28 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 331AC4CE4 for ; Fri, 13 Jul 2018 12:27:21 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 03:27:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="54095974" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 13 Jul 2018 03:27:17 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 
w6DARHBG007541; Fri, 13 Jul 2018 11:27:17 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DARGIr028062; Fri, 13 Jul 2018 11:27:16 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DARG82028058; Fri, 13 Jul 2018 11:27:16 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 11:27:15 +0100 Message-Id: <8fef5019ebdb9d941c1ade936fcdda1ace303738.1531477505.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 9/9] mem: support in-memory mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement the final piece of the in-memory mode puzzle - enable running DPDK entirely in memory, without creating any files. To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work without hugetlbfs mountpoints. In order to enable this, a few things needed to be changed. First of all, we need to allow empty hugetlbfs mountpoints in hugepage_info, and handle them correctly (by not trying to create any files and lock any directories). Next, we need to reorder the mapping sequence, because the page is not really allocated until the page fault, and we cannot get its IOVA address before we trigger the page fault. Finally, decide at compile time whether we are going to be supporting anonymous hugepages or not, because we cannot check for it at runtime. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Drop memfd and instead use mmap() with MAP_HUGETLB. 
This will drop the kernel requirements down to 3.8, and does not impose any restrictions on glibc (as far as I know). Unfortunately, there's a bit of an issue with this approach, because mmap() is stupid and will happily ignore unsupported arguments. This means that if the binary were to be compiled on a 3.8+ kernel but run on a pre-3.8 kernel (such as the currently supported minimum of 3.2), then most likely the memory would be allocated using regular pages, causing unthinkable performance degradation. No solution to this problem is currently known to me. .../linuxapp/eal/eal_hugepage_info.c | 91 +++++++----- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 130 +++++++++++------- lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +- 3 files changed, 139 insertions(+), 85 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 7f8e2fd9c..3a7d4b222 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -18,6 +18,8 @@ #include #include +#include /* for hugetlb-related flags */ + #include #include #include @@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b) return hpi_b->hugepage_sz - hpi_a->hugepage_sz; } +static void +calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent) +{ + uint64_t total_pages = 0; + unsigned int i; + + /* + * first, try to put all hugepages into relevant sockets, but + * if first attempts fails, fall back to collecting all pages + * in one socket and sorting them later + */ + total_pages = 0; + /* we also don't want to do this for legacy init */ + if (!internal_config.legacy_mem) + for (i = 0; i < rte_socket_count(); i++) { + int socket = rte_socket_id_by_idx(i); + unsigned int num_pages = + get_num_hugepages_on_node( + dirent->d_name, socket); + hpi->num_pages[socket] = num_pages; + total_pages += num_pages; + } + /* + * we failed to sort memory from the get go, so fall + * back to old way + */ + if
(total_pages == 0) { + hpi->num_pages[0] = get_num_hugepages(dirent->d_name); + +#ifndef RTE_ARCH_64 + /* for 32-bit systems, limit number of hugepages to + * 1GB per page size */ + hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0], + RTE_PGSIZE_1G / hpi->hugepage_sz); +#endif + } +} + static int hugepage_info_init(void) { const char dirent_start_text[] = "hugepages-"; const size_t dirent_start_len = sizeof(dirent_start_text) - 1; - unsigned int i, total_pages, num_sizes = 0; + unsigned int i, num_sizes = 0; DIR *dir; struct dirent *dirent; @@ -355,6 +395,22 @@ hugepage_info_init(void) "%" PRIu64 " reserved, but no mounted " "hugetlbfs found for that size\n", num_pages, hpi->hugepage_sz); + /* if we have kernel support for reserving hugepages + * through mmap, and we're in in-memory mode, treat this + * page size as valid. we cannot be in legacy mode at + * this point because we've checked this earlier in the + * init process. + */ +#ifdef MAP_HUGE_SHIFT + if (internal_config.in_memory) { + RTE_LOG(DEBUG, EAL, "In-memory mode enabled, " + "hugepages of size %" PRIu64 " bytes " + "will be allocated anonymously\n", + hpi->hugepage_sz); + calc_num_pages(hpi, dirent); + num_sizes++; + } +#endif continue; } @@ -371,35 +427,7 @@ hugepage_info_init(void) if (clear_hugedir(hpi->hugedir) == -1) break; - /* - * first, try to put all hugepages into relevant sockets, but - * if first attempts fails, fall back to collecting all pages - * in one socket and sorting them later - */ - total_pages = 0; - /* we also don't want to do this for legacy init */ - if (!internal_config.legacy_mem) - for (i = 0; i < rte_socket_count(); i++) { - int socket = rte_socket_id_by_idx(i); - unsigned int num_pages = - get_num_hugepages_on_node( - dirent->d_name, socket); - hpi->num_pages[socket] = num_pages; - total_pages += num_pages; - } - /* - * we failed to sort memory from the get go, so fall - * back to old way - */ - if (total_pages == 0) - hpi->num_pages[0] = get_num_hugepages(dirent->d_name); - 
-#ifndef RTE_ARCH_64 - /* for 32-bit systems, limit number of hugepages to - * 1GB per page size */ - hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0], - RTE_PGSIZE_1G / hpi->hugepage_sz); -#endif + calc_num_pages(hpi, dirent); num_sizes++; } @@ -423,8 +451,7 @@ hugepage_info_init(void) for (j = 0; j < RTE_MAX_NUMA_NODES; j++) num_pages += hpi->num_pages[j]; - if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 && - num_pages > 0) + if (num_pages > 0) return 0; } diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index d610923b8..10c959da4 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -28,6 +28,7 @@ #include #endif #include +#include /* for hugetlb-related mmap flags */ #include #include @@ -41,6 +42,15 @@ #include "eal_memalloc.h" #include "eal_private.h" +const int anonymous_hugepages_supported = +#ifdef MAP_HUGE_SHIFT + 1; +#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT +#else + 0; +#define RTE_MAP_HUGE_SHIFT 26 +#endif + /* * not all kernel version support fallocate on hugetlbfs, so fall back to * ftruncate and disallow deallocation if fallocate is not supported. 
@@ -461,6 +471,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, int cur_socket_id = 0; #endif uint64_t map_offset; + rte_iova_t iova; + void *va; char path[PATH_MAX]; int ret = 0; int fd; @@ -468,43 +480,57 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, int flags; void *new_addr; - /* takes out a read lock on segment or segment list */ - fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx); - if (fd < 0) { - RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n"); - return -1; - } - alloc_sz = hi->hugepage_sz; - if (internal_config.single_file_segments) { - map_offset = seg_idx * alloc_sz; - ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset, - alloc_sz, true); - if (ret < 0) - goto resized; + if (internal_config.in_memory && anonymous_hugepages_supported) { + int log2, flags; + + log2 = rte_log2_u32(alloc_sz); + /* as per mmap() manpage, all page sizes are log2 of page size + * shifted by MAP_HUGE_SHIFT + */ + flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED | + MAP_PRIVATE | MAP_ANONYMOUS; + fd = -1; + va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0); } else { - map_offset = 0; - if (ftruncate(fd, alloc_sz) < 0) { - RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n", - __func__, strerror(errno)); - goto resized; + /* takes out a read lock on segment or segment list */ + fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx); + if (fd < 0) { + RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n"); + return -1; } - if (internal_config.hugepage_unlink) { - if (unlink(path)) { - RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + + if (internal_config.single_file_segments) { + map_offset = seg_idx * alloc_sz; + ret = resize_hugefile(fd, path, list_idx, seg_idx, + map_offset, alloc_sz, true); + if (ret < 0) + goto resized; + } else { + map_offset = 0; + if (ftruncate(fd, alloc_sz) < 0) { + RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n", __func__, strerror(errno)); goto 
resized; } + if (internal_config.hugepage_unlink) { + if (unlink(path)) { + RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + __func__, strerror(errno)); + goto resized; + } + } } + + /* + * map the segment, and populate page tables, the kernel fills + * this segment with zeros if it's a new page. + */ + va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, + map_offset); } - /* - * map the segment, and populate page tables, the kernel fills this - * segment with zeros if it's a new page. - */ - void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset); - if (va == MAP_FAILED) { RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__, strerror(errno)); @@ -519,24 +545,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, goto resized; } - rte_iova_t iova = rte_mem_virt2iova(addr); - if (iova == RTE_BAD_PHYS_ADDR) { - RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n", - __func__); - goto mapped; - } - -#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES - move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0); - - if (cur_socket_id != socket_id) { - RTE_LOG(DEBUG, EAL, - "%s(): allocation happened on wrong socket (wanted %d, got %d)\n", - __func__, socket_id, cur_socket_id); - goto mapped; - } -#endif - /* In linux, hugetlb limitations, like cgroup, are * enforced at fault time instead of mmap(), even * with the option of MAP_POPULATE. 
Kernel will send @@ -549,9 +557,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, (unsigned int)(alloc_sz >> 20)); goto mapped; } - /* for non-single file segments, we can close fd here */ - if (!internal_config.single_file_segments) - close(fd); /* we need to trigger a write to the page to enforce page fault and * ensure that page is accessible to us, but we can't overwrite value @@ -560,6 +565,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, */ *(volatile int *)addr = *(volatile int *)addr; + iova = rte_mem_virt2iova(addr); + if (iova == RTE_BAD_PHYS_ADDR) { + RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n", + __func__); + goto mapped; + } + +#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES + move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0); + + if (cur_socket_id != socket_id) { + RTE_LOG(DEBUG, EAL, + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n", + __func__, socket_id, cur_socket_id); + goto mapped; + } +#endif + /* for non-single file segments that aren't in-memory, we can close fd + * here */ + if (!internal_config.single_file_segments && !internal_config.in_memory) + close(fd); + ms->addr = addr; ms->hugepage_sz = alloc_sz; ms->len = alloc_sz; @@ -595,6 +622,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, } else { /* only remove file if we can take out a write lock */ if (internal_config.hugepage_unlink == 0 && + internal_config.in_memory == 0 && lock(fd, LOCK_EX) == 1) unlink(path); close(fd); @@ -705,7 +733,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) * during init, we already hold a write lock, so don't try to take out * another one. 
*/ - if (wa->hi->lock_descriptor == -1) { + if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) { dir_fd = open(wa->hi->hugedir, O_RDONLY); if (dir_fd < 0) { RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", @@ -809,7 +837,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg) * during init, we already hold a write lock, so don't try to take out * another one. */ - if (wa->hi->lock_descriptor == -1) { + if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) { dir_fd = open(wa->hi->hugedir, O_RDONLY); if (dir_fd < 0) { RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index ddfa8b133..dbf19499e 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -1088,8 +1088,7 @@ get_socket_mem_size(int socket) for (i = 0; i < internal_config.num_hugepage_sizes; i++){ struct hugepage_info *hpi = &internal_config.hugepage_info[i]; - if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0) - size += hpi->hugepage_sz * hpi->num_pages[socket]; + size += hpi->hugepage_sz * hpi->num_pages[socket]; } return size;
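For readers who want to see the anonymous hugepage mapping from patch 9/9 in isolation, the following is a minimal standalone sketch, not the code from the patch. The helper names (page_size_log2, anon_hugepage_flags, map_anon_hugepage) are hypothetical and exist only for illustration. It assumes a Linux system whose headers define MAP_HUGETLB, and falls back to 26 for MAP_HUGE_SHIFT when the headers lack it, as the patch does for RTE_MAP_HUGE_SHIFT. As with the patch, it encodes log2 of the page size shifted by MAP_HUGE_SHIFT into the mmap() flags and touches the page after mapping, since the page is not really allocated until the page fault.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Fallback for older headers, mirroring the patch's RTE_MAP_HUGE_SHIFT. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/* Hypothetical helper: integer log2 of a power-of-two page size. */
unsigned int page_size_log2(uint64_t page_sz)
{
	unsigned int n = 0;

	/* hugepage sizes are powers of two, so reject anything else */
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	while (page_sz > 1) {
		page_sz >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical helper: build mmap() flags for an anonymous hugepage.
 * The page size is encoded as log2(size) << MAP_HUGE_SHIFT.
 */
int anon_hugepage_flags(uint64_t page_sz)
{
	unsigned int size_bits = page_size_log2(page_sz) << MAP_HUGE_SHIFT;

	return (int)size_bits | MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS;
}

/* Hypothetical helper: map one anonymous hugepage, or return NULL.
 * Unlike alloc_seg() in the patch, this does not use MAP_FIXED and does
 * not guard the page touch against SIGBUS.
 */
void *map_anon_hugepage(uint64_t page_sz)
{
	void *va = mmap(NULL, (size_t)page_sz, PROT_READ | PROT_WRITE,
			anon_hugepage_flags(page_sz), -1, 0);

	if (va == MAP_FAILED)
		return NULL;

	/* trigger the page fault without changing the page contents */
	*(volatile int *)va = *(volatile int *)va;
	return va;
}
```

A caller would typically try map_anon_hugepage(2ULL << 20) for a 2MB page and release it with munmap() when done. On a kernel older than 3.8 the size bits may be silently ignored, which is the caveat raised in the patch notes.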