From patchwork Thu Jun 28 11:41:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 41817 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D34BC5F54; Thu, 28 Jun 2018 13:41:56 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id DEA615398; Thu, 28 Jun 2018 13:41:52 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Jun 2018 04:41:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,283,1526367600"; d="scan'208";a="62194898" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga002.fm.intel.com with ESMTP; 28 Jun 2018 04:41:50 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5SBfntZ010610; Thu, 28 Jun 2018 12:41:49 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5SBfnNq008953; Thu, 28 Jun 2018 12:41:49 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5SBfnPT008948; Thu, 28 Jun 2018 12:41:49 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , stable@dpdk.org Date: Thu, 28 Jun 2018 12:41:47 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 1/3] eal/bsdapp: fix segment index display X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Segment index was set to 0 at start but was never incremented. This has no consequences other than displayed number of segments allocated at initialization. Fix this by incrementing it after displaying. Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c index a5e034789..ca06de2f8 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memory.c +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c @@ -200,7 +200,7 @@ rte_eal_hugepage_init(void) RTE_LOG(INFO, EAL, "Mapped memory segment %u @ %p: physaddr:0x%" PRIx64", len %zu\n", - seg_idx, addr, physaddr, page_sz); + seg_idx++, addr, physaddr, page_sz); total_mem += seg->len; } From patchwork Thu Jun 28 11:41:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 41816 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A03205F2F; Thu, 28 Jun 2018 13:41:53 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 0DBE75B2A for ; Thu, 28 Jun 2018 13:41:51 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Jun 2018 04:41:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,283,1526367600"; d="scan'208";a="50579577" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 28 Jun 2018 04:41:51 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5SBfnXs010613; Thu, 28 Jun 2018 12:41:49 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5SBfnm3008961; Thu, 28 Jun 2018 12:41:49 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5SBfnbO008957; Thu, 28 Jun 2018 12:41:49 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson Date: Thu, 28 Jun 2018 12:41:48 +0100 Message-Id: <385f43246053aaf19d2377dbb9f91dbf8c314e35.1530186042.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 2/3] eal/bsdapp: concatenate adjacent segments X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Previously, memory allocator always left holes between mapped contigmem segments, even if they were IOVA-contiguous. Fix this by remembering last IOVA address and memseg index, and checking against those when mapping new contigmem segments. Signed-off-by: Anatoly Burakov --- Notes: v2: - Fix comparison when deciding if hole is needed lib/librte_eal/bsdapp/eal/eal_memory.c | 48 ++++++++++++++++---------- 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c index ca06de2f8..7bf05c760 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memory.c +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c @@ -104,6 +104,8 @@ rte_eal_hugepage_init(void) /* map all hugepages and sort them */ for (i = 0; i < internal_config.num_hugepage_sizes; i ++){ struct hugepage_info *hpi; + rte_iova_t prev_end = 0; + int prev_ms_idx = -1; uint64_t page_sz, mem_needed; unsigned int n_pages, max_pages; @@ -124,10 +126,27 @@ rte_eal_hugepage_init(void) int error; size_t sysctl_size = sizeof(physaddr); char physaddr_str[64]; + bool is_adjacent; + + /* first, check if this segment is IOVA-adjacent to + * the previous one. + */ + snprintf(physaddr_str, sizeof(physaddr_str), + "hw.contigmem.physaddr.%d", j); + error = sysctlbyname(physaddr_str, &physaddr, + &sysctl_size, NULL, 0); + if (error < 0) { + RTE_LOG(ERR, EAL, "Failed to get physical addr for buffer %u " + "from %s\n", j, hpi->hugedir); + return -1; + } + + is_adjacent = prev_end != 0 && physaddr == prev_end; + prev_end = physaddr + hpi->hugepage_sz; for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) { - bool empty; + bool empty, need_hole; msl = &mcfg->memsegs[msl_idx]; arr = &msl->memseg_arr; @@ -136,20 +155,23 @@ rte_eal_hugepage_init(void) empty = arr->count == 0; - /* we need 1, plus hole if not empty */ + /* we need a hole if this isn't an empty memseg + * list, and if previous segment was not + * adjacent to current one. + */ + need_hole = !empty && !is_adjacent; + + /* we need 1, plus hole if not adjacent */ ms_idx = rte_fbarray_find_next_n_free(arr, - 0, 1 + (empty ? 1 : 0)); + 0, 1 + (need_hole ? 1 : 0)); /* memseg list is full? */ if (ms_idx < 0) continue; - /* leave some space between memsegs, they are - * not IOVA contiguous, so they shouldn't be VA - * contiguous either. - */ - if (!empty) + if (need_hole && prev_ms_idx == ms_idx - 1) ms_idx++; + prev_ms_idx = ms_idx; break; } @@ -178,16 +200,6 @@ rte_eal_hugepage_init(void) return -1; } - snprintf(physaddr_str, sizeof(physaddr_str), "hw.contigmem" - ".physaddr.%d", j); - error = sysctlbyname(physaddr_str, &physaddr, &sysctl_size, - NULL, 0); - if (error < 0) { - RTE_LOG(ERR, EAL, "Failed to get physical addr for buffer %u " - "from %s\n", j, hpi->hugedir); - return -1; - } - seg->addr = addr; seg->iova = physaddr; seg->hugepage_sz = page_sz; From patchwork Thu Jun 28 11:41:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 41818 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E99B76CC3; Thu, 28 Jun 2018 13:41:58 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 26D235B2A for ; Thu, 28 Jun 2018 13:41:52 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Jun 2018 04:41:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,283,1526367600"; d="scan'208";a="52926684" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 28 Jun 2018 04:41:50 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5SBfnk7010616; Thu, 28 Jun 2018 12:41:49 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5SBfniw008971; Thu, 28 Jun 2018 12:41:49 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5SBfn7E008965; Thu, 28 Jun 2018 12:41:49 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson Date: Thu, 28 Jun 2018 12:41:49 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 3/3] eal: make memory segment preallocation OS-specific X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" In the perfect world, it wouldn't matter how much memory was preallocated because most of it was always going to be private anonymous zero-page mappings for the duration of the program. However, in practice, due to peculiarities of FreeBSD, we need to additionally limit memory allocation there. This patch moves the segment preallocation to EAL private functions that will be implemented by an OS-specific EAL rather than being in the common memory-related code. Since there is no support for growing/shrinking memory use at runtime on FreeBSD anyway, this does not inhibit any functionality but makes core dumps faster even on default settings. Signed-off-by: Anatoly Burakov --- Notes: For Linuxapp, this is 99% code move (aside from slight changes due to code deduplication between Linuxapp EAL and old common memory code), while for FreeBSD it's mostly code move but with changes due to dropping 32-bit code and implementing FreeBSD-specific limits on memory preallocation outlined in the commit. lib/librte_eal/bsdapp/eal/eal_memory.c | 215 ++++++++++++ lib/librte_eal/common/eal_common_memory.c | 386 +--------------------- lib/librte_eal/common/eal_private.h | 12 + lib/librte_eal/linuxapp/eal/eal_memory.c | 341 +++++++++++++++++++ 4 files changed, 569 insertions(+), 385 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c index 7bf05c760..16d2bc7c3 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memory.c +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c @@ -12,6 +12,7 @@ #include #include +#include #include #include #include "eal_private.h" @@ -300,3 +301,217 @@ rte_eal_using_phys_addrs(void) { return 0; } + +static uint64_t +get_mem_amount(uint64_t page_sz, uint64_t max_mem) +{ + uint64_t area_sz, max_pages; + + /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ + max_pages = RTE_MAX_MEMSEG_PER_LIST; + max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); + + area_sz = RTE_MIN(page_sz * max_pages, max_mem); + + /* make sure the list isn't smaller than the page size */ + area_sz = RTE_MAX(area_sz, page_sz); + + return RTE_ALIGN(area_sz, page_sz); +} + +#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" +static int +alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, + int n_segs, int socket_id, int type_msl_idx) +{ + char name[RTE_FBARRAY_NAME_LEN]; + + snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, + type_msl_idx); + if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, + sizeof(struct rte_memseg))) { + RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", + rte_strerror(rte_errno)); + return -1; + } + + msl->page_sz = page_sz; + msl->socket_id = socket_id; + msl->base_va = NULL; + + RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", + (size_t)page_sz >> 10, socket_id); + + return 0; +} + +static int +alloc_va_space(struct rte_memseg_list *msl) +{ + uint64_t page_sz; + size_t mem_sz; + void *addr; + int flags = 0; + +#ifdef RTE_ARCH_PPC_64 + flags |= MAP_HUGETLB; +#endif + + page_sz = msl->page_sz; + mem_sz = page_sz * msl->memseg_arr.len; + + addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); + if (addr == NULL) { + if (rte_errno == EADDRNOTAVAIL) + RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - please use '--base-virtaddr' option\n", + (unsigned long long)mem_sz, msl->base_va); + else + RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); + return -1; + } + msl->base_va = addr; + + return 0; +} + + +static int +memseg_primary_init(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int hpi_idx, msl_idx = 0; + struct rte_memseg_list *msl; + uint64_t max_mem, total_mem; + + /* no-huge does not need this at all */ + if (internal_config.no_hugetlbfs) + return 0; + + /* FreeBSD has an issue where core dump will dump the entire memory + * contents, including anonymous zero-page memory. Therefore, while we + * will be limiting total amount of memory to RTE_MAX_MEM_MB, we will + * also be further limiting total memory amount to whatever memory is + * available to us through contigmem driver (plus spacing blocks). + * + * so, at each stage, we will be checking how much memory we are + * preallocating, and adjust all the values accordingly. + */ + + max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; + total_mem = 0; + + /* create memseg lists */ + for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes; + hpi_idx++) { + uint64_t max_type_mem, total_type_mem = 0; + uint64_t avail_mem; + int type_msl_idx, max_segs, avail_segs, total_segs = 0; + struct hugepage_info *hpi; + uint64_t hugepage_sz; + + hpi = &internal_config.hugepage_info[hpi_idx]; + hugepage_sz = hpi->hugepage_sz; + + /* no NUMA support on FreeBSD */ + + /* check if we've already exceeded total memory amount */ + if (total_mem >= max_mem) + break; + + /* first, calculate theoretical limits according to config */ + max_type_mem = RTE_MIN(max_mem - total_mem, + (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20); + max_segs = RTE_MAX_MEMSEG_PER_TYPE; + + /* now, limit all of that to whatever will actually be + * available to us, because without dynamic allocation support, + * all of that extra memory will be sitting there being useless + * and slowing down core dumps in case of a crash. + * + * we need (N*2)-1 segments because we cannot guarantee that + * each segment will be IOVA-contiguous with the previous one, + * so we will allocate more and put spaces inbetween segments + * that are non-contiguous. + */ + avail_segs = (hpi->num_pages[0] * 2) - 1; + avail_mem = avail_segs * hugepage_sz; + + max_type_mem = RTE_MIN(avail_mem, max_type_mem); + max_segs = RTE_MIN(avail_segs, max_segs); + + type_msl_idx = 0; + while (total_type_mem < max_type_mem && + total_segs < max_segs) { + uint64_t cur_max_mem, cur_mem; + unsigned int n_segs; + + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + RTE_LOG(ERR, EAL, + "No more space in memseg lists, please increase %s\n", + RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); + return -1; + } + + msl = &mcfg->memsegs[msl_idx++]; + + cur_max_mem = max_type_mem - total_type_mem; + + cur_mem = get_mem_amount(hugepage_sz, + cur_max_mem); + n_segs = cur_mem / hugepage_sz; + + if (alloc_memseg_list(msl, hugepage_sz, n_segs, + 0, type_msl_idx)) + return -1; + + total_segs += msl->memseg_arr.len; + total_type_mem = total_segs * hugepage_sz; + type_msl_idx++; + + if (alloc_va_space(msl)) { + RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); + return -1; + } + } + total_mem += total_type_mem; + } + return 0; +} + +static int +memseg_secondary_init(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int msl_idx = 0; + struct rte_memseg_list *msl; + + for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) { + + msl = &mcfg->memsegs[msl_idx]; + + /* skip empty memseg lists */ + if (msl->memseg_arr.len == 0) + continue; + + if (rte_fbarray_attach(&msl->memseg_arr)) { + RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n"); + return -1; + } + + /* preallocate VA space */ + if (alloc_va_space(msl)) { + RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); + return -1; + } + } + + return 0; +} + +int +rte_eal_memseg_init(void) +{ + return rte_eal_process_type() == RTE_PROC_PRIMARY ? + memseg_primary_init() : + memseg_secondary_init(); +} diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4f0688f9d..4b7389ed4 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -153,382 +153,6 @@ eal_get_virtual_area(void *requested_addr, size_t *size, return aligned_addr; } -static uint64_t -get_mem_amount(uint64_t page_sz, uint64_t max_mem) -{ - uint64_t area_sz, max_pages; - - /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ - max_pages = RTE_MAX_MEMSEG_PER_LIST; - max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); - - area_sz = RTE_MIN(page_sz * max_pages, max_mem); - - /* make sure the list isn't smaller than the page size */ - area_sz = RTE_MAX(area_sz, page_sz); - - return RTE_ALIGN(area_sz, page_sz); -} - -static int -free_memseg_list(struct rte_memseg_list *msl) -{ - if (rte_fbarray_destroy(&msl->memseg_arr)) { - RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n"); - return -1; - } - memset(msl, 0, sizeof(*msl)); - return 0; -} - -static int -alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, - uint64_t max_mem, int socket_id, int type_msl_idx) -{ - char name[RTE_FBARRAY_NAME_LEN]; - uint64_t mem_amount; - int max_segs; - - mem_amount = get_mem_amount(page_sz, max_mem); - max_segs = mem_amount / page_sz; - - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); - if (rte_fbarray_init(&msl->memseg_arr, name, max_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", - rte_strerror(rte_errno)); - return -1; - } - - msl->page_sz = page_sz; - msl->socket_id = socket_id; - msl->base_va = NULL; - - RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", - (size_t)page_sz >> 10, socket_id); - - return 0; -} - -static int -alloc_va_space(struct rte_memseg_list *msl) -{ - uint64_t page_sz; - size_t mem_sz; - void *addr; - int flags = 0; - -#ifdef RTE_ARCH_PPC_64 - flags |= MAP_HUGETLB; -#endif - - page_sz = msl->page_sz; - mem_sz = page_sz * msl->memseg_arr.len; - - addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); - if (addr == NULL) { - if (rte_errno == EADDRNOTAVAIL) - RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - please use '--base-virtaddr' option\n", - (unsigned long long)mem_sz, msl->base_va); - else - RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); - return -1; - } - msl->base_va = addr; - - return 0; -} - -static int __rte_unused -memseg_primary_init_32(void) -{ - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int active_sockets, hpi_idx, msl_idx = 0; - unsigned int socket_id, i; - struct rte_memseg_list *msl; - uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem; - uint64_t max_mem; - - /* no-huge does not need this at all */ - if (internal_config.no_hugetlbfs) - return 0; - - /* this is a giant hack, but desperate times call for desperate - * measures. in legacy 32-bit mode, we cannot preallocate VA space, - * because having upwards of 2 gigabytes of VA space already mapped will - * interfere with our ability to map and sort hugepages. - * - * therefore, in legacy 32-bit mode, we will be initializing memseg - * lists much later - in eal_memory.c, right after we unmap all the - * unneeded pages. this will not affect secondary processes, as those - * should be able to mmap the space without (too many) problems. - */ - if (internal_config.legacy_mem) - return 0; - - /* 32-bit mode is a very special case. we cannot know in advance where - * the user will want to allocate their memory, so we have to do some - * heuristics. - */ - active_sockets = 0; - total_requested_mem = 0; - if (internal_config.force_sockets) - for (i = 0; i < rte_socket_count(); i++) { - uint64_t mem; - - socket_id = rte_socket_id_by_idx(i); - mem = internal_config.socket_mem[socket_id]; - - if (mem == 0) - continue; - - active_sockets++; - total_requested_mem += mem; - } - else - total_requested_mem = internal_config.memory; - - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - if (total_requested_mem > max_mem) { - RTE_LOG(ERR, EAL, "Invalid parameters: 32-bit process can at most use %uM of memory\n", - (unsigned int)(max_mem >> 20)); - return -1; - } - total_extra_mem = max_mem - total_requested_mem; - extra_mem_per_socket = active_sockets == 0 ? total_extra_mem : - total_extra_mem / active_sockets; - - /* the allocation logic is a little bit convoluted, but here's how it - * works, in a nutshell: - * - if user hasn't specified on which sockets to allocate memory via - * --socket-mem, we allocate all of our memory on master core socket. - * - if user has specified sockets to allocate memory on, there may be - * some "unused" memory left (e.g. if user has specified --socket-mem - * such that not all memory adds up to 2 gigabytes), so add it to all - * sockets that are in use equally. - * - * page sizes are sorted by size in descending order, so we can safely - * assume that we dispense with bigger page sizes first. - */ - - /* create memseg lists */ - for (i = 0; i < rte_socket_count(); i++) { - int hp_sizes = (int) internal_config.num_hugepage_sizes; - uint64_t max_socket_mem, cur_socket_mem; - unsigned int master_lcore_socket; - struct rte_config *cfg = rte_eal_get_configuration(); - bool skip; - - socket_id = rte_socket_id_by_idx(i); - -#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES - if (socket_id > 0) - break; -#endif - - /* if we didn't specifically request memory on this socket */ - skip = active_sockets != 0 && - internal_config.socket_mem[socket_id] == 0; - /* ...or if we didn't specifically request memory on *any* - * socket, and this is not master lcore - */ - master_lcore_socket = rte_lcore_to_socket_id(cfg->master_lcore); - skip |= active_sockets == 0 && socket_id != master_lcore_socket; - - if (skip) { - RTE_LOG(DEBUG, EAL, "Will not preallocate memory on socket %u\n", - socket_id); - continue; - } - - /* max amount of memory on this socket */ - max_socket_mem = (active_sockets != 0 ? - internal_config.socket_mem[socket_id] : - internal_config.memory) + - extra_mem_per_socket; - cur_socket_mem = 0; - - for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) { - uint64_t max_pagesz_mem, cur_pagesz_mem = 0; - uint64_t hugepage_sz; - struct hugepage_info *hpi; - int type_msl_idx, max_segs, total_segs = 0; - - hpi = &internal_config.hugepage_info[hpi_idx]; - hugepage_sz = hpi->hugepage_sz; - - /* check if pages are actually available */ - if (hpi->num_pages[socket_id] == 0) - continue; - - max_segs = RTE_MAX_MEMSEG_PER_TYPE; - max_pagesz_mem = max_socket_mem - cur_socket_mem; - - /* make it multiple of page size */ - max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem, - hugepage_sz); - - RTE_LOG(DEBUG, EAL, "Attempting to preallocate " - "%" PRIu64 "M on socket %i\n", - max_pagesz_mem >> 20, socket_id); - - type_msl_idx = 0; - while (cur_pagesz_mem < max_pagesz_mem && - total_segs < max_segs) { - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - RTE_LOG(ERR, EAL, - "No more space in memseg lists, please increase %s\n", - RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); - return -1; - } - - msl = &mcfg->memsegs[msl_idx]; - - if (alloc_memseg_list(msl, hugepage_sz, - max_pagesz_mem, socket_id, - type_msl_idx)) { - /* failing to allocate a memseg list is - * a serious error. - */ - RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n"); - return -1; - } - - if (alloc_va_space(msl)) { - /* if we couldn't allocate VA space, we - * can try with smaller page sizes. - */ - RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n"); - /* deallocate memseg list */ - if (free_memseg_list(msl)) - return -1; - break; - } - - total_segs += msl->memseg_arr.len; - cur_pagesz_mem = total_segs * hugepage_sz; - type_msl_idx++; - msl_idx++; - } - cur_socket_mem += cur_pagesz_mem; - } - if (cur_socket_mem == 0) { - RTE_LOG(ERR, EAL, "Cannot allocate VA space on socket %u\n", - socket_id); - return -1; - } - } - - return 0; -} - -static int __rte_unused -memseg_primary_init(void) -{ - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int i, socket_id, hpi_idx, msl_idx = 0; - struct rte_memseg_list *msl; - uint64_t max_mem, total_mem; - - /* no-huge does not need this at all */ - if (internal_config.no_hugetlbfs) - return 0; - - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - total_mem = 0; - - /* create memseg lists */ - for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes; - hpi_idx++) { - struct hugepage_info *hpi; - uint64_t hugepage_sz; - - hpi = &internal_config.hugepage_info[hpi_idx]; - hugepage_sz = hpi->hugepage_sz; - - for (i = 0; i < (int) rte_socket_count(); i++) { - uint64_t max_type_mem, total_type_mem = 0; - int type_msl_idx, max_segs, total_segs = 0; - - socket_id = rte_socket_id_by_idx(i); - -#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES - if (socket_id > 0) - break; -#endif - - if (total_mem >= max_mem) - break; - - max_type_mem = RTE_MIN(max_mem - total_mem, - (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20); - max_segs = RTE_MAX_MEMSEG_PER_TYPE; - - type_msl_idx = 0; - while (total_type_mem < max_type_mem && - total_segs < max_segs) { - uint64_t cur_max_mem; - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - RTE_LOG(ERR, EAL, - "No more space in memseg lists, please increase %s\n", - RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); - return -1; - } - - msl = &mcfg->memsegs[msl_idx++]; - - cur_max_mem = max_type_mem - total_type_mem; - if (alloc_memseg_list(msl, hugepage_sz, - cur_max_mem, socket_id, - type_msl_idx)) - return -1; - - total_segs += msl->memseg_arr.len; - total_type_mem = total_segs * hugepage_sz; - type_msl_idx++; - - if (alloc_va_space(msl)) { - RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); - return -1; - } - } - total_mem += total_type_mem; - } - } - return 0; -} - -static int -memseg_secondary_init(void) -{ - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int msl_idx = 0; - struct rte_memseg_list *msl; - - for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) { - - msl = &mcfg->memsegs[msl_idx]; - - /* skip empty memseg lists */ - if (msl->memseg_arr.len == 0) - continue; - - if (rte_fbarray_attach(&msl->memseg_arr)) { - RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n"); - return -1; - } - - /* preallocate VA space */ - if (alloc_va_space(msl)) { - RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); - return -1; - } - } - - return 0; -} - static struct rte_memseg * virt2memseg(const void *addr, const struct rte_memseg_list *msl) { @@ -918,15 +542,7 @@ rte_eal_memory_init(void) /* lock mem hotplug here, to prevent races while we init */ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); - retval = rte_eal_process_type() == RTE_PROC_PRIMARY ? -#ifndef RTE_ARCH_64 - memseg_primary_init_32() : -#else - memseg_primary_init() : -#endif - memseg_secondary_init(); - - if (retval < 0) + if (rte_eal_memseg_init() < 0) goto fail; if (eal_memalloc_init() < 0) diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index bdadc4d50..b742f4c58 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -46,6 +46,18 @@ void eal_log_set_default(FILE *default_log); */ int rte_eal_cpu_init(void); +/** + * Create memseg lists + * + * This function is private to EAL. + * + * Preallocate virtual memory. + * + * @return + * 0 on success, negative on error + */ +int rte_eal_memseg_init(void); + /** * Map memory * diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index c917de1c2..b8c8a59e0 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -767,6 +767,34 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end) return 0; } +static uint64_t +get_mem_amount(uint64_t page_sz, uint64_t max_mem) +{ + uint64_t area_sz, max_pages; + + /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ + max_pages = RTE_MAX_MEMSEG_PER_LIST; + max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); + + area_sz = RTE_MIN(page_sz * max_pages, max_mem); + + /* make sure the list isn't smaller than the page size */ + area_sz = RTE_MAX(area_sz, page_sz); + + return RTE_ALIGN(area_sz, page_sz); +} + +static int +free_memseg_list(struct rte_memseg_list *msl) +{ + if (rte_fbarray_destroy(&msl->memseg_arr)) { + RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n"); + return -1; + } + memset(msl, 0, sizeof(*msl)); + return 0; +} + #define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" static int alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, @@ -1840,3 +1868,316 @@ rte_eal_using_phys_addrs(void) { return phys_addrs_available; } + +static int __rte_unused +memseg_primary_init_32(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int active_sockets, hpi_idx, msl_idx = 0; + unsigned int socket_id, i; + struct rte_memseg_list *msl; + uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem; + uint64_t max_mem; + + /* no-huge does not need this at all */ + if (internal_config.no_hugetlbfs) + return 0; + + /* this is a giant hack, but desperate times call for desperate + * measures. in legacy 32-bit mode, we cannot preallocate VA space, + * because having upwards of 2 gigabytes of VA space already mapped will + * interfere with our ability to map and sort hugepages. + * + * therefore, in legacy 32-bit mode, we will be initializing memseg + * lists much later - in eal_memory.c, right after we unmap all the + * unneeded pages. this will not affect secondary processes, as those + * should be able to mmap the space without (too many) problems. + */ + if (internal_config.legacy_mem) + return 0; + + /* 32-bit mode is a very special case. we cannot know in advance where + * the user will want to allocate their memory, so we have to do some + * heuristics. + */ + active_sockets = 0; + total_requested_mem = 0; + if (internal_config.force_sockets) + for (i = 0; i < rte_socket_count(); i++) { + uint64_t mem; + + socket_id = rte_socket_id_by_idx(i); + mem = internal_config.socket_mem[socket_id]; + + if (mem == 0) + continue; + + active_sockets++; + total_requested_mem += mem; + } + else + total_requested_mem = internal_config.memory; + + max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; + if (total_requested_mem > max_mem) { + RTE_LOG(ERR, EAL, "Invalid parameters: 32-bit process can at most use %uM of memory\n", + (unsigned int)(max_mem >> 20)); + return -1; + } + total_extra_mem = max_mem - total_requested_mem; + extra_mem_per_socket = active_sockets == 0 ? total_extra_mem : + total_extra_mem / active_sockets; + + /* the allocation logic is a little bit convoluted, but here's how it + * works, in a nutshell: + * - if user hasn't specified on which sockets to allocate memory via + * --socket-mem, we allocate all of our memory on master core socket. + * - if user has specified sockets to allocate memory on, there may be + * some "unused" memory left (e.g. if user has specified --socket-mem + * such that not all memory adds up to 2 gigabytes), so add it to all + * sockets that are in use equally. + * + * page sizes are sorted by size in descending order, so we can safely + * assume that we dispense with bigger page sizes first. + */ + + /* create memseg lists */ + for (i = 0; i < rte_socket_count(); i++) { + int hp_sizes = (int) internal_config.num_hugepage_sizes; + uint64_t max_socket_mem, cur_socket_mem; + unsigned int master_lcore_socket; + struct rte_config *cfg = rte_eal_get_configuration(); + bool skip; + + socket_id = rte_socket_id_by_idx(i); + +#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES + if (socket_id > 0) + break; +#endif + + /* if we didn't specifically request memory on this socket */ + skip = active_sockets != 0 && + internal_config.socket_mem[socket_id] == 0; + /* ...or if we didn't specifically request memory on *any* + * socket, and this is not master lcore + */ + master_lcore_socket = rte_lcore_to_socket_id(cfg->master_lcore); + skip |= active_sockets == 0 && socket_id != master_lcore_socket; + + if (skip) { + RTE_LOG(DEBUG, EAL, "Will not preallocate memory on socket %u\n", + socket_id); + continue; + } + + /* max amount of memory on this socket */ + max_socket_mem = (active_sockets != 0 ? + internal_config.socket_mem[socket_id] : + internal_config.memory) + + extra_mem_per_socket; + cur_socket_mem = 0; + + for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) { + uint64_t max_pagesz_mem, cur_pagesz_mem = 0; + uint64_t hugepage_sz; + struct hugepage_info *hpi; + int type_msl_idx, max_segs, total_segs = 0; + + hpi = &internal_config.hugepage_info[hpi_idx]; + hugepage_sz = hpi->hugepage_sz; + + /* check if pages are actually available */ + if (hpi->num_pages[socket_id] == 0) + continue; + + max_segs = RTE_MAX_MEMSEG_PER_TYPE; + max_pagesz_mem = max_socket_mem - cur_socket_mem; + + /* make it multiple of page size */ + max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem, + hugepage_sz); + + RTE_LOG(DEBUG, EAL, "Attempting to preallocate " + "%" PRIu64 "M on socket %i\n", + max_pagesz_mem >> 20, socket_id); + + type_msl_idx = 0; + while (cur_pagesz_mem < max_pagesz_mem && + total_segs < max_segs) { + uint64_t cur_mem; + unsigned int n_segs; + + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + RTE_LOG(ERR, EAL, + "No more space in memseg lists, please increase %s\n", + RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); + return -1; + } + + msl = &mcfg->memsegs[msl_idx]; + + cur_mem = get_mem_amount(hugepage_sz, + max_pagesz_mem); + n_segs = cur_mem / hugepage_sz; + + if (alloc_memseg_list(msl, hugepage_sz, n_segs, + socket_id, type_msl_idx)) { + /* failing to allocate a memseg list is + * a serious error. + */ + RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n"); + return -1; + } + + if (alloc_va_space(msl)) { + /* if we couldn't allocate VA space, we + * can try with smaller page sizes. + */ + RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n"); + /* deallocate memseg list */ + if (free_memseg_list(msl)) + return -1; + break; + } + + total_segs += msl->memseg_arr.len; + cur_pagesz_mem = total_segs * hugepage_sz; + type_msl_idx++; + msl_idx++; + } + cur_socket_mem += cur_pagesz_mem; + } + if (cur_socket_mem == 0) { + RTE_LOG(ERR, EAL, "Cannot allocate VA space on socket %u\n", + socket_id); + return -1; + } + } + + return 0; +} + +static int __rte_unused +memseg_primary_init(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int i, socket_id, hpi_idx, msl_idx = 0; + struct rte_memseg_list *msl; + uint64_t max_mem, total_mem; + + /* no-huge does not need this at all */ + if (internal_config.no_hugetlbfs) + return 0; + + max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; + total_mem = 0; + + /* create memseg lists */ + for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes; + hpi_idx++) { + struct hugepage_info *hpi; + uint64_t hugepage_sz; + + hpi = &internal_config.hugepage_info[hpi_idx]; + hugepage_sz = hpi->hugepage_sz; + + for (i = 0; i < (int) rte_socket_count(); i++) { + uint64_t max_type_mem, total_type_mem = 0; + int type_msl_idx, max_segs, total_segs = 0; + + socket_id = rte_socket_id_by_idx(i); + +#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES + if (socket_id > 0) + break; +#endif + + if (total_mem >= max_mem) + break; + + max_type_mem = RTE_MIN(max_mem - total_mem, + (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20); + max_segs = RTE_MAX_MEMSEG_PER_TYPE; + + type_msl_idx = 0; + while (total_type_mem < max_type_mem && + total_segs < max_segs) { + uint64_t cur_max_mem, cur_mem; + unsigned int n_segs; + + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + RTE_LOG(ERR, EAL, + "No more space in memseg lists, please increase %s\n", + RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); + return -1; + } + + msl = &mcfg->memsegs[msl_idx++]; + + cur_max_mem = max_type_mem - total_type_mem; + + cur_mem = get_mem_amount(hugepage_sz, + cur_max_mem); + n_segs = cur_mem / hugepage_sz; + + if (alloc_memseg_list(msl, hugepage_sz, n_segs, + socket_id, type_msl_idx)) + return -1; + + total_segs += msl->memseg_arr.len; + total_type_mem = total_segs * hugepage_sz; + type_msl_idx++; + + if (alloc_va_space(msl)) { + RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); + return -1; + } + } + total_mem += total_type_mem; + } + } + return 0; +} + +static int +memseg_secondary_init(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int msl_idx = 0; + struct rte_memseg_list *msl; + + for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) { + + msl = &mcfg->memsegs[msl_idx]; + + /* skip empty memseg lists */ + if (msl->memseg_arr.len == 0) + continue; + + if (rte_fbarray_attach(&msl->memseg_arr)) { + RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n"); + return -1; + } + + /* preallocate VA space */ + if (alloc_va_space(msl)) { + RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); + return -1; + } + } + + return 0; +} + +int +rte_eal_memseg_init(void) +{ + return rte_eal_process_type() == RTE_PROC_PRIMARY ? +#ifndef RTE_ARCH_64 + memseg_primary_init_32() : +#else + memseg_primary_init() : +#endif + memseg_secondary_init(); +}