From patchwork Mon Sep 18 16:32:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bruce Richardson
X-Patchwork-Id: 131579
X-Patchwork-Delegate: thomas@monjalon.net
From: Bruce Richardson
To: dev@dpdk.org
Cc: Bruce Richardson, Anatoly Burakov
Subject: [PATCH v3 1/2] eal: add flag to indicate non-EAL malloc heaps
Date: Mon, 18 Sep 2023 17:32:05 +0100
Message-Id: <20230918163206.1010611-2-bruce.richardson@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230918163206.1010611-1-bruce.richardson@intel.com>
References: <20230915122703.475834-1-bruce.richardson@intel.com>
 <20230918163206.1010611-1-bruce.richardson@intel.com>
List-Id: DPDK patches and discussions

Rather than relying implicitly on the socket_id value to identify external
heaps vs internal ones that can be expanded on demand by adding more
hugepages, we add an "is_external" flag to the heap structure.

As we do so, we change the heap initialization to use designated
initializers, which guarantees that any fields added later are properly
zeroed on init. As it stands, many fields are not explicitly set on first
init of the internal malloc heaps.

Signed-off-by: Bruce Richardson
---
 lib/eal/common/malloc_heap.c | 22 ++++++++++++----------
 lib/eal/common/malloc_heap.h |  1 +
 lib/eal/common/malloc_mp.c   |  5 ++---
 lib/eal/common/rte_malloc.c  | 14 ++++++++------
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 6b6cf9174c..4fa38fcd44 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -668,7 +668,7 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t size,
 	 * we just need to request more memory first.
 	 */
 
-	socket_id = rte_socket_id_by_idx(heap_id);
+	socket_id = heap->is_external ? -1 : rte_socket_id_by_idx(heap_id);
 	/*
 	 * if socket ID is negative, we cannot find a socket ID for this heap -
 	 * which means it's an external heap. those can have unexpected page
@@ -1362,19 +1362,17 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	}
 
 	/* initialize empty heap */
-	heap->alloc_count = 0;
-	heap->first = NULL;
-	heap->last = NULL;
+	*heap = (struct malloc_heap) {
+		.is_external = 1,
+		.socket_id = next_socket_id,
+		.lock = RTE_SPINLOCK_INITIALIZER,
+	};
 	LIST_INIT(heap->free_head);
-	rte_spinlock_init(&heap->lock);
-	heap->total_size = 0;
-	heap->socket_id = next_socket_id;
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
 
 	/* we hold a global mem hotplug writelock, so it's safe to increment */
 	mcfg->next_socket_id++;
 
-	/* set up name */
-	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
 	return 0;
 }
 
@@ -1425,8 +1423,12 @@ rte_eal_malloc_heap_init(void)
 
 			snprintf(heap_name, sizeof(heap_name),
 					"socket_%i", socket_id);
+
+			*heap = (struct malloc_heap){
+				.lock = RTE_SPINLOCK_INITIALIZER,
+				.socket_id = socket_id,
+			};
 			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
-			heap->socket_id = socket_id;
 		}
 	}
 
diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
index 8f3ab57154..e23cc01fb3 100644
--- a/lib/eal/common/malloc_heap.h
+++ b/lib/eal/common/malloc_heap.h
@@ -23,6 +23,7 @@ struct malloc_elem;
  */
 struct malloc_heap {
 	rte_spinlock_t lock;
+	uint32_t is_external:1;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
 	struct malloc_elem *volatile first;
 	struct malloc_elem *volatile last;
diff --git a/lib/eal/common/malloc_mp.c b/lib/eal/common/malloc_mp.c
index 7270c2ec90..d5ab6d8351 100644
--- a/lib/eal/common/malloc_mp.c
+++ b/lib/eal/common/malloc_mp.c
@@ -242,10 +242,9 @@ handle_alloc_request(const struct malloc_mp_req *m,
 	/*
 	 * for allocations, we must only use internal heaps, but since the
 	 * rte_malloc_heap_socket_is_external() is thread-safe and we're already
-	 * read-locked, we'll have to take advantage of the fact that internal
-	 * socket ID's are always lower than RTE_MAX_NUMA_NODES.
+	 * read-locked, we'll check directly here.
 	 */
-	if (heap->socket_id >= RTE_MAX_NUMA_NODES) {
+	if (heap->is_external == 1) {
 		RTE_LOG(ERR, EAL, "Attempting to allocate from external heap\n");
 		return -1;
 	}
diff --git a/lib/eal/common/rte_malloc.c b/lib/eal/common/rte_malloc.c
index ebafef3f6c..5f40acabee 100644
--- a/lib/eal/common/rte_malloc.c
+++ b/lib/eal/common/rte_malloc.c
@@ -314,7 +314,7 @@ rte_malloc_heap_socket_is_external(int socket_id)
 
 		if ((int)tmp->socket_id == socket_id) {
 			/* external memory always has large socket ID's */
-			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			ret = tmp->is_external;
 			break;
 		}
 	}
@@ -421,7 +421,7 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		ret = -1;
 		goto unlock;
 	}
-	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+	if (heap->is_external == 0) {
 		/* cannot add memory to internal heaps */
 		rte_errno = EPERM;
 		ret = -1;
@@ -469,7 +469,7 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 		ret = -1;
 		goto unlock;
 	}
-	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+	if (heap->is_external == 0) {
 		/* cannot remove memory from internal heaps */
 		rte_errno = EPERM;
 		ret = -1;
@@ -520,7 +520,7 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 		goto unlock;
 	}
 	/* we shouldn't be able to sync to internal heaps */
-	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+	if (heap->is_external == 0) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;
@@ -648,8 +648,10 @@ rte_malloc_heap_destroy(const char *heap_name)
 		ret = -1;
 		goto unlock;
 	}
-	/* we shouldn't be able to destroy internal heaps */
-	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+	/* we shouldn't be able to destroy internal heaps, or external heaps
+	 * configured to have an internal numa socket id
+	 */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES || heap->is_external == 0) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;

From patchwork Mon Sep 18 16:32:06 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bruce Richardson
X-Patchwork-Id: 131580
X-Patchwork-Delegate: thomas@monjalon.net
From: Bruce Richardson
To: dev@dpdk.org
Cc: Bruce Richardson, Anatoly Burakov
Subject: [PATCH v3 2/2] eal: allow swapping of malloc heaps
Date: Mon, 18 Sep 2023 17:32:06 +0100
Message-Id: <20230918163206.1010611-3-bruce.richardson@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230918163206.1010611-1-bruce.richardson@intel.com>
References: <20230915122703.475834-1-bruce.richardson@intel.com>
 <20230918163206.1010611-1-bruce.richardson@intel.com>
List-Id: DPDK patches and discussions

The external memory functions in DPDK allow the addition of externally
allocated memory to malloc heaps, but with one major restriction - the
memory must be added to an application-created heap, not one of the
standard DPDK heaps for a NUMA node. This restriction makes it difficult -
if not impossible - to use externally allocated memory for DPDK by default.

However, even if the restriction were relaxed, so that we could add
external memory to e.g. the socket 0 heap, there would be no way to
guarantee that the external memory would be used in preference to the
standard DPDK hugepage memory for a given allocation. To give appropriately
defined behaviour, a better solution is to allow the application to
explicitly swap a pair of heaps.

This one new API allows the user to configure a new malloc heap, add
external memory to it, and then replace a standard socket heap with the
newly created one - thereby guaranteeing that future allocations come from
the external memory.
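
To illustrate the idea described above, here is a minimal standalone sketch of why exchanging socket ids redirects lookups. It is a toy model, not the DPDK implementation: struct toy_heap, toy_socket_to_heap_id(), toy_heap_swap_socket(), toy_setup() and the external id 256 are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not DPDK's structures. */
#define TOY_MAX_HEAPS 4

struct toy_heap {
	const char *name;   /* label for illustration */
	int socket_id;      /* id that callers use to select this heap */
	int in_use;         /* nonzero once the slot has been set up */
};

static struct toy_heap toy_heaps[TOY_MAX_HEAPS];

/* Map a socket id to a heap slot index, or -1 if no heap carries that id. */
static int
toy_socket_to_heap_id(int socket_id)
{
	for (int i = 0; i < TOY_MAX_HEAPS; i++) {
		if (toy_heaps[i].in_use && toy_heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Exchange the socket ids of two heaps; the heaps themselves stay in place. */
static int
toy_heap_swap_socket(int socket1, int socket2)
{
	const int h1 = toy_socket_to_heap_id(socket1);
	const int h2 = toy_socket_to_heap_id(socket2);

	if (h1 < 0 || h2 < 0)
		return -1;

	const int tmp = toy_heaps[h1].socket_id;
	toy_heaps[h1].socket_id = toy_heaps[h2].socket_id;
	toy_heaps[h2].socket_id = tmp;
	return 0;
}

/* One hugepage-backed heap on socket 0, one external heap on the arbitrary id 256. */
static void
toy_setup(void)
{
	toy_heaps[0] = (struct toy_heap){ .name = "socket_0", .socket_id = 0, .in_use = 1 };
	toy_heaps[1] = (struct toy_heap){ .name = "ext_heap", .socket_id = 256, .in_use = 1 };
	assert(toy_socket_to_heap_id(0) == 0);
}
```

In this model, after toy_setup() and toy_heap_swap_socket(0, 256), a lookup for socket 0 resolves to the slot holding "ext_heap", while the original hugepage heap is reachable through id 256. Because only the ids move, an external/internal distinction inferred from the id value would no longer be reliable, which is presumably why the first patch introduces an explicit flag.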

Signed-off-by: Bruce Richardson
---
 lib/eal/common/malloc_heap.c | 24 ++++++++++++++++++++++++
 lib/eal/include/rte_malloc.h | 34 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map          |  2 ++
 3 files changed, 60 insertions(+)

diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 4fa38fcd44..eba75111ca 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -1320,6 +1320,30 @@ malloc_heap_add_external_memory(struct malloc_heap *heap,
 	return 0;
 }
 
+int
+rte_malloc_heap_swap_socket(int socket1, int socket2)
+{
+	const int h1 = malloc_socket_to_heap_id(socket1);
+	if (h1 < 0 || h1 >= RTE_MAX_HEAPS)
+		return -1;
+
+	const int h2 = malloc_socket_to_heap_id(socket2);
+	if (h2 < 0 || h2 >= RTE_MAX_HEAPS)
+		return -1;
+
+
+	rte_mcfg_mem_write_lock();
+	do {
+		struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+		int tmp = mcfg->malloc_heaps[h1].socket_id;
+		mcfg->malloc_heaps[h1].socket_id = mcfg->malloc_heaps[h2].socket_id;
+		mcfg->malloc_heaps[h2].socket_id = tmp;
+	} while (0);
+	rte_mcfg_mem_write_unlock();
+
+	return 0;
+}
+
 int
 malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
 		size_t len)
diff --git a/lib/eal/include/rte_malloc.h b/lib/eal/include/rte_malloc.h
index 54a8ac211e..df356a5efe 100644
--- a/lib/eal/include/rte_malloc.h
+++ b/lib/eal/include/rte_malloc.h
@@ -490,6 +490,40 @@ rte_malloc_heap_get_socket(const char *name);
 int
 rte_malloc_heap_socket_is_external(int socket_id);
 
+/**
+ * Swap the heaps for the given socket ids
+ *
+ * This causes the heaps for the given socket ids to be swapped, allowing
+ * external memory registered as a malloc heap to become the new default memory
+ * for a standard NUMA node. For example, to have allocations on socket 0 come
+ * from external memory, the following sequence of API calls can be used:
+ * @code
+ *   rte_malloc_heap_create()
+ *   rte_malloc_heap_memory_add(,....)
+ *   id = rte_malloc_heap_get_socket()
+ *   rte_malloc_heap_swap_socket(0, id)
+ * @endcode
+ *
+ * Following these calls, allocations from the memory originally assigned to
+ * socket 0 can be made by passing "id" as the socket_id parameter.
+ *
+ * @note: It is recommended that this function be used only after EAL initialization,
+ *   before any temporary objects are created from the DPDK heaps.
+ * @note: Since any objects allocated using rte_malloc and similar functions track
+ *   the heaps via pointers, any already-allocated objects will be returned to their
+ *   original heaps, even after a call to this function.
+ *
+ * @param socket1
+ *   The socket id of the first heap to swap
+ * @param socket2
+ *   The socket id of the second heap to swap
+ * @return
+ *   0 on success, -1 on error
+ */
+__rte_experimental
+int
+rte_malloc_heap_swap_socket(int socket1, int socket2);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 7940431e5a..b06ee7219e 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -417,6 +417,8 @@ EXPERIMENTAL {
 	# added in 23.07
 	rte_memzone_max_get;
 	rte_memzone_max_set;
+
+	rte_malloc_heap_swap_socket;
 };
 
 INTERNAL {
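
The second @note in the rte_malloc_heap_swap_socket() documentation says that already-allocated objects return to their original heaps after a swap. The standalone sketch below models that behaviour under simplifying assumptions: struct sketch_heap, struct sketch_elem and the sketch_* functions are hypothetical stand-ins rather than DPDK's malloc_heap and malloc_elem, and "allocation" is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not DPDK's malloc_elem or malloc_heap. */
struct sketch_heap {
	int socket_id;      /* id used to select the heap for new allocations */
	int alloc_count;    /* number of live elements owned by this heap */
};

struct sketch_elem {
	struct sketch_heap *heap;  /* owner recorded at allocation time */
};

/* Allocate from whichever heap currently carries the requested socket id. */
static int
sketch_alloc(struct sketch_heap *heaps, size_t n, int socket_id,
		struct sketch_elem *elem)
{
	for (size_t i = 0; i < n; i++) {
		if (heaps[i].socket_id == socket_id) {
			elem->heap = &heaps[i];
			heaps[i].alloc_count++;
			return 0;
		}
	}
	return -1;
}

/* Free follows the stored pointer and never consults the socket id. */
static void
sketch_free(struct sketch_elem *elem)
{
	assert(elem->heap != NULL);
	elem->heap->alloc_count--;
	elem->heap = NULL;
}

/* Exchange the socket ids of two heaps, mirroring what the swap does. */
static void
sketch_swap_ids(struct sketch_heap *a, struct sketch_heap *b)
{
	const int tmp = a->socket_id;
	a->socket_id = b->socket_id;
	b->socket_id = tmp;
}
```

In this model, an element allocated on socket 0 before sketch_swap_ids() keeps pointing at the first heap, while an element allocated on socket 0 afterwards points at the second. Freeing the earlier element decrements the first heap's count only, which matches the recommendation to perform the swap before any objects are created from the DPDK heaps.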