From patchwork Mon Jun 15 00:43:43 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 71520
X-Patchwork-Delegate: thomas@monjalon.net
Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru.
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:02 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Jerin Jacob , John McNamara , Marko Kovacevic , Anatoly Burakov Date: Mon, 15 Jun 2020 03:43:43 +0300 Message-Id: <20200615004354.14380-2-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 01/12] eal: replace rte_page_sizes with a set of constants X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Clang on Windows follows MS ABI where enum values are limited to 2^31-1. Enum rte_page_sizes has members valued above this limit, which get wrapped to zero, resulting in compilation error (duplicate values in enum). Using MS ABI is mandatory for Windows EAL to call Win32 APIs. Remove rte_page_sizes and replace its values with #define's. This enumeration is not used in public API, so there's no ABI breakage. Announce API changes for 20.08 in documentation. Suggested-by: Jerin Jacob Signed-off-by: Dmitry Kozlyuk --- doc/guides/rel_notes/release_20_08.rst | 2 ++ lib/librte_eal/include/rte_memory.h | 23 ++++++++++------------- 2 files changed, 12 insertions(+), 13 deletions(-) diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst index dee4ccbb5..86d240213 100644 --- a/doc/guides/rel_notes/release_20_08.rst +++ b/doc/guides/rel_notes/release_20_08.rst @@ -91,6 +91,8 @@ API Changes Also, make sure to start the actual text at the margin. ========================================================= +* ``rte_page_sizes`` enumeration is replaced with ``RTE_PGSIZE_xxx`` defines. + ABI Changes ----------- diff --git a/lib/librte_eal/include/rte_memory.h b/lib/librte_eal/include/rte_memory.h index 3d8d0bd69..65374d53a 100644 --- a/lib/librte_eal/include/rte_memory.h +++ b/lib/librte_eal/include/rte_memory.h @@ -24,19 +24,16 @@ extern "C" { #include #include -__extension__ -enum rte_page_sizes { - RTE_PGSIZE_4K = 1ULL << 12, - RTE_PGSIZE_64K = 1ULL << 16, - RTE_PGSIZE_256K = 1ULL << 18, - RTE_PGSIZE_2M = 1ULL << 21, - RTE_PGSIZE_16M = 1ULL << 24, - RTE_PGSIZE_256M = 1ULL << 28, - RTE_PGSIZE_512M = 1ULL << 29, - RTE_PGSIZE_1G = 1ULL << 30, - RTE_PGSIZE_4G = 1ULL << 32, - RTE_PGSIZE_16G = 1ULL << 34, -}; +#define RTE_PGSIZE_4K (1ULL << 12) +#define RTE_PGSIZE_64K (1ULL << 16) +#define RTE_PGSIZE_256K (1ULL << 18) +#define RTE_PGSIZE_2M (1ULL << 21) +#define RTE_PGSIZE_16M (1ULL << 24) +#define RTE_PGSIZE_256M (1ULL << 28) +#define RTE_PGSIZE_512M (1ULL << 29) +#define RTE_PGSIZE_1G (1ULL << 30) +#define RTE_PGSIZE_4G (1ULL << 32) +#define RTE_PGSIZE_16G (1ULL << 34) #define SOCKET_ID_ANY -1 /**< Any NUMA socket. 
*/
From patchwork Mon Jun 15 00:43:44 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 71521
X-Patchwork-Delegate: thomas@monjalon.net
Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru.
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:03 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Thomas Monjalon , Anatoly Burakov , Bruce Richardson Date: Mon, 15 Jun 2020 03:43:44 +0300 Message-Id: <20200615004354.14380-3-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 02/12] eal: introduce internal wrappers for file operations X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Introduce OS-independent wrappers in order to support common EAL code on Unix and Windows: * eal_file_open: open or create a file. * eal_file_lock: lock or unlock an open file. * eal_file_truncate: enforce a given size for an open file. Implementation for Linux and FreeBSD is placed in "unix" subdirectory, which is intended for common code between the two. These thin wrappers require no special maintenance. Common code supporting multi-process doesn't use the new wrappers, because it is inherently Unix-specific and would impose excessive requirements on the wrappers. Signed-off-by: Dmitry Kozlyuk --- MAINTAINERS | 1 + lib/librte_eal/common/eal_common_fbarray.c | 31 ++++----- lib/librte_eal/common/eal_private.h | 73 ++++++++++++++++++++ lib/librte_eal/freebsd/Makefile | 4 ++ lib/librte_eal/linux/Makefile | 4 ++ lib/librte_eal/meson.build | 4 ++ lib/librte_eal/unix/eal_file.c | 80 ++++++++++++++++++++++ lib/librte_eal/unix/meson.build | 6 ++ 8 files changed, 184 insertions(+), 19 deletions(-) create mode 100644 lib/librte_eal/unix/eal_file.c create mode 100644 lib/librte_eal/unix/meson.build diff --git a/MAINTAINERS b/MAINTAINERS index e739b87ea..4d162efd6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -170,6 +170,7 @@ EAL API and common code F: lib/librte_eal/common/ F: lib/librte_eal/include/ F: lib/librte_eal/rte_eal_version.map +F: lib/librte_eal/unix/ F: doc/guides/prog_guide/env_abstraction_layer.rst F: app/test/test_alarm.c F: app/test/test_atomic.c diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 4f8f1af73..c52ddb967 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -8,8 +8,8 @@ #include #include #include -#include #include +#include #include #include @@ -85,10 +85,8 @@ resize_and_map(int fd, void *addr, size_t len) char path[PATH_MAX]; void *map_addr; - if (ftruncate(fd, len)) { + if (eal_file_truncate(fd, len)) { RTE_LOG(ERR, EAL, "Cannot truncate %s\n", path); - /* pass errno up the chain */ - rte_errno = errno; return -1; } @@ -772,15 +770,15 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, * and see if we succeed. If we don't, someone else is using it * already. 
*/ - fd = open(path, O_CREAT | O_RDWR, 0600); + fd = eal_file_open(path, EAL_OPEN_CREATE | EAL_OPEN_READWRITE); if (fd < 0) { RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", - __func__, path, strerror(errno)); - rte_errno = errno; + __func__, path, rte_strerror(rte_errno)); goto fail; - } else if (flock(fd, LOCK_EX | LOCK_NB)) { + } else if (eal_file_lock( + fd, EAL_FLOCK_EXCLUSIVE, EAL_FLOCK_RETURN)) { RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", - __func__, path, strerror(errno)); + __func__, path, rte_strerror(rte_errno)); rte_errno = EBUSY; goto fail; } @@ -789,10 +787,8 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, * still attach to it, but no other process could reinitialize * it. */ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; + if (eal_file_lock(fd, EAL_FLOCK_SHARED, EAL_FLOCK_RETURN)) goto fail; - } if (resize_and_map(fd, data, mmap_len)) goto fail; @@ -888,17 +884,14 @@ rte_fbarray_attach(struct rte_fbarray *arr) eal_get_fbarray_path(path, sizeof(path), arr->name); - fd = open(path, O_RDWR); + fd = eal_file_open(path, EAL_OPEN_READWRITE); if (fd < 0) { - rte_errno = errno; goto fail; } /* lock the file, to let others know we're using it */ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; + if (eal_file_lock(fd, EAL_FLOCK_SHARED, EAL_FLOCK_RETURN)) goto fail; - } if (resize_and_map(fd, data, mmap_len)) goto fail; @@ -1025,7 +1018,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr) * has been detached by all other processes */ fd = tmp->fd; - if (flock(fd, LOCK_EX | LOCK_NB)) { + if (eal_file_lock(fd, EAL_FLOCK_EXCLUSIVE, EAL_FLOCK_RETURN)) { RTE_LOG(DEBUG, EAL, "Cannot destroy fbarray - another process is using it\n"); rte_errno = EBUSY; ret = -1; @@ -1042,7 +1035,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr) * we're still holding an exclusive lock, so drop it to * shared. */ - flock(fd, LOCK_SH | LOCK_NB); + eal_file_lock(fd, EAL_FLOCK_SHARED, EAL_FLOCK_RETURN); ret = -1; goto out; diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 869ce183a..6733a2321 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -420,4 +420,77 @@ eal_malloc_no_trace(const char *type, size_t size, unsigned int align); void eal_free_no_trace(void *addr); +/** Options for eal_file_open(). */ +enum eal_open_flags { + /** Open file for reading. */ + EAL_OPEN_READONLY = 0x00, + /** Open file for reading and writing. */ + EAL_OPEN_READWRITE = 0x02, + /** + * Create the file if it doesn't exist. + * New files are only accessible to the owner (0600 equivalent). + */ + EAL_OPEN_CREATE = 0x04 +}; + +/** + * Open or create a file. + * + * @param path + * Path to the file. + * @param flags + * A combination of eal_open_flags controlling operation and FD behavior. + * @return + * Open file descriptor on success, (-1) on failure and rte_errno is set. + */ +int +eal_file_open(const char *path, int flags); + +/** File locking operation. */ +enum eal_flock_op { + EAL_FLOCK_SHARED, /**< Acquire a shared lock. */ + EAL_FLOCK_EXCLUSIVE, /**< Acquire an exclusive lock. */ + EAL_FLOCK_UNLOCK /**< Release a previously taken lock. */ +}; + +/** Behavior on file locking conflict. */ +enum eal_flock_mode { + EAL_FLOCK_WAIT, /**< Wait until the file gets unlocked to lock it. */ + EAL_FLOCK_RETURN /**< Return immediately if the file is locked. */ +}; + +/** + * Lock or unlock the file. 
+ * + * On failure @code rte_errno @endcode is set to the error code + * specified by POSIX flock(3) description. + * + * @param fd + * Opened file descriptor. + * @param op + * Operation to perform. + * @param mode + * Behavior on conflict. + * @return + * 0 on success, (-1) on failure. + */ +int +eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode); + +/** + * Truncate or extend the file to the specified size. + * + * On failure @code rte_errno @endcode is set to the error code + * specified by POSIX ftruncate(3) description. + * + * @param fd + * Opened file descriptor. + * @param size + * Desired file size. + * @return + * 0 on success, (-1) on failure. + */ +int +eal_file_truncate(int fd, ssize_t size); + #endif /* _EAL_PRIVATE_H_ */ diff --git a/lib/librte_eal/freebsd/Makefile b/lib/librte_eal/freebsd/Makefile index af95386d4..0f8741d96 100644 --- a/lib/librte_eal/freebsd/Makefile +++ b/lib/librte_eal/freebsd/Makefile @@ -7,6 +7,7 @@ LIB = librte_eal.a ARCH_DIR ?= $(RTE_ARCH) VPATH += $(RTE_SDK)/lib/librte_eal/$(ARCH_DIR) +VPATH += $(RTE_SDK)/lib/librte_eal/unix VPATH += $(RTE_SDK)/lib/librte_eal/common CFLAGS += -I$(SRCDIR)/include @@ -74,6 +75,9 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_service.c SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_random.c SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_reciprocal.c +# from unix dir +SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += eal_file.c + # from arch dir SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_cpuflags.c SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_hypervisor.c diff --git a/lib/librte_eal/linux/Makefile b/lib/librte_eal/linux/Makefile index 48cc34844..331489f99 100644 --- a/lib/librte_eal/linux/Makefile +++ b/lib/librte_eal/linux/Makefile @@ -7,6 +7,7 @@ LIB = librte_eal.a ARCH_DIR ?= $(RTE_ARCH) VPATH += $(RTE_SDK)/lib/librte_eal/$(ARCH_DIR) +VPATH += $(RTE_SDK)/lib/librte_eal/unix VPATH += $(RTE_SDK)/lib/librte_eal/common CFLAGS += -I$(SRCDIR)/include @@ -81,6 +82,9 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_service.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_random.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_reciprocal.c +# from unix dir +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_file.c + # from arch dir SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_cpuflags.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_hypervisor.c diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build index e301f4558..8d492897d 100644 --- a/lib/librte_eal/meson.build +++ b/lib/librte_eal/meson.build @@ -6,6 +6,10 @@ subdir('include') subdir('common') +if not is_windows + subdir('unix') +endif + dpdk_conf.set('RTE_EXEC_ENV_' + exec_env.to_upper(), 1) subdir(exec_env) diff --git a/lib/librte_eal/unix/eal_file.c b/lib/librte_eal/unix/eal_file.c new file mode 100644 index 000000000..1b26475ba --- /dev/null +++ b/lib/librte_eal/unix/eal_file.c @@ -0,0 +1,80 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Dmitry Kozlyuk + */ + +#include +#include +#include + +#include + +#include "eal_private.h" + +int +eal_file_open(const char *path, int flags) +{ + static const int MODE_MASK = EAL_OPEN_READONLY | EAL_OPEN_READWRITE; + + int ret, sys_flags; + + switch (flags & MODE_MASK) { + case EAL_OPEN_READONLY: + sys_flags = O_RDONLY; + break; + case EAL_OPEN_READWRITE: + sys_flags = O_RDWR; + break; + default: + rte_errno = ENOTSUP; + return -1; + } + + if (flags & EAL_OPEN_CREATE) + sys_flags |= O_CREAT; + + ret = open(path, sys_flags, 0600); + if (ret < 0) + rte_errno = errno; + + return ret; +} + +int +eal_file_truncate(int fd, ssize_t size) +{ + 
int ret; + + ret = ftruncate(fd, size); + if (ret) + rte_errno = errno; + + return ret; +} + +int +eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode) +{ + int sys_flags = 0; + int ret; + + if (mode == EAL_FLOCK_RETURN) + sys_flags |= LOCK_NB; + + switch (op) { + case EAL_FLOCK_EXCLUSIVE: + sys_flags |= LOCK_EX; + break; + case EAL_FLOCK_SHARED: + sys_flags |= LOCK_SH; + break; + case EAL_FLOCK_UNLOCK: + sys_flags |= LOCK_UN; + break; + } + + ret = flock(fd, sys_flags); + if (ret) + rte_errno = errno; + + return ret; +} diff --git a/lib/librte_eal/unix/meson.build b/lib/librte_eal/unix/meson.build new file mode 100644 index 000000000..21029ba1a --- /dev/null +++ b/lib/librte_eal/unix/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2020 Dmitry Kozlyuk + +sources += files( + 'eal_file.c', +) From patchwork Mon Jun 15 00:43:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71522 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2191BA0093; Mon, 15 Jun 2020 02:44:33 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 91BD15F2F; Mon, 15 Jun 2020 02:44:20 +0200 (CEST) Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67]) by dpdk.org (Postfix) with ESMTP id 9D9404C8B for ; Mon, 15 Jun 2020 02:44:06 +0200 (CEST) Received: by mail-lf1-f67.google.com with SMTP id o4so850502lfi.7 for ; Sun, 14 Jun 2020 17:44:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UH/O+R/3lSv02BDlDPzW9n2trmzyqmnEngAr5aM89I8=; b=pmGFRtwIUnzmPUJg8VLXSD8LEpzCZQy0NFCa11DgU7uc4ohCEHhwk07SR0clQJ5mQh 5IGDno8SknFVLHfdTR/xJ7y3YwJuomd8BdGVjfk2yw+7V2deE+7QIc3DUbqYExvTnHO7 9uvc4piplL6Lgi2QJYs2qK7VjbieZpuP4t4CuyE/+etFZRsh41RPJx11GSpdD6yOib1Y SmEoSgI+EeFPMTvthyJ9a8lDbno52O8XeRm+h1QT5AVPffzz15XVeb0pxsZ5qL+CG9jn XwoOi2y2a5Rv6sduNzBZFmdOacpP9zvd1buQUeOAWSimJuOI8iJqQ74eA/ohMJTXUl1n vLBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=UH/O+R/3lSv02BDlDPzW9n2trmzyqmnEngAr5aM89I8=; b=CglgwyhOBa+0oEVcfisGblSL45aqOs0Ridw2euSTdqgkuusQWuNtLguxTVkUwr3icv /jtlgcgEm2XHu9j3OB/SC7ZL/NJW7CvflFk/z0NGT4n3sgVtCdgd5c4zfN3nLJwwNJXb Iba8uT+DDwzJffJ/ONwO3EBeO88+5ByprYIgJXdY2HM8kiPjxhZGkHyXCgNW3x05cdMV x92Nx/unR3fGKDA9JzJ63KGE9NXki0wcK2tpBqyKc5wcQ7YN2qFTvua32jbnA1JsoMSu m1mF1xlx/vdvmJXD1eqGbdO1KyMoqThakhxFD5Xi7mXZQocNCXnVKjw8kTEkNFQzv0dE AM8Q== X-Gm-Message-State: AOAM530Z730B8qr8/SE83kSZ1DonX3/dZ1qhfcD3yelb6+lQ5me2uap6 S+Hae+uQHOZx8C3YNT/SDsPV8kBAT8itqQ== X-Google-Smtp-Source: ABdhPJzVjwNFuqsTKwKzla2/S1bqvRKLPoAmOL5fXPbyej5JztWKCYYBkuUaS1If3BpShxh6By4atQ== X-Received: by 2002:ac2:5314:: with SMTP id c20mr12388918lfh.75.1592181845746; Sun, 14 Jun 2020 17:44:05 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:05 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Anatoly Burakov , Bruce Richardson , Ray Kinsella , Neil Horman Date: Mon, 15 Jun 2020 03:43:45 +0300 Message-Id: <20200615004354.14380-4-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 03/12] eal: introduce memory management wrappers X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Introduce OS-independent wrappers for memory management operations used across DPDK and specifically in common code of EAL: * rte_mem_map() * rte_mem_unmap() * rte_mem_page_size() * rte_mem_lock() Windows uses different APIs for memory mapping and reservation, while Unices reserve memory by mapping it. Introduce EAL private functions to support memory reservation in common code: * eal_mem_reserve() * eal_mem_free() * eal_mem_set_dump() Wrappers follow POSIX semantics limited to DPDK tasks, but their signatures deliberately differ from POSIX ones to be more safe and expressive. New symbols are internal. Being thin wrappers, they require no special maintenance. Signed-off-by: Dmitry Kozlyuk --- Not adding rte_eal_paging.h to Doxygen index because, to my understanding, it only contains public API, and it was decided to keep rte_eal_paging.h functions private. 
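An illustration, not part of the patch: a minimal sketch of how EAL-internal code is expected to call the new wrappers instead of using mmap()/munmap() directly. The helper name and error handling are hypothetical; only the wrapper names, flags and signatures come from this series, and the header is internal, so this only compiles inside EAL.

	#include <rte_common.h>
	#include <rte_eal_paging.h>

	/* Hypothetical helper: map a shared region backed by an open file
	 * and lock it in physical memory.
	 */
	static void *
	map_locked_region(int fd, size_t size)
	{
		void *addr;

		/* round the size up to a full page (RTE_ALIGN_CEIL is from rte_common.h) */
		size = RTE_ALIGN_CEIL(size, rte_mem_page_size());
		addr = rte_mem_map(NULL, size, RTE_PROT_READ | RTE_PROT_WRITE,
				RTE_MAP_SHARED, fd, 0);
		if (addr == NULL) /* rte_errno holds the OS error */
			return NULL;
		if (rte_mem_lock(addr, size) != 0) {
			rte_mem_unmap(addr, size);
			return NULL;
		}
		return addr;
	}
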
lib/librte_eal/common/eal_common_fbarray.c | 40 +++--- lib/librte_eal/common/eal_common_memory.c | 61 ++++----- lib/librte_eal/common/eal_private.h | 78 ++++++++++- lib/librte_eal/freebsd/Makefile | 1 + lib/librte_eal/include/rte_eal_paging.h | 98 +++++++++++++ lib/librte_eal/linux/Makefile | 1 + lib/librte_eal/linux/eal_memalloc.c | 5 +- lib/librte_eal/rte_eal_version.map | 9 ++ lib/librte_eal/unix/eal_unix_memory.c | 152 +++++++++++++++++++++ lib/librte_eal/unix/meson.build | 1 + 10 files changed, 381 insertions(+), 65 deletions(-) create mode 100644 lib/librte_eal/include/rte_eal_paging.h create mode 100644 lib/librte_eal/unix/eal_unix_memory.c diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index c52ddb967..fd0292a64 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -5,15 +5,16 @@ #include #include #include -#include #include #include #include #include #include -#include +#include #include +#include +#include #include #include @@ -90,12 +91,9 @@ resize_and_map(int fd, void *addr, size_t len) return -1; } - map_addr = mmap(addr, len, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_FIXED, fd, 0); + map_addr = rte_mem_map(addr, len, RTE_PROT_READ | RTE_PROT_WRITE, + RTE_MAP_SHARED | RTE_MAP_FORCE_ADDRESS, fd, 0); if (map_addr != addr) { - RTE_LOG(ERR, EAL, "mmap() failed: %s\n", strerror(errno)); - /* pass errno up the chain */ - rte_errno = errno; return -1; } return 0; @@ -733,7 +731,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, return -1; } - page_sz = sysconf(_SC_PAGESIZE); + page_sz = rte_mem_page_size(); if (page_sz == (size_t)-1) { free(ma); return -1; @@ -754,11 +752,13 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, if (internal_config.no_shconf) { /* remap virtual area as writable */ - void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE, - MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, fd, 0); - if (new_data == MAP_FAILED) { + static const int flags = RTE_MAP_FORCE_ADDRESS | + RTE_MAP_PRIVATE | RTE_MAP_ANONYMOUS; + void *new_data = rte_mem_map(data, mmap_len, + RTE_PROT_READ | RTE_PROT_WRITE, flags, fd, 0); + if (new_data == NULL) { RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n", - __func__, strerror(errno)); + __func__, rte_strerror(rte_errno)); goto fail; } } else { @@ -820,7 +820,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, return 0; fail: if (data) - munmap(data, mmap_len); + rte_mem_unmap(data, mmap_len); if (fd >= 0) close(fd); free(ma); @@ -858,7 +858,7 @@ rte_fbarray_attach(struct rte_fbarray *arr) return -1; } - page_sz = sysconf(_SC_PAGESIZE); + page_sz = rte_mem_page_size(); if (page_sz == (size_t)-1) { free(ma); return -1; @@ -909,7 +909,7 @@ rte_fbarray_attach(struct rte_fbarray *arr) return 0; fail: if (data) - munmap(data, mmap_len); + rte_mem_unmap(data, mmap_len); if (fd >= 0) close(fd); free(ma); @@ -937,8 +937,7 @@ rte_fbarray_detach(struct rte_fbarray *arr) * really do anything about it, things will blow up either way. 
*/ - size_t page_sz = sysconf(_SC_PAGESIZE); - + size_t page_sz = rte_mem_page_size(); if (page_sz == (size_t)-1) return -1; @@ -957,7 +956,7 @@ rte_fbarray_detach(struct rte_fbarray *arr) goto out; } - munmap(arr->data, mmap_len); + rte_mem_unmap(arr->data, mmap_len); /* area is unmapped, close fd and remove the tailq entry */ if (tmp->fd >= 0) @@ -992,8 +991,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr) * really do anything about it, things will blow up either way. */ - size_t page_sz = sysconf(_SC_PAGESIZE); - + size_t page_sz = rte_mem_page_size(); if (page_sz == (size_t)-1) return -1; @@ -1042,7 +1040,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr) } close(fd); } - munmap(arr->data, mmap_len); + rte_mem_unmap(arr->data, mmap_len); /* area is unmapped, remove the tailq entry */ TAILQ_REMOVE(&mem_area_tailq, tmp, next); diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4c897a13f..aa377990f 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -11,13 +11,13 @@ #include #include #include -#include #include #include #include #include #include +#include #include #include @@ -40,18 +40,10 @@ static void *next_baseaddr; static uint64_t system_page_sz; -#ifdef RTE_EXEC_ENV_LINUX -#define RTE_DONTDUMP MADV_DONTDUMP -#elif defined RTE_EXEC_ENV_FREEBSD -#define RTE_DONTDUMP MADV_NOCORE -#else -#error "madvise doesn't support this OS" -#endif - #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5 void * eal_get_virtual_area(void *requested_addr, size_t *size, - size_t page_sz, int flags, int mmap_flags) + size_t page_sz, int flags, int reserve_flags) { bool addr_is_hint, allow_shrink, unmap, no_align; uint64_t map_sz; @@ -59,9 +51,7 @@ eal_get_virtual_area(void *requested_addr, size_t *size, uint8_t try = 0; if (system_page_sz == 0) - system_page_sz = sysconf(_SC_PAGESIZE); - - mmap_flags |= MAP_PRIVATE | MAP_ANONYMOUS; + system_page_sz = rte_mem_page_size(); RTE_LOG(DEBUG, EAL, "Ask a virtual area of 0x%zx bytes\n", *size); @@ -105,24 +95,24 @@ eal_get_virtual_area(void *requested_addr, size_t *size, return NULL; } - mapped_addr = mmap(requested_addr, (size_t)map_sz, PROT_NONE, - mmap_flags, -1, 0); - if (mapped_addr == MAP_FAILED && allow_shrink) + mapped_addr = eal_mem_reserve( + requested_addr, (size_t)map_sz, reserve_flags); + if ((mapped_addr == NULL) && allow_shrink) *size -= page_sz; - if (mapped_addr != MAP_FAILED && addr_is_hint && - mapped_addr != requested_addr) { + if ((mapped_addr != NULL) && addr_is_hint && + (mapped_addr != requested_addr)) { try++; next_baseaddr = RTE_PTR_ADD(next_baseaddr, page_sz); if (try <= MAX_MMAP_WITH_DEFINED_ADDR_TRIES) { /* hint was not used. Try with another offset */ - munmap(mapped_addr, map_sz); - mapped_addr = MAP_FAILED; + eal_mem_free(mapped_addr, map_sz); + mapped_addr = NULL; requested_addr = next_baseaddr; } } } while ((allow_shrink || addr_is_hint) && - mapped_addr == MAP_FAILED && *size > 0); + (mapped_addr == NULL) && (*size > 0)); /* align resulting address - if map failed, we will ignore the value * anyway, so no need to add additional checks. 
@@ -132,20 +122,17 @@ eal_get_virtual_area(void *requested_addr, size_t *size, if (*size == 0) { RTE_LOG(ERR, EAL, "Cannot get a virtual area of any size: %s\n", - strerror(errno)); - rte_errno = errno; + rte_strerror(rte_errno)); return NULL; - } else if (mapped_addr == MAP_FAILED) { + } else if (mapped_addr == NULL) { RTE_LOG(ERR, EAL, "Cannot get a virtual area: %s\n", - strerror(errno)); - /* pass errno up the call chain */ - rte_errno = errno; + rte_strerror(rte_errno)); return NULL; } else if (requested_addr != NULL && !addr_is_hint && aligned_addr != requested_addr) { RTE_LOG(ERR, EAL, "Cannot get a virtual area at requested address: %p (got %p)\n", requested_addr, aligned_addr); - munmap(mapped_addr, map_sz); + eal_mem_free(mapped_addr, map_sz); rte_errno = EADDRNOTAVAIL; return NULL; } else if (requested_addr != NULL && addr_is_hint && @@ -161,7 +148,7 @@ eal_get_virtual_area(void *requested_addr, size_t *size, aligned_addr, *size); if (unmap) { - munmap(mapped_addr, map_sz); + eal_mem_free(mapped_addr, map_sz); } else if (!no_align) { void *map_end, *aligned_end; size_t before_len, after_len; @@ -179,19 +166,17 @@ eal_get_virtual_area(void *requested_addr, size_t *size, /* unmap space before aligned mmap address */ before_len = RTE_PTR_DIFF(aligned_addr, mapped_addr); if (before_len > 0) - munmap(mapped_addr, before_len); + eal_mem_free(mapped_addr, before_len); /* unmap space after aligned end mmap address */ after_len = RTE_PTR_DIFF(map_end, aligned_end); if (after_len > 0) - munmap(aligned_end, after_len); + eal_mem_free(aligned_end, after_len); } if (!unmap) { /* Exclude these pages from a core dump. */ - if (madvise(aligned_addr, *size, RTE_DONTDUMP) != 0) - RTE_LOG(DEBUG, EAL, "madvise failed: %s\n", - strerror(errno)); + eal_mem_set_dump(aligned_addr, *size, false); } return aligned_addr; @@ -547,10 +532,10 @@ rte_eal_memdevice_init(void) int rte_mem_lock_page(const void *virt) { - unsigned long virtual = (unsigned long)virt; - int page_size = getpagesize(); - unsigned long aligned = (virtual & ~(page_size - 1)); - return mlock((void *)aligned, page_size); + uintptr_t virtual = (uintptr_t)virt; + size_t page_size = rte_mem_page_size(); + uintptr_t aligned = RTE_PTR_ALIGN_FLOOR(virtual, page_size); + return rte_mem_lock((void *)aligned, page_size); } int diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 6733a2321..1696345c2 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -11,6 +11,7 @@ #include #include +#include /** * Structure storing internal configuration (per-lcore) @@ -202,6 +203,24 @@ int rte_eal_alarm_init(void); */ int rte_eal_check_module(const char *module_name); +/** + * Memory reservation flags. + */ +enum eal_mem_reserve_flags { + /** + * Reserve hugepages. May be unsupported by some platforms. + */ + EAL_RESERVE_HUGEPAGES = 1 << 0, + /** + * Force reserving memory at the requested address. + * This can be a destructive action depending on the implementation. + * + * @see RTE_MAP_FORCE_ADDRESS for description of possible consequences + * (although implementations are not required to use it). + */ + EAL_RESERVE_FORCE_ADDRESS = 1 << 1 +}; + /** * Get virtual area of specified size from the OS. * @@ -215,8 +234,8 @@ int rte_eal_check_module(const char *module_name); * Page size on which to align requested virtual area. * @param flags * EAL_VIRTUAL_AREA_* flags. - * @param mmap_flags - * Extra flags passed directly to mmap(). 
+ * @param reserve_flags + * Extra flags passed directly to eal_mem_reserve(). * * @return * Virtual area address if successful. @@ -233,7 +252,7 @@ int rte_eal_check_module(const char *module_name); /**< immediately unmap reserved virtual area. */ void * eal_get_virtual_area(void *requested_addr, size_t *size, - size_t page_sz, int flags, int mmap_flags); + size_t page_sz, int flags, int reserve_flags); /** * Get cpu core_id. @@ -493,4 +512,57 @@ eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode); int eal_file_truncate(int fd, ssize_t size); +/** + * Reserve a region of virtual memory. + * + * Use eal_mem_free() to free reserved memory. + * + * @param requested_addr + * A desired reservation address which must be page-aligned. + * The system might not respect it. + * NULL means the address will be chosen by the system. + * @param size + * Reservation size. Must be a multiple of system page size. + * @param flags + * Reservation options, a combination of eal_mem_reserve_flags. + * @returns + * Starting address of the reserved area on success, NULL on failure. + * Callers must not access this memory until remapping it. + */ +void * +eal_mem_reserve(void *requested_addr, size_t size, int flags); + +/** + * Free memory obtained by eal_mem_reserve() or eal_mem_alloc(). + * + * If *virt* and *size* describe a part of the reserved region, + * only this part of the region is freed (accurately up to the system + * page size). If *virt* points to allocated memory, *size* must match + * the one specified on allocation. The behavior is undefined + * if the memory pointed by *virt* is obtained from another source + * than listed above. + * + * @param virt + * A virtual address in a region previously reserved. + * @param size + * Number of bytes to unreserve. + */ +void +eal_mem_free(void *virt, size_t size); + +/** + * Configure memory region inclusion into dumps. + * + * @param virt + * Starting address of the region. + * @param size + * Size of the region. + * @param dump + * True to include memory into dumps, false to exclude. + * @return + * 0 on success, (-1) on failure and rte_errno is set. + */ +int +eal_mem_set_dump(void *virt, size_t size, bool dump); + #endif /* _EAL_PRIVATE_H_ */ diff --git a/lib/librte_eal/freebsd/Makefile b/lib/librte_eal/freebsd/Makefile index 0f8741d96..2374ba0b7 100644 --- a/lib/librte_eal/freebsd/Makefile +++ b/lib/librte_eal/freebsd/Makefile @@ -77,6 +77,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_reciprocal.c # from unix dir SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += eal_file.c +SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += eal_unix_memory.c # from arch dir SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_cpuflags.c diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h new file mode 100644 index 000000000..ed98e70e9 --- /dev/null +++ b/lib/librte_eal/include/rte_eal_paging.h @@ -0,0 +1,98 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Dmitry Kozlyuk + */ + +#include + +#include + +/** + * @file + * @internal + * + * Wrappers for OS facilities related to memory paging, used across DPDK. + */ + +/** Memory protection flags. */ +enum rte_mem_prot { + RTE_PROT_READ = 1 << 0, /**< Read access. */ + RTE_PROT_WRITE = 1 << 1, /**< Write access. */ + RTE_PROT_EXECUTE = 1 << 2 /**< Code execution. */ +}; + +/** Additional flags for memory mapping. */ +enum rte_map_flags { + /** Changes to the mapped memory are visible to other processes. 
*/ + RTE_MAP_SHARED = 1 << 0, + /** Mapping is not backed by a regular file. */ + RTE_MAP_ANONYMOUS = 1 << 1, + /** Copy-on-write mapping, changes are invisible to other processes. */ + RTE_MAP_PRIVATE = 1 << 2, + /** + * Force mapping to the requested address. This flag should be used + * with caution, because to fulfill the request implementation + * may remove all other mappings in the requested region. However, + * it is not required to do so, thus mapping with this flag may fail. + */ + RTE_MAP_FORCE_ADDRESS = 1 << 3 +}; + +/** + * Map a portion of an opened file or the page file into memory. + * + * This function is similar to POSIX mmap(3) with common MAP_ANONYMOUS + * extension, except for the return value. + * + * @param requested_addr + * Desired virtual address for mapping. Can be NULL to let OS choose. + * @param size + * Size of the mapping in bytes. + * @param prot + * Protection flags, a combination of rte_mem_prot values. + * @param flags + * Additional mapping flags, a combination of rte_map_flags. + * @param fd + * Mapped file descriptor. Can be negative for anonymous mapping. + * @param offset + * Offset of the mapped region in fd. Must be 0 for anonymous mappings. + * @return + * Mapped address or NULL on failure and rte_errno is set to OS error. + */ +__rte_internal +void * +rte_mem_map(void *requested_addr, size_t size, int prot, int flags, + int fd, size_t offset); + +/** + * OS-independent implementation of POSIX munmap(3). + */ +__rte_internal +int +rte_mem_unmap(void *virt, size_t size); + +/** + * Get system page size. This function never fails. + * + * @return + * Page size in bytes. + */ +__rte_internal +size_t +rte_mem_page_size(void); + +/** + * Lock in physical memory all pages crossed by the address region. + * + * @param virt + * Base virtual address of the region. + * @param size + * Size of the region. + * @return + * 0 on success, negative on error. + * + * @see rte_mem_page_size() to retrieve the page size. + * @see rte_mem_lock_page() to lock an entire single page. 
+ */ +__rte_internal +int +rte_mem_lock(const void *virt, size_t size); diff --git a/lib/librte_eal/linux/Makefile b/lib/librte_eal/linux/Makefile index 331489f99..8febf2212 100644 --- a/lib/librte_eal/linux/Makefile +++ b/lib/librte_eal/linux/Makefile @@ -84,6 +84,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_reciprocal.c # from unix dir SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_file.c +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_unix_memory.c # from arch dir SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_cpuflags.c diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c index 2c717f8bd..bf29b83c6 100644 --- a/lib/librte_eal/linux/eal_memalloc.c +++ b/lib/librte_eal/linux/eal_memalloc.c @@ -630,7 +630,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, mapped: munmap(addr, alloc_sz); unmapped: - flags = MAP_FIXED; + flags = EAL_RESERVE_FORCE_ADDRESS; new_addr = eal_get_virtual_area(addr, &alloc_sz, alloc_sz, 0, flags); if (new_addr != addr) { if (new_addr != NULL) @@ -687,8 +687,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi, return -1; } - if (madvise(ms->addr, ms->len, MADV_DONTDUMP) != 0) - RTE_LOG(DEBUG, EAL, "madvise failed: %s\n", strerror(errno)); + eal_mem_set_dump(ms->addr, ms->len, false); exit_early = false; diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index d8038749a..196eef5af 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -387,3 +387,12 @@ EXPERIMENTAL { rte_trace_regexp; rte_trace_save; }; + +INTERNAL { + global: + + rte_mem_lock; + rte_mem_map; + rte_mem_page_size; + rte_mem_unmap; +}; diff --git a/lib/librte_eal/unix/eal_unix_memory.c b/lib/librte_eal/unix/eal_unix_memory.c new file mode 100644 index 000000000..ec7156df9 --- /dev/null +++ b/lib/librte_eal/unix/eal_unix_memory.c @@ -0,0 +1,152 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Dmitry Kozlyuk + */ + +#include +#include +#include + +#include +#include +#include + +#include "eal_private.h" + +#ifdef RTE_EXEC_ENV_LINUX +#define EAL_DONTDUMP MADV_DONTDUMP +#define EAL_DODUMP MADV_DODUMP +#elif defined RTE_EXEC_ENV_FREEBSD +#define EAL_DONTDUMP MADV_NOCORE +#define EAL_DODUMP MADV_CORE +#else +#error "madvise doesn't support this OS" +#endif + +static void * +mem_map(void *requested_addr, size_t size, int prot, int flags, + int fd, size_t offset) +{ + void *virt = mmap(requested_addr, size, prot, flags, fd, offset); + if (virt == MAP_FAILED) { + RTE_LOG(DEBUG, EAL, + "Cannot mmap(%p, 0x%zx, 0x%x, 0x%x, %d, 0x%zx): %s\n", + requested_addr, size, prot, flags, fd, offset, + strerror(errno)); + rte_errno = errno; + return NULL; + } + return virt; +} + +static int +mem_unmap(void *virt, size_t size) +{ + int ret = munmap(virt, size); + if (ret < 0) { + RTE_LOG(DEBUG, EAL, "Cannot munmap(%p, 0x%zx): %s\n", + virt, size, strerror(errno)); + rte_errno = errno; + } + return ret; +} + +void * +eal_mem_reserve(void *requested_addr, size_t size, int flags) +{ + int sys_flags = MAP_PRIVATE | MAP_ANONYMOUS; + + if (flags & EAL_RESERVE_HUGEPAGES) { +#ifdef MAP_HUGETLB + sys_flags |= MAP_HUGETLB; +#else + rte_errno = ENOTSUP; + return NULL; +#endif + } + + if (flags & EAL_RESERVE_FORCE_ADDRESS) + sys_flags |= MAP_FIXED; + + return mem_map(requested_addr, size, PROT_NONE, sys_flags, -1, 0); +} + +void +eal_mem_free(void *virt, size_t size) +{ + mem_unmap(virt, size); +} + +int +eal_mem_set_dump(void *virt, size_t size, bool dump) +{ + int flags = dump ? 
EAL_DODUMP : EAL_DONTDUMP; + int ret = madvise(virt, size, flags); + if (ret) { + RTE_LOG(DEBUG, EAL, "madvise(%p, %#zx, %d) failed: %s\n", + virt, size, flags, strerror(rte_errno)); + rte_errno = errno; + } + return ret; +} + +static int +mem_rte_to_sys_prot(int prot) +{ + int sys_prot = PROT_NONE; + + if (prot & RTE_PROT_READ) + sys_prot |= PROT_READ; + if (prot & RTE_PROT_WRITE) + sys_prot |= PROT_WRITE; + if (prot & RTE_PROT_EXECUTE) + sys_prot |= PROT_EXEC; + + return sys_prot; +} + +void * +rte_mem_map(void *requested_addr, size_t size, int prot, int flags, + int fd, size_t offset) +{ + int sys_flags = 0; + int sys_prot; + + sys_prot = mem_rte_to_sys_prot(prot); + + if (flags & RTE_MAP_SHARED) + sys_flags |= MAP_SHARED; + if (flags & RTE_MAP_ANONYMOUS) + sys_flags |= MAP_ANONYMOUS; + if (flags & RTE_MAP_PRIVATE) + sys_flags |= MAP_PRIVATE; + if (flags & RTE_MAP_FORCE_ADDRESS) + sys_flags |= MAP_FIXED; + + return mem_map(requested_addr, size, sys_prot, sys_flags, fd, offset); +} + +int +rte_mem_unmap(void *virt, size_t size) +{ + return mem_unmap(virt, size); +} + +size_t +rte_mem_page_size(void) +{ + static size_t page_size; + + if (!page_size) + page_size = sysconf(_SC_PAGESIZE); + + return page_size; +} + +int +rte_mem_lock(const void *virt, size_t size) +{ + int ret = mlock(virt, size); + if (ret) + rte_errno = errno; + return ret; +} diff --git a/lib/librte_eal/unix/meson.build b/lib/librte_eal/unix/meson.build index 21029ba1a..e733910a1 100644 --- a/lib/librte_eal/unix/meson.build +++ b/lib/librte_eal/unix/meson.build @@ -3,4 +3,5 @@ sources += files( 'eal_file.c', + 'eal_unix_memory.c', ) From patchwork Mon Jun 15 00:43:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71523 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 652C2A04A3; Mon, 15 Jun 2020 02:44:43 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DC381199BC; Mon, 15 Jun 2020 02:44:21 +0200 (CEST) Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196]) by dpdk.org (Postfix) with ESMTP id E0BEA4C8B for ; Mon, 15 Jun 2020 02:44:07 +0200 (CEST) Received: by mail-lj1-f196.google.com with SMTP id z9so17061984ljh.13 for ; Sun, 14 Jun 2020 17:44:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=09whGnkWMFHKebG79Ra0NbjrQLN7e0FwRWNZIVLWJKU=; b=uHadLp5VQd44Lrvu9QSsvrAB5MKwgfC1VmhuM7EvuMK6UCmNV6fXAbg3jlb20J1ylX LnDXgOeJV9DZIIrAtbwCeOQlyvzXRXumklY/sLQygr2zlxlVbXtXAJA8/B4rSAqUzFAc IUjQl7QF8BZqgZIiXxOo9ZHr1JFITM76cC1AkjjVCG8eH/cpeU3B0jbkZi0ZoqB/epQ1 yWjbzAjAirJ59rui2id43aJ6kynHsDl1G1LN0z1lm5zDhHnpHC/2FFQf4/feVoS04PYq Ry49vOX5pZ5bhmwSPtNMTxFfhAA4IkHFKRUKKxkQIbIJPRiUw86qbEGaurrgSVcfnOvT UAxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=09whGnkWMFHKebG79Ra0NbjrQLN7e0FwRWNZIVLWJKU=; b=JsDhmCFKN9SvstAZSpOSyKVUY9hUtgTg6ho4mWsj9OedoyCkejLPvI3zIixLU84mkI GjzgYdDwfaYHdgmXAK5DfV1jr0iurjzLFVM/4ZurXt+MC3QaVdmRXQOF/XLAgGlC2gSE 
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Dmitry Malloy, Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Dmitry Kozlyuk, Anatoly Burakov, Bruce Richardson
Date: Mon, 15 Jun 2020 03:43:46 +0300
Message-Id: <20200615004354.14380-5-dmitry.kozliuk@gmail.com>
Subject: [dpdk-dev] [PATCH v9 04/12] eal/mem: extract common code for memseg list initialization

All supported OS create memory segment lists (MSL) and reserve VA space for them in a nearly identical way. Move common code into EAL private functions to reduce duplication.
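An illustration, not part of the patch: with the new helpers, the "no hugepages" setup that Linux and FreeBSD previously open-coded reduces to roughly the sequence below. The calls and the "nohugemem" name are taken from the diff; msl, mcfg and internal_config are the existing EAL locals/globals, and the OS-specific mmap() that backs the reserved area with real memory is omitted.

	struct rte_memseg_list *msl = &mcfg->memsegs[0];
	uint64_t page_sz = RTE_PGSIZE_4K;
	int n_segs = internal_config.memory / page_sz;

	/* create the fbarray that tracks the segments */
	if (eal_memseg_list_init_named(msl, "nohugemem", page_sz,
			n_segs, 0, true))
		return -1;
	/* reserve VA space for the whole list */
	if (eal_memseg_list_alloc(msl, 0))
		return -1;
	/* ... back msl->base_va with real memory (OS-specific mmap) ... */
	/* mark every page-sized segment in the list as used */
	eal_memseg_list_populate(msl, msl->base_va, n_segs);
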
Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/common/eal_common_memory.c | 96 ++++++++++++++++++ lib/librte_eal/common/eal_private.h | 62 ++++++++++++ lib/librte_eal/freebsd/eal_memory.c | 98 ++++-------------- lib/librte_eal/linux/eal_memory.c | 118 +++++----------------- 4 files changed, 204 insertions(+), 170 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index aa377990f..1414460c7 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -25,6 +25,7 @@ #include "eal_private.h" #include "eal_internal_cfg.h" #include "eal_memcfg.h" +#include "eal_options.h" #include "malloc_heap.h" /* @@ -182,6 +183,101 @@ eal_get_virtual_area(void *requested_addr, size_t *size, return aligned_addr; } +int +eal_memseg_list_init_named(struct rte_memseg_list *msl, const char *name, + uint64_t page_sz, int n_segs, int socket_id, bool heap) +{ + if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, + sizeof(struct rte_memseg))) { + RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", + rte_strerror(rte_errno)); + return -1; + } + + msl->page_sz = page_sz; + msl->socket_id = socket_id; + msl->base_va = NULL; + msl->heap = heap; + + RTE_LOG(DEBUG, EAL, + "Memseg list allocated at socket %i, page size 0x%"PRIx64"kB\n", + socket_id, (size_t)page_sz >> 10); + + return 0; +} + +int +eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, + int n_segs, int socket_id, int type_msl_idx, bool heap) +{ + char name[RTE_FBARRAY_NAME_LEN]; + + snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, + type_msl_idx); + + return eal_memseg_list_init_named( + msl, name, page_sz, n_segs, socket_id, heap); +} + +int +eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags) +{ + size_t page_sz, mem_sz; + void *addr; + + page_sz = msl->page_sz; + mem_sz = page_sz * msl->memseg_arr.len; + + addr = eal_get_virtual_area( + msl->base_va, &mem_sz, page_sz, 0, reserve_flags); + if (addr == NULL) { +#ifndef RTE_EXEC_ENV_WINDOWS + /* The hint would be misleading on Windows, because address + * is by default system-selected (base VA = 0). + * However, this function is called from many places, + * including common code, so don't duplicate the message. 
+ */ + if (rte_errno == EADDRNOTAVAIL) + RTE_LOG(ERR, EAL, "Cannot reserve %llu bytes at [%p] - " + "please use '--" OPT_BASE_VIRTADDR "' option\n", + (unsigned long long)mem_sz, msl->base_va); +#endif + return -1; + } + msl->base_va = addr; + msl->len = mem_sz; + + RTE_LOG(DEBUG, EAL, "VA reserved for memseg list at %p, size %zx\n", + addr, mem_sz); + + return 0; +} + +void +eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs) +{ + size_t page_sz = msl->page_sz; + int i; + + for (i = 0; i < n_segs; i++) { + struct rte_fbarray *arr = &msl->memseg_arr; + struct rte_memseg *ms = rte_fbarray_get(arr, i); + + if (rte_eal_iova_mode() == RTE_IOVA_VA) + ms->iova = (uintptr_t)addr; + else + ms->iova = RTE_BAD_IOVA; + ms->addr = addr; + ms->hugepage_sz = page_sz; + ms->socket_id = 0; + ms->len = page_sz; + + rte_fbarray_set_used(arr, i); + + addr = RTE_PTR_ADD(addr, page_sz); + } +} + static struct rte_memseg * virt2memseg(const void *addr, const struct rte_memseg_list *msl) { diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 1696345c2..75521d086 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -254,6 +254,68 @@ void * eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz, int flags, int reserve_flags); +/** + * Initialize a memory segment list and create its backing storage. + * + * @param msl + * Memory segment list to be filled. + * @param name + * Name for the backing storage. + * @param page_sz + * Size of segment pages in the MSL. + * @param n_segs + * Number of segments. + * @param socket_id + * Socket ID. Must not be SOCKET_ID_ANY. + * @param heap + * Mark MSL as pointing to a heap. + * @return + * 0 on success, (-1) on failure and rte_errno is set. + */ +int +eal_memseg_list_init_named(struct rte_memseg_list *msl, const char *name, + uint64_t page_sz, int n_segs, int socket_id, bool heap); + +/** + * Initialize memory segment list and create its backing storage + * with a name corresponding to MSL parameters. + * + * @param type_msl_idx + * Index of the MSL among other MSLs of the same socket and page size. + * + * @see eal_memseg_list_init_named for remaining parameters description. + */ +int +eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, + int n_segs, int socket_id, int type_msl_idx, bool heap); + +/** + * Reserve VA space for a memory segment list + * previously initialized with eal_memseg_list_init(). + * + * @param msl + * Initialized memory segment list with page size defined. + * @param reserve_flags + * Extra memory reservation flags. Can be 0 if unnecessary. + * @return + * 0 on success, (-1) on failure and rte_errno is set. + */ +int +eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags); + +/** + * Populate MSL, each segment is one page long. + * + * @param msl + * Initialized memory segment list with page size defined. + * @param addr + * Starting address of list segments. + * @param n_segs + * Number of segments to populate. + */ +void +eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs); + /** * Get cpu core_id. 
* diff --git a/lib/librte_eal/freebsd/eal_memory.c b/lib/librte_eal/freebsd/eal_memory.c index 5bc2da160..2eb70c2fe 100644 --- a/lib/librte_eal/freebsd/eal_memory.c +++ b/lib/librte_eal/freebsd/eal_memory.c @@ -64,55 +64,34 @@ rte_eal_hugepage_init(void) /* for debug purposes, hugetlbfs can be disabled */ if (internal_config.no_hugetlbfs) { struct rte_memseg_list *msl; - struct rte_fbarray *arr; - struct rte_memseg *ms; - uint64_t page_sz; - int n_segs, cur_seg; + uint64_t mem_sz, page_sz; + int n_segs; /* create a memseg list */ msl = &mcfg->memsegs[0]; + mem_sz = internal_config.memory; page_sz = RTE_PGSIZE_4K; - n_segs = internal_config.memory / page_sz; + n_segs = mem_sz / page_sz; - if (rte_fbarray_init(&msl->memseg_arr, "nohugemem", n_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n"); + if (eal_memseg_list_init_named( + msl, "nohugemem", page_sz, n_segs, 0, true)) { return -1; } - addr = mmap(NULL, internal_config.memory, - PROT_READ | PROT_WRITE, + addr = mmap(NULL, mem_sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (addr == MAP_FAILED) { RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__, strerror(errno)); return -1; } - msl->base_va = addr; - msl->page_sz = page_sz; - msl->len = internal_config.memory; - msl->socket_id = 0; - msl->heap = 1; - - /* populate memsegs. each memseg is 1 page long */ - for (cur_seg = 0; cur_seg < n_segs; cur_seg++) { - arr = &msl->memseg_arr; - ms = rte_fbarray_get(arr, cur_seg); - if (rte_eal_iova_mode() == RTE_IOVA_VA) - ms->iova = (uintptr_t)addr; - else - ms->iova = RTE_BAD_IOVA; - ms->addr = addr; - ms->hugepage_sz = page_sz; - ms->len = page_sz; - ms->socket_id = 0; + msl->base_va = addr; + msl->len = mem_sz; - rte_fbarray_set_used(arr, cur_seg); + eal_memseg_list_populate(msl, addr, n_segs); - addr = RTE_PTR_ADD(addr, page_sz); - } return 0; } @@ -336,64 +315,25 @@ get_mem_amount(uint64_t page_sz, uint64_t max_mem) return RTE_ALIGN(area_sz, page_sz); } -#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" static int -alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, +memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, int n_segs, int socket_id, int type_msl_idx) { - char name[RTE_FBARRAY_NAME_LEN]; - - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); - if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", - rte_strerror(rte_errno)); - return -1; - } - - msl->page_sz = page_sz; - msl->socket_id = socket_id; - msl->base_va = NULL; - - RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", - (size_t)page_sz >> 10, socket_id); - - return 0; + return eal_memseg_list_init( + msl, page_sz, n_segs, socket_id, type_msl_idx, false); } static int -alloc_va_space(struct rte_memseg_list *msl) +memseg_list_alloc(struct rte_memseg_list *msl) { - uint64_t page_sz; - size_t mem_sz; - void *addr; int flags = 0; #ifdef RTE_ARCH_PPC_64 - flags |= MAP_HUGETLB; + flags |= EAL_RESERVE_HUGEPAGES; #endif - - page_sz = msl->page_sz; - mem_sz = page_sz * msl->memseg_arr.len; - - addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); - if (addr == NULL) { - if (rte_errno == EADDRNOTAVAIL) - RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - " - "please use '--" OPT_BASE_VIRTADDR "' option\n", - (unsigned long long)mem_sz, msl->base_va); - else - RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); - return -1; - } - msl->base_va = 
addr; - msl->len = mem_sz; - - return 0; + return eal_memseg_list_alloc(msl, flags); } - static int memseg_primary_init(void) { @@ -479,7 +419,7 @@ memseg_primary_init(void) cur_max_mem); n_segs = cur_mem / hugepage_sz; - if (alloc_memseg_list(msl, hugepage_sz, n_segs, + if (memseg_list_init(msl, hugepage_sz, n_segs, 0, type_msl_idx)) return -1; @@ -487,7 +427,7 @@ memseg_primary_init(void) total_type_mem = total_segs * hugepage_sz; type_msl_idx++; - if (alloc_va_space(msl)) { + if (memseg_list_alloc(msl)) { RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); return -1; } @@ -518,7 +458,7 @@ memseg_secondary_init(void) } /* preallocate VA space */ - if (alloc_va_space(msl)) { + if (memseg_list_alloc(msl)) { RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); return -1; } diff --git a/lib/librte_eal/linux/eal_memory.c b/lib/librte_eal/linux/eal_memory.c index 7a9c97ff8..9cc39e6fb 100644 --- a/lib/librte_eal/linux/eal_memory.c +++ b/lib/librte_eal/linux/eal_memory.c @@ -802,7 +802,7 @@ get_mem_amount(uint64_t page_sz, uint64_t max_mem) } static int -free_memseg_list(struct rte_memseg_list *msl) +memseg_list_free(struct rte_memseg_list *msl) { if (rte_fbarray_destroy(&msl->memseg_arr)) { RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n"); @@ -812,58 +812,18 @@ free_memseg_list(struct rte_memseg_list *msl) return 0; } -#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" static int -alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, +memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, int n_segs, int socket_id, int type_msl_idx) { - char name[RTE_FBARRAY_NAME_LEN]; - - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); - if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", - rte_strerror(rte_errno)); - return -1; - } - - msl->page_sz = page_sz; - msl->socket_id = socket_id; - msl->base_va = NULL; - msl->heap = 1; /* mark it as a heap segment */ - - RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", - (size_t)page_sz >> 10, socket_id); - - return 0; + return eal_memseg_list_init( + msl, page_sz, n_segs, socket_id, type_msl_idx, true); } static int -alloc_va_space(struct rte_memseg_list *msl) +memseg_list_alloc(struct rte_memseg_list *msl) { - uint64_t page_sz; - size_t mem_sz; - void *addr; - int flags = 0; - - page_sz = msl->page_sz; - mem_sz = page_sz * msl->memseg_arr.len; - - addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); - if (addr == NULL) { - if (rte_errno == EADDRNOTAVAIL) - RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - " - "please use '--" OPT_BASE_VIRTADDR "' option\n", - (unsigned long long)mem_sz, msl->base_va); - else - RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); - return -1; - } - msl->base_va = addr; - msl->len = mem_sz; - - return 0; + return eal_memseg_list_alloc(msl, 0); } /* @@ -1009,13 +969,16 @@ prealloc_segments(struct hugepage_file *hugepages, int n_pages) } /* now, allocate fbarray itself */ - if (alloc_memseg_list(msl, page_sz, n_segs, socket, + if (memseg_list_init(msl, page_sz, n_segs, socket, msl_idx) < 0) return -1; /* finally, allocate VA space */ - if (alloc_va_space(msl) < 0) + if (memseg_list_alloc(msl) < 0) { + RTE_LOG(ERR, EAL, "Cannot preallocate 0x%"PRIx64"kB hugepages\n", + page_sz >> 10); return -1; + } } } return 0; @@ -1323,8 +1286,6 @@ eal_legacy_hugepage_init(void) struct rte_mem_config *mcfg; struct hugepage_file 
*hugepage = NULL, *tmp_hp = NULL; struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES]; - struct rte_fbarray *arr; - struct rte_memseg *ms; uint64_t memory[RTE_MAX_NUMA_NODES]; @@ -1343,7 +1304,7 @@ eal_legacy_hugepage_init(void) void *prealloc_addr; size_t mem_sz; struct rte_memseg_list *msl; - int n_segs, cur_seg, fd, flags; + int n_segs, fd, flags; #ifdef MEMFD_SUPPORTED int memfd; #endif @@ -1358,12 +1319,12 @@ eal_legacy_hugepage_init(void) /* create a memseg list */ msl = &mcfg->memsegs[0]; + mem_sz = internal_config.memory; page_sz = RTE_PGSIZE_4K; - n_segs = internal_config.memory / page_sz; + n_segs = mem_sz / page_sz; - if (rte_fbarray_init(&msl->memseg_arr, "nohugemem", n_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n"); + if (eal_memseg_list_init_named( + msl, "nohugemem", page_sz, n_segs, 0, true)) { return -1; } @@ -1400,16 +1361,12 @@ eal_legacy_hugepage_init(void) /* preallocate address space for the memory, so that it can be * fit into the DMA mask. */ - mem_sz = internal_config.memory; - prealloc_addr = eal_get_virtual_area( - NULL, &mem_sz, page_sz, 0, 0); - if (prealloc_addr == NULL) { - RTE_LOG(ERR, EAL, - "%s: reserving memory area failed: " - "%s\n", - __func__, strerror(errno)); + if (eal_memseg_list_alloc(msl, 0)) { + RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); return -1; } + + prealloc_addr = msl->base_va; addr = mmap(prealloc_addr, mem_sz, PROT_READ | PROT_WRITE, flags | MAP_FIXED, fd, 0); if (addr == MAP_FAILED || addr != prealloc_addr) { @@ -1418,11 +1375,6 @@ eal_legacy_hugepage_init(void) munmap(prealloc_addr, mem_sz); return -1; } - msl->base_va = addr; - msl->page_sz = page_sz; - msl->socket_id = 0; - msl->len = mem_sz; - msl->heap = 1; /* we're in single-file segments mode, so only the segment list * fd needs to be set up. @@ -1434,24 +1386,8 @@ eal_legacy_hugepage_init(void) } } - /* populate memsegs. each memseg is one page long */ - for (cur_seg = 0; cur_seg < n_segs; cur_seg++) { - arr = &msl->memseg_arr; + eal_memseg_list_populate(msl, addr, n_segs); - ms = rte_fbarray_get(arr, cur_seg); - if (rte_eal_iova_mode() == RTE_IOVA_VA) - ms->iova = (uintptr_t)addr; - else - ms->iova = RTE_BAD_IOVA; - ms->addr = addr; - ms->hugepage_sz = page_sz; - ms->socket_id = 0; - ms->len = page_sz; - - rte_fbarray_set_used(arr, cur_seg); - - addr = RTE_PTR_ADD(addr, (size_t)page_sz); - } if (mcfg->dma_maskbits && rte_mem_check_dma_mask_thread_unsafe(mcfg->dma_maskbits)) { RTE_LOG(ERR, EAL, @@ -2191,7 +2127,7 @@ memseg_primary_init_32(void) max_pagesz_mem); n_segs = cur_mem / hugepage_sz; - if (alloc_memseg_list(msl, hugepage_sz, n_segs, + if (memseg_list_init(msl, hugepage_sz, n_segs, socket_id, type_msl_idx)) { /* failing to allocate a memseg list is * a serious error. @@ -2200,13 +2136,13 @@ memseg_primary_init_32(void) return -1; } - if (alloc_va_space(msl)) { + if (memseg_list_alloc(msl)) { /* if we couldn't allocate VA space, we * can try with smaller page sizes. 
*/ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n"); /* deallocate memseg list */ - if (free_memseg_list(msl)) + if (memseg_list_free(msl)) return -1; break; } @@ -2395,11 +2331,11 @@ memseg_primary_init(void) } msl = &mcfg->memsegs[msl_idx++]; - if (alloc_memseg_list(msl, pagesz, n_segs, + if (memseg_list_init(msl, pagesz, n_segs, socket_id, cur_seglist)) goto out; - if (alloc_va_space(msl)) { + if (memseg_list_alloc(msl)) { RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); goto out; } @@ -2433,7 +2369,7 @@ memseg_secondary_init(void) } /* preallocate VA space */ - if (alloc_va_space(msl)) { + if (memseg_list_alloc(msl)) { RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); return -1; } From patchwork Mon Jun 15 00:43:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71525 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id D5FBBA04A3; Mon, 15 Jun 2020 02:45:04 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 98AA21BC24; Mon, 15 Jun 2020 02:44:24 +0200 (CEST) Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196]) by dpdk.org (Postfix) with ESMTP id E9ED04C9D for ; Mon, 15 Jun 2020 02:44:09 +0200 (CEST) Received: by mail-lj1-f196.google.com with SMTP id i27so17049059ljb.12 for ; Sun, 14 Jun 2020 17:44:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=VYQ5p2nPpBR5DJvILXrUnlOfJmOiXgnGWSaG5+Zoefw=; b=mmWrXxUvie4KuuLm7B2kRcnWeGUSKit2s9CsFbe9MBHNJDjyCytCzUT+XsKuzsrR0k +J7N0YNM/uxuN99SliAxxRdxZOqkX/NhZjff4WrW/MozhtY8szf4NIowd3NIl25CEer6 Yh55l0JbzMsjKWuoj45BfPLciGtY+7UHLPQtBP6LlJrNK9QKYn9tT7DNUQuJWZ47bmT0 EGDTxrTHoCm3u1FuJLEmZjVlICCmL3dAl+yF1Ih19irhO1GuMzaL49An7zk+9Rr8Qim7 JJoN8gN0lIJ2TA0qC01fK1wCiv6FNx+ZKFwtOHC6CV2yw+bdF+JsXv7STp+fH5Pv9YCr BMCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VYQ5p2nPpBR5DJvILXrUnlOfJmOiXgnGWSaG5+Zoefw=; b=RgMOQs2etD/bVxMUE5XazFGcIc00XEzrG3ZCEZ7jfuhlKHze/eqkI0yJiEZccfSFcL +5nk4NugKVGEoviXs7baO5XtKHL6Qrr+aCLJ0EGJT8So5EZ3SAHxKiJifxO+xJO+IYb9 paWq/0teiKwVoK0OdqUx6FC7zHGftUnxFcJ/J45X17EBfg2F2CbFuapeJnvH22dWaUJT akoDbuPNCLMvmLG/srtLJTRYdzvxme17MXq1+TWjAI/YXiUC9XOX0vcIXQoJySUuNyQn dseCacPKqWs5/3d161j391MLmuxV+lSmAQ7wM8KpdyQlXULgmkavxhO97zXKZVZsAbe0 FZpw== X-Gm-Message-State: AOAM530k4M8tAGBzFdBsuO8kI3jpBdCUy89YhamjKgkFKgmFzNRRzGWR s0UiN5T1FGfz3GkfGJG4VZUos7BEz2K5/w== X-Google-Smtp-Source: ABdhPJy6QwR6jlTrxYJAwtbbqxQIgo71nSgNTz63hBdS1Ek4iIRUgEn+kqD7gZdDgPqJ/DEjKcuj4g== X-Received: by 2002:a05:651c:484:: with SMTP id s4mr10797288ljc.381.1592181848330; Sun, 14 Jun 2020 17:44:08 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:07 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Thomas Monjalon , Anatoly Burakov , Bruce Richardson Date: Mon, 15 Jun 2020 03:43:47 +0300 Message-Id: <20200615004354.14380-6-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 05/12] eal/mem: extract common code for dynamic memory allocation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Code in Linux EAL that supports dynamic memory allocation (as opposed to static allocation used by FreeBSD) is not OS-dependent and can be reused by Windows EAL. Move such code to a file compiled only for the OS that require it. Keep Anatoly Burakov maintainer of extracted code. Signed-off-by: Dmitry Kozlyuk --- MAINTAINERS | 1 + lib/librte_eal/common/eal_common_dynmem.c | 521 +++++++++++++++++++++ lib/librte_eal/common/eal_private.h | 43 +- lib/librte_eal/common/meson.build | 4 + lib/librte_eal/freebsd/eal_memory.c | 12 +- lib/librte_eal/linux/Makefile | 1 + lib/librte_eal/linux/eal_memory.c | 523 +--------------------- 7 files changed, 582 insertions(+), 523 deletions(-) create mode 100644 lib/librte_eal/common/eal_common_dynmem.c diff --git a/MAINTAINERS b/MAINTAINERS index 4d162efd6..241dbc3d7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -209,6 +209,7 @@ F: lib/librte_eal/include/rte_fbarray.h F: lib/librte_eal/include/rte_mem* F: lib/librte_eal/include/rte_malloc.h F: lib/librte_eal/common/*malloc* +F: lib/librte_eal/common/eal_common_dynmem.c F: lib/librte_eal/common/eal_common_fbarray.c F: lib/librte_eal/common/eal_common_mem* F: lib/librte_eal/common/eal_hugepages.h diff --git a/lib/librte_eal/common/eal_common_dynmem.c b/lib/librte_eal/common/eal_common_dynmem.c new file mode 100644 index 000000000..6b07672d0 --- /dev/null +++ b/lib/librte_eal/common/eal_common_dynmem.c @@ -0,0 +1,521 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2014 Intel Corporation. + * Copyright(c) 2013 6WIND S.A. + */ + +#include +#include + +#include +#include + +#include "eal_internal_cfg.h" +#include "eal_memalloc.h" +#include "eal_memcfg.h" +#include "eal_private.h" + +/** @file Functions common to EALs that support dynamic memory allocation. */ + +int +eal_dynmem_memseg_lists_init(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct memtype { + uint64_t page_sz; + int socket_id; + } *memtypes = NULL; + int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ + struct rte_memseg_list *msl; + uint64_t max_mem, max_mem_per_type; + unsigned int max_seglists_per_type; + unsigned int n_memtypes, cur_type; + + /* no-huge does not need this at all */ + if (internal_config.no_hugetlbfs) + return 0; + + /* + * figuring out amount of memory we're going to have is a long and very + * involved process. the basic element we're operating with is a memory + * type, defined as a combination of NUMA node ID and page size (so that + * e.g. 
2 sockets with 2 page sizes yield 4 memory types in total). + * + * deciding amount of memory going towards each memory type is a + * balancing act between maximum segments per type, maximum memory per + * type, and number of detected NUMA nodes. the goal is to make sure + * each memory type gets at least one memseg list. + * + * the total amount of memory is limited by RTE_MAX_MEM_MB value. + * + * the total amount of memory per type is limited by either + * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number + * of detected NUMA nodes. additionally, maximum number of segments per + * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for + * smaller page sizes, it can take hundreds of thousands of segments to + * reach the above specified per-type memory limits. + * + * additionally, each type may have multiple memseg lists associated + * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger + * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones. + * + * the number of memseg lists per type is decided based on the above + * limits, and also taking number of detected NUMA nodes, to make sure + * that we don't run out of memseg lists before we populate all NUMA + * nodes with memory. + * + * we do this in three stages. first, we collect the number of types. + * then, we figure out memory constraints and populate the list of + * would-be memseg lists. then, we go ahead and allocate the memseg + * lists. + */ + + /* create space for mem types */ + n_memtypes = internal_config.num_hugepage_sizes * rte_socket_count(); + memtypes = calloc(n_memtypes, sizeof(*memtypes)); + if (memtypes == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate space for memory types\n"); + return -1; + } + + /* populate mem types */ + cur_type = 0; + for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes; + hpi_idx++) { + struct hugepage_info *hpi; + uint64_t hugepage_sz; + + hpi = &internal_config.hugepage_info[hpi_idx]; + hugepage_sz = hpi->hugepage_sz; + + for (i = 0; i < (int) rte_socket_count(); i++, cur_type++) { + int socket_id = rte_socket_id_by_idx(i); + +#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES + /* we can still sort pages by socket in legacy mode */ + if (!internal_config.legacy_mem && socket_id > 0) + break; +#endif + memtypes[cur_type].page_sz = hugepage_sz; + memtypes[cur_type].socket_id = socket_id; + + RTE_LOG(DEBUG, EAL, "Detected memory type: " + "socket_id:%u hugepage_sz:%" PRIu64 "\n", + socket_id, hugepage_sz); + } + } + /* number of memtypes could have been lower due to no NUMA support */ + n_memtypes = cur_type; + + /* set up limits for types */ + max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; + max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, + max_mem / n_memtypes); + /* + * limit maximum number of segment lists per type to ensure there's + * space for memseg lists for all NUMA nodes with all page sizes + */ + max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes; + + if (max_seglists_per_type == 0) { + RTE_LOG(ERR, EAL, "Cannot accommodate all memory types, please increase %s\n", + RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); + goto out; + } + + /* go through all mem types and create segment lists */ + msl_idx = 0; + for (cur_type = 0; cur_type < n_memtypes; cur_type++) { + unsigned int cur_seglist, n_seglists, n_segs; + unsigned int max_segs_per_type, max_segs_per_list; + struct memtype *type = &memtypes[cur_type]; + uint64_t max_mem_per_list, pagesz; + int socket_id; + + pagesz = type->page_sz; + socket_id = 
type->socket_id; + + /* + * we need to create segment lists for this type. we must take + * into account the following things: + * + * 1. total amount of memory we can use for this memory type + * 2. total amount of memory per memseg list allowed + * 3. number of segments needed to fit the amount of memory + * 4. number of segments allowed per type + * 5. number of segments allowed per memseg list + * 6. number of memseg lists we are allowed to take up + */ + + /* calculate how much segments we will need in total */ + max_segs_per_type = max_mem_per_type / pagesz; + /* limit number of segments to maximum allowed per type */ + max_segs_per_type = RTE_MIN(max_segs_per_type, + (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); + /* limit number of segments to maximum allowed per list */ + max_segs_per_list = RTE_MIN(max_segs_per_type, + (unsigned int)RTE_MAX_MEMSEG_PER_LIST); + + /* calculate how much memory we can have per segment list */ + max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz, + (uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20); + + /* calculate how many segments each segment list will have */ + n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz); + + /* calculate how many segment lists we can have */ + n_seglists = RTE_MIN(max_segs_per_type / n_segs, + max_mem_per_type / max_mem_per_list); + + /* limit number of segment lists according to our maximum */ + n_seglists = RTE_MIN(n_seglists, max_seglists_per_type); + + RTE_LOG(DEBUG, EAL, "Creating %i segment lists: " + "n_segs:%i socket_id:%i hugepage_sz:%" PRIu64 "\n", + n_seglists, n_segs, socket_id, pagesz); + + /* create all segment lists */ + for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) { + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + RTE_LOG(ERR, EAL, + "No more space in memseg lists, please increase %s\n", + RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); + goto out; + } + msl = &mcfg->memsegs[msl_idx++]; + + if (eal_memseg_list_init(msl, pagesz, n_segs, + socket_id, cur_seglist, true)) + goto out; + + if (eal_memseg_list_alloc(msl, 0)) { + RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); + goto out; + } + } + } + /* we're successful */ + ret = 0; +out: + free(memtypes); + return ret; +} + +static int __rte_unused +hugepage_count_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct hugepage_info *hpi = arg; + + if (msl->page_sz != hpi->hugepage_sz) + return 0; + + hpi->num_pages[msl->socket_id] += msl->memseg_arr.len; + return 0; +} + +static int +limits_callback(int socket_id, size_t cur_limit, size_t new_len) +{ + RTE_SET_USED(socket_id); + RTE_SET_USED(cur_limit); + RTE_SET_USED(new_len); + return -1; +} + +int +eal_dynmem_hugepage_init(void) +{ + struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES]; + uint64_t memory[RTE_MAX_NUMA_NODES]; + int hp_sz_idx, socket_id; + + memset(used_hp, 0, sizeof(used_hp)); + + for (hp_sz_idx = 0; + hp_sz_idx < (int) internal_config.num_hugepage_sizes; + hp_sz_idx++) { +#ifndef RTE_ARCH_64 + struct hugepage_info dummy; + unsigned int i; +#endif + /* also initialize used_hp hugepage sizes in used_hp */ + struct hugepage_info *hpi; + hpi = &internal_config.hugepage_info[hp_sz_idx]; + used_hp[hp_sz_idx].hugepage_sz = hpi->hugepage_sz; + +#ifndef RTE_ARCH_64 + /* for 32-bit, limit number of pages on socket to whatever we've + * preallocated, as we cannot allocate more. 
+ */ + memset(&dummy, 0, sizeof(dummy)); + dummy.hugepage_sz = hpi->hugepage_sz; + if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0) + return -1; + + for (i = 0; i < RTE_DIM(dummy.num_pages); i++) { + hpi->num_pages[i] = RTE_MIN(hpi->num_pages[i], + dummy.num_pages[i]); + } +#endif + } + + /* make a copy of socket_mem, needed for balanced allocation. */ + for (hp_sz_idx = 0; hp_sz_idx < RTE_MAX_NUMA_NODES; hp_sz_idx++) + memory[hp_sz_idx] = internal_config.socket_mem[hp_sz_idx]; + + /* calculate final number of pages */ + if (eal_dynmem_calc_num_pages_per_socket(memory, + internal_config.hugepage_info, used_hp, + internal_config.num_hugepage_sizes) < 0) + return -1; + + for (hp_sz_idx = 0; + hp_sz_idx < (int)internal_config.num_hugepage_sizes; + hp_sz_idx++) { + for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES; + socket_id++) { + struct rte_memseg **pages; + struct hugepage_info *hpi = &used_hp[hp_sz_idx]; + unsigned int num_pages = hpi->num_pages[socket_id]; + unsigned int num_pages_alloc; + + if (num_pages == 0) + continue; + + RTE_LOG(DEBUG, EAL, + "Allocating %u pages of size %" PRIu64 "M " + "on socket %i\n", + num_pages, hpi->hugepage_sz >> 20, socket_id); + + /* we may not be able to allocate all pages in one go, + * because we break up our memory map into multiple + * memseg lists. therefore, try allocating multiple + * times and see if we can get the desired number of + * pages from multiple allocations. + */ + + num_pages_alloc = 0; + do { + int i, cur_pages, needed; + + needed = num_pages - num_pages_alloc; + + pages = malloc(sizeof(*pages) * needed); + + /* do not request exact number of pages */ + cur_pages = eal_memalloc_alloc_seg_bulk(pages, + needed, hpi->hugepage_sz, + socket_id, false); + if (cur_pages <= 0) { + free(pages); + return -1; + } + + /* mark preallocated pages as unfreeable */ + for (i = 0; i < cur_pages; i++) { + struct rte_memseg *ms = pages[i]; + ms->flags |= + RTE_MEMSEG_FLAG_DO_NOT_FREE; + } + free(pages); + + num_pages_alloc += cur_pages; + } while (num_pages_alloc != num_pages); + } + } + + /* if socket limits were specified, set them */ + if (internal_config.force_socket_limits) { + unsigned int i; + for (i = 0; i < RTE_MAX_NUMA_NODES; i++) { + uint64_t limit = internal_config.socket_limit[i]; + if (limit == 0) + continue; + if (rte_mem_alloc_validator_register("socket-limit", + limits_callback, i, limit)) + RTE_LOG(ERR, EAL, "Failed to register socket limits validator callback\n"); + } + } + return 0; +} + +__rte_unused /* function is unused on 32-bit builds */ +static inline uint64_t +get_socket_mem_size(int socket) +{ + uint64_t size = 0; + unsigned int i; + + for (i = 0; i < internal_config.num_hugepage_sizes; i++) { + struct hugepage_info *hpi = &internal_config.hugepage_info[i]; + size += hpi->hugepage_sz * hpi->num_pages[socket]; + } + + return size; +} + +int +eal_dynmem_calc_num_pages_per_socket( + uint64_t *memory, struct hugepage_info *hp_info, + struct hugepage_info *hp_used, unsigned int num_hp_info) +{ + unsigned int socket, j, i = 0; + unsigned int requested, available; + int total_num_pages = 0; + uint64_t remaining_mem, cur_mem; + uint64_t total_mem = internal_config.memory; + + if (num_hp_info == 0) + return -1; + + /* if specific memory amounts per socket weren't requested */ + if (internal_config.force_sockets == 0) { + size_t total_size; +#ifdef RTE_ARCH_64 + int cpu_per_socket[RTE_MAX_NUMA_NODES]; + size_t default_size; + unsigned int lcore_id; + + /* Compute number of cores per socket */ + memset(cpu_per_socket, 0, 
sizeof(cpu_per_socket)); + RTE_LCORE_FOREACH(lcore_id) { + cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++; + } + + /* + * Automatically spread requested memory amongst detected + * sockets according to number of cores from CPU mask present + * on each socket. + */ + total_size = internal_config.memory; + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; + socket++) { + + /* Set memory amount per socket */ + default_size = internal_config.memory * + cpu_per_socket[socket] / rte_lcore_count(); + + /* Limit to maximum available memory on socket */ + default_size = RTE_MIN( + default_size, get_socket_mem_size(socket)); + + /* Update sizes */ + memory[socket] = default_size; + total_size -= default_size; + } + + /* + * If some memory is remaining, try to allocate it by getting + * all available memory from sockets, one after the other. + */ + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; + socket++) { + /* take whatever is available */ + default_size = RTE_MIN( + get_socket_mem_size(socket) - memory[socket], + total_size); + + /* Update sizes */ + memory[socket] += default_size; + total_size -= default_size; + } +#else + /* in 32-bit mode, allocate all of the memory only on master + * lcore socket + */ + total_size = internal_config.memory; + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; + socket++) { + struct rte_config *cfg = rte_eal_get_configuration(); + unsigned int master_lcore_socket; + + master_lcore_socket = + rte_lcore_to_socket_id(cfg->master_lcore); + + if (master_lcore_socket != socket) + continue; + + /* Update sizes */ + memory[socket] = total_size; + break; + } +#endif + } + + for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; + socket++) { + /* skips if the memory on specific socket wasn't requested */ + for (i = 0; i < num_hp_info && memory[socket] != 0; i++) { + rte_strscpy(hp_used[i].hugedir, hp_info[i].hugedir, + sizeof(hp_used[i].hugedir)); + hp_used[i].num_pages[socket] = RTE_MIN( + memory[socket] / hp_info[i].hugepage_sz, + hp_info[i].num_pages[socket]); + + cur_mem = hp_used[i].num_pages[socket] * + hp_used[i].hugepage_sz; + + memory[socket] -= cur_mem; + total_mem -= cur_mem; + + total_num_pages += hp_used[i].num_pages[socket]; + + /* check if we have met all memory requests */ + if (memory[socket] == 0) + break; + + /* Check if we have any more pages left at this size, + * if so, move on to next size. + */ + if (hp_used[i].num_pages[socket] == + hp_info[i].num_pages[socket]) + continue; + /* At this point we know that there are more pages + * available that are bigger than the memory we want, + * so lets see if we can get enough from other page + * sizes. + */ + remaining_mem = 0; + for (j = i+1; j < num_hp_info; j++) + remaining_mem += hp_info[j].hugepage_sz * + hp_info[j].num_pages[socket]; + + /* Is there enough other memory? + * If not, allocate another page and quit. 
+ */ + if (remaining_mem < memory[socket]) { + cur_mem = RTE_MIN( + memory[socket], hp_info[i].hugepage_sz); + memory[socket] -= cur_mem; + total_mem -= cur_mem; + hp_used[i].num_pages[socket]++; + total_num_pages++; + break; /* we are done with this socket*/ + } + } + + /* if we didn't satisfy all memory requirements per socket */ + if (memory[socket] > 0 && + internal_config.socket_mem[socket] != 0) { + /* to prevent icc errors */ + requested = (unsigned int)( + internal_config.socket_mem[socket] / 0x100000); + available = requested - + ((unsigned int)(memory[socket] / 0x100000)); + RTE_LOG(ERR, EAL, "Not enough memory available on " + "socket %u! Requested: %uMB, available: %uMB\n", + socket, requested, available); + return -1; + } + } + + /* if we didn't satisfy total memory requirements */ + if (total_mem > 0) { + requested = (unsigned int)(internal_config.memory / 0x100000); + available = requested - (unsigned int)(total_mem / 0x100000); + RTE_LOG(ERR, EAL, "Not enough memory available! " + "Requested: %uMB, available: %uMB\n", + requested, available); + return -1; + } + return total_num_pages; +} diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 75521d086..0592fcd69 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -13,6 +13,8 @@ #include #include +#include "eal_internal_cfg.h" + /** * Structure storing internal configuration (per-lcore) */ @@ -316,6 +318,45 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags); void eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs); +/** + * Distribute available memory between MSLs. + * + * @return + * 0 on success, (-1) on failure. + */ +int +eal_dynmem_memseg_lists_init(void); + +/** + * Preallocate hugepages for dynamic allocation. + * + * @return + * 0 on success, (-1) on failure. + */ +int +eal_dynmem_hugepage_init(void); + +/** + * Given the list of hugepage sizes and the number of pages thereof, + * calculate the best number of pages of each size to fulfill the request + * for RAM on each NUMA node. + * + * @param memory + * Amounts of memory requested for each NUMA node of RTE_MAX_NUMA_NODES. + * @param hp_info + * Information about hugepages of different size. + * @param hp_used + * Receives information about used hugepages of each size. + * @param num_hp_info + * Number of elements in hp_info and hp_used. + * @return + * 0 on success, (-1) on failure. + */ +int +eal_dynmem_calc_num_pages_per_socket( + uint64_t *memory, struct hugepage_info *hp_info, + struct hugepage_info *hp_used, unsigned int num_hp_info); + /** * Get cpu core_id. * @@ -595,7 +636,7 @@ void * eal_mem_reserve(void *requested_addr, size_t size, int flags); /** - * Free memory obtained by eal_mem_reserve() or eal_mem_alloc(). + * Free memory obtained by eal_mem_reserve() and possibly allocated. 
* * If *virt* and *size* describe a part of the reserved region, * only this part of the region is freed (accurately up to the system diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build index 55aaeb18e..d91c22220 100644 --- a/lib/librte_eal/common/meson.build +++ b/lib/librte_eal/common/meson.build @@ -56,3 +56,7 @@ sources += files( 'rte_reciprocal.c', 'rte_service.c', ) + +if is_linux + sources += files('eal_common_dynmem.c') +endif diff --git a/lib/librte_eal/freebsd/eal_memory.c b/lib/librte_eal/freebsd/eal_memory.c index 2eb70c2fe..72a30f21a 100644 --- a/lib/librte_eal/freebsd/eal_memory.c +++ b/lib/librte_eal/freebsd/eal_memory.c @@ -315,14 +315,6 @@ get_mem_amount(uint64_t page_sz, uint64_t max_mem) return RTE_ALIGN(area_sz, page_sz); } -static int -memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, - int n_segs, int socket_id, int type_msl_idx) -{ - return eal_memseg_list_init( - msl, page_sz, n_segs, socket_id, type_msl_idx, false); -} - static int memseg_list_alloc(struct rte_memseg_list *msl) { @@ -419,8 +411,8 @@ memseg_primary_init(void) cur_max_mem); n_segs = cur_mem / hugepage_sz; - if (memseg_list_init(msl, hugepage_sz, n_segs, - 0, type_msl_idx)) + if (eal_memseg_list_init(msl, hugepage_sz, n_segs, + 0, type_msl_idx, false)) return -1; total_segs += msl->memseg_arr.len; diff --git a/lib/librte_eal/linux/Makefile b/lib/librte_eal/linux/Makefile index 8febf2212..07ce643ba 100644 --- a/lib/librte_eal/linux/Makefile +++ b/lib/librte_eal/linux/Makefile @@ -50,6 +50,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_timer.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_memzone.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_log.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_launch.c +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_dynmem.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_mcfg.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_memalloc.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_common_memory.c diff --git a/lib/librte_eal/linux/eal_memory.c b/lib/librte_eal/linux/eal_memory.c index 9cc39e6fb..5986dab23 100644 --- a/lib/librte_eal/linux/eal_memory.c +++ b/lib/librte_eal/linux/eal_memory.c @@ -812,20 +812,6 @@ memseg_list_free(struct rte_memseg_list *msl) return 0; } -static int -memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, - int n_segs, int socket_id, int type_msl_idx) -{ - return eal_memseg_list_init( - msl, page_sz, n_segs, socket_id, type_msl_idx, true); -} - -static int -memseg_list_alloc(struct rte_memseg_list *msl) -{ - return eal_memseg_list_alloc(msl, 0); -} - /* * Our VA space is not preallocated yet, so preallocate it here. 
We need to know * how many segments there are in order to map all pages into one address space, @@ -969,12 +955,12 @@ prealloc_segments(struct hugepage_file *hugepages, int n_pages) } /* now, allocate fbarray itself */ - if (memseg_list_init(msl, page_sz, n_segs, socket, - msl_idx) < 0) + if (eal_memseg_list_init(msl, page_sz, n_segs, + socket, msl_idx, true) < 0) return -1; /* finally, allocate VA space */ - if (memseg_list_alloc(msl) < 0) { + if (eal_memseg_list_alloc(msl, 0) < 0) { RTE_LOG(ERR, EAL, "Cannot preallocate 0x%"PRIx64"kB hugepages\n", page_sz >> 10); return -1; @@ -1048,182 +1034,6 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages) return 0; } -__rte_unused /* function is unused on 32-bit builds */ -static inline uint64_t -get_socket_mem_size(int socket) -{ - uint64_t size = 0; - unsigned i; - - for (i = 0; i < internal_config.num_hugepage_sizes; i++){ - struct hugepage_info *hpi = &internal_config.hugepage_info[i]; - size += hpi->hugepage_sz * hpi->num_pages[socket]; - } - - return size; -} - -/* - * This function is a NUMA-aware equivalent of calc_num_pages. - * It takes in the list of hugepage sizes and the - * number of pages thereof, and calculates the best number of - * pages of each size to fulfill the request for ram - */ -static int -calc_num_pages_per_socket(uint64_t * memory, - struct hugepage_info *hp_info, - struct hugepage_info *hp_used, - unsigned num_hp_info) -{ - unsigned socket, j, i = 0; - unsigned requested, available; - int total_num_pages = 0; - uint64_t remaining_mem, cur_mem; - uint64_t total_mem = internal_config.memory; - - if (num_hp_info == 0) - return -1; - - /* if specific memory amounts per socket weren't requested */ - if (internal_config.force_sockets == 0) { - size_t total_size; -#ifdef RTE_ARCH_64 - int cpu_per_socket[RTE_MAX_NUMA_NODES]; - size_t default_size; - unsigned lcore_id; - - /* Compute number of cores per socket */ - memset(cpu_per_socket, 0, sizeof(cpu_per_socket)); - RTE_LCORE_FOREACH(lcore_id) { - cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++; - } - - /* - * Automatically spread requested memory amongst detected sockets according - * to number of cores from cpu mask present on each socket - */ - total_size = internal_config.memory; - for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; socket++) { - - /* Set memory amount per socket */ - default_size = (internal_config.memory * cpu_per_socket[socket]) - / rte_lcore_count(); - - /* Limit to maximum available memory on socket */ - default_size = RTE_MIN(default_size, get_socket_mem_size(socket)); - - /* Update sizes */ - memory[socket] = default_size; - total_size -= default_size; - } - - /* - * If some memory is remaining, try to allocate it by getting all - * available memory from sockets, one after the other - */ - for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; socket++) { - /* take whatever is available */ - default_size = RTE_MIN(get_socket_mem_size(socket) - memory[socket], - total_size); - - /* Update sizes */ - memory[socket] += default_size; - total_size -= default_size; - } -#else - /* in 32-bit mode, allocate all of the memory only on master - * lcore socket - */ - total_size = internal_config.memory; - for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0; - socket++) { - struct rte_config *cfg = rte_eal_get_configuration(); - unsigned int master_lcore_socket; - - master_lcore_socket = - rte_lcore_to_socket_id(cfg->master_lcore); - - if (master_lcore_socket != socket) - continue; - - /* Update sizes */ 
- memory[socket] = total_size; - break; - } -#endif - } - - for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; socket++) { - /* skips if the memory on specific socket wasn't requested */ - for (i = 0; i < num_hp_info && memory[socket] != 0; i++){ - strlcpy(hp_used[i].hugedir, hp_info[i].hugedir, - sizeof(hp_used[i].hugedir)); - hp_used[i].num_pages[socket] = RTE_MIN( - memory[socket] / hp_info[i].hugepage_sz, - hp_info[i].num_pages[socket]); - - cur_mem = hp_used[i].num_pages[socket] * - hp_used[i].hugepage_sz; - - memory[socket] -= cur_mem; - total_mem -= cur_mem; - - total_num_pages += hp_used[i].num_pages[socket]; - - /* check if we have met all memory requests */ - if (memory[socket] == 0) - break; - - /* check if we have any more pages left at this size, if so - * move on to next size */ - if (hp_used[i].num_pages[socket] == hp_info[i].num_pages[socket]) - continue; - /* At this point we know that there are more pages available that are - * bigger than the memory we want, so lets see if we can get enough - * from other page sizes. - */ - remaining_mem = 0; - for (j = i+1; j < num_hp_info; j++) - remaining_mem += hp_info[j].hugepage_sz * - hp_info[j].num_pages[socket]; - - /* is there enough other memory, if not allocate another page and quit */ - if (remaining_mem < memory[socket]){ - cur_mem = RTE_MIN(memory[socket], - hp_info[i].hugepage_sz); - memory[socket] -= cur_mem; - total_mem -= cur_mem; - hp_used[i].num_pages[socket]++; - total_num_pages++; - break; /* we are done with this socket*/ - } - } - /* if we didn't satisfy all memory requirements per socket */ - if (memory[socket] > 0 && - internal_config.socket_mem[socket] != 0) { - /* to prevent icc errors */ - requested = (unsigned) (internal_config.socket_mem[socket] / - 0x100000); - available = requested - - ((unsigned) (memory[socket] / 0x100000)); - RTE_LOG(ERR, EAL, "Not enough memory available on socket %u! " - "Requested: %uMB, available: %uMB\n", socket, - requested, available); - return -1; - } - } - - /* if we didn't satisfy total memory requirements */ - if (total_mem > 0) { - requested = (unsigned) (internal_config.memory / 0x100000); - available = requested - (unsigned) (total_mem / 0x100000); - RTE_LOG(ERR, EAL, "Not enough memory available! 
Requested: %uMB," - " available: %uMB\n", requested, available); - return -1; - } - return total_num_pages; -} - static inline size_t eal_get_hugepage_mem_size(void) { @@ -1529,7 +1339,7 @@ eal_legacy_hugepage_init(void) memory[i] = internal_config.socket_mem[i]; /* calculate final number of pages */ - nr_hugepages = calc_num_pages_per_socket(memory, + nr_hugepages = eal_dynmem_calc_num_pages_per_socket(memory, internal_config.hugepage_info, used_hp, internal_config.num_hugepage_sizes); @@ -1656,140 +1466,6 @@ eal_legacy_hugepage_init(void) return -1; } -static int __rte_unused -hugepage_count_walk(const struct rte_memseg_list *msl, void *arg) -{ - struct hugepage_info *hpi = arg; - - if (msl->page_sz != hpi->hugepage_sz) - return 0; - - hpi->num_pages[msl->socket_id] += msl->memseg_arr.len; - return 0; -} - -static int -limits_callback(int socket_id, size_t cur_limit, size_t new_len) -{ - RTE_SET_USED(socket_id); - RTE_SET_USED(cur_limit); - RTE_SET_USED(new_len); - return -1; -} - -static int -eal_hugepage_init(void) -{ - struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES]; - uint64_t memory[RTE_MAX_NUMA_NODES]; - int hp_sz_idx, socket_id; - - memset(used_hp, 0, sizeof(used_hp)); - - for (hp_sz_idx = 0; - hp_sz_idx < (int) internal_config.num_hugepage_sizes; - hp_sz_idx++) { -#ifndef RTE_ARCH_64 - struct hugepage_info dummy; - unsigned int i; -#endif - /* also initialize used_hp hugepage sizes in used_hp */ - struct hugepage_info *hpi; - hpi = &internal_config.hugepage_info[hp_sz_idx]; - used_hp[hp_sz_idx].hugepage_sz = hpi->hugepage_sz; - -#ifndef RTE_ARCH_64 - /* for 32-bit, limit number of pages on socket to whatever we've - * preallocated, as we cannot allocate more. - */ - memset(&dummy, 0, sizeof(dummy)); - dummy.hugepage_sz = hpi->hugepage_sz; - if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0) - return -1; - - for (i = 0; i < RTE_DIM(dummy.num_pages); i++) { - hpi->num_pages[i] = RTE_MIN(hpi->num_pages[i], - dummy.num_pages[i]); - } -#endif - } - - /* make a copy of socket_mem, needed for balanced allocation. */ - for (hp_sz_idx = 0; hp_sz_idx < RTE_MAX_NUMA_NODES; hp_sz_idx++) - memory[hp_sz_idx] = internal_config.socket_mem[hp_sz_idx]; - - /* calculate final number of pages */ - if (calc_num_pages_per_socket(memory, - internal_config.hugepage_info, used_hp, - internal_config.num_hugepage_sizes) < 0) - return -1; - - for (hp_sz_idx = 0; - hp_sz_idx < (int)internal_config.num_hugepage_sizes; - hp_sz_idx++) { - for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES; - socket_id++) { - struct rte_memseg **pages; - struct hugepage_info *hpi = &used_hp[hp_sz_idx]; - unsigned int num_pages = hpi->num_pages[socket_id]; - unsigned int num_pages_alloc; - - if (num_pages == 0) - continue; - - RTE_LOG(DEBUG, EAL, "Allocating %u pages of size %" PRIu64 "M on socket %i\n", - num_pages, hpi->hugepage_sz >> 20, socket_id); - - /* we may not be able to allocate all pages in one go, - * because we break up our memory map into multiple - * memseg lists. therefore, try allocating multiple - * times and see if we can get the desired number of - * pages from multiple allocations. 
- */ - - num_pages_alloc = 0; - do { - int i, cur_pages, needed; - - needed = num_pages - num_pages_alloc; - - pages = malloc(sizeof(*pages) * needed); - - /* do not request exact number of pages */ - cur_pages = eal_memalloc_alloc_seg_bulk(pages, - needed, hpi->hugepage_sz, - socket_id, false); - if (cur_pages <= 0) { - free(pages); - return -1; - } - - /* mark preallocated pages as unfreeable */ - for (i = 0; i < cur_pages; i++) { - struct rte_memseg *ms = pages[i]; - ms->flags |= RTE_MEMSEG_FLAG_DO_NOT_FREE; - } - free(pages); - - num_pages_alloc += cur_pages; - } while (num_pages_alloc != num_pages); - } - } - /* if socket limits were specified, set them */ - if (internal_config.force_socket_limits) { - unsigned int i; - for (i = 0; i < RTE_MAX_NUMA_NODES; i++) { - uint64_t limit = internal_config.socket_limit[i]; - if (limit == 0) - continue; - if (rte_mem_alloc_validator_register("socket-limit", - limits_callback, i, limit)) - RTE_LOG(ERR, EAL, "Failed to register socket limits validator callback\n"); - } - } - return 0; -} - /* * uses fstat to report the size of a file on disk */ @@ -1948,7 +1624,7 @@ rte_eal_hugepage_init(void) { return internal_config.legacy_mem ? eal_legacy_hugepage_init() : - eal_hugepage_init(); + eal_dynmem_hugepage_init(); } int @@ -2127,8 +1803,9 @@ memseg_primary_init_32(void) max_pagesz_mem); n_segs = cur_mem / hugepage_sz; - if (memseg_list_init(msl, hugepage_sz, n_segs, - socket_id, type_msl_idx)) { + if (eal_memseg_list_init(msl, hugepage_sz, + n_segs, socket_id, type_msl_idx, + true)) { /* failing to allocate a memseg list is * a serious error. */ @@ -2136,7 +1813,7 @@ memseg_primary_init_32(void) return -1; } - if (memseg_list_alloc(msl)) { + if (eal_memseg_list_alloc(msl, 0)) { /* if we couldn't allocate VA space, we * can try with smaller page sizes. */ @@ -2167,185 +1844,7 @@ memseg_primary_init_32(void) static int __rte_unused memseg_primary_init(void) { - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - struct memtype { - uint64_t page_sz; - int socket_id; - } *memtypes = NULL; - int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ - struct rte_memseg_list *msl; - uint64_t max_mem, max_mem_per_type; - unsigned int max_seglists_per_type; - unsigned int n_memtypes, cur_type; - - /* no-huge does not need this at all */ - if (internal_config.no_hugetlbfs) - return 0; - - /* - * figuring out amount of memory we're going to have is a long and very - * involved process. the basic element we're operating with is a memory - * type, defined as a combination of NUMA node ID and page size (so that - * e.g. 2 sockets with 2 page sizes yield 4 memory types in total). - * - * deciding amount of memory going towards each memory type is a - * balancing act between maximum segments per type, maximum memory per - * type, and number of detected NUMA nodes. the goal is to make sure - * each memory type gets at least one memseg list. - * - * the total amount of memory is limited by RTE_MAX_MEM_MB value. - * - * the total amount of memory per type is limited by either - * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number - * of detected NUMA nodes. additionally, maximum number of segments per - * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for - * smaller page sizes, it can take hundreds of thousands of segments to - * reach the above specified per-type memory limits. 
- * - * additionally, each type may have multiple memseg lists associated - * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger - * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones. - * - * the number of memseg lists per type is decided based on the above - * limits, and also taking number of detected NUMA nodes, to make sure - * that we don't run out of memseg lists before we populate all NUMA - * nodes with memory. - * - * we do this in three stages. first, we collect the number of types. - * then, we figure out memory constraints and populate the list of - * would-be memseg lists. then, we go ahead and allocate the memseg - * lists. - */ - - /* create space for mem types */ - n_memtypes = internal_config.num_hugepage_sizes * rte_socket_count(); - memtypes = calloc(n_memtypes, sizeof(*memtypes)); - if (memtypes == NULL) { - RTE_LOG(ERR, EAL, "Cannot allocate space for memory types\n"); - return -1; - } - - /* populate mem types */ - cur_type = 0; - for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes; - hpi_idx++) { - struct hugepage_info *hpi; - uint64_t hugepage_sz; - - hpi = &internal_config.hugepage_info[hpi_idx]; - hugepage_sz = hpi->hugepage_sz; - - for (i = 0; i < (int) rte_socket_count(); i++, cur_type++) { - int socket_id = rte_socket_id_by_idx(i); - -#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES - /* we can still sort pages by socket in legacy mode */ - if (!internal_config.legacy_mem && socket_id > 0) - break; -#endif - memtypes[cur_type].page_sz = hugepage_sz; - memtypes[cur_type].socket_id = socket_id; - - RTE_LOG(DEBUG, EAL, "Detected memory type: " - "socket_id:%u hugepage_sz:%" PRIu64 "\n", - socket_id, hugepage_sz); - } - } - /* number of memtypes could have been lower due to no NUMA support */ - n_memtypes = cur_type; - - /* set up limits for types */ - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, - max_mem / n_memtypes); - /* - * limit maximum number of segment lists per type to ensure there's - * space for memseg lists for all NUMA nodes with all page sizes - */ - max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes; - - if (max_seglists_per_type == 0) { - RTE_LOG(ERR, EAL, "Cannot accommodate all memory types, please increase %s\n", - RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); - goto out; - } - - /* go through all mem types and create segment lists */ - msl_idx = 0; - for (cur_type = 0; cur_type < n_memtypes; cur_type++) { - unsigned int cur_seglist, n_seglists, n_segs; - unsigned int max_segs_per_type, max_segs_per_list; - struct memtype *type = &memtypes[cur_type]; - uint64_t max_mem_per_list, pagesz; - int socket_id; - - pagesz = type->page_sz; - socket_id = type->socket_id; - - /* - * we need to create segment lists for this type. we must take - * into account the following things: - * - * 1. total amount of memory we can use for this memory type - * 2. total amount of memory per memseg list allowed - * 3. number of segments needed to fit the amount of memory - * 4. number of segments allowed per type - * 5. number of segments allowed per memseg list - * 6. 
number of memseg lists we are allowed to take up - */ - - /* calculate how much segments we will need in total */ - max_segs_per_type = max_mem_per_type / pagesz; - /* limit number of segments to maximum allowed per type */ - max_segs_per_type = RTE_MIN(max_segs_per_type, - (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); - /* limit number of segments to maximum allowed per list */ - max_segs_per_list = RTE_MIN(max_segs_per_type, - (unsigned int)RTE_MAX_MEMSEG_PER_LIST); - - /* calculate how much memory we can have per segment list */ - max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz, - (uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20); - - /* calculate how many segments each segment list will have */ - n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz); - - /* calculate how many segment lists we can have */ - n_seglists = RTE_MIN(max_segs_per_type / n_segs, - max_mem_per_type / max_mem_per_list); - - /* limit number of segment lists according to our maximum */ - n_seglists = RTE_MIN(n_seglists, max_seglists_per_type); - - RTE_LOG(DEBUG, EAL, "Creating %i segment lists: " - "n_segs:%i socket_id:%i hugepage_sz:%" PRIu64 "\n", - n_seglists, n_segs, socket_id, pagesz); - - /* create all segment lists */ - for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) { - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - RTE_LOG(ERR, EAL, - "No more space in memseg lists, please increase %s\n", - RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS)); - goto out; - } - msl = &mcfg->memsegs[msl_idx++]; - - if (memseg_list_init(msl, pagesz, n_segs, - socket_id, cur_seglist)) - goto out; - - if (memseg_list_alloc(msl)) { - RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); - goto out; - } - } - } - /* we're successful */ - ret = 0; -out: - free(memtypes); - return ret; + return eal_dynmem_memseg_lists_init(); } static int @@ -2369,7 +1868,7 @@ memseg_secondary_init(void) } /* preallocate VA space */ - if (memseg_list_alloc(msl)) { + if (eal_memseg_list_alloc(msl, 0)) { RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); return -1; } From patchwork Mon Jun 15 00:43:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71524 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 47BA1A0093; Mon, 15 Jun 2020 02:44:55 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 406F61B75C; Mon, 15 Jun 2020 02:44:23 +0200 (CEST) Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com [209.85.208.195]) by dpdk.org (Postfix) with ESMTP id B2A994C8B for ; Mon, 15 Jun 2020 02:44:09 +0200 (CEST) Received: by mail-lj1-f195.google.com with SMTP id y11so17103461ljm.9 for ; Sun, 14 Jun 2020 17:44:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dVcxVkYcX4IdEsRf5bHPXaga+vGMrVzMkh75QGfZRIc=; b=DW/G9p3NAGQx0+5PEh8TWaN4crXHzK2P6uoVLb5YZuuC4zVbhrGMm1gu4wF+rssn2a Ono5h9GjzL+lV1cyFc9oIwnr9NgKM4TlWVvdjvL3pP0ut+khQzFQrwiJyd1mRGRt8yUt LpGbjkLRXNFtMKlHBGVRPb61cWDDj9jX0JGPZLRbDpvVv5UCB+tIDbP4p69/EfB9hGVn YhgQh1Hwp2zDXVC2NkFaFBOd6ugDl/UscIPLb1/RFGa4XC/aBNafqMwChgjSFcugb1bG LNiPt1YJAvwC0z3Be/A904ovKGwFXO89pnHsirDisLtaQS94xaG7HJy8YgNRS6rX64AH QHaw== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dVcxVkYcX4IdEsRf5bHPXaga+vGMrVzMkh75QGfZRIc=; b=qLVJtZn2zfxe3mVj19D8iBn5R55ebxXb+Fndcam/7GjV9BukGDjF3Ilao+0FCXH3xS YQurHirWBgNKYU4d9UCnxHxZcNqEJBdb8/AUbhcpzoxMdzm0bMHl1Dg1S/WmJF0ON76M sA9/aHqDeSVbfFq5eGCk4a5LMCvwxRk/x99Z/wj+JfTTcGPWG4yMNCc8tFG2Py6OVWGW 8s85OQoWw2kvHs5CRtC4MgjQEzc5vPYG9kKMk64v+NDloWD2XIBizRr5AybjGjlsNz/0 IG6YrxKsn7wZP5larV9Z2fsbU91YOlE5lWlq9Q9WjK5U5Yqgx6zL0YlFJ1FmAeOMEG2o RcEg== X-Gm-Message-State: AOAM532XSymmKQsKEd9tx1Qs3Zfbvp5faIgkqv6CyedaTq0ZQoPFQTW/ RWN5X9KwrTox/Bwd5I02OnmThKRoxwCkog== X-Google-Smtp-Source: ABdhPJxsIUqKan5RYuGHzhCdzQmPGHo2Nv5u7z7Cjf7hJfzRNgZM5Z1d9iN+51nFMMuaUj6mP6HraA== X-Received: by 2002:a2e:98c4:: with SMTP id s4mr12529568ljj.221.1592181849221; Sun, 14 Jun 2020 17:44:09 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. [37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:08 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Jerin Jacob , Sunil Kumar Kori , Olivier Matz , Andrew Rybchenko Date: Mon, 15 Jun 2020 03:43:48 +0300 Message-Id: <20200615004354.14380-7-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 06/12] trace: add size_t field emitter X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" It is not guaranteed that sizeof(long) == sizeof(size_t). On Windows, sizeof(long) == 4 and sizeof(size_t) == 8 for 64-bit programs. Tracepoints using "long" field emitter are therefore invalid there. Add dedicated field emitter for size_t and use it to store size_t values in all existing tracepoints. 
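The new emitter follows the same pattern as the existing ones in rte_eal_trace.h. A minimal sketch of a tracepoint that records a size_t payload — the tracepoint name and arguments below are purely illustrative and not part of DPDK:

    #include <rte_trace_point.h>

    /* Illustrative only: trace a buffer pointer and its length. */
    RTE_TRACE_POINT(
            app_trace_buf_alloc,
            RTE_TRACE_POINT_ARGS(const void *buf, size_t len),
            rte_trace_point_emit_ptr(buf);
            rte_trace_point_emit_size_t(len); /* full width even where long is 32-bit */
    )

With rte_trace_point_emit_long(len), the value would be stored as a 32-bit field on LLP64 platforms such as 64-bit Windows; rte_trace_point_emit_size_t stores the native size_t width.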
Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/include/rte_eal_trace.h | 8 ++++---- lib/librte_eal/include/rte_trace_point.h | 3 +++ lib/librte_mempool/rte_mempool_trace.h | 10 +++++----- 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/lib/librte_eal/include/rte_eal_trace.h b/lib/librte_eal/include/rte_eal_trace.h index 1ebb2905a..bcfef0cfa 100644 --- a/lib/librte_eal/include/rte_eal_trace.h +++ b/lib/librte_eal/include/rte_eal_trace.h @@ -143,7 +143,7 @@ RTE_TRACE_POINT( RTE_TRACE_POINT_ARGS(const char *type, size_t size, unsigned int align, int socket, void *ptr), rte_trace_point_emit_string(type); - rte_trace_point_emit_long(size); + rte_trace_point_emit_size_t(size); rte_trace_point_emit_u32(align); rte_trace_point_emit_int(socket); rte_trace_point_emit_ptr(ptr); @@ -154,7 +154,7 @@ RTE_TRACE_POINT( RTE_TRACE_POINT_ARGS(const char *type, size_t size, unsigned int align, int socket, void *ptr), rte_trace_point_emit_string(type); - rte_trace_point_emit_long(size); + rte_trace_point_emit_size_t(size); rte_trace_point_emit_u32(align); rte_trace_point_emit_int(socket); rte_trace_point_emit_ptr(ptr); @@ -164,7 +164,7 @@ RTE_TRACE_POINT( rte_eal_trace_mem_realloc, RTE_TRACE_POINT_ARGS(size_t size, unsigned int align, int socket, void *ptr), - rte_trace_point_emit_long(size); + rte_trace_point_emit_size_t(size); rte_trace_point_emit_u32(align); rte_trace_point_emit_int(socket); rte_trace_point_emit_ptr(ptr); @@ -183,7 +183,7 @@ RTE_TRACE_POINT( unsigned int flags, unsigned int align, unsigned int bound, const void *mz), rte_trace_point_emit_string(name); - rte_trace_point_emit_long(len); + rte_trace_point_emit_size_t(len); rte_trace_point_emit_int(socket_id); rte_trace_point_emit_u32(flags); rte_trace_point_emit_u32(align); diff --git a/lib/librte_eal/include/rte_trace_point.h b/lib/librte_eal/include/rte_trace_point.h index b45171275..377c2414a 100644 --- a/lib/librte_eal/include/rte_trace_point.h +++ b/lib/librte_eal/include/rte_trace_point.h @@ -138,6 +138,8 @@ _tp _args \ #define rte_trace_point_emit_int(val) /** Tracepoint function payload for long datatype */ #define rte_trace_point_emit_long(val) +/** Tracepoint function payload for size_t datatype */ +#define rte_trace_point_emit_size_t(val) /** Tracepoint function payload for float datatype */ #define rte_trace_point_emit_float(val) /** Tracepoint function payload for double datatype */ @@ -395,6 +397,7 @@ do { \ #define rte_trace_point_emit_i8(in) __rte_trace_point_emit(in, int8_t) #define rte_trace_point_emit_int(in) __rte_trace_point_emit(in, int32_t) #define rte_trace_point_emit_long(in) __rte_trace_point_emit(in, long) +#define rte_trace_point_emit_size_t(in) __rte_trace_point_emit(in, size_t) #define rte_trace_point_emit_float(in) __rte_trace_point_emit(in, float) #define rte_trace_point_emit_double(in) __rte_trace_point_emit(in, double) #define rte_trace_point_emit_ptr(in) __rte_trace_point_emit(in, uintptr_t) diff --git a/lib/librte_mempool/rte_mempool_trace.h b/lib/librte_mempool/rte_mempool_trace.h index e776df0a6..087c913c8 100644 --- a/lib/librte_mempool/rte_mempool_trace.h +++ b/lib/librte_mempool/rte_mempool_trace.h @@ -72,7 +72,7 @@ RTE_TRACE_POINT( rte_trace_point_emit_string(mempool->name); rte_trace_point_emit_ptr(vaddr); rte_trace_point_emit_u64(iova); - rte_trace_point_emit_long(len); + rte_trace_point_emit_size_t(len); rte_trace_point_emit_ptr(free_cb); rte_trace_point_emit_ptr(opaque); ) @@ -84,8 +84,8 @@ RTE_TRACE_POINT( rte_trace_point_emit_ptr(mempool); 
rte_trace_point_emit_string(mempool->name); rte_trace_point_emit_ptr(addr); - rte_trace_point_emit_long(len); - rte_trace_point_emit_long(pg_sz); + rte_trace_point_emit_size_t(len); + rte_trace_point_emit_size_t(pg_sz); rte_trace_point_emit_ptr(free_cb); rte_trace_point_emit_ptr(opaque); ) @@ -126,7 +126,7 @@ RTE_TRACE_POINT( RTE_TRACE_POINT_ARGS(struct rte_mempool *mempool, size_t pg_sz), rte_trace_point_emit_ptr(mempool); rte_trace_point_emit_string(mempool->name); - rte_trace_point_emit_long(pg_sz); + rte_trace_point_emit_size_t(pg_sz); ) RTE_TRACE_POINT( @@ -139,7 +139,7 @@ RTE_TRACE_POINT( rte_trace_point_emit_u32(max_objs); rte_trace_point_emit_ptr(vaddr); rte_trace_point_emit_u64(iova); - rte_trace_point_emit_long(len); + rte_trace_point_emit_size_t(len); rte_trace_point_emit_ptr(obj_cb); rte_trace_point_emit_ptr(obj_cb_arg); ) From patchwork Mon Jun 15 00:43:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71526 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E1512A0093; Mon, 15 Jun 2020 02:45:15 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A0CFD1BDF8; Mon, 15 Jun 2020 02:44:25 +0200 (CEST) Received: from mail-lf1-f66.google.com (mail-lf1-f66.google.com [209.85.167.66]) by dpdk.org (Postfix) with ESMTP id BFCA14C9D for ; Mon, 15 Jun 2020 02:44:10 +0200 (CEST) Received: by mail-lf1-f66.google.com with SMTP id g2so870954lfb.0 for ; Sun, 14 Jun 2020 17:44:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LHlAb1z42AG9LyDm6x4AYa/r3zc1UWVG/v4HM8tjcwQ=; b=PeoE0SatG2KRJAYxDqYj+b6zMOKi3XR7hsNrnF3/xW5KiAVh7weEje+ApcMGNMj1SL 3YMiV8HQhO7/NNDukS/Sldxn3GQ8DP/kkl9OPCzZySPZRrGZMXXZXCT/6+L3T5CEkZGS 7AHwIFE7NgfEeDW+BAjMDaHFMuhqS3kPKO2yz7OGpe/aLb0FfXWmtu7YP+nfkyTheFP0 ReqSBnDs0NCVpp62EfYvlkIjV7bJwCNqP7MrpCTq34eppdRefmSH7XTEOTAt7dzeBCBS hwfT5obkAgdtqJRGV1l+ahFyO8sy8euP6L7YDlP9v9Akls81x0yXqg0oxuYC7if0YQTB y6XQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LHlAb1z42AG9LyDm6x4AYa/r3zc1UWVG/v4HM8tjcwQ=; b=a1pVnePZko2WyGPK56WUF+wdSjWZgklZ+itnhfAjtFC0QWDhIUAHX18wjlrYHUsvMZ Wn6cH4g84QiaFu/7XPPv52V9/ds9HDIGeOt/Q8++mWgOgVTNl3MkSsoc0q2u5rSDiUM7 Dg0Tm8nEDntOS+UmnI9m79EYzuSxo8hrRLioMUYBNavq2PuocEqM8gzWta2p5RAHoaQf jdP0VHkzkZwdtt6lH5ESyxf0CSOQzpd5A12hpRyYTHZxnjZ4QKqsEHQzv3CRgJGEyMxu KzVGnNs+G4hGc77NAUaR97vUF0jh8SKdQVLmoywD66qIDHB/E94L8OUQYvxT+QR9EsDw lgJQ== X-Gm-Message-State: AOAM532wF/O8YlRWtGMxmz0Egg8yiA5Qj//akg+kQVTdtHbDaN3YkzD9 6fzM/rKQPMR3XmSPx+5PrZNlp0sXhzaELw== X-Google-Smtp-Source: ABdhPJwIj6k8RjxDyGuJEwAZ/waqwZBLQdE3d4eHl2Be4tV63E1T5vte10fBDJcivR7EwJrSIVSRcA== X-Received: by 2002:ac2:4422:: with SMTP id w2mr3334912lfl.152.1592181850237; Sun, 14 Jun 2020 17:44:10 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:09 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon Date: Mon, 15 Jun 2020 03:43:49 +0300 Message-Id: <20200615004354.14380-8-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 07/12] eal/windows: add tracing support stubs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" EAL common code depends on tracepoint calls, but generic implementation cannot be enabled on Windows due to missing standard library facilities. Add stub functions to support tracepoint compilation, so that common code does not have to conditionally include tracepoints until proper support is added. Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/common/eal_common_thread.c | 5 +--- lib/librte_eal/common/meson.build | 1 + lib/librte_eal/windows/eal.c | 34 ++++++++++++++++++++++- 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c index f9f588c17..370bb1b63 100644 --- a/lib/librte_eal/common/eal_common_thread.c +++ b/lib/librte_eal/common/eal_common_thread.c @@ -15,9 +15,7 @@ #include #include #include -#ifndef RTE_EXEC_ENV_WINDOWS #include -#endif #include "eal_internal_cfg.h" #include "eal_private.h" @@ -169,9 +167,8 @@ static void *rte_thread_init(void *arg) free(params); } -#ifndef RTE_EXEC_ENV_WINDOWS __rte_trace_mem_per_thread_alloc(); -#endif + return start_routine(routine_arg); } diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build index d91c22220..4e9208129 100644 --- a/lib/librte_eal/common/meson.build +++ b/lib/librte_eal/common/meson.build @@ -14,6 +14,7 @@ if is_windows 'eal_common_log.c', 'eal_common_options.c', 'eal_common_thread.c', + 'eal_common_trace_points.c', ) subdir_done() endif diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index d084606a6..e7461f731 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "eal_windows.h" @@ -221,7 +222,38 @@ rte_eal_init_alert(const char *msg) RTE_LOG(ERR, EAL, "%s\n", msg); } - /* Launch threads, called at application init(). */ +/* Stubs to enable EAL trace point compilation + * until eal_common_trace.c can be compiled. + */ + +RTE_DEFINE_PER_LCORE(volatile int, trace_point_sz); +RTE_DEFINE_PER_LCORE(void *, trace_mem); + +void +__rte_trace_mem_per_thread_alloc(void) +{ +} + +void +__rte_trace_point_emit_field(size_t sz, const char *field, + const char *type) +{ + RTE_SET_USED(sz); + RTE_SET_USED(field); + RTE_SET_USED(type); +} + +int +__rte_trace_point_register(rte_trace_point_t *trace, const char *name, + void (*register_fn)(void)) +{ + RTE_SET_USED(trace); + RTE_SET_USED(name); + RTE_SET_USED(register_fn); + return -ENOTSUP; +} + +/* Launch threads, called at application init(). 
*/ int rte_eal_init(int argc, char **argv) { From patchwork Mon Jun 15 00:43:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71527 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E80D0A0093; Mon, 15 Jun 2020 02:45:31 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 40FCE14583; Mon, 15 Jun 2020 02:44:34 +0200 (CEST) Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196]) by dpdk.org (Postfix) with ESMTP id 2C9B84C8B for ; Mon, 15 Jun 2020 02:44:12 +0200 (CEST) Received: by mail-lj1-f196.google.com with SMTP id s1so17106115ljo.0 for ; Sun, 14 Jun 2020 17:44:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WdapCNY9MaWPrRKKk1uJt6FgQa8jHQ2py3rwmwD8aaY=; b=UOi7bvYTj9SlFKdyimIVMnh0GtDQnsijI/JOSfDhrvt9LXj/vsbZvyLGiIWzfV4+HW cK83IuhRrNfhZf+Oms08Qsz2vAAKRZ3Hfm6qMdo0E39ijarvaCLmEEnxif3wDQZ+Ca1r WW36ODEHRCwmPnl1KftBbpFDpJBroYIrkKTzXqj2p7MVRiLVYEsvC4fGwLqV1JyCIYLv kAGyeOP0momaI8KsENWfmar4T1sCRcoKhXIsRuuiRRtbfbbQdPYZ1j4nBkTm1WaQlVvx N66teYdSTDubz3ithaiVdbXMSIPV1uc4cB7tynrIWshhdFLOpSdD1Og7+/q+XI/ZWTPA rudA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WdapCNY9MaWPrRKKk1uJt6FgQa8jHQ2py3rwmwD8aaY=; b=BMmEV8dfvTVfnsTIWAZA30/8FCCNgoMhjkkD/N1LS7BsbxJNN8dANlTKIH7bexBel2 QP9Xy9rgjVUJm9XfJfO5K847DpV7PNmRUMUmhhpqMSld3yE8PA/S+Wi/KlQhGqifGevC JcVGtkRn5b14sYqOhzah+O8FVGUgVWgl3CgaQzVbP4BXdzxTz3UAmxZG5NOg1mFm5b8D L/cw/xm6r3Si6A7NWrht+64GCrUlrDtXWobF8hUajH1B5raTckUrNE3zVldp+1f0LWkz Cz4NNNw8xeN9aOmY8YMbfR/XyTsfLwMY5nd9yd4bY3opEWm+PfoOzHC5atnm6YBlEOfP 3KGQ== X-Gm-Message-State: AOAM533X93Toar8kfpDxv5avNloJTuDF6iRMfNQyI49im1X1Axhtrnro W25JFqarbeSuUb2E+z/pQ/CdChN9O7sMyA== X-Google-Smtp-Source: ABdhPJxlnnjAE9pvYV/imkHpbPvuXp23odM/SdzUH39djS35SfJ/ad6kKtNVxGSSyCrYT5unhMBwXg== X-Received: by 2002:a05:651c:550:: with SMTP id q16mr9281428ljp.188.1592181851120; Sun, 14 Jun 2020 17:44:11 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:10 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon Date: Mon, 15 Jun 2020 03:43:50 +0300 Message-Id: <20200615004354.14380-9-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 08/12] eal/windows: replace sys/queue.h with a complete one from FreeBSD X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Limited version imported previously lacks at least SLIST macros. Import a complete file from FreeBSD, since its license exception is already approved by Technical Board. Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/windows/include/sys/queue.h | 663 +++++++++++++++++++-- 1 file changed, 601 insertions(+), 62 deletions(-) diff --git a/lib/librte_eal/windows/include/sys/queue.h b/lib/librte_eal/windows/include/sys/queue.h index a65949a78..9756bee6f 100644 --- a/lib/librte_eal/windows/include/sys/queue.h +++ b/lib/librte_eal/windows/include/sys/queue.h @@ -8,7 +8,36 @@ #define _SYS_QUEUE_H_ /* - * This file defines tail queues. + * This file defines four types of data structures: singly-linked lists, + * singly-linked tail queues, lists and tail queues. + * + * A singly-linked list is headed by a single forward pointer. The elements + * are singly linked for minimum space and pointer manipulation overhead at + * the expense of O(n) removal for arbitrary elements. New elements can be + * added to the list after an existing element or at the head of the list. + * Elements being removed from the head of the list should use the explicit + * macro for this purpose for optimum efficiency. A singly-linked list may + * only be traversed in the forward direction. Singly-linked lists are ideal + * for applications with large datasets and few or no removals or for + * implementing a LIFO queue. + * + * A singly-linked tail queue is headed by a pair of pointers, one to the + * head of the list and the other to the tail of the list. The elements are + * singly linked for minimum space and pointer manipulation overhead at the + * expense of O(n) removal for arbitrary elements. New elements can be added + * to the list after an existing element, at the head of the list, or at the + * end of the list. Elements being removed from the head of the tail queue + * should use the explicit macro for this purpose for optimum efficiency. + * A singly-linked tail queue may only be traversed in the forward direction. + * Singly-linked tail queues are ideal for applications with large datasets + * and few or no removals or for implementing a FIFO queue. + * + * A list is headed by a single forward pointer (or an array of forward + * pointers for a hash table header). The elements are doubly linked + * so that an arbitrary element can be removed without a need to + * traverse the list. New elements can be added to the list before + * or after an existing element or at the head of the list. 
A list + * may be traversed in either direction. * * A tail queue is headed by a pair of pointers, one to the head of the * list and the other to the tail of the list. The elements are doubly @@ -17,65 +46,93 @@ * after an existing element, at the head of the list, or at the end of * the list. A tail queue may be traversed in either direction. * + * For details on the use of these macros, see the queue(3) manual page. + * * Below is a summary of implemented functions where: * + means the macro is available * - means the macro is not available * s means the macro is available but is slow (runs in O(n) time) * - * TAILQ - * _HEAD + - * _CLASS_HEAD + - * _HEAD_INITIALIZER + - * _ENTRY + - * _CLASS_ENTRY + - * _INIT + - * _EMPTY + - * _FIRST + - * _NEXT + - * _PREV + - * _LAST + - * _LAST_FAST + - * _FOREACH + - * _FOREACH_FROM + - * _FOREACH_SAFE + - * _FOREACH_FROM_SAFE + - * _FOREACH_REVERSE + - * _FOREACH_REVERSE_FROM + - * _FOREACH_REVERSE_SAFE + - * _FOREACH_REVERSE_FROM_SAFE + - * _INSERT_HEAD + - * _INSERT_BEFORE + - * _INSERT_AFTER + - * _INSERT_TAIL + - * _CONCAT + - * _REMOVE_AFTER - - * _REMOVE_HEAD - - * _REMOVE + - * _SWAP + + * SLIST LIST STAILQ TAILQ + * _HEAD + + + + + * _CLASS_HEAD + + + + + * _HEAD_INITIALIZER + + + + + * _ENTRY + + + + + * _CLASS_ENTRY + + + + + * _INIT + + + + + * _EMPTY + + + + + * _FIRST + + + + + * _NEXT + + + + + * _PREV - + - + + * _LAST - - + + + * _LAST_FAST - - - + + * _FOREACH + + + + + * _FOREACH_FROM + + + + + * _FOREACH_SAFE + + + + + * _FOREACH_FROM_SAFE + + + + + * _FOREACH_REVERSE - - - + + * _FOREACH_REVERSE_FROM - - - + + * _FOREACH_REVERSE_SAFE - - - + + * _FOREACH_REVERSE_FROM_SAFE - - - + + * _INSERT_HEAD + + + + + * _INSERT_BEFORE - + - + + * _INSERT_AFTER + + + + + * _INSERT_TAIL - - + + + * _CONCAT s s + + + * _REMOVE_AFTER + - + - + * _REMOVE_HEAD + - + - + * _REMOVE s + s + + * _SWAP + + + + * */ - -#ifdef __cplusplus -extern "C" { +#ifdef QUEUE_MACRO_DEBUG +#warn Use QUEUE_MACRO_DEBUG_TRACE and/or QUEUE_MACRO_DEBUG_TRASH +#define QUEUE_MACRO_DEBUG_TRACE +#define QUEUE_MACRO_DEBUG_TRASH #endif -/* - * List definitions. 
- */ -#define LIST_HEAD(name, type) \ -struct name { \ - struct type *lh_first; /* first element */ \ -} +#ifdef QUEUE_MACRO_DEBUG_TRACE +/* Store the last 2 places the queue element or head was altered */ +struct qm_trace { + unsigned long lastline; + unsigned long prevline; + const char *lastfile; + const char *prevfile; +}; + +#define TRACEBUF struct qm_trace trace; +#define TRACEBUF_INITIALIZER { __LINE__, 0, __FILE__, NULL } , + +#define QMD_TRACE_HEAD(head) do { \ + (head)->trace.prevline = (head)->trace.lastline; \ + (head)->trace.prevfile = (head)->trace.lastfile; \ + (head)->trace.lastline = __LINE__; \ + (head)->trace.lastfile = __FILE__; \ +} while (0) +#define QMD_TRACE_ELEM(elem) do { \ + (elem)->trace.prevline = (elem)->trace.lastline; \ + (elem)->trace.prevfile = (elem)->trace.lastfile; \ + (elem)->trace.lastline = __LINE__; \ + (elem)->trace.lastfile = __FILE__; \ +} while (0) + +#else /* !QUEUE_MACRO_DEBUG_TRACE */ #define QMD_TRACE_ELEM(elem) #define QMD_TRACE_HEAD(head) #define TRACEBUF #define TRACEBUF_INITIALIZER +#endif /* QUEUE_MACRO_DEBUG_TRACE */ +#ifdef QUEUE_MACRO_DEBUG_TRASH +#define QMD_SAVELINK(name, link) void **name = (void *)&(link) +#define TRASHIT(x) do {(x) = (void *)-1;} while (0) +#define QMD_IS_TRASHED(x) ((x) == (void *)(intptr_t)-1) +#else /* !QUEUE_MACRO_DEBUG_TRASH */ +#define QMD_SAVELINK(name, link) #define TRASHIT(x) #define QMD_IS_TRASHED(x) 0 - -#define QMD_SAVELINK(name, link) +#endif /* QUEUE_MACRO_DEBUG_TRASH */ #ifdef __cplusplus /* @@ -86,6 +143,445 @@ struct name { \ #define QUEUE_TYPEOF(type) struct type #endif +/* + * Singly-linked List declarations. + */ +#define SLIST_HEAD(name, type) \ +struct name { \ + struct type *slh_first; /* first element */ \ +} + +#define SLIST_CLASS_HEAD(name, type) \ +struct name { \ + class type *slh_first; /* first element */ \ +} + +#define SLIST_HEAD_INITIALIZER(head) \ + { NULL } + +#define SLIST_ENTRY(type) \ +struct { \ + struct type *sle_next; /* next element */ \ +} + +#define SLIST_CLASS_ENTRY(type) \ +struct { \ + class type *sle_next; /* next element */ \ +} + +/* + * Singly-linked List functions. + */ +#if (defined(_KERNEL) && defined(INVARIANTS)) +#define QMD_SLIST_CHECK_PREVPTR(prevp, elm) do { \ + if (*(prevp) != (elm)) \ + panic("Bad prevptr *(%p) == %p != %p", \ + (prevp), *(prevp), (elm)); \ +} while (0) +#else +#define QMD_SLIST_CHECK_PREVPTR(prevp, elm) +#endif + +#define SLIST_CONCAT(head1, head2, type, field) do { \ + QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head1); \ + if (curelm == NULL) { \ + if ((SLIST_FIRST(head1) = SLIST_FIRST(head2)) != NULL) \ + SLIST_INIT(head2); \ + } else if (SLIST_FIRST(head2) != NULL) { \ + while (SLIST_NEXT(curelm, field) != NULL) \ + curelm = SLIST_NEXT(curelm, field); \ + SLIST_NEXT(curelm, field) = SLIST_FIRST(head2); \ + SLIST_INIT(head2); \ + } \ +} while (0) + +#define SLIST_EMPTY(head) ((head)->slh_first == NULL) + +#define SLIST_FIRST(head) ((head)->slh_first) + +#define SLIST_FOREACH(var, head, field) \ + for ((var) = SLIST_FIRST((head)); \ + (var); \ + (var) = SLIST_NEXT((var), field)) + +#define SLIST_FOREACH_FROM(var, head, field) \ + for ((var) = ((var) ? (var) : SLIST_FIRST((head))); \ + (var); \ + (var) = SLIST_NEXT((var), field)) + +#define SLIST_FOREACH_SAFE(var, head, field, tvar) \ + for ((var) = SLIST_FIRST((head)); \ + (var) && ((tvar) = SLIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define SLIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ + for ((var) = ((var) ? 
(var) : SLIST_FIRST((head))); \ + (var) && ((tvar) = SLIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define SLIST_FOREACH_PREVPTR(var, varp, head, field) \ + for ((varp) = &SLIST_FIRST((head)); \ + ((var) = *(varp)) != NULL; \ + (varp) = &SLIST_NEXT((var), field)) + +#define SLIST_INIT(head) do { \ + SLIST_FIRST((head)) = NULL; \ +} while (0) + +#define SLIST_INSERT_AFTER(slistelm, elm, field) do { \ + SLIST_NEXT((elm), field) = SLIST_NEXT((slistelm), field); \ + SLIST_NEXT((slistelm), field) = (elm); \ +} while (0) + +#define SLIST_INSERT_HEAD(head, elm, field) do { \ + SLIST_NEXT((elm), field) = SLIST_FIRST((head)); \ + SLIST_FIRST((head)) = (elm); \ +} while (0) + +#define SLIST_NEXT(elm, field) ((elm)->field.sle_next) + +#define SLIST_REMOVE(head, elm, type, field) do { \ + QMD_SAVELINK(oldnext, (elm)->field.sle_next); \ + if (SLIST_FIRST((head)) == (elm)) { \ + SLIST_REMOVE_HEAD((head), field); \ + } \ + else { \ + QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head); \ + while (SLIST_NEXT(curelm, field) != (elm)) \ + curelm = SLIST_NEXT(curelm, field); \ + SLIST_REMOVE_AFTER(curelm, field); \ + } \ + TRASHIT(*oldnext); \ +} while (0) + +#define SLIST_REMOVE_AFTER(elm, field) do { \ + SLIST_NEXT(elm, field) = \ + SLIST_NEXT(SLIST_NEXT(elm, field), field); \ +} while (0) + +#define SLIST_REMOVE_HEAD(head, field) do { \ + SLIST_FIRST((head)) = SLIST_NEXT(SLIST_FIRST((head)), field); \ +} while (0) + +#define SLIST_REMOVE_PREVPTR(prevp, elm, field) do { \ + QMD_SLIST_CHECK_PREVPTR(prevp, elm); \ + *(prevp) = SLIST_NEXT(elm, field); \ + TRASHIT((elm)->field.sle_next); \ +} while (0) + +#define SLIST_SWAP(head1, head2, type) do { \ + QUEUE_TYPEOF(type) *swap_first = SLIST_FIRST(head1); \ + SLIST_FIRST(head1) = SLIST_FIRST(head2); \ + SLIST_FIRST(head2) = swap_first; \ +} while (0) + +/* + * Singly-linked Tail queue declarations. + */ +#define STAILQ_HEAD(name, type) \ +struct name { \ + struct type *stqh_first;/* first element */ \ + struct type **stqh_last;/* addr of last next element */ \ +} + +#define STAILQ_CLASS_HEAD(name, type) \ +struct name { \ + class type *stqh_first; /* first element */ \ + class type **stqh_last; /* addr of last next element */ \ +} + +#define STAILQ_HEAD_INITIALIZER(head) \ + { NULL, &(head).stqh_first } + +#define STAILQ_ENTRY(type) \ +struct { \ + struct type *stqe_next; /* next element */ \ +} + +#define STAILQ_CLASS_ENTRY(type) \ +struct { \ + class type *stqe_next; /* next element */ \ +} + +/* + * Singly-linked Tail queue functions. + */ +#define STAILQ_CONCAT(head1, head2) do { \ + if (!STAILQ_EMPTY((head2))) { \ + *(head1)->stqh_last = (head2)->stqh_first; \ + (head1)->stqh_last = (head2)->stqh_last; \ + STAILQ_INIT((head2)); \ + } \ +} while (0) + +#define STAILQ_EMPTY(head) ((head)->stqh_first == NULL) + +#define STAILQ_FIRST(head) ((head)->stqh_first) + +#define STAILQ_FOREACH(var, head, field) \ + for((var) = STAILQ_FIRST((head)); \ + (var); \ + (var) = STAILQ_NEXT((var), field)) + +#define STAILQ_FOREACH_FROM(var, head, field) \ + for ((var) = ((var) ? (var) : STAILQ_FIRST((head))); \ + (var); \ + (var) = STAILQ_NEXT((var), field)) + +#define STAILQ_FOREACH_SAFE(var, head, field, tvar) \ + for ((var) = STAILQ_FIRST((head)); \ + (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define STAILQ_FOREACH_FROM_SAFE(var, head, field, tvar) \ + for ((var) = ((var) ? 
(var) : STAILQ_FIRST((head))); \ + (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define STAILQ_INIT(head) do { \ + STAILQ_FIRST((head)) = NULL; \ + (head)->stqh_last = &STAILQ_FIRST((head)); \ +} while (0) + +#define STAILQ_INSERT_AFTER(head, tqelm, elm, field) do { \ + if ((STAILQ_NEXT((elm), field) = STAILQ_NEXT((tqelm), field)) == NULL)\ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ + STAILQ_NEXT((tqelm), field) = (elm); \ +} while (0) + +#define STAILQ_INSERT_HEAD(head, elm, field) do { \ + if ((STAILQ_NEXT((elm), field) = STAILQ_FIRST((head))) == NULL) \ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ + STAILQ_FIRST((head)) = (elm); \ +} while (0) + +#define STAILQ_INSERT_TAIL(head, elm, field) do { \ + STAILQ_NEXT((elm), field) = NULL; \ + *(head)->stqh_last = (elm); \ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ +} while (0) + +#define STAILQ_LAST(head, type, field) \ + (STAILQ_EMPTY((head)) ? NULL : \ + __containerof((head)->stqh_last, \ + QUEUE_TYPEOF(type), field.stqe_next)) + +#define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next) + +#define STAILQ_REMOVE(head, elm, type, field) do { \ + QMD_SAVELINK(oldnext, (elm)->field.stqe_next); \ + if (STAILQ_FIRST((head)) == (elm)) { \ + STAILQ_REMOVE_HEAD((head), field); \ + } \ + else { \ + QUEUE_TYPEOF(type) *curelm = STAILQ_FIRST(head); \ + while (STAILQ_NEXT(curelm, field) != (elm)) \ + curelm = STAILQ_NEXT(curelm, field); \ + STAILQ_REMOVE_AFTER(head, curelm, field); \ + } \ + TRASHIT(*oldnext); \ +} while (0) + +#define STAILQ_REMOVE_AFTER(head, elm, field) do { \ + if ((STAILQ_NEXT(elm, field) = \ + STAILQ_NEXT(STAILQ_NEXT(elm, field), field)) == NULL) \ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ +} while (0) + +#define STAILQ_REMOVE_HEAD(head, field) do { \ + if ((STAILQ_FIRST((head)) = \ + STAILQ_NEXT(STAILQ_FIRST((head)), field)) == NULL) \ + (head)->stqh_last = &STAILQ_FIRST((head)); \ +} while (0) + +#define STAILQ_SWAP(head1, head2, type) do { \ + QUEUE_TYPEOF(type) *swap_first = STAILQ_FIRST(head1); \ + QUEUE_TYPEOF(type) **swap_last = (head1)->stqh_last; \ + STAILQ_FIRST(head1) = STAILQ_FIRST(head2); \ + (head1)->stqh_last = (head2)->stqh_last; \ + STAILQ_FIRST(head2) = swap_first; \ + (head2)->stqh_last = swap_last; \ + if (STAILQ_EMPTY(head1)) \ + (head1)->stqh_last = &STAILQ_FIRST(head1); \ + if (STAILQ_EMPTY(head2)) \ + (head2)->stqh_last = &STAILQ_FIRST(head2); \ +} while (0) + + +/* + * List declarations. + */ +#define LIST_HEAD(name, type) \ +struct name { \ + struct type *lh_first; /* first element */ \ +} + +#define LIST_CLASS_HEAD(name, type) \ +struct name { \ + class type *lh_first; /* first element */ \ +} + +#define LIST_HEAD_INITIALIZER(head) \ + { NULL } + +#define LIST_ENTRY(type) \ +struct { \ + struct type *le_next; /* next element */ \ + struct type **le_prev; /* address of previous next element */ \ +} + +#define LIST_CLASS_ENTRY(type) \ +struct { \ + class type *le_next; /* next element */ \ + class type **le_prev; /* address of previous next element */ \ +} + +/* + * List functions. + */ + +#if (defined(_KERNEL) && defined(INVARIANTS)) +/* + * QMD_LIST_CHECK_HEAD(LIST_HEAD *head, LIST_ENTRY NAME) + * + * If the list is non-empty, validates that the first element of the list + * points back at 'head.' 
+ */ +#define QMD_LIST_CHECK_HEAD(head, field) do { \ + if (LIST_FIRST((head)) != NULL && \ + LIST_FIRST((head))->field.le_prev != \ + &LIST_FIRST((head))) \ + panic("Bad list head %p first->prev != head", (head)); \ +} while (0) + +/* + * QMD_LIST_CHECK_NEXT(TYPE *elm, LIST_ENTRY NAME) + * + * If an element follows 'elm' in the list, validates that the next element + * points back at 'elm.' + */ +#define QMD_LIST_CHECK_NEXT(elm, field) do { \ + if (LIST_NEXT((elm), field) != NULL && \ + LIST_NEXT((elm), field)->field.le_prev != \ + &((elm)->field.le_next)) \ + panic("Bad link elm %p next->prev != elm", (elm)); \ +} while (0) + +/* + * QMD_LIST_CHECK_PREV(TYPE *elm, LIST_ENTRY NAME) + * + * Validates that the previous element (or head of the list) points to 'elm.' + */ +#define QMD_LIST_CHECK_PREV(elm, field) do { \ + if (*(elm)->field.le_prev != (elm)) \ + panic("Bad link elm %p prev->next != elm", (elm)); \ +} while (0) +#else +#define QMD_LIST_CHECK_HEAD(head, field) +#define QMD_LIST_CHECK_NEXT(elm, field) +#define QMD_LIST_CHECK_PREV(elm, field) +#endif /* (_KERNEL && INVARIANTS) */ + +#define LIST_CONCAT(head1, head2, type, field) do { \ + QUEUE_TYPEOF(type) *curelm = LIST_FIRST(head1); \ + if (curelm == NULL) { \ + if ((LIST_FIRST(head1) = LIST_FIRST(head2)) != NULL) { \ + LIST_FIRST(head2)->field.le_prev = \ + &LIST_FIRST((head1)); \ + LIST_INIT(head2); \ + } \ + } else if (LIST_FIRST(head2) != NULL) { \ + while (LIST_NEXT(curelm, field) != NULL) \ + curelm = LIST_NEXT(curelm, field); \ + LIST_NEXT(curelm, field) = LIST_FIRST(head2); \ + LIST_FIRST(head2)->field.le_prev = &LIST_NEXT(curelm, field); \ + LIST_INIT(head2); \ + } \ +} while (0) + +#define LIST_EMPTY(head) ((head)->lh_first == NULL) + +#define LIST_FIRST(head) ((head)->lh_first) + +#define LIST_FOREACH(var, head, field) \ + for ((var) = LIST_FIRST((head)); \ + (var); \ + (var) = LIST_NEXT((var), field)) + +#define LIST_FOREACH_FROM(var, head, field) \ + for ((var) = ((var) ? (var) : LIST_FIRST((head))); \ + (var); \ + (var) = LIST_NEXT((var), field)) + +#define LIST_FOREACH_SAFE(var, head, field, tvar) \ + for ((var) = LIST_FIRST((head)); \ + (var) && ((tvar) = LIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define LIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ + for ((var) = ((var) ? 
(var) : LIST_FIRST((head))); \ + (var) && ((tvar) = LIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define LIST_INIT(head) do { \ + LIST_FIRST((head)) = NULL; \ +} while (0) + +#define LIST_INSERT_AFTER(listelm, elm, field) do { \ + QMD_LIST_CHECK_NEXT(listelm, field); \ + if ((LIST_NEXT((elm), field) = LIST_NEXT((listelm), field)) != NULL)\ + LIST_NEXT((listelm), field)->field.le_prev = \ + &LIST_NEXT((elm), field); \ + LIST_NEXT((listelm), field) = (elm); \ + (elm)->field.le_prev = &LIST_NEXT((listelm), field); \ +} while (0) + +#define LIST_INSERT_BEFORE(listelm, elm, field) do { \ + QMD_LIST_CHECK_PREV(listelm, field); \ + (elm)->field.le_prev = (listelm)->field.le_prev; \ + LIST_NEXT((elm), field) = (listelm); \ + *(listelm)->field.le_prev = (elm); \ + (listelm)->field.le_prev = &LIST_NEXT((elm), field); \ +} while (0) + +#define LIST_INSERT_HEAD(head, elm, field) do { \ + QMD_LIST_CHECK_HEAD((head), field); \ + if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \ + LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field);\ + LIST_FIRST((head)) = (elm); \ + (elm)->field.le_prev = &LIST_FIRST((head)); \ +} while (0) + +#define LIST_NEXT(elm, field) ((elm)->field.le_next) + +#define LIST_PREV(elm, head, type, field) \ + ((elm)->field.le_prev == &LIST_FIRST((head)) ? NULL : \ + __containerof((elm)->field.le_prev, \ + QUEUE_TYPEOF(type), field.le_next)) + +#define LIST_REMOVE(elm, field) do { \ + QMD_SAVELINK(oldnext, (elm)->field.le_next); \ + QMD_SAVELINK(oldprev, (elm)->field.le_prev); \ + QMD_LIST_CHECK_NEXT(elm, field); \ + QMD_LIST_CHECK_PREV(elm, field); \ + if (LIST_NEXT((elm), field) != NULL) \ + LIST_NEXT((elm), field)->field.le_prev = \ + (elm)->field.le_prev; \ + *(elm)->field.le_prev = LIST_NEXT((elm), field); \ + TRASHIT(*oldnext); \ + TRASHIT(*oldprev); \ +} while (0) + +#define LIST_SWAP(head1, head2, type, field) do { \ + QUEUE_TYPEOF(type) *swap_tmp = LIST_FIRST(head1); \ + LIST_FIRST((head1)) = LIST_FIRST((head2)); \ + LIST_FIRST((head2)) = swap_tmp; \ + if ((swap_tmp = LIST_FIRST((head1))) != NULL) \ + swap_tmp->field.le_prev = &LIST_FIRST((head1)); \ + if ((swap_tmp = LIST_FIRST((head2))) != NULL) \ + swap_tmp->field.le_prev = &LIST_FIRST((head2)); \ +} while (0) + /* * Tail queue declarations. */ @@ -123,10 +619,58 @@ struct { \ /* * Tail queue functions. */ +#if (defined(_KERNEL) && defined(INVARIANTS)) +/* + * QMD_TAILQ_CHECK_HEAD(TAILQ_HEAD *head, TAILQ_ENTRY NAME) + * + * If the tailq is non-empty, validates that the first element of the tailq + * points back at 'head.' + */ +#define QMD_TAILQ_CHECK_HEAD(head, field) do { \ + if (!TAILQ_EMPTY(head) && \ + TAILQ_FIRST((head))->field.tqe_prev != \ + &TAILQ_FIRST((head))) \ + panic("Bad tailq head %p first->prev != head", (head)); \ +} while (0) + +/* + * QMD_TAILQ_CHECK_TAIL(TAILQ_HEAD *head, TAILQ_ENTRY NAME) + * + * Validates that the tail of the tailq is a pointer to pointer to NULL. + */ +#define QMD_TAILQ_CHECK_TAIL(head, field) do { \ + if (*(head)->tqh_last != NULL) \ + panic("Bad tailq NEXT(%p->tqh_last) != NULL", (head)); \ +} while (0) + +/* + * QMD_TAILQ_CHECK_NEXT(TYPE *elm, TAILQ_ENTRY NAME) + * + * If an element follows 'elm' in the tailq, validates that the next element + * points back at 'elm.' 
+ */ +#define QMD_TAILQ_CHECK_NEXT(elm, field) do { \ + if (TAILQ_NEXT((elm), field) != NULL && \ + TAILQ_NEXT((elm), field)->field.tqe_prev != \ + &((elm)->field.tqe_next)) \ + panic("Bad link elm %p next->prev != elm", (elm)); \ +} while (0) + +/* + * QMD_TAILQ_CHECK_PREV(TYPE *elm, TAILQ_ENTRY NAME) + * + * Validates that the previous element (or head of the tailq) points to 'elm.' + */ +#define QMD_TAILQ_CHECK_PREV(elm, field) do { \ + if (*(elm)->field.tqe_prev != (elm)) \ + panic("Bad link elm %p prev->next != elm", (elm)); \ +} while (0) +#else #define QMD_TAILQ_CHECK_HEAD(head, field) #define QMD_TAILQ_CHECK_TAIL(head, headname) #define QMD_TAILQ_CHECK_NEXT(elm, field) #define QMD_TAILQ_CHECK_PREV(elm, field) +#endif /* (_KERNEL && INVARIANTS) */ #define TAILQ_CONCAT(head1, head2, field) do { \ if (!TAILQ_EMPTY(head2)) { \ @@ -191,9 +735,8 @@ struct { \ #define TAILQ_INSERT_AFTER(head, listelm, elm, field) do { \ QMD_TAILQ_CHECK_NEXT(listelm, field); \ - TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field); \ - if (TAILQ_NEXT((listelm), field) != NULL) \ - TAILQ_NEXT((elm), field)->field.tqe_prev = \ + if ((TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field)) != NULL)\ + TAILQ_NEXT((elm), field)->field.tqe_prev = \ &TAILQ_NEXT((elm), field); \ else { \ (head)->tqh_last = &TAILQ_NEXT((elm), field); \ @@ -217,8 +760,7 @@ struct { \ #define TAILQ_INSERT_HEAD(head, elm, field) do { \ QMD_TAILQ_CHECK_HEAD(head, field); \ - TAILQ_NEXT((elm), field) = TAILQ_FIRST((head)); \ - if (TAILQ_FIRST((head)) != NULL) \ + if ((TAILQ_NEXT((elm), field) = TAILQ_FIRST((head))) != NULL) \ TAILQ_FIRST((head))->field.tqe_prev = \ &TAILQ_NEXT((elm), field); \ else \ @@ -250,21 +792,24 @@ struct { \ * you may want to prefetch the last data element. */ #define TAILQ_LAST_FAST(head, type, field) \ - (TAILQ_EMPTY(head) ? NULL : __containerof((head)->tqh_last, \ - QUEUE_TYPEOF(type), field.tqe_next)) + (TAILQ_EMPTY(head) ? NULL : __containerof((head)->tqh_last, QUEUE_TYPEOF(type), field.tqe_next)) #define TAILQ_NEXT(elm, field) ((elm)->field.tqe_next) #define TAILQ_PREV(elm, headname, field) \ (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last)) +#define TAILQ_PREV_FAST(elm, head, type, field) \ + ((elm)->field.tqe_prev == &(head)->tqh_first ? 
NULL : \ + __containerof((elm)->field.tqe_prev, QUEUE_TYPEOF(type), field.tqe_next)) + #define TAILQ_REMOVE(head, elm, field) do { \ QMD_SAVELINK(oldnext, (elm)->field.tqe_next); \ QMD_SAVELINK(oldprev, (elm)->field.tqe_prev); \ QMD_TAILQ_CHECK_NEXT(elm, field); \ QMD_TAILQ_CHECK_PREV(elm, field); \ if ((TAILQ_NEXT((elm), field)) != NULL) \ - TAILQ_NEXT((elm), field)->field.tqe_prev = \ + TAILQ_NEXT((elm), field)->field.tqe_prev = \ (elm)->field.tqe_prev; \ else { \ (head)->tqh_last = (elm)->field.tqe_prev; \ @@ -277,26 +822,20 @@ struct { \ } while (0) #define TAILQ_SWAP(head1, head2, type, field) do { \ - QUEUE_TYPEOF(type) * swap_first = (head1)->tqh_first; \ - QUEUE_TYPEOF(type) * *swap_last = (head1)->tqh_last; \ + QUEUE_TYPEOF(type) *swap_first = (head1)->tqh_first; \ + QUEUE_TYPEOF(type) **swap_last = (head1)->tqh_last; \ (head1)->tqh_first = (head2)->tqh_first; \ (head1)->tqh_last = (head2)->tqh_last; \ (head2)->tqh_first = swap_first; \ (head2)->tqh_last = swap_last; \ - swap_first = (head1)->tqh_first; \ - if (swap_first != NULL) \ + if ((swap_first = (head1)->tqh_first) != NULL) \ swap_first->field.tqe_prev = &(head1)->tqh_first; \ else \ (head1)->tqh_last = &(head1)->tqh_first; \ - swap_first = (head2)->tqh_first; \ - if (swap_first != NULL) \ + if ((swap_first = (head2)->tqh_first) != NULL) \ swap_first->field.tqe_prev = &(head2)->tqh_first; \ else \ (head2)->tqh_last = &(head2)->tqh_first; \ } while (0) -#ifdef __cplusplus -} -#endif - -#endif /* _SYS_QUEUE_H_ */ +#endif /* !_SYS_QUEUE_H_ */ From patchwork Mon Jun 15 00:43:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71528 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 98396A0093; Mon, 15 Jun 2020 02:45:41 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 740A91BE99; Mon, 15 Jun 2020 02:44:35 +0200 (CEST) Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193]) by dpdk.org (Postfix) with ESMTP id 9628B4C7A for ; Mon, 15 Jun 2020 02:44:12 +0200 (CEST) Received: by mail-lj1-f193.google.com with SMTP id x18so17054154lji.1 for ; Sun, 14 Jun 2020 17:44:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=SJMl5RYeSpJR0c08k8R+RC4d3cmcUx9XLA3/R+S799A=; b=XqsbRDBufRy7wXCgTPhdKzffhWgVmrCSYVzztlBpfAX5xvea9nDn6cqBfwbiALl6ql 6nN9h6EaKXFISs+4atDLcGMbcqCM4Xk2fQTIXh8pifFosDgbMNykIr4R7Th6Jnu+ZqQq FCyZ/HPeNkrHUjTbS9gkFUkGd4PnXjt20FwGEp7F6rM7Jr1G+CAk2/O3g4cCurdEyyhZ hi4QxULyA6fv+3opJRbEUTIUWsfB2G9ZuoY9v9BKc7TDYYNw9bnGR8/qDWxk7u7bD8Nk xUm47PaxR4TWFudn076B4u7eNnUp73+IYfUqIpE6fJMPiz6KbK//unP8aA1Wbom9rVXM shZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=SJMl5RYeSpJR0c08k8R+RC4d3cmcUx9XLA3/R+S799A=; b=odcg/plRb82dpJxO7TVad0vXxjFvq0WxWXLU3outCstGaP6T9+tzI1J6Xga0z/6knW Vv3brI8AX55ijh0OiuWPvyZa2bjR5vU2bly6A9XHMEjYWQPCGNErySr4Yur8FcSQuLFn kmrffcmOm0XqqRt87Yn5JvR0FPPi1+2ax86kdMXpBkXKRch9BvmWYxD6PpFojjz1uCTa v4VVvjNCzosHOd1VTWQnd/9SNOzc5YFCwHjoFE7C3kRNaCxDap3Rut95elYQlKE1Ayol 
RsMRkr6ZYj5/cQlJ2Q+c9OnvUx1ZWS8uoqJkY8yj6Yb8JsrWcHlidX1rZrLCk3i5FMW+ SClw== X-Gm-Message-State: AOAM5319RSY5DRZcWXDtLPnGTrRLRu5CNPvwbhzipjV1bbg1EA/1CJj/ 4Ccwq0D8zRFPyq/pJFA2dkf5okCUPDn1JQ== X-Google-Smtp-Source: ABdhPJz/kzuLUO3vsu7SIl/ykfnQeKRwjBW1s7w1S900F9tiQ3SpNPcEullq0GbAl91xr+IR+7Ztkg== X-Received: by 2002:a2e:3a04:: with SMTP id h4mr7144796lja.103.1592181852018; Sun, 14 Jun 2020 17:44:12 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. [37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:11 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon Date: Mon, 15 Jun 2020 03:43:51 +0300 Message-Id: <20200615004354.14380-10-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 09/12] eal/windows: improve CPU and NUMA node detection X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 1. Map CPU cores to their respective NUMA nodes as reported by system. 2. Support systems with more than 64 cores (multiple processor groups). 3. Fix magic constants, styling issues, and compiler warnings. 4. Add EAL private function to map DPDK socket ID to NUMA node number. Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/windows/eal.c | 7 +- lib/librte_eal/windows/eal_lcore.c | 205 +++++++++++++++++---------- lib/librte_eal/windows/eal_windows.h | 15 +- 3 files changed, 152 insertions(+), 75 deletions(-) diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index e7461f731..dfc10b494 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -263,8 +263,11 @@ rte_eal_init(int argc, char **argv) eal_log_level_parse(argc, argv); - /* create a map of all processors in the system */ - eal_create_cpu_map(); + if (eal_create_cpu_map() < 0) { + rte_eal_init_alert("Cannot discover CPU and NUMA."); + /* rte_errno is set */ + return -1; + } if (rte_eal_cpu_init() < 0) { rte_eal_init_alert("Cannot detect lcores."); diff --git a/lib/librte_eal/windows/eal_lcore.c b/lib/librte_eal/windows/eal_lcore.c index 82ee45413..d5ff721e0 100644 --- a/lib/librte_eal/windows/eal_lcore.c +++ b/lib/librte_eal/windows/eal_lcore.c @@ -3,103 +3,164 @@ */ #include +#include #include #include +#include +#include +#include #include "eal_private.h" #include "eal_thread.h" #include "eal_windows.h" -/* global data structure that contains the CPU map */ -static struct _wcpu_map { - unsigned int total_procs; - unsigned int proc_sockets; - unsigned int proc_cores; - unsigned int reserved; - struct _win_lcore_map { - uint8_t socket_id; - uint8_t core_id; - } wlcore_map[RTE_MAX_LCORE]; -} wcpu_map = { 0 }; - -/* - * Create a map of all processors and associated cores on the system - */ -void -eal_create_cpu_map() +/** Number of logical processors (cores) in a processor group (32 or 64). 
*/ +#define EAL_PROCESSOR_GROUP_SIZE (sizeof(KAFFINITY) * CHAR_BIT) + +struct lcore_map { + uint8_t socket_id; + uint8_t core_id; +}; + +struct socket_map { + uint16_t node_id; +}; + +struct cpu_map { + unsigned int socket_count; + unsigned int lcore_count; + struct lcore_map lcores[RTE_MAX_LCORE]; + struct socket_map sockets[RTE_MAX_NUMA_NODES]; +}; + +static struct cpu_map cpu_map = { 0 }; + +/* eal_create_cpu_map() is called before logging is initialized */ +static void +log_early(const char *format, ...) +{ + va_list va; + + va_start(va, format); + vfprintf(stderr, format, va); + va_end(va); +} + +int +eal_create_cpu_map(void) { - wcpu_map.total_procs = - GetActiveProcessorCount(ALL_PROCESSOR_GROUPS); - - LOGICAL_PROCESSOR_RELATIONSHIP lprocRel; - DWORD lprocInfoSize = 0; - BOOL ht_enabled = FALSE; - - /* First get the processor package information */ - lprocRel = RelationProcessorPackage; - /* Determine the size of buffer we need (pass NULL) */ - GetLogicalProcessorInformationEx(lprocRel, NULL, &lprocInfoSize); - wcpu_map.proc_sockets = lprocInfoSize / 48; - - lprocInfoSize = 0; - /* Next get the processor core information */ - lprocRel = RelationProcessorCore; - GetLogicalProcessorInformationEx(lprocRel, NULL, &lprocInfoSize); - wcpu_map.proc_cores = lprocInfoSize / 48; - - if (wcpu_map.total_procs > wcpu_map.proc_cores) - ht_enabled = TRUE; - - /* Distribute the socket and core ids appropriately - * across the logical cores. For now, split the cores - * equally across the sockets. - */ - unsigned int lcore = 0; - for (unsigned int socket = 0; socket < - wcpu_map.proc_sockets; ++socket) { - for (unsigned int core = 0; - core < (wcpu_map.proc_cores / wcpu_map.proc_sockets); - ++core) { - wcpu_map.wlcore_map[lcore] - .socket_id = socket; - wcpu_map.wlcore_map[lcore] - .core_id = core; - lcore++; - if (ht_enabled) { - wcpu_map.wlcore_map[lcore] - .socket_id = socket; - wcpu_map.wlcore_map[lcore] - .core_id = core; - lcore++; + SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos, *info; + DWORD infos_size; + bool full = false; + + infos_size = 0; + if (!GetLogicalProcessorInformationEx( + RelationNumaNode, NULL, &infos_size)) { + DWORD error = GetLastError(); + if (error != ERROR_INSUFFICIENT_BUFFER) { + log_early("Cannot get NUMA node info size, error %lu\n", + GetLastError()); + rte_errno = ENOMEM; + return -1; + } + } + + infos = malloc(infos_size); + if (infos == NULL) { + log_early("Cannot allocate memory for NUMA node information\n"); + rte_errno = ENOMEM; + return -1; + } + + if (!GetLogicalProcessorInformationEx( + RelationNumaNode, infos, &infos_size)) { + log_early("Cannot get NUMA node information, error %lu\n", + GetLastError()); + rte_errno = EINVAL; + return -1; + } + + info = infos; + while ((uint8_t *)info - (uint8_t *)infos < infos_size) { + unsigned int node_id = info->NumaNode.NodeNumber; + GROUP_AFFINITY *cores = &info->NumaNode.GroupMask; + struct lcore_map *lcore; + unsigned int i, socket_id; + + /* NUMA node may be reported multiple times if it includes + * cores from different processor groups, e. g. 80 cores + * of a physical processor comprise one NUMA node, but two + * processor groups, because group size is limited by 32/64. 
+ */ + for (socket_id = 0; socket_id < cpu_map.socket_count; + socket_id++) { + if (cpu_map.sockets[socket_id].node_id == node_id) + break; + } + + if (socket_id == cpu_map.socket_count) { + if (socket_id == RTE_DIM(cpu_map.sockets)) { + full = true; + goto exit; + } + + cpu_map.sockets[socket_id].node_id = node_id; + cpu_map.socket_count++; + } + + for (i = 0; i < EAL_PROCESSOR_GROUP_SIZE; i++) { + if ((cores->Mask & ((KAFFINITY)1 << i)) == 0) + continue; + + if (cpu_map.lcore_count == RTE_DIM(cpu_map.lcores)) { + full = true; + goto exit; } + + lcore = &cpu_map.lcores[cpu_map.lcore_count]; + lcore->socket_id = socket_id; + lcore->core_id = + cores->Group * EAL_PROCESSOR_GROUP_SIZE + i; + cpu_map.lcore_count++; } + + info = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *)( + (uint8_t *)info + info->Size); } + +exit: + if (full) { + /* Not a fatal error, but important for troubleshooting. */ + log_early("Enumerated maximum of %u NUMA nodes and %u cores\n", + cpu_map.socket_count, cpu_map.lcore_count); + } + + free(infos); + + return 0; } -/* - * Check if a cpu is present by the presence of the cpu information for it - */ int eal_cpu_detected(unsigned int lcore_id) { - return (lcore_id < wcpu_map.total_procs); + return lcore_id < cpu_map.lcore_count; } -/* - * Get CPU socket id for a logical core - */ unsigned eal_cpu_socket_id(unsigned int lcore_id) { - return wcpu_map.wlcore_map[lcore_id].socket_id; + return cpu_map.lcores[lcore_id].socket_id; } -/* - * Get CPU socket id (NUMA node) for a logical core - */ unsigned eal_cpu_core_id(unsigned int lcore_id) { - return wcpu_map.wlcore_map[lcore_id].core_id; + return cpu_map.lcores[lcore_id].core_id; +} + +unsigned int +eal_socket_numa_node(unsigned int socket_id) +{ + return cpu_map.sockets[socket_id].node_id; } diff --git a/lib/librte_eal/windows/eal_windows.h b/lib/librte_eal/windows/eal_windows.h index fadd676b2..f3ed8c37f 100644 --- a/lib/librte_eal/windows/eal_windows.h +++ b/lib/librte_eal/windows/eal_windows.h @@ -13,8 +13,11 @@ /** * Create a map of processors and cores on the system. + * + * @return + * 0 on success, (-1) on failure and rte_errno is set. */ -void eal_create_cpu_map(void); +int eal_create_cpu_map(void); /** * Create a thread. @@ -26,4 +29,14 @@ void eal_create_cpu_map(void); */ int eal_thread_create(pthread_t *thread); +/** + * Get system NUMA node number for a socket ID. + * + * @param socket_id + * Valid EAL socket ID. + * @return + * NUMA node number to use with Win32 API. 
+ */ +unsigned int eal_socket_numa_node(unsigned int socket_id); + #endif /* _EAL_WINDOWS_H_ */ From patchwork Mon Jun 15 00:43:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71529 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A0A09A0093; Mon, 15 Jun 2020 02:45:49 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6DF351BEB4; Mon, 15 Jun 2020 02:44:36 +0200 (CEST) Received: from mail-lf1-f65.google.com (mail-lf1-f65.google.com [209.85.167.65]) by dpdk.org (Postfix) with ESMTP id 6B9C64C7A for ; Mon, 15 Jun 2020 02:44:13 +0200 (CEST) Received: by mail-lf1-f65.google.com with SMTP id u16so8492651lfl.8 for ; Sun, 14 Jun 2020 17:44:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=+n43t5qkHnnDF+Es1Hhy9M02FAdIDUC2uu0LAIJ/LJM=; b=Sjw2EXSGW0WmbVvtTM8g7clQPXzx2/TsmSsItREHxuV/umOShM5xzVBsrNVGOApC6P YQuAFvkUfbffZKvWK4wYffiz8RU7gq4uUfMcdh9LNXXpCm30sUmpOwsXffYP2tFMrCAt er6wvNHEXIsiZ08xuYztxo6Zo95D5A2156KDl3pGVooqq9y0pqwRuGDRQSpvG/U5I3RS GUcbjw8QDtz8JgQSav1nv4hF8HU62SvkTgcGP1ILYw6xeFHvSKCIYlKA2NPVeSl26cfz exCQrw+IahzKEv8p7okuEHoGVCnz+iZNKDw74nXp54EUWNDTcxUtmoyiGWQij3u/8+tm 4OKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+n43t5qkHnnDF+Es1Hhy9M02FAdIDUC2uu0LAIJ/LJM=; b=a20fyP6BE2Ujzl1Lecf76AXdBZrsl3pV8yQxqDYQwANaFQL2Q7jQs51lB54i320l+o /jjr9mo7E7Ceci52yVZCbu+VHdyBvaRi0TthHBmAr9ETODkue12C+PjxK3HhJm1HbT29 ENnvAvD7uTGZTfG1rCSiO9WfLT1dUaLJGHvSXhiS3Nv6HciQeGm0HF9iHpCBdBhGddLz FF35Qb9k+BNsWctASsopq1mvAZer5Rlg+Vpb4ehQ/6fLX+wx+VzzDD15rbNYQ4Y5j9Ao fHc+O2Z3Pxv0v2D2VT73NOWByk7iIY9m/RXGfdyW6FaaRX1+ZFcANc4poRIwD/+hnpcH Xplw== X-Gm-Message-State: AOAM531Drwss/mOXkSW5NI2pX4TVCsh8/d01VN2MtHwb643GSyPTGvDm ueUxxlf7Hmxtg2Krnml+veXIDF2iQuYN7A== X-Google-Smtp-Source: ABdhPJzObCGGEawP2+p8gN3bu5KsvC5U7gQJbDFLCflIBGn9JN1hmoT8rvR6+4RcCRDEW+4V6tDLaA== X-Received: by 2002:a19:c70d:: with SMTP id x13mr931337lff.212.1592181852899; Sun, 14 Jun 2020 17:44:12 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:12 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon , John McNamara , Marko Kovacevic Date: Mon, 15 Jun 2020 03:43:52 +0300 Message-Id: <20200615004354.14380-11-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 10/12] doc/windows: split build and run instructions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" With memory management implemented for Windows, the guide for running sample applications is going to be extended with hugepages and driver setup. Move run instructions to a separate file to give space for planned expansion. Signed-off-by: Dmitry Kozlyuk --- doc/guides/windows_gsg/build_dpdk.rst | 20 -------------------- doc/guides/windows_gsg/index.rst | 1 + doc/guides/windows_gsg/run_apps.rst | 24 ++++++++++++++++++++++++ 3 files changed, 25 insertions(+), 20 deletions(-) create mode 100644 doc/guides/windows_gsg/run_apps.rst diff --git a/doc/guides/windows_gsg/build_dpdk.rst b/doc/guides/windows_gsg/build_dpdk.rst index d46e84e3f..650483e3b 100644 --- a/doc/guides/windows_gsg/build_dpdk.rst +++ b/doc/guides/windows_gsg/build_dpdk.rst @@ -111,23 +111,3 @@ Depending on the distribution, paths in this file may need adjustments. meson --cross-file config/x86/meson_mingw.txt -Dexamples=helloworld build ninja -C build - - -Run the helloworld example -========================== - -Navigate to the examples in the build directory and run `dpdk-helloworld.exe`. - -.. code-block:: console - - cd C:\Users\me\dpdk\build\examples - dpdk-helloworld.exe - hello from core 1 - hello from core 3 - hello from core 0 - hello from core 2 - -Note for MinGW-w64: applications are linked to ``libwinpthread-1.dll`` -by default. To run the example, either add toolchain executables directory -to the PATH or copy the library to the working directory. -Alternatively, static linking may be used (mind the LGPLv2.1 license). diff --git a/doc/guides/windows_gsg/index.rst b/doc/guides/windows_gsg/index.rst index d9b7990a8..e94593572 100644 --- a/doc/guides/windows_gsg/index.rst +++ b/doc/guides/windows_gsg/index.rst @@ -12,3 +12,4 @@ Getting Started Guide for Windows intro build_dpdk + run_apps diff --git a/doc/guides/windows_gsg/run_apps.rst b/doc/guides/windows_gsg/run_apps.rst new file mode 100644 index 000000000..ff4c4654f --- /dev/null +++ b/doc/guides/windows_gsg/run_apps.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2020 Dmitry Kozlyuk + +Running DPDK Applications +========================= + +Run the ``helloworld`` Example +------------------------------ + +Navigate to the examples in the build directory and run `dpdk-helloworld.exe`. + +.. 
code-block:: console + + cd C:\Users\me\dpdk\build\examples + dpdk-helloworld.exe + hello from core 1 + hello from core 3 + hello from core 0 + hello from core 2 + +Note for MinGW-w64: applications are linked to ``libwinpthread-1.dll`` +by default. To run the example, either add toolchain executables directory +to the PATH or copy the library to the working directory. +Alternatively, static linking may be used (mind the LGPLv2.1 license). From patchwork Mon Jun 15 00:43:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71530 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E78A5A0093; Mon, 15 Jun 2020 02:45:59 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C16811BEC3; Mon, 15 Jun 2020 02:44:37 +0200 (CEST) Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193]) by dpdk.org (Postfix) with ESMTP id 7E19D4C8B for ; Mon, 15 Jun 2020 02:44:14 +0200 (CEST) Received: by mail-lj1-f193.google.com with SMTP id x18so17054193lji.1 for ; Sun, 14 Jun 2020 17:44:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ps01p+yBXSeqVZgez7F1ApFAxqR9jG6QNIuOrkJu5uY=; b=JjRINsogpQi+rFXDfTC81aqigTAo7QhVhjECj7VrFwckKW4El+h3QM9ga9pRgzrJVv VYw4/3R0zVMHmzDgGRP+Pb0ZVWrC7DThboJU+IX3TxWcXBSPXpl9DhCVee2DxeBpnqyV HQXuJslaw1wJlPnfXQ+txNJ1hMfU5IiY/WlAzUqiGbDcv7M/MNP/ZfNQPQdmxGPrKPU6 TlCRjFmJBpC7+keGz0zOvSLiQy+dHoHF2F0DhxuSjod5tVKgZ5EMYfBmbCeyfgDeLS1F c5pU8/inbzxJqjketkir0n8TNjgTS0FK+nZjPP6lq1tGYS+MyXNTae9a6Umg+uojnYky q8aw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ps01p+yBXSeqVZgez7F1ApFAxqR9jG6QNIuOrkJu5uY=; b=G/PFcLnnbFrrI3HfjXA11mHxtosDNtM0MKAxgl/P5M/yTyLIsaf33WnfvOEvzqvB1/ hwuXv9nRSEu0r5/aEJiSxJ9X0lQ76iFAzHCjbL97mQ6GJmrzx65GSEl825XdlJrGFbdW 1D83vadVYkKVWVDFpc7CwRwZI1eX1+0sTzhssuz/O7bG752JwcRCa8tGqU3pWaE5DNIl 4VIducUZJpsWjBVXIaTTdZ3vro16Lvn1WF92v0nkEm1gYzxSWaB4SWLnarNXOOwoTJg8 zhaOGtq5pCtZCkCugwUo1J9GpnZq21+cb7nMmrxH5HvpqAA2LllC1diQSjWOQ9Kz2w4r RYfw== X-Gm-Message-State: AOAM531X72num/U+c3b9fF59bayr+ssi8LyT8uU1Rd0X025hUHibjdWg XfxTKsq4CpEryHcOh8DS4rt0bz2uq+nUhA== X-Google-Smtp-Source: ABdhPJzCiZZhYZ3PTywkjWtMm+92fmaFBo4CuGJ4GQXoXkKbTBnrYu86g9y8ah+y6TzcOKbW4Y9hsA== X-Received: by 2002:a2e:b4b6:: with SMTP id q22mr9945282ljm.53.1592181853854; Sun, 14 Jun 2020 17:44:13 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:13 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Thomas Monjalon , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon , John McNamara , Marko Kovacevic Date: Mon, 15 Jun 2020 03:43:53 +0300 Message-Id: <20200615004354.14380-12-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 11/12] eal/windows: initialize hugepage info X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add hugepages discovery ("large pages" in Windows terminology) and update documentation for required privilege setup. Only 2MB hugepages are supported and their number is estimated roughly due to the lack or unstable status of suitable OS APIs. Assign myself as maintainer for the implementation file. Signed-off-by: Dmitry Kozlyuk --- MAINTAINERS | 4 + config/meson.build | 2 + doc/guides/windows_gsg/run_apps.rst | 23 ++++++ lib/librte_eal/windows/eal.c | 14 ++++ lib/librte_eal/windows/eal_hugepages.c | 108 +++++++++++++++++++++++++ lib/librte_eal/windows/meson.build | 1 + 6 files changed, 152 insertions(+) create mode 100644 lib/librte_eal/windows/eal_hugepages.c diff --git a/MAINTAINERS b/MAINTAINERS index 241dbc3d7..9d5dacc23 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -334,6 +334,10 @@ F: lib/librte_eal/windows/ F: lib/librte_eal/rte_eal_exports.def F: doc/guides/windows_gsg/ +Windows memory allocation +M: Dmitry Kozlyuk +F: lib/librte_eal/windows/eal_hugepages.c + Core Libraries -------------- diff --git a/config/meson.build b/config/meson.build index 43ab11310..c1e80de4b 100644 --- a/config/meson.build +++ b/config/meson.build @@ -268,6 +268,8 @@ if is_windows if cc.get_id() == 'gcc' add_project_arguments('-D__USE_MINGW_ANSI_STDIO', language: 'c') endif + + add_project_link_arguments('-ladvapi32', language: 'c') endif if get_option('b_lto') diff --git a/doc/guides/windows_gsg/run_apps.rst b/doc/guides/windows_gsg/run_apps.rst index ff4c4654f..21ac7f6c1 100644 --- a/doc/guides/windows_gsg/run_apps.rst +++ b/doc/guides/windows_gsg/run_apps.rst @@ -4,6 +4,29 @@ Running DPDK Applications ========================= +Grant *Lock pages in memory* Privilege +-------------------------------------- + +Use of hugepages ("large pages" in Windows terminolocy) requires +``SeLockMemoryPrivilege`` for the user running an application. + +1. Open *Local Security Policy* snap in, either: + + * Control Panel / Computer Management / Local Security Policy; + * or Win+R, type ``secpol``, press Enter. + +2. Open *Local Policies / User Rights Assignment / Lock pages in memory.* + +3. Add desired users or groups to the list of grantees. + +4. Privilege is applied upon next logon. In particular, if privilege has been + granted to current user, a logoff is required before it is available. + +See `Large-Page Support`_ in MSDN for details. + +.. 
_Large-page Support: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support + + Run the ``helloworld`` Example ------------------------------ diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index dfc10b494..759bf4be5 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -19,8 +19,11 @@ #include #include +#include "eal_hugepages.h" #include "eal_windows.h" +#define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL) + /* Allow the application to print its usage message too if set */ static rte_usage_hook_t rte_application_usage_hook; @@ -279,6 +282,17 @@ rte_eal_init(int argc, char **argv) if (fctret < 0) exit(1); + if (!internal_config.no_hugetlbfs && (eal_hugepage_info_init() < 0)) { + rte_eal_init_alert("Cannot get hugepage information"); + rte_errno = EACCES; + return -1; + } + + if (internal_config.memory == 0 && !internal_config.force_sockets) { + if (internal_config.no_hugetlbfs) + internal_config.memory = MEMSIZE_IF_NO_HUGE_PAGE; + } + eal_thread_init_master(rte_config.master_lcore); RTE_LCORE_FOREACH_SLAVE(i) { diff --git a/lib/librte_eal/windows/eal_hugepages.c b/lib/librte_eal/windows/eal_hugepages.c new file mode 100644 index 000000000..61d0dcd3c --- /dev/null +++ b/lib/librte_eal/windows/eal_hugepages.c @@ -0,0 +1,108 @@ +#include +#include +#include +#include +#include + +#include "eal_filesystem.h" +#include "eal_hugepages.h" +#include "eal_internal_cfg.h" +#include "eal_windows.h" + +static int +hugepage_claim_privilege(void) +{ + static const wchar_t privilege[] = L"SeLockMemoryPrivilege"; + + HANDLE token; + LUID luid; + TOKEN_PRIVILEGES tp; + int ret = -1; + + if (!OpenProcessToken(GetCurrentProcess(), + TOKEN_ADJUST_PRIVILEGES, &token)) { + RTE_LOG_WIN32_ERR("OpenProcessToken()"); + return -1; + } + + if (!LookupPrivilegeValueW(NULL, privilege, &luid)) { + RTE_LOG_WIN32_ERR("LookupPrivilegeValue(\"%S\")", privilege); + goto exit; + } + + tp.PrivilegeCount = 1; + tp.Privileges[0].Luid = luid; + tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; + + if (!AdjustTokenPrivileges( + token, FALSE, &tp, sizeof(tp), NULL, NULL)) { + RTE_LOG_WIN32_ERR("AdjustTokenPrivileges()"); + goto exit; + } + + ret = 0; + +exit: + CloseHandle(token); + + return ret; +} + +static int +hugepage_info_init(void) +{ + struct hugepage_info *hpi; + unsigned int socket_id; + int ret = 0; + + /* Only one hugepage size available on Windows. */ + internal_config.num_hugepage_sizes = 1; + hpi = &internal_config.hugepage_info[0]; + + hpi->hugepage_sz = GetLargePageMinimum(); + if (hpi->hugepage_sz == 0) + return -ENOTSUP; + + /* Assume all memory on each NUMA node available for hugepages, + * because Windows neither advertises additional limits, + * nor provides an API to query them. + */ + for (socket_id = 0; socket_id < rte_socket_count(); socket_id++) { + ULONGLONG bytes; + unsigned int numa_node; + + numa_node = eal_socket_numa_node(socket_id); + if (!GetNumaAvailableMemoryNodeEx(numa_node, &bytes)) { + RTE_LOG_WIN32_ERR("GetNumaAvailableMemoryNodeEx(%u)", + numa_node); + continue; + } + + hpi->num_pages[socket_id] = bytes / hpi->hugepage_sz; + RTE_LOG(DEBUG, EAL, + "Found %u hugepages of %zu bytes on socket %u\n", + hpi->num_pages[socket_id], hpi->hugepage_sz, socket_id); + } + + /* No hugepage filesystem on Windows. 
*/ + hpi->lock_descriptor = -1; + memset(hpi->hugedir, 0, sizeof(hpi->hugedir)); + + return ret; +} + +int +eal_hugepage_info_init(void) +{ + if (hugepage_claim_privilege() < 0) { + RTE_LOG(ERR, EAL, "Cannot claim hugepage privilege\n"); + return -1; + } + + if (hugepage_info_init() < 0) { + RTE_LOG(ERR, EAL, "Cannot get hugepage information\n"); + return -1; + } + + return 0; +} diff --git a/lib/librte_eal/windows/meson.build b/lib/librte_eal/windows/meson.build index adfc8b9b7..52978e9d7 100644 --- a/lib/librte_eal/windows/meson.build +++ b/lib/librte_eal/windows/meson.build @@ -6,6 +6,7 @@ subdir('include') sources += files( 'eal.c', 'eal_debug.c', + 'eal_hugepages.c', 'eal_lcore.c', 'eal_log.c', 'eal_thread.c', From patchwork Mon Jun 15 00:43:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 71531 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4F5B0A0093; Mon, 15 Jun 2020 02:46:10 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EB1C51BEC7; Mon, 15 Jun 2020 02:44:38 +0200 (CEST) Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com [209.85.208.195]) by dpdk.org (Postfix) with ESMTP id EA8514C8B for ; Mon, 15 Jun 2020 02:44:16 +0200 (CEST) Received: by mail-lj1-f195.google.com with SMTP id i3so12542355ljg.3 for ; Sun, 14 Jun 2020 17:44:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=QJ8qhUOMYKhNZQN86mYkOzbsF7k315H1M9vnJrvf16A=; b=ZXS0cslc7dEr6NEMrkzjiPuMm8nZgERDG5HN99PLv/6O8VtnD6EiPbtVzMbc7+NhmO 0Uy1h37um2/3CZTwLG1AR4Lq5+oyFt5TQw7S4AKTN8PwnLvva0z5aapyC59Pld9hlQmF 8ue6go0/QNNaWhNCDjq6vzAM3KdSDfWEgOzNawli5MSMtXtQg6/HovtGk5YA06W6WtoC qCWDIo04MMC0Ovm8wlQdDbo/5yVsYl1p7WJ9VzjG3un7oy93HUpsYlqQ9F5fsM6Lm4D+ yGxMvSCB8mGZSEyhArZWxIE+uj/CvFg8XQEoUAUal2D3APHV/uhtH7sPPbjgaQgj2V8H 8EKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=QJ8qhUOMYKhNZQN86mYkOzbsF7k315H1M9vnJrvf16A=; b=a5i3kwL6YXpZertDRV1bDYCXlWYkStNsa0xyP8OFDzxjMaziuCPnH1b8n1STbIdOPa cltx0fdNZzNgA9GCaI/OaCW5uPaZzAYpK06DM2duaRiUTbrPuG3T6PqeJm9/PwCqqDk3 +9erin+PBq5Lm8/GSDDZaOH+0noNVVPKBkabSgWl07XmkXTm8zRIf0v69zSzcLc7QngW TswFYvxnKyx0QFHKNZ5PtBl8z2ovptdv8y2QE3ych0dqnTQXGMF2bdBI+MGAbnpKtJNh Jkf69J71IZY0EF8p2gReCdEA6ReKVTk6v8NUKZGEhcaFLxpk9wpd7rZPtEAcgfsffUNF F5dA== X-Gm-Message-State: AOAM5307XOLTOLCTHmZ1xbfPZY7yUKYR+Y4HRuP3OcXqjeqNhUbFaULV 7r+nGoiPW46CtH0Ji3tgGuikz7B5thY2Tw== X-Google-Smtp-Source: ABdhPJyP0ihdymP1dcP8tt6VxKHSN7wTtGL1JKNZDrEl/2WYh9Duvg5lhPXIBOkBvrVetb0ZC9fMcQ== X-Received: by 2002:a2e:8ec9:: with SMTP id e9mr12274433ljl.152.1592181855021; Sun, 14 Jun 2020 17:44:15 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.gmail.com with ESMTPSA id f19sm4176342lfk.24.2020.06.14.17.44.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 17:44:14 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: Dmitry Malloy , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Thomas Monjalon , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon , John McNamara , Marko Kovacevic , Anatoly Burakov Date: Mon, 15 Jun 2020 03:43:54 +0300 Message-Id: <20200615004354.14380-13-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200615004354.14380-1-dmitry.kozliuk@gmail.com> References: <20200610142730.31376-1-dmitry.kozliuk@gmail.com> <20200615004354.14380-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v9 12/12] eal/windows: implement basic memory management X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Basic memory management supports core libraries and PMDs operating in IOVA as PA mode. It uses a kernel-mode driver, virt2phys, to obtain IOVAs of hugepages allocated from user-mode. Multi-process mode is not implemented and is forcefully disabled at startup. Assign myself as a maintainer for Windows file and memory management implementation. Signed-off-by: Dmitry Kozlyuk --- MAINTAINERS | 1 + config/meson.build | 12 +- doc/guides/windows_gsg/run_apps.rst | 54 +- lib/librte_eal/common/meson.build | 11 + lib/librte_eal/common/rte_malloc.c | 1 + lib/librte_eal/rte_eal_exports.def | 119 +++ lib/librte_eal/windows/eal.c | 63 +- lib/librte_eal/windows/eal_file.c | 125 +++ lib/librte_eal/windows/eal_memalloc.c | 441 +++++++++++ lib/librte_eal/windows/eal_memory.c | 710 ++++++++++++++++++ lib/librte_eal/windows/eal_mp.c | 103 +++ lib/librte_eal/windows/eal_windows.h | 75 ++ lib/librte_eal/windows/include/meson.build | 1 + lib/librte_eal/windows/include/rte_os.h | 17 + .../windows/include/rte_virt2phys.h | 34 + lib/librte_eal/windows/include/rte_windows.h | 2 + lib/librte_eal/windows/include/unistd.h | 3 + lib/librte_eal/windows/meson.build | 6 + 18 files changed, 1771 insertions(+), 7 deletions(-) create mode 100644 lib/librte_eal/windows/eal_file.c create mode 100644 lib/librte_eal/windows/eal_memalloc.c create mode 100644 lib/librte_eal/windows/eal_memory.c create mode 100644 lib/librte_eal/windows/eal_mp.c create mode 100644 lib/librte_eal/windows/include/rte_virt2phys.h diff --git a/MAINTAINERS b/MAINTAINERS index 9d5dacc23..a80a3b904 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -337,6 +337,7 @@ F: doc/guides/windows_gsg/ Windows memory allocation M: Dmitry Kozlyuk F: lib/librte_eal/windows/eal_hugepages.c +F: lib/librte_eal/windows/eal_mem* Core Libraries diff --git a/config/meson.build b/config/meson.build index c1e80de4b..d3f05f878 100644 --- a/config/meson.build +++ b/config/meson.build @@ -261,15 +261,21 @@ if is_freebsd endif if is_windows - # Minimum supported API is Windows 7. - add_project_arguments('-D_WIN32_WINNT=0x0601', language: 'c') + # VirtualAlloc2() is available since Windows 10 / Server 2016. + add_project_arguments('-D_WIN32_WINNT=0x0A00', language: 'c') # Use MinGW-w64 stdio, because DPDK assumes ANSI-compliant formatting. 
if cc.get_id() == 'gcc' add_project_arguments('-D__USE_MINGW_ANSI_STDIO', language: 'c') endif - add_project_link_arguments('-ladvapi32', language: 'c') + # Contrary to docs, VirtualAlloc2() is exported by mincore.lib + # in Windows SDK, while MinGW exports it by advapi32.a. + if is_ms_linker + add_project_link_arguments('-lmincore', language: 'c') + endif + + add_project_link_arguments('-ladvapi32', '-lsetupapi', language: 'c') endif if get_option('b_lto') diff --git a/doc/guides/windows_gsg/run_apps.rst b/doc/guides/windows_gsg/run_apps.rst index 21ac7f6c1..78e5a614f 100644 --- a/doc/guides/windows_gsg/run_apps.rst +++ b/doc/guides/windows_gsg/run_apps.rst @@ -7,10 +7,10 @@ Running DPDK Applications Grant *Lock pages in memory* Privilege -------------------------------------- -Use of hugepages ("large pages" in Windows terminolocy) requires +Use of hugepages ("large pages" in Windows terminology) requires ``SeLockMemoryPrivilege`` for the user running an application. -1. Open *Local Security Policy* snap in, either: +1. Open *Local Security Policy* snap-in, either: * Control Panel / Computer Management / Local Security Policy; * or Win+R, type ``secpol``, press Enter. @@ -24,7 +24,55 @@ Use of hugepages ("large pages" in Windows terminolocy) requires See `Large-Page Support`_ in MSDN for details. -.. _Large-page Support: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support +.. _Large-Page Support: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support + + +Load virt2phys Driver +--------------------- + +Access to physical addresses is provided by a kernel-mode driver, virt2phys. +It is mandatory at least for using hardware PMDs, but may also be required +for mempools. + +Refer to documentation in ``dpdk-kmods`` repository for details on system +setup, driver build and installation. This driver is not signed, so signature +checking must be disabled to load it. + +.. warning:: + + Disabling driver signature enforcement weakens OS security. + It is discouraged in production environments. + +Compiled package consists of ``virt2phys.inf``, ``virt2phys.cat``, +and ``virt2phys.sys``. It can be installed as follows +from Elevated Command Prompt: + +.. code-block:: console + + pnputil /add-driver Z:\path\to\virt2phys.inf /install + +On Windows Server additional steps are required: + +1. From Device Manager, Action menu, select "Add legacy hardware". +2. It will launch the "Add Hardware Wizard". Click "Next". +3. Select second option "Install the hardware that I manually select + from a list (Advanced)". +4. On the next screen, "Kernel bypass" will be shown as a device class. +5. Select it, and click "Next". +6. The previously installed drivers will now be installed for the + "Virtual to physical address translator" device. + +When loaded successfully, the driver is shown in *Device Manager* as *Virtual +to physical address translator* device under *Kernel bypass* category. +Installed driver persists across reboots. + +If DPDK is unable to communicate with the driver, a warning is printed +on initialization (debug-level logs provide more details): + +.. 
code-block:: text + + EAL: Cannot open virt2phys driver interface + Run the ``helloworld`` Example diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build index 4e9208129..310844269 100644 --- a/lib/librte_eal/common/meson.build +++ b/lib/librte_eal/common/meson.build @@ -8,13 +8,24 @@ if is_windows 'eal_common_bus.c', 'eal_common_class.c', 'eal_common_devargs.c', + 'eal_common_dynmem.c', 'eal_common_errno.c', + 'eal_common_fbarray.c', 'eal_common_launch.c', 'eal_common_lcore.c', 'eal_common_log.c', + 'eal_common_mcfg.c', + 'eal_common_memalloc.c', + 'eal_common_memory.c', + 'eal_common_memzone.c', 'eal_common_options.c', + 'eal_common_string_fns.c', + 'eal_common_tailqs.c', 'eal_common_thread.c', 'eal_common_trace_points.c', + 'malloc_elem.c', + 'malloc_heap.c', + 'rte_malloc.c', ) subdir_done() endif diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index f1b73168b..9d39e58c0 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -20,6 +20,7 @@ #include #include #include + #include #include diff --git a/lib/librte_eal/rte_eal_exports.def b/lib/librte_eal/rte_eal_exports.def index 12a6c79d6..e2eb24f01 100644 --- a/lib/librte_eal/rte_eal_exports.def +++ b/lib/librte_eal/rte_eal_exports.def @@ -1,9 +1,128 @@ EXPORTS __rte_panic + rte_calloc + rte_calloc_socket rte_eal_get_configuration + rte_eal_has_hugepages rte_eal_init + rte_eal_iova_mode rte_eal_mp_remote_launch rte_eal_mp_wait_lcore + rte_eal_process_type rte_eal_remote_launch rte_log + rte_eal_tailq_lookup + rte_eal_tailq_register + rte_eal_using_phys_addrs + rte_free + rte_malloc + rte_malloc_dump_stats + rte_malloc_get_socket_stats + rte_malloc_set_limit + rte_malloc_socket + rte_malloc_validate + rte_malloc_virt2iova + rte_mcfg_mem_read_lock + rte_mcfg_mem_read_unlock + rte_mcfg_mem_write_lock + rte_mcfg_mem_write_unlock + rte_mcfg_mempool_read_lock + rte_mcfg_mempool_read_unlock + rte_mcfg_mempool_write_lock + rte_mcfg_mempool_write_unlock + rte_mcfg_tailq_read_lock + rte_mcfg_tailq_read_unlock + rte_mcfg_tailq_write_lock + rte_mcfg_tailq_write_unlock + rte_mem_lock_page + rte_mem_virt2iova + rte_mem_virt2phy + rte_memory_get_nchannel + rte_memory_get_nrank + rte_memzone_dump + rte_memzone_free + rte_memzone_lookup + rte_memzone_reserve + rte_memzone_reserve_aligned + rte_memzone_reserve_bounded + rte_memzone_walk rte_vlog + rte_realloc + rte_zmalloc + rte_zmalloc_socket + + rte_mp_action_register + rte_mp_action_unregister + rte_mp_reply + rte_mp_sendmsg + + rte_fbarray_attach + rte_fbarray_destroy + rte_fbarray_detach + rte_fbarray_dump_metadata + rte_fbarray_find_contig_free + rte_fbarray_find_contig_used + rte_fbarray_find_idx + rte_fbarray_find_next_free + rte_fbarray_find_next_n_free + rte_fbarray_find_next_n_used + rte_fbarray_find_next_used + rte_fbarray_get + rte_fbarray_init + rte_fbarray_is_used + rte_fbarray_set_free + rte_fbarray_set_used + rte_malloc_dump_heaps + rte_mem_alloc_validator_register + rte_mem_alloc_validator_unregister + rte_mem_check_dma_mask + rte_mem_event_callback_register + rte_mem_event_callback_unregister + rte_mem_iova2virt + rte_mem_virt2memseg + rte_mem_virt2memseg_list + rte_memseg_contig_walk + rte_memseg_list_walk + rte_memseg_walk + rte_mp_request_async + rte_mp_request_sync + + rte_fbarray_find_prev_free + rte_fbarray_find_prev_n_free + rte_fbarray_find_prev_n_used + rte_fbarray_find_prev_used + rte_fbarray_find_rev_contig_free + rte_fbarray_find_rev_contig_used + 
rte_memseg_contig_walk_thread_unsafe + rte_memseg_list_walk_thread_unsafe + rte_memseg_walk_thread_unsafe + + rte_malloc_heap_create + rte_malloc_heap_destroy + rte_malloc_heap_get_socket + rte_malloc_heap_memory_add + rte_malloc_heap_memory_attach + rte_malloc_heap_memory_detach + rte_malloc_heap_memory_remove + rte_malloc_heap_socket_is_external + rte_mem_check_dma_mask_thread_unsafe + rte_mem_set_dma_mask + rte_memseg_get_fd + rte_memseg_get_fd_offset + rte_memseg_get_fd_offset_thread_unsafe + rte_memseg_get_fd_thread_unsafe + + rte_extmem_attach + rte_extmem_detach + rte_extmem_register + rte_extmem_unregister + + rte_fbarray_find_biggest_free + rte_fbarray_find_biggest_used + rte_fbarray_find_rev_biggest_free + rte_fbarray_find_rev_biggest_used + + rte_mem_lock + rte_mem_map + rte_mem_page_size + rte_mem_unmap diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index 759bf4be5..666651dc7 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -94,6 +94,24 @@ eal_proc_type_detect(void) return ptype; } +enum rte_proc_type_t +rte_eal_process_type(void) +{ + return rte_config.process_type; +} + +int +rte_eal_has_hugepages(void) +{ + return !internal_config.no_hugetlbfs; +} + +enum rte_iova_mode +rte_eal_iova_mode(void) +{ + return rte_config.iova_mode; +} + /* display usage */ static void eal_usage(const char *prgname) @@ -256,7 +274,7 @@ __rte_trace_point_register(rte_trace_point_t *trace, const char *name, return -ENOTSUP; } -/* Launch threads, called at application init(). */ + /* Launch threads, called at application init(). */ int rte_eal_init(int argc, char **argv) { @@ -282,6 +300,13 @@ rte_eal_init(int argc, char **argv) if (fctret < 0) exit(1); + /* Prevent creation of shared memory files. */ + if (internal_config.in_memory == 0) { + RTE_LOG(WARNING, EAL, "Multi-process support is requested, " + "but not available.\n"); + internal_config.in_memory = 1; + } + if (!internal_config.no_hugetlbfs && (eal_hugepage_info_init() < 0)) { rte_eal_init_alert("Cannot get hugepage information"); rte_errno = EACCES; @@ -293,6 +318,42 @@ rte_eal_init(int argc, char **argv) internal_config.memory = MEMSIZE_IF_NO_HUGE_PAGE; } + if (eal_mem_win32api_init() < 0) { + rte_eal_init_alert("Cannot access Win32 memory management"); + rte_errno = ENOTSUP; + return -1; + } + + if (eal_mem_virt2iova_init() < 0) { + /* Non-fatal error if physical addresses are not required. 
*/ + RTE_LOG(WARNING, EAL, "Cannot access virt2phys driver, " + "PA will not be available\n"); + } + + if (rte_eal_memzone_init() < 0) { + rte_eal_init_alert("Cannot init memzone"); + rte_errno = ENODEV; + return -1; + } + + if (rte_eal_memory_init() < 0) { + rte_eal_init_alert("Cannot init memory"); + rte_errno = ENOMEM; + return -1; + } + + if (rte_eal_malloc_heap_init() < 0) { + rte_eal_init_alert("Cannot init malloc heap"); + rte_errno = ENODEV; + return -1; + } + + if (rte_eal_tailqs_init() < 0) { + rte_eal_init_alert("Cannot init tail queues for objects"); + rte_errno = EFAULT; + return -1; + } + eal_thread_init_master(rte_config.master_lcore); RTE_LCORE_FOREACH_SLAVE(i) { diff --git a/lib/librte_eal/windows/eal_file.c b/lib/librte_eal/windows/eal_file.c new file mode 100644 index 000000000..dfbe8d311 --- /dev/null +++ b/lib/librte_eal/windows/eal_file.c @@ -0,0 +1,125 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Dmitry Kozlyuk + */ + +#include +#include +#include +#include + +#include "eal_private.h" +#include "eal_windows.h" + +int +eal_file_open(const char *path, int flags) +{ + static const int MODE_MASK = EAL_OPEN_READONLY | EAL_OPEN_READWRITE; + + int fd, ret, sys_flags; + + switch (flags & MODE_MASK) { + case EAL_OPEN_READONLY: + sys_flags = _O_RDONLY; + break; + case EAL_OPEN_READWRITE: + sys_flags = _O_RDWR; + break; + default: + rte_errno = ENOTSUP; + return -1; + } + + if (flags & EAL_OPEN_CREATE) + sys_flags |= _O_CREAT; + + ret = _sopen_s(&fd, path, sys_flags, _SH_DENYNO, _S_IWRITE); + if (ret < 0) { + rte_errno = errno; + return -1; + } + + return fd; +} + +int +eal_file_truncate(int fd, ssize_t size) +{ + HANDLE handle; + DWORD ret; + LONG low = (LONG)((size_t)size); + LONG high = (LONG)((size_t)size >> 32); + + handle = (HANDLE)_get_osfhandle(fd); + if (handle == INVALID_HANDLE_VALUE) { + rte_errno = EBADF; + return -1; + } + + ret = SetFilePointer(handle, low, &high, FILE_BEGIN); + if (ret == INVALID_SET_FILE_POINTER) { + RTE_LOG_WIN32_ERR("SetFilePointer()"); + rte_errno = EINVAL; + return -1; + } + + return 0; +} + +static int +lock_file(HANDLE handle, enum eal_flock_op op, enum eal_flock_mode mode) +{ + DWORD sys_flags = 0; + OVERLAPPED overlapped; + + if (op == EAL_FLOCK_EXCLUSIVE) + sys_flags |= LOCKFILE_EXCLUSIVE_LOCK; + if (mode == EAL_FLOCK_RETURN) + sys_flags |= LOCKFILE_FAIL_IMMEDIATELY; + + memset(&overlapped, 0, sizeof(overlapped)); + if (!LockFileEx(handle, sys_flags, 0, 0, 0, &overlapped)) { + if ((sys_flags & LOCKFILE_FAIL_IMMEDIATELY) && + (GetLastError() == ERROR_IO_PENDING)) { + rte_errno = EWOULDBLOCK; + } else { + RTE_LOG_WIN32_ERR("LockFileEx()"); + rte_errno = EINVAL; + } + return -1; + } + + return 0; +} + +static int +unlock_file(HANDLE handle) +{ + if (!UnlockFileEx(handle, 0, 0, 0, NULL)) { + RTE_LOG_WIN32_ERR("UnlockFileEx()"); + rte_errno = EINVAL; + return -1; + } + return 0; +} + +int +eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode) +{ + HANDLE handle = (HANDLE)_get_osfhandle(fd); + + if (handle == INVALID_HANDLE_VALUE) { + rte_errno = EBADF; + return -1; + } + + switch (op) { + case EAL_FLOCK_EXCLUSIVE: + case EAL_FLOCK_SHARED: + return lock_file(handle, op, mode); + case EAL_FLOCK_UNLOCK: + return unlock_file(handle); + default: + rte_errno = EINVAL; + return -1; + } +} diff --git a/lib/librte_eal/windows/eal_memalloc.c b/lib/librte_eal/windows/eal_memalloc.c new file mode 100644 index 000000000..a7452b6e1 --- /dev/null +++ b/lib/librte_eal/windows/eal_memalloc.c @@ -0,0 +1,441 @@ +/* 
SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +#include +#include + +#include "eal_internal_cfg.h" +#include "eal_memalloc.h" +#include "eal_memcfg.h" +#include "eal_private.h" +#include "eal_windows.h" + +int +eal_memalloc_get_seg_fd(int list_idx, int seg_idx) +{ + /* Hugepages have no associated files in Windows. */ + RTE_SET_USED(list_idx); + RTE_SET_USED(seg_idx); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset) +{ + /* Hugepages have no associated files in Windows. */ + RTE_SET_USED(list_idx); + RTE_SET_USED(seg_idx); + RTE_SET_USED(offset); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +static int +alloc_seg(struct rte_memseg *ms, void *requested_addr, int socket_id, + struct hugepage_info *hi) +{ + HANDLE current_process; + unsigned int numa_node; + size_t alloc_sz; + void *addr; + rte_iova_t iova = RTE_BAD_IOVA; + PSAPI_WORKING_SET_EX_INFORMATION info; + PSAPI_WORKING_SET_EX_BLOCK *page; + + if (ms->len > 0) { + /* If a segment is already allocated as needed, return it. */ + if ((ms->addr == requested_addr) && + (ms->socket_id == socket_id) && + (ms->hugepage_sz == hi->hugepage_sz)) { + return 0; + } + + /* Bugcheck, should not happen. */ + RTE_LOG(DEBUG, EAL, "Attempted to reallocate segment %p " + "(size %zu) on socket %d", ms->addr, + ms->len, ms->socket_id); + return -1; + } + + current_process = GetCurrentProcess(); + numa_node = eal_socket_numa_node(socket_id); + alloc_sz = hi->hugepage_sz; + + if (requested_addr == NULL) { + /* Request a new chunk of memory from OS. */ + addr = eal_mem_alloc_socket(alloc_sz, socket_id); + if (addr == NULL) { + RTE_LOG(DEBUG, EAL, "Cannot allocate %zu bytes " + "on socket %d\n", alloc_sz, socket_id); + return -1; + } + } else { + /* Requested address is already reserved, commit memory. */ + addr = eal_mem_commit(requested_addr, alloc_sz, socket_id); + + /* During commitment, memory is temporary freed and might + * be allocated by different non-EAL thread. This is a fatal + * error, because it breaks MSL assumptions. + */ + if ((addr != NULL) && (addr != requested_addr)) { + RTE_LOG(CRIT, EAL, "Address %p occupied by an alien " + " allocation - MSL is not VA-contiguous!\n", + requested_addr); + return -1; + } + + if (addr == NULL) { + RTE_LOG(DEBUG, EAL, "Cannot commit reserved memory %p " + "(size %zu) on socket %d\n", + requested_addr, alloc_sz, socket_id); + return -1; + } + } + + /* Force OS to allocate a physical page and select a NUMA node. + * Hugepages are not pageable in Windows, so there's no race + * for physical address. + */ + *(volatile int *)addr = *(volatile int *)addr; + + /* Only try to obtain IOVA if it's available, so that applications + * that do not need IOVA can use this allocator. + */ + if (rte_eal_using_phys_addrs()) { + iova = rte_mem_virt2iova(addr); + if (iova == RTE_BAD_IOVA) { + RTE_LOG(DEBUG, EAL, + "Cannot get IOVA of allocated segment\n"); + goto error; + } + } + + /* Only "Ex" function can handle hugepages. 
*/ + info.VirtualAddress = addr; + if (!QueryWorkingSetEx(current_process, &info, sizeof(info))) { + RTE_LOG_WIN32_ERR("QueryWorkingSetEx(%p)", addr); + goto error; + } + + page = &info.VirtualAttributes; + if (!page->Valid || !page->LargePage) { + RTE_LOG(DEBUG, EAL, "Got regular page instead of a hugepage\n"); + goto error; + } + if (page->Node != numa_node) { + RTE_LOG(DEBUG, EAL, + "NUMA node hint %u (socket %d) not respected, got %u\n", + numa_node, socket_id, page->Node); + goto error; + } + + ms->addr = addr; + ms->hugepage_sz = hi->hugepage_sz; + ms->len = alloc_sz; + ms->nchannel = rte_memory_get_nchannel(); + ms->nrank = rte_memory_get_nrank(); + ms->iova = iova; + ms->socket_id = socket_id; + + return 0; + +error: + /* Only jump here when `addr` and `alloc_sz` are valid. */ + if (eal_mem_decommit(addr, alloc_sz) && (rte_errno == EADDRNOTAVAIL)) { + /* During decommitment, memory is temporarily returned + * to the system and the address may become unavailable. + */ + RTE_LOG(CRIT, EAL, "Address %p occupied by an alien " + " allocation - MSL is not VA-contiguous!\n", addr); + } + return -1; +} + +static int +free_seg(struct rte_memseg *ms) +{ + if (eal_mem_decommit(ms->addr, ms->len)) { + if (rte_errno == EADDRNOTAVAIL) { + /* See alloc_seg() for explanation. */ + RTE_LOG(CRIT, EAL, "Address %p occupied by an alien " + " allocation - MSL is not VA-contiguous!\n", + ms->addr); + } + return -1; + } + + /* Must clear the segment, because alloc_seg() inspects it. */ + memset(ms, 0, sizeof(*ms)); + return 0; +} + +struct alloc_walk_param { + struct hugepage_info *hi; + struct rte_memseg **ms; + size_t page_sz; + unsigned int segs_allocated; + unsigned int n_segs; + int socket; + bool exact; +}; + +static int +alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct alloc_walk_param *wa = arg; + struct rte_memseg_list *cur_msl; + size_t page_sz; + int cur_idx, start_idx, j; + unsigned int msl_idx, need, i; + + if (msl->page_sz != wa->page_sz) + return 0; + if (msl->socket_id != wa->socket) + return 0; + + page_sz = (size_t)msl->page_sz; + + msl_idx = msl - mcfg->memsegs; + cur_msl = &mcfg->memsegs[msl_idx]; + + need = wa->n_segs; + + /* try finding space in memseg list */ + if (wa->exact) { + /* if we require exact number of pages in a list, find them */ + cur_idx = rte_fbarray_find_next_n_free( + &cur_msl->memseg_arr, 0, need); + if (cur_idx < 0) + return 0; + start_idx = cur_idx; + } else { + int cur_len; + + /* we don't require exact number of pages, so we're going to go + * for best-effort allocation. that means finding the biggest + * unused block, and going with that. + */ + cur_idx = rte_fbarray_find_biggest_free( + &cur_msl->memseg_arr, 0); + if (cur_idx < 0) + return 0; + start_idx = cur_idx; + /* adjust the size to possibly be smaller than original + * request, but do not allow it to be bigger. 
+ */ + cur_len = rte_fbarray_find_contig_free( + &cur_msl->memseg_arr, cur_idx); + need = RTE_MIN(need, (unsigned int)cur_len); + } + + for (i = 0; i < need; i++, cur_idx++) { + struct rte_memseg *cur; + void *map_addr; + + cur = rte_fbarray_get(&cur_msl->memseg_arr, cur_idx); + map_addr = RTE_PTR_ADD(cur_msl->base_va, cur_idx * page_sz); + + if (alloc_seg(cur, map_addr, wa->socket, wa->hi)) { + RTE_LOG(DEBUG, EAL, "attempted to allocate %i segments, " + "but only %i were allocated\n", need, i); + + /* if exact number wasn't requested, stop */ + if (!wa->exact) + goto out; + + /* clean up */ + for (j = start_idx; j < cur_idx; j++) { + struct rte_memseg *tmp; + struct rte_fbarray *arr = &cur_msl->memseg_arr; + + tmp = rte_fbarray_get(arr, j); + rte_fbarray_set_free(arr, j); + + if (free_seg(tmp)) + RTE_LOG(DEBUG, EAL, "Cannot free page\n"); + } + /* clear the list */ + if (wa->ms) + memset(wa->ms, 0, sizeof(*wa->ms) * wa->n_segs); + + return -1; + } + if (wa->ms) + wa->ms[i] = cur; + + rte_fbarray_set_used(&cur_msl->memseg_arr, cur_idx); + } + +out: + wa->segs_allocated = i; + if (i > 0) + cur_msl->version++; + + /* if we didn't allocate any segments, move on to the next list */ + return i > 0; +} + +struct free_walk_param { + struct hugepage_info *hi; + struct rte_memseg *ms; +}; +static int +free_seg_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_memseg_list *found_msl; + struct free_walk_param *wa = arg; + uintptr_t start_addr, end_addr; + int msl_idx, seg_idx, ret; + + start_addr = (uintptr_t) msl->base_va; + end_addr = start_addr + msl->len; + + if ((uintptr_t)wa->ms->addr < start_addr || + (uintptr_t)wa->ms->addr >= end_addr) + return 0; + + msl_idx = msl - mcfg->memsegs; + seg_idx = RTE_PTR_DIFF(wa->ms->addr, start_addr) / msl->page_sz; + + /* msl is const */ + found_msl = &mcfg->memsegs[msl_idx]; + found_msl->version++; + + rte_fbarray_set_free(&found_msl->memseg_arr, seg_idx); + + ret = free_seg(wa->ms); + + return (ret < 0) ? 
(-1) : 1; +} + +int +eal_memalloc_alloc_seg_bulk(struct rte_memseg **ms, int n_segs, + size_t page_sz, int socket, bool exact) +{ + unsigned int i; + int ret = -1; + struct alloc_walk_param wa; + struct hugepage_info *hi = NULL; + + if (internal_config.legacy_mem) { + RTE_LOG(ERR, EAL, "dynamic allocation not supported in legacy mode\n"); + return -ENOTSUP; + } + + for (i = 0; i < internal_config.num_hugepage_sizes; i++) { + struct hugepage_info *hpi = &internal_config.hugepage_info[i]; + if (page_sz == hpi->hugepage_sz) { + hi = hpi; + break; + } + } + if (!hi) { + RTE_LOG(ERR, EAL, "cannot find relevant hugepage_info entry\n"); + return -1; + } + + memset(&wa, 0, sizeof(wa)); + wa.exact = exact; + wa.hi = hi; + wa.ms = ms; + wa.n_segs = n_segs; + wa.page_sz = page_sz; + wa.socket = socket; + wa.segs_allocated = 0; + + /* memalloc is locked, so it's safe to use thread-unsafe version */ + ret = rte_memseg_list_walk_thread_unsafe(alloc_seg_walk, &wa); + if (ret == 0) { + RTE_LOG(ERR, EAL, "cannot find suitable memseg_list\n"); + ret = -1; + } else if (ret > 0) { + ret = (int)wa.segs_allocated; + } + + return ret; +} + +struct rte_memseg * +eal_memalloc_alloc_seg(size_t page_sz, int socket) +{ + struct rte_memseg *ms = NULL; + eal_memalloc_alloc_seg_bulk(&ms, 1, page_sz, socket, true); + return ms; +} + +int +eal_memalloc_free_seg_bulk(struct rte_memseg **ms, int n_segs) +{ + int seg, ret = 0; + + /* dynamic free not supported in legacy mode */ + if (internal_config.legacy_mem) + return -1; + + for (seg = 0; seg < n_segs; seg++) { + struct rte_memseg *cur = ms[seg]; + struct hugepage_info *hi = NULL; + struct free_walk_param wa; + size_t i; + int walk_res; + + /* if this page is marked as unfreeable, fail */ + if (cur->flags & RTE_MEMSEG_FLAG_DO_NOT_FREE) { + RTE_LOG(DEBUG, EAL, "Page is not allowed to be freed\n"); + ret = -1; + continue; + } + + memset(&wa, 0, sizeof(wa)); + + for (i = 0; i < RTE_DIM(internal_config.hugepage_info); i++) { + hi = &internal_config.hugepage_info[i]; + if (cur->hugepage_sz == hi->hugepage_sz) + break; + } + if (i == RTE_DIM(internal_config.hugepage_info)) { + RTE_LOG(ERR, EAL, "Can't find relevant hugepage_info entry\n"); + ret = -1; + continue; + } + + wa.ms = cur; + wa.hi = hi; + + /* memalloc is locked, so it's safe to use thread-unsafe version + */ + walk_res = rte_memseg_list_walk_thread_unsafe(free_seg_walk, + &wa); + if (walk_res == 1) + continue; + if (walk_res == 0) + RTE_LOG(ERR, EAL, "Couldn't find memseg list\n"); + ret = -1; + } + return ret; +} + +int +eal_memalloc_free_seg(struct rte_memseg *ms) +{ + return eal_memalloc_free_seg_bulk(&ms, 1); +} + +int +eal_memalloc_sync_with_primary(void) +{ + /* No multi-process support. */ + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +eal_memalloc_init(void) +{ + /* No action required. */ + return 0; +} diff --git a/lib/librte_eal/windows/eal_memory.c b/lib/librte_eal/windows/eal_memory.c new file mode 100644 index 000000000..73be1cf72 --- /dev/null +++ b/lib/librte_eal/windows/eal_memory.c @@ -0,0 +1,710 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +#include +#include + +#include +#include + +#include "eal_internal_cfg.h" +#include "eal_memalloc.h" +#include "eal_memcfg.h" +#include "eal_options.h" +#include "eal_private.h" +#include "eal_windows.h" + +#include + +/* MinGW-w64 headers lack VirtualAlloc2() in some distributions. + * Provide a copy of definitions and code to load it dynamically. 
+ * Note: definitions are copied verbatim from Microsoft documentation + * and don't follow DPDK code style. + * + * MEM_RESERVE_PLACEHOLDER being defined means VirtualAlloc2() is present too. + */ +#ifndef MEM_PRESERVE_PLACEHOLDER + +/* https://docs.microsoft.com/en-us/windows/win32/api/winnt/ne-winnt-mem_extended_parameter_type */ +typedef enum MEM_EXTENDED_PARAMETER_TYPE { + MemExtendedParameterInvalidType, + MemExtendedParameterAddressRequirements, + MemExtendedParameterNumaNode, + MemExtendedParameterPartitionHandle, + MemExtendedParameterUserPhysicalHandle, + MemExtendedParameterAttributeFlags, + MemExtendedParameterMax +} *PMEM_EXTENDED_PARAMETER_TYPE; + +#define MEM_EXTENDED_PARAMETER_TYPE_BITS 4 + +/* https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-mem_extended_parameter */ +typedef struct MEM_EXTENDED_PARAMETER { + struct { + DWORD64 Type : MEM_EXTENDED_PARAMETER_TYPE_BITS; + DWORD64 Reserved : 64 - MEM_EXTENDED_PARAMETER_TYPE_BITS; + } DUMMYSTRUCTNAME; + union { + DWORD64 ULong64; + PVOID Pointer; + SIZE_T Size; + HANDLE Handle; + DWORD ULong; + } DUMMYUNIONNAME; +} MEM_EXTENDED_PARAMETER, *PMEM_EXTENDED_PARAMETER; + +/* https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc2 */ +typedef PVOID (*VirtualAlloc2_type)( + HANDLE Process, + PVOID BaseAddress, + SIZE_T Size, + ULONG AllocationType, + ULONG PageProtection, + MEM_EXTENDED_PARAMETER *ExtendedParameters, + ULONG ParameterCount +); + +/* VirtualAlloc2() flags. */ +#define MEM_COALESCE_PLACEHOLDERS 0x00000001 +#define MEM_PRESERVE_PLACEHOLDER 0x00000002 +#define MEM_REPLACE_PLACEHOLDER 0x00004000 +#define MEM_RESERVE_PLACEHOLDER 0x00040000 + +/* Named exactly as the function, so that user code does not depend + * on it being found at compile time or dynamically. + */ +static VirtualAlloc2_type VirtualAlloc2; + +int +eal_mem_win32api_init(void) +{ + /* Contrary to the docs, VirtualAlloc2() is not in kernel32.dll, + * see https://github.com/MicrosoftDocs/feedback/issues/1129. + */ + static const char library_name[] = "kernelbase.dll"; + static const char function[] = "VirtualAlloc2"; + + HMODULE library = NULL; + int ret = 0; + + /* Already done. */ + if (VirtualAlloc2 != NULL) + return 0; + + library = LoadLibraryA(library_name); + if (library == NULL) { + RTE_LOG_WIN32_ERR("LoadLibraryA(\"%s\")", library_name); + return -1; + } + + VirtualAlloc2 = (VirtualAlloc2_type)( + (void *)GetProcAddress(library, function)); + if (VirtualAlloc2 == NULL) { + RTE_LOG_WIN32_ERR("GetProcAddress(\"%s\", \"%s\")\n", + library_name, function); + + /* Contrary to the docs, Server 2016 is not supported. */ + RTE_LOG(ERR, EAL, "Windows 10 or Windows Server 2019 " + " is required for memory management\n"); + ret = -1; + } + + FreeLibrary(library); + + return ret; +} + +#else + +/* Stub in case VirtualAlloc2() is provided by the compiler. 
*/ +int +eal_mem_win32api_init(void) +{ + return 0; +} + +#endif /* defined(MEM_RESERVE_PLACEHOLDER) */ + +static HANDLE virt2phys_device = INVALID_HANDLE_VALUE; + +int +eal_mem_virt2iova_init(void) +{ + HDEVINFO list = INVALID_HANDLE_VALUE; + SP_DEVICE_INTERFACE_DATA ifdata; + SP_DEVICE_INTERFACE_DETAIL_DATA *detail = NULL; + DWORD detail_size; + int ret = -1; + + list = SetupDiGetClassDevs( + &GUID_DEVINTERFACE_VIRT2PHYS, NULL, NULL, + DIGCF_DEVICEINTERFACE | DIGCF_PRESENT); + if (list == INVALID_HANDLE_VALUE) { + RTE_LOG_WIN32_ERR("SetupDiGetClassDevs()"); + goto exit; + } + + ifdata.cbSize = sizeof(ifdata); + if (!SetupDiEnumDeviceInterfaces( + list, NULL, &GUID_DEVINTERFACE_VIRT2PHYS, 0, &ifdata)) { + RTE_LOG_WIN32_ERR("SetupDiEnumDeviceInterfaces()"); + goto exit; + } + + if (!SetupDiGetDeviceInterfaceDetail( + list, &ifdata, NULL, 0, &detail_size, NULL)) { + if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) { + RTE_LOG_WIN32_ERR( + "SetupDiGetDeviceInterfaceDetail(probe)"); + goto exit; + } + } + + detail = malloc(detail_size); + if (detail == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate virt2phys " + "device interface detail data\n"); + goto exit; + } + + detail->cbSize = sizeof(*detail); + if (!SetupDiGetDeviceInterfaceDetail( + list, &ifdata, detail, detail_size, NULL, NULL)) { + RTE_LOG_WIN32_ERR("SetupDiGetDeviceInterfaceDetail(read)"); + goto exit; + } + + RTE_LOG(DEBUG, EAL, "Found virt2phys device: %s\n", detail->DevicePath); + + virt2phys_device = CreateFile( + detail->DevicePath, 0, 0, NULL, OPEN_EXISTING, 0, NULL); + if (virt2phys_device == INVALID_HANDLE_VALUE) { + RTE_LOG_WIN32_ERR("CreateFile()"); + goto exit; + } + + /* Indicate success. */ + ret = 0; + +exit: + if (detail != NULL) + free(detail); + if (list != INVALID_HANDLE_VALUE) + SetupDiDestroyDeviceInfoList(list); + + return ret; +} + +phys_addr_t +rte_mem_virt2phy(const void *virt) +{ + LARGE_INTEGER phys; + DWORD bytes_returned; + + if (virt2phys_device == INVALID_HANDLE_VALUE) + return RTE_BAD_PHYS_ADDR; + + if (!DeviceIoControl( + virt2phys_device, IOCTL_VIRT2PHYS_TRANSLATE, + &virt, sizeof(virt), &phys, sizeof(phys), + &bytes_returned, NULL)) { + RTE_LOG_WIN32_ERR("DeviceIoControl(IOCTL_VIRT2PHYS_TRANSLATE)"); + return RTE_BAD_PHYS_ADDR; + } + + return phys.QuadPart; +} + +/* Windows currently only supports IOVA as PA. */ +rte_iova_t +rte_mem_virt2iova(const void *virt) +{ + phys_addr_t phys; + + if (virt2phys_device == INVALID_HANDLE_VALUE) + return RTE_BAD_IOVA; + + phys = rte_mem_virt2phy(virt); + if (phys == RTE_BAD_PHYS_ADDR) + return RTE_BAD_IOVA; + + return (rte_iova_t)phys; +} + +/* Always using physical addresses under Windows if they can be obtained. */ +int +rte_eal_using_phys_addrs(void) +{ + return virt2phys_device != INVALID_HANDLE_VALUE; +} + +/* Approximate error mapping from VirtualAlloc2() to POSIX mmap(3). */ +static void +set_errno_from_win32_alloc_error(DWORD code) +{ + switch (code) { + case ERROR_SUCCESS: + rte_errno = 0; + break; + + case ERROR_INVALID_ADDRESS: + /* A valid requested address is not available. */ + case ERROR_COMMITMENT_LIMIT: + /* May occur when committing regular memory. */ + case ERROR_NO_SYSTEM_RESOURCES: + /* Occurs when the system runs out of hugepages. */ + rte_errno = ENOMEM; + break; + + case ERROR_INVALID_PARAMETER: + default: + rte_errno = EINVAL; + break; + } +} + +void * +eal_mem_reserve(void *requested_addr, size_t size, int flags) +{ + HANDLE process; + void *virt; + + /* Windows requires hugepages to be committed. 
*/ + if (flags & EAL_RESERVE_HUGEPAGES) { + rte_errno = ENOTSUP; + return NULL; + } + + process = GetCurrentProcess(); + + virt = VirtualAlloc2(process, requested_addr, size, + MEM_RESERVE | MEM_RESERVE_PLACEHOLDER, PAGE_NOACCESS, + NULL, 0); + if (virt == NULL) { + DWORD err = GetLastError(); + RTE_LOG_WIN32_ERR("VirtualAlloc2()"); + set_errno_from_win32_alloc_error(err); + return NULL; + } + + if ((flags & EAL_RESERVE_FORCE_ADDRESS) && (virt != requested_addr)) { + if (!VirtualFreeEx(process, virt, 0, MEM_RELEASE)) + RTE_LOG_WIN32_ERR("VirtualFreeEx()"); + rte_errno = ENOMEM; + return NULL; + } + + return virt; +} + +void * +eal_mem_alloc_socket(size_t size, int socket_id) +{ + DWORD flags = MEM_RESERVE | MEM_COMMIT; + void *addr; + + flags = MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES; + addr = VirtualAllocExNuma(GetCurrentProcess(), NULL, size, flags, + PAGE_READWRITE, eal_socket_numa_node(socket_id)); + if (addr == NULL) + rte_errno = ENOMEM; + return addr; +} + +void * +eal_mem_commit(void *requested_addr, size_t size, int socket_id) +{ + HANDLE process; + MEM_EXTENDED_PARAMETER param; + DWORD param_count = 0; + DWORD flags; + void *addr; + + process = GetCurrentProcess(); + + if (requested_addr != NULL) { + MEMORY_BASIC_INFORMATION info; + + if (VirtualQueryEx(process, requested_addr, &info, + sizeof(info)) != sizeof(info)) { + RTE_LOG_WIN32_ERR("VirtualQuery(%p)", requested_addr); + return NULL; + } + + /* Split reserved region if only a part is committed. */ + flags = MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER; + if ((info.RegionSize > size) && !VirtualFreeEx( + process, requested_addr, size, flags)) { + RTE_LOG_WIN32_ERR( + "VirtualFreeEx(%p, %zu, preserve placeholder)", + requested_addr, size); + return NULL; + } + + /* Temporarily release the region to be committed. + * + * There is an inherent race for this memory range + * if another thread allocates memory via OS API. + * However, VirtualAlloc2(MEM_REPLACE_PLACEHOLDER) + * doesn't work with MEM_LARGE_PAGES on Windows Server. + */ + if (!VirtualFreeEx(process, requested_addr, 0, MEM_RELEASE)) { + RTE_LOG_WIN32_ERR("VirtualFreeEx(%p, 0, release)", + requested_addr); + return NULL; + } + } + + if (socket_id != SOCKET_ID_ANY) { + param_count = 1; + memset(¶m, 0, sizeof(param)); + param.Type = MemExtendedParameterNumaNode; + param.ULong = eal_socket_numa_node(socket_id); + } + + flags = MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES; + addr = VirtualAlloc2(process, requested_addr, size, + flags, PAGE_READWRITE, ¶m, param_count); + if (addr == NULL) { + /* Logging may overwrite GetLastError() result. */ + DWORD err = GetLastError(); + RTE_LOG_WIN32_ERR("VirtualAlloc2(%p, %zu, commit large pages)", + requested_addr, size); + set_errno_from_win32_alloc_error(err); + return NULL; + } + + if ((requested_addr != NULL) && (addr != requested_addr)) { + /* We lost the race for the requested_addr. */ + if (!VirtualFreeEx(process, addr, 0, MEM_RELEASE)) + RTE_LOG_WIN32_ERR("VirtualFreeEx(%p, release)", addr); + + rte_errno = EADDRNOTAVAIL; + return NULL; + } + + return addr; +} + +int +eal_mem_decommit(void *addr, size_t size) +{ + HANDLE process; + void *stub; + DWORD flags; + + process = GetCurrentProcess(); + + /* Hugepages cannot be decommited on Windows, + * so free them and replace the block with a placeholder. + * There is a race for VA in this block until VirtualAlloc2 call. 
+ */ + if (!VirtualFreeEx(process, addr, 0, MEM_RELEASE)) { + RTE_LOG_WIN32_ERR("VirtualFreeEx(%p, 0, release)", addr); + return -1; + } + + flags = MEM_RESERVE | MEM_RESERVE_PLACEHOLDER; + stub = VirtualAlloc2( + process, addr, size, flags, PAGE_NOACCESS, NULL, 0); + if (stub == NULL) { + /* We lost the race for the VA. */ + if (!VirtualFreeEx(process, stub, 0, MEM_RELEASE)) + RTE_LOG_WIN32_ERR("VirtualFreeEx(%p, release)", stub); + rte_errno = EADDRNOTAVAIL; + return -1; + } + + /* No need to join reserved regions adjacent to the freed one: + * eal_mem_commit() will just pick up the page-size placeholder + * created here. + */ + return 0; +} + +/** + * Free a reserved memory region in full or in part. + * + * @param addr + * Starting address of the area to free. + * @param size + * Number of bytes to free. Must be a multiple of page size. + * @param reserved + * Fail if the region is not in reserved state. + * @return + * * 0 on successful deallocation; + * * 1 if region must be in reserved state but it is not; + * * (-1) on system API failures. + */ +static int +mem_free(void *addr, size_t size, bool reserved) +{ + MEMORY_BASIC_INFORMATION info; + HANDLE process; + + process = GetCurrentProcess(); + + if (VirtualQueryEx( + process, addr, &info, sizeof(info)) != sizeof(info)) { + RTE_LOG_WIN32_ERR("VirtualQueryEx(%p)", addr); + return -1; + } + + if (reserved && (info.State != MEM_RESERVE)) + return 1; + + /* Free complete region. */ + if ((addr == info.AllocationBase) && (size == info.RegionSize)) { + if (!VirtualFreeEx(process, addr, 0, MEM_RELEASE)) { + RTE_LOG_WIN32_ERR("VirtualFreeEx(%p, 0, release)", + addr); + } + return 0; + } + + /* Split the part to be freed and the remaining reservation. */ + if (!VirtualFreeEx(process, addr, size, + MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER)) { + RTE_LOG_WIN32_ERR( + "VirtualFreeEx(%p, %zu, preserve placeholder)", + addr, size); + return -1; + } + + /* Actually free reservation part. */ + if (!VirtualFreeEx(process, addr, 0, MEM_RELEASE)) { + RTE_LOG_WIN32_ERR("VirtualFreeEx(%p, 0, release)", addr); + return -1; + } + + return 0; +} + +void +eal_mem_free(void *virt, size_t size) +{ + mem_free(virt, size, false); +} + +int +eal_mem_set_dump(void *virt, size_t size, bool dump) +{ + RTE_SET_USED(virt); + RTE_SET_USED(size); + RTE_SET_USED(dump); + + /* Windows does not dump reserved memory by default. + * + * There is to include or exclude regions from the dump, + * but this is not currently required by EAL. 
+ */ + + rte_errno = ENOTSUP; + return -1; +} + +void * +rte_mem_map(void *requested_addr, size_t size, int prot, int flags, + int fd, size_t offset) +{ + HANDLE file_handle = INVALID_HANDLE_VALUE; + HANDLE mapping_handle = INVALID_HANDLE_VALUE; + DWORD sys_prot = 0; + DWORD sys_access = 0; + DWORD size_high = (DWORD)(size >> 32); + DWORD size_low = (DWORD)size; + DWORD offset_high = (DWORD)(offset >> 32); + DWORD offset_low = (DWORD)offset; + LPVOID virt = NULL; + + if (prot & RTE_PROT_EXECUTE) { + if (prot & RTE_PROT_READ) { + sys_prot = PAGE_EXECUTE_READ; + sys_access = FILE_MAP_READ | FILE_MAP_EXECUTE; + } + if (prot & RTE_PROT_WRITE) { + sys_prot = PAGE_EXECUTE_READWRITE; + sys_access = FILE_MAP_WRITE | FILE_MAP_EXECUTE; + } + } else { + if (prot & RTE_PROT_READ) { + sys_prot = PAGE_READONLY; + sys_access = FILE_MAP_READ; + } + if (prot & RTE_PROT_WRITE) { + sys_prot = PAGE_READWRITE; + sys_access = FILE_MAP_WRITE; + } + } + + if (flags & RTE_MAP_PRIVATE) + sys_access |= FILE_MAP_COPY; + + if ((flags & RTE_MAP_ANONYMOUS) == 0) + file_handle = (HANDLE)_get_osfhandle(fd); + + mapping_handle = CreateFileMapping( + file_handle, NULL, sys_prot, size_high, size_low, NULL); + if (mapping_handle == INVALID_HANDLE_VALUE) { + RTE_LOG_WIN32_ERR("CreateFileMapping()"); + return NULL; + } + + /* There is a race for the requested_addr between mem_free() + * and MapViewOfFileEx(). MapViewOfFile3() that can replace a reserved + * region with a mapping in a single operation, but it does not support + * private mappings. + */ + if (requested_addr != NULL) { + int ret = mem_free(requested_addr, size, true); + if (ret) { + if (ret > 0) { + RTE_LOG(ERR, EAL, "Cannot map memory " + "to a region not reserved\n"); + rte_errno = EADDRNOTAVAIL; + } + return NULL; + } + } + + virt = MapViewOfFileEx(mapping_handle, sys_access, + offset_high, offset_low, size, requested_addr); + if (!virt) { + RTE_LOG_WIN32_ERR("MapViewOfFileEx()"); + return NULL; + } + + if ((flags & RTE_MAP_FORCE_ADDRESS) && (virt != requested_addr)) { + if (!UnmapViewOfFile(virt)) + RTE_LOG_WIN32_ERR("UnmapViewOfFile()"); + virt = NULL; + } + + if (!CloseHandle(mapping_handle)) + RTE_LOG_WIN32_ERR("CloseHandle()"); + + return virt; +} + +int +rte_mem_unmap(void *virt, size_t size) +{ + RTE_SET_USED(size); + + if (!UnmapViewOfFile(virt)) { + RTE_LOG_WIN32_ERR("UnmapViewOfFile()"); + rte_errno = EINVAL; + return -1; + } + return 0; +} + +uint64_t +eal_get_baseaddr(void) +{ + /* Windows strategy for memory allocation is undocumented. + * Returning 0 here effectively disables address guessing + * unless user provides an address hint. + */ + return 0; +} + +size_t +rte_mem_page_size(void) +{ + static SYSTEM_INFO info; + + if (info.dwPageSize == 0) + GetSystemInfo(&info); + + return info.dwPageSize; +} + +int +rte_mem_lock(const void *virt, size_t size) +{ + /* VirtualLock() takes `void*`, work around compiler warning. 
*/ + void *addr = (void *)((uintptr_t)virt); + + if (!VirtualLock(addr, size)) { + RTE_LOG_WIN32_ERR("VirtualLock(%p %#zx)", virt, size); + return -1; + } + + return 0; +} + +int +rte_eal_memseg_init(void) +{ + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + EAL_LOG_NOT_IMPLEMENTED(); + return -1; + } + + return eal_dynmem_memseg_lists_init(); +} + +static int +eal_nohuge_init(void) +{ + struct rte_mem_config *mcfg; + struct rte_memseg_list *msl; + int n_segs; + uint64_t mem_sz, page_sz; + void *addr; + + mcfg = rte_eal_get_configuration()->mem_config; + + /* nohuge mode is legacy mode */ + internal_config.legacy_mem = 1; + + msl = &mcfg->memsegs[0]; + + mem_sz = internal_config.memory; + page_sz = RTE_PGSIZE_4K; + n_segs = mem_sz / page_sz; + + if (eal_memseg_list_init_named( + msl, "nohugemem", page_sz, n_segs, 0, true)) { + return -1; + } + + addr = VirtualAlloc( + NULL, mem_sz, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE); + if (addr == NULL) { + RTE_LOG_WIN32_ERR("VirtualAlloc(size=%#zx)", mem_sz); + RTE_LOG(ERR, EAL, "Cannot allocate memory\n"); + return -1; + } + + msl->base_va = addr; + msl->len = mem_sz; + + eal_memseg_list_populate(msl, addr, n_segs); + + if (mcfg->dma_maskbits && + rte_mem_check_dma_mask_thread_unsafe(mcfg->dma_maskbits)) { + RTE_LOG(ERR, EAL, + "%s(): couldn't allocate memory due to IOVA " + "exceeding limits of current DMA mask.\n", __func__); + return -1; + } + + return 0; +} + +int +rte_eal_hugepage_init(void) +{ + return internal_config.no_hugetlbfs ? + eal_nohuge_init() : eal_dynmem_hugepage_init(); +} + +int +rte_eal_hugepage_attach(void) +{ + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} diff --git a/lib/librte_eal/windows/eal_mp.c b/lib/librte_eal/windows/eal_mp.c new file mode 100644 index 000000000..16a5e8ba0 --- /dev/null +++ b/lib/librte_eal/windows/eal_mp.c @@ -0,0 +1,103 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +/** + * @file Multiprocess support stubs + * + * Stubs must log an error until implemented. If success is required + * for non-multiprocess operation, stub must log a warning and a comment + * must document what requires success emulation. + */ + +#include +#include + +#include "eal_private.h" +#include "eal_windows.h" +#include "malloc_mp.h" + +void +rte_mp_channel_cleanup(void) +{ + EAL_LOG_NOT_IMPLEMENTED(); +} + +int +rte_mp_action_register(const char *name, rte_mp_t action) +{ + RTE_SET_USED(name); + RTE_SET_USED(action); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +void +rte_mp_action_unregister(const char *name) +{ + RTE_SET_USED(name); + EAL_LOG_NOT_IMPLEMENTED(); +} + +int +rte_mp_sendmsg(struct rte_mp_msg *msg) +{ + RTE_SET_USED(msg); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply, + const struct timespec *ts) +{ + RTE_SET_USED(req); + RTE_SET_USED(reply); + RTE_SET_USED(ts); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, + rte_mp_async_reply_t clb) +{ + RTE_SET_USED(req); + RTE_SET_USED(ts); + RTE_SET_USED(clb); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +rte_mp_reply(struct rte_mp_msg *msg, const char *peer) +{ + RTE_SET_USED(msg); + RTE_SET_USED(peer); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +register_mp_requests(void) +{ + /* Non-stub function succeeds if multi-process is not supported. 
*/ + EAL_LOG_STUB(); + return 0; +} + +int +request_to_primary(struct malloc_mp_req *req) +{ + RTE_SET_USED(req); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +request_sync(void) +{ + /* Common memory allocator depends on this function success. */ + EAL_LOG_STUB(); + return 0; +} diff --git a/lib/librte_eal/windows/eal_windows.h b/lib/librte_eal/windows/eal_windows.h index f3ed8c37f..d48ee0a12 100644 --- a/lib/librte_eal/windows/eal_windows.h +++ b/lib/librte_eal/windows/eal_windows.h @@ -9,8 +9,24 @@ * @file Facilities private to Windows EAL */ +#include #include +/** + * Log current function as not implemented and set rte_errno. + */ +#define EAL_LOG_NOT_IMPLEMENTED() \ + do { \ + RTE_LOG(DEBUG, EAL, "%s() is not implemented\n", __func__); \ + rte_errno = ENOTSUP; \ + } while (0) + +/** + * Log current function as a stub. + */ +#define EAL_LOG_STUB() \ + RTE_LOG(DEBUG, EAL, "Windows: %s() is a stub\n", __func__) + /** * Create a map of processors and cores on the system. * @@ -39,4 +55,63 @@ int eal_thread_create(pthread_t *thread); */ unsigned int eal_socket_numa_node(unsigned int socket_id); +/** + * Open virt2phys driver interface device. + * + * @return 0 on success, (-1) on failure. + */ +int eal_mem_virt2iova_init(void); + +/** + * Locate Win32 memory management routines in system libraries. + * + * @return 0 on success, (-1) on failure. + */ +int eal_mem_win32api_init(void); + +/** + * Allocate new memory in hugepages on the specified NUMA node. + * + * @param size + * Number of bytes to allocate. Must be a multiple of huge page size. + * @param socket_id + * Socket ID. + * @return + * Address of the memory allocated on success or NULL on failure. + */ +void *eal_mem_alloc_socket(size_t size, int socket_id); + +/** + * Commit memory previously reserved with eal_mem_reserve() + * or decommitted from hugepages by eal_mem_decommit(). + * + * @param requested_addr + * Address within a reserved region. Must not be NULL. + * @param size + * Number of bytes to commit. Must be a multiple of page size. + * @param socket_id + * Socket ID to allocate on. Can be SOCKET_ID_ANY. + * @return + * On success, address of the committed memory, that is, requested_addr. + * On failure, NULL and rte_errno is set. + */ +void *eal_mem_commit(void *requested_addr, size_t size, int socket_id); + +/** + * Put allocated or committed memory back into reserved state. + * + * @param addr + * Address of the region to decommit. + * @param size + * Number of bytes to decommit, must be the size of a page + * (hugepage or regular one). + * + * The *addr* and *size* must match location and size + * of a previously allocated or committed region. + * + * @return + * 0 on success, (-1) on failure. 
+ */ +int eal_mem_decommit(void *addr, size_t size); + #endif /* _EAL_WINDOWS_H_ */ diff --git a/lib/librte_eal/windows/include/meson.build b/lib/librte_eal/windows/include/meson.build index 5fb1962ac..b3534b025 100644 --- a/lib/librte_eal/windows/include/meson.build +++ b/lib/librte_eal/windows/include/meson.build @@ -5,5 +5,6 @@ includes += include_directories('.') headers += files( 'rte_os.h', + 'rte_virt2phys.h', 'rte_windows.h', ) diff --git a/lib/librte_eal/windows/include/rte_os.h b/lib/librte_eal/windows/include/rte_os.h index 510e39e03..cb10d6494 100644 --- a/lib/librte_eal/windows/include/rte_os.h +++ b/lib/librte_eal/windows/include/rte_os.h @@ -14,6 +14,7 @@ #include #include #include +#include #ifdef __cplusplus extern "C" { @@ -36,6 +37,9 @@ extern "C" { #define strncasecmp(s1, s2, count) _strnicmp(s1, s2, count) +#define close _close +#define unlink _unlink + /* cpu_set macros implementation */ #define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2) #define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2) @@ -46,6 +50,7 @@ extern "C" { typedef long long ssize_t; #ifndef RTE_TOOLCHAIN_GCC + static inline int asprintf(char **buffer, const char *format, ...) { @@ -72,6 +77,18 @@ asprintf(char **buffer, const char *format, ...) } return ret; } + +static inline const char * +eal_strerror(int code) +{ + static char buffer[128]; + + strerror_s(buffer, sizeof(buffer), code); + return buffer; +} + +#define strerror eal_strerror + #endif /* RTE_TOOLCHAIN_GCC */ #ifdef __cplusplus diff --git a/lib/librte_eal/windows/include/rte_virt2phys.h b/lib/librte_eal/windows/include/rte_virt2phys.h new file mode 100644 index 000000000..4bb2b4aaf --- /dev/null +++ b/lib/librte_eal/windows/include/rte_virt2phys.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +/** + * @file virt2phys driver interface + */ + +/** + * Driver device interface GUID {539c2135-793a-4926-afec-d3a1b61bbc8a}. + */ +DEFINE_GUID(GUID_DEVINTERFACE_VIRT2PHYS, + 0x539c2135, 0x793a, 0x4926, + 0xaf, 0xec, 0xd3, 0xa1, 0xb6, 0x1b, 0xbc, 0x8a); + +/** + * Driver device type for IO control codes. + */ +#define VIRT2PHYS_DEVTYPE 0x8000 + +/** + * Translate a valid non-paged virtual address to a physical address. + * + * Note: A physical address zero (0) is reported if input address + * is paged out or not mapped. However, if input is a valid mapping + * of I/O port 0x0000, output is also zero. There is no way + * to distinguish between these cases by return value only. + * + * Input: a non-paged virtual address (PVOID). + * + * Output: the corresponding physical address (LARGE_INTEGER). + */ +#define IOCTL_VIRT2PHYS_TRANSLATE CTL_CODE( \ + VIRT2PHYS_DEVTYPE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS) diff --git a/lib/librte_eal/windows/include/rte_windows.h b/lib/librte_eal/windows/include/rte_windows.h index ed6e4c148..899ed7d87 100644 --- a/lib/librte_eal/windows/include/rte_windows.h +++ b/lib/librte_eal/windows/include/rte_windows.h @@ -23,6 +23,8 @@ #include #include +#include +#include /* Have GUIDs defined. */ #ifndef INITGUID diff --git a/lib/librte_eal/windows/include/unistd.h b/lib/librte_eal/windows/include/unistd.h index 757b7f3c5..6b33005b2 100644 --- a/lib/librte_eal/windows/include/unistd.h +++ b/lib/librte_eal/windows/include/unistd.h @@ -9,4 +9,7 @@ * as Microsoft libc does not contain unistd.h. This may be removed * in future releases. 
*/ + +#include + #endif /* _UNISTD_H_ */ diff --git a/lib/librte_eal/windows/meson.build b/lib/librte_eal/windows/meson.build index 52978e9d7..ded5a2b80 100644 --- a/lib/librte_eal/windows/meson.build +++ b/lib/librte_eal/windows/meson.build @@ -6,10 +6,16 @@ subdir('include') sources += files( 'eal.c', 'eal_debug.c', + 'eal_file.c', 'eal_hugepages.c', 'eal_lcore.c', 'eal_log.c', + 'eal_memalloc.c', + 'eal_memory.c', + 'eal_mp.c', 'eal_thread.c', 'fnmatch.c', 'getopt.c', ) + +dpdk_conf.set10('RTE_EAL_NUMA_AWARE_HUGEPAGES', true)
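
A minimal sketch (not taken from the patches above) of how an application built against this series might confirm, after rte_eal_init(), that hugepages were claimed and physical addresses are reachable on Windows. It uses only functions added to rte_eal_exports.def in this series (rte_eal_has_hugepages, rte_eal_iova_mode, rte_malloc, rte_mem_virt2iova, rte_free); headers, error handling, and the IOVA-mode printout are simplified assumptions, not part of the patch:

#include <inttypes.h>
#include <stdio.h>

#include <rte_eal.h>
#include <rte_malloc.h>
#include <rte_memory.h>

int
main(int argc, char **argv)
{
	void *buf;
	rte_iova_t iova;

	if (rte_eal_init(argc, argv) < 0) {
		fprintf(stderr, "EAL init failed\n");
		return 1;
	}

	printf("hugepages available: %s\n",
		rte_eal_has_hugepages() ? "yes" : "no");
	printf("IOVA mode: %s\n",
		rte_eal_iova_mode() == RTE_IOVA_PA ? "PA" : "VA/DC");

	/* Allocate from the malloc heap backed by the new Windows EAL
	 * memory management and query its IOVA via the virt2phys driver.
	 */
	buf = rte_malloc("probe", 4096, 0);
	if (buf != NULL) {
		iova = rte_mem_virt2iova(buf);
		if (iova != RTE_BAD_IOVA)
			printf("buffer IOVA: 0x%" PRIx64 "\n", iova);
		rte_free(buf);
	}

	return 0;
}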