From patchwork Tue Oct 2 13:34:39 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Burakov, Anatoly"
X-Patchwork-Id: 45888
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 6B0291B11F;
 Tue, 2 Oct 2018 15:35:32 +0200 (CEST)
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id E06E65B3C
 for ; Tue, 2 Oct 2018 15:35:16 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga006.jf.intel.com ([10.7.209.51])
 by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 02 Oct 2018 06:35:15 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="79224901"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by orsmga006.jf.intel.com with ESMTP; 02 Oct 2018 06:35:00 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ04j009124;
 Tue, 2 Oct 2018 14:35:00 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id w92DYxZL031919;
 Tue, 2 Oct 2018 14:34:59 +0100
Received: (from aburakov@localhost)
 by sivswdev01.ir.intel.com with LOCAL id w92DYxE1031912;
 Tue, 2 Oct 2018 14:34:59 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: John McNamara , Marko Kovacevic , Bruce Richardson ,
 laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com,
 andras.kovacs@ericsson.com, winnie.tian@ericsson.com,
 daniel.andrasi@ericsson.com, janos.kobor@ericsson.com,
 geza.koblo@ericsson.com, srinath.mannam@broadcom.com,
 scott.branden@broadcom.com, ajit.khaparde@broadcom.com,
 keith.wiles@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com,
 shahafs@mellanox.com,
 arybchenko@solarflare.com, alejandro.lucero@netronome.com
Date: Tue, 2 Oct 2018 14:34:39 +0100
Message-Id: <68b1cc34d69c96f16e3c6068daaa9b2a392660cc.1538486972.git.anatoly.burakov@intel.com>
X-Mailer: git-send-email 1.7.0.7
In-Reply-To:
References:
In-Reply-To:
References:
Subject: [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.

Signed-off-by: Anatoly Burakov
Acked-by: Shreyansh Jain
---
 doc/guides/rel_notes/release_18_11.rst            | 8 +++++++-
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/Makefile                | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++-
 lib/librte_eal/linuxapp/eal/Makefile              | 2 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 lib/librte_eal/meson.build                        | 2 +-
 10 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a8327ea77..58bb79022 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -153,6 +153,12 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+  supporting external memory in DPDK:
+        - structure ``rte_memseg_list`` now has a new field indicating length
+          of memory addressed by the segment list
+
+
 Removed Items
 -------------
 
@@ -198,7 +204,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
    + librte_eventdev.so.6
      librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d2362985 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,9 +30,10 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
-	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	int socket_id; /**< Socket ID for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	struct rte_fbarray memseg_arr;
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'

From patchwork Tue Oct 2 13:34:40 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Burakov, Anatoly"
X-Patchwork-Id: 45878
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 35F225F2B;
 Tue, 2 Oct 2018 15:35:15 +0200 (CEST)
Received: from mga17.intel.com (mga17.intel.com [192.55.52.151])
 by dpdk.org (Postfix) with ESMTP id BD0675A44
 for ; Tue, 2 Oct 2018 15:35:09 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 02 Oct 2018 06:35:08 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="91470288"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by fmsmga002.fm.intel.com with ESMTP; 02 Oct 2018 06:35:04 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ3ba009179;
 Tue, 2 Oct 2018 14:35:03
 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id w92DZ3gd032033;
 Tue, 2 Oct 2018 14:35:03 +0100
Received: (from aburakov@localhost)
 by sivswdev01.ir.intel.com with LOCAL id w92DZ0qR031931;
 Tue, 2 Oct 2018 14:35:00 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: Neil Horman , John McNamara , Marko Kovacevic , Hemant Agrawal ,
 Shreyansh Jain , Shahaf Shuler , Yongseok Koh , Maxime Coquelin ,
 Tiwei Bie , Zhihong Wang , Bruce Richardson , Olivier Matz ,
 Andrew Rybchenko , laszlo.madarassy@ericsson.com,
 laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com,
 winnie.tian@ericsson.com, daniel.andrasi@ericsson.com,
 janos.kobor@ericsson.com, geza.koblo@ericsson.com,
 srinath.mannam@broadcom.com, scott.branden@broadcom.com,
 ajit.khaparde@broadcom.com, keith.wiles@intel.com, thomas@monjalon.net,
 alejandro.lucero@netronome.com
Date: Tue, 2 Oct 2018 14:34:40 +0100
Message-Id:
X-Mailer: git-send-email 1.7.0.7
In-Reply-To:
References:
In-Reply-To:
References:
Subject: [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets.
Previously, this assumption of knowing all page sizes was not a
problem, but it will be now, so we have to match socket ID with
page size when calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov
Acked-by: Andrew Rybchenko
Acked-by: Yongseok Koh
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool

    v2:
    - Add documentation changes and ABI break

    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        |  8 +++++
 drivers/bus/fslmc/fslmc_vfio.c                |  6 +++-
 drivers/net/mlx5/mlx5.c                       |  4 ++-
 drivers/net/virtio/virtio_user/vhost_kernel.c |  3 ++
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 19 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 58bb79022..bc1d56130 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -118,6 +118,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+  whether the memseg list is externally allocated. This will have implications
+  for any users of memseg-walk-related functions, as they will now have to skip
+  externally allocated segments in most cases if the intent is to only iterate
+  over internal DPDK memory.
+
 * mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
   functions were deprecated since 17.05 and are replaced by
   ``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -157,6 +163,8 @@ ABI Changes
   supporting external memory in DPDK:
         - structure ``rte_memseg_list`` now has a new field indicating length
           of memory addressed by the segment list
+        - structure ``rte_memseg_list`` now has a new flag indicating whether
+          the memseg list refers to external memory
 
 
 Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 
 static int
 fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+		const struct rte_memseg *ms, void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	/* if IOVA address is invalid, skip */
+	if (ms->iova == RTE_BAD_IOVA)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
 	void *start_addr;
 	uint64_t len;
 
+	if (msl->external)
+		return 0;
+
 	if (vm->nregions >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d2362985..645288b02 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -34,6 +34,7 @@ struct rte_memseg_list {
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	size_t len; /**< Length of memory area covered by this memseg list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	struct rte_fbarray memseg_arr;
 };
 
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+			(internal_config.legacy_mem &&
+				rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)

From patchwork Tue Oct 2 13:34:41 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Burakov, Anatoly"
X-Patchwork-Id: 45877
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 61C475F16;
 Tue, 2 Oct 2018 15:35:13 +0200 (CEST)
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id D91055B2C
 for ; Tue, 2 Oct 2018 15:35:09 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 02 Oct 2018 06:35:08 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="95772875"
Received: from
 irvmail001.ir.intel.com ([163.33.26.43])
 by orsmga001.jf.intel.com with ESMTP; 02 Oct 2018 06:35:04 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ3QO009182;
 Tue, 2 Oct 2018 14:35:03 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id w92DZ3LT032040;
 Tue, 2 Oct 2018 14:35:03 +0100
Received: (from aburakov@localhost)
 by sivswdev01.ir.intel.com with LOCAL id w92DZ336032036;
 Tue, 2 Oct 2018 14:35:03 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: Thomas Monjalon , Bruce Richardson , John McNamara , Marko Kovacevic ,
 laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com,
 andras.kovacs@ericsson.com, winnie.tian@ericsson.com,
 daniel.andrasi@ericsson.com, janos.kobor@ericsson.com,
 geza.koblo@ericsson.com, srinath.mannam@broadcom.com,
 scott.branden@broadcom.com, ajit.khaparde@broadcom.com,
 keith.wiles@intel.com, shreyansh.jain@nxp.com, shahafs@mellanox.com,
 arybchenko@solarflare.com, alejandro.lucero@netronome.com
Date: Tue, 2 Oct 2018 14:34:41 +0100
Message-Id:
X-Mailer: git-send-email 1.7.0.7
In-Reply-To:
References:
In-Reply-To:
References:
Subject: [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation.

This breaks the ABI, so document the changes.

Signed-off-by: Anatoly Burakov --- config/common_base | 1 + config/rte_config.h | 1 + doc/guides/rel_notes/release_18_11.rst | 5 +- .../common/include/rte_eal_memconfig.h | 4 +- .../common/include/rte_malloc_heap.h | 1 + lib/librte_eal/common/malloc_heap.c | 102 +++++++++++++----- lib/librte_eal/common/malloc_heap.h | 3 + lib/librte_eal/common/rte_malloc.c | 41 ++++--- 8 files changed, 114 insertions(+), 44 deletions(-) diff --git a/config/common_base b/config/common_base index acc5211bc..83350e0b1 100644 --- a/config/common_base +++ b/config/common_base @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64 CONFIG_RTE_LIBRTE_EAL=y CONFIG_RTE_MAX_LCORE=128 CONFIG_RTE_MAX_NUMA_NODES=8 +CONFIG_RTE_MAX_HEAPS=32 CONFIG_RTE_MAX_MEMSEG_LISTS=64 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller diff --git a/config/rte_config.h b/config/rte_config.h index 20c58dff1..816e6f879 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -24,6 +24,7 @@ #define RTE_BUILD_SHARED_LIB /* EAL defines */ +#define RTE_MAX_HEAPS 32 #define RTE_MAX_MEMSEG_LISTS 128 #define RTE_MAX_MEMSEG_PER_LIST 8192 #define RTE_MAX_MEM_MB_PER_LIST 32768 diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index bc1d56130..0607a3980 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -165,7 +165,10 @@ ABI Changes of memory addressed by the segment list - structure ``rte_memseg_list`` now has a new flag indicating whether the memseg list refers to external memory - + - structure ``rte_malloc_heap`` now has a new field indicating socket + ID the malloc heap belongs to + - structure ``rte_mem_config`` has had its ``malloc_heaps`` array + resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value Removed Items ------------- diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h 
b/lib/librte_eal/common/include/rte_eal_memconfig.h index 645288b02..7634bff5d 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -72,8 +72,8 @@ struct rte_mem_config { struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */ - /* Heaps of Malloc per socket */ - struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES]; + /* Heaps of Malloc */ + struct malloc_heap malloc_heaps[RTE_MAX_HEAPS]; /* address of mem_config in primary process. used to map shared config into * exact same address the primary process maps it. diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h index d43fa9097..d432cef88 100644 --- a/lib/librte_eal/common/include/rte_malloc_heap.h +++ b/lib/librte_eal/common/include/rte_malloc_heap.h @@ -26,6 +26,7 @@ struct malloc_heap { struct malloc_elem *volatile last; unsigned alloc_count; + unsigned int socket_id; size_t total_size; } __rte_cache_aligned; diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 3c8e2063b..a9cfa423f 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz) return check_flag & flags; } +int +malloc_socket_to_heap_id(unsigned int socket_id) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int i; + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + + if (heap->socket_id == socket_id) + return i; + } + return -1; +} + /* * Expand the heap with a memory area. 
*/ @@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl, struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct rte_memseg_list *found_msl; struct malloc_heap *heap; - int msl_idx; + int msl_idx, heap_idx; if (msl->external) return 0; - heap = &mcfg->malloc_heaps[msl->socket_id]; + heap_idx = malloc_socket_to_heap_id(msl->socket_id); + if (heap_idx < 0) { + RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n"); + return -1; + } + heap = &mcfg->malloc_heaps[heap_idx]; /* msl is const, so find it */ msl_idx = msl - mcfg->memsegs; @@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl, malloc_heap_add_memory(heap, found_msl, ms->addr, len); heap->total_size += len; + heap->socket_id = msl->socket_id; RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20, msl->socket_id); @@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket, /* this will try lower page sizes first */ static void * -heap_alloc_on_socket(const char *type, size_t size, int socket, - unsigned int flags, size_t align, size_t bound, bool contig) +malloc_heap_alloc_on_heap_id(const char *type, size_t size, + unsigned int heap_id, unsigned int flags, size_t align, + size_t bound, bool contig) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - struct malloc_heap *heap = &mcfg->malloc_heaps[socket]; + struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id]; unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY; + int socket_id; void *ret; rte_spinlock_lock(&(heap->lock)); @@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket, * we may still be able to allocate memory from appropriate page sizes, * we just need to request more memory first. */ + + socket_id = rte_socket_id_by_idx(heap_id); + /* + * if socket ID is negative, we cannot find a socket ID for this heap - + * which means it's an external heap. 
those can have unexpected page + * sizes, so if the user asked to allocate from there - assume user + * knows what they're doing, and allow allocating from there with any + * page size flags. + */ + if (socket_id < 0) + size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY; + ret = heap_alloc(heap, type, size, size_flags, align, bound, contig); if (ret != NULL) goto alloc_unlock; - if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound, - contig)) { + /* if socket ID is invalid, this is an external heap */ + if (socket_id < 0) + goto alloc_unlock; + + if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align, + bound, contig)) { ret = heap_alloc(heap, type, size, flags, align, bound, contig); /* this should have succeeded */ @@ -605,7 +644,7 @@ void * malloc_heap_alloc(const char *type, size_t size, int socket_arg, unsigned int flags, size_t align, size_t bound, bool contig) { - int socket, i, cur_socket; + int socket, heap_id, i; void *ret; /* return NULL if size is 0 or alignment is not power-of-2 */ @@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg, else socket = socket_arg; - /* Check socket parameter */ - if (socket >= RTE_MAX_NUMA_NODES) + /* turn socket ID into heap ID */ + heap_id = malloc_socket_to_heap_id(socket); + /* if heap id is negative, socket ID was invalid */ + if (heap_id < 0) return NULL; - ret = heap_alloc_on_socket(type, size, socket, flags, align, bound, - contig); + ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align, + bound, contig); if (ret != NULL || socket_arg != SOCKET_ID_ANY) return ret; - /* try other heaps */ + /* try other heaps. we are only iterating through native DPDK sockets, + * so external heaps won't be included. 
+ */ for (i = 0; i < (int) rte_socket_count(); i++) { - cur_socket = rte_socket_id_by_idx(i); - if (cur_socket == socket) + if (i == heap_id) continue; - ret = heap_alloc_on_socket(type, size, cur_socket, flags, - align, bound, contig); + ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align, + bound, contig); if (ret != NULL) return ret; } @@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg, } static void * -heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags, - size_t align, bool contig) +heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id, + unsigned int flags, size_t align, bool contig) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - struct malloc_heap *heap = &mcfg->malloc_heaps[socket]; + struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id]; void *ret; rte_spinlock_lock(&(heap->lock)); @@ -665,7 +707,7 @@ void * malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags, size_t align, bool contig) { - int socket, i, cur_socket; + int socket, i, cur_socket, heap_id; void *ret; /* return NULL if align is not power-of-2 */ @@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags, else socket = socket_arg; - /* Check socket parameter */ - if (socket >= RTE_MAX_NUMA_NODES) + /* turn socket ID into heap ID */ + heap_id = malloc_socket_to_heap_id(socket); + /* if heap id is negative, socket ID was invalid */ + if (heap_id < 0) return NULL; - ret = heap_alloc_biggest_on_socket(type, socket, flags, align, + ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align, contig); if (ret != NULL || socket_arg != SOCKET_ID_ANY) return ret; @@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags, cur_socket = rte_socket_id_by_idx(i); if (cur_socket == socket) continue; - ret = heap_alloc_biggest_on_socket(type, cur_socket, flags, - align, contig); + 
ret = heap_alloc_biggest_on_heap_id(type, i, flags, align, + contig); if (ret != NULL) return ret; } @@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem) /* ...of which we can't avail if we are in legacy mode, or if this is an * externally allocated segment. */ - if (internal_config.legacy_mem || msl->external) + if (internal_config.legacy_mem || (msl->external > 0)) goto free_unlock; /* check if we can free any memory back to the system */ @@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size) } /* - * Function to retrieve data for heap on given socket + * Function to retrieve data for a given heap */ int malloc_heap_get_stats(struct malloc_heap *heap, @@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap, } /* - * Function to retrieve data for heap on given socket + * Function to retrieve data for a given heap */ void malloc_heap_dump(struct malloc_heap *heap, FILE *f) diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index f52cb5559..61b844b6f 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap, void malloc_heap_dump(struct malloc_heap *heap, FILE *f); +int +malloc_socket_to_heap_id(unsigned int socket_id); + int rte_eal_malloc_heap_init(void); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 47ca5a742..73d6df31d 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int heap_idx, ret = -1; - if (socket >= RTE_MAX_NUMA_NODES || socket < 0) - return -1; + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); - return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats); + heap_idx = malloc_socket_to_heap_id(socket); + 
if (heap_idx < 0) + goto unlock; + + ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx], + socket_stats); +unlock: + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; } /* @@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f) struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; unsigned int idx; - for (idx = 0; idx < rte_socket_count(); idx++) { - unsigned int socket = rte_socket_id_by_idx(idx); - fprintf(f, "Heap on socket %i:\n", socket); - malloc_heap_dump(&mcfg->malloc_heaps[socket], f); + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + + for (idx = 0; idx < RTE_MAX_HEAPS; idx++) { + fprintf(f, "Heap id: %u\n", idx); + malloc_heap_dump(&mcfg->malloc_heaps[idx], f); } + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); } /* @@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f) void rte_malloc_dump_stats(FILE *f, __rte_unused const char *type) { - unsigned int socket; + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int heap_id; struct rte_malloc_socket_stats sock_stats; + + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + /* Iterate through all initialised heaps */ - for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) { - if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0)) - continue; + for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id]; - fprintf(f, "Socket:%u\n", socket); + malloc_heap_get_stats(heap, &sock_stats); + + fprintf(f, "Heap id:%u\n", heap_id); fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes); fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes); fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes); @@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type) fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count); fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count); } + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); return; } 
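Note for readers of this patch: the fallback order that malloc_heap_alloc() follows after the change can be sketched roughly as below. This is a simplified standalone model under stated assumptions, not the DPDK implementation: every `toy_*` name is hypothetical, a heap is reduced to a free-byte counter, and locking and page-size flags are left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not DPDK definitions. */
#define TOY_SOCKET_ID_ANY (-1)

struct toy_heap {
	int socket_id;		/* owner socket ID */
	size_t free_bytes;	/* stand-in for the real free lists */
};

/* Map a socket ID to a heap index, or -1 if no heap has that ID. */
static int
toy_find_heap(const struct toy_heap *heaps, size_t n_heaps, int socket_id)
{
	size_t i;

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}

/* Take size bytes from one heap; 0 on success, -1 if it does not fit. */
static int
toy_take(struct toy_heap *heap, size_t size)
{
	if (heap->free_bytes < size)
		return -1;
	heap->free_bytes -= size;
	return 0;
}

/*
 * Return the heap ID that served the request, or -1 on failure.
 * heaps[0 .. n_native) model the per-NUMA-node heaps,
 * heaps[n_native .. n_heaps) model external heaps.
 */
static int
toy_alloc(struct toy_heap *heaps, size_t n_native, size_t n_heaps,
		int socket_arg, int cur_socket, size_t size)
{
	int socket = socket_arg == TOY_SOCKET_ID_ANY ? cur_socket : socket_arg;
	int heap_id = toy_find_heap(heaps, n_heaps, socket);
	size_t i;

	/* unknown socket ID: nothing to allocate from */
	if (heap_id < 0)
		return -1;

	if (toy_take(&heaps[heap_id], size) == 0)
		return heap_id;

	/* an explicit socket request never falls back to other heaps */
	if (socket_arg != TOY_SOCKET_ID_ANY)
		return -1;

	/* fall back to the other native heaps only; external heaps are skipped */
	for (i = 0; i < n_native; i++) {
		if ((int)i == heap_id)
			continue;
		if (toy_take(&heaps[i], size) == 0)
			return (int)i;
	}
	return -1;
}
```

The point of the sketch is that an external heap is reachable only when its socket ID is requested explicitly. A TOY_SOCKET_ID_ANY request walks the native heaps and then gives up.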
From patchwork Tue Oct 2 13:34:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45876 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DC50E5B38; Tue, 2 Oct 2018 15:35:10 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 49B884C96 for ; Tue, 2 Oct 2018 15:35:09 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="268815288" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga006.fm.intel.com with ESMTP; 02 Oct 2018 06:35:04 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ4oU009194; Tue, 2 Oct 2018 14:35:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ32N032047; Tue, 2 Oct 2018 14:35:03 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ3pU032043; Tue, 2 Oct 2018 14:35:03 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, 
arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:42 +0100 Message-Id: <856a06c16917528dfcc6aaba0e9a18c8ce41eedd.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 04/21] mem: do not check for invalid socket ID X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. This changes the semantics of what we understand by "socket ID", so document the change in the release notes. Signed-off-by: Anatoly Burakov --- doc/guides/rel_notes/release_18_11.rst | 7 +++++++ lib/librte_eal/common/eal_common_memzone.c | 8 +++++--- lib/librte_eal/common/malloc_heap.c | 2 +- lib/librte_eal/common/rte_malloc.c | 4 ---- 4 files changed, 13 insertions(+), 8 deletions(-) diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 0607a3980..172c42f71 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -123,6 +123,13 @@ API Changes for any users of memseg-walk-related functions, as they will now have to skip externally allocated segments in most cases if the intent is to only iterate over internal DPDK memory. + ``socket_id`` parameter across the entire DPDK has gained additional meaning, + as some socket ID's will now be representing externally allocated memory. No + changes will be required for existing code as backwards compatibility will be + kept, and those who do not use this feature will not see these extra socket + ID's. 
Any new API's must not check socket ID parameters themselves, and must + instead leave it to the memory subsystem to decide whether socket ID is a + valid one. * mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()`` functions were deprecated since 17.05 and are replaced by diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c index 7300fe05d..b7081afbf 100644 --- a/lib/librte_eal/common/eal_common_memzone.c +++ b/lib/librte_eal/common/eal_common_memzone.c @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, return NULL; } - if ((socket_id != SOCKET_ID_ANY) && - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) { + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) { rte_errno = EINVAL; return NULL; } - if (!rte_eal_has_hugepages()) + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an + * external heap. + */ + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES) socket_id = SOCKET_ID_ANY; contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0; diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index a9cfa423f..09b06061d 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -651,7 +651,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg, if (size == 0 || (align && !rte_is_power_of_2(align))) return NULL; - if (!rte_eal_has_hugepages()) + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES) socket_arg = SOCKET_ID_ANY; if (socket_arg == SOCKET_ID_ANY) diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 73d6df31d..9ba1472c3 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align, if (!rte_eal_has_hugepages()) socket_arg = SOCKET_ID_ANY; - /* Check socket parameter */ - if (socket_arg >= 
RTE_MAX_NUMA_NODES) - return NULL; - return malloc_heap_alloc(type, size, socket_arg, 0, align == 0 ? 1 : align, 0, false); } From patchwork Tue Oct 2 13:34:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45889 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id F0E2C1B122; Tue, 2 Oct 2018 15:35:33 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 6571A5F38 for ; Tue, 2 Oct 2018 15:35:17 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="79224904" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga006.jf.intel.com with ESMTP; 02 Oct 2018 06:35:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ46r009200; Tue, 2 Oct 2018 14:35:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ47R032054; Tue, 2 Oct 2018 14:35:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ4WV032050; Tue, 2 Oct 2018 14:35:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bernard Iremonger , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, 
keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:43 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 05/21] flow_classify: do not check for invalid socket ID X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. Signed-off-by: Anatoly Burakov Acked-by: Bernard Iremonger --- lib/librte_flow_classify/rte_flow_classify.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c index 4c3469da1..fb652a2b7 100644 --- a/lib/librte_flow_classify/rte_flow_classify.c +++ b/lib/librte_flow_classify/rte_flow_classify.c @@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params) } /* socket */ - if ((params->socket_id < 0) || - (params->socket_id >= RTE_MAX_NUMA_NODES)) { + if (params->socket_id < 0) { RTE_FLOW_CLASSIFY_LOG(ERR, "%s: Incorrect value for parameter socket_id\n", __func__); From patchwork Tue Oct 2 13:34:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45893 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 1E53D1B149; Tue, 2 Oct 2018 15:35:41 +0200 (CEST) Received: 
from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id B67F11B128 for ; Tue, 2 Oct 2018 15:35:34 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="95369383" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga001.fm.intel.com with ESMTP; 02 Oct 2018 06:35:07 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ4if009205; Tue, 2 Oct 2018 14:35:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ4Dg032061; Tue, 2 Oct 2018 14:35:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ4h9032057; Tue, 2 Oct 2018 14:35:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Cristian Dumitrescu , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:44 +0100 Message-Id: <8aba112259cd35ed5d1d7eb9a42aaafbdd4f0972.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 06/21] pipeline: do not check for invalid socket ID X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. Signed-off-by: Anatoly Burakov Acked-by: Cristian Dumitrescu --- lib/librte_pipeline/rte_pipeline.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c index 0cb8b804e..2c047a8a4 100644 --- a/lib/librte_pipeline/rte_pipeline.c +++ b/lib/librte_pipeline/rte_pipeline.c @@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params) } /* socket */ - if ((params->socket_id < 0) || - (params->socket_id >= RTE_MAX_NUMA_NODES)) { + if (params->socket_id < 0) { RTE_LOG(ERR, PIPELINE, "%s: Incorrect value for parameter socket_id\n", __func__); From patchwork Tue Oct 2 13:34:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45879 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 714525F44; Tue, 2 Oct 2018 15:35:17 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id CC4125A44 for ; Tue, 2 Oct 2018 15:35:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="95772879" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 02 Oct 2018 06:35:05 -0700 Received: 
from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ4mZ009208; Tue, 2 Oct 2018 14:35:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ46X032068; Tue, 2 Oct 2018 14:35:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ4qx032064; Tue, 2 Oct 2018 14:35:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Cristian Dumitrescu , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:45 +0100 Message-Id: <4d9ba6de0f2b5784fbb78a185794f931c26df815.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 07/21] sched: do not check for invalid socket ID X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. 
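As a rough illustration of the relaxed check, the two variants can be compared side by side. This is a standalone sketch with hypothetical names (`toy_check_socket_old`, `toy_check_socket_new`, `TOY_MAX_NUMA_NODES`) and a generic -1 error code rather than the library's own return values.

```c
/* Hypothetical stand-in for the NUMA node limit; not a DPDK definition. */
#define TOY_MAX_NUMA_NODES 8

/* Old style: reject anything outside [0, TOY_MAX_NUMA_NODES). */
static int
toy_check_socket_old(int socket_id)
{
	if (socket_id < 0 || socket_id >= TOY_MAX_NUMA_NODES)
		return -1;
	return 0;
}

/*
 * New style: reject only negative values and leave the decision about
 * larger IDs to the memory subsystem at allocation time.
 */
static int
toy_check_socket_new(int socket_id)
{
	if (socket_id < 0)
		return -1;
	return 0;
}
```

A socket ID such as 256, which could belong to an external heap, fails the old check and passes the new one. Whether it is actually valid is then decided by the allocator.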
Signed-off-by: Anatoly Burakov Acked-by: Cristian Dumitrescu --- lib/librte_sched/rte_sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c index 9269e5c71..d4e2189c7 100644 --- a/lib/librte_sched/rte_sched.c +++ b/lib/librte_sched/rte_sched.c @@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params) return -1; /* socket */ - if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES)) + if (params->socket < 0) return -3; /* rate */ From patchwork Tue Oct 2 13:34:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45881 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B6225683E; Tue, 2 Oct 2018 15:35:20 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id EB07E5B3A for ; Tue, 2 Oct 2018 15:35:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="95772881" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 02 Oct 2018 06:35:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ4xj009211; Tue, 2 Oct 2018 14:35:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ4Cq032075; Tue, 2 Oct 2018 14:35:04 +0100 Received: (from aburakov@localhost) by 
sivswdev01.ir.intel.com with LOCAL id w92DZ4TV032071; Tue, 2 Oct 2018 14:35:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:46 +0100 Message-Id: <89ca3e478e643d69a2115e008f5bf188a7210f39.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" We will need to refer to external heaps in some way. While we use heap ID's internally, for external API use it has to be something more user-friendly. So, we will be using a string to uniquely identify a heap. This breaks the ABI, so document the change. 
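A minimal sketch of the naming scheme, for illustration: the "socket_%i" format and the 32-byte limit are taken from this patch, while `toy_heap`, `toy_name_default_heap` and the lookup helper `toy_find_heap_by_name` are hypothetical and only show how a string could identify a heap.

```c
#include <stdio.h>
#include <string.h>

/* Same value as RTE_HEAP_NAME_MAX_LEN in this patch; the toy_* names are hypothetical. */
#define TOY_HEAP_NAME_MAX_LEN 32

struct toy_heap {
	int socket_id;
	char name[TOY_HEAP_NAME_MAX_LEN];
};

/* Give a per-node heap its default name, e.g. "socket_1". */
static void
toy_name_default_heap(struct toy_heap *heap, int socket_id)
{
	heap->socket_id = socket_id;
	/* snprintf always NUL-terminates when the buffer size is non-zero */
	snprintf(heap->name, sizeof(heap->name), "socket_%i", socket_id);
}

/* Return the index of the heap with the given name, or -1 if none matches. */
static int
toy_find_heap_by_name(const struct toy_heap *heaps, size_t n_heaps,
		const char *name)
{
	size_t i;

	for (i = 0; i < n_heaps; i++) {
		if (strncmp(heaps[i].name, name, TOY_HEAP_NAME_MAX_LEN) == 0)
			return (int)i;
	}
	return -1;
}
```

The name is stored inside the heap structure itself, so any process that maps the shared heap table sees the same string.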
Signed-off-by: Anatoly Burakov --- doc/guides/rel_notes/release_18_11.rst | 2 ++ lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++ lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++- lib/librte_eal/common/rte_malloc.c | 1 + 4 files changed, 21 insertions(+), 1 deletion(-) diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 172c42f71..754c41755 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -176,6 +176,8 @@ ABI Changes ID the malloc heap belongs to - structure ``rte_mem_config`` has had its ``malloc_heaps`` array resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value + - structure ``rte_malloc_heap`` now has a ``heap_name`` member + Removed Items ------------- diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h index d432cef88..4a7e0eb1d 100644 --- a/lib/librte_eal/common/include/rte_malloc_heap.h +++ b/lib/librte_eal/common/include/rte_malloc_heap.h @@ -12,6 +12,7 @@ /* Number of free lists per heap, grouped by size. 
*/ #define RTE_HEAP_NUM_FREELISTS 13 +#define RTE_HEAP_NAME_MAX_LEN 32 /* dummy definition, for pointers */ struct malloc_elem; @@ -28,6 +29,7 @@ struct malloc_heap { unsigned alloc_count; unsigned int socket_id; size_t total_size; + char name[RTE_HEAP_NAME_MAX_LEN]; } __rte_cache_aligned; #endif /* _RTE_MALLOC_HEAP_H_ */ diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 09b06061d..b28905817 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl, malloc_heap_add_memory(heap, found_msl, ms->addr, len); heap->total_size += len; - heap->socket_id = msl->socket_id; RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20, msl->socket_id); @@ -1024,6 +1023,22 @@ int rte_eal_malloc_heap_init(void) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int i; + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + /* assign names to default DPDK heaps */ + for (i = 0; i < rte_socket_count(); i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + char heap_name[RTE_HEAP_NAME_MAX_LEN]; + int socket_id = rte_socket_id_by_idx(i); + + snprintf(heap_name, sizeof(heap_name) - 1, + "socket_%i", socket_id); + strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN); + heap->socket_id = socket_id; + } + } + if (register_mp_requests()) { RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n"); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 9ba1472c3..72632da56 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type) malloc_heap_get_stats(heap, &sock_stats); fprintf(f, "Heap id:%u\n", heap_id); + fprintf(f, "\tHeap name:%s\n", heap->name); fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes); fprintf(f, 
"\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes); fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes); From patchwork Tue Oct 2 13:34:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45890 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 813441B128; Tue, 2 Oct 2018 15:35:35 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id B876F6904 for ; Tue, 2 Oct 2018 15:35:20 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="78070988" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 02 Oct 2018 06:35:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ54k009214; Tue, 2 Oct 2018 14:35:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ4np032086; Tue, 2 Oct 2018 14:35:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ4L2032078; Tue, 2 Oct 2018 14:35:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, 
bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:47 +0100 Message-Id: <0058b814e3c11fe869b43ffc5a9d4904d810ab6c.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 09/21] malloc: add function to query socket ID of named heap X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When we will be creating external heaps, they will have their own "fake" socket ID, so add a function that will map the heap name to its socket ID. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++ lib/librte_eal/common/rte_malloc.c | 37 ++++++++++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 52 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index a9fb7e452..8870732a6 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,20 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Find socket ID corresponding to a named heap. + * + * @param name + * Heap name to find socket ID for + * @return + * Socket ID in case of success (a non-negative number) + * -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``name`` was NULL + * ENOENT - heap identified by the name ``name`` was not found + */ +int __rte_experimental +rte_malloc_heap_get_socket(const char *name); + /** * Dump statistics. 
* diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 72632da56..b807dfe09 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f) rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); } +int +rte_malloc_heap_get_socket(const char *name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + unsigned int idx; + int ret; + + if (name == NULL || + strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + for (idx = 0; idx < RTE_MAX_HEAPS; idx++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[idx]; + + if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) { + heap = tmp; + break; + } + } + + if (heap != NULL) { + ret = heap->socket_id; + } else { + rte_errno = ENOENT; + ret = -1; + } + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* * Print stats on memory type. 
If type is NULL, info on all types is printed */ diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 73282bbb0..d8f9665b8 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -318,6 +318,7 @@ EXPERIMENTAL { rte_fbarray_set_used; rte_log_register_type_and_pick_level; rte_malloc_dump_heaps; + rte_malloc_heap_get_socket; rte_mem_alloc_validator_register; rte_mem_alloc_validator_unregister; rte_mem_event_callback_register; From patchwork Tue Oct 2 13:34:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45892 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A12091B142; Tue, 2 Oct 2018 15:35:39 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 7B1695A44 for ; Tue, 2 Oct 2018 15:35:34 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="262207256" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga005.jf.intel.com with ESMTP; 02 Oct 2018 06:35:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ53W009217; Tue, 2 Oct 2018 14:35:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ5Vh032097; Tue, 2 Oct 2018 14:35:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ5Ir032093; Tue, 2 
Oct 2018 14:35:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Olivier Matz , Andrew Rybchenko , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:48 +0100 Message-Id: <48dfeb8e0e0f2b2c0d26fe753b6ac5c8c98bfa37.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 10/21] malloc: add function to check if socket is external X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" An API is needed to check whether a particular socket ID belongs to an internal or external heap. Prime user of this would be mempool allocator, because normal assumptions of IOVA contiguousness in IOVA as VA mode do not hold in case of externally allocated memory. 
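As a rough standalone illustration of the check described above (not the DPDK implementation), the classification could look like this. ``struct heap``, its ``in_use`` flag, ``MAX_NUMA_NODES`` and ``ANY_SOCKET`` are assumptions made for the sketch; the real code walks the shared memory config under the memory hotplug lock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES */
#define ANY_SOCKET (-1)  /* stand-in for SOCKET_ID_ANY */

/* Hypothetical, reduced view of a heap table entry. */
struct heap {
	int in_use;    /* assumption: explicit flag for occupied slots */
	int socket_id;
};

/*
 * Returns 1 if socket_id belongs to an external heap, 0 if it belongs to
 * an internal heap (or is the "any socket" wildcard), and -1 if no heap
 * in the table uses that socket ID.
 */
static int
socket_is_external(const struct heap *heaps, size_t n, int socket_id)
{
	size_t i;

	assert(heaps != NULL || n == 0);

	if (socket_id == ANY_SOCKET)
		return 0;

	for (i = 0; i < n; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			/* external heaps get IDs past the NUMA node range */
			return socket_id >= MAX_NUMA_NODES;
	}
	return -1;
}
```

A mempool-like caller would treat a return value of 1 as "do not assume IOVA contiguousness, even in IOVA as VA mode" and a negative value as an invalid socket.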
Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++ lib/librte_eal/common/rte_malloc.c | 25 ++++++++++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + lib/librte_mempool/rte_mempool.c | 22 ++++++++++++++++--- 4 files changed, 60 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 8870732a6..403271ddc 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket, int __rte_experimental rte_malloc_heap_get_socket(const char *name); +/** + * Check if a given socket ID refers to externally allocated memory. + * + * @note Passing SOCKET_ID_ANY will return 0. + * + * @param socket_id + * Socket ID to check + * @return + * 1 if socket ID refers to externally allocated memory + * 0 if socket ID refers to internal DPDK memory + * -1 if socket ID is invalid + */ +int __rte_experimental +rte_malloc_heap_socket_is_external(int socket_id); + /** * Dump statistics. 
* diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index b807dfe09..fa81d7862 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name) return ret; } +int +rte_malloc_heap_socket_is_external(int socket_id) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int idx; + int ret = -1; + + if (socket_id == SOCKET_ID_ANY) + return 0; + + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + for (idx = 0; idx < RTE_MAX_HEAPS; idx++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[idx]; + + if ((int)tmp->socket_id == socket_id) { + /* external memory always has large socket ID's */ + ret = tmp->socket_id >= RTE_MAX_NUMA_NODES; + break; + } + } + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* * Print stats on memory type. If type is NULL, info on all types is printed */ diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index d8f9665b8..bd60506af 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -319,6 +319,7 @@ EXPERIMENTAL { rte_log_register_type_and_pick_level; rte_malloc_dump_heaps; rte_malloc_heap_get_socket; + rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; rte_mem_alloc_validator_unregister; rte_mem_event_callback_register; diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 2ed539f01..683b216f9 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp) rte_iova_t iova; unsigned mz_id, n; int ret; - bool no_contig, try_contig, no_pageshift; + bool no_contig, try_contig, no_pageshift, external; ret = mempool_ops_alloc_once(mp); if (ret != 0) return ret; + /* check if we can retrieve a valid socket ID */ + ret = 
rte_malloc_heap_socket_is_external(mp->socket_id); + if (ret < 0) + return -EINVAL; + external = ret; + /* mempool must not be populated */ if (mp->nb_mem_chunks != 0) return -EEXIST; @@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp) * in one contiguous chunk as well (otherwise we might end up wasting a * 1G page on a 10MB memzone). If we fail to get enough contiguous * memory, then we'll go and reserve space page-by-page. + * + * We also have to take into account the fact that memory that we're + * going to allocate from can belong to an externally allocated memory + * area, in which case the assumption of IOVA as VA mode being + * synonymous with IOVA contiguousness will not hold. We should also try + * to go for contiguous memory even if we're in no-huge mode, because + * external memory may in fact be IOVA-contiguous. */ - no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA; - try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages(); + external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1; + no_pageshift = no_contig || + (!external && rte_eal_iova_mode() == RTE_IOVA_VA); + try_contig = !no_contig && !no_pageshift && + (rte_eal_has_hugepages() || external); if (no_pageshift) { pg_sz = 0; From patchwork Tue Oct 2 13:34:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45897 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EC33D1B177; Tue, 2 Oct 2018 15:35:45 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id B7C121B13D for ; Tue, 2 Oct 2018 15:35:35 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com 
([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="262207258" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga005.jf.intel.com with ESMTP; 02 Oct 2018 06:35:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ54u009221; Tue, 2 Oct 2018 14:35:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ5aq032151; Tue, 2 Oct 2018 14:35:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ5g3032136; Tue, 2 Oct 2018 14:35:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:49 +0100 Message-Id: <6a23e7a2ec414049967e441e20502ab882756eb6.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add an API for creating new malloc heaps. They are assigned socket IDs at or above RTE_MAX_NUMA_NODES, to avoid clashing with internal heaps.
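The socket ID assignment can be sketched in isolation roughly as below. This is a simplified standalone approximation under stated assumptions: ``struct heap_table``, ``table_init()`` and ``heap_create()`` are hypothetical names, errors are returned as negative errno values rather than through rte_errno, no locking is shown, and the whole table is scanned for duplicate names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define HEAP_NAME_MAX_LEN 32       /* stand-in for RTE_HEAP_NAME_MAX_LEN */
#define MAX_HEAPS 4                /* hypothetical table size */
#define MAX_NUMA_NODES 8           /* stand-in for RTE_MAX_NUMA_NODES */
#define EXTERNAL_MIN_SOCKET_ID 256 /* first ID given to external heaps */

struct heap {
	char name[HEAP_NAME_MAX_LEN]; /* empty string marks a free slot */
	int socket_id;
};

struct heap_table {
	struct heap heaps[MAX_HEAPS];
	int next_socket_id; /* next ID to hand out to an external heap */
};

static void
table_init(struct heap_table *t)
{
	/* external IDs must never overlap the NUMA node range */
	assert(EXTERNAL_MIN_SOCKET_ID >= MAX_NUMA_NODES);
	memset(t, 0, sizeof(*t));
	t->next_socket_id = EXTERNAL_MIN_SOCKET_ID;
}

/* Returns 0 on success or a negative errno value on failure. */
static int
heap_create(struct heap_table *t, const char *name)
{
	struct heap *slot = NULL;
	int i;

	/* reject NULL, empty, and names with no NUL in the first 32 bytes */
	if (name == NULL || name[0] == '\0' ||
			memchr(name, '\0', HEAP_NAME_MAX_LEN) == NULL)
		return -EINVAL;

	for (i = 0; i < MAX_HEAPS; i++) {
		struct heap *h = &t->heaps[i];

		if (strcmp(name, h->name) == 0)
			return -EEXIST;
		if (slot == NULL && h->name[0] == '\0')
			slot = h; /* remember the first free slot */
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, strlen(name) + 1);
	slot->socket_id = t->next_socket_id++;
	return 0;
}
```

Because the counter only ever grows, every heap created this way gets a socket ID that no NUMA node and no earlier external heap has used.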
This breaks the ABI, so document the change. Signed-off-by: Anatoly Burakov --- doc/guides/rel_notes/release_18_11.rst | 2 + .../common/include/rte_eal_memconfig.h | 3 ++ lib/librte_eal/common/include/rte_malloc.h | 19 +++++++ lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++ lib/librte_eal/common/malloc_heap.h | 3 ++ lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + 7 files changed, 117 insertions(+) diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 754c41755..e7674adb9 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -177,6 +177,8 @@ ABI Changes - structure ``rte_mem_config`` has had its ``malloc_heaps`` array resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value - structure ``rte_malloc_heap`` now has a ``heap_name`` member + - structure ``rte_eal_memconfig`` has been extended to contain next + socket ID for externally allocated segments Removed Items diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index 7634bff5d..fc44c4e5f 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -75,6 +75,9 @@ struct rte_mem_config { /* Heaps of Malloc */ struct malloc_heap malloc_heaps[RTE_MAX_HEAPS]; + /* next socket ID for external malloc heap */ + int next_socket_id; + /* address of mem_config in primary process. used to map shared config into * exact same address the primary process maps it. 
*/ diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 403271ddc..e326529d0 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,25 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Creates a new empty malloc heap with a specified name. + * + * @note Heaps created via this call will automatically get assigned a unique + * socket ID, which can be found using ``rte_malloc_heap_get_socket()`` + * + * @param heap_name + * Name of the heap to create. + * + * @return + * - 0 on successful creation + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``heap_name`` was NULL, empty or too long + * EEXIST - heap by name of ``heap_name`` already exists + * ENOSPC - no more space in internal config to store a new heap + */ +int __rte_experimental +rte_malloc_heap_create(const char *heap_name); + /** * Find socket ID corresponding to a named heap. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index b28905817..00fdf54f7 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -29,6 +29,10 @@ #include "malloc_heap.h" #include "malloc_mp.h" +/* start external socket ID's at a very high number */ +#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */ +#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES)) + static unsigned check_hugepage_sz(unsigned flags, uint64_t hugepage_sz) { @@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +int +malloc_heap_create(struct malloc_heap *heap, const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + uint32_t next_socket_id = mcfg->next_socket_id; + + /* prevent overflow. did you really create 2 billion heaps??? 
*/ + if (next_socket_id > INT32_MAX) { + RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n"); + rte_errno = ENOSPC; + return -1; + } + + /* initialize empty heap */ + heap->alloc_count = 0; + heap->first = NULL; + heap->last = NULL; + LIST_INIT(heap->free_head); + rte_spinlock_init(&heap->lock); + heap->total_size = 0; + heap->socket_id = next_socket_id; + + /* we hold a global mem hotplug writelock, so it's safe to increment */ + mcfg->next_socket_id++; + + /* set up name */ + strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN); + return 0; +} + int rte_eal_malloc_heap_init(void) { @@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void) unsigned int i; if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + /* assign min socket ID to external heaps */ + mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID; + /* assign names to default DPDK heaps */ for (i = 0; i < rte_socket_count(); i++) { struct malloc_heap *heap = &mcfg->malloc_heaps[i]; diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 61b844b6f..eebee16dc 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -33,6 +33,9 @@ void * malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags, size_t align, bool contig); +int +malloc_heap_create(struct malloc_heap *heap, const char *heap_name); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index fa81d7862..25967a7cb 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr) return ms->iova + RTE_PTR_DIFF(addr, ms->addr); } + +int +rte_malloc_heap_create(const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + int i, ret; + + if (heap_name 
== NULL || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + /* check if there is space in the heap list, or if heap with this name + * already exists. + */ + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[i]; + /* existing heap */ + if (strncmp(heap_name, tmp->name, + RTE_HEAP_NAME_MAX_LEN) == 0) { + RTE_LOG(ERR, EAL, "Heap %s already exists\n", + heap_name); + rte_errno = EEXIST; + ret = -1; + goto unlock; + } + /* empty heap */ + if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) { + heap = tmp; + break; + } + } + if (heap == NULL) { + RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n"); + rte_errno = ENOSPC; + ret = -1; + goto unlock; + } + + /* we're sure that we can create a new heap, so do it */ + ret = malloc_heap_create(heap, heap_name); +unlock: + rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index bd60506af..376f33bbb 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -318,6 +318,7 @@ EXPERIMENTAL { rte_fbarray_set_used; rte_log_register_type_and_pick_level; rte_malloc_dump_heaps; + rte_malloc_heap_create; rte_malloc_heap_get_socket; rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; From patchwork Tue Oct 2 13:34:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45883 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D12C47CE2; Tue, 2 Oct 2018 15:35:23 +0200 (CEST) Received: from mga05.intel.com 
(mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 6F1725F16 for ; Tue, 2 Oct 2018 15:35:11 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="95772884" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 02 Oct 2018 06:35:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ5bb009226; Tue, 2 Oct 2018 14:35:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ54e032177; Tue, 2 Oct 2018 14:35:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ5DB032169; Tue, 2 Oct 2018 14:35:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:50 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 12/21] malloc: allow destroying heaps X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add an API to destroy specified heap. 
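The preconditions for destroying a heap can be sketched standalone as follows. This is only an approximation: ``struct heap`` and ``heap_destroy()`` are hypothetical, ``total_size`` stands in for the check on remaining memory segments, errors are returned as negative errno values instead of through rte_errno, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES */

/* Hypothetical, reduced view of a heap. */
struct heap {
	char name[32];
	int socket_id;
	unsigned int alloc_count; /* allocations not yet freed */
	size_t total_size;        /* bytes of memory still attached */
};

/* Returns 0 on success or a negative errno value on failure. */
static int
heap_destroy(struct heap *h)
{
	assert(h != NULL);

	/* internal (per-NUMA-node) heaps are reserved */
	if (h->socket_id < MAX_NUMA_NODES)
		return -EPERM;
	/* refuse while allocations or memory are still present */
	if (h->alloc_count != 0 || h->total_size != 0)
		return -EBUSY;

	/* wiping the slot also clears the name, marking it free */
	memset(h, 0, sizeof(*h));
	return 0;
}
```

In this sketch a destroyed slot becomes indistinguishable from one that was never used, so it can be handed out again by a later create call.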
Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++ lib/librte_eal/common/malloc_heap.c | 22 ++++++++ lib/librte_eal/common/malloc_heap.h | 3 ++ lib/librte_eal/common/rte_malloc.c | 58 ++++++++++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 107 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index e326529d0..309bbbcc9 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket, int __rte_experimental rte_malloc_heap_create(const char *heap_name); +/** + * Destroys a previously created malloc heap with specified name. + * + * @note This function will return a failure result if not all memory allocated + * from the heap has been freed back to the heap + * + * @note This function will return a failure result if not all memory segments + * were removed from the heap prior to its destruction + * + * @param heap_name + * Name of the heap to destroy. + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``heap_name`` was NULL, empty or too long + * ENOENT - heap by the name of ``heap_name`` was not found + * EPERM - attempting to destroy reserved heap + * EBUSY - heap still contains data + */ +int __rte_experimental +rte_malloc_heap_destroy(const char *heap_name); + /** * Find socket ID corresponding to a named heap.
* diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 00fdf54f7..ca774c96f 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1053,6 +1053,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name) return 0; } +int +malloc_heap_destroy(struct malloc_heap *heap) +{ + if (heap->alloc_count != 0) { + RTE_LOG(ERR, EAL, "Heap is still in use\n"); + rte_errno = EBUSY; + return -1; + } + if (heap->first != NULL || heap->last != NULL) { + RTE_LOG(ERR, EAL, "Heap still contains memory segments\n"); + rte_errno = EBUSY; + return -1; + } + if (heap->total_size != 0) + RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n"); + + /* after this, the lock will be dropped */ + memset(heap, 0, sizeof(*heap)); + + return 0; +} + int rte_eal_malloc_heap_init(void) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index eebee16dc..75278da3c 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags, int malloc_heap_create(struct malloc_heap *heap, const char *heap_name); +int +malloc_heap_destroy(struct malloc_heap *heap); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 25967a7cb..286e748ef 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr) return ms->iova + RTE_PTR_DIFF(addr, ms->addr); } +static struct malloc_heap * +find_named_heap(const char *name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int i; + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + + if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN)) + return heap; + } + return 
NULL; +} + int rte_malloc_heap_create(const char *heap_name) { @@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name) return ret; } + +int +rte_malloc_heap_destroy(const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + int ret; + + if (heap_name == NULL || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + /* start from non-socket heaps */ + heap = find_named_heap(heap_name); + if (heap == NULL) { + RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name); + rte_errno = ENOENT; + ret = -1; + goto unlock; + } + /* we shouldn't be able to destroy internal heaps */ + if (heap->socket_id < RTE_MAX_NUMA_NODES) { + rte_errno = EPERM; + ret = -1; + goto unlock; + } + /* sanity checks done, now we can destroy the heap */ + rte_spinlock_lock(&heap->lock); + ret = malloc_heap_destroy(heap); + + /* if we failed, lock is still active */ + if (ret < 0) + rte_spinlock_unlock(&heap->lock); +unlock: + rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 376f33bbb..27aac5bea 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -319,6 +319,7 @@ EXPERIMENTAL { rte_log_register_type_and_pick_level; rte_malloc_dump_heaps; rte_malloc_heap_create; + rte_malloc_heap_destroy; rte_malloc_heap_get_socket; rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; From patchwork Tue Oct 2 13:34:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45894 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org 
Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DCA861B150; Tue, 2 Oct 2018 15:35:42 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 475B41B134 for ; Tue, 2 Oct 2018 15:35:35 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="262207257" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga005.jf.intel.com with ESMTP; 02 Oct 2018 06:35:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ5rH009229; Tue, 2 Oct 2018 14:35:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ5B9032191; Tue, 2 Oct 2018 14:35:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ5Rr032186; Tue, 2 Oct 2018 14:35:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:51 +0100 Message-Id: <4e1ac007ada6d8af98f41c0220a9db9904773219.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 13/21] malloc: allow adding memory to named 
heaps X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add an API to add externally allocated memory to malloc heap. The memory will be stored in memseg lists like regular DPDK memory. Multiple segments are allowed within a heap. If IOVA table is not provided, IOVA addresses are filled in with RTE_BAD_IOVA. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++ lib/librte_eal/common/malloc_heap.c | 74 ++++++++++++++++++++++ lib/librte_eal/common/malloc_heap.h | 4 ++ lib/librte_eal/common/rte_malloc.c | 51 +++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 169 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 309bbbcc9..fb5b6e2f7 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,45 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Add memory chunk to a heap with specified name. + * + * @note Multiple memory chunks can be added to the same heap + * + * @note Memory must be previously allocated for DPDK to be able to use it as a + * malloc heap. Failing to do so will result in undefined behavior, up to and + * including segmentation faults. + * + * @note Calling this function will erase any contents already present at the + * supplied memory address. + * + * @param heap_name + * Name of the heap to add memory chunk to + * @param va_addr + * Start of virtual area to add to the heap + * @param len + * Length of virtual area to add to the heap + * @param iova_addrs + * Array of page IOVA addresses corresponding to each page in this memory + * area. Can be NULL, in which case page IOVA addresses will be set to + * RTE_BAD_IOVA. 
+ * @param n_pages + * Number of elements in the iova_addrs array. Ignored if ``iova_addrs`` + * is NULL. + * @param page_sz + * Page size of the underlying memory + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to add memory to a reserved heap + * ENOSPC - no more space in internal config to store a new memory chunk + */ +int __rte_experimental +rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); + /** * Creates a new empty malloc heap with a specified name. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index ca774c96f..256c25edf 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1023,6 +1023,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +int +malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + char fbarray_name[RTE_FBARRAY_NAME_LEN]; + struct rte_memseg_list *msl = NULL; + struct rte_fbarray *arr; + size_t seg_len = n_pages * page_sz; + unsigned int i; + + /* first, find a free memseg list */ + for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { + struct rte_memseg_list *tmp = &mcfg->memsegs[i]; + if (tmp->base_va == NULL) { + msl = tmp; + break; + } + } + if (msl == NULL) { + RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n"); + rte_errno = ENOSPC; + return -1; + } + + snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p", + heap->name, va_addr); + + /* create the backing fbarray */ + if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages, + sizeof(struct rte_memseg)) < 0) { + RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n"); + 
return -1; + } + arr = &msl->memseg_arr; + + /* fbarray created, fill it up */ + for (i = 0; i < n_pages; i++) { + struct rte_memseg *ms; + + rte_fbarray_set_used(arr, i); + ms = rte_fbarray_get(arr, i); + ms->addr = RTE_PTR_ADD(va_addr, i * page_sz); + ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i]; + ms->hugepage_sz = page_sz; + ms->len = page_sz; + ms->nchannel = rte_memory_get_nchannel(); + ms->nrank = rte_memory_get_nrank(); + ms->socket_id = heap->socket_id; + } + + /* set up the memseg list */ + msl->base_va = va_addr; + msl->page_sz = page_sz; + msl->socket_id = heap->socket_id; + msl->len = seg_len; + msl->version = 0; + msl->external = 1; + + /* erase contents of new memory */ + memset(va_addr, 0, seg_len); + + /* now, add newly minted memory to the malloc heap */ + malloc_heap_add_memory(heap, msl, va_addr, seg_len); + + heap->total_size += seg_len; + + /* all done! */ + RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n", + heap->name, va_addr); + + return 0; +} + int malloc_heap_create(struct malloc_heap *heap, const char *heap_name) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 75278da3c..237ce9dc2 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name); int malloc_heap_destroy(struct malloc_heap *heap); +int +malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 286e748ef..acdbd92a2 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -328,6 +328,57 @@ find_named_heap(const char *name) return NULL; } +int +rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, + rte_iova_t 
iova_addrs[], unsigned int n_pages, size_t page_sz) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + unsigned int n; + int ret; + + if (heap_name == NULL || va_addr == NULL || + page_sz == 0 || !rte_is_power_of_2(page_sz) || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + /* hotplug lock is not taken yet, so do not jump to unlock */ + return -1; + } + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + /* find our heap */ + heap = find_named_heap(heap_name); + if (heap == NULL) { + rte_errno = ENOENT; + ret = -1; + goto unlock; + } + if (heap->socket_id < RTE_MAX_NUMA_NODES) { + /* cannot add memory to internal heaps */ + rte_errno = EPERM; + ret = -1; + goto unlock; + } + n = len / page_sz; + if (n != n_pages && iova_addrs != NULL) { + rte_errno = EINVAL; + ret = -1; + goto unlock; + } + + rte_spinlock_lock(&heap->lock); + ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n, + page_sz); + rte_spinlock_unlock(&heap->lock); + +unlock: + rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + int rte_malloc_heap_create(const char *heap_name) { diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 27aac5bea..02254042c 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -321,6 +321,7 @@ EXPERIMENTAL { rte_malloc_heap_create; rte_malloc_heap_destroy; rte_malloc_heap_get_socket; + rte_malloc_heap_memory_add; rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; rte_mem_alloc_validator_unregister; From patchwork Tue Oct 2 13:34:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45880 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from
[92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 404EB5F54; Tue, 2 Oct 2018 15:35:19 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id C4E4D4C96 for ; Tue, 2 Oct 2018 15:35:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="267788302" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga005.fm.intel.com with ESMTP; 02 Oct 2018 06:35:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ57a009232; Tue, 2 Oct 2018 14:35:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ51U032206; Tue, 2 Oct 2018 14:35:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ5gb032198; Tue, 2 Oct 2018 14:35:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:52 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 14/21] malloc: allow removing memory from named heaps X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and 
discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add an API to remove memory from specified heaps. This will first check if all elements within the region are free, and that the region is the original region that was added to the heap (by comparing its length to length of memory addressed by the underlying memseg list). Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++ lib/librte_eal/common/malloc_heap.c | 54 ++++++++++++++++++++++ lib/librte_eal/common/malloc_heap.h | 4 ++ lib/librte_eal/common/rte_malloc.c | 39 ++++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 125 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index fb5b6e2f7..40bae4478 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -302,6 +302,33 @@ int __rte_experimental rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); +/** + * Remove memory chunk from heap with specified name. 
+ * + * @note Memory chunk being removed must be the same as one that was added; + * partially removing memory chunks is not supported + * + * @note Memory area must not contain any allocated elements to allow its + * removal from the heap + * + * @param heap_name + * Name of the heap to remove memory from + * @param va_addr + * Virtual address to remove from the heap + * @param len + * Length of virtual area to remove from the heap + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to remove memory from a reserved heap + * ENOENT - heap or memory chunk was not found + * EBUSY - memory chunk still contains data + */ +int __rte_experimental +rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 256c25edf..adc1669aa 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1023,6 +1023,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +static int +destroy_seg(struct malloc_elem *elem, size_t len) +{ + struct malloc_heap *heap = elem->heap; + struct rte_memseg_list *msl; + + msl = elem->msl; + + /* this element can be removed */ + malloc_elem_free_list_remove(elem); + malloc_elem_hide_region(elem, elem, len); + + heap->total_size -= len; + + memset(elem, 0, sizeof(*elem)); + + /* destroy the fbarray backing this memory */ + if (rte_fbarray_destroy(&msl->memseg_arr) < 0) + return -1; + + /* reset the memseg list */ + memset(msl, 0, sizeof(*msl)); + + return 0; +} + int malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz) @@ -1097,6 +1123,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void 
*va_addr, return 0; } +int +malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr, + size_t len) +{ + struct malloc_elem *elem = heap->first; + + /* find element with specified va address */ + while (elem != NULL && elem != va_addr) { + elem = elem->next; + /* stop if we've blown past our VA */ + if (elem > (struct malloc_elem *)va_addr) { + rte_errno = ENOENT; + return -1; + } + } + /* check if element was found */ + if (elem == NULL || elem->msl->len != len) { + rte_errno = ENOENT; + return -1; + } + /* if element's size is not equal to segment len, segment is busy */ + if (elem->state == ELEM_BUSY || elem->size != len) { + rte_errno = EBUSY; + return -1; + } + return destroy_seg(elem, len); +} + int malloc_heap_create(struct malloc_heap *heap, const char *heap_name) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 237ce9dc2..e48996d52 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -43,6 +43,10 @@ int malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); +int +malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr, + size_t len); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index acdbd92a2..bfc49d0b7 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, return ret; } +int +rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + int ret; + + if (heap_name == NULL || va_addr == NULL || len == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + 
RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + /* find our heap */ + heap = find_named_heap(heap_name); + if (heap == NULL) { + rte_errno = ENOENT; + ret = -1; + goto unlock; + } + if (heap->socket_id < RTE_MAX_NUMA_NODES) { + /* cannot remove memory from internal heaps */ + rte_errno = EPERM; + ret = -1; + goto unlock; + } + + rte_spinlock_lock(&heap->lock); + ret = malloc_heap_remove_external_memory(heap, va_addr, len); + rte_spinlock_unlock(&heap->lock); + +unlock: + rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + int rte_malloc_heap_create(const char *heap_name) { diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 02254042c..8c66d0be9 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -322,6 +322,7 @@ EXPERIMENTAL { rte_malloc_heap_destroy; rte_malloc_heap_get_socket; rte_malloc_heap_memory_add; + rte_malloc_heap_memory_remove; rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; rte_mem_alloc_validator_unregister; From patchwork Tue Oct 2 13:34:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45898 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 9E1F84F90; Tue, 2 Oct 2018 15:38:20 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id D97037D26 for ; Tue, 2 Oct 2018 15:38:18 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:38:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: 
E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="77840487" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga007.jf.intel.com with ESMTP; 02 Oct 2018 06:35:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ61d009237; Tue, 2 Oct 2018 14:35:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ5X4032217; Tue, 2 Oct 2018 14:35:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ5uD032213; Tue, 2 Oct 2018 14:35:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:53 +0100 Message-Id: <1d446998a1e03a3466e45aaae19fda51d871d946.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 15/21] malloc: allow attaching to external memory chunks X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" In order to use external memory in multiple processes, we need to attach to primary process's memseg lists, so add a new API to do that. It is the responsibility of the user to ensure that memory is accessible and that it has been previously added to the malloc heap by another process. 
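For illustration only, the lookup that attach relies on can be sketched outside of DPDK. The chunk is identified by an exact match on its start address and its total length (page size multiplied by number of pages), and the result defaults to -ENOENT when nothing matches. The names chunk_desc and find_chunk below are hypothetical stand-ins, not part of the EAL API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a memseg list entry; not the DPDK structure. */
struct chunk_desc {
	void *base_va;        /* start of the chunk, NULL if the slot is unused */
	size_t page_sz;       /* page size of the chunk */
	unsigned int n_pages; /* number of pages backing the chunk */
};

/*
 * Return the index of the chunk that matches (va_addr, len) exactly,
 * or -ENOENT if no such chunk exists. Partial matches are rejected.
 */
int
find_chunk(const struct chunk_desc *tbl, unsigned int n,
		const void *va_addr, size_t len)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		size_t chunk_len = tbl[i].page_sz * tbl[i].n_pages;

		/* skip unused slots */
		if (tbl[i].base_va == NULL)
			continue;
		if (tbl[i].base_va == va_addr && chunk_len == len)
			return (int)i;
	}
	/* fail unless a chunk matched explicitly */
	return -ENOENT;
}
```

In this sketch a caller that passes only part of a chunk, or an address inside it, gets -ENOENT, which mirrors the requirement that the caller describe the chunk exactly as it was added.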
Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++ lib/librte_eal/common/rte_malloc.c | 83 ++++++++++++++++++++++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 112 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 40bae4478..793f9473a 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket, * * @note Multiple memory chunks can be added to the same heap * + * @note Before accessing this memory in other processes, it needs to be + * attached in each of those processes by calling + * ``rte_malloc_heap_memory_attach`` in each other process. + * * @note Memory must be previously allocated for DPDK to be able to use it as a * malloc heap. Failing to do so will result in undefined behavior, up to and * including segmentation faults. @@ -329,6 +333,30 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, int __rte_experimental rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); +/** + * Attach to an already existing chunk of external memory in another process. + * + * @note This function must be called before any attempt is made to use an + * already existing external memory chunk. This function does *not* need to + * be called if a call to ``rte_malloc_heap_memory_add`` was made in the + * current process. 
+ * + * @param heap_name + * Heap name to which this chunk of memory belongs + * @param va_addr + * Start address of memory chunk to attach to + * @param len + * Length of memory chunk to attach to + * @return + * 0 on successful attach + * -1 on unsuccessful attach, with rte_errno set to indicate cause for error: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to attach memory to a reserved heap + * ENOENT - heap or memory chunk was not found + */ +int __rte_experimental +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. * diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index bfc49d0b7..5078235b1 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len) return ret; } +struct sync_mem_walk_arg { + void *va_addr; + size_t len; + int result; +}; + +static int +attach_mem_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct sync_mem_walk_arg *wa = arg; + size_t len = msl->page_sz * msl->memseg_arr.len; + + if (msl->base_va == wa->va_addr && + len == wa->len) { + struct rte_memseg_list *found_msl; + int msl_idx, ret; + + /* msl is const */ + msl_idx = msl - mcfg->memsegs; + found_msl = &mcfg->memsegs[msl_idx]; + + ret = rte_fbarray_attach(&found_msl->memseg_arr); + + if (ret < 0) + wa->result = -rte_errno; + else + wa->result = 0; + return 1; + } + return 0; +} + +int +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + struct sync_mem_walk_arg wa; + int ret; + + if (heap_name == NULL || va_addr == NULL || len == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + 
strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + + /* find our heap */ + heap = find_named_heap(heap_name); + if (heap == NULL) { + rte_errno = ENOENT; + ret = -1; + goto unlock; + } + /* we shouldn't be able to attach to internal heaps */ + if (heap->socket_id < RTE_MAX_NUMA_NODES) { + rte_errno = EPERM; + ret = -1; + goto unlock; + } + + /* find corresponding memseg list to attach to */ + wa.va_addr = va_addr; + wa.len = len; + wa.result = -ENOENT; /* fail unless explicitly told to succeed */ + + /* we're already holding a read lock */ + rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa); + + if (wa.result < 0) { + rte_errno = -wa.result; + ret = -1; + } else { + ret = 0; + } +unlock: + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + return ret; +} + int rte_malloc_heap_create(const char *heap_name) { diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 8c66d0be9..920852042 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -322,6 +322,7 @@ EXPERIMENTAL { rte_malloc_heap_destroy; rte_malloc_heap_get_socket; rte_malloc_heap_memory_add; + rte_malloc_heap_memory_attach; rte_malloc_heap_memory_remove; rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; From patchwork Tue Oct 2 13:34:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45882 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 1E7417CB0; Tue, 2 Oct 2018 15:35:22 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 91C595A44 for ; Tue, 2 Oct 2018 15:35:11 +0200 
(CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="74757284" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 02 Oct 2018 06:35:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ65Y009249; Tue, 2 Oct 2018 14:35:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ65F032224; Tue, 2 Oct 2018 14:35:06 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ6G1032220; Tue, 2 Oct 2018 14:35:06 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:54 +0100 Message-Id: <640777446dfbeb8050039d17a13201bf591431cd.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 16/21] malloc: allow detaching from external memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add API to detach from existing chunk of external memory in a 
process. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++++++++++ lib/librte_eal/common/rte_malloc.c | 31 +++++++++++++++++----- lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 52 insertions(+), 7 deletions(-) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 793f9473a..7249e6aae 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, * @note Memory area must not contain any allocated elements to allow its * removal from the heap * + * @note All other processes must detach from the memory chunk prior to it being + * removed from the heap. + * * @param heap_name * Name of the heap to remove memory from * @param va_addr @@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); int __rte_experimental rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len); +/** + * Detach from a chunk of external memory in a secondary process. + * + * @note This function must be called before any attempt is made to remove + * external memory from the heap in another process. This function does *not* + * need to be called if ``rte_malloc_heap_memory_remove`` will be called in + * the current process.
+ * + * @param heap_name + * Heap name to which this chunk of memory belongs + * @param va_addr + * Start address of memory chunk to detach from + * @param len + * Length of memory chunk to detach from + * @return + * 0 on successful detach + * -1 on unsuccessful detach, with rte_errno set to indicate cause for error: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to detach memory from a reserved heap + * ENOENT - heap or memory chunk was not found + */ +int __rte_experimental +rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. * diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 5078235b1..72e42b337 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -422,10 +422,11 @@ struct sync_mem_walk_arg { void *va_addr; size_t len; int result; + bool attach; }; static int -attach_mem_walk(const struct rte_memseg_list *msl, void *arg) +sync_mem_walk(const struct rte_memseg_list *msl, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct sync_mem_walk_arg *wa = arg; @@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg) msl_idx = msl - mcfg->memsegs; found_msl = &mcfg->memsegs[msl_idx]; - ret = rte_fbarray_attach(&found_msl->memseg_arr); + if (wa->attach) + ret = rte_fbarray_attach(&found_msl->memseg_arr); + else + ret = rte_fbarray_detach(&found_msl->memseg_arr); if (ret < 0) wa->result = -rte_errno; @@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg) return 0; } -int -rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +static int +sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct malloc_heap *heap = NULL; @@ -475,20 +479,21 @@ rte_malloc_heap_memory_attach(const
char *heap_name, void *va_addr, size_t len) ret = -1; goto unlock; } - /* we shouldn't be able to attach to internal heaps */ + /* we shouldn't be able to sync to internal heaps */ if (heap->socket_id < RTE_MAX_NUMA_NODES) { rte_errno = EPERM; ret = -1; goto unlock; } - /* find corresponding memseg list to attach to */ + /* find corresponding memseg list to sync to */ wa.va_addr = va_addr; wa.len = len; wa.result = -ENOENT; /* fail unless explicitly told to succeed */ + wa.attach = attach; /* we're already holding a read lock */ - rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa); + rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa); if (wa.result < 0) { rte_errno = -wa.result; @@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) return ret; } +int +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +{ + return sync_memory(heap_name, va_addr, len, true); +} + +int +rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len) +{ + return sync_memory(heap_name, va_addr, len, false); +} + int rte_malloc_heap_create(const char *heap_name) { diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 920852042..30583eef2 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -323,6 +323,7 @@ EXPERIMENTAL { rte_malloc_heap_get_socket; rte_malloc_heap_memory_add; rte_malloc_heap_memory_attach; + rte_malloc_heap_memory_detach; rte_malloc_heap_memory_remove; rte_malloc_heap_socket_is_external; rte_mem_alloc_validator_register; From patchwork Tue Oct 2 13:34:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45886 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by 
dpdk.org (Postfix) with ESMTP id 2B17F1B0FF; Tue, 2 Oct 2018 15:35:29 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id C4E645F16 for ; Tue, 2 Oct 2018 15:35:12 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="237851025" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga004.jf.intel.com with ESMTP; 02 Oct 2018 06:35:07 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ6F4009256; Tue, 2 Oct 2018 14:35:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ6Wd032231; Tue, 2 Oct 2018 14:35:06 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ6tS032227; Tue, 2 Oct 2018 14:35:06 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Hemant Agrawal , Shreyansh Jain , Maxime Coquelin , Tiwei Bie , Zhihong Wang , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:55 +0100 Message-Id: <638b674b09542914f814914b2616b1766014442d.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 17/21] malloc: enable event callbacks for external 
memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When adding or removing external memory from the memory map, there may be actions that need to be taken on account of this memory (e.g. DMA mapping). Add support for triggering callbacks when adding, removing, attaching or detaching external memory. Some memory event callback handlers will need additional logic to handle external memory regions. For example, virtio callback has to completely ignore externally allocated memory, because there is no way to find file descriptors backing the memory address in a generic fashion. All other callbacks have also been adjusted to handle RTE_BAD_IOVA as IOVA address, as this is one of the expected use cases for external memory support. Signed-off-by: Anatoly Burakov --- drivers/bus/fslmc/fslmc_vfio.c | 7 +++++ .../net/virtio/virtio_user/virtio_user_dev.c | 6 +++++ lib/librte_eal/common/malloc_heap.c | 7 +++++ lib/librte_eal/common/rte_malloc.c | 27 ++++++++++++++++--- lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 +++++-- 5 files changed, 51 insertions(+), 6 deletions(-) diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c index cb33dd891..493b6e5be 100644 --- a/drivers/bus/fslmc/fslmc_vfio.c +++ b/drivers/bus/fslmc/fslmc_vfio.c @@ -221,6 +221,13 @@ fslmc_memevent_cb(enum rte_mem_event type, const void *addr, size_t len, "alloc" : "dealloc", va, virt_addr, iova_addr, map_len); + /* iova_addr may be set to RTE_BAD_IOVA */ + if (iova_addr == RTE_BAD_IOVA) { + DPAA2_BUS_DEBUG("Segment has invalid iova, skipping\n"); + cur_len += map_len; + continue; + } + if (type == RTE_MEM_EVENT_ALLOC) ret = fslmc_map_dma(virt_addr, iova_addr, map_len); else diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c index 
55a82e4b0..a185aed34 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c @@ -301,8 +301,14 @@ virtio_user_mem_event_cb(enum rte_mem_event type __rte_unused, void *arg) { struct virtio_user_dev *dev = arg; + struct rte_memseg_list *msl; uint16_t i; + /* ignore externally allocated memory */ + msl = rte_mem_virt2memseg_list(addr); + if (msl->external) + return; + pthread_mutex_lock(&dev->mutex); if (dev->started == false) diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index adc1669aa..08ec75377 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1031,6 +1031,9 @@ destroy_seg(struct malloc_elem *elem, size_t len) msl = elem->msl; + /* notify all subscribers that a memory area is going to be removed */ + eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len); + /* this element can be removed */ malloc_elem_free_list_remove(elem); malloc_elem_hide_region(elem, elem, len); @@ -1120,6 +1123,10 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n", heap->name, va_addr); + /* notify all subscribers that a new memory area has been added */ + eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC, + va_addr, seg_len); + return 0; } diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 72e42b337..2c19c2f87 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -25,6 +25,7 @@ #include #include "malloc_elem.h" #include "malloc_heap.h" +#include "eal_memalloc.h" /* Free the memory space back to heap */ @@ -441,15 +442,29 @@ sync_mem_walk(const struct rte_memseg_list *msl, void *arg) msl_idx = msl - mcfg->memsegs; found_msl = &mcfg->memsegs[msl_idx]; - if (wa->attach) + if (wa->attach) { ret = rte_fbarray_attach(&found_msl->memseg_arr); - else + } else { + /* notify all 
subscribers that a memory area is about to + * be removed + */ + eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, + msl->base_va, msl->len); ret = rte_fbarray_detach(&found_msl->memseg_arr); + } - if (ret < 0) + if (ret < 0) { wa->result = -rte_errno; - else + } else { + /* notify all subscribers that a new memory area was + * added + */ + if (wa->attach) + eal_memalloc_mem_event_notify( + RTE_MEM_EVENT_ALLOC, + msl->base_va, msl->len); wa->result = 0; + } return 1; } return 0; @@ -499,6 +514,10 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach) rte_errno = -wa.result; ret = -1; } else { + /* notify all subscribers that a new memory area was added */ + if (attach) + eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC, + va_addr, len); ret = 0; } unlock: diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c index fddbc3b54..d7268e4ce 100644 --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c @@ -509,7 +509,7 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, msl = rte_mem_virt2memseg_list(addr); /* for IOVA as VA mode, no need to care for IOVA addresses */ - if (rte_eal_iova_mode() == RTE_IOVA_VA) { + if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; if (type == RTE_MEM_EVENT_ALLOC) vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, @@ -523,13 +523,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); while (cur_len < len) { + /* some memory segments may have invalid IOVA */ + if (ms->iova == RTE_BAD_IOVA) { + RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", + ms->addr); + goto next; + } if (type == RTE_MEM_EVENT_ALLOC) vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, ms->iova, ms->len, 1); else vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, 
ms->iova, ms->len, 0); - +next: cur_len += ms->len; ++ms; } From patchwork Tue Oct 2 13:34:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45885 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CC3351B0F8; Tue, 2 Oct 2018 15:35:27 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 2C29E4C96 for ; Tue, 2 Oct 2018 15:35:12 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="237851022" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga004.jf.intel.com with ESMTP; 02 Oct 2018 06:35:07 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ6oV009259; Tue, 2 Oct 2018 14:35:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ6dO032238; Tue, 2 Oct 2018 14:35:06 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ60T032234; Tue, 2 Oct 2018 14:35:06 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, 
shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:56 +0100 Message-Id: <9f0f96f8e7b7d6483b77e434ff78057fb1317744.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 18/21] test: add unit tests for external memory support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add simple unit tests to test external memory support. The tests are pretty basic and mostly consist of checking if invalid API calls are handled correctly, plus a simple allocation/deallocation test for malloc and memzone. Signed-off-by: Anatoly Burakov --- test/test/Makefile | 1 + test/test/autotest_data.py | 14 +- test/test/meson.build | 1 + test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++ 4 files changed, 401 insertions(+), 4 deletions(-) create mode 100644 test/test/test_external_mem.c diff --git a/test/test/Makefile b/test/test/Makefile index dcea4410d..5d8b1dcb0 100644 --- a/test/test/Makefile +++ b/test/test/Makefile @@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c SRCS-y += test_reciprocal_division.c SRCS-y += test_reciprocal_division_perf.c SRCS-y += test_fbarray.c +SRCS-y += test_external_mem.c SRCS-y += test_ring.c SRCS-y += test_ring_perf.c diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py index f68d9b111..51f8e1689 100644 --- a/test/test/autotest_data.py +++ b/test/test/autotest_data.py @@ -477,10 +477,16 @@ "Report": None, }, { - "Name": "Fbarray autotest", - "Command": "fbarray_autotest", - "Func": default_autotest, - "Report": None, + "Name": "Fbarray autotest", + "Command": "fbarray_autotest", + "Func": default_autotest, + "Report": None, + }, + { + "Name": "External memory autotest", + "Command": 
"external_mem_autotest", + "Func": default_autotest, + "Report": None, }, # #Please always keep all dump tests at the end and together! diff --git a/test/test/meson.build b/test/test/meson.build index bacb5b144..6a71ee0d3 100644 --- a/test/test/meson.build +++ b/test/test/meson.build @@ -164,6 +164,7 @@ test_names = [ 'eventdev_common_autotest', 'eventdev_octeontx_autotest', 'eventdev_sw_autotest', + 'external_mem_autotest', 'func_reentrancy_autotest', 'flow_classify_autotest', 'hash_scaling_autotest', diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c new file mode 100644 index 000000000..d0837aa35 --- /dev/null +++ b/test/test/test_external_mem.c @@ -0,0 +1,389 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "test.h" + +#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */ + +static int +test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, + int n_pages) +{ + static const char * const names[] = { + NULL, /* NULL name */ + "", /* empty name */ + "this heap name is definitely way too long to be valid" + }; + const char *valid_name = "valid heap name"; + unsigned int i; + + /* check invalid name handling */ + for (i = 0; i < RTE_DIM(names); i++) { + const char *name = names[i]; + + /* these calls may fail for other reasons, so check errno */ + if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Created heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Destroyed heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_get_socket(name) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Found socket for heap with invalid name\n", + __func__, __LINE__); + 
goto fail; + } + + if (rte_malloc_heap_memory_add(name, addr, len, + NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory to heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Removed memory from heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Attached memory to heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Detached memory from heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + } + + /* do same as above, but with a valid name of a non-existent heap */ + + /* skip create call */ + if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) { + printf("%s():%i: Destroyed non-existent heap\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_get_socket(valid_name) >= 0 || + rte_errno != ENOENT) { + printf("%s():%i: Found socket for non-existent heap\n", + __func__, __LINE__); + goto fail; + } + + /* these calls may fail for other reasons, so check errno */ + if (rte_malloc_heap_memory_add(valid_name, addr, len, + NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) { + printf("%s():%i: Added memory to non-existent heap\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 || + rte_errno != ENOENT) { + printf("%s():%i: Removed memory from non-existent heap\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 || + rte_errno != ENOENT) { + printf("%s():%i: Attached memory to non-existent heap\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 || + rte_errno != ENOENT) { + printf("%s():%i:
Detached memory from non-existent heap\n", + __func__, __LINE__); + goto fail; + } + + /* create a valid heap but test other invalid parameters */ + if (rte_malloc_heap_create(valid_name) != 0) { + printf("%s():%i: Failed to create valid heap\n", + __func__, __LINE__); + goto fail; + } + + /* zero length */ + if (rte_malloc_heap_memory_add(valid_name, addr, 0, + NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Removed memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Attached memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Detached memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + /* zero address */ + if (rte_malloc_heap_memory_add(valid_name, NULL, len, + NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Removed memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Attached memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Detached memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + /* wrong page count */ + if (rte_malloc_heap_memory_add(valid_name, addr, len, + iova, 
0, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_add(valid_name, addr, len, + iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_add(valid_name, addr, len, + iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory with invalid parameters\n", + __func__, __LINE__); + goto fail; + } + + /* tests passed, destroy heap */ + if (rte_malloc_heap_destroy(valid_name) != 0) { + printf("%s():%i: Failed to destroy valid heap\n", + __func__, __LINE__); + goto fail; + } + return 0; +fail: + rte_malloc_heap_destroy(valid_name); + return -1; +} + +static int +test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages) +{ + const char *heap_name = "heap"; + void *ptr = NULL; + int socket_id, i; + const struct rte_memzone *mz = NULL; + + /* create heap */ + if (rte_malloc_heap_create(heap_name) != 0) { + printf("%s():%i: Failed to create malloc heap\n", + __func__, __LINE__); + goto fail; + } + + /* get socket ID corresponding to this heap */ + socket_id = rte_malloc_heap_get_socket(heap_name); + if (socket_id < 0) { + printf("%s():%i: cannot find socket for external heap\n", + __func__, __LINE__); + goto fail; + } + + /* heap is empty, so any allocation should fail */ + ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id); + if (ptr != NULL) { + printf("%s():%i: Allocated from empty heap\n", __func__, + __LINE__); + goto fail; + } + + /* add memory to heap */ + if (rte_malloc_heap_memory_add(heap_name, addr, len, + iova, n_pages, pgsz) != 0) { + printf("%s():%i: Failed to add memory to heap\n", + __func__, __LINE__); + goto fail; + } + + /* check that we can get this memory from EAL now */ + for (i = 0; i < n_pages; i++) { + const struct rte_memseg *ms; + void *cur = 
RTE_PTR_ADD(addr, pgsz * i); + + ms = rte_mem_virt2memseg(cur, NULL); + if (ms == NULL) { + printf("%s():%i: Failed to retrieve memseg for external mem\n", + __func__, __LINE__); + goto fail; + } + if (ms->addr != cur) { + printf("%s():%i: VA mismatch\n", __func__, __LINE__); + goto fail; + } + if (ms->iova != iova[i]) { + printf("%s():%i: IOVA mismatch\n", __func__, __LINE__); + goto fail; + } + } + + /* allocate - this now should succeed */ + ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id); + if (ptr == NULL) { + printf("%s():%i: Failed to allocate from external heap\n", + __func__, __LINE__); + goto fail; + } + + /* check if address is in expected range */ + if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) { + printf("%s():%i: Allocated from unexpected address space\n", + __func__, __LINE__); + goto fail; + } + + /* we've allocated something - removing memory should fail */ + if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 || + rte_errno != EBUSY) { + printf("%s():%i: Removing memory succeeded when memory is not free\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) { + printf("%s():%i: Destroying heap succeeded when memory is not free\n", + __func__, __LINE__); + goto fail; + } + + /* try allocating an IOVA-contiguous memzone - this should succeed + * because we've set up a contiguous IOVA table. 
+ */ + mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id, + RTE_MEMZONE_IOVA_CONTIG); + if (mz == NULL) { + printf("%s():%i: Failed to reserve memzone\n", + __func__, __LINE__); + goto fail; + } + + rte_malloc_dump_stats(stdout, NULL); + rte_malloc_dump_heaps(stdout); + + /* free memory - removing it should now succeed */ + rte_free(ptr); + ptr = NULL; + + rte_memzone_free(mz); + mz = NULL; + + if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) { + printf("%s():%i: Removing memory from heap failed\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_destroy(heap_name) != 0) { + printf("%s():%i: Destroying heap failed\n", + __func__, __LINE__); + goto fail; + } + + return 0; +fail: + rte_memzone_free(mz); + rte_free(ptr); + /* even if something failed, attempt to clean up */ + rte_malloc_heap_memory_remove(heap_name, addr, len); + rte_malloc_heap_destroy(heap_name); + + return -1; +} + +/* we need to test attach/detach in secondary processes. */ +static int +test_external_mem(void) +{ + size_t len = EXTERNAL_MEM_SZ; + size_t pgsz = RTE_PGSIZE_4K; + rte_iova_t iova[len / pgsz]; + void *addr; + int ret, n_pages; + + /* create external memory area */ + n_pages = RTE_DIM(iova); + addr = mmap(NULL, len, PROT_WRITE | PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (addr == MAP_FAILED) { + printf("%s():%i: Failed to create dummy memory area\n", + __func__, __LINE__); + return -1; + } + for (int i = 0; i < n_pages; i++) { + /* arbitrary IOVA */ + rte_iova_t tmp = 0x100000000 + i * pgsz; + iova[i] = tmp; + } + + ret = test_invalid_param(addr, len, pgsz, iova, n_pages); + ret |= test_basic(addr, len, pgsz, iova, n_pages); + + munmap(addr, len); + + return ret; +} + +REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem); From patchwork Tue Oct 2 13:34:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45895 
X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 154A91B156; Tue, 2 Oct 2018 15:35:44 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 587CA1B135 for ; Tue, 2 Oct 2018 15:35:35 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="95369386" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga001.fm.intel.com with ESMTP; 02 Oct 2018 06:35:08 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ67g009271; Tue, 2 Oct 2018 14:35:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ6vu032245; Tue, 2 Oct 2018 14:35:06 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ6ta032241; Tue, 2 Oct 2018 14:35:06 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Wenzhuo Lu , Jingjing Wu , Bernard Iremonger , John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:57 +0100 Message-Id: 
<26140c6b05fab32ce44da040de23bf308598923c.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, mempools can be allocated only from native DPDK memory or from anonymous memory. Add two new ways to allocate a mempool from external memory (regular or hugepage memory), and document them in the testpmd user guide. A new flag, "--mp-alloc", selects the method and takes one of four values: native (use the regular DPDK allocator), anon (use an anonymous mempool), xmem (use an externally allocated memory area), and xmemhuge (use an externally allocated hugepage memory area). The old "--mp-anon" flag is kept for compatibility. All external memory is added to the same external heap, but each mempool allocates and adds its own new memory area.
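For illustration only, the four "--mp-alloc" values could be mapped to an
allocation mode with a small lookup table, as in the rough sketch below. The
enum, its constants and parse_mp_alloc() are hypothetical names invented for
this example; the patch itself uses the MP_ALLOC_* constants and a strcmp()
chain in launch_args_parse().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only; not taken from testpmd. */
enum mp_alloc_mode {
	MODE_NATIVE,    /* regular DPDK memory */
	MODE_ANON,      /* anonymous memory used to populate the mempool */
	MODE_XMEM,      /* externally allocated memory area */
	MODE_XMEM_HUGE, /* externally allocated hugepage memory area */
	MODE_INVALID    /* unrecognised or missing argument */
};

/* Map a --mp-alloc argument to a mode; MODE_INVALID if not recognised. */
static enum mp_alloc_mode
parse_mp_alloc(const char *arg)
{
	static const struct {
		const char *name;
		enum mp_alloc_mode mode;
	} table[] = {
		{ "native", MODE_NATIVE },
		{ "anon", MODE_ANON },
		{ "xmem", MODE_XMEM },
		{ "xmemhuge", MODE_XMEM_HUGE },
	};
	size_t i;

	if (arg == NULL)
		return MODE_INVALID;

	/* exact match only, so "xmem" and "xmemhuge" stay distinct */
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(arg, table[i].name) == 0)
			return table[i].mode;

	return MODE_INVALID;
}
```

A caller would be expected to treat MODE_INVALID as a fatal usage error, much
as the patch does with rte_exit() for an unknown value.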
Signed-off-by: Anatoly Burakov Suggested-by: Konstantin Ananyev Acked-by: Bernard Iremonger --- app/test-pmd/config.c | 21 +- app/test-pmd/parameters.c | 23 +- app/test-pmd/testpmd.c | 325 ++++++++++++++++++++++++-- app/test-pmd/testpmd.h | 13 +- doc/guides/testpmd_app_ug/run_app.rst | 12 + 5 files changed, 369 insertions(+), 25 deletions(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 794aa5268..3b921cfc6 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -2423,6 +2423,23 @@ fwd_config_setup(void) simple_fwd_config_setup(); } +static const char * +mp_alloc_to_str(uint8_t mode) +{ + switch (mode) { + case MP_ALLOC_NATIVE: + return "native"; + case MP_ALLOC_ANON: + return "anon"; + case MP_ALLOC_XMEM: + return "xmem"; + case MP_ALLOC_XMEM_HUGE: + return "xmemhuge"; + default: + return "invalid"; + } +} + void pkt_fwd_config_display(struct fwd_config *cfg) { @@ -2431,12 +2448,12 @@ pkt_fwd_config_display(struct fwd_config *cfg) streamid_t sm_id; printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - " - "NUMA support %s, MP over anonymous pages %s\n", + "NUMA support %s, MP allocation mode: %s\n", cfg->fwd_eng->fwd_mode_name, retry_enabled == 0 ? "" : " with retry", cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams, numa_support == 1 ? "enabled" : "disabled", - mp_anon != 0 ? 
"enabled" : "disabled"); + mp_alloc_to_str(mp_alloc_type)); if (retry_enabled) printf("TX retry num: %u, delay between TX retries: %uus\n", diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index 9220e1c1b..565bea730 100644 --- a/app/test-pmd/parameters.c +++ b/app/test-pmd/parameters.c @@ -190,6 +190,11 @@ usage(char* progname) printf(" --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n"); printf(" --mlockall: lock all memory\n"); printf(" --no-mlockall: do not lock all memory\n"); + printf(" --mp-alloc : mempool allocation method.\n" + " native: use regular DPDK memory to create and populate mempool\n" + " anon: use regular DPDK memory to create and anonymous memory to populate mempool\n" + " xmem: use anonymous memory to create and populate mempool\n" + " xmemhuge: use anonymous hugepage memory to create and populate mempool\n"); } #ifdef RTE_LIBRTE_CMDLINE @@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv) { "vxlan-gpe-port", 1, 0, 0 }, { "mlockall", 0, 0, 0 }, { "no-mlockall", 0, 0, 0 }, + { "mp-alloc", 1, 0, 0 }, { 0, 0, 0, 0 }, }; @@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv) if (!strcmp(lgopts[opt_idx].name, "numa")) numa_support = 1; if (!strcmp(lgopts[opt_idx].name, "mp-anon")) { - mp_anon = 1; + mp_alloc_type = MP_ALLOC_ANON; + } + if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) { + if (!strcmp(optarg, "native")) + mp_alloc_type = MP_ALLOC_NATIVE; + else if (!strcmp(optarg, "anon")) + mp_alloc_type = MP_ALLOC_ANON; + else if (!strcmp(optarg, "xmem")) + mp_alloc_type = MP_ALLOC_XMEM; + else if (!strcmp(optarg, "xmemhuge")) + mp_alloc_type = MP_ALLOC_XMEM_HUGE; + else + rte_exit(EXIT_FAILURE, + "mp-alloc %s invalid - must be: " + "native, anon, xmem or xmemhuge\n", + optarg); } if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) { if (parse_portnuma_config(optarg)) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 001f0e552..d9e0a5ddb 100644 --- a/app/test-pmd/testpmd.c +++ 
b/app/test-pmd/testpmd.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -63,6 +64,22 @@ #include "testpmd.h" +#ifndef MAP_HUGETLB +/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */ +#define HUGE_FLAG (0x40000) +#else +#define HUGE_FLAG MAP_HUGETLB +#endif + +#ifndef MAP_HUGE_SHIFT +/* older kernels (or FreeBSD) will not have this define */ +#define HUGE_SHIFT (26) +#else +#define HUGE_SHIFT MAP_HUGE_SHIFT +#endif + +#define EXTMEM_HEAP_NAME "extmem" + uint16_t verbose_level = 0; /**< Silent by default. */ int testpmd_logtype; /**< Log type for testpmd logs */ @@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */ uint8_t socket_num = UMA_NO_CONFIG; /* - * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs. + * Select mempool allocation type: + * - native: use regular DPDK memory + * - anon: use regular DPDK memory to create mempool, but populate using + * anonymous memory (may not be IOVA-contiguous) + * - xmem: use externally allocated memory (hugepages with xmemhuge) */ -uint8_t mp_anon = 0; +uint8_t mp_alloc_type = MP_ALLOC_NATIVE; /* * Store specified sockets on which memory pool to be used by ports @@ -527,6 +548,236 @@ set_def_fwd_config(void) set_default_fwd_ports_config(); } +/* extremely pessimistic estimation of memory required to create a mempool */ +static int +calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out) +{ + unsigned int n_pages, mbuf_per_pg, leftover; + uint64_t total_mem, mbuf_mem, obj_sz; + + /* there is no good way to predict how much space the mempool will + * occupy because it will allocate chunks on the fly, and some of those + * will come from default DPDK memory while some will come from our + * external memory, so just assume 128MB will be enough for everyone.
+ */ + uint64_t hdr_mem = 128 << 20; + + /* account for possible non-contiguousness */ + obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL); + if (obj_sz > pgsz) { + TESTPMD_LOG(ERR, "Object size is bigger than page size\n"); + return -1; + } + + mbuf_per_pg = pgsz / obj_sz; + leftover = (nb_mbufs % mbuf_per_pg) > 0; + n_pages = (nb_mbufs / mbuf_per_pg) + leftover; + + mbuf_mem = n_pages * pgsz; + + total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz); + + if (total_mem > SIZE_MAX) { + TESTPMD_LOG(ERR, "Memory size too big\n"); + return -1; + } + *out = (size_t)total_mem; + + return 0; +} + +static inline uint32_t +bsf64(uint64_t v) +{ + return (uint32_t)__builtin_ctzll(v); +} + +static inline uint32_t +log2_u64(uint64_t v) +{ + if (v == 0) + return 0; + v = rte_align64pow2(v); + return bsf64(v); +} + +static int +pagesz_flags(uint64_t page_sz) +{ + /* as per mmap() manpage, all page sizes are log2 of page size + * shifted by MAP_HUGE_SHIFT + */ + int log2 = log2_u64(page_sz); + + return (log2 << HUGE_SHIFT); +} + +static void * +alloc_mem(size_t memsz, size_t pgsz, bool huge) +{ + void *addr; + int flags; + + /* allocate anonymous hugepages */ + flags = MAP_ANONYMOUS | MAP_PRIVATE; + if (huge) + flags |= HUGE_FLAG | pagesz_flags(pgsz); + + addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0); + if (addr == MAP_FAILED) + return NULL; + + return addr; +} + +struct extmem_param { + void *addr; + size_t len; + size_t pgsz; + rte_iova_t *iova_table; + unsigned int iova_table_len; +}; + +static int +create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param, + bool huge) +{ + uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */ + RTE_PGSIZE_16M, RTE_PGSIZE_16G}; /* POWER */ + unsigned int cur_page, n_pages, pgsz_idx; + size_t mem_sz, cur_pgsz; + rte_iova_t *iovas = NULL; + void *addr; + int ret; + + for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) { + /* skip anything that is too big */ + if (pgsizes[pgsz_idx] > 
SIZE_MAX) + continue; + + cur_pgsz = pgsizes[pgsz_idx]; + + /* if we were told not to allocate hugepages, override */ + if (!huge) + cur_pgsz = sysconf(_SC_PAGESIZE); + + ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz); + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot calculate memory size\n"); + return -1; + } + + /* allocate our memory */ + addr = alloc_mem(mem_sz, cur_pgsz, huge); + + /* if we couldn't allocate memory with a specified page size, + * that doesn't mean we can't do it with other page sizes, so + * try another one. + */ + if (addr == NULL) + continue; + + /* store IOVA addresses for every page in this memory area */ + n_pages = mem_sz / cur_pgsz; + + iovas = malloc(sizeof(*iovas) * n_pages); + + if (iovas == NULL) { + TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n"); + goto fail; + } + /* lock memory if it's not huge pages */ + if (!huge) + mlock(addr, mem_sz); + + /* populate IOVA addresses */ + for (cur_page = 0; cur_page < n_pages; cur_page++) { + rte_iova_t iova; + size_t offset; + void *cur; + + offset = cur_pgsz * cur_page; + cur = RTE_PTR_ADD(addr, offset); + + /* touch the page before getting its IOVA */ + *(volatile char *)cur = 0; + + iova = rte_mem_virt2iova(cur); + + iovas[cur_page] = iova; + } + + break; + } + /* if we couldn't allocate anything */ + if (iovas == NULL) + return -1; + + param->addr = addr; + param->len = mem_sz; + param->pgsz = cur_pgsz; + param->iova_table = iovas; + param->iova_table_len = n_pages; + + return 0; +fail: + if (iovas) + free(iovas); + if (addr) + munmap(addr, mem_sz); + + return -1; +} + +static int +setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge) +{ + struct extmem_param param; + int socket_id, ret; + + memset(&param, 0, sizeof(param)); + + /* check if our heap exists */ + socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME); + if (socket_id < 0) { + /* create our heap */ + ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME); + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot create 
heap\n"); + return -1; + } + } + + ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge); + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot create memory area\n"); + return -1; + } + + /* we now have a valid memory area, so add it to heap */ + ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME, + param.addr, param.len, param.iova_table, + param.iova_table_len, param.pgsz); + + /* when using VFIO, memory is automatically mapped for DMA by EAL */ + + /* not needed any more */ + free(param.iova_table); + + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot add memory to heap\n"); + munmap(param.addr, param.len); + return -1; + } + + /* success */ + + TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n", + param.len >> 20); + + return 0; +} + /* * Configuration initialisation done once at init time. */ @@ -545,27 +796,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf, "create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n", pool_name, nb_mbuf, mbuf_seg_size, socket_id); - if (mp_anon != 0) { - rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf, - mb_size, (unsigned) mb_mempool_cache, - sizeof(struct rte_pktmbuf_pool_private), - socket_id, 0); - if (rte_mp == NULL) - goto err; + switch (mp_alloc_type) { + case MP_ALLOC_NATIVE: + { + /* wrapper to rte_mempool_create() */ + TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n", + rte_mbuf_best_mempool_ops()); + rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf, + mb_mempool_cache, 0, mbuf_seg_size, socket_id); + break; + } + case MP_ALLOC_ANON: + { + rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf, + mb_size, (unsigned int) mb_mempool_cache, + sizeof(struct rte_pktmbuf_pool_private), + socket_id, 0); + if (rte_mp == NULL) + goto err; + + if (rte_mempool_populate_anon(rte_mp) == 0) { + rte_mempool_free(rte_mp); + rte_mp = NULL; + goto err; + } + rte_pktmbuf_pool_init(rte_mp, NULL); + rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL); + break; + } + case MP_ALLOC_XMEM: + case MP_ALLOC_XMEM_HUGE: + { + int 
heap_socket; + bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE; - if (rte_mempool_populate_anon(rte_mp) == 0) { - rte_mempool_free(rte_mp); - rte_mp = NULL; - goto err; + if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0) + rte_exit(EXIT_FAILURE, "Could not create external memory\n"); + + heap_socket = + rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME); + if (heap_socket < 0) + rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n"); + + TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n", + rte_mbuf_best_mempool_ops()); + rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf, + mb_mempool_cache, 0, mbuf_seg_size, + heap_socket); + break; + } + default: + { + rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n"); } - rte_pktmbuf_pool_init(rte_mp, NULL); - rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL); - } else { - /* wrapper to rte_mempool_create() */ - TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n", - rte_mbuf_best_mempool_ops()); - rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf, - mb_mempool_cache, 0, mbuf_seg_size, socket_id); } err: diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index a1f661472..65e0cec90 100644 --- a/app/test-pmd/testpmd.h +++ b/app/test-pmd/testpmd.h @@ -69,6 +69,16 @@ enum { PORT_TOPOLOGY_LOOP, }; +enum { + MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */ + MP_ALLOC_ANON, + /**< allocate mempool natively, but populate using anonymous memory */ + MP_ALLOC_XMEM, + /**< allocate and populate mempool using anonymous memory */ + MP_ALLOC_XMEM_HUGE + /**< allocate and populate mempool using anonymous hugepage memory */ +}; + #ifdef RTE_TEST_PMD_RECORD_BURST_STATS /** * The data structure associated with RX and TX packet burst statistics @@ -304,7 +314,8 @@ extern uint8_t numa_support; /**< set by "--numa" parameter */ extern uint16_t port_topology; /**< set by "--port-topology" parameter */ extern uint8_t no_flush_rx; /**`` + + Select mempool allocation mode: + + * native: create 
and populate mempool using native DPDK memory + * anon: create mempool using native DPDK memory, but populate using + anonymous memory + * xmem: create and populate mempool using externally and anonymously + allocated area + * xmemhuge: create and populate mempool using externally and anonymously + allocated hugepage area From patchwork Tue Oct 2 13:34:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45891 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7B03C1B134; Tue, 2 Oct 2018 15:35:37 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 84DAE6C9B for ; Tue, 2 Oct 2018 15:35:21 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="78070990" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 02 Oct 2018 06:35:07 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ6Yp009277; Tue, 2 Oct 2018 14:35:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ6fo032252; Tue, 2 Oct 2018 14:35:06 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ6JJ032248; Tue, 2 Oct 2018 14:35:06 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, 
andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:58 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 20/21] doc: add external memory feature to the release notes X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Document the addition of external memory support to DPDK. Signed-off-by: Anatoly Burakov --- doc/guides/rel_notes/release_18_11.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index e7674adb9..8fe463d72 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -54,6 +54,12 @@ New Features Also, make sure to start the actual text at the margin. ========================================================= +* **Added support for using externally allocated memory in DPDK.** + + DPDK has gained support for creating new ``rte_malloc`` heaps referencing + memory that was created outside of DPDK's own page allocator, and using that + memory natively with any other DPDK library or data structure. 
+ * **Add support to offload more flow match and actions for CXGBE PMD** Flow API support has been enhanced for CXGBE Poll Mode Driver to offload: From patchwork Tue Oct 2 13:34:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 45884 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 08B307EE3; Tue, 2 Oct 2018 15:35:26 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 2EE6D5A44 for ; Tue, 2 Oct 2018 15:35:12 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Oct 2018 06:35:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,332,1534834800"; d="scan'208";a="267788306" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga005.fm.intel.com with ESMTP; 02 Oct 2018 06:35:08 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w92DZ7em009289; Tue, 2 Oct 2018 14:35:07 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w92DZ6pH032259; Tue, 2 Oct 2018 14:35:06 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w92DZ6Yo032255; Tue, 2 Oct 2018 14:35:06 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, 
ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com, alejandro.lucero@netronome.com Date: Tue, 2 Oct 2018 14:34:59 +0100 Message-Id: <83dbdc020317a7c5da9f4788cbf7b93a9d4072e9.1538486972.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v9 21/21] doc: add external memory feature to programmer's guide X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a short chapter on usage of external memory in DPDK to the Programmer's Guide. Signed-off-by: Anatoly Burakov --- .../prog_guide/env_abstraction_layer.rst | 37 +++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index d362c9209..00ce64ceb 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -213,6 +213,43 @@ Normally, these options do not need to be changed. can later be mapped into that preallocated VA space (if dynamic memory mode is enabled), and can optionally be mapped into it at startup. +Support for Externally Allocated Memory +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is possible to use externally allocated memory in DPDK, using a set of malloc +heap API's. Support for externally allocated memory is implemented through +overloading the socket ID - externally allocated heaps will have socket ID's +that would be considered invalid under normal circumstances. Requesting an +allocation to take place from a specified externally allocated memory is a +matter of supplying the correct socket ID to DPDK allocator, either directly +(e.g. 
through a call to ``rte_malloc``) or indirectly (through data +structure-specific allocation API's such as ``rte_ring_create``). + +Since there is no way DPDK can verify whether memory area is available or valid, +this responsibility falls on the shoulders of the user. All multiprocess +synchronization is also the user's responsibility, as well as ensuring that all +calls to add/attach/detach/remove memory are done in the correct order. It is +not required to attach to a memory area in all processes - only attach to memory +areas as needed. + +The expected workflow is as follows: + +* Get a pointer to memory area +* Create a named heap +* Add memory area(s) to the heap + - If IOVA table is not specified, IOVA addresses will be assumed to be + unavailable, and DMA mappings will not be performed + - Other processes must attach to the memory area before they can use it +* Get socket ID used for the heap +* Use normal DPDK allocation procedures, using supplied socket ID +* If memory area is no longer needed, it can be removed from the heap + - Other processes must detach from this memory area before it can be removed +* If heap is no longer needed, remove it + - Socket ID will become invalid and will not be reused + +For more information, please refer to ``rte_malloc`` API documentation, +specifically the ``rte_malloc_heap_*`` family of function calls. + PCI Access ~~~~~~~~~~
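
As a rough illustration of the arithmetic in the testpmd patch earlier in this series (pagesz_flags() and calc_mem_size()), the sketch below shows how a hugepage size could be encoded for mmap() and how many pages are needed when objects may not straddle a page boundary. All sketch_* names are hypothetical and exist only for this example. The shift of 26 mirrors the fallback HUGE_SHIFT define in the patch and is assumed to match MAP_HUGE_SHIFT on Linux. Unlike the patch, which returns an int, the sketch returns a 64-bit value, and the object size is passed in directly rather than obtained from rte_mempool_calc_obj_size().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same value as the patch's fallback HUGE_SHIFT define. */
#define SKETCH_HUGE_SHIFT 26u

/* log2 of a power-of-two value (hypothetical helper, not a DPDK API) */
static unsigned int
sketch_log2_pow2(uint64_t v)
{
	unsigned int n = 0;

	/* page sizes are expected to be non-zero powers of two */
	assert(v != 0 && (v & (v - 1)) == 0);
	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* page-size selector: log2(page size) shifted up by SKETCH_HUGE_SHIFT */
static uint64_t
sketch_pagesz_flags(uint64_t page_sz)
{
	return (uint64_t)sketch_log2_pow2(page_sz) << SKETCH_HUGE_SHIFT;
}

/* number of pages needed if no object may cross a page boundary */
static uint64_t
sketch_obj_pages(uint64_t n_objs, uint64_t obj_sz, uint64_t pgsz)
{
	uint64_t per_pg;

	assert(obj_sz != 0 && obj_sz <= pgsz);
	per_pg = pgsz / obj_sz;
	/* round up: a partially filled last page still costs a whole page */
	return n_objs / per_pg + (n_objs % per_pg != 0);
}
```

For a 2 MB page, sketch_pagesz_flags(1ULL << 21) yields 21 << 26. With an assumed object size of 2304 bytes, 910 objects fit in one 2 MB page, so 1000 objects need two pages.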