From patchwork Fri Sep 21 16:13:49 2018
X-Patchwork-Submitter: "Burakov, Anatoly"
X-Patchwork-Id: 45125
From: Anatoly Burakov
To: dev@dpdk.org
Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com,
 andras.kovacs@ericsson.com, winnie.tian@ericsson.com,
 daniel.andrasi@ericsson.com, janos.kobor@ericsson.com,
 geza.koblo@ericsson.com, srinath.mannam@broadcom.com,
 scott.branden@broadcom.com, ajit.khaparde@broadcom.com,
 keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net,
 shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com
Date: Fri, 21 Sep 2018 17:13:49 +0100
Subject: [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK

This is a proposal to enable using externally allocated memory in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA node
  itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    the allocator to decide from which heap (internal or external) to
    allocate the requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses of
  externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is on
the shoulders of the user - there is no checking done with regards to
validity of the memory (nor could there be...).

The general approach is to create a heap and add memory into it. For any
other process wishing to use the same memory, said memory must first be
attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a manual
process.
Due to underlying issues with attaching to fbarrays in secondary
processes, this design was deemed to be better because we don't want to
fail to create an external heap in the primary because something in the
secondary has failed, when in fact we may not even have wanted this
memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

Creating and destroying heaps is currently restricted to primary
processes, because we need to keep track of all socket IDs we've ever
used to prevent their reuse. Different processes would keep different
socket ID counters, and this isn't important enough to put into shared
memory. This means that secondary processes will not be able to create
new heaps. If this use case is important enough, we can put the max
socket ID into shared memory, or allow socket ID reuse (which I do not
think is a good idea because it has the potential to make things harder
to debug).
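
To illustrate the socket ID bookkeeping described above, here is a
minimal standalone sketch in C. It is a toy model, not the code in this
series: every name in it (heap_table, heap_create, heap_destroy,
socket_is_external, MAX_HEAPS, NUM_NUMA_NODES) is hypothetical, and it
assumes for simplicity that internal heaps occupy socket IDs
0..NUM_NUMA_NODES-1. It only shows the idea that each external heap gets
a fresh socket ID from a counter that never goes backwards, so a
destroyed heap's ID is never handed out again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_HEAPS      8  /* hypothetical table size */
#define NUM_NUMA_NODES 2  /* hypothetical: internal heaps use IDs 0..1 */
#define HEAP_NAME_LEN  32

struct heap {
	char name[HEAP_NAME_LEN]; /* empty string marks a free slot */
	int socket_id;
};

struct heap_table {
	struct heap heaps[MAX_HEAPS];
	int next_socket_id; /* only ever grows, so IDs are never reused */
};

/* Set up the internal heaps: heap index i serves socket ID i. */
void heap_table_init(struct heap_table *t)
{
	memset(t, 0, sizeof(*t));
	for (int i = 0; i < NUM_NUMA_NODES; i++) {
		snprintf(t->heaps[i].name, HEAP_NAME_LEN, "socket_%d", i);
		t->heaps[i].socket_id = i;
	}
	t->next_socket_id = NUM_NUMA_NODES;
}

/* Find a live heap by its unique name; returns NULL if absent. */
struct heap *heap_find(struct heap_table *t, const char *name)
{
	for (int i = 0; i < MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] != '\0' &&
				strcmp(t->heaps[i].name, name) == 0)
			return &t->heaps[i];
	}
	return NULL;
}

/* Create an external heap; returns its new socket ID, or -1 on error. */
int heap_create(struct heap_table *t, const char *name)
{
	assert(name != NULL);
	if (name[0] == '\0' || strlen(name) >= HEAP_NAME_LEN ||
			heap_find(t, name) != NULL)
		return -1; /* bad name, or name already in use */
	for (int i = NUM_NUMA_NODES; i < MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] == '\0') {
			strcpy(t->heaps[i].name, name);
			t->heaps[i].socket_id = t->next_socket_id++;
			return t->heaps[i].socket_id;
		}
	}
	return -1; /* table is full */
}

/* Destroy an external heap: the slot is freed, the socket ID is retired. */
int heap_destroy(struct heap_table *t, const char *name)
{
	struct heap *h = heap_find(t, name);

	if (h == NULL || h->socket_id < NUM_NUMA_NODES)
		return -1; /* unknown heap, or an internal heap */
	h->name[0] = '\0';
	return 0;
}

/* True only if socket_id belongs to a live external heap. */
bool socket_is_external(const struct heap_table *t, int socket_id)
{
	for (int i = 0; i < MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] != '\0' &&
				t->heaps[i].socket_id == socket_id)
			return socket_id >= NUM_NUMA_NODES;
	}
	return false;
}
```

In this model the counter lives in a single table, which is the reason
given above for restricting creation to one process: two processes each
holding their own counter could hand out the same ID for different heaps.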

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (20):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 337 +++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  38 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  24 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 198 +++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 300 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 420 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 46 files changed, 1897 insertions(+), 137 deletions(-)
 create mode 100644 test/test/test_external_mem.c