From patchwork Fri Sep 21 16:14:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 45137 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E68A01B10F; Fri, 21 Sep 2018 18:14:52 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 20C225F24 for ; Fri, 21 Sep 2018 18:14:24 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 Sep 2018 09:14:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,285,1534834800"; d="scan'208";a="82282948" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by FMSMGA003.fm.intel.com with ESMTP; 21 Sep 2018 09:14:13 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w8LGEC98029234; Fri, 21 Sep 2018 17:14:12 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w8LGECSt002935; Fri, 21 Sep 2018 17:14:12 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w8LGECH4002929; Fri, 21 Sep 2018 17:14:12 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Wenzhuo Lu , Jingjing Wu , Bernard Iremonger , John McNamara , Marko Kovacevic , laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, geza.koblo@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com, arybchenko@solarflare.com Date: Fri, 21 Sep 2018 17:14:07 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v4 18/20] app/testpmd: add support for external memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, mempools can only be allocated either using native DPDK memory, or anonymous memory. This patch will add two new methods to allocate mempool using external memory (regular or hugepage memory), and add documentation about it to testpmd user guide. It adds a new flag "--mp-alloc", with four possible values: native (use regular DPDK allocator), anon (use anonymous mempool), xmem (use externally allocated memory area), and xmemhuge (use externally allocated hugepage memory area). Old flag "--mp-anon" is kept for compatibility. All external memory is allocated using the same external heap, but each will allocate and add a new memory area. Signed-off-by: Anatoly Burakov Suggested-by: Konstantin Ananyev --- app/test-pmd/config.c | 21 +- app/test-pmd/parameters.c | 23 +- app/test-pmd/testpmd.c | 337 ++++++++++++++++++++++++-- app/test-pmd/testpmd.h | 13 +- doc/guides/testpmd_app_ug/run_app.rst | 12 + 5 files changed, 381 insertions(+), 25 deletions(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index a0f934932..4789910b3 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -2413,6 +2413,23 @@ fwd_config_setup(void) simple_fwd_config_setup(); } +static const char * +mp_alloc_to_str(uint8_t mode) +{ + switch (mode) { + case MP_ALLOC_NATIVE: + return "native"; + case MP_ALLOC_ANON: + return "anon"; + case MP_ALLOC_XMEM: + return "xmem"; + case MP_ALLOC_XMEM_HUGE: + return "xmemhuge"; + default: + return "invalid"; + } +} + void pkt_fwd_config_display(struct fwd_config *cfg) { @@ -2421,12 +2438,12 @@ pkt_fwd_config_display(struct fwd_config *cfg) streamid_t sm_id; printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - " - "NUMA support %s, MP over anonymous pages %s\n", + "NUMA support %s, MP allocation mode: %s\n", cfg->fwd_eng->fwd_mode_name, retry_enabled == 0 ? "" : " with retry", cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams, numa_support == 1 ? "enabled" : "disabled", - mp_anon != 0 ? "enabled" : "disabled"); + mp_alloc_to_str(mp_alloc_type)); if (retry_enabled) printf("TX retry num: %u, delay between TX retries: %uus\n", diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index 9220e1c1b..b4016668c 100644 --- a/app/test-pmd/parameters.c +++ b/app/test-pmd/parameters.c @@ -190,6 +190,11 @@ usage(char* progname) printf(" --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n"); printf(" --mlockall: lock all memory\n"); printf(" --no-mlockall: do not lock all memory\n"); + printf(" --mp-alloc : mempool allocation method.\n" + " native: use regular DPDK memory to create and populate mempool\n" + " anon: use regular DPDK memory to create and anonymous memory to populate mempool\n" + " xmem: use anonymous memory to create and populate mempool\n" + " xmemhuge: use anonymous hugepage memory to create and populate mempool\n"); } #ifdef RTE_LIBRTE_CMDLINE @@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv) { "vxlan-gpe-port", 1, 0, 0 }, { "mlockall", 0, 0, 0 }, { "no-mlockall", 0, 0, 0 }, + { "mp-alloc", 1, 0, 0 }, { 0, 0, 0, 0 }, }; @@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv) if (!strcmp(lgopts[opt_idx].name, "numa")) numa_support = 1; if (!strcmp(lgopts[opt_idx].name, "mp-anon")) { - mp_anon = 1; + mp_alloc_type = MP_ALLOC_ANON; + } + if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) { + if (!strcmp(optarg, "native")) + mp_alloc_type = MP_ALLOC_NATIVE; + else if (!strcmp(optarg, "anon")) + mp_alloc_type = MP_ALLOC_ANON; + else if (!strcmp(optarg, "xmem")) + mp_alloc_type = MP_ALLOC_XMEM; + else if (!strcmp(optarg, "xmemhuge")) + mp_alloc_type = MP_ALLOC_XMEM_HUGE; + else + rte_exit(EXIT_FAILURE, + "mp-alloc %s invalid - must be: " + "native, anon or xmem\n", + optarg); } if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) { if (parse_portnuma_config(optarg)) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 001f0e552..7c5f7dd0a 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -60,9 +61,18 @@ #ifdef RTE_LIBRTE_LATENCY_STATS #include #endif +#include #include "testpmd.h" +#ifndef MAP_HUGE_SHIFT +#define HUGE_SHIFT 26 +#else +#define HUGE_SHIFT MAP_HUGE_SHIFT +#endif + +#define EXTMEM_HEAP_NAME "extmem" + uint16_t verbose_level = 0; /**< Silent by default. */ int testpmd_logtype; /**< Log type for testpmd logs */ @@ -88,9 +98,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */ uint8_t socket_num = UMA_NO_CONFIG; /* - * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs. + * Select mempool allocation type: + * - native: use regular DPDK memory + * - anon: use regular DPDK memory to create mempool, but populate using + * anonymous memory (may not be IOVA-contiguous) + * - xmem: use externally allocated hugepage memory */ -uint8_t mp_anon = 0; +uint8_t mp_alloc_type = MP_ALLOC_NATIVE; /* * Store specified sockets on which memory pool to be used by ports @@ -527,6 +541,255 @@ set_def_fwd_config(void) set_default_fwd_ports_config(); } +/* extremely pessimistic estimation of memory required to create a mempool */ +static int +calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out) +{ + unsigned int n_pages, mbuf_per_pg, leftover; + uint64_t total_mem, mbuf_mem, obj_sz; + + /* there is no good way to predict how much space the mempool will + * occupy because it will allocate chunks on the fly, and some of those + * will come from default DPDK memory while some will come from our + * external memory, so just assume 32MB will be enough for everyone. + */ + uint64_t hdr_mem = 32 << 20; + + /* account for possible non-contiguousness */ + obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL); + if (obj_sz > pgsz) { + TESTPMD_LOG(ERR, "Object size is bigger than page size\n"); + return -1; + } + + mbuf_per_pg = pgsz / obj_sz; + leftover = (nb_mbufs % mbuf_per_pg) > 0; + n_pages = (nb_mbufs / mbuf_per_pg) + leftover; + + mbuf_mem = n_pages * pgsz; + + total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz); + + if (total_mem > SIZE_MAX) { + TESTPMD_LOG(ERR, "Memory size too big\n"); + return -1; + } + *out = (size_t)total_mem; + + return 0; +} + +static inline uint32_t +bsf64(uint64_t v) +{ + return (uint32_t)__builtin_ctzll(v); +} + +static inline uint32_t +log2_u64(uint64_t v) +{ + if (v == 0) + return 0; + v = rte_align64pow2(v); + return bsf64(v); +} + +static int +pagesz_flags(uint64_t page_sz) +{ + /* as per mmap() manpage, all page sizes are log2 of page size + * shifted by MAP_HUGE_SHIFT + */ + int log2 = log2_u64(page_sz); + return log2 << HUGE_SHIFT; +} + +static void * +alloc_mem(size_t memsz, size_t pgsz, bool huge) +{ + void *addr; + int flags; + + /* allocate anonymous hugepages */ + flags = MAP_ANONYMOUS | MAP_PRIVATE; + if (huge) + flags |= MAP_HUGETLB | pagesz_flags(pgsz); + + addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0); + if (addr == MAP_FAILED) + return NULL; + + return addr; +} + +struct extmem_param { + void *addr; + size_t len; + size_t pgsz; + rte_iova_t *iova_table; + unsigned int iova_table_len; +}; + +static int +create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param, + bool huge) +{ + uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */ + RTE_PGSIZE_16M, RTE_PGSIZE_16G}; /* POWER */ + unsigned int n_pages, cur_page, pgsz_idx; + size_t mem_sz, offset, cur_pgsz; + bool vfio_supported = true; + rte_iova_t *iovas = NULL; + void *addr; + int ret; + + for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) { + /* skip anything that is too big */ + if (pgsizes[pgsz_idx] > SIZE_MAX) + continue; + + cur_pgsz = pgsizes[pgsz_idx]; + + /* if we were told not to allocate hugepages, override */ + if (!huge) + cur_pgsz = sysconf(_SC_PAGESIZE); + + ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz); + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot calculate memory size\n"); + return -1; + } + + /* allocate our memory */ + addr = alloc_mem(mem_sz, cur_pgsz, huge); + + /* if we couldn't allocate memory with a specified page size, + * that doesn't mean we can't do it with other page sizes, so + * try another one. + */ + if (addr == NULL) + continue; + + /* store IOVA addresses for every page in this memory area */ + n_pages = mem_sz / cur_pgsz; + + iovas = malloc(sizeof(*iovas) * n_pages); + + if (iovas == NULL) { + TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n"); + goto fail; + } + + /* populate IOVA table */ + for (cur_page = 0; cur_page < n_pages; cur_page++) { + rte_iova_t iova; + void *cur; + + offset = cur_pgsz * cur_page; + cur = RTE_PTR_ADD(addr, offset); + + /* touch the page before finding its IOVA */ + *(volatile char *)cur = *(volatile char *)cur; + + iova = (uintptr_t)rte_mem_virt2iova(cur); + + iovas[cur_page] = iova; + + if (vfio_supported) { + /* map memory for DMA */ + ret = rte_vfio_dma_map((uintptr_t)cur, + iova, cur_pgsz); + if (ret < 0) { + /* + * ENODEV means VFIO is not initialized + * ENOTSUP means current IOMMU mode + * doesn't support mapping + * both cases are not an error + */ + if (rte_errno == ENOTSUP || + rte_errno == ENODEV) + /* VFIO is unsupported, don't + * try again. + */ + vfio_supported = false; + else + /* this is an actual error */ + goto fail; + } + } + } + /* lock memory if it's not huge pages */ + if (!huge) + mlock(addr, mem_sz); + + break; + } + /* if we couldn't allocate anything */ + if (iovas == NULL) + return -1; + + param->addr = addr; + param->len = mem_sz; + param->pgsz = cur_pgsz; + param->iova_table = iovas; + param->iova_table_len = n_pages; + + return 0; +fail: + if (iovas) + free(iovas); + if (addr) + munmap(addr, mem_sz); + + return -1; +} + +static int +setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge) +{ + struct extmem_param param = {}; + int socket_id, ret; + + /* check if our heap exists */ + socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME); + if (socket_id < 0) { + /* create our heap */ + ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME); + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot create heap\n"); + return -1; + } + } + + + ret = create_extmem(nb_mbufs, mbuf_sz, ¶m, huge); + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot create memory area\n"); + return -1; + } + + /* we now have a valid memory area, so add it to heap */ + ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME, + param.addr, param.len, param.iova_table, + param.iova_table_len, param.pgsz); + + /* not needed any more */ + free(param.iova_table); + + if (ret < 0) { + TESTPMD_LOG(ERR, "Cannot add memory to heap\n"); + munmap(param.addr, param.len); + return -1; + } + + /* success */ + + TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n", + param.len >> 20); + + return 0; +} + /* * Configuration initialisation done once at init time. */ @@ -545,27 +808,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf, "create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n", pool_name, nb_mbuf, mbuf_seg_size, socket_id); - if (mp_anon != 0) { - rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf, - mb_size, (unsigned) mb_mempool_cache, - sizeof(struct rte_pktmbuf_pool_private), - socket_id, 0); - if (rte_mp == NULL) - goto err; + switch (mp_alloc_type) { + case MP_ALLOC_NATIVE: + { + /* wrapper to rte_mempool_create() */ + TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n", + rte_mbuf_best_mempool_ops()); + rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf, + mb_mempool_cache, 0, mbuf_seg_size, socket_id); + break; + } + case MP_ALLOC_ANON: + { + rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf, + mb_size, (unsigned int) mb_mempool_cache, + sizeof(struct rte_pktmbuf_pool_private), + socket_id, 0); + if (rte_mp == NULL) + goto err; + + if (rte_mempool_populate_anon(rte_mp) == 0) { + rte_mempool_free(rte_mp); + rte_mp = NULL; + goto err; + } + rte_pktmbuf_pool_init(rte_mp, NULL); + rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL); + break; + } + case MP_ALLOC_XMEM: + case MP_ALLOC_XMEM_HUGE: + { + int heap_socket; + bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE; - if (rte_mempool_populate_anon(rte_mp) == 0) { - rte_mempool_free(rte_mp); - rte_mp = NULL; - goto err; + if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0) + rte_exit(EXIT_FAILURE, "Could not create external memory\n"); + + heap_socket = + rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME); + if (heap_socket < 0) + rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n"); + + TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n", + rte_mbuf_best_mempool_ops()); + rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf, + mb_mempool_cache, 0, mbuf_seg_size, + heap_socket); + break; + } + default: + { + rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n"); } - rte_pktmbuf_pool_init(rte_mp, NULL); - rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL); - } else { - /* wrapper to rte_mempool_create() */ - TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n", - rte_mbuf_best_mempool_ops()); - rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf, - mb_mempool_cache, 0, mbuf_seg_size, socket_id); } err: diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index a1f661472..65e0cec90 100644 --- a/app/test-pmd/testpmd.h +++ b/app/test-pmd/testpmd.h @@ -69,6 +69,16 @@ enum { PORT_TOPOLOGY_LOOP, }; +enum { + MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */ + MP_ALLOC_ANON, + /**< allocate mempool natively, but populate using anonymous memory */ + MP_ALLOC_XMEM, + /**< allocate and populate mempool using anonymous memory */ + MP_ALLOC_XMEM_HUGE + /**< allocate and populate mempool using anonymous hugepage memory */ +}; + #ifdef RTE_TEST_PMD_RECORD_BURST_STATS /** * The data structure associated with RX and TX packet burst statistics @@ -304,7 +314,8 @@ extern uint8_t numa_support; /**< set by "--numa" parameter */ extern uint16_t port_topology; /**< set by "--port-topology" parameter */ extern uint8_t no_flush_rx; /**`` + + Select mempool allocation mode: + + * native: create and populate mempool using native DPDK memory + * anon: create mempool using native DPDK memory, but populate using + anonymous memory + * xmem: create and populate mempool using externally and anonymously + allocated area + * xmemhuge: create and populate mempool using externally and anonymously + allocated hugepage area