From patchwork Tue Dec 19 11:14:48 2017
From: Anatoly Burakov
To: dev@dpdk.org
Cc: andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 keith.wiles@intel.com, benjamin.walker@intel.com,
 bruce.richardson@intel.com, thomas@monjalon.net
Date: Tue, 19 Dec 2017 11:14:48 +0000
Subject: [dpdk-dev] [RFC v2 21/23] mempool: add support for the new memory
 allocation methods

If a user has specified that the zone should have contiguous memory,
use the new _contig allocation APIs instead of the regular ones.
Otherwise, account for the fact that, unless we are in IOVA_AS_VA
mode, we cannot guarantee that the pages will be physically
contiguous, so calculate the memzone size and alignment as if we
were getting the smallest page size available.

Signed-off-by: Anatoly Burakov
---
 lib/librte_mempool/rte_mempool.c | 84 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 75 insertions(+), 9 deletions(-)
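Note (illustration only, not part of the commit): a minimal caller-side
sketch of where the flags checked by rte_mempool_populate_default() come
from, assuming the 17.11-era mempool API. The pool name, counts and sizes
are arbitrary, and make_example_pool() is a hypothetical helper.
MEMPOOL_F_CAPA_PHYS_CONTIG is normally advertised by the pool's ops
capabilities rather than set by the application, so only
MEMPOOL_F_NO_PHYS_CONTIG is shown here.

/* Illustrative sketch only -- not part of this patch. */
#include <stdio.h>
#include <stdbool.h>
#include <rte_errno.h>
#include <rte_mempool.h>

static struct rte_mempool *
make_example_pool(bool objs_need_phys_contig)
{
	unsigned int flags = 0;
	struct rte_mempool *mp;

	/*
	 * If objects are never handed to hardware, physical contiguity is
	 * irrelevant, and rte_mempool_populate_default() can skip the
	 * page-size based sizing described in the commit message.
	 */
	if (!objs_need_phys_contig)
		flags |= MEMPOOL_F_NO_PHYS_CONTIG;

	/* requires rte_eal_init() to have been called already */
	mp = rte_mempool_create("example_pool", 8192, 2048, 256, 0,
			NULL, NULL, NULL, NULL, SOCKET_ID_ANY, flags);
	if (mp == NULL)
		printf("mempool creation failed: %s\n",
				rte_strerror(rte_errno));
	return mp;
}

With MEMPOOL_F_NO_PHYS_CONTIG set, the populate step skips page-size based
sizing entirely; without it (and without a fully contiguous reservation),
the zone is now sized and aligned for the smallest page size that might
back it.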

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d50dba4..4b9ab22 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -127,6 +127,26 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+static size_t
+get_min_page_size(void) {
+	const struct rte_mem_config *mcfg =
+			rte_eal_get_configuration()->mem_config;
+	int i;
+	size_t min_pagesz = SIZE_MAX;
+
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		const struct rte_memseg_list *msl = &mcfg->memsegs[i];
+
+		if (msl->base_va == NULL)
+			continue;
+
+		if (msl->hugepage_sz < min_pagesz)
+			min_pagesz = msl->hugepage_sz;
+	}
+
+	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+}
+
 static void
 mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 {
@@ -568,6 +588,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned mz_id, n;
 	unsigned int mp_flags;
 	int ret;
+	bool force_contig, no_contig;
 
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
@@ -582,10 +603,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	/* update mempool capabilities */
 	mp->flags |= mp_flags;
 
-	if (rte_eal_has_hugepages()) {
-		pg_shift = 0; /* not needed, zone is physically contiguous */
+	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
+	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
+
+	/*
+	 * there are several considerations for page size and page shift here.
+	 *
+	 * if we don't need our mempools to have physically contiguous objects,
+	 * then just set page shift and page size to 0, because the user has
+	 * indicated that there's no need to care about anything.
+	 *
+	 * if we do need contiguous objects, there is also an option to reserve
+	 * the entire mempool memory as one contiguous block of memory, in
+	 * which case the page shift and alignment wouldn't matter as well.
+	 *
+	 * if we require contiguous objects, but not necessarily the entire
+	 * mempool reserved space to be contiguous, then there are two options.
+	 *
+	 * if our IO addresses are virtual, not actual physical (IOVA as VA
+	 * case), then no page shift needed - our memory allocation will give us
+	 * contiguous physical memory as far as the hardware is concerned, so
+	 * act as if we're getting contiguous memory.
+	 *
+	 * if our IO addresses are physical, we may get memory from bigger
+	 * pages, or we might get memory from smaller pages, and how much of it
+	 * we require depends on whether we want bigger or smaller pages.
+	 * However, requesting each and every memory size is too much work, so
+	 * what we'll do instead is walk through the page sizes available, pick
+	 * the smallest one and set up page shift to match that one. We will be
+	 * wasting some space this way, but it's much nicer than looping around
+	 * trying to reserve each and every page size.
+	 */
+
+	if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
 		pg_sz = 0;
+		pg_shift = 0;
 		align = RTE_CACHE_LINE_SIZE;
+	} else if (rte_eal_has_hugepages()) {
+		pg_sz = get_min_page_size();
+		pg_shift = rte_bsf32(pg_sz);
+		align = pg_sz;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
@@ -604,23 +661,32 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
-			mp->socket_id, mz_flags, align);
-		/* not enough memory, retry with the biggest zone we have */
-		if (mz == NULL)
-			mz = rte_memzone_reserve_aligned(mz_name, 0,
+		if (force_contig) {
+			/*
+			 * if contiguous memory for entire mempool memory was
+			 * requested, don't try reserving again if we fail.
+			 */
+			mz = rte_memzone_reserve_aligned_contig(mz_name, size,
+				mp->socket_id, mz_flags, align);
+		} else {
+			mz = rte_memzone_reserve_aligned(mz_name, size,
 				mp->socket_id, mz_flags, align);
+			/* not enough memory, retry with the biggest zone we have */
+			if (mz == NULL)
+				mz = rte_memzone_reserve_aligned(mz_name, 0,
+					mp->socket_id, mz_flags, align);
+		}
 		if (mz == NULL) {
 			ret = -rte_errno;
 			goto fail;
 		}
 
-		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+		if (no_contig)
 			iova = RTE_BAD_IOVA;
 		else
 			iova = mz->iova;
 
-		if (rte_eal_has_hugepages())
+		if (rte_eal_has_hugepages() && force_contig)
 			ret = rte_mempool_populate_iova(mp, mz->addr,
 				iova, mz->len,
 				rte_mempool_memchunk_mz_free,
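
Note (illustration only, not part of the patch): the page size/shift
decision that the comment in rte_mempool_populate_default() walks through,
condensed into a standalone sketch. pick_page_params() is a hypothetical
name, and the min_hugepage_sz argument stands in for the result of the
get_min_page_size() helper this patch adds; the EAL calls are the same ones
the diff relies on.

/* Condensed restatement of the decision above -- not part of this patch. */
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>
#include <rte_common.h>
#include <rte_memory.h>
#include <rte_eal.h>

static void
pick_page_params(bool no_contig, bool force_contig, size_t min_hugepage_sz,
		size_t *pg_sz, unsigned int *pg_shift, size_t *align)
{
	if (no_contig || force_contig ||
			rte_eal_iova_mode() == RTE_IOVA_VA) {
		/* physical layout of the reservation does not matter */
		*pg_sz = 0;
		*pg_shift = 0;
		*align = RTE_CACHE_LINE_SIZE;
	} else if (rte_eal_has_hugepages()) {
		/* assume the worst case: the smallest hugepage we might get,
		 * i.e. what get_min_page_size() in this patch returns */
		*pg_sz = min_hugepage_sz;
		*pg_shift = rte_bsf32(*pg_sz);
		*align = *pg_sz;
	} else {
		/* no hugepages at all: size for regular system pages */
		*pg_sz = (size_t)getpagesize();
		*pg_shift = rte_bsf32(*pg_sz);
		*align = *pg_sz;
	}
}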