From: Morten Brørup
To: olivier.matz@6wind.com, andrew.rybchenko@oktetlabs.ru
Cc: dev@dpdk.org, Morten Brørup
Subject: [PATCH] mempool: optimize get objects with constant n
Date: Tue, 11 Apr 2023 08:48:45 +0200
Message-Id: <20230411064845.37713-1-mb@smartsharesystems.com>

When getting objects from the mempool, the number of objects to get is
often constant at build time.

This patch adds another code path for this case, so the compiler can
optimize more, e.g. unroll the copy loop when the entire request is
satisfied from the cache.

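For illustration only, the idea can be sketched outside DPDK as follows. This is a minimal sketch, not the code in rte_mempool.h: struct toy_cache, TOY_CACHE_SIZE and toy_cache_get() are hypothetical names, and the return value (number of objects still missing) is an assumption standing in for the fallback to the mempool driver. It relies on the GCC/Clang builtin __builtin_constant_p() and on the function being inlined, so that a literal n at the call site becomes visible to the compiler.

```c
#include <stddef.h>

#define TOY_CACHE_SIZE 64 /* hypothetical capacity, not a DPDK constant */

/* Toy stand-in for a per-lcore mempool cache: a stack of object pointers. */
struct toy_cache {
	unsigned int len;           /* number of objects currently cached */
	void *objs[TOY_CACHE_SIZE]; /* objs[len - 1] is the hottest object */
};

/*
 * Pop up to n objects from the cache into obj_table, hottest first.
 * Returns the number of objects that could NOT be taken from the cache
 * (0 means the whole request was satisfied).
 */
static inline __attribute__((always_inline)) unsigned int
toy_cache_get(struct toy_cache *cache, void **obj_table, unsigned int n)
{
	void **cache_objs = &cache->objs[cache->len];
	unsigned int index, len;

	if (__builtin_constant_p(n) && n <= cache->len) {
		/*
		 * n is known at build time (after inlining), so this loop
		 * has a fixed trip count that the compiler may unroll.
		 * Without optimization the builtin typically evaluates
		 * to 0 and the generic path below runs instead; the
		 * result is the same either way.
		 */
		cache->len -= n;
		for (index = 0; index < n; index++)
			*obj_table++ = *--cache_objs;
		return 0;
	}

	/* Generic path: take as many hot objects as the cache holds. */
	len = n < cache->len ? n : cache->len;
	cache->len -= len;
	for (index = 0; index < len; index++)
		*obj_table++ = *--cache_objs;

	return n - len;
}
```

A call such as toy_cache_get(&c, out, 4) can take the constant path, while a call whose n is only known at run time always takes the generic path.
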
On an Intel(R) Xeon(R) E5-2620 v4 CPU, and compiled with gcc 9.4.0,
mempool_perf_test with constant n shows an increase in rate_persec by an
average of 17 %, minimum 9.5 %, maximum 24 %.

The code path where the number of objects to get is unknown at build time
remains essentially unchanged.

Signed-off-by: Morten Brørup
Acked-by: Bruce Richardson
---
 lib/mempool/rte_mempool.h | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..ade0100ec7 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1500,15 +1500,33 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	if (unlikely(cache == NULL))
 		goto driver_dequeue;
 
-	/* Use the cache as much as we have to return hot objects first */
-	len = RTE_MIN(remaining, cache->len);
 	cache_objs = &cache->objs[cache->len];
+
+	if (__extension__(__builtin_constant_p(n)) && n <= cache->len) {
+		/*
+		 * The request size is known at build time, and
+		 * the entire request can be satisfied from the cache,
+		 * so let the compiler unroll the fixed length copy loop.
+		 */
+		cache->len -= n;
+		for (index = 0; index < n; index++)
+			*obj_table++ = *--cache_objs;
+
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+		return 0;
+	}
+
+	/* Use the cache as much as we have to return hot objects first */
+	len = __extension__(__builtin_constant_p(n)) ? cache->len :
+			RTE_MIN(remaining, cache->len);
 	cache->len -= len;
 	remaining -= len;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	if (remaining == 0) {
+	if (!__extension__(__builtin_constant_p(n)) && remaining == 0) {
 		/* The entire request is satisfied from the cache. */
 
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);