From patchwork Fri Oct 11 08:18:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Mattias Rönnblom
X-Patchwork-Id: 145741
X-Patchwork-Delegate: thomas@monjalon.net
From: Mattias Rönnblom
CC: Morten Brørup, Stephen Hemminger, Konstantin Ananyev,
 David Marchand, Jerin Jacob, Luka Jankovic, Mattias Rönnblom,
 Chengwen Feng
Subject: [PATCH v10 3/7] eal: add lcore variable performance test
Date: Fri, 11 Oct 2024 10:18:57 +0200
Message-ID: <20241011081901.816211-4-mattias.ronnblom@ericsson.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241011081901.816211-1-mattias.ronnblom@ericsson.com>
References: <20241010142205.813134-2-mattias.ronnblom@ericsson.com>
 <20241011081901.816211-1-mattias.ronnblom@ericsson.com>
List-Id: DPDK patches and discussions

Add a basic micro benchmark for lcore variables, in an attempt to
ensure that the overhead isn't significantly greater than that of
alternative approaches, in scenarios where the benefits aren't
expected to show up (i.e., when plenty of cache is available compared
to the working set size of the per-lcore data).

Signed-off-by: Mattias Rönnblom
Acked-by: Chengwen Feng
Acked-by: Stephen Hemminger
Acked-by: Morten Brørup

---

PATCH v8:
 * Fix spelling. (Morten Brørup)

PATCH v6:
 * Use floating point math when calculating per-update
   latency. (Morten Brørup)

PATCH v5:
 * Add variant of thread-local storage with initialization
   performed at the time of thread creation to the benchmark
   scenarios. (Morten Brørup)

PATCH v4:
 * Rework the tests to be a little less unrealistic. Instead of a
   single dummy module using a single variable, use a number of
   variables/modules. In this way, differences in cache effects may
   show up.
 * Add RTE_CACHE_GUARD to better mimic the static array
   pattern. (Morten Brørup)
 * Show latencies as TSC cycles.
   (Morten Brørup)
---
 app/test/meson.build           |   1 +
 app/test/test_lcore_var_perf.c | 257 +++++++++++++++++++++++++++++++++
 2 files changed, 258 insertions(+)
 create mode 100644 app/test/test_lcore_var_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 48279522f0..d4e0c59900 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -104,6 +104,7 @@ source_file_deps = {
     'test_kvargs.c': ['kvargs'],
     'test_latencystats.c': ['ethdev', 'latencystats', 'metrics'] + sample_packet_forward_deps,
     'test_lcore_var.c': [],
+    'test_lcore_var_perf.c': [],
     'test_lcores.c': [],
     'test_link_bonding.c': ['ethdev', 'net_bond',
         'net'] + packet_burst_generator_deps + virtual_pmd_deps,
diff --git a/app/test/test_lcore_var_perf.c b/app/test/test_lcore_var_perf.c
new file mode 100644
index 0000000000..2efb8342d1
--- /dev/null
+++ b/app/test/test_lcore_var_perf.c
@@ -0,0 +1,257 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#define MAX_MODS 1024
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include "test.h"
+
+struct mod_lcore_state {
+	uint64_t a;
+	uint64_t b;
+	uint64_t sum;
+};
+
+static void
+mod_init(struct mod_lcore_state *state)
+{
+	state->a = rte_rand();
+	state->b = rte_rand();
+	state->sum = 0;
+}
+
+static __rte_always_inline void
+mod_update(volatile struct mod_lcore_state *state)
+{
+	state->sum += state->a * state->b;
+}
+
+struct __rte_cache_aligned mod_lcore_state_aligned {
+	struct mod_lcore_state mod_state;
+
+	RTE_CACHE_GUARD;
+};
+
+static struct mod_lcore_state_aligned
+sarray_lcore_state[MAX_MODS][RTE_MAX_LCORE];
+
+static void
+sarray_init(void)
+{
+	unsigned int lcore_id = rte_lcore_id();
+	int mod;
+
+	for (mod = 0; mod < MAX_MODS; mod++) {
+		struct mod_lcore_state *mod_state =
+			&sarray_lcore_state[mod][lcore_id].mod_state;
+
+		mod_init(mod_state);
+	}
+}
+
+static __rte_noinline void
+sarray_update(unsigned int mod)
+{
+	unsigned int lcore_id = rte_lcore_id();
+	struct mod_lcore_state *mod_state =
+		&sarray_lcore_state[mod][lcore_id].mod_state;
+
+	mod_update(mod_state);
+}
+
+struct mod_lcore_state_lazy {
+	struct mod_lcore_state mod_state;
+	bool initialized;
+};
+
+/*
+ * Note: it's usually a bad idea to have this much thread-local storage
+ * allocated in a real application, since it will incur a cost on
+ * thread creation and non-lcore thread memory usage.
+ */
+static RTE_DEFINE_PER_LCORE(struct mod_lcore_state_lazy,
+			    tls_lcore_state)[MAX_MODS];
+
+static inline void
+tls_init(struct mod_lcore_state_lazy *state)
+{
+	mod_init(&state->mod_state);
+
+	state->initialized = true;
+}
+
+static __rte_noinline void
+tls_lazy_update(unsigned int mod)
+{
+	struct mod_lcore_state_lazy *state =
+		&RTE_PER_LCORE(tls_lcore_state[mod]);
+
+	/* With thread-local storage, initialization must usually be lazy */
+	if (!state->initialized)
+		tls_init(state);
+
+	mod_update(&state->mod_state);
+}
+
+static __rte_noinline void
+tls_update(unsigned int mod)
+{
+	struct mod_lcore_state_lazy *state =
+		&RTE_PER_LCORE(tls_lcore_state[mod]);
+
+	mod_update(&state->mod_state);
+}
+
+RTE_LCORE_VAR_HANDLE(struct mod_lcore_state, lvar_lcore_state)[MAX_MODS];
+
+static void
+lvar_init(void)
+{
+	unsigned int mod;
+
+	for (mod = 0; mod < MAX_MODS; mod++) {
+		RTE_LCORE_VAR_ALLOC(lvar_lcore_state[mod]);
+
+		struct mod_lcore_state *state =
+			RTE_LCORE_VAR_VALUE(lvar_lcore_state[mod]);
+
+		mod_init(state);
+	}
+}
+
+static __rte_noinline void
+lvar_update(unsigned int mod)
+{
+	struct mod_lcore_state *state =
+		RTE_LCORE_VAR_VALUE(lvar_lcore_state[mod]);
+
+	mod_update(state);
+}
+
+static void
+shuffle(unsigned int *elems, size_t len)
+{
+	size_t i;
+
+	for (i = len - 1; i > 0; i--) {
+		unsigned int other = rte_rand_max(i + 1);
+
+		unsigned int tmp = elems[other];
+		elems[other] = elems[i];
+		elems[i] = tmp;
+	}
+}
+
+#define ITERATIONS UINT64_C(10000000)
+
+static inline double
+benchmark_access(const unsigned int *mods, unsigned int num_mods,
+		 void (*init_fun)(void), void (*update_fun)(unsigned int))
+{
+	unsigned int i;
+	double start;
+	double end;
+	double latency;
+	unsigned int num_mods_mask = num_mods - 1;
+
+	RTE_VERIFY(rte_is_power_of_2(num_mods));
+
+	if (init_fun != NULL)
+		init_fun();
+
+	/* Warm up cache and make sure TLS variables are initialized */
+	for (i = 0; i < num_mods; i++)
+		update_fun(i);
+
+	start = rte_rdtsc();
+
+	for (i = 0; i < ITERATIONS; i++)
+		update_fun(mods[i & num_mods_mask]);
+
+	end = rte_rdtsc();
+
+	latency = (end - start) / (double)ITERATIONS;
+
+	return latency;
+}
+
+static void
+test_lcore_var_access_n(unsigned int num_mods)
+{
+	double sarray_latency;
+	double tls_latency;
+	double lazy_tls_latency;
+	double lvar_latency;
+	unsigned int mods[num_mods];
+	unsigned int i;
+
+	for (i = 0; i < num_mods; i++)
+		mods[i] = i;
+
+	shuffle(mods, num_mods);
+
+	sarray_latency =
+		benchmark_access(mods, num_mods, sarray_init, sarray_update);
+
+	tls_latency =
+		benchmark_access(mods, num_mods, NULL, tls_update);
+
+	lazy_tls_latency =
+		benchmark_access(mods, num_mods, NULL, tls_lazy_update);
+
+	lvar_latency =
+		benchmark_access(mods, num_mods, lvar_init, lvar_update);
+
+	printf("%17u %8.1f %14.1f %15.1f %10.1f\n", num_mods, sarray_latency,
+		tls_latency, lazy_tls_latency, lvar_latency);
+}
+
+/*
+ * The potential performance benefit of lcore variables compared to
+ * the use of statically sized, lcore id-indexed arrays is not
+ * shorter latencies in a scenario with low cache pressure, but rather
+ * fewer cache misses in a real-world scenario, with extensive cache
+ * usage. These tests are a crude simulation of such, using dummy
+ * modules, each with a small, per-lcore state. Note however that
+ * these tests have very little non-lcore/thread local state, which is
+ * unrealistic.
+ */
+
+static int
+test_lcore_var_access(void)
+{
+	unsigned int num_mods = 1;
+
+	printf("- Latencies [TSC cycles/update] -\n");
+	printf("Number of Static Thread-local Thread-local Lcore\n");
+	printf("Modules/Variables Array Storage Storage (Lazy) Variables\n");
+
+	for (num_mods = 1; num_mods <= MAX_MODS; num_mods *= 2)
+		test_lcore_var_access_n(num_mods);
+
+	return TEST_SUCCESS;
+}
+
+static struct unit_test_suite lcore_var_testsuite = {
+	.suite_name = "lcore variable perf autotest",
+	.unit_test_cases = {
+		TEST_CASE(test_lcore_var_access),
+		TEST_CASES_END()
+	},
+};
+
+static int
+test_lcore_var_perf(void)
+{
+	return unit_test_suite_runner(&lcore_var_testsuite);
+}
+
+REGISTER_PERF_TEST(lcore_var_perf_autotest, test_lcore_var_perf);
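
Note for readers (not part of the patch): the access order used by
benchmark_access() rests on two small index tricks, a Fisher-Yates
shuffle of the module indices and a power-of-two mask that wraps the
iteration counter. The following is a minimal standalone sketch of
those two tricks outside of DPDK, under stated assumptions: all
sketch_* names are hypothetical, rand() is a stand-in for
rte_rand_max(), and assert() is a stand-in for RTE_VERIFY().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Fisher-Yates shuffle; rand() is an assumed stand-in for rte_rand_max(). */
static void
sketch_shuffle(unsigned int *elems, size_t len)
{
	size_t i;

	/* Nothing to do for zero or one element (also avoids len - 1 wrap). */
	if (len < 2)
		return;

	for (i = len - 1; i > 0; i--) {
		size_t other = (size_t)rand() % (i + 1);
		unsigned int tmp = elems[other];

		elems[other] = elems[i];
		elems[i] = tmp;
	}
}

/* True if every value in [0, len) occurs exactly once in elems. */
static bool
sketch_is_permutation(const unsigned int *elems, size_t len)
{
	bool *seen = calloc(len, sizeof(*seen));
	bool ok = seen != NULL;
	size_t i;

	for (i = 0; ok && i < len; i++) {
		if (elems[i] >= len || seen[elems[i]])
			ok = false;
		else
			seen[elems[i]] = true;
	}

	free(seen);

	return ok;
}

/* Same result as iter % num_mods, valid only for power-of-two num_mods. */
static unsigned int
sketch_wrap(unsigned int iter, unsigned int num_mods)
{
	assert(num_mods != 0 && (num_mods & (num_mods - 1)) == 0);

	return iter & (num_mods - 1);
}
```

The mask form only matches a modulo for power-of-two counts, which is
consistent with the test stepping num_mods through 1, 2, 4, ... up to
MAX_MODS.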