From patchwork Fri Sep 25 17:43:34 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steven Lariau
X-Patchwork-Id: 78860
From: Steven Lariau
Cc: dev@dpdk.org, nd@arm.com, Steven Lariau
Date: Fri, 25 Sep 2020 18:43:34 +0100
Message-Id: <20200925174340.10014-1-steven.lariau@arm.com>
In-Reply-To: <20200911152938.8019-1-steven.lariau@arm.com>
References: <20200911152938.8019-1-steven.lariau@arm.com>
Subject: [dpdk-dev] [PATCH v2 0/5] lib/stack: improve lockfree C11 implementation

One implementation of the DPDK stack library is lock-free and is based on
the C11 memory model for atomics. Some of its atomic operations use
stronger memory orders than necessary and can be relaxed. This patch
series relaxes some of these operations to improve the performance of
the stack library.
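
For readers less familiar with C11 memory orders, here is a minimal sketch
of the kind of choices involved. It is not the code in rte_stack_lf_c11.h:
it is a simplified Treiber stack written with <stdatomic.h>, and every
name in it (struct node, struct lf_stack, lf_push, lf_pop) is hypothetical.
It assumes that popped nodes are neither freed nor pushed back while other
threads may still be popping, because it has no ABA protection. It also
treats len as a plain statistics counter, which is an assumption of the
sketch only.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the DPDK structures. */
struct node {
	struct node *next;
	int value;
};

struct lf_stack {
	_Atomic(struct node *) head;
	atomic_size_t len;
};

static void lf_stack_init(struct lf_stack *s)
{
	atomic_init(&s->head, NULL);
	atomic_init(&s->len, 0);
}

static void lf_push(struct lf_stack *s, struct node *n)
{
	/* Relaxed load: push never dereferences the old head. */
	struct node *old = atomic_load_explicit(&s->head,
						memory_order_relaxed);

	do {
		n->next = old;
		/*
		 * Release on success publishes n->next and n->value to
		 * any thread that later acquires the new head. On failure
		 * the CAS stores the current head into 'old', so no
		 * separate reload is needed. A weak CAS is enough because
		 * we retry in a loop anyway.
		 */
	} while (!atomic_compare_exchange_weak_explicit(&s->head, &old, n,
			memory_order_release, memory_order_relaxed));

	/* In this sketch the counter orders nothing, so relaxed suffices. */
	atomic_fetch_add_explicit(&s->len, 1, memory_order_relaxed);
}

static struct node *lf_pop(struct lf_stack *s)
{
	/* Acquire: we are about to read old->next. */
	struct node *old = atomic_load_explicit(&s->head,
						memory_order_acquire);

	while (old != NULL) {
		struct node *next = old->next;

		/*
		 * Acquire on failure as well: the reloaded head is
		 * dereferenced on the next iteration.
		 */
		if (atomic_compare_exchange_weak_explicit(&s->head, &old,
				next, memory_order_acquire,
				memory_order_acquire)) {
			atomic_fetch_sub_explicit(&s->len, 1,
						  memory_order_relaxed);
			return old;
		}
	}
	return NULL;
}
```

Pushing two nodes and popping twice returns them in LIFO order, and a pop
on an empty stack returns NULL. The sketch deliberately keeps acquire on
the pop CAS; whether that ordering can be weakened further depends on
guarantees of the real implementation that this sketch does not model.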

The series was tested on several architectures, both to check that the
implementation is correct and to measure performance. Below are the
results for a few architectures on the multithreaded lock-free stack
test. The cycle count is the average number of cycles per item needed to
perform a bulk push / pop; a negative difference means fewer cycles than
main.

$ sudo ./builddir/app/dpdk-test
RTE>>stack_lf_perf_autotest

Cycles count on ThunderX2, difference compared to main:
   2 cores, bulk size =  8: -15.85%
   2 cores, bulk size = 32: -04.56%
   4 cores, bulk size =  8: -05.00%
   4 cores, bulk size = 32: -04.35%
  16 cores, bulk size =  8: -02.38%
  16 cores, bulk size = 32: -01.88%

Cycles count on N1SDP, difference compared to main:
   2 cores, batch size =  8: +00.77%
   2 cores, batch size = 32: -16.00%

Cycles count on Skylake, difference compared to main:
   2 cores, bulk size =  8: -00.18%
   2 cores, bulk size = 32: -00.95%
   4 cores, bulk size =  8: -01.19%
   4 cores, bulk size = 32: +00.64%
  16 cores, bulk size =  8: +01.20%
  16 cores, bulk size = 32: +00.48%

v2:
 - add comment to explain why pop head CAS relaxed is valid
 - add Fixes information

Steven Lariau (5):
  lib/stack: fix inconsistent weak / strong cas
  lib/stack: remove push acquire fence
  lib/stack: remove redundant orderings for list->len
  lib/stack: reload head when pop fails
  lib/stack: remove pop cas release ordering

 lib/librte_stack/rte_stack_lf_c11.h | 32 +++++++++++++++++++----------
 1 file changed, 21 insertions(+), 11 deletions(-)