From patchwork Thu Dec 27 04:13:44 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gavin Hu
X-Patchwork-Id: 49281
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 7E0723421;
 Thu, 27 Dec 2018 05:14:15 +0100 (CET)
Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70])
 by dpdk.org (Postfix) with ESMTP id 53E09325F;
 Thu, 27 Dec 2018 05:14:14 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A242C15BF;
 Wed, 26 Dec 2018 20:14:13 -0800 (PST)
Received: from net-debian.shanghai.arm.com (net-debian.shanghai.arm.com
 [10.169.36.53])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B8FCC3F5AF;
 Wed, 26 Dec 2018 20:14:11 -0800 (PST)
From: Gavin Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
 bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
 Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
 david.marchand@redhat.com, nd@arm.com, Gavin Hu, stable@dpdk.org
Date: Thu, 27 Dec 2018 12:13:44 +0800
Message-Id: <20181227041349.3058-2-gavin.hu@arm.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20181227041349.3058-1-gavin.hu@arm.com>
References: <20181227041349.3058-1-gavin.hu@arm.com>
Subject: [dpdk-dev] [PATCH v3 1/6] eal: fix clang compilation error on x86
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

When CONFIG_RTE_FORCE_INTRINSICS is enabled for x86, compiling with
clang fails with:
	include/generic/rte_atomic.h:215:9: error:
		implicit declaration of function '__atomic_exchange_2'
		is invalid in C99
	include/generic/rte_atomic.h:494:9: error:
		implicit declaration of function '__atomic_exchange_4'
		is invalid in C99
	include/generic/rte_atomic.h:772:9: error:
		implicit declaration of function '__atomic_exchange_8'
		is invalid in C99

Use the generic __atomic_exchange_n instead of the size-specific
__atomic_exchange_(2/4/8) whenever the toolchain is clang, not only on
ARM64.
For more information, please refer to:
http://mails.dpdk.org/archives/dev/2018-April/096776.html

Fixes: 7bdccb93078e ("eal: fix ARM build with clang")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu
Acked-by: Jerin Jacob
---
 lib/librte_eal/common/include/generic/rte_atomic.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
index b99ba4688..ed5b125b3 100644
--- a/lib/librte_eal/common/include/generic/rte_atomic.h
+++ b/lib/librte_eal/common/include/generic/rte_atomic.h
@@ -212,7 +212,7 @@ rte_atomic16_exchange(volatile uint16_t *dst, uint16_t val);
 static inline uint16_t
 rte_atomic16_exchange(volatile uint16_t *dst, uint16_t val)
 {
-#if defined(RTE_ARCH_ARM64) && defined(RTE_TOOLCHAIN_CLANG)
+#if defined(RTE_TOOLCHAIN_CLANG)
 	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
 #else
 	return __atomic_exchange_2(dst, val, __ATOMIC_SEQ_CST);
@@ -495,7 +495,7 @@ rte_atomic32_exchange(volatile uint32_t *dst, uint32_t val);
 static inline uint32_t
 rte_atomic32_exchange(volatile uint32_t *dst, uint32_t val)
 {
-#if defined(RTE_ARCH_ARM64) && defined(RTE_TOOLCHAIN_CLANG)
+#if defined(RTE_TOOLCHAIN_CLANG)
 	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
 #else
 	return __atomic_exchange_4(dst, val, __ATOMIC_SEQ_CST);
@@ -777,7 +777,7 @@ rte_atomic64_exchange(volatile uint64_t *dst, uint64_t val);
 static inline uint64_t
 rte_atomic64_exchange(volatile uint64_t *dst, uint64_t val)
 {
-#if defined(RTE_ARCH_ARM64) && defined(RTE_TOOLCHAIN_CLANG)
+#if defined(RTE_TOOLCHAIN_CLANG)
 	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
 #else
 	return __atomic_exchange_8(dst, val, __ATOMIC_SEQ_CST);

From patchwork Thu Dec 27 04:13:45 2018
X-Patchwork-Submitter: Gavin Hu
X-Patchwork-Id: 49282
From: Gavin Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
 bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
 Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
 david.marchand@redhat.com, nd@arm.com, Gavin Hu
Date: Thu, 27 Dec 2018 12:13:45 +0800
Message-Id: <20181227041349.3058-3-gavin.hu@arm.com>
In-Reply-To: <20181227041349.3058-1-gavin.hu@arm.com>
References: <20181227041349.3058-1-gavin.hu@arm.com>
Subject: [dpdk-dev] [PATCH v3 2/6] test/spinlock: remove 1us delay for correct benchmarking

This test benchmarks spinlock performance by counting the number of
lock acquire and release operations completed within a specified time.
A typical lock and unlock pair costs tens to hundreds of nanoseconds.
Compared with that, a 1 us delay outside the locked region is far too
long and defeats the purpose of benchmarking lock and unlock
performance.

Signed-off-by: Gavin Hu
Reviewed-by: Ruifeng Wang
Reviewed-by: Joyce Kong
Reviewed-by: Phil Yang
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Ola Liljedahl
Acked-by: Jerin Jacob
---
 test/test/test_spinlock.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/test/test/test_spinlock.c b/test/test/test_spinlock.c
index 73bff128e..6795195ae 100644
--- a/test/test/test_spinlock.c
+++ b/test/test/test_spinlock.c
@@ -120,8 +120,6 @@ load_loop_fn(void *func_param)
 		lcount++;
 		if (use_lock)
 			rte_spinlock_unlock(&lk);
-		/* delay to make lock duty cycle slighlty realistic */
-		rte_delay_us(1);
 		time_diff = rte_get_timer_cycles() - begin;
 	}
 	lock_count[lcore] = lcount;

From patchwork Thu Dec 27 04:13:46 2018
X-Patchwork-Submitter: Gavin Hu
X-Patchwork-Id: 49283
From: Gavin Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
 bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
 Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
 david.marchand@redhat.com, nd@arm.com, Gavin Hu
Date: Thu, 27 Dec 2018 12:13:46 +0800
Message-Id: <20181227041349.3058-4-gavin.hu@arm.com>
In-Reply-To: <20181227041349.3058-1-gavin.hu@arm.com>
References: <20181227041349.3058-1-gavin.hu@arm.com>
Subject: [dpdk-dev] [PATCH v3 3/6] test/spinlock: get timestamp more precisely

To benchmark spinlock performance precisely, use the precise variant of
the timestamp read, rte_rdtsc_precise(), which ensures that the
timestamps are taken at the expected places.

Signed-off-by: Gavin Hu
Reviewed-by: Phil Yang
---
 test/test/test_spinlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/test/test_spinlock.c b/test/test/test_spinlock.c
index 6795195ae..648474833 100644
--- a/test/test/test_spinlock.c
+++ b/test/test/test_spinlock.c
@@ -113,14 +113,14 @@ load_loop_fn(void *func_param)
 	if (lcore != rte_get_master_lcore())
 		while (rte_atomic32_read(&synchro) == 0);
 
-	begin = rte_get_timer_cycles();
+	begin = rte_rdtsc_precise();
 	while (time_diff < hz * TIME_MS / 1000) {
 		if (use_lock)
 			rte_spinlock_lock(&lk);
 		lcount++;
 		if (use_lock)
 			rte_spinlock_unlock(&lk);
-		time_diff = rte_get_timer_cycles() - begin;
+		time_diff = rte_rdtsc_precise() - begin;
 	}
 	lock_count[lcore] = lcount;
 	return 0;

From patchwork Thu Dec 27 04:13:47 2018
X-Patchwork-Submitter: Gavin Hu
X-Patchwork-Id: 49284
From: Gavin Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
 bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
 Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
 david.marchand@redhat.com, nd@arm.com, Gavin Hu
Date: Thu, 27 Dec 2018 12:13:47 +0800
Message-Id: <20181227041349.3058-5-gavin.hu@arm.com>
In-Reply-To: <20181227041349.3058-1-gavin.hu@arm.com>
References: <20181227041349.3058-1-gavin.hu@arm.com>
Subject: [dpdk-dev] [PATCH v3 4/6] test/spinlock: amortize the cost of getting time

Instead of reading a timestamp on every iteration, read it once before
and once after the loop. Amortizing the timestamp overhead over all
iterations gives more precise benchmarking results.

Signed-off-by: Gavin Hu
Reviewed-by: Joyce Kong
---
 test/test/test_spinlock.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/test/test/test_spinlock.c b/test/test/test_spinlock.c
index 648474833..e9839b979 100644
--- a/test/test/test_spinlock.c
+++ b/test/test/test_spinlock.c
@@ -96,9 +96,9 @@ test_spinlock_recursive_per_core(__attribute__((unused)) void *arg)
 }
 
 static rte_spinlock_t lk = RTE_SPINLOCK_INITIALIZER;
-static uint64_t lock_count[RTE_MAX_LCORE] = {0};
+static uint64_t time_count[RTE_MAX_LCORE] = {0};
 
-#define TIME_MS 100
+#define MAX_LOOP 10000
 
 static int
 load_loop_fn(void *func_param)
@@ -114,15 +114,14 @@ load_loop_fn(void *func_param)
 		while (rte_atomic32_read(&synchro) == 0);
 
 	begin = rte_rdtsc_precise();
-	while (time_diff < hz * TIME_MS / 1000) {
+	while (lcount < MAX_LOOP) {
 		if (use_lock)
 			rte_spinlock_lock(&lk);
-		lcount++;
 		if (use_lock)
 			rte_spinlock_unlock(&lk);
-		time_diff = rte_rdtsc_precise() - begin;
 	}
-	lock_count[lcore] = lcount;
+	time_diff = rte_rdtsc_precise() - begin;
+	time_count[lcore] = time_diff * 1000000 / hz;
 	return 0;
 }
 
@@ -136,14 +135,16 @@ test_spinlock_perf(void)
 
 	printf("\nTest with no lock on single core...\n");
 	load_loop_fn(&lock);
-	printf("Core [%u] count = %"PRIu64"\n", lcore, lock_count[lcore]);
-	memset(lock_count, 0, sizeof(lock_count));
+	printf("Core [%u] Cost Time = %"PRIu64" us\n", lcore,
+						time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
 
 	printf("\nTest with lock on single core...\n");
 	lock = 1;
 	load_loop_fn(&lock);
-	printf("Core [%u] count = %"PRIu64"\n", lcore, lock_count[lcore]);
-	memset(lock_count, 0, sizeof(lock_count));
+	printf("Core [%u] Cost Time = %"PRIu64" us\n", lcore,
+						time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
 
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
@@ -158,11 +159,12 @@ test_spinlock_perf(void)
 	rte_eal_mp_wait_lcore();
 
 	RTE_LCORE_FOREACH(i) {
-		printf("Core [%u] count = %"PRIu64"\n", i, lock_count[i]);
-		total += lock_count[i];
+		printf("Core [%u] Cost Time = %"PRIu64" us\n", i,
+						time_count[i]);
+		total += time_count[i];
 	}
 
-	printf("Total count = %"PRIu64"\n", total);
+	printf("Total Cost Time = %"PRIu64" us\n", total);
 
 	return 0;
 }

From patchwork Thu Dec 27 04:13:48 2018
X-Patchwork-Submitter: Gavin Hu
X-Patchwork-Id: 49285
From: Gavin Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
 bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
 Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
 david.marchand@redhat.com, nd@arm.com, Gavin Hu
Date: Thu, 27 Dec 2018 12:13:48 +0800
Message-Id: <20181227041349.3058-6-gavin.hu@arm.com>
In-Reply-To: <20181227041349.3058-1-gavin.hu@arm.com>
References: <20181227041349.3058-1-gavin.hu@arm.com>
Subject: [dpdk-dev] [PATCH v3 5/6] spinlock: reimplement with atomic one-way barrier builtins

The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use the C11 atomic builtins instead, which
generate one-way barriers.

Here is the assembly generated for the __sync_bool_compare_and_swap
builtin; note the trailing 'dmb ish':
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne 0x90f1d4 // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz w4, 0x90f1c0
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset w0, eq // eq = none

Benchmarking showed a 3x performance gain on Cavium ThunderX2, 13% on
Qualcomm Falkor, and 3.7% on the 4-core A72 Marvell macchiatobin.

Here is an example test result on TX2:

*** spinlock_autotest without this patch ***
Core [123] Cost Time = 639822 us
Core [124] Cost Time = 633253 us
Core [125] Cost Time = 646030 us
Core [126] Cost Time = 643189 us
Core [127] Cost Time = 647039 us
Total Cost Time = 95433298 us

*** spinlock_autotest with this patch ***
Core [123] Cost Time = 163615 us
Core [124] Cost Time = 166471 us
Core [125] Cost Time = 189044 us
Core [126] Cost Time = 195745 us
Core [127] Cost Time = 78423 us
Total Cost Time = 27339656 us

Signed-off-by: Gavin Hu
Reviewed-by: Phil Yang
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Ola Liljedahl
Reviewed-by: Steve Capper
---
 lib/librte_eal/common/include/generic/rte_spinlock.h | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index c4c3fc31e..87ae7a4f1 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -61,9 +61,14 @@ rte_spinlock_lock(rte_spinlock_t *sl);
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-	while (__sync_lock_test_and_set(&sl->locked, 1))
-		while(sl->locked)
+	int exp = 0;
+
+	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
 			rte_pause();
+		exp = 0;
+	}
 }
 #endif
 
@@ -80,7 +85,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl);
 static inline void
 rte_spinlock_unlock (rte_spinlock_t *sl)
 {
-	__sync_lock_release(&sl->locked);
+	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
 }
 #endif
 
@@ -99,7 +104,10 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
 static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl)
 {
-	return __sync_lock_test_and_set(&sl->locked,1) == 0;
+	int exp = 0;
+	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+				0, /* disallow spurious failure */
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
 }
 #endif
 
@@ -113,7 +121,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
  */
 static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
 {
-	return sl->locked;
+	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
 }
 
 /**

From patchwork Thu Dec 27 04:13:49 2018
X-Patchwork-Submitter: Gavin Hu
X-Patchwork-Id: 49286
From: Gavin Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, jerinj@marvell.com, hemant.agrawal@nxp.com,
 bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com,
 Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
 david.marchand@redhat.com, nd@arm.com, Joyce Kong
Date: Thu, 27 Dec 2018 12:13:49 +0800
Message-Id: <20181227041349.3058-7-gavin.hu@arm.com>
In-Reply-To: <20181227041349.3058-1-gavin.hu@arm.com>
References: <20181227041349.3058-1-gavin.hu@arm.com>
Subject: [dpdk-dev] [PATCH v3 6/6] spinlock: ticket based to improve fairness

From: Joyce Kong

The old implementation is unfair: some threads may take the lock
aggressively while leaving other threads starving for a long time. As
shown in the following test, within the same period of time, some
threads take the lock many more times than others.

The new implementation gives each waiting thread a ticket, and threads
take the lock one by one in first-come, first-served order. This avoids
long starvation and is more predictable.

*** spinlock_autotest without this patch ***
Core [0] count = 89
Core [1] count = 84
Core [2] count = 94
...
Core [208] count = 171
Core [209] count = 152
Core [210] count = 161
Core [211] count = 187

*** spinlock_autotest with this patch ***
Core [0] count = 534
Core [1] count = 533
Core [2] count = 542
...
Core [208] count = 554
Core [209] count = 556
Core [210] count = 555
Core [211] count = 551

Overall spinlock fairness improved on ThunderX2.

Signed-off-by: Joyce Kong
---
 .../common/include/arch/ppc_64/rte_spinlock.h      |  5 ++
 .../common/include/arch/x86/rte_spinlock.h         |  6 +++
 .../common/include/generic/rte_spinlock.h          | 53 +++++++++++++---------
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_spinlock.h b/lib/librte_eal/common/include/arch/ppc_64/rte_spinlock.h
index 39815d9ee..9fa904f92 100644
--- a/lib/librte_eal/common/include/arch/ppc_64/rte_spinlock.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_spinlock.h
@@ -65,6 +65,11 @@ rte_spinlock_trylock(rte_spinlock_t *sl)
 	return __sync_lock_test_and_set(&sl->locked, 1) == 0;
 }
 
+static inline int
+rte_spinlock_is_locked(rte_spinlock_t *sl)
+{
+	return sl->locked;
+}
 #endif
 
 static inline int rte_tm_supported(void)
diff --git a/lib/librte_eal/common/include/arch/x86/rte_spinlock.h b/lib/librte_eal/common/include/arch/x86/rte_spinlock.h
index e2e2b2643..db80fa420 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_spinlock.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_spinlock.h
@@ -65,6 +65,12 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
 
 	return lockval == 0;
 }
+
+static inline int
+rte_spinlock_is_locked(rte_spinlock_t *sl)
+{
+	return sl->locked;
+}
 #endif
 
 extern uint8_t rte_rtm_supported;
diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index 87ae7a4f1..607abd400 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -27,8 +27,12 @@
 /**
  * The rte_spinlock_t type.
  */
-typedef struct {
-	volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
+typedef union {
+	volatile int locked; /* lock status 0 = unlocked, 1 = locked */
+	struct {
+		uint16_t current;
+		uint16_t next;
+	} s;
 } rte_spinlock_t;
 
 /**
@@ -45,7 +49,8 @@ typedef struct {
 static inline void
 rte_spinlock_init(rte_spinlock_t *sl)
 {
-	sl->locked = 0;
+	__atomic_store_n(&sl->s.current, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&sl->s.next, 0, __ATOMIC_RELAXED);
 }
 
 /**
@@ -61,14 +66,9 @@ rte_spinlock_lock(rte_spinlock_t *sl);
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-	int exp = 0;
-
-	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
-				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
-		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
-			rte_pause();
-		exp = 0;
-	}
+	uint16_t me = __atomic_fetch_add(&sl->s.next, 1, __ATOMIC_RELAXED);
+	while (__atomic_load_n(&sl->s.current, __ATOMIC_ACQUIRE) != me)
+		rte_pause();
 }
 #endif
 
@@ -79,13 +79,15 @@ rte_spinlock_lock(rte_spinlock_t *sl)
  *   A pointer to the spinlock.
  */
 static inline void
-rte_spinlock_unlock (rte_spinlock_t *sl);
+rte_spinlock_unlock(rte_spinlock_t *sl);
 
 #ifdef RTE_FORCE_INTRINSICS
 static inline void
-rte_spinlock_unlock (rte_spinlock_t *sl)
+rte_spinlock_unlock(rte_spinlock_t *sl)
 {
-	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
+	uint16_t i = __atomic_load_n(&sl->s.current, __ATOMIC_RELAXED);
+	i++;
+	__atomic_store_n(&sl->s.current, i, __ATOMIC_RELAXED);
 }
 #endif
 
@@ -98,16 +100,19 @@ rte_spinlock_unlock (rte_spinlock_t *sl)
  *   1 if the lock is successfully taken; 0 otherwise.
  */
 static inline int
-rte_spinlock_trylock (rte_spinlock_t *sl);
+rte_spinlock_trylock(rte_spinlock_t *sl);
 
 #ifdef RTE_FORCE_INTRINSICS
 static inline int
-rte_spinlock_trylock (rte_spinlock_t *sl)
+rte_spinlock_trylock(rte_spinlock_t *sl)
 {
-	int exp = 0;
-	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
-				0, /* disallow spurious failure */
-				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
+	uint16_t me = __atomic_fetch_add(&sl->s.next, 1, __ATOMIC_RELAXED);
+	while (__atomic_load_n(&sl->s.current, __ATOMIC_RELAXED) != me) {
+		__atomic_sub_fetch(&sl->s.next, 1, __ATOMIC_RELAXED);
+		return 0;
+	}
+
+	return 1;
 }
 #endif
 
@@ -119,10 +124,14 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
  * @return
  *   1 if the lock is currently taken; 0 otherwise.
  */
-static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
+#ifdef RTE_FORCE_INTRINSICS
+static inline int
+rte_spinlock_is_locked(rte_spinlock_t *sl)
 {
-	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
+	return (__atomic_load_n(&sl->s.current, __ATOMIC_RELAXED) !=
+			__atomic_load_n(&sl->s.next, __ATOMIC_RELAXED));
 }
+#endif
 
 /**
  * Test if hardware transactional memory (lock elision) is supported
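
For comparison, here is a standalone sketch of a ticket lock built on
the same GCC __atomic builtins. It is not the code submitted in this
patch: it assumes that unlock should publish the critical section with a
release store (paired with the acquire load in lock), and that trylock
should take a ticket with a single compare-exchange so a failed attempt
never modifies the lock. The type and function names (struct
ticket_lock, ticket_acquire() and so on) are hypothetical:

```c
#include <stdint.h>

/* Hypothetical type for illustration; not the rte_spinlock_t above. */
struct ticket_lock {
	uint16_t current; /* ticket currently being served */
	uint16_t next;    /* next ticket to hand out */
};

static inline void ticket_init(struct ticket_lock *l)
{
	__atomic_store_n(&l->current, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&l->next, 0, __ATOMIC_RELAXED);
}

static inline void ticket_acquire(struct ticket_lock *l)
{
	uint16_t me = __atomic_fetch_add(&l->next, 1, __ATOMIC_RELAXED);

	/* Acquire load pairs with the release store in ticket_release(). */
	while (__atomic_load_n(&l->current, __ATOMIC_ACQUIRE) != me)
		; /* a real implementation would pause here, e.g. rte_pause() */
}

static inline void ticket_release(struct ticket_lock *l)
{
	/* Only the lock holder writes current, so a relaxed load suffices. */
	uint16_t cur = __atomic_load_n(&l->current, __ATOMIC_RELAXED);

	/* Release store makes the critical section visible to the next owner. */
	__atomic_store_n(&l->current, (uint16_t)(cur + 1), __ATOMIC_RELEASE);
}

static inline int ticket_trylock(struct ticket_lock *l)
{
	uint16_t cur = __atomic_load_n(&l->current, __ATOMIC_RELAXED);
	uint16_t expected = cur;

	/*
	 * Take ticket 'cur' only if next == current, i.e. nobody holds or
	 * waits for the lock. On failure nothing is written.
	 */
	return __atomic_compare_exchange_n(&l->next, &expected,
			(uint16_t)(cur + 1), 0,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline int ticket_is_locked(struct ticket_lock *l)
{
	return __atomic_load_n(&l->current, __ATOMIC_RELAXED) !=
	       __atomic_load_n(&l->next, __ATOMIC_RELAXED);
}
```

The compare-exchange in ticket_trylock() avoids handing out a ticket and
then taking it back, which could otherwise interleave with another
thread's fetch-add on next.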