From patchwork Fri Nov 9 11:42:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gavin Hu X-Patchwork-Id: 47965 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 268BD4CBD; Fri, 9 Nov 2018 12:43:07 +0100 (CET) Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by dpdk.org (Postfix) with ESMTP id CC2254C9C; Fri, 9 Nov 2018 12:43:04 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 213F615AB; Fri, 9 Nov 2018 03:43:04 -0800 (PST) Received: from net-arm-c2400.shanghai.arm.com (net-arm-c2400.shanghai.arm.com [10.169.40.108]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5AA553F718; Fri, 9 Nov 2018 03:43:02 -0800 (PST) From: Gavin Hu To: dev@dpdk.org Cc: thomas@monjalon.net, stephen@networkplumber.org, olivier.matz@6wind.com, chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, jerin.jacob@caviumnetworks.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, stable@dpdk.org Date: Fri, 9 Nov 2018 19:42:46 +0800 Message-Id: <1541763767-7399-2-git-send-email-gavin.hu@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1541763767-7399-1-git-send-email-gavin.hu@arm.com> References: <1541763767-7399-1-git-send-email-gavin.hu@arm.com> Subject: [dpdk-dev] [PATCH v1 1/2] ring: keep the deterministic order allowing retry to work X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Use case scenario: 1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some reasons (running out of cpu time, preempted,...) 2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head forward. 3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail beyond the prod.head read by thread 1. 4) Thread 1 is re-scheduled. It reads cons.tail. cpu1(producer) cpu2(producer) cpu3(consumer) load r->prod.head ^ load r->prod.head | load r->cons.tail | store r->prod.head(+n) stalled <-- enqueue -----> | store r->prod.tail(+n) | load r->cons.head | load r->prod.tail | store r->cons.head(+n) | <...dequeue.....> v store r->cons.tail(+n) load r->cons.tail For thread 1, the __atomic_compare_exchange_n detects the outdated prod.head and retry the flow with the new one. This retry flow works ok on strong ordering platform(eg:x86). But for weak ordering platforms(arm, ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new but cons.tail becomes too old, the retry flow, based on the detection of outdated head, does not trigger as expected, thus the outdate cons.tail causes wrong free_entries. Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries. The fix is to keep the deterministic order of two loads allowing the retry to work. Run the ring perf test on the following testbed: HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc gcc: 8.1.0 $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \ --socket-mem=1024 -- -i Without the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.64 MP/MC bulk enq/dequeue (size: 8): 9.58 SP/SC bulk enq/dequeue (size: 32): 1.98 MP/MC bulk enq/dequeue (size: 32): 2.30 With the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.75 MP/MC bulk enq/dequeue (size: 8): 10.18 SP/SC bulk enq/dequeue (size: 32): 1.80 MP/MC bulk enq/dequeue (size: 32): 2.34 The results showed the thread fence degrade the performance slightly, but it is required for correctness. Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu Reviewed-by: Honnappa Nagarahalli Reviewed-by: Steve Capper Reviewed-by: Ola Liljedahl --- lib/librte_ring/rte_ring_c11_mem.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h index 7bc74a4..dc49a99 100644 --- a/lib/librte_ring/rte_ring_c11_mem.h +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -66,6 +66,9 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp, /* Reset n to the initial burst count */ n = max; + /* Ensure the head is read before tail */ + __atomic_thread_fence(__ATOMIC_ACQUIRE); + /* load-acquire synchronize with store-release of ht->tail * in update_tail. */ @@ -139,6 +142,9 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc, /* Restore n as it may change every loop */ n = max; + /* Ensure the head is read before tail */ + __atomic_thread_fence(__ATOMIC_ACQUIRE); + /* this load-acquire synchronize with store-release of ht->tail * in update_tail. */ From patchwork Fri Nov 9 11:42:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gavin Hu X-Patchwork-Id: 47966 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 2AB954F9C; Fri, 9 Nov 2018 12:43:09 +0100 (CET) Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by dpdk.org (Postfix) with ESMTP id CCDFF4CAB; Fri, 9 Nov 2018 12:43:06 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2181B80D; Fri, 9 Nov 2018 03:43:06 -0800 (PST) Received: from net-arm-c2400.shanghai.arm.com (net-arm-c2400.shanghai.arm.com [10.169.40.108]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5EA5D3F718; Fri, 9 Nov 2018 03:43:04 -0800 (PST) From: Gavin Hu To: dev@dpdk.org Cc: thomas@monjalon.net, stephen@networkplumber.org, olivier.matz@6wind.com, chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, jerin.jacob@caviumnetworks.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, stable@dpdk.org Date: Fri, 9 Nov 2018 19:42:47 +0800 Message-Id: <1541763767-7399-3-git-send-email-gavin.hu@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1541763767-7399-1-git-send-email-gavin.hu@arm.com> References: <1541763767-7399-1-git-send-email-gavin.hu@arm.com> Subject: [dpdk-dev] [PATCH v1 2/2] ring: relaxed ordering for load and store the head X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When calling __atomic_compare_exchange_n, use relaxed ordering for the success case, as multiple producers/consumers do not release updates to each other so no need for acquire or release ordering. Because the thread fence in place, ordering for the first iteration can be relaxed. Run the ring perf test on the following testbed: HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core,4 threads/core,2.5GHz OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc gcc: 8.1.0 $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \ --socket-mem=1024 -- -i Without the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.75 MP/MC bulk enq/dequeue (size: 8): 10.18 SP/SC bulk enq/dequeue (size: 32): 1.80 MP/MC bulk enq/dequeue (size: 32): 2.34 With the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.59 MP/MC bulk enq/dequeue (size: 8): 10.54 SP/SC bulk enq/dequeue (size: 32): 1.73 MP/MC bulk enq/dequeue (size: 32): 2.38 No significant improvement, nor regression was seen,as the optimisation is not at the critical path. Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu Reviewed-by: Honnappa Nagarahalli Reviewed-by: Steve Capper Reviewed-by: Ola Liljedahl --- lib/librte_ring/rte_ring_c11_mem.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h index dc49a99..0fb73a3 100644 --- a/lib/librte_ring/rte_ring_c11_mem.h +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -61,7 +61,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp, unsigned int max = n; int success; - *old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE); + *old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED); do { /* Reset n to the initial burst count */ n = max; @@ -97,7 +97,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp, /* on failure, *old_head is updated */ success = __atomic_compare_exchange_n(&r->prod.head, old_head, *new_head, - 0, __ATOMIC_ACQUIRE, + 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } while (unlikely(success == 0)); return n; @@ -137,7 +137,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc, int success; /* move cons.head atomically */ - *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE); + *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED); do { /* Restore n as it may change every loop */ n = max; @@ -172,7 +172,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc, /* on failure, *old_head will be updated */ success = __atomic_compare_exchange_n(&r->cons.head, old_head, *new_head, - 0, __ATOMIC_ACQUIRE, + 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } while (unlikely(success == 0)); return n;