From patchwork Thu Jun 15 20:13:35 2023
X-Patchwork-Submitter: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
X-Patchwork-Id: 128760
X-Patchwork-Delegate: thomas@monjalon.net
From: Wathsala Vithanage <wathsala.vithanage@arm.com>
To: honnappa.nagarahalli@arm.com, konstantin.v.ananyev@yandex.ru,
 thomas@monjalon.net, ruifeng.wang@arm.com
Cc: dev@dpdk.org, nd@arm.com, Wathsala Vithanage <wathsala.vithanage@arm.com>
Subject: [RFC] ring: further performance improvements with C11
Date: Thu, 15 Jun 2023 20:13:35 +0000
Message-Id: <20230615201335.919563-2-wathsala.vithanage@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230615201335.919563-1-wathsala.vithanage@arm.com>
References: <20230615201335.919563-1-wathsala.vithanage@arm.com>
List-Id: DPDK patches and discussions

The following changes improve performance over the current C11-based
ring implementation:

(1) Replace the tail store with RELEASE semantics in
    __rte_ring_update_tail with a RELEASE fence. Replace the loads of
    the tail with ACQUIRE semantics in __rte_ring_move_prod_head and
    __rte_ring_move_cons_head with ACQUIRE fences.

(2) Remove the ACQUIRE fences between the load of the old_head and the
    load of the cons_tail in __rte_ring_move_prod_head and
    __rte_ring_move_cons_head. These two fences are not required for
    the safety of the ring library.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
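Notes: as an illustration of change (1), the sketch below uses a toy
ring with hypothetical names (RING_SIZE, ring_slots, slot_tail,
publish_one -- none of them DPDK API; only the GCC/Clang __atomic
builtins are real). A RELAXED store preceded by a RELEASE fence orders
all earlier memory accesses before that store, so it is at least as
strong as the single store-RELEASE it replaces.

#include <stdint.h>

#define RING_SIZE 256			/* toy ring size, power of two */

static uint32_t ring_slots[RING_SIZE];	/* toy ring storage */
static uint32_t slot_tail;		/* toy producer tail index */

/* Producer side: write an element, then publish the new tail. */
static void
publish_one(uint32_t head, uint32_t val)
{
	/* Plain store of the element itself. */
	ring_slots[head & (RING_SIZE - 1)] = val;

	/*
	 * The RELEASE fence keeps the element store above the tail
	 * store, so the tail store itself can be RELAXED ...
	 */
	__atomic_thread_fence(__ATOMIC_RELEASE);
	__atomic_store_n(&slot_tail, head + 1, __ATOMIC_RELAXED);

	/*
	 * ... where previously the same ordering came from a single
	 * store with RELEASE semantics:
	 *
	 *	__atomic_store_n(&slot_tail, head + 1, __ATOMIC_RELEASE);
	 */
}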
 .mailmap                    |  1 +
 lib/ring/rte_ring_c11_pvt.h | 35 ++++++++++++++++++++---------------
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4018f0fc47..367115d134 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1430,6 +1430,7 @@ Walter Heymans
 Wang Sheng-Hui
 Wangyu (Eric)
 Waterman Cao
+Wathsala Vithanage <wathsala.vithanage@arm.com>
 Weichun Chen
 Wei Dai
 Weifeng Li
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index f895950df4..63fe58ce9e 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -16,6 +16,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 		uint32_t new_val, uint32_t single, uint32_t enqueue)
 {
 	RTE_SET_USED(enqueue);
+	/*
+	 * Updating of ht->tail cannot happen before elements are added to or
+	 * removed from the ring, as it could result in data races between
+	 * producer and consumer threads. Therefore we need a release
+	 * barrier here.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	/*
 	 * If there are other enqueues/dequeues in progress that preceded us,
@@ -24,7 +31,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 	if (!single)
 		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
 
-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
+	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELAXED);
 }
 
 /**
@@ -66,14 +73,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		/* Reset n to the initial burst count */
 		n = max;
 
-		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
-
-		/* load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
-		 */
 		cons_tail = __atomic_load_n(&r->cons.tail,
-					__ATOMIC_ACQUIRE);
+					__ATOMIC_RELAXED);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -100,6 +101,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 					0, __ATOMIC_RELAXED,
 					__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
+	/*
+	 * Ensure that updates to the ring do not rise above the
+	 * load of the new_head in SP and MP cases.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 	return n;
 }
 
@@ -142,14 +148,8 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		/* Restore n as it may change every loop */
 		n = max;
 
-		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
-
-		/* this load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
-		 */
 		prod_tail = __atomic_load_n(&r->prod.tail,
-					__ATOMIC_ACQUIRE);
+					__ATOMIC_RELAXED);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -175,6 +175,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 							0, __ATOMIC_RELAXED,
 							__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
+	/*
+	 * Ensure that updates to the ring do not rise above the
+	 * load of the new_head in SP and MP cases.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 	return n;
 }
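
An analogous sketch for change (2), reusing the toy names above plus a
hypothetical take_one consumer: the per-iteration ACQUIRE load of the
opposing tail becomes RELAXED, and a single ACQUIRE fence after the
head update keeps later ring accesses from being hoisted above it.

/* Consumer side: observe the tail, then read an element. */
static uint32_t
take_one(uint32_t head)
{
	uint32_t tail;

	/*
	 * RELAXED load of the producer tail; the patch argues the
	 * ACQUIRE ordering on this load is not needed for safety.
	 */
	tail = __atomic_load_n(&slot_tail, __ATOMIC_RELAXED);
	if (head == tail)
		return 0;	/* nothing visible to consume yet */

	/* ... the CAS loop that advances the consumer head goes here ... */

	/*
	 * A single ACQUIRE fence keeps the element load below from
	 * rising above the loads of head and tail.
	 */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);

	return ring_slots[head & (RING_SIZE - 1)];
}

The intended benefit is that the fence cost is paid once per call,
after the CAS loop, instead of an ACQUIRE load of the tail on every
retry.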