From patchwork Thu Jun 15 20:13:35 2023
X-Patchwork-Submitter: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
X-Patchwork-Id: 128760
X-Patchwork-Delegate: thomas@monjalon.net
From: Wathsala Vithanage <wathsala.vithanage@arm.com>
To: honnappa.nagarahalli@arm.com, konstantin.v.ananyev@yandex.ru,
 thomas@monjalon.net, ruifeng.wang@arm.com
Cc: dev@dpdk.org, nd@arm.com, Wathsala Vithanage <wathsala.vithanage@arm.com>
Subject: [RFC] ring: further performance improvements with C11
Date: Thu, 15 Jun 2023 20:13:35 +0000
Message-Id: <20230615201335.919563-2-wathsala.vithanage@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230615201335.919563-1-wathsala.vithanage@arm.com>
References: <20230615201335.919563-1-wathsala.vithanage@arm.com>
List-Id: DPDK patches and discussions

The following changes improve performance over the current C11-based
ring implementation:

(1) Replace the tail store with RELEASE semantics in
    __rte_ring_update_tail with a RELEASE fence. Replace the loads of
    the tail with ACQUIRE semantics in __rte_ring_move_prod_head and
    __rte_ring_move_cons_head with ACQUIRE fences.

(2) Remove the ACQUIRE fences between the load of the old_head and the
    load of the cons_tail in __rte_ring_move_prod_head and
    __rte_ring_move_cons_head. These two fences are not required for
    the safety of the ring library.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
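Notes: as an illustration of change (1), the sketch below uses a toy
ring with hypothetical names (RING_SIZE, ring_slots, slot_tail,
publish_one -- none of them DPDK API; only the GCC/Clang __atomic
builtins are real). A RELAXED store preceded by a RELEASE fence orders
all earlier memory accesses before that store, so it is at least as
strong as the single store-RELEASE it replaces.

#include <stdint.h>

#define RING_SIZE 256			/* toy ring size, power of two */

static uint32_t ring_slots[RING_SIZE];	/* toy ring storage */
static uint32_t slot_tail;		/* toy producer tail index */

/* Producer side: write an element, then publish the new tail. */
static void
publish_one(uint32_t head, uint32_t val)
{
	/* Plain store of the element itself. */
	ring_slots[head & (RING_SIZE - 1)] = val;

	/*
	 * The RELEASE fence keeps the element store above the tail
	 * store, so the tail store itself can be RELAXED ...
	 */
	__atomic_thread_fence(__ATOMIC_RELEASE);
	__atomic_store_n(&slot_tail, head + 1, __ATOMIC_RELAXED);

	/*
	 * ... where previously the same ordering came from a single
	 * store with RELEASE semantics:
	 *
	 *	__atomic_store_n(&slot_tail, head + 1, __ATOMIC_RELEASE);
	 */
}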
 .mailmap                    |  1 +
 lib/ring/rte_ring_c11_pvt.h | 35 ++++++++++++++++++++---------------
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4018f0fc47..367115d134 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1430,6 +1430,7 @@ Walter Heymans
 Wang Sheng-Hui
 Wangyu (Eric)
 Waterman Cao
+Wathsala Vithanage <wathsala.vithanage@arm.com>
 Weichun Chen
 Wei Dai
 Weifeng Li
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index f895950df4..63fe58ce9e 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -16,6 +16,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 		uint32_t new_val, uint32_t single, uint32_t enqueue)
 {
 	RTE_SET_USED(enqueue);
+	/*
+	 * Updating of ht->tail cannot happen before elements are added to or
+	 * removed from the ring, as it could result in data races between
+	 * producer and consumer threads. Therefore we need a release
+	 * barrier here.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	/*
 	 * If there are other enqueues/dequeues in progress that preceded us,
@@ -24,7 +31,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 	if (!single)
 		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
 
-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
+	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELAXED);
 }
 
 /**
@@ -66,14 +73,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		/* Reset n to the initial burst count */
 		n = max;
 
-		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
-
-		/* load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
-		 */
 		cons_tail = __atomic_load_n(&r->cons.tail,
-					__ATOMIC_ACQUIRE);
+					__ATOMIC_RELAXED);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -100,6 +101,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 					0, __ATOMIC_RELAXED,
 					__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
+	/*
+	 * Ensure that updates to the ring do not rise above the
+	 * load of the new_head in SP and MP cases.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 	return n;
 }
 
@@ -142,14 +148,8 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		/* Restore n as it may change every loop */
 		n = max;
 
-		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
-
-		/* this load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
-		 */
 		prod_tail = __atomic_load_n(&r->prod.tail,
-					__ATOMIC_ACQUIRE);
+					__ATOMIC_RELAXED);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -175,6 +175,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 							0, __ATOMIC_RELAXED,
 							__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
+	/*
+	 * Ensure that updates to the ring do not rise above the
+	 * load of the new_head in SP and MP cases.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 	return n;
 }
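
An analogous sketch for change (2), reusing the toy names above plus a
hypothetical take_one consumer: the per-iteration ACQUIRE load of the
opposing tail becomes RELAXED, and a single ACQUIRE fence after the
head update keeps later ring accesses from being hoisted above it.

/* Consumer side: observe the tail, then read an element. */
static uint32_t
take_one(uint32_t head)
{
	uint32_t tail;

	/*
	 * RELAXED load of the producer tail; the patch argues the
	 * ACQUIRE ordering on this load is not needed for safety.
	 */
	tail = __atomic_load_n(&slot_tail, __ATOMIC_RELAXED);
	if (head == tail)
		return 0;	/* nothing visible to consume yet */

	/* ... the CAS loop that advances the consumer head goes here ... */

	/*
	 * A single ACQUIRE fence keeps the element load below from
	 * rising above the loads of head and tail.
	 */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);

	return ring_slots[head & (RING_SIZE - 1)];
}

The intended benefit is that the fence cost is paid once per call,
after the CAS loop, instead of an ACQUIRE load of the tail on every
retry.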