From patchwork Fri Feb 13 01:38:20 2015
X-Patchwork-Submitter: Cunming Liang
X-Patchwork-Id: 3257
From: Cunming Liang <cunming.liang@intel.com>
To: dev@dpdk.org
Date: Fri, 13 Feb 2015 09:38:20 +0800
Message-Id: <1423791501-1555-19-git-send-email-cunming.liang@intel.com>
In-Reply-To: <1423791501-1555-1-git-send-email-cunming.liang@intel.com>
References: <1423728996-3004-1-git-send-email-cunming.liang@intel.com>
 <1423791501-1555-1-git-send-email-cunming.liang@intel.com>
Subject: [dpdk-dev] [PATCH v6 18/19] ring: add sched_yield to avoid spin forever

Add a sched_yield() syscall if the thread spins for too long waiting
for another thread to finish its operations on the ring. That gives a
pre-empted thread a chance to proceed and finish the ring
enqueue/dequeue operation. The purpose is to reduce contention on the
ring. By ring_perf_test, it shows no additional perf penalty.
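
For readers outside the patch context, the following is a minimal
standalone sketch of the pause/yield backoff pattern this patch
introduces. The names wait_for_tail and PAUSE_REP_COUNT are
illustrative only (not part of the DPDK API), and rte_pause() is
approximated with _mm_pause() so the snippet compiles on its own
on x86:

    #include <sched.h>      /* sched_yield() */
    #include <emmintrin.h>  /* _mm_pause() */
    #include <stdint.h>

    /* Illustrative stand-in for RTE_RING_PAUSE_REP_COUNT;
     * 0 disables yielding entirely. */
    #define PAUSE_REP_COUNT 1000

    /* Spin until *tail reaches `expected`, yielding the CPU after
     * every PAUSE_REP_COUNT pause iterations so that a pre-empted
     * peer thread gets a chance to run and update the tail. */
    static void
    wait_for_tail(volatile const uint32_t *tail, uint32_t expected)
    {
            unsigned rep = 0;

            while (*tail != expected) {
                    _mm_pause();
                    if (PAUSE_REP_COUNT && ++rep == PAUSE_REP_COUNT) {
                            rep = 0;
                            sched_yield();
                    }
            }
    }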
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v6 changes:
  rename RTE_RING_PAUSE_REP to RTE_RING_PAUSE_REP_COUNT
  set default value as '0' in configure file

v5 changes:
  add RTE_RING_PAUSE_REP to config file

v4 changes:
  update and add more comments on sched_yield()

v3 changes:
  new patch adding sched_yield() in rte_ring to avoid long spin

 config/common_bsdapp       |  1 +
 config/common_linuxapp     |  1 +
 lib/librte_ring/rte_ring.h | 31 +++++++++++++++++++++++++++----
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 57bacb8..b9a9eeb 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -234,6 +234,7 @@ CONFIG_RTE_PMD_PACKET_PREFETCH=y
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_SPLIT_PROD_CONS=n
+CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d428f84..abca5ff 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -242,6 +242,7 @@ CONFIG_RTE_PMD_PACKET_PREFETCH=y
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_SPLIT_PROD_CONS=n
+CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 39bacdd..9bc1d5e 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -127,6 +127,11 @@ struct rte_ring_debug_stats {
 #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
 #define RTE_RING_MZ_PREFIX "RG_"
 
+#ifndef RTE_RING_PAUSE_REP_COUNT
+#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after this number of pauses;
+                                    *   no yield if RTE_RING_PAUSE_REP_COUNT is 0. */
+#endif
+
 /**
  * An RTE ring structure.
  *
@@ -410,7 +415,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t cons_tail, free_entries;
 	const unsigned max = n;
 	int success;
-	unsigned i;
+	unsigned i, rep = 0;
 	uint32_t mask = r->prod.mask;
 	int ret;
 
@@ -468,9 +473,18 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->prod.tail != prod_head))
+	while (unlikely(r->prod.tail != prod_head)) {
 		rte_pause();
 
+		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spinning too long
+		 * waiting for other threads to finish. It gives a pre-empted
+		 * thread a chance to proceed and finish the enqueue operation. */
+		if (RTE_RING_PAUSE_REP_COUNT &&
+		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
+			rep = 0;
+			sched_yield();
+		}
+	}
 	r->prod.tail = prod_next;
 	return ret;
 }
@@ -589,7 +603,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_next, entries;
 	const unsigned max = n;
 	int success;
-	unsigned i;
+	unsigned i, rep = 0;
 	uint32_t mask = r->prod.mask;
 
 	/* move cons.head atomically */
@@ -634,9 +648,18 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * If there are other dequeues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->cons.tail != cons_head))
+	while (unlikely(r->cons.tail != cons_head)) {
 		rte_pause();
 
+		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spinning too long
+		 * waiting for other threads to finish. It gives a pre-empted
+		 * thread a chance to proceed and finish the dequeue operation. */
+		if (RTE_RING_PAUSE_REP_COUNT &&
+		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
+			rep = 0;
+			sched_yield();
+		}
+	}
 	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
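
Usage note: with the default of 0 the yield path is disabled. To
enable it, a user would set a non-zero repeat count in the build
config before compiling, for example (the value 1000 is only an
illustration, not a recommendation from this patch):

    CONFIG_RTE_RING_PAUSE_REP_COUNT=1000

Because RTE_RING_PAUSE_REP_COUNT is a compile-time constant, the
`if (RTE_RING_PAUSE_REP_COUNT && ...)` test lets the compiler drop
the whole branch when the count is 0, which helps explain why
ring_perf_test shows no penalty in the default configuration.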