From patchwork Mon Apr 8 03:02:29 2019
X-Patchwork-Submitter: Phil Yang
X-Patchwork-Id: 52377
X-Patchwork-Delegate: thomas@monjalon.net
From: Phil Yang
To: dev@dpdk.org, thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Mon, 8 Apr 2019 11:02:29 +0800
Message-Id: <1554692551-28275-2-git-send-email-phil.yang@arm.com>
In-Reply-To: <1554692551-28275-1-git-send-email-phil.yang@arm.com>
References: <1554692551-28275-1-git-send-email-phil.yang@arm.com>
In-Reply-To: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v4 1/3] packet_ordering: add statistics for each
 worker thread

The current implementation uses the '__sync' built-ins to synchronize
statistics among the worker threads. The '__sync' built-ins are full
barriers, which hurts performance, so add per-worker packet statistics
to remove the synchronization between worker threads.

Since the maximum number of cores can reach 256, disable the per-core
stats print by default and add the --insight-worker option to enable it.
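The core of the change, condensed into a self-contained sketch (based on
the diff below; count_dequeued() is a hypothetical helper, not part of
the patch): each worker owns one cache-line-aligned slot and updates it
with plain additions, and print_stats() aggregates the slots on the
reader side.

	#include <stdint.h>
	#include <rte_common.h>	/* __rte_cache_aligned */
	#include <rte_lcore.h>	/* rte_lcore_id(), RTE_MAX_LCORE */

	/* One counter slot per lcore; the alignment prevents false
	 * sharing between workers running on different cores. */
	struct wkr_stats_per {
		uint64_t deq_pkts;
		uint64_t enq_pkts;
		uint64_t enq_failed_pkts;
	} __rte_cache_aligned;

	static struct wkr_stats_per wkr_stats[RTE_MAX_LCORE];

	/* Hypothetical helper: called only from the owning worker, so a
	 * plain, non-atomic add is safe and no barrier is emitted. */
	static inline void
	count_dequeued(unsigned int burst_size)
	{
		/* was: __sync_fetch_and_add(&app_stats.wkr.dequeue_pkts,
		 *                           burst_size); */
		wkr_stats[rte_lcore_id()].deq_pkts += burst_size;
	}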
For example:

sudo examples/packet_ordering/arm64-armv8a-linuxapp-gcc/packet_ordering \
	-l 112-115 --socket-mem=1024,1024 -n 4 -- -p 0x03 --insight-worker

RX thread stats:
 - Pkts rxd:				226539223
 - Pkts enqd to workers ring:		226539223

Worker thread stats on core [113]:
 - Pkts deqd from workers ring:		77557888
 - Pkts enqd to tx ring:		77557888
 - Pkts enq to tx failed:		0

Worker thread stats on core [114]:
 - Pkts deqd from workers ring:		148981335
 - Pkts enqd to tx ring:		148981335
 - Pkts enq to tx failed:		0

Worker thread stats:
 - Pkts deqd from workers ring:		226539223
 - Pkts enqd to tx ring:		226539223
 - Pkts enq to tx failed:		0

TX stats:
 - Pkts deqd from tx ring:		226539223
 - Ro Pkts transmitted:			226539168
 - Ro Pkts tx failed:			0
 - Pkts transmitted w/o reorder:	0
 - Pkts tx failed w/o reorder:		0

Suggested-by: Honnappa Nagarahalli
Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
---
 doc/guides/sample_app_ug/packet_ordering.rst |  4 ++-
 examples/packet_ordering/main.c              | 50 +++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 7cfcf3f..1c8ee5d 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -43,7 +43,7 @@ The application execution command line is:
 
 .. code-block:: console
 
-    ./test-pipeline [EAL options] -- -p PORTMASK [--disable-reorder]
+    ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]
 
 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
 The first CPU core in the core mask is the master core and would be assigned to
@@ -56,3 +56,5 @@ then the other pair from 2 to 3 and from 3 to 2, having [0,1] and [2,3] pairs.
 The disable-reorder long option does, as its name implies, disable the
 reordering of traffic, which should help evaluate reordering performance
 impact.
+
+The insight-worker long option enables output of the packet statistics of each worker thread.
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 149bfdd..1f1bf37 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -31,6 +31,7 @@
 
 unsigned int portmask;
 unsigned int disable_reorder;
+unsigned int insight_worker;
 volatile uint8_t quit_signal;
 
 static struct rte_mempool *mbuf_pool;
@@ -71,6 +72,14 @@ volatile struct app_stats {
 	} tx __rte_cache_aligned;
 } app_stats;
 
+/* per worker lcore stats */
+struct wkr_stats_per {
+	uint64_t deq_pkts;
+	uint64_t enq_pkts;
+	uint64_t enq_failed_pkts;
+} __rte_cache_aligned;
+
+static struct wkr_stats_per wkr_stats[RTE_MAX_LCORE] = { {0} };
 /**
  * Get the last enabled lcore ID
  *
@@ -152,6 +161,7 @@ parse_args(int argc, char **argv)
 	char *prgname = argv[0];
 	static struct option lgopts[] = {
 		{"disable-reorder", 0, 0, 0},
+		{"insight-worker", 0, 0, 0},
 		{NULL, 0, 0, 0}
 	};
 
@@ -175,6 +185,11 @@ parse_args(int argc, char **argv)
 				printf("reorder disabled\n");
 				disable_reorder = 1;
 			}
+			if (!strcmp(lgopts[option_index].name,
+					"insight-worker")) {
+				printf("print all worker statistics\n");
+				insight_worker = 1;
+			}
 			break;
 		default:
 			print_usage(prgname);
@@ -319,6 +334,11 @@ print_stats(void)
 {
 	uint16_t i;
 	struct rte_eth_stats eth_stats;
+	unsigned int lcore_id, last_lcore_id, master_lcore_id, end_w_lcore_id;
+
+	last_lcore_id = get_last_lcore_id();
+	master_lcore_id = rte_get_master_lcore();
+	end_w_lcore_id = get_previous_lcore_id(last_lcore_id);
 
 	printf("\nRX thread stats:\n");
 	printf(" - Pkts rxd: %"PRIu64"\n",
@@ -326,6 +346,26 @@ print_stats(void)
 	printf(" - Pkts enqd to workers ring: %"PRIu64"\n",
 						app_stats.rx.enqueue_pkts);
 
+	for (lcore_id = 0; lcore_id <= end_w_lcore_id; lcore_id++) {
+		if (insight_worker
+			&& rte_lcore_is_enabled(lcore_id)
+			&& lcore_id != master_lcore_id) {
+			printf("\nWorker thread stats on core [%u]:\n",
+					lcore_id);
+			printf(" - Pkts deqd from workers ring: %"PRIu64"\n",
+					wkr_stats[lcore_id].deq_pkts);
+			printf(" - Pkts enqd to tx ring: %"PRIu64"\n",
+					wkr_stats[lcore_id].enq_pkts);
+			printf(" - Pkts enq to tx failed: %"PRIu64"\n",
+					wkr_stats[lcore_id].enq_failed_pkts);
+		}
+
+		app_stats.wkr.dequeue_pkts += wkr_stats[lcore_id].deq_pkts;
+		app_stats.wkr.enqueue_pkts += wkr_stats[lcore_id].enq_pkts;
+		app_stats.wkr.enqueue_failed_pkts +=
+				wkr_stats[lcore_id].enq_failed_pkts;
+	}
+
 	printf("\nWorker thread stats:\n");
 	printf(" - Pkts deqd from workers ring: %"PRIu64"\n",
 						app_stats.wkr.dequeue_pkts);
@@ -432,13 +472,14 @@ worker_thread(void *args_ptr)
 	struct rte_mbuf *burst_buffer[MAX_PKTS_BURST] = { NULL };
 	struct rte_ring *ring_in, *ring_out;
 	const unsigned xor_val = (nb_ports > 1);
+	unsigned int core_id = rte_lcore_id();
 
 	args = (struct worker_thread_args *) args_ptr;
 	ring_in  = args->ring_in;
 	ring_out = args->ring_out;
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
-							rte_lcore_id());
+							core_id);
 
 	while (!quit_signal) {
 
@@ -448,7 +489,7 @@ worker_thread(void *args_ptr)
 		if (unlikely(burst_size == 0))
 			continue;
 
-		__sync_fetch_and_add(&app_stats.wkr.dequeue_pkts, burst_size);
+		wkr_stats[core_id].deq_pkts += burst_size;
 
 		/* just do some operation on mbuf */
 		for (i = 0; i < burst_size;)
@@ -457,11 +498,10 @@ worker_thread(void *args_ptr)
 		/* enqueue the modified mbufs to workers_to_tx ring */
 		ret = rte_ring_enqueue_burst(ring_out,
 				(void *)burst_buffer, burst_size, NULL);
-		__sync_fetch_and_add(&app_stats.wkr.enqueue_pkts, ret);
+		wkr_stats[core_id].enq_pkts += ret;
 		if (unlikely(ret < burst_size)) {
 			/* Return the mbufs to their respective pool, dropping
			 * packets */
-			__sync_fetch_and_add(&app_stats.wkr.enqueue_failed_pkts,
-					(int)burst_size - ret);
+			wkr_stats[core_id].enq_failed_pkts += burst_size - ret;
 			pktmbuf_free_bulk(&burst_buffer[ret], burst_size - ret);
 		}
 	}

From patchwork Mon Apr 8 03:02:30 2019
X-Patchwork-Submitter: Phil Yang
X-Patchwork-Id: 52379
X-Patchwork-Delegate: thomas@monjalon.net
From: Phil Yang
To: dev@dpdk.org, thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Mon, 8 Apr 2019 11:02:30 +0800
Message-Id: <1554692551-28275-3-git-send-email-phil.yang@arm.com>
In-Reply-To: <1554692551-28275-1-git-send-email-phil.yang@arm.com>
References: <1554692551-28275-1-git-send-email-phil.yang@arm.com>
In-Reply-To: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v4 2/3] test/distributor: replace sync builtins
 with atomic builtins

The '__sync' built-in functions are deprecated; the '__atomic' built-ins
should be used instead. The '__sync' built-ins are full barriers, while
the '__atomic' built-ins offer less restrictive one-way barriers, which
helps performance.

Here is an example test result on TX2:

sudo ./arm64-armv8a-linuxapp-gcc/app/test -l 112-139 \
	-n 4 --socket-mem=1024,1024 -- -i
RTE>>distributor_perf_autotest

*** distributor_perf_autotest without this patch ***
==== Cache line switch test ===
Time for 33554432 iterations = 1519202730 ticks
Ticks per iteration = 45

*** distributor_perf_autotest with this patch ***
==== Cache line switch test ===
Time for 33554432 iterations = 1251715496 ticks
Ticks per iteration = 37

Fewer ticks are needed for the cache line switch test: a 17%
performance improvement.
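The pattern being replaced, reduced to a self-contained sketch
(worker_idx mirrors the test's shared counter; the two wrapper
functions are hypothetical, added here only for contrast): handing out
unique worker IDs needs atomicity but no ordering, so relaxed ordering
suffices and no full barrier is emitted.

	static unsigned int worker_idx;

	/* Deprecated form: implies a full barrier around the add. */
	static unsigned int
	next_id_sync(void)
	{
		return __sync_fetch_and_add(&worker_idx, 1);
	}

	/* '__atomic' form: atomicity without ordering; on arm64 this
	 * avoids the acquire/release semantics the __sync form carries. */
	static unsigned int
	next_id_relaxed(void)
	{
		return __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
	}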
Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
Reviewed-by: Ruifeng Wang
Reviewed-by: Joyce Kong
Reviewed-by: Dharmik Thakkar
Reviewed-by: Honnappa Nagarahalli
---
 app/test/test_distributor.c      | 7 ++++---
 app/test/test_distributor_perf.c | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 98919ec..0364637 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -62,7 +62,7 @@ handle_work(void *arg)
 	struct worker_params *wp = arg;
 	struct rte_distributor *db = wp->dist;
 	unsigned int count = 0, num = 0;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 	int i;
 
 	for (i = 0; i < 8; i++)
@@ -270,7 +270,7 @@ handle_work_with_free_mbufs(void *arg)
 	unsigned int count = 0;
 	unsigned int i;
 	unsigned int num = 0;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
@@ -343,7 +343,8 @@ handle_work_for_shutdown_test(void *arg)
 	unsigned int total = 0;
 	unsigned int i;
 	unsigned int returned = 0;
-	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
+			__ATOMIC_RELAXED);
 
 	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index edf1998..89b28f0 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -111,7 +111,7 @@ handle_work(void *arg)
 	unsigned int count = 0;
 	unsigned int num = 0;
 	int i;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
 	for (i = 0; i < 8; i++)

From patchwork Mon Apr 8 03:02:31 2019
X-Patchwork-Submitter: Phil Yang
X-Patchwork-Id: 52380
X-Patchwork-Delegate: thomas@monjalon.net
From: Phil Yang
To: dev@dpdk.org, thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Mon, 8 Apr 2019 11:02:31 +0800
Message-Id: <1554692551-28275-4-git-send-email-phil.yang@arm.com>
In-Reply-To: <1554692551-28275-1-git-send-email-phil.yang@arm.com>
References: <1554692551-28275-1-git-send-email-phil.yang@arm.com>
In-Reply-To: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v4 3/3] test/ring_perf: replace sync builtins
 with atomic builtins
The '__sync' built-in functions are deprecated; the '__atomic' built-ins
should be used instead. The '__sync' built-ins are full barriers, while
the '__atomic' built-ins offer less restrictive one-way barriers, which
helps performance.

Here is an example test result on TX2:

sudo ./arm64-armv8a-linuxapp-gcc/app/test -c 0x7fffffe \
	-n 4 --socket-mem=1024,0 --file-prefix=~ -- -i
RTE>>ring_perf_autotest

*** ring_perf_autotest without this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.22
MP/MC bulk enq/dequeue (size: 8): 11.50
SP/SC bulk enq/dequeue (size: 32): 1.85
MP/MC bulk enq/dequeue (size: 32): 2.66

*** ring_perf_autotest with this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.13
MP/MC bulk enq/dequeue (size: 8): 9.83
SP/SC bulk enq/dequeue (size: 32): 1.96
MP/MC bulk enq/dequeue (size: 32): 2.30

For the ring performance test, this patch improves the performance of
ring operations by about 11%.

Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
Reviewed-by: Joyce Kong
Reviewed-by: Dharmik Thakkar
---
 app/test/test_ring_perf.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index ebb3939..e851c1a 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -160,7 +160,11 @@ enqueue_bulk(void *p)
 	unsigned i;
 	void *burst[MAX_BURST] = {0};
 
-	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
 		while(lcore_count != 2)
 			rte_pause();
 
@@ -196,7 +200,11 @@ dequeue_bulk(void *p)
 	unsigned i;
 	void *burst[MAX_BURST] = {0};
 
-	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
 		while(lcore_count != 2)
 			rte_pause();
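As a self-contained illustration of the pattern above (condensed from
the two hunks; wait_for_peer() is a hypothetical wrapper, not part of
the patch): both lcores bump a shared counter, and the first arrival
spins until the second one shows up. Only atomicity of the increment
matters for this rendezvous, so on C11-memory-model builds the relaxed
__atomic form avoids the full barrier, while other builds keep the
legacy __sync form.

	#include <rte_pause.h>	/* rte_pause() */

	static volatile unsigned int lcore_count;

	/* Hypothetical wrapper condensed from enqueue_bulk()/dequeue_bulk(). */
	static void
	wait_for_peer(void)
	{
	#ifdef RTE_USE_C11_MEM_MODEL
		if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
	#else
		if (__sync_add_and_fetch(&lcore_count, 1) != 2)
	#endif
			while (lcore_count != 2)
				rte_pause();
	}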