[v3,2/3] test/distributor: replace sync builtins with atomic builtins

Message ID 1554274796-23258-3-git-send-email-phil.yang@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series example and test cases optimizations

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Phil Yang April 3, 2019, 6:59 a.m. UTC
The '__sync' built-in functions are deprecated; use the '__atomic'
built-ins instead. The '__sync' built-ins act as full barriers, while
the '__atomic' built-ins take an explicit memory order and so allow
less restrictive one-way barriers, which helps performance.

Here is the example test result on TX2:
sudo ./arm64-armv8a-linuxapp-gcc/app/test -l 112-139 \
-n 4 --socket-mem=1024,1024 -- -i
RTE>>distributor_perf_autotest

*** distributor_perf_autotest without this patch ***
  

Comments

Honnappa Nagarahalli April 4, 2019, 3:30 p.m. UTC | #1
> 
> The '__sync' built-in functions are deprecated; use the '__atomic'
> built-ins instead. The '__sync' built-ins act as full barriers, while
> the '__atomic' built-ins take an explicit memory order and so allow
> less restrictive one-way barriers, which helps performance.
> 
> Here is the example test result on TX2:
> sudo ./arm64-armv8a-linuxapp-gcc/app/test -l 112-139 \
> -n 4 --socket-mem=1024,1024 -- -i
> RTE>>distributor_perf_autotest
> 
> *** distributor_perf_autotest without this patch ***
> ==== Cache line switch test ===
> Time for 33554432 iterations = 1519202730 ticks
> Ticks per iteration = 45
> 
> *** distributor_perf_autotest with this patch ***
> ==== Cache line switch test ===
> Time for 33554432 iterations = 1251715496 ticks
> Ticks per iteration = 37
> 
> The cache line switch test needs fewer ticks with the patch, a
> performance improvement of about 17%.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> ---
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
  

Patch

==== Cache line switch test ===
Time for 33554432 iterations = 1519202730 ticks
Ticks per iteration = 45

*** distributor_perf_autotest with this patch ***
==== Cache line switch test ===
Time for 33554432 iterations = 1251715496 ticks
Ticks per iteration = 37

The cache line switch test needs fewer ticks with the patch, a
performance improvement of about 17%.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 app/test/test_distributor.c      | 7 ++++---
 app/test/test_distributor_perf.c | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 98919ec..0364637 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -62,7 +62,7 @@  handle_work(void *arg)
 	struct worker_params *wp = arg;
 	struct rte_distributor *db = wp->dist;
 	unsigned int count = 0, num = 0;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 	int i;
 
 	for (i = 0; i < 8; i++)
@@ -270,7 +270,7 @@  handle_work_with_free_mbufs(void *arg)
 	unsigned int count = 0;
 	unsigned int i;
 	unsigned int num = 0;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
@@ -343,7 +343,8 @@  handle_work_for_shutdown_test(void *arg)
 	unsigned int total = 0;
 	unsigned int i;
 	unsigned int returned = 0;
-	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
+			__ATOMIC_RELAXED);
 
 	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index edf1998..89b28f0 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -111,7 +111,7 @@  handle_work(void *arg)
 	unsigned int count = 0;
 	unsigned int num = 0;
 	int i;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
 	for (i = 0; i < 8; i++)