From patchwork Fri Mar 29 10:56:36 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Phil Yang
X-Patchwork-Id: 51902
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 65FD84C95;
 Fri, 29 Mar 2019 11:56:57 +0100 (CET)
Received: from foss.arm.com (foss.arm.com [217.140.101.70])
 by dpdk.org (Postfix) with ESMTP id BE6684C8D
 for ; Fri, 29 Mar 2019 11:56:53 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 072D515BF;
 Fri, 29 Mar 2019 03:56:53 -0700 (PDT)
Received: from phil-VirtualBox.shanghai.arm.com (unknown [10.169.106.173])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 7A60F3F575;
 Fri, 29 Mar 2019 03:56:51 -0700 (PDT)
From: Phil Yang
To: dev@dpdk.org,
 thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Fri, 29 Mar 2019 18:56:36 +0800
Message-Id: <1553856998-25394-2-git-send-email-phil.yang@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1553856998-25394-1-git-send-email-phil.yang@arm.com>
References: <1553856998-25394-1-git-send-email-phil.yang@arm.com>
In-Reply-To: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v2 1/3] packet_ordering: add statistics for each
 worker thread
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

The current implementation uses '__sync' built-ins to synchronize
statistics within worker threads.
'__sync' built-in functions are full barriers, which hurts performance,
so add per-worker packet statistics instead. Enable them with the
--insight-worker option.

For example:
sudo examples/packet_ordering/arm64-armv8a-linuxapp-gcc/packet_ordering \
-l 112-115 --socket-mem=1024,1024 -n 4 -- -p 0x03 --insight-worker

RX thread stats:
 - Pkts rxd: 226539223
 - Pkts enqd to workers ring: 226539223

Worker thread stats on core [113]:
 - Pkts deqd from workers ring: 77557888
 - Pkts enqd to tx ring: 77557888
 - Pkts enq to tx failed: 0

Worker thread stats on core [114]:
 - Pkts deqd from workers ring: 148981335
 - Pkts enqd to tx ring: 148981335
 - Pkts enq to tx failed: 0

Worker thread stats:
 - Pkts deqd from workers ring: 226539223
 - Pkts enqd to tx ring: 226539223
 - Pkts enq to tx failed: 0

TX stats:
 - Pkts deqd from tx ring: 226539223
 - Ro Pkts transmitted: 226539168
 - Ro Pkts tx failed: 0
 - Pkts transmitted w/o reorder: 0
 - Pkts tx failed w/o reorder: 0

Suggested-by: Honnappa Nagarahalli
Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
---
 doc/guides/sample_app_ug/packet_ordering.rst |  4 ++-
 examples/packet_ordering/main.c              | 50 +++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 7cfcf3f..9c5ceea 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -43,7 +43,7 @@ The application execution command line is:

 .. code-block:: console

-    ./test-pipeline [EAL options] -- -p PORTMASK [--disable-reorder]
+    ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]

 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
 The first CPU core in the core mask is the master core and would be assigned to
@@ -56,3 +56,5 @@ then the other pair from 2 to 3 and from 3 to 2, having [0,1] and [2,3] pairs.
 The disable-reorder long option does, as its name implies, disable the reordering
 of traffic, which should help evaluate reordering performance impact.
+
+The insight-worker long option enables output of the packet statistics of each worker thread.
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 149bfdd..8145074 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -31,6 +31,7 @@

 unsigned int portmask;
 unsigned int disable_reorder;
+unsigned int insight_worker;
 volatile uint8_t quit_signal;

 static struct rte_mempool *mbuf_pool;
@@ -71,6 +72,14 @@ volatile struct app_stats {
 	} tx __rte_cache_aligned;
 } app_stats;

+/* per worker lcore stats */
+struct wkr_stats_per {
+	uint64_t deq_pkts;
+	uint64_t enq_pkts;
+	uint64_t enq_failed_pkts;
+} __rte_cache_aligned;
+
+static struct wkr_stats_per wkr_stats[RTE_MAX_LCORE] = {0};
 /**
  * Get the last enabled lcore ID
  *
@@ -152,6 +161,7 @@ parse_args(int argc, char **argv)
 	char *prgname = argv[0];
 	static struct option lgopts[] = {
 		{"disable-reorder", 0, 0, 0},
+		{"insight-worker", 0, 0, 0},
 		{NULL, 0, 0, 0}
 	};

@@ -175,6 +185,11 @@ parse_args(int argc, char **argv)
 				printf("reorder disabled\n");
 				disable_reorder = 1;
 			}
+			if (!strcmp(lgopts[option_index].name,
+						"insight-worker")) {
+				printf("print all worker statistics\n");
+				insight_worker = 1;
+			}
 			break;
 		default:
 			print_usage(prgname);
@@ -319,6 +334,11 @@ print_stats(void)
 {
 	uint16_t i;
 	struct rte_eth_stats eth_stats;
+	unsigned int lcore_id, last_lcore_id, master_lcore_id, end_w_lcore_id;
+
+	last_lcore_id = get_last_lcore_id();
+	master_lcore_id = rte_get_master_lcore();
+	end_w_lcore_id = get_previous_lcore_id(last_lcore_id);

 	printf("\nRX thread stats:\n");
 	printf(" - Pkts rxd: %"PRIu64"\n",
@@ -326,6 +346,26 @@ print_stats(void)
 	printf(" - Pkts enqd to workers ring: %"PRIu64"\n",
 						app_stats.rx.enqueue_pkts);

+	for (lcore_id = 0; lcore_id <= end_w_lcore_id; lcore_id++) {
+		if (insight_worker
+			&& rte_lcore_is_enabled(lcore_id)
+			&& lcore_id != master_lcore_id) {
+			printf("\nWorker thread stats on core [%u]:\n",
+					lcore_id);
+			printf(" - Pkts deqd from workers ring: %"PRIu64"\n",
+					wkr_stats[lcore_id].deq_pkts);
+			printf(" - Pkts enqd to tx ring: %"PRIu64"\n",
+					wkr_stats[lcore_id].enq_pkts);
+			printf(" - Pkts enq to tx failed: %"PRIu64"\n",
+					wkr_stats[lcore_id].enq_failed_pkts);
+		}
+
+		app_stats.wkr.dequeue_pkts += wkr_stats[lcore_id].deq_pkts;
+		app_stats.wkr.enqueue_pkts += wkr_stats[lcore_id].enq_pkts;
+		app_stats.wkr.enqueue_failed_pkts +=
+			wkr_stats[lcore_id].enq_failed_pkts;
+	}
+
 	printf("\nWorker thread stats:\n");
 	printf(" - Pkts deqd from workers ring: %"PRIu64"\n",
 						app_stats.wkr.dequeue_pkts);
@@ -432,13 +472,14 @@ worker_thread(void *args_ptr)
 	struct rte_mbuf *burst_buffer[MAX_PKTS_BURST] = { NULL };
 	struct rte_ring *ring_in, *ring_out;
 	const unsigned xor_val = (nb_ports > 1);
+	unsigned int core_id = rte_lcore_id();

 	args = (struct worker_thread_args *) args_ptr;
 	ring_in = args->ring_in;
 	ring_out = args->ring_out;

 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
-							rte_lcore_id());
+							core_id);

 	while (!quit_signal) {

@@ -448,7 +489,7 @@ worker_thread(void *args_ptr)
 		if (unlikely(burst_size == 0))
 			continue;

-		__sync_fetch_and_add(&app_stats.wkr.dequeue_pkts, burst_size);
+		wkr_stats[core_id].deq_pkts += burst_size;

 		/* just do some operation on mbuf */
 		for (i = 0; i < burst_size;)
@@ -457,11 +498,10 @@ worker_thread(void *args_ptr)
 		/* enqueue the modified mbufs to workers_to_tx ring */
 		ret = rte_ring_enqueue_burst(ring_out, (void *)burst_buffer,
 				burst_size, NULL);
-		__sync_fetch_and_add(&app_stats.wkr.enqueue_pkts, ret);
+		wkr_stats[core_id].enq_pkts += ret;
 		if (unlikely(ret < burst_size)) {
 			/* Return the mbufs to their respective pool, dropping packets */
-			__sync_fetch_and_add(&app_stats.wkr.enqueue_failed_pkts,
-					(int)burst_size - ret);
+			wkr_stats[core_id].enq_failed_pkts += burst_size - ret;
 			pktmbuf_free_bulk(&burst_buffer[ret], burst_size - ret);
 		}
 	}

From patchwork Fri Mar 29 10:56:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Phil Yang
X-Patchwork-Id: 51903
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 9D5274C9D;
 Fri, 29 Mar 2019 11:57:00 +0100 (CET)
Received: from foss.arm.com (foss.arm.com [217.140.101.70])
 by dpdk.org (Postfix) with ESMTP id 5B7EA4C94
 for ; Fri, 29 Mar 2019 11:56:55 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BDE4780D;
 Fri, 29 Mar 2019 03:56:54 -0700 (PDT)
Received: from phil-VirtualBox.shanghai.arm.com (unknown [10.169.106.173])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 43EF33F575;
 Fri, 29 Mar 2019 03:56:53 -0700 (PDT)
From: Phil Yang
To: dev@dpdk.org,
 thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Fri, 29 Mar 2019 18:56:37 +0800
Message-Id: <1553856998-25394-3-git-send-email-phil.yang@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1553856998-25394-1-git-send-email-phil.yang@arm.com>
References: <1553856998-25394-1-git-send-email-phil.yang@arm.com>
In-Reply-To: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v2 2/3] test/distributor: replace sync builtins
 with atomic builtins
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

'__sync' built-in functions are deprecated; use the '__atomic'
built-in functions instead. The sync built-in functions are full
barriers, while the atomic built-in functions offer less restrictive
one-way barriers, which helps performance.

Here is the example test result on TX2:
sudo ./arm64-armv8a-linuxapp-gcc/app/test -l 112-139 \
-n 4 --socket-mem=1024,1024 -- -i
RTE>>distributor_perf_autotest

*** distributor_perf_autotest without this patch ***
==== Cache line switch test ===
Time for 33554432 iterations = 1519202730 ticks
Ticks per iteration = 45

*** distributor_perf_autotest with this patch ***
==== Cache line switch test ===
Time for 33554432 iterations = 1251715496 ticks
Ticks per iteration = 37

Fewer ticks are needed for the cache line switch test, a performance
improvement of 17%.

Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
Reviewed-by: Ruifeng Wang
Reviewed-by: Joyce Kong
Reviewed-by: Dharmik Thakkar
---
 app/test/test_distributor.c      | 18 +++++++++++++++++-
 app/test/test_distributor_perf.c |  7 ++++++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 98919ec..ddab08d 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -62,9 +62,14 @@ handle_work(void *arg)
 	struct worker_params *wp = arg;
 	struct rte_distributor *db = wp->dist;
 	unsigned int count = 0, num = 0;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
 	int i;

+#ifdef RTE_USE_C11_MEM_MODEL
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
+#else
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+#endif
+
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
 	num = rte_distributor_get_pkt(db, id, buf, buf, num);
@@ -270,7 +275,12 @@ handle_work_with_free_mbufs(void *arg)
 	unsigned int count = 0;
 	unsigned int i;
 	unsigned int num = 0;
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
+#else
 	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+#endif

 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
@@ -343,7 +353,13 @@ handle_work_for_shutdown_test(void *arg)
 	unsigned int total = 0;
 	unsigned int i;
 	unsigned int returned = 0;
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
+			__ATOMIC_RELAXED);
+#else
 	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+#endif

 	num = rte_distributor_get_pkt(d, id, buf, buf, num);

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index edf1998..9367460 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -111,9 +111,14 @@ handle_work(void *arg)
 	unsigned int count = 0;
 	unsigned int num = 0;
 	int i;
-	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
 	struct rte_mbuf *buf[8] __rte_cache_aligned;

+#ifdef RTE_USE_C11_MEM_MODEL
+	unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
+#else
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+#endif
+
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;

From patchwork Fri Mar 29 10:56:38 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Phil Yang
X-Patchwork-Id: 51904
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 31FA74CA9;
 Fri, 29 Mar 2019 11:57:04 +0100 (CET)
Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70])
 by dpdk.org (Postfix) with ESMTP id 14C2D2BF4
 for ; Fri, 29 Mar 2019 11:56:57 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7AD0615BF;
 Fri, 29 Mar 2019 03:56:56 -0700 (PDT)
Received: from phil-VirtualBox.shanghai.arm.com (unknown [10.169.106.173])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 064403F575;
 Fri, 29 Mar 2019 03:56:54 -0700 (PDT)
From: Phil Yang
To: dev@dpdk.org,
 thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Fri, 29 Mar 2019 18:56:38 +0800
Message-Id: <1553856998-25394-4-git-send-email-phil.yang@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1553856998-25394-1-git-send-email-phil.yang@arm.com>
References: <1553856998-25394-1-git-send-email-phil.yang@arm.com>
In-Reply-To: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v2 3/3] test/ring_perf: replace sync builtins with
 atomic builtins
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

'__sync' built-in functions are deprecated; use the '__atomic' built-in
functions instead. The sync built-in functions are full barriers, while
the atomic built-in functions offer less restrictive one-way barriers,
which helps performance.

Here is the example test result on TX2:
sudo ./arm64-armv8a-linuxapp-gcc/app/test -c 0x7fffffe \
-n 4 --socket-mem=1024,0 --file-prefix=~ -- -i
RTE>>ring_perf_autotest

*** ring_perf_autotest without this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.22
MP/MC bulk enq/dequeue (size: 8): 11.50
SP/SC bulk enq/dequeue (size: 32): 1.85
MP/MC bulk enq/dequeue (size: 32): 2.66

*** ring_perf_autotest with this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.13
MP/MC bulk enq/dequeue (size: 8): 9.83
SP/SC bulk enq/dequeue (size: 32): 1.96
MP/MC bulk enq/dequeue (size: 32): 2.30

For the ring performance test, this patch improves ring operation
performance by 11%.
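
For readers who want to try the builtin swap outside DPDK, here is a
minimal single-threaded sketch. It is not code from this patch. It only
shows that the '__atomic' forms return the same values as the '__sync'
forms they replace; it does not measure or demonstrate the difference
in memory ordering. The names demo_idx, next_id_sync, next_id_atomic
and demo_check are hypothetical.

```c
#include <assert.h>

/* Hypothetical shared counter, standing in for worker_idx/lcore_count. */
unsigned int demo_idx;

/* Legacy form: full barrier, returns the value held before the add. */
unsigned int next_id_sync(void)
{
	return __sync_fetch_and_add(&demo_idx, 1);
}

/* Replacement form: same atomic add, relaxed ordering requested. */
unsigned int next_id_atomic(void)
{
	return __atomic_fetch_add(&demo_idx, 1, __ATOMIC_RELAXED);
}

/* Walk through both families and check the returned values. */
void demo_check(void)
{
	unsigned int a, b, c, d;

	demo_idx = 0;
	a = next_id_sync();                 /* old value: 0, counter is 1 */
	b = next_id_atomic();               /* old value: 1, counter is 2 */
	c = __sync_add_and_fetch(&demo_idx, 1);               /* new: 3 */
	d = __atomic_add_fetch(&demo_idx, 1, __ATOMIC_RELAXED); /* new: 4 */

	assert(a == 0);
	assert(b == 1);
	assert(c == 3);
	assert(d == 4);
}
```

Calling demo_check() leaves demo_idx at 4. The fetch-then-add pair is
the one used for worker ids in the distributor tests, and the
add-then-fetch pair is the one used for lcore_count in this patch.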
Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
Reviewed-by: Joyce Kong
Reviewed-by: Dharmik Thakkar
---
 app/test/test_ring_perf.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index ebb3939..e851c1a 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -160,7 +160,11 @@ enqueue_bulk(void *p)
 	unsigned i;
 	void *burst[MAX_BURST] = {0};

-	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
 		while(lcore_count != 2)
 			rte_pause();

@@ -196,7 +200,11 @@ dequeue_bulk(void *p)
 	unsigned i;
 	void *burst[MAX_BURST] = {0};

-	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
 		while(lcore_count != 2)
 			rte_pause();
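
The start-up rendezvous that both hunks touch can be walked through in
a single thread. The sketch below is an illustration under assumptions
and is not part of the patch: ring_demo_count stands in for
lcore_count, a relaxed atomic load replaces the plain read of
lcore_count, and the empty loop body stands in for rte_pause().

```c
#include <assert.h>

/* Hypothetical stand-in for lcore_count in test_ring_perf.c. */
unsigned int ring_demo_count;

/*
 * Register one arrival. Returns nonzero when the caller is not the
 * second party and would therefore have to wait, mirroring the
 * `__atomic_add_fetch(...) != 2` condition in the patch.
 */
int ring_demo_arrive(void)
{
	return __atomic_add_fetch(&ring_demo_count, 1,
			__ATOMIC_RELAXED) != 2;
}

/* Spin until both parties have arrived. */
void ring_demo_wait(void)
{
	while (__atomic_load_n(&ring_demo_count, __ATOMIC_RELAXED) != 2)
		; /* a DPDK test would call rte_pause() here */
}

/* Simulate the two arrivals back to back in one thread. */
void ring_demo_walkthrough(void)
{
	int first_waits, second_waits;

	ring_demo_count = 0;
	first_waits = ring_demo_arrive();  /* count becomes 1: must wait */
	second_waits = ring_demo_arrive(); /* count becomes 2: proceeds */

	assert(first_waits == 1);
	assert(second_waits == 0);

	ring_demo_wait();                  /* returns at once, count is 2 */
}
```

Because add-then-fetch returns the updated value, only the first
arrival sees a count other than 2 and enters the wait loop. The second
arrival sees 2 and goes straight on.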