From patchwork Fri Jul 15 13:12:44 2022
X-Patchwork-Submitter: Anatoly Burakov
X-Patchwork-Id: 113982
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, Bruce Richardson, Nicolas Chautru, Fan Zhang,
 Ashish Gupta, Akhil Goyal, David Hunt, Chengwen Feng, Kevin Laatz,
 Ray Kinsella, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Jerin Jacob,
 Sachin Saxena, Hemant Agrawal, Ori Kam, Honnappa Nagarahalli,
 Konstantin Ananyev
Cc: Conor Walsh
Subject: [PATCH v1 1/2] eal: add lcore busyness telemetry
Date: Fri, 15 Jul 2022 13:12:44 +0000
Message-Id: <24c49429394294cfbf0d9c506b205029bac77c8b.1657890378.git.anatoly.burakov@intel.com>

Currently, there is no way to measure lcore busyness in a passive way,
without any modifications to the application. This patch adds a new EAL
API that will be able to passively track core busyness.

The busyness is calculated by relying on the fact that most DPDK APIs
will poll for packets. Empty polls can be counted as "idle", while
non-empty polls can be counted as "busy". To measure lcore busyness, we
simply call the telemetry timestamping function with the number of
buffers a particular code section has processed, and count the number of
cycles we've spent processing empty bursts. The more empty bursts we
encounter, the fewer cycles we spend in "busy" state, and the lower the
reported core busyness.

In order for all of the above to work without modifications to the
application, the library code needs to be instrumented with calls to
the lcore telemetry busyness timestamping function.
The following parts of DPDK are instrumented with lcore telemetry calls:

- All major driver APIs:
  - ethdev
  - cryptodev
  - compressdev
  - regexdev
  - bbdev
  - rawdev
  - eventdev
  - dmadev
- Some additional libraries:
  - ring
  - distributor

To avoid performance impact from having lcore telemetry support, a
global variable is exported by EAL, and the call to the timestamping
function is wrapped in a macro, so that whenever telemetry is disabled,
it only takes one additional branch and no function calls are
performed. It is also possible to disable it at compile time by
commenting out RTE_LCORE_BUSYNESS in the build config.

This patch also adds a telemetry endpoint to report lcore busyness, as
well as telemetry endpoints to enable/disable lcore telemetry.

Signed-off-by: Kevin Laatz
Signed-off-by: Conor Walsh
Signed-off-by: David Hunt
Signed-off-by: Anatoly Burakov
---

Notes:
    We did a couple of quick smoke tests to see if this patch causes any
    performance degradation, and it seemed to have none that we could
    measure. Telemetry can be disabled at compile time via a config option,
    while at runtime it can be disabled, seemingly at a cost of one
    additional branch.

    That said, our benchmarking efforts were admittedly not very rigorous,
    so comments welcome!
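
For anyone who wants to check the averaging arithmetic without building DPDK, here is a minimal standalone sketch of the smoothing step done by calc_raw_busyness() in this patch. The helper names smooth_raw_busyness() and raw_to_percent() are made up for this illustration and are not part of the patch; the constants and the integer math are copied from the code in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define LCORE_BUSYNESS_MAX 100

/*
 * Illustrative stand-in for calc_raw_busyness(): takes the previous raw
 * busyness (percentage times 100) plus the empty and total cycle counts
 * of the interval that just ended, and returns the new raw busyness.
 */
static int
smooth_raw_busyness(int prev_raw_busyness, uint64_t empty, uint64_t total)
{
	/* 100.00% expressed as an integer with two implied decimals */
	const int max_raw_idle = LCORE_BUSYNESS_MAX * 100;
	/* the average is tracked as idleness, so invert the stored value */
	const int prev_raw_idle = max_raw_idle - prev_raw_busyness;
	int cur_raw_idle, smoothened_idle;

	/* guard the division; a finished interval always has cycles in it */
	assert(total != 0);

	/* share of idle cycles in this interval, times 100 */
	cur_raw_idle = (int)((empty * max_raw_idle) / total);
	/* weight history 4:1 against the newest sample */
	smoothened_idle = (cur_raw_idle + prev_raw_idle * 4) / 5;

	/* convert idleness back to busyness */
	return max_raw_idle - smoothened_idle;
}

/* Bring a raw value back to the 0..100 range reported to the user. */
static int
raw_to_percent(int raw_busyness)
{
	return (raw_busyness + 50) / 100;
}
```

Starting from a raw value of 0, five consecutive fully busy intervals (empty == 0) give raw values of 2000, 3600, 4880, 5904 and 6724, which are reported as 20, 36, 49, 59 and 67 percent. Each new sample only contributes a fifth of the result, so the reported value lags behind sudden load changes.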
 config/rte_config.h                         |   2 +
 lib/bbdev/rte_bbdev.h                       |  17 +-
 lib/compressdev/rte_compressdev.c           |   2 +
 lib/cryptodev/rte_cryptodev.h               |   2 +
 lib/distributor/rte_distributor.c           |  21 +-
 lib/distributor/rte_distributor_single.c    |  14 +-
 lib/dmadev/rte_dmadev.h                     |  15 +-
 lib/eal/common/eal_common_lcore_telemetry.c | 274 ++++++++++++++++++++
 lib/eal/common/meson.build                  |   1 +
 lib/eal/include/rte_lcore.h                 |  80 ++++++
 lib/eal/meson.build                         |   3 +
 lib/eal/version.map                         |   7 +
 lib/ethdev/rte_ethdev.h                     |   2 +
 lib/eventdev/rte_eventdev.h                 |  10 +-
 lib/rawdev/rte_rawdev.c                     |   5 +-
 lib/regexdev/rte_regexdev.h                 |   5 +-
 lib/ring/rte_ring_elem_pvt.h                |   1 +
 17 files changed, 437 insertions(+), 24 deletions(-)
 create mode 100644 lib/eal/common/eal_common_lcore_telemetry.c

diff --git a/config/rte_config.h b/config/rte_config.h
index 46549cb062..583cb6f7a5 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -39,6 +39,8 @@
 #define RTE_LOG_DP_LEVEL RTE_LOG_INFO
 #define RTE_BACKTRACE 1
 #define RTE_MAX_VFIO_CONTAINERS 64
+#define RTE_LCORE_BUSYNESS 1
+#define RTE_LCORE_BUSYNESS_PERIOD 4000000ULL

 /* bsd module defines */
 #define RTE_CONTIGMEM_MAX_NUM_BUFS 64
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index b88c88167e..d6ed176cce 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -28,6 +28,7 @@ extern "C" {

 #include
 #include
+#include

 #include "rte_bbdev_op.h"

@@ -599,7 +600,9 @@ rte_bbdev_dequeue_enc_ops(uint16_t dev_id, uint16_t queue_id,
 {
 	struct rte_bbdev *dev = &rte_bbdev_devices[dev_id];
 	struct rte_bbdev_queue_data *q_data = &dev->data->queues[queue_id];
-	return dev->dequeue_enc_ops(q_data, ops, num_ops);
+	const uint16_t nb_ops = dev->dequeue_enc_ops(q_data, ops, num_ops);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }

 /**
@@ -631,7 +634,9 @@ rte_bbdev_dequeue_dec_ops(uint16_t dev_id, uint16_t queue_id,
 {
 	struct rte_bbdev *dev = &rte_bbdev_devices[dev_id];
 	struct rte_bbdev_queue_data *q_data = &dev->data->queues[queue_id];
-	return dev->dequeue_dec_ops(q_data,
ops, num_ops);
+	const uint16_t nb_ops = dev->dequeue_dec_ops(q_data, ops, num_ops);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }


@@ -662,7 +667,9 @@ rte_bbdev_dequeue_ldpc_enc_ops(uint16_t dev_id, uint16_t queue_id,
 {
 	struct rte_bbdev *dev = &rte_bbdev_devices[dev_id];
 	struct rte_bbdev_queue_data *q_data = &dev->data->queues[queue_id];
-	return dev->dequeue_ldpc_enc_ops(q_data, ops, num_ops);
+	const uint16_t nb_ops = dev->dequeue_ldpc_enc_ops(q_data, ops, num_ops);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }

 /**
@@ -692,7 +699,9 @@ rte_bbdev_dequeue_ldpc_dec_ops(uint16_t dev_id, uint16_t queue_id,
 {
 	struct rte_bbdev *dev = &rte_bbdev_devices[dev_id];
 	struct rte_bbdev_queue_data *q_data = &dev->data->queues[queue_id];
-	return dev->dequeue_ldpc_dec_ops(q_data, ops, num_ops);
+	const uint16_t nb_ops = dev->dequeue_ldpc_dec_ops(q_data, ops, num_ops);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }

 /** Definitions of device event types */
diff --git a/lib/compressdev/rte_compressdev.c b/lib/compressdev/rte_compressdev.c
index 22c438f2dd..912cee9a16 100644
--- a/lib/compressdev/rte_compressdev.c
+++ b/lib/compressdev/rte_compressdev.c
@@ -580,6 +580,8 @@ rte_compressdev_dequeue_burst(uint8_t dev_id, uint16_t qp_id,
 	nb_ops = (*dev->dequeue_burst)
 			(dev->data->queue_pairs[qp_id], ops, nb_ops);

+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+
 	return nb_ops;
 }

diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 56f459c6a0..072874020d 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -1915,6 +1915,8 @@ rte_cryptodev_dequeue_burst(uint8_t dev_id, uint16_t qp_id,
 		rte_rcu_qsbr_thread_offline(list->qsbr, 0);
 	}
 #endif
+
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
 	return nb_ops;
 }

diff --git a/lib/distributor/rte_distributor.c b/lib/distributor/rte_distributor.c
index 3035b7a999..35b0d8d36b 100644
--- a/lib/distributor/rte_distributor.c
+++ b/lib/distributor/rte_distributor.c
@@ -56,6
+56,8 @@ rte_distributor_request_pkt(struct rte_distributor *d,

 		while (rte_rdtsc() < t)
 			rte_pause();
+		/* this was an empty poll */
+		RTE_LCORE_TELEMETRY_TIMESTAMP(0);
 	}

 	/*
@@ -134,24 +136,29 @@ rte_distributor_get_pkt(struct rte_distributor *d,

 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (return_count <= 1) {
+			uint16_t cnt;
 			pkts[0] = rte_distributor_get_pkt_single(d->d_single,
-				worker_id, return_count ? oldpkt[0] : NULL);
-			return (pkts[0]) ? 1 : 0;
-		} else
-			return -EINVAL;
+					worker_id,
+					return_count ? oldpkt[0] : NULL);
+			cnt = (pkts[0] != NULL) ? 1 : 0;
+			RTE_LCORE_TELEMETRY_TIMESTAMP(cnt);
+			return cnt;
+		}
+		return -EINVAL;
 	}

 	rte_distributor_request_pkt(d, worker_id, oldpkt, return_count);

-	count = rte_distributor_poll_pkt(d, worker_id, pkts);
-	while (count == -1) {
+	while ((count = rte_distributor_poll_pkt(d, worker_id, pkts)) == -1) {
 		uint64_t t = rte_rdtsc() + 100;

 		while (rte_rdtsc() < t)
 			rte_pause();

-		count = rte_distributor_poll_pkt(d, worker_id, pkts);
+		/* this was an empty poll */
+		RTE_LCORE_TELEMETRY_TIMESTAMP(0);
 	}
+	RTE_LCORE_TELEMETRY_TIMESTAMP(count);
 	return count;
 }

diff --git a/lib/distributor/rte_distributor_single.c b/lib/distributor/rte_distributor_single.c
index 2c77ac454a..dc58791bf4 100644
--- a/lib/distributor/rte_distributor_single.c
+++ b/lib/distributor/rte_distributor_single.c
@@ -31,8 +31,13 @@ rte_distributor_request_pkt_single(struct rte_distributor_single *d,
 	union rte_distributor_buffer_single *buf = &d->bufs[worker_id];
 	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_GET_BUF;
-	RTE_WAIT_UNTIL_MASKED(&buf->bufptr64, RTE_DISTRIB_FLAGS_MASK,
-		==, 0, __ATOMIC_RELAXED);
+
+	while ((__atomic_load_n(&buf->bufptr64, __ATOMIC_RELAXED)
+			& RTE_DISTRIB_FLAGS_MASK) != 0) {
+		rte_pause();
+		/* this was an empty poll */
+		RTE_LCORE_TELEMETRY_TIMESTAMP(0);
+	}

 	/* Sync with distributor on GET_BUF flag.
*/
 	__atomic_store_n(&(buf->bufptr64), req, __ATOMIC_RELEASE);
@@ -59,8 +64,11 @@ rte_distributor_get_pkt_single(struct rte_distributor_single *d,
 {
 	struct rte_mbuf *ret;
 	rte_distributor_request_pkt_single(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt_single(d, worker_id)) == NULL)
+	while ((ret = rte_distributor_poll_pkt_single(d, worker_id)) == NULL) {
 		rte_pause();
+		/* this was an empty poll */
+		RTE_LCORE_TELEMETRY_TIMESTAMP(0);
+	}
 	return ret;
 }

diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index e7f992b734..98176a6a7a 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -149,6 +149,7 @@
 #include
 #include
 #include
+#include

 #ifdef __cplusplus
 extern "C" {
@@ -1027,7 +1028,7 @@ rte_dma_completed(int16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
 		  uint16_t *last_idx, bool *has_error)
 {
 	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
-	uint16_t idx;
+	uint16_t idx, nb_ops;
 	bool err;

 #ifdef RTE_DMADEV_DEBUG
@@ -1050,8 +1051,10 @@ rte_dma_completed(int16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
 		has_error = &err;

 	*has_error = false;
-	return (*obj->completed)(obj->dev_private, vchan, nb_cpls, last_idx,
-				 has_error);
+	nb_ops = (*obj->completed)(obj->dev_private, vchan, nb_cpls, last_idx,
+			has_error);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }

 /**
@@ -1090,7 +1093,7 @@ rte_dma_completed_status(int16_t dev_id, uint16_t vchan,
 			 enum rte_dma_status_code *status)
 {
 	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
-	uint16_t idx;
+	uint16_t idx, nb_ops;

 #ifdef RTE_DMADEV_DEBUG
 	if (!rte_dma_is_valid(dev_id) || nb_cpls == 0 || status == NULL)
@@ -1101,8 +1104,10 @@ rte_dma_completed_status(int16_t dev_id, uint16_t vchan,
 	if (last_idx == NULL)
 		last_idx = &idx;

-	return (*obj->completed_status)(obj->dev_private, vchan, nb_cpls,
+	nb_ops = (*obj->completed_status)(obj->dev_private, vchan, nb_cpls,
 			last_idx, status);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }

 /**
diff --git a/lib/eal/common/eal_common_lcore_telemetry.c b/lib/eal/common/eal_common_lcore_telemetry.c
new file mode 100644
index 0000000000..5e4ea15ff5
--- /dev/null
+++ b/lib/eal/common/eal_common_lcore_telemetry.c
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+#ifdef RTE_LCORE_BUSYNESS
+#include
+#endif
+
+int __rte_lcore_telemetry_enabled;
+
+#ifdef RTE_LCORE_BUSYNESS
+
+struct lcore_telemetry {
+	int busyness;
+	/**< Calculated busyness (gets set/returned by the API) */
+	int raw_busyness;
+	/**< Calculated busyness times 100. */
+	uint64_t interval_ts;
+	/**< when previous telemetry interval started */
+	uint64_t empty_cycles;
+	/**< empty cycle count since last interval */
+	uint64_t last_poll_ts;
+	/**< last poll timestamp */
+	bool last_empty;
+	/**< if last poll was empty */
+	unsigned int contig_poll_cnt;
+	/**< contiguous (always empty/non empty) poll counter */
+} __rte_cache_aligned;
+static struct lcore_telemetry telemetry_data[RTE_MAX_LCORE];
+
+#define LCORE_BUSYNESS_MAX 100
+#define LCORE_BUSYNESS_NOT_SET -1
+#define LCORE_BUSYNESS_MIN 0
+
+static void lcore_config_init(void)
+{
+	int lcore_id;
+	RTE_LCORE_FOREACH(lcore_id) {
+		struct lcore_telemetry *td = &telemetry_data[lcore_id];
+
+		td->interval_ts = 0;
+		td->last_poll_ts = 0;
+		td->empty_cycles = 0;
+		td->last_empty = true;
+		td->contig_poll_cnt = 0;
+		td->busyness = LCORE_BUSYNESS_NOT_SET;
+		td->raw_busyness = 0;
+	}
+}
+
+int rte_lcore_busyness(unsigned int lcore_id)
+{
+	const uint64_t active_thresh = RTE_LCORE_BUSYNESS_PERIOD * 1000;
+	struct lcore_telemetry *tdata;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return -EINVAL;
+	tdata = &telemetry_data[lcore_id];
+
+	/* if the lcore is not active */
+	if (tdata->interval_ts == 0)
+		return LCORE_BUSYNESS_NOT_SET;
+	/* if the core hasn't been active in a while */
+	else if ((rte_rdtsc() - tdata->interval_ts) >
active_thresh)
+		return LCORE_BUSYNESS_NOT_SET;
+
+	/* this core is active, report its busyness */
+	return telemetry_data[lcore_id].busyness;
+}
+
+int rte_lcore_busyness_enabled(void)
+{
+	return __rte_lcore_telemetry_enabled;
+}
+
+void rte_lcore_busyness_enabled_set(int enable)
+{
+	__rte_lcore_telemetry_enabled = !!enable;
+
+	if (!enable)
+		lcore_config_init();
+}
+
+static inline int calc_raw_busyness(const struct lcore_telemetry *tdata,
+		const uint64_t empty, const uint64_t total)
+{
+	/*
+	 * we don't want to use floating point math here, but we want for our
+	 * busyness to react smoothly to sudden changes, while still keeping the
+	 * accuracy and making sure that over time the average follows busyness
+	 * as measured just-in-time. therefore, we will calculate the average
+	 * busyness using integer math, but shift the decimal point two places
+	 * to the right, so that 100.0 becomes 10000. this allows us to report
+	 * integer values (0..100) while still allowing ourselves to follow the
+	 * just-in-time measurements when we calculate our averages.
+	 */
+	const int max_raw_idle = LCORE_BUSYNESS_MAX * 100;
+
+	/*
+	 * at upper end of the busyness scale, going up from 90->100 will take
+	 * longer than going from 10->20 because of the averaging. to address
+	 * this, we invert the scale when doing calculations: that is, we
+	 * effectively calculate average *idle* cycle percentage, not average
+	 * *busy* cycle percentage. this means that the scale is naturally
+	 * biased towards fast scaling up, and slow scaling down.
+	 */
+	const int prev_raw_idle = max_raw_idle - tdata->raw_busyness;
+
+	/* calculate rate of idle cycles, times 100 */
+	const int cur_raw_idle = (int)((empty * max_raw_idle) / total);
+
+	/* smoothen the idleness */
+	const int smoothened_idle = (cur_raw_idle + prev_raw_idle * 4) / 5;
+
+	/* convert idleness back to busyness */
+	return max_raw_idle - smoothened_idle;
+}
+
+void __rte_lcore_telemetry_timestamp(uint16_t nb_rx)
+{
+	const unsigned int lcore_id = rte_lcore_id();
+	uint64_t interval_ts, empty_cycles, cur_tsc, last_poll_ts;
+	struct lcore_telemetry *tdata = &telemetry_data[lcore_id];
+	const bool empty = nb_rx == 0;
+	uint64_t diff_int, diff_last;
+	bool last_empty;
+
+	last_empty = tdata->last_empty;
+
+	/* optimization: don't do anything if status hasn't changed */
+	if (last_empty == empty && tdata->contig_poll_cnt++ < 32)
+		return;
+	/* status changed or we're waiting for too long, reset counter */
+	tdata->contig_poll_cnt = 0;
+
+	cur_tsc = rte_rdtsc();
+
+	interval_ts = tdata->interval_ts;
+	empty_cycles = tdata->empty_cycles;
+	last_poll_ts = tdata->last_poll_ts;
+
+	diff_int = cur_tsc - interval_ts;
+	diff_last = cur_tsc - last_poll_ts;
+
+	/* is this the first time we're here? */
+	if (interval_ts == 0) {
+		tdata->busyness = LCORE_BUSYNESS_MIN;
+		tdata->raw_busyness = 0;
+		tdata->interval_ts = cur_tsc;
+		tdata->empty_cycles = 0;
+		tdata->contig_poll_cnt = 0;
+		goto end;
+	}
+
+	/* update the empty counter if we got an empty poll earlier */
+	if (last_empty)
+		empty_cycles += diff_last;
+
+	/* have we passed the interval?
*/
+	if (diff_int > RTE_LCORE_BUSYNESS_PERIOD) {
+		int raw_busyness;
+
+		/* get updated busyness value */
+		raw_busyness = calc_raw_busyness(tdata, empty_cycles, diff_int);
+
+		/* set a new interval, reset empty counter */
+		tdata->interval_ts = cur_tsc;
+		tdata->empty_cycles = 0;
+		tdata->raw_busyness = raw_busyness;
+		/* bring busyness back to 0..100 range, biased to round up */
+		tdata->busyness = (raw_busyness + 50) / 100;
+	} else
+		/* we may have updated empty counter */
+		tdata->empty_cycles = empty_cycles;
+
+end:
+	/* update status for next poll */
+	tdata->last_poll_ts = cur_tsc;
+	tdata->last_empty = empty;
+}
+
+static int
+lcore_busyness_enable(const char *cmd __rte_unused,
+		      const char *params __rte_unused,
+		      struct rte_tel_data *d)
+{
+	rte_lcore_busyness_enabled_set(1);
+
+	rte_tel_data_start_dict(d);
+
+	rte_tel_data_add_dict_int(d, "busyness_enabled", 1);
+
+	return 0;
+}
+
+static int
+lcore_busyness_disable(const char *cmd __rte_unused,
+		       const char *params __rte_unused,
+		       struct rte_tel_data *d)
+{
+	rte_lcore_busyness_enabled_set(0);
+
+	rte_tel_data_start_dict(d);
+
+	rte_tel_data_add_dict_int(d, "busyness_enabled", 0);
+
+	return 0;
+}
+
+static int
+lcore_handle_busyness(const char *cmd __rte_unused,
+		      const char *params __rte_unused, struct rte_tel_data *d)
+{
+	char corenum[64];
+	int i;
+
+	rte_tel_data_start_dict(d);
+
+	RTE_LCORE_FOREACH(i) {
+		if (!rte_lcore_is_enabled(i))
+			continue;
+		snprintf(corenum, sizeof(corenum), "%d", i);
+		rte_tel_data_add_dict_int(d, corenum, rte_lcore_busyness(i));
+	}
+
+	return 0;
+}
+
+RTE_INIT(lcore_init_telemetry)
+{
+	__rte_lcore_telemetry_enabled = true;
+
+	lcore_config_init();
+
+	rte_telemetry_register_cmd("/eal/lcore/busyness", lcore_handle_busyness,
+			"return percentage busyness of cores");
+
+	rte_telemetry_register_cmd("/eal/lcore/busyness_enable", lcore_busyness_enable,
+			"enable lcore busyness measurement");
+
+	rte_telemetry_register_cmd("/eal/lcore/busyness_disable",
lcore_busyness_disable,
+			"disable lcore busyness measurement");
+}
+
+#else
+
+int rte_lcore_busyness(unsigned int lcore_id __rte_unused)
+{
+	return -ENOTSUP;
+}
+
+int rte_lcore_busyness_enabled(void)
+{
+	return -ENOTSUP;
+}
+
+void rte_lcore_busyness_enabled_set(int enable __rte_unused)
+{
+}
+
+void __rte_lcore_telemetry_timestamp(uint16_t nb_rx __rte_unused)
+{
+}
+
+#endif
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a743e66a7d 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -17,6 +17,7 @@ sources += files(
         'eal_common_hexdump.c',
         'eal_common_interrupts.c',
         'eal_common_launch.c',
+        'eal_common_lcore_telemetry.c',
         'eal_common_lcore.c',
         'eal_common_log.c',
         'eal_common_mcfg.c',
diff --git a/lib/eal/include/rte_lcore.h b/lib/eal/include/rte_lcore.h
index b598e1b9ec..ab7a8e1e26 100644
--- a/lib/eal/include/rte_lcore.h
+++ b/lib/eal/include/rte_lcore.h
@@ -415,6 +415,86 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 		const pthread_attr_t *attr,
 		void *(*start_routine)(void *), void *arg);

+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Read busyness value corresponding to an lcore.
+ *
+ * @param lcore_id
+ *   Lcore to read busyness value for.
+ * @return
+ *   - value between 0 and 100 on success
+ *   - -1 if lcore is not active
+ *   - -EINVAL if lcore is invalid
+ *   - -ENOMEM if not enough memory available
+ *   - -ENOTSUP if not supported
+ */
+__rte_experimental
+int
+rte_lcore_busyness(unsigned int lcore_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if lcore busyness telemetry is enabled.
+ *
+ * @return
+ *   - 1 if lcore telemetry is enabled
+ *   - 0 if lcore telemetry is disabled
+ *   - -ENOTSUP if lcore telemetry is not supported
+ */
+__rte_experimental
+int
+rte_lcore_busyness_enabled(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enable or disable busyness telemetry.
+ *
+ * @param enable
+ *   1 to enable, 0 to disable
+ */
+__rte_experimental
+void
+rte_lcore_busyness_enabled_set(int enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Lcore telemetry timestamping function.
+ *
+ * @param nb_rx
+ *   Number of buffers processed by lcore.
+ */
+__rte_experimental
+void
+__rte_lcore_telemetry_timestamp(uint16_t nb_rx);
+
+/** @internal lcore telemetry enabled status */
+extern int __rte_lcore_telemetry_enabled;
+
+/**
+ * Call lcore telemetry timestamp function.
+ *
+ * @param nb_rx
+ *   Number of buffers processed by lcore.
+ */
+#ifdef RTE_LCORE_BUSYNESS
+#define RTE_LCORE_TELEMETRY_TIMESTAMP(nb_rx)                    \
+	do {                                                    \
+		if (__rte_lcore_telemetry_enabled)              \
+			__rte_lcore_telemetry_timestamp(nb_rx); \
+	} while (0)
+#else
+#define RTE_LCORE_TELEMETRY_TIMESTAMP(nb_rx) \
+	do { } while (0)
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/meson.build b/lib/eal/meson.build
index 056beb9461..7199aa03c2 100644
--- a/lib/eal/meson.build
+++ b/lib/eal/meson.build
@@ -25,6 +25,9 @@ subdir(arch_subdir)
 deps += ['kvargs']
 if not is_windows
     deps += ['telemetry']
+else
+	# core busyness telemetry depends on telemetry library
+	dpdk_conf.set('RTE_LCORE_BUSYNESS', false)
 endif
 if dpdk_conf.has('RTE_USE_LIBBSD')
     ext_deps += libbsd
diff --git a/lib/eal/version.map b/lib/eal/version.map
index c2a2cebf69..52061b30f0 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -424,6 +424,13 @@ EXPERIMENTAL {
 	rte_thread_self;
 	rte_thread_set_affinity_by_id;
 	rte_thread_set_priority;
+
+	# added in 22.11
+	__rte_lcore_telemetry_timestamp;
+	__rte_lcore_telemetry_enabled;
+	rte_lcore_busyness;
+	rte_lcore_busyness_enabled;
+	rte_lcore_busyness_enabled_set;
 };

 INTERNAL {
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..1caecd5a11 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5675,6 +5675,8 @@
rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
 #endif

 	rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
+
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_rx);
 	return nb_rx;
 }

diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 6a6f6ea4c1..a1d42d9214 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -2153,6 +2153,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
 			uint16_t nb_events, uint64_t timeout_ticks)
 {
 	const struct rte_event_fp_ops *fp_ops;
+	uint16_t nb_evts;
 	void *port;

 	fp_ops = &rte_event_fp_ops[dev_id];
@@ -2175,10 +2176,13 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
 	 * requests nb_events as const one
 	 */
 	if (nb_events == 1)
-		return (fp_ops->dequeue)(port, ev, timeout_ticks);
+		nb_evts = (fp_ops->dequeue)(port, ev, timeout_ticks);
 	else
-		return (fp_ops->dequeue_burst)(port, ev, nb_events,
-					       timeout_ticks);
+		nb_evts = (fp_ops->dequeue_burst)(port, ev, nb_events,
+					timeout_ticks);
+
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_evts);
+	return nb_evts;
 }

 #define RTE_EVENT_DEV_MAINT_OP_FLUSH (1 << 0)
diff --git a/lib/rawdev/rte_rawdev.c b/lib/rawdev/rte_rawdev.c
index 2f0a4f132e..27163e87cb 100644
--- a/lib/rawdev/rte_rawdev.c
+++ b/lib/rawdev/rte_rawdev.c
@@ -226,12 +226,15 @@ rte_rawdev_dequeue_buffers(uint16_t dev_id,
 			   rte_rawdev_obj_t context)
 {
 	struct rte_rawdev *dev;
+	int nb_ops;

 	RTE_RAWDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
 	dev = &rte_rawdevs[dev_id];

 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dequeue_bufs, -ENOTSUP);
-	return (*dev->dev_ops->dequeue_bufs)(dev, buffers, count, context);
+	nb_ops = (*dev->dev_ops->dequeue_bufs)(dev, buffers, count, context);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(nb_ops);
+	return nb_ops;
 }

 int
diff --git a/lib/regexdev/rte_regexdev.h b/lib/regexdev/rte_regexdev.h
index 3bce8090f6..781055b4eb 100644
--- a/lib/regexdev/rte_regexdev.h
+++ b/lib/regexdev/rte_regexdev.h
@@ -1530,6 +1530,7 @@
rte_regexdev_dequeue_burst(uint8_t dev_id, uint16_t qp_id,
 			   struct rte_regex_ops **ops, uint16_t nb_ops)
 {
 	struct rte_regexdev *dev = &rte_regex_devices[dev_id];
+	uint16_t deq_ops;
 #ifdef RTE_LIBRTE_REGEXDEV_DEBUG
 	RTE_REGEXDEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dequeue, -ENOTSUP);
@@ -1538,7 +1539,9 @@ rte_regexdev_dequeue_burst(uint8_t dev_id, uint16_t qp_id,
 		return -EINVAL;
 	}
 #endif
-	return (*dev->dequeue)(dev, qp_id, ops, nb_ops);
+	deq_ops = (*dev->dequeue)(dev, qp_id, ops, nb_ops);
+	RTE_LCORE_TELEMETRY_TIMESTAMP(deq_ops);
+	return deq_ops;
 }

 #ifdef __cplusplus
diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 83788c56e6..6db09d4291 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -379,6 +379,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
 end:
 	if (available != NULL)
 		*available = entries - n;
+	RTE_LCORE_TELEMETRY_TIMESTAMP(n);
 	return n;
 }


From patchwork Fri Jul 15 13:12:45 2022
X-Patchwork-Submitter: Anatoly Burakov
X-Patchwork-Id: 113983
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org
Subject: [PATCH v1 2/2] eal: add cpuset lcore telemetry entries
Date: Fri, 15 Jul 2022 13:12:45 +0000
In-Reply-To: <24c49429394294cfbf0d9c506b205029bac77c8b.1657890378.git.anatoly.burakov@intel.com>
References: <24c49429394294cfbf0d9c506b205029bac77c8b.1657890378.git.anatoly.burakov@intel.com>

Expose per-lcore cpuset information to telemetry.

Signed-off-by: Anatoly Burakov
---
 lib/eal/common/eal_common_lcore_telemetry.c | 47 +++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/lib/eal/common/eal_common_lcore_telemetry.c b/lib/eal/common/eal_common_lcore_telemetry.c
index 5e4ea15ff5..39fffe2b93 100644
--- a/lib/eal/common/eal_common_lcore_telemetry.c
+++ b/lib/eal/common/eal_common_lcore_telemetry.c
@@ -19,6 +19,8 @@ int __rte_lcore_telemetry_enabled;

 #ifdef RTE_LCORE_BUSYNESS

+#include "eal_private.h"
+
 struct lcore_telemetry {
 	int busyness;
 	/**< Calculated busyness (gets set/returned by the API) */
@@ -235,6 +237,48 @@ lcore_handle_busyness(const char *cmd __rte_unused,
 	return 0;
 }

+static int
+lcore_handle_cpuset(const char *cmd __rte_unused,
+		    const char *params __rte_unused,
+		    struct rte_tel_data *d)
+{
+	char corenum[64];
+	int i;
+
+	rte_tel_data_start_dict(d);
+
+	RTE_LCORE_FOREACH(i) {
+		const struct lcore_config *cfg = &lcore_config[i];
+		const rte_cpuset_t *cpuset = &cfg->cpuset;
+		struct rte_tel_data *ld;
+		unsigned int cpu;
+
+		if (!rte_lcore_is_enabled(i))
+			continue;
+
+		/* create an array of integers */
+		ld = rte_tel_data_alloc();
+		if (ld == NULL)
+			return -ENOMEM;
+		rte_tel_data_start_array(ld, RTE_TEL_INT_VAL);
+
+		/* add cpu IDs from cpuset to the array */
+		for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
+			if (!CPU_ISSET(cpu, cpuset))
+				continue;
+			rte_tel_data_add_array_int(ld, cpu);
+		}
+
+		/* add array to the per-lcore container */
+		snprintf(corenum, sizeof(corenum), "%d", i);
+
+		/* tell telemetry library to free this array automatically */
+		rte_tel_data_add_dict_container(d, corenum, ld, 0);
+	}
+
+	return 0;
+}
+
 RTE_INIT(lcore_init_telemetry)
 {
 	__rte_lcore_telemetry_enabled = true;
@@ -249,6 +293,9 @@ RTE_INIT(lcore_init_telemetry)

 	rte_telemetry_register_cmd("/eal/lcore/busyness_disable", lcore_busyness_disable,
 			"disable lcore busyness measurement");
+
+	rte_telemetry_register_cmd("/eal/lcore/cpuset", lcore_handle_cpuset,
+			"list physical core affinity for each
lcore");
 }

 #else


From patchwork Wed Aug 24 16:24:42 2022
X-Patchwork-Submitter: Kevin Laatz
X-Patchwork-Id: 115380
From: Kevin Laatz
To: dev@dpdk.org
Cc: anatoly.burakov@intel.com, Kevin Laatz
Subject: [PATCH
v2 3/3] doc: add howto guide for lcore poll busyness
Date: Wed, 24 Aug 2022 17:24:42 +0100
Message-Id: <20220824162442.631456-4-kevin.laatz@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220824162442.631456-1-kevin.laatz@intel.com>
References: <24c49429394294cfbf0d9c506b205029bac77c8b.1657890378.git.anatoly.burakov@intel.com>
 <20220824162442.631456-1-kevin.laatz@intel.com>
MIME-Version: 1.0
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org

Add a new section to the howto guides for using the new lcore poll busyness
telemetry endpoints and describe general usage.

Signed-off-by: Kevin Laatz
---
 doc/guides/howto/lcore_busyness.rst | 79 +++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 doc/guides/howto/lcore_busyness.rst

diff --git a/doc/guides/howto/lcore_busyness.rst b/doc/guides/howto/lcore_busyness.rst
new file mode 100644
index 0000000000..c8ccd3f513
--- /dev/null
+++ b/doc/guides/howto/lcore_busyness.rst
@@ -0,0 +1,79 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2022 Intel Corporation.
+
+Lcore Poll Busyness Telemetry
+=============================
+
+The lcore poll busyness telemetry provides a built-in, generic method of gathering
+lcore utilization metrics for running applications. These metrics are exposed
+via a new telemetry endpoint.
+
+Since most DPDK APIs poll for packets, the poll busyness is calculated based on
+APIs receiving packets. Empty polls are considered idle, while non-empty polls
+are considered busy. Using the number of cycles spent processing empty polls, the
+busyness can be calculated and recorded.
+
+Application Specified Busyness
+------------------------------
+
+Improved accuracy of the reported busyness may need more contextual awareness
+from the application. For example, a pipelined application may make a number of
+calls to rx_burst before processing packets. Any processing done on this 'bulk'
+would need to be marked as "busy" cycles, not just the last received burst. This
+type of awareness is only available within the application.
+
+Applications can be modified to incorporate the extra contextual awareness in
+order to improve the reported busyness by marking areas of code as "busy" or
+"idle" appropriately. This can be done by inserting the timestamping macro::
+
+    RTE_LCORE_TELEMETRY_TIMESTAMP(0) /* to mark section as idle */
+    RTE_LCORE_TELEMETRY_TIMESTAMP(32) /* where 32 is nb_pkts to mark section as busy (non-zero is busy) */
+
+All cycles since the last state change will be counted towards the current state's
+counter.
+
+Consuming the Telemetry
+-----------------------
+
+The telemetry gathered for lcore poll busyness can be read with the
+``dpdk-telemetry.py`` script via the new ``/eal/lcore/poll_busyness`` endpoint::
+
+    $ ./usertools/dpdk-telemetry.py
+    --> /eal/lcore/poll_busyness
+    {"/eal/lcore/poll_busyness": {"12": -1, "13": 85, "14": 84}}
+
+* Cores not collecting busyness, e.g. control cores or inactive cores, will report "-1".
+* All enabled cores will report their busyness in the range 0-100.
+
+Disabling Lcore Poll Busyness Telemetry
+---------------------------------------
+
+Some applications may not want lcore poll busyness telemetry to be tracked, for
+example performance critical applications or applications that are already being
+monitored by other tools gathering similar or more application specific information.
+
+For those applications, there are two ways in which this telemetry can be disabled.
+
+At compile time
+^^^^^^^^^^^^^^^
+
+Support is enabled by default. It can be disabled at compile time via the
+meson option::
+
+    $ meson configure -Denable_lcore_poll_busyness=false
+
+At run time
+^^^^^^^^^^^
+
+Support can also be disabled at run time. This comes at the cost of an
+additional branch; however, no additional function calls are performed.
+
+To disable support at run time, a call can be made to the
+``/eal/lcore/poll_busyness_disable`` endpoint::
+
+    $ ./usertools/dpdk-telemetry.py
+    --> /eal/lcore/poll_busyness_disable
+    {"/eal/lcore/poll_busyness_disable": {"busyness_enabled": 0}}
+
+It can be re-enabled at run time with the ``/eal/lcore/poll_busyness_enable``
+endpoint.
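
As an illustration of how a monitoring tool might consume the reply shown in the guide, here is a minimal Python sketch. It is not part of the patch. It assumes the reply has exactly the JSON shape printed above (lcore IDs as string keys, -1 for lcores that are not collecting busyness). The helper name summarize_poll_busyness and the SAMPLE_REPLY constant are hypothetical, and the sketch parses a copied reply rather than talking to a running application.

```python
import json

# Sample reply copied from the guide; a live reply would be obtained
# through dpdk-telemetry.py against a running DPDK application.
SAMPLE_REPLY = '{"/eal/lcore/poll_busyness": {"12": -1, "13": 85, "14": 84}}'


def summarize_poll_busyness(reply_text):
    """Split a poll busyness reply into reporting and non-reporting lcores.

    Hypothetical helper: returns (reporting, not_collecting, mean), where
    reporting maps lcore ID -> busyness (0-100), not_collecting lists the
    lcore IDs that reported -1, and mean is the average busyness of the
    reporting lcores (None if there are none).
    """
    payload = json.loads(reply_text)["/eal/lcore/poll_busyness"]
    reporting = {}
    not_collecting = []
    for lcore, value in payload.items():
        if value == -1:
            # -1 marks lcores that do not collect busyness,
            # e.g. control cores or inactive cores.
            not_collecting.append(int(lcore))
        else:
            reporting[int(lcore)] = value
    mean = sum(reporting.values()) / len(reporting) if reporting else None
    return reporting, sorted(not_collecting), mean


if __name__ == "__main__":
    reporting, not_collecting, mean = summarize_poll_busyness(SAMPLE_REPLY)
    print("busyness per lcore:", reporting)
    print("not collecting:", not_collecting)
    print("mean busyness:", mean)
```

Keeping the -1 entries out of the average matters here: counting them as a busyness value would skew the result for every application that has control or inactive lcores.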