[RFC,2/4] eal: allow applications to report their cpu utilization

Message ID 20221123101931.1688238-3-rjarry@redhat.com (mailing list archive)
State Not Applicable, archived
Delegated to: Thomas Monjalon
Series lcore telemetry improvements

Checks

Context        Check    Description
ci/checkpatch  success  coding style OK

Commit Message

Robin Jarry Nov. 23, 2022, 10:19 a.m. UTC
Allow applications to register a callback that will be invoked in
rte_lcore_dump() and when requesting lcore info in the telemetry API.

The callback is expected to return a number between 0 and 100
representing the percentage of busy cycles spent over a fixed period of
time. The period of time is configured when registering the callback.

Cc: Bruce Richardson <bruce.richardson@intel.com>
Cc: Jerin Jacob <jerinj@marvell.com>
Cc: Kevin Laatz <kevin.laatz@intel.com>
Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Cc: Mattias Rönnblom <hofors@lysator.liu.se>
Cc: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
 lib/eal/common/eal_common_lcore.c | 37 ++++++++++++++++++++++++++++---
 lib/eal/include/rte_lcore.h       | 30 +++++++++++++++++++++++++
 lib/eal/version.map               |  1 +
 3 files changed, 65 insertions(+), 3 deletions(-)
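
For illustration, here is a minimal sketch of how an application might use the proposed API. The callback typedef and the registration prototype are taken from this patch. Everything prefixed with app_ (and APP_MAX_LCORES) is hypothetical, and the registration function is replaced by a local stand-in so that the snippet builds without DPDK. It assumes the application's own main loop computes the percentage and stores it per lcore.

```c
#define APP_MAX_LCORES 128 /* hypothetical limit, not RTE_MAX_LCORE */

/* Callback type as proposed in this patch. */
typedef int (*rte_lcore_busy_percent_cb)(unsigned int lcore_id);

/* Local stand-in for the EAL function, only so the sketch builds alone. */
static rte_lcore_busy_percent_cb registered_cb;
static unsigned int registered_period;

static void
rte_lcore_register_busy_percent_cb(rte_lcore_busy_percent_cb cb,
		unsigned int period)
{
	registered_cb = cb;
	registered_period = period;
}

/* Hypothetical per-lcore state maintained by the application. */
static int app_busy_percent[APP_MAX_LCORES];
static int app_busy_valid[APP_MAX_LCORES];

/* Called by the application whenever it has a fresh value for an lcore. */
static void
app_update_busy(unsigned int lcore_id, int percent)
{
	if (lcore_id >= APP_MAX_LCORES)
		return;
	app_busy_percent[lcore_id] = percent;
	app_busy_valid[lcore_id] = 1;
}

/* The callback: return the last stored value, or -1 if none is available. */
static int
app_busy_percent_cb(unsigned int lcore_id)
{
	if (lcore_id >= APP_MAX_LCORES || !app_busy_valid[lcore_id])
		return -1;
	return app_busy_percent[lcore_id];
}

/* Register the callback with a reporting period of 1 second. */
static void
app_init(void)
{
	rte_lcore_register_busy_percent_cb(app_busy_percent_cb, 1);
}
```

The sketch ignores synchronization. The callback may be invoked from a different thread than the one updating the values, so a real application would likely need atomic accesses here.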
  

Comments

Morten Brørup Nov. 23, 2022, 11:52 a.m. UTC | #1
> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Wednesday, 23 November 2022 11.19
> To: dev@dpdk.org
> Cc: Bruce Richardson; Jerin Jacob; Kevin Laatz; Konstantin Ananyev;
> Mattias Rönnblom; Morten Brørup; Robin Jarry
> Subject: [RFC PATCH 2/4] eal: allow applications to report their cpu
> utilization
> 
> Allow applications to register a callback that will be invoked in
> rte_lcore_dump() and when requesting lcore info in the telemetry API.
> 
> The callback is expected to return a number between 0 and 100
> representing the percentage of busy cycles spent over a fixed period of
> time. The period of time is configured when registering the callback.
> 
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Cc: Jerin Jacob <jerinj@marvell.com>
> Cc: Kevin Laatz <kevin.laatz@intel.com>
> Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Cc: Mattias Rönnblom <hofors@lysator.liu.se>
> Cc: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---

This patch simply provides a function for the application to register a constant X and a callback, which returns Y.

X happens to be a duration in seconds.
Y can be a number between 0 and 100, and happens to be the lcore busyness (to be calculated by the application).

So I agree that it contains no controversial calculations. :-)

However, if the lcore busyness is supposed to be used for power management or similar, it must have much higher resolution than one second.

Also, CPU usage is often reported in multiple time intervals, e.g. /proc/loadavg provides 1, 5 and 15 minute load averages.

Perhaps a deeper issue is that the output could also be considered statistics, which is handled differently in different applications. E.g. the statistics module in the SmartShare StraightShaper application includes histories in multiple time resolutions, e.g. 5 minutes in 1-second intervals, up to 1 year in 1 day intervals.

On the other hand, if the application must expose 1/5/10 minute statistics, it could register a callback with a 1 minute interval, and aggregate the numbers in its own statistics module.
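
A rough sketch of such an aggregation, assuming the application receives one busy percentage per minute and only needs plain averages over the most recent samples. The names busy_history, history_push and history_avg are made up for this example and exist neither in DPDK nor in this patch.

```c
#define HISTORY_LEN 10 /* keep the last 10 one-minute samples */

/* Hypothetical ring buffer of per-minute busy percentages. */
struct busy_history {
	int samples[HISTORY_LEN];
	unsigned int count; /* number of valid samples, at most HISTORY_LEN */
	unsigned int next;  /* index where the next sample will be stored */
};

/* Store a new one-minute sample, overwriting the oldest one when full. */
static void
history_push(struct busy_history *h, int percent)
{
	h->samples[h->next] = percent;
	h->next = (h->next + 1) % HISTORY_LEN;
	if (h->count < HISTORY_LEN)
		h->count++;
}

/*
 * Average of the n most recent samples (integer division).
 * Returns -1 if n is 0 or if fewer than n samples have been stored.
 */
static int
history_avg(const struct busy_history *h, unsigned int n)
{
	unsigned int i;
	int sum = 0;

	if (n == 0 || n > h->count)
		return -1;
	for (i = 1; i <= n; i++)
		sum += h->samples[(h->next + HISTORY_LEN - i) % HISTORY_LEN];
	return sum / (int)n;
}
```

With one sample pushed per minute, history_avg(&h, 1), history_avg(&h, 5) and history_avg(&h, 10) would give the 1, 5 and 10 minute figures. Averaging percentages like this is only meaningful because all samples cover intervals of equal length.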

Here's a completely different angle, considering how statistics are often collected and processed by SNMP based tools:

This patch is based on a "gauge" (i.e. the busyness percentage) and an "interval" (i.e. the duration the gauge covers). I have to sample this gauge exactly every interval to collect data for a busyness chart. If the application's reporting interval is 1 second, I must sample the gauge every second, or statistical information will be lost.

Instead, I would prefer the callback to return two counters: units_passed (e.g. number of cycles since application start) and units_busy (e.g. number of busy cycles since application start).

I can sample these at any interval, and calculate the busyness of that interval as the difference: (units_busy - units_busy_before) / (units_passed - units_passed_before).

If needed, I can also sample them at multiple intervals, e.g. every 1, 5 and 10 minutes, and expose in the "loadavg".

I can also sample them every millisecond if I need to react quickly to a sudden increase/drop in busyness.
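
To illustrate the calculation, here is a small sketch of the sampling side. The struct and function names (busy_sample, busy_ratio) are invented for this example; only the two counters units_passed and units_busy come from the proposal above. It assumes both counters only ever increase.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two monotonically increasing counters. */
struct busy_sample {
	uint64_t units_passed; /* e.g. cycles since application start */
	uint64_t units_busy;   /* e.g. busy cycles since application start */
};

/*
 * Busyness between two snapshots as a ratio in [0.0, 1.0].
 * Returns -1.0 if nothing passed between the two snapshots.
 * Unsigned subtraction keeps the deltas correct across counter wrap-around.
 */
static double
busy_ratio(const struct busy_sample *before, const struct busy_sample *now)
{
	uint64_t passed = now->units_passed - before->units_passed;
	uint64_t busy = now->units_busy - before->units_busy;

	if (passed == 0)
		return -1.0;
	if (busy > passed) /* guard against inconsistent snapshots */
		busy = passed;
	return (double)busy / (double)passed;
}
```

The sampler only has to keep the previous snapshot around, so the same pair of counters can serve a 1 ms power management loop and a 5 minute chart at the same time.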
  
Robin Jarry Nov. 23, 2022, 1:29 p.m. UTC | #2
Hi Morten,

Morten Brørup, Nov 23, 2022 at 12:52:
> This patch is based on a "gauge" (i.e. the busyness percentage) and an
> "interval" (i.e. the duration the gauge covers). I have to sample this
> gauge exactly every interval to collect data for a busyness chart. If
> the application's reporting interval is 1 second, I must sample the
> gauge every second, or statistical information will be lost.
>
> Instead, I would prefer the callback to return two counters:
> units_passed (e.g. number of cycles since application start) and
> units_busy (e.g. number of busy cycles since application start).
>
> I can sample these at any interval, and calculate the busyness of that
> interval as the difference: (units_busy - units_busy_before)
> / (units_passed - units_passed_before).
>
> If needed, I can also sample them at multiple intervals, e.g. every 1,
> 5 and 10 minutes, and expose in the "loadavg".
>
> I can also sample them every millisecond if I need to react quickly to
> a sudden increase/drop in busyness.

Your proposal makes a lot of sense and it will even be easier to
implement for applications. I'll do that for v2.

Thanks for the feedback.
  
Stephen Hemminger Nov. 23, 2022, 4:38 p.m. UTC | #3
On Wed, 23 Nov 2022 11:19:29 +0100
Robin Jarry <rjarry@redhat.com> wrote:

> +static rte_lcore_busy_percent_cb lcore_busy_cb;
> +static unsigned int lcore_busy_period;
> +
> +void
> +rte_lcore_register_busy_percent_cb(rte_lcore_busy_percent_cb cb, unsigned int period)
> +{
> +	lcore_busy_cb = cb;
> +	lcore_busy_period = period;
> +}
> +
> +static int
> +lcore_busy_percent(unsigned int lcore_id)
> +{
> +	int percent = -1;
> +	if (lcore_busy_cb)
> +		percent = lcore_busy_cb(lcore_id);
> +	if (percent > 100)
> +		percent = 100;
> +	return percent;
> +}

This is a case where floating point double precision might be
a better API.
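
To make the suggestion concrete, here is one possible shape of a floating point variant. This is only a sketch: the type name lcore_busy_percent_f_cb and the helper names are hypothetical and are not part of the patch. It keeps the "negative means not available" convention and the clamping to 100 from the integer version.

```c
#include <stddef.h>

/* Hypothetical double-returning variant of the callback type. */
typedef double (*lcore_busy_percent_f_cb)(unsigned int lcore_id);

static lcore_busy_percent_f_cb lcore_busy_f_cb;

/*
 * Normalize a value reported by the application:
 * anything negative (or NaN) becomes -1.0, anything above 100 becomes 100.
 */
static double
clamp_busy_percent(double percent)
{
	if (!(percent >= 0.0)) /* also true for NaN */
		return -1.0;
	if (percent > 100.0)
		return 100.0;
	return percent;
}

/* Floating point counterpart of lcore_busy_percent() from the patch. */
static double
lcore_busy_percent_f(unsigned int lcore_id)
{
	if (lcore_busy_f_cb == NULL)
		return -1.0;
	return clamp_busy_percent(lcore_busy_f_cb(lcore_id));
}
```

A double would let applications report fractional values such as 0.5% instead of rounding them down to 0.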
  

Patch

diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
index 31e3965dc5ad..9a85fd8854df 100644
--- a/lib/eal/common/eal_common_lcore.c
+++ b/lib/eal/common/eal_common_lcore.c
@@ -420,14 +420,36 @@  rte_lcore_iterate(rte_lcore_iterate_cb cb, void *arg)
 	return ret;
 }
 
+static rte_lcore_busy_percent_cb lcore_busy_cb;
+static unsigned int lcore_busy_period;
+
+void
+rte_lcore_register_busy_percent_cb(rte_lcore_busy_percent_cb cb, unsigned int period)
+{
+	lcore_busy_cb = cb;
+	lcore_busy_period = period;
+}
+
+static int
+lcore_busy_percent(unsigned int lcore_id)
+{
+	int percent = -1;
+	if (lcore_busy_cb)
+		percent = lcore_busy_cb(lcore_id);
+	if (percent > 100)
+		percent = 100;
+	return percent;
+}
+
 static int
 lcore_dump_cb(unsigned int lcore_id, void *arg)
 {
 	struct rte_config *cfg = rte_eal_get_configuration();
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
+	char busy_str[32];
 	const char *role;
 	FILE *f = arg;
-	int ret;
+	int ret, busy;
 
 	switch (cfg->lcore_role[lcore_id]) {
 	case ROLE_RTE:
@@ -446,9 +468,16 @@  lcore_dump_cb(unsigned int lcore_id, void *arg)
 
 	ret = eal_thread_dump_affinity(&lcore_config[lcore_id].cpuset, cpuset,
 		sizeof(cpuset));
-	fprintf(f, "lcore %u, socket %u, role %s, cpuset %s%s\n", lcore_id,
+	busy = lcore_busy_percent(lcore_id);
+	if (busy < 0) {
+		snprintf(busy_str, sizeof(busy_str), "%s", "N/A");
+	} else {
+		snprintf(busy_str, sizeof(busy_str), "%d%% last %u sec",
+			busy, lcore_busy_period);
+	}
+	fprintf(f, "lcore %u, socket %u, role %s, cpuset %s%s, busy %s\n", lcore_id,
 		rte_lcore_to_socket_id(lcore_id), role, cpuset,
-		ret == 0 ? "" : "...");
+		ret == 0 ? "" : "...", busy_str);
 	return 0;
 }
 
@@ -517,6 +546,8 @@  lcore_telemetry_info_cb(unsigned int lcore_id, void *arg)
 	rte_tel_data_add_dict_int(info->d, "socket", rte_lcore_to_socket_id(lcore_id));
 	rte_tel_data_add_dict_string(info->d, "role", role);
 	rte_tel_data_add_dict_string(info->d, "cpuset", cpuset);
+	rte_tel_data_add_dict_int(info->d, "busy_percent", lcore_busy_percent(lcore_id));
+	rte_tel_data_add_dict_int(info->d, "busy_period", lcore_busy_period);
 
 	return 0;
 }
diff --git a/lib/eal/include/rte_lcore.h b/lib/eal/include/rte_lcore.h
index 6938c3fd7b81..b1223eaa12bf 100644
--- a/lib/eal/include/rte_lcore.h
+++ b/lib/eal/include/rte_lcore.h
@@ -328,6 +328,36 @@  typedef int (*rte_lcore_iterate_cb)(unsigned int lcore_id, void *arg);
 int
 rte_lcore_iterate(rte_lcore_iterate_cb cb, void *arg);
 
+/**
+ * Callback to allow applications to report CPU utilization.
+ *
+ * @param lcore_id
+ *   The lcore to consider.
+ * @return
+ *   - A number between 0 and 100 representing the percentage of busy cycles
+ *     over the last period for the given lcore_id.
+ *   - -1 if the information is not available or if any error occurred.
+ */
+typedef int (*rte_lcore_busy_percent_cb)(unsigned int lcore_id);
+
+/**
+ * Register a callback from an application to be called in rte_lcore_dump()
+ * and the /eal/lcore/info telemetry endpoint handler.
+ *
+ * Applications are expected to return a number between 0 and 100 representing
+ * the percentage of busy cycles over the last period for the provided lcore_id.
+ * The implementation details for computing such a ratio are specific to each
+ * application.
+ *
+ * @param cb
+ *   The callback function.
+ * @param period
+ *   The period in seconds over which the percentage of busy cycles will be
+ *   reported by the application.
+ */
+__rte_experimental
+void rte_lcore_register_busy_percent_cb(rte_lcore_busy_percent_cb cb, unsigned int period);
+
 /**
  * List all lcores.
  *
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 7ad12a7dc985..138537ee5835 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -440,6 +440,7 @@  EXPERIMENTAL {
 	rte_thread_detach;
 	rte_thread_equal;
 	rte_thread_join;
+	rte_lcore_register_busy_percent_cb;
 };
 
 INTERNAL {