[v8,5/5] eal: add lcore usage telemetry endpoint
Commit Message
Allow fetching CPU cycles usage for all lcores with a single request.
This endpoint is intended for repeated and frequent invocations by
external monitoring systems and therefore returns condensed data.
It consists of a single dictionary with three keys: "lcore_ids",
"total_cycles" and "busy_cycles" that are mapped to three arrays of
integer values. Each array has the same number of values, one per lcore,
in the same order.
Example:
--> /eal/lcore/usage
{
  "/eal/lcore/usage": {
    "lcore_ids": [
      4,
      5
    ],
    "total_cycles": [
      23846845590,
      23900558914
    ],
    "busy_cycles": [
      21043446682,
      21448837316
    ]
  }
}
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
---
Notes:
v7 -> v8: no change
doc/guides/rel_notes/release_23_03.rst | 5 +-
lib/eal/common/eal_common_lcore.c | 64 ++++++++++++++++++++++++++
2 files changed, 67 insertions(+), 2 deletions(-)
Comments
> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Thursday, 2 February 2023 14.43
>
> Allow fetching CPU cycles usage for all lcores with a single request.
> This endpoint is intended for repeated and frequent invocations by
> external monitoring systems and therefore returns condensed data.
>
> It consists of a single dictionary with three keys: "lcore_ids",
> "total_cycles" and "busy_cycles" that are mapped to three arrays of
> integer values. Each array has the same number of values, one per
> lcore,
> in the same order.
>
> Example:
>
> --> /eal/lcore/usage
> {
> "/eal/lcore/usage": {
> "lcore_ids": [
> 4,
> 5
> ],
> "total_cycles": [
> 23846845590,
> 23900558914
> ],
> "busy_cycles": [
> 21043446682,
> 21448837316
> ]
> }
> }
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Hi Robin,
On 2023/2/2 21:43, Robin Jarry wrote:
> Allow fetching CPU cycles usage for all lcores with a single request.
> This endpoint is intended for repeated and frequent invocations by
> external monitoring systems and therefore returns condensed data.
>
> It consists of a single dictionary with three keys: "lcore_ids",
> "total_cycles" and "busy_cycles" that are mapped to three arrays of
> integer values. Each array has the same number of values, one per lcore,
> in the same order.
>
> Example:
>
> --> /eal/lcore/usage
> {
> "/eal/lcore/usage": {
> "lcore_ids": [
> 4,
> 5
> ],
> "total_cycles": [
> 23846845590,
> 23900558914
> ],
> "busy_cycles": [
> 21043446682,
> 21448837316
> ]
> }
The telemetry should be human-readable also.
so why not "/eal/lcore/usage": {
"lcore_4" : {
"total_cycles" : xxx
"busy_cycles" : xxx
"busy/total ratio" : "xx%"
},
"lcore_5" : {
"total_cycles" : yyy
"busy_cycles" : yyy
"busy/total ratio" : "yy%"
},
}
> }
>
...
fengchengwen, Feb 06, 2023 at 04:27:
> The telemetry should be human-readable also.
>
> so why not "/eal/lcore/usage": {
> "lcore_4" : {
> "total_cycles" : xxx
> "busy_cycles" : xxx
> "busy/total ratio" : "xx%"
> },
> "lcore_5" : {
> "total_cycles" : yyy
> "busy_cycles" : yyy
> "busy/total ratio" : "yy%"
> },
> }
The raw data is exposed and can be rendered any way you like. This
should be left to external monitoring tools, such as grafana & al.
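As an illustration of the rendering that is left to the client, here is a minimal C sketch that turns the three parallel arrays into per-lcore lines with a busy percentage. The struct and function names (usage_sample, busy_percent, busy_percent_between, print_usage) are hypothetical and not part of DPDK. JSON parsing is omitted. The delta helper assumes that both counters only grow between two polls.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical client-side view of one /eal/lcore/usage reply:
 * three parallel arrays, index i describes the same lcore in each.
 */
struct usage_sample {
	size_t count;
	const int *lcore_ids;
	const uint64_t *total_cycles;
	const uint64_t *busy_cycles;
};

/* Busy share in percent; returns 0.0 when no cycles were counted. */
static double
busy_percent(uint64_t busy, uint64_t total)
{
	if (total == 0)
		return 0.0;
	return 100.0 * (double)busy / (double)total;
}

/*
 * Busy share between two polls. Assumes both counters only grow;
 * returns 0.0 if the newer sample does not advance.
 */
static double
busy_percent_between(uint64_t busy_prev, uint64_t total_prev,
	uint64_t busy_now, uint64_t total_now)
{
	if (total_now <= total_prev || busy_now < busy_prev)
		return 0.0;
	return busy_percent(busy_now - busy_prev, total_now - total_prev);
}

/* Print one line per lcore, close to the per-lcore layout proposed above. */
static void
print_usage(const struct usage_sample *s)
{
	for (size_t i = 0; i < s->count; i++)
		printf("lcore_%d: total_cycles=%" PRIu64
			" busy_cycles=%" PRIu64 " busy=%.1f%%\n",
			s->lcore_ids[i], s->total_cycles[i], s->busy_cycles[i],
			busy_percent(s->busy_cycles[i], s->total_cycles[i]));
}
```

A caller would fill a usage_sample from the decoded reply and pass it to print_usage(). A poller that keeps the previous sample can use busy_percent_between() to report load over the polling interval instead of since startup.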
On 2023/2/6 16:24, Robin Jarry wrote:
> fengchengwen, Feb 06, 2023 at 04:27:
>> The telemetry should be human-readable also.
>>
>> so why not "/eal/lcore/usage": {
>> "lcore_4" : {
>> "total_cycles" : xxx
>> "busy_cycles" : xxx
>> "busy/total ratio" : "xx%"
>> },
>> "lcore_5" : {
>> "total_cycles" : yyy
>> "busy_cycles" : yyy
>> "busy/total ratio" : "yy%"
>> },
>> }
>
> The raw data is exposed and can be rendered any way you like. This should be left to external monitoring tools, such as grafana & al.
It's a small step in programming, but it's more user friendly.
Once done, users of telemetry would benefit.
And it can still be rendered by monitoring tools because there is no data loss.
@@ -80,8 +80,9 @@ New Features
 * **Added support for reporting lcore usage in applications.**
-  * The ``/eal/lcore/list`` and ``/eal/lcore/info`` telemetry endpoints have
-    been added to provide information similar to ``rte_lcore_dump()``.
+  * The ``/eal/lcore/list``, ``/eal/lcore/usage`` and ``/eal/lcore/info``
+    telemetry endpoints have been added to provide information similar to
+    ``rte_lcore_dump()``.
   * Applications can register a callback at startup via
     ``rte_lcore_register_usage_cb()`` to provide lcore usage information.
@@ -577,6 +577,67 @@ handle_lcore_info(const char *cmd __rte_unused, const char *params, struct rte_t
 	return rte_lcore_iterate(lcore_telemetry_info_cb, &info);
 }
+struct lcore_telemetry_usage {
+	struct rte_tel_data *lcore_ids;
+	struct rte_tel_data *total_cycles;
+	struct rte_tel_data *busy_cycles;
+};
+
+static int
+lcore_telemetry_usage_cb(unsigned int lcore_id, void *arg)
+{
+	struct lcore_telemetry_usage *u = arg;
+	struct rte_lcore_usage usage;
+	rte_lcore_usage_cb usage_cb;
+
+	/* The callback may not set all the fields in the structure, so clear it here. */
+	memset(&usage, 0, sizeof(usage));
+	/*
+	 * Guard against concurrent modification of lcore_usage_cb.
+	 * rte_lcore_register_usage_cb() should only be called once at application init
+	 * but nothing prevents an application from resetting the callback to NULL.
+	 */
+	usage_cb = lcore_usage_cb;
+	if (usage_cb != NULL && usage_cb(lcore_id, &usage) == 0) {
+		rte_tel_data_add_array_int(u->lcore_ids, lcore_id);
+		rte_tel_data_add_array_u64(u->total_cycles, usage.total_cycles);
+		rte_tel_data_add_array_u64(u->busy_cycles, usage.busy_cycles);
+	}
+
+	return 0;
+}
+
+static int
+handle_lcore_usage(const char *cmd __rte_unused,
+	const char *params __rte_unused,
+	struct rte_tel_data *d)
+{
+	struct lcore_telemetry_usage usage;
+	struct rte_tel_data *lcore_ids = rte_tel_data_alloc();
+	struct rte_tel_data *total_cycles = rte_tel_data_alloc();
+	struct rte_tel_data *busy_cycles = rte_tel_data_alloc();
+
+	if (!lcore_ids || !total_cycles || !busy_cycles) {
+		rte_tel_data_free(lcore_ids);
+		rte_tel_data_free(total_cycles);
+		rte_tel_data_free(busy_cycles);
+		return -ENOMEM;
+	}
+
+	rte_tel_data_start_dict(d);
+	rte_tel_data_start_array(lcore_ids, RTE_TEL_INT_VAL);
+	rte_tel_data_start_array(total_cycles, RTE_TEL_U64_VAL);
+	rte_tel_data_start_array(busy_cycles, RTE_TEL_U64_VAL);
+	rte_tel_data_add_dict_container(d, "lcore_ids", lcore_ids, 0);
+	rte_tel_data_add_dict_container(d, "total_cycles", total_cycles, 0);
+	rte_tel_data_add_dict_container(d, "busy_cycles", busy_cycles, 0);
+	usage.lcore_ids = lcore_ids;
+	usage.total_cycles = total_cycles;
+	usage.busy_cycles = busy_cycles;
+
+	return rte_lcore_iterate(lcore_telemetry_usage_cb, &usage);
+}
+
 RTE_INIT(lcore_telemetry)
 {
 	rte_telemetry_register_cmd(
@@ -585,5 +646,8 @@ RTE_INIT(lcore_telemetry)
 	rte_telemetry_register_cmd(
 			"/eal/lcore/info", handle_lcore_info,
 			"Returns lcore info. Parameters: int lcore_id");
+	rte_telemetry_register_cmd(
+			"/eal/lcore/usage", handle_lcore_usage,
+			"Returns lcore cycles usage. Takes no parameters");
 }
 #endif /* !RTE_EXEC_ENV_WINDOWS */
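
The aggregation done by lcore_telemetry_usage_cb() can be sketched outside DPDK to show why the three arrays stay aligned: an lcore is appended to all three arrays or to none, and lcores whose callback fails are skipped. This is only a standalone model, not the patch itself. The types and names below (struct usage, struct usage_arrays, demo_usage_cb, collect_usage, MAX_LCORES) are hypothetical stand-ins for the DPDK structures and telemetry containers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORES 8 /* hypothetical bound for this sketch */

/* Stand-in for struct rte_lcore_usage (only the two fields used by the patch). */
struct usage {
	uint64_t total_cycles;
	uint64_t busy_cycles;
};

/* Stand-in for the three telemetry arrays filled by the handler. */
struct usage_arrays {
	size_t count;
	unsigned int lcore_ids[MAX_LCORES];
	uint64_t total_cycles[MAX_LCORES];
	uint64_t busy_cycles[MAX_LCORES];
};

typedef int (*usage_cb_t)(unsigned int lcore_id, struct usage *usage);

/* Hypothetical application callback: only lcores 4 and 5 report usage. */
static int
demo_usage_cb(unsigned int lcore_id, struct usage *usage)
{
	if (lcore_id != 4 && lcore_id != 5)
		return -1;
	usage->total_cycles = 1000 * lcore_id;
	usage->busy_cycles = 900 * lcore_id;
	return 0;
}

/*
 * Visit each lcore and append to all three arrays only when the callback
 * is set and succeeds, so index i always refers to the same lcore.
 */
static void
collect_usage(usage_cb_t cb, const unsigned int *lcores, size_t n,
	struct usage_arrays *out)
{
	out->count = 0;
	for (size_t i = 0; i < n && out->count < MAX_LCORES; i++) {
		/* The callback may not set every field, so start from zero. */
		struct usage u = {0};

		if (cb == NULL || cb(lcores[i], &u) != 0)
			continue;
		out->lcore_ids[out->count] = lcores[i];
		out->total_cycles[out->count] = u.total_cycles;
		out->busy_cycles[out->count] = u.busy_cycles;
		out->count++;
	}
}
```

With lcores 3, 4 and 5 and the demo callback, only lcores 4 and 5 end up in the arrays. With no callback registered, all three arrays stay empty, which corresponds to the handler returning a dictionary of empty arrays.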