[v8,5/5] eal: add lcore usage telemetry endpoint

Message ID 20230202134329.539625-6-rjarry@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series lcore telemetry improvements

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/Intel-compilation            success  Compilation OK
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/intel-Testing                success  Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/github-robot: build          success  github build: passed
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/loongarch-compilation        success  Compilation OK
ci/loongarch-unit-testing       success  Unit Testing PASS
ci/iol-testing                  success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-abi-testing              success  Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS

Commit Message

Robin Jarry Feb. 2, 2023, 1:43 p.m. UTC
Allow fetching CPU cycles usage for all lcores with a single request.
This endpoint is intended for repeated and frequent invocations by
external monitoring systems and therefore returns condensed data.

It consists of a single dictionary with three keys: "lcore_ids",
"total_cycles" and "busy_cycles" that are mapped to three arrays of
integer values. Each array has the same number of values, one per lcore,
in the same order.

Example:

 --> /eal/lcore/usage
 {
   "/eal/lcore/usage": {
     "lcore_ids": [
       4,
       5
     ],
     "total_cycles": [
       23846845590,
       23900558914
     ],
     "busy_cycles": [
       21043446682,
       21448837316
     ]
   }
 }

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
---

Notes:
    v7 -> v8: no change

 doc/guides/rel_notes/release_23_03.rst |  5 +-
 lib/eal/common/eal_common_lcore.c      | 64 ++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 2 deletions(-)
  

Comments

Morten Brørup Feb. 2, 2023, 2 p.m. UTC | #1
> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Thursday, 2 February 2023 14.43
> 
> Allow fetching CPU cycles usage for all lcores with a single request.
> This endpoint is intended for repeated and frequent invocations by
> external monitoring systems and therefore returns condensed data.
> 
> It consists of a single dictionary with three keys: "lcore_ids",
> "total_cycles" and "busy_cycles" that are mapped to three arrays of
> integer values. Each array has the same number of values, one per
> lcore,
> in the same order.
> 
> Example:
> 
>  --> /eal/lcore/usage
>  {
>    "/eal/lcore/usage": {
>      "lcore_ids": [
>        4,
>        5
>      ],
>      "total_cycles": [
>        23846845590,
>        23900558914
>      ],
>      "busy_cycles": [
>        21043446682,
>        21448837316
>      ]
>    }
>  }
> 
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
> ---

Acked-by: Morten Brørup <mb@smartsharesystems.com>
  
Chengwen Feng Feb. 6, 2023, 3:27 a.m. UTC | #2
Hi Robin,

On 2023/2/2 21:43, Robin Jarry wrote:
> Allow fetching CPU cycles usage for all lcores with a single request.
> This endpoint is intended for repeated and frequent invocations by
> external monitoring systems and therefore returns condensed data.
> 
> It consists of a single dictionary with three keys: "lcore_ids",
> "total_cycles" and "busy_cycles" that are mapped to three arrays of
> integer values. Each array has the same number of values, one per lcore,
> in the same order.
> 
> Example:
> 
>  --> /eal/lcore/usage
>  {
>    "/eal/lcore/usage": {
>      "lcore_ids": [
>        4,
>        5
>      ],
>      "total_cycles": [
>        23846845590,
>        23900558914
>      ],
>      "busy_cycles": [
>        21043446682,
>        21448837316
>      ]
>    }

The telemetry should be human-readable also.

so why not "/eal/lcore/usage": {
    "lcore_4" : {
        "total_cycles" : xxx
        "busy_cycles" : xxx
        "busy/total ratio" : "xx%"
    },
    "lcore_5" : {
        "total_cycles" : yyy
        "busy_cycles" : yyy
        "busy/total ratio" : "yy%"
    },
}

>  }
> 

...
  
Robin Jarry Feb. 6, 2023, 8:24 a.m. UTC | #3
fengchengwen, Feb 06, 2023 at 04:27:
> The telemetry should be human-readable also.
>
> so why not "/eal/lcore/usage": {
>     "lcore_4" : {
>         "total_cycles" : xxx
>         "busy_cycles" : xxx
>         "busy/total ratio" : "xx%"
>     },
>     "lcore_5" : {
>         "total_cycles" : yyy
>         "busy_cycles" : yyy
>         "busy/total ratio" : "yy%"
>     },
> }

The raw data is exposed and can be rendered any way you like. This 
should be left to external monitoring tools, such as grafana & al.
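
As a rough illustration of that client-side rendering, the sketch below
turns the three parallel arrays into per-lcore records with a busy
percentage. It assumes the JSON reply has already been parsed into plain
C arrays; struct lcore_busy and lcore_usage_to_records() are
hypothetical names and not part of DPDK. The ratio is taken from a
single snapshot; a monitoring tool would more likely work on the
difference between two polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side record; not part of DPDK. */
struct lcore_busy {
	int lcore_id;
	double busy_pct; /* busy_cycles / total_cycles, in percent */
};

/*
 * Convert the three parallel arrays of /eal/lcore/usage into
 * per-lcore records. Index i of each input array refers to the
 * same lcore. Returns the number of records written to out.
 */
static size_t
lcore_usage_to_records(const int *lcore_ids, const uint64_t *total_cycles,
		const uint64_t *busy_cycles, size_t n, struct lcore_busy *out)
{
	size_t i;

	assert(n == 0 || (lcore_ids && total_cycles && busy_cycles && out));

	for (i = 0; i < n; i++) {
		out[i].lcore_id = lcore_ids[i];
		/* Avoid dividing by zero when no cycles were reported. */
		out[i].busy_pct = total_cycles[i] == 0 ? 0.0 :
			100.0 * (double)busy_cycles[i] / (double)total_cycles[i];
	}

	return n;
}
```

Fed with the values from the example reply, this yields roughly 88% for
lcore 4 and 90% for lcore 5, which is the per-lcore view requested
above, computed outside of EAL.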
  
Chengwen Feng Feb. 6, 2023, 11:32 a.m. UTC | #4
On 2023/2/6 16:24, Robin Jarry wrote:
> fengchengwen, Feb 06, 2023 at 04:27:
>> The telemetry should be human-readable also.
>>
>> so why not "/eal/lcore/usage": {
>>     "lcore_4" : {
>>         "total_cycles" : xxx
>>         "busy_cycles" : xxx
>>         "busy/total ratio" : "xx%"
>>     },
>>     "lcore_5" : {
>>         "total_cycles" : yyy
>>         "busy_cycles" : yyy
>>         "busy/total ratio" : "yy%"
>>     },
>> }
> 
> The raw data is exposed and can be rendered any way you like. This should be left to external monitoring tools, such as grafana & al.

It's a small step in programming, but it's more user-friendly.

Once done, users of telemetry would benefit directly.
Monitoring tools could still render it as well, because no data is lost.

  

Patch

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index f407dc3df7a8..31c282bbb489 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -80,8 +80,9 @@  New Features
 
 * **Added support for reporting lcore usage in applications.**
 
-  * The ``/eal/lcore/list`` and ``/eal/lcore/info`` telemetry endpoints have
-    been added to provide information similar to ``rte_lcore_dump()``.
+  * The ``/eal/lcore/list``, ``/eal/lcore/usage`` and ``/eal/lcore/info``
+    telemetry endpoints have been added to provide information similar to
+    ``rte_lcore_dump()``.
   * Applications can register a callback at startup via
     ``rte_lcore_register_usage_cb()`` to provide lcore usage information.
 
diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
index bbb734098b42..c28d4e194c30 100644
--- a/lib/eal/common/eal_common_lcore.c
+++ b/lib/eal/common/eal_common_lcore.c
@@ -577,6 +577,67 @@  handle_lcore_info(const char *cmd __rte_unused, const char *params, struct rte_t
 	return rte_lcore_iterate(lcore_telemetry_info_cb, &info);
 }
 
+struct lcore_telemetry_usage {
+	struct rte_tel_data *lcore_ids;
+	struct rte_tel_data *total_cycles;
+	struct rte_tel_data *busy_cycles;
+};
+
+static int
+lcore_telemetry_usage_cb(unsigned int lcore_id, void *arg)
+{
+	struct lcore_telemetry_usage *u = arg;
+	struct rte_lcore_usage usage;
+	rte_lcore_usage_cb usage_cb;
+
+	/* The callback may not set all the fields in the structure, so clear it here. */
+	memset(&usage, 0, sizeof(usage));
+	/*
+	 * Guard against concurrent modification of lcore_usage_cb.
+	 * rte_lcore_register_usage_cb() should only be called once at application init
+	 * but nothing prevents an application from resetting the callback to NULL.
+	 */
+	usage_cb = lcore_usage_cb;
+	if (usage_cb != NULL && usage_cb(lcore_id, &usage) == 0) {
+		rte_tel_data_add_array_int(u->lcore_ids, lcore_id);
+		rte_tel_data_add_array_u64(u->total_cycles, usage.total_cycles);
+		rte_tel_data_add_array_u64(u->busy_cycles, usage.busy_cycles);
+	}
+
+	return 0;
+}
+
+static int
+handle_lcore_usage(const char *cmd __rte_unused,
+		const char *params __rte_unused,
+		struct rte_tel_data *d)
+{
+	struct lcore_telemetry_usage usage;
+	struct rte_tel_data *lcore_ids = rte_tel_data_alloc();
+	struct rte_tel_data *total_cycles = rte_tel_data_alloc();
+	struct rte_tel_data *busy_cycles = rte_tel_data_alloc();
+
+	if (!lcore_ids || !total_cycles || !busy_cycles) {
+		rte_tel_data_free(lcore_ids);
+		rte_tel_data_free(total_cycles);
+		rte_tel_data_free(busy_cycles);
+		return -ENOMEM;
+	}
+
+	rte_tel_data_start_dict(d);
+	rte_tel_data_start_array(lcore_ids, RTE_TEL_INT_VAL);
+	rte_tel_data_start_array(total_cycles, RTE_TEL_U64_VAL);
+	rte_tel_data_start_array(busy_cycles, RTE_TEL_U64_VAL);
+	rte_tel_data_add_dict_container(d, "lcore_ids", lcore_ids, 0);
+	rte_tel_data_add_dict_container(d, "total_cycles", total_cycles, 0);
+	rte_tel_data_add_dict_container(d, "busy_cycles", busy_cycles, 0);
+	usage.lcore_ids = lcore_ids;
+	usage.total_cycles = total_cycles;
+	usage.busy_cycles = busy_cycles;
+
+	return rte_lcore_iterate(lcore_telemetry_usage_cb, &usage);
+}
+
 RTE_INIT(lcore_telemetry)
 {
 	rte_telemetry_register_cmd(
@@ -585,5 +646,8 @@  RTE_INIT(lcore_telemetry)
 	rte_telemetry_register_cmd(
 		"/eal/lcore/info", handle_lcore_info,
 		"Returns lcore info. Parameters: int lcore_id");
+	rte_telemetry_register_cmd(
+		"/eal/lcore/usage", handle_lcore_usage,
+		"Returns lcore cycles usage. Takes no parameters");
 }
 #endif /* !RTE_EXEC_ENV_WINDOWS */
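
The endpoint only reports lcores for which the application callback
returns 0. The sketch below shows what the application side could look
like, under stated assumptions: the struct and typedef are local
stand-ins that model only the two fields and the call shape used in the
patch, so that the example builds without DPDK headers; app_stats,
APP_MAX_LCORES, app_lcore_usage_cb() and query_lcore() are hypothetical
names. In a real application the callback would be passed to
rte_lcore_register_usage_cb() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the DPDK definitions (assumption: only the fields
 * and the callback shape visible in the patch are modelled). */
struct rte_lcore_usage {
	uint64_t total_cycles;
	uint64_t busy_cycles;
};
typedef int (*rte_lcore_usage_cb)(unsigned int lcore_id,
		struct rte_lcore_usage *usage);

#define APP_MAX_LCORES 8 /* hypothetical limit for this sketch */

/* Hypothetical counters updated by the application's polling loop. */
struct app_lcore_stats {
	int in_use;
	uint64_t total_cycles;
	uint64_t busy_cycles;
};
static struct app_lcore_stats app_stats[APP_MAX_LCORES];

/* Return 0 and fill usage for tracked lcores, -1 for all others. */
static int
app_lcore_usage_cb(unsigned int lcore_id, struct rte_lcore_usage *usage)
{
	if (lcore_id >= APP_MAX_LCORES || !app_stats[lcore_id].in_use)
		return -1;
	usage->total_cycles = app_stats[lcore_id].total_cycles;
	usage->busy_cycles = app_stats[lcore_id].busy_cycles;
	return 0;
}

/*
 * Mirror the filtering done in lcore_telemetry_usage_cb(): copy the
 * callback pointer once, clear the output, and report the lcore only
 * if a callback is set and it returns 0. Returns 1 if reported.
 */
static int
query_lcore(rte_lcore_usage_cb registered_cb, unsigned int lcore_id,
		struct rte_lcore_usage *out)
{
	rte_lcore_usage_cb cb = registered_cb;

	memset(out, 0, sizeof(*out));
	return cb != NULL && cb(lcore_id, out) == 0;
}
```

With such a callback, an lcore the application does not track is simply
absent from all three arrays, which keeps them the same length and in
the same order.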