[v5,2/2] mem: telemetry support for system memory information
Commit Message
This change adds telemetry support for displaying system memory information
allocated through calls such as malloc, calloc, and mmap. The patch
is based on malloc_info and adds the following endpoints:
1. /sysmem/sys_heap_list
This command displays the arenas currently in use.
Example:
--> /sysmem/sys_heap_list
{"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
2. /sysmem/sys_heap_info,<arena-id>
This command displays information about the given arena.
Example:
--> /sysmem/sys_heap_info,0
{"/sysmem/sys_heap_info": {"Arena_id": 0, "Allocated_size": 2069934, \
"Free_count": 4, "Free_size": 223826, "Curr_size": 2293760, \
"Mmap_count": 0, "Mmap_size": 0, "Heap_count": 0, \
"Heap_size": 2293760}}
--> /sysmem/sys_heap_info,6
{"/sysmem/sys_heap_info": {"Arena_id": 6, "Allocated_size": 3136, \
"Free_count": 2, "Free_size": 193472, "Curr_size": 196608, \
"Mmap_count": 0, "Mmap_size": 0, "Heap_count": 1, \
"Heap_size": 196608}}
The last arena-id in the list gives the total of all arenas.
--> /sysmem/sys_heap_info,10
{"/sysmem/sys_heap_info": {"Arena_id": 10, "Allocated_size": 2107774, \
"Free_count": 20, "Free_size": 1955458, "Curr_size": 4063232, \
"Mmap_count": 0, "Mmap_size": 0, "Heap_count": 0, \
"Heap_size": 4063232}}
Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
lib/eal/common/eal_common_memory.c | 330 +++++++++++++++++++++++++++++
1 file changed, 330 insertions(+)
Comments
On Thu, Sep 29, 2022 at 1:44 PM Amit Prakash Shukla
<amitprakashs@marvell.com> wrote:
>
> Changes adds telemetry support to display system memory information,
> allocated using calls malloc, calloc, mmap, etc. This patch
> is based on malloc_info. This patch adds following endpoints:
malloc_info is a GNU extension.
It is not available in musl and it breaks compilation in Alpine Linux.
So I can't take this patch.
>
> 1. /sysmem/sys_heap_list
> The commands displays the arenas currently in use.
> Example:
> --> /sysmem/sys_heap_list
> {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
>
I am unsure about the command names.
Those commands are really low level and tied to glibc malloc.
It is unlikely we will have a unified naming with other libc/OS.
I would prefer another pair of eyes, especially on this patch.
Dmitry, Anatoly?
Thanks David for the feedback. Please find the proposed fix inline.
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Saturday, October 8, 2022 1:17 AM
> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Thomas Monjalon
> <thomas@monjalon.net>; Anatoly Burakov <anatoly.burakov@intel.com>;
> dmitry.kozliuk@gmail.com
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> bruce.richardson@intel.com; ciara.power@intel.com
> Subject: [EXT] Re: [PATCH v5 2/2] mem: telemetry support for system
> memory information
>
> On Thu, Sep 29, 2022 at 1:44 PM Amit Prakash Shukla
> <amitprakashs@marvell.com> wrote:
> >
> > Changes adds telemetry support to display system memory information,
> > allocated using calls malloc, calloc, mmap, etc. This patch is based
> > on malloc_info. This patch adds following endpoints:
>
> malloc_info is a GNU extension.
> It is not available in musl and it breaks compilation in Alpine Linux.
> So I can't take this patch.
Shall we limit this to glibc at compile time?
+#if defined __GLIBC__ && defined __GLIBC_PREREQ
+#if __GLIBC_PREREQ(2, 10)
/* Gets system memory stat's XML format. */
ret = malloc_info(0, fp);
+#endif
+#else
+ RTE_LOG(DEBUG, EAL, "Error: malloc_info not supported by libc\n");
+ fclose(fp);
+ return -1;
+#endif
I will send out a v6 if the above solution is fine.
>
>
> >
> > 1. /sysmem/sys_heap_list
> > The commands displays the arenas currently in use.
> > Example:
> > --> /sysmem/sys_heap_list
> > {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> >
>
> I am unsure about the command names.
>
> Those commands are really low level and tied to glibc malloc.
> It is unlikely we will have an unified naming with other libc/OS.
>
> I would prefer another pair of eyes, especially on this patch.
> Dmitry, Anatoly?
>
>
> --
> David Marchand
2022-10-11 07:10 (UTC+0000), Amit Prakash Shukla:
> Thanks David for the feedback. Please find the proposed fixed inline.
>
> > On Thu, Sep 29, 2022 at 1:44 PM Amit Prakash Shukla
> > <amitprakashs@marvell.com> wrote:
> > >
> > > Changes adds telemetry support to display system memory information,
> > > allocated using calls malloc, calloc, mmap, etc. This patch is based
> > > on malloc_info. This patch adds following endpoints:
> >
> > malloc_info is a GNU extension.
> > It is not available in musl and it breaks compilation in Alpine Linux.
> > So I can't take this patch.
>
> Shall we limit this for Glibc at compile time ?
>
> +#if defined __GLIBC__ && defined __GLIBC_PREREQ
> +#if __GLIBC_PREREQ(2, 10)
> /* Gets system memory stat's XML format. */
> ret = malloc_info(0, fp);
> +#endif
> +#else
> + RTE_LOG(DEBUG, EAL, "Error: malloc_info not supported by libc\n");
> + fclose(fp);
> + return -1;
> +#endif
>
> I will send out a v6 if the above solution is fine.
>
> >
> >
> > >
> > > 1. /sysmem/sys_heap_list
> > > The commands displays the arenas currently in use.
> > > Example:
> > > --> /sysmem/sys_heap_list
> > > {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> > >
> >
> > I am unsure about the command names.
> >
> > Those commands are really low level and tied to glibc malloc.
> > It is unlikely we will have an unified naming with other libc/OS.
> >
> > I would prefer another pair of eyes, especially on this patch.
> > Dmitry, Anatoly?
I agree with David.
Microsoft CRT provides completely different statistics.
I can't say for FreeBSD.
Even if GNU libc is used, the process can be run with jemalloc, for example,
and these statistics become meaningless then.
What is "system" memory, anyway? It can be acquired bypassing malloc,
it can be hugepages mmap'd without DPDK, etc.
What is the task this API must solve?
My guess is that monitoring needs to know how much physical memory
1) the process uses and 2) how much it can use, so on Linux this API
could be implemented using getrusage() and getrlimit().
I don't think that a fully portable implementation is possible,
but at least the API will be unified.
On Thu, 20 Oct 2022 22:18:27 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> >
> > I will send out a v6 if the above solution is fine.
> >
> > >
> > >
> > > >
> > > > 1. /sysmem/sys_heap_list
> > > > The commands displays the arenas currently in use.
> > > > Example:
> > > > --> /sysmem/sys_heap_list
> > > > {"/sysmem/sys_heap_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> > > >
> > >
> > > I am unsure about the command names.
> > >
> > > Those commands are really low level and tied to glibc malloc.
> > > It is unlikely we will have an unified naming with other libc/OS.
> > >
> > > I would prefer another pair of eyes, especially on this patch.
> > > Dmitry, Anatoly?
>
> I agree with David.
>
> Microsoft CRT provides completely different statistics.
> I can't say for FreeBSD.
> Even if GNU libc is used, the process can be run with jemalloc, for example,
> and this statistics become meaningless then.
> What is "system" memory, anyway? It can be acquired bypassing malloc,
> it can be hugepages mmap'd without DPDK, etc.
>
> What is the task this API must solve?
> My guess is that monitoring needs to know how much physical memory
> 1) the process uses and 2) how much it can use, so on Linux this API
> could be implemented using getrusage() and getrlimit().
> I don't think that a fully portable implementation is possible,
> but at least API will be unified.
NAK
You are opening a mess here.
The malloc_info() exporting XML also means you have to parse the crap.
Telemetry should stick to the DPDK environment and not try to support
looking at other parts of the system. Otherwise, it risks becoming
tied to one OS and version.
@@ -9,6 +9,9 @@
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
+#ifdef RTE_EXEC_ENV_LINUX
+#include <malloc.h>
+#endif
#include <rte_fbarray.h>
#include <rte_memory.h>
@@ -1124,6 +1127,12 @@ rte_eal_memory_init(void)
#define EAL_MEMSEG_INFO_REQ "/eal/memseg_info"
#define EAL_ELEMENT_LIST_REQ "/eal/element_list"
#define EAL_ELEMENT_INFO_REQ "/eal/element_info"
+
+#ifdef RTE_EXEC_ENV_LINUX
+#define SYSMEMORY_LIST_REQ "/sysmem/sys_heap_list"
+#define SYSMEMORY_INFO_REQ "/sysmem/sys_heap_info"
+#endif
+
#define ADDR_STR 15
@@ -1685,6 +1694,318 @@ handle_eal_element_info_request(const char *cmd __rte_unused,
return 0;
}
+#ifdef RTE_EXEC_ENV_LINUX
+#define MAX_SYS_MEM_ARENAS 128
+#define MAX_TAG_CHAR 128
+
+/* Memory sizes are in bytes. */
+struct mem_stats {
+ uint64_t fast_count; /* Number of free blocks in fast bin. */
+ uint64_t fast_size; /* Size in bytes of free blocks in fast bin. */
+ uint64_t rest_count; /* Number of free blocks in bin. */
+ uint64_t rest_size; /* Size in bytes of free blocks in bin. */
+ uint64_t mmap_count; /* Number of mmap blocks. */
+ uint64_t mmap_size; /* Size in bytes of mmap'd memory. */
+ uint64_t curr_size; /* Size in bytes allocated by system. */
+ uint64_t heap_size; /* Heap size in bytes. */
+ uint64_t heap_count; /* Number of heaps. */
+};
+
+struct rte_heap_mem_stats {
+ unsigned int num_active_arena;
+ struct mem_stats stats[MAX_SYS_MEM_ARENAS];
+};
+
+/* This function shall be called to parse only attributes.
+ * Parsing of the "tags" shall be done by the caller.
+ */
+static int
+parse_attr(char *buf, uint32_t *i, char *attr, const char *key)
+{
+ int j = 0;
+ int keymatch = 0;
+
+ attr[j] = '\0';
+
+ while ((buf[*i] != '>') && (j < MAX_TAG_CHAR)) {
+ /* Ignore spaces. */
+ if (buf[*i] == ' ') {
+ attr[j] = '\0';
+ j = 0;
+ (*i)++;
+ continue;
+ }
+
+ /* Attribute key */
+ if (buf[*i] == '=') {
+ attr[j] = '\0';
+ j = 0;
+ (*i)++;
+
+ /* If the key is matched, extract the value. */
+ if (strncmp(attr, key, strlen(key)) != 0)
+ continue;
+ else
+ keymatch = 1;
+ }
+
+ /* Attribute value */
+ if ((buf[*i] == '"') && (keymatch == 1)) {
+ j = 0;
+ (*i)++;
+
+ while ((buf[*i] != '"') && (j < MAX_TAG_CHAR - 1))
+ attr[(j)++] = buf[(*i)++];
+ attr[j] = '\0';
+ (*i)++;
+ return 0;
+ }
+
+ keymatch = 0;
+ attr[(j)++] = buf[(*i)++];
+ }
+
+ (*i)++;
+ return -1;
+}
+
+/* Get the system memory stats into buffer by calling malloc_info().
+ * malloc_info() returns the stats in XML format. Parse the XML to extract
+ * number of heaps, size of each heap, free memory in heap.
+ */
+static int
+parse_heap_mem_stats(struct rte_heap_mem_stats *heap_stats)
+{
+ char tag[MAX_TAG_CHAR] = {0};
+ int old_mem_index = -1;
+ int mem_index = -1;
+ uint32_t i = 0;
+ uint32_t j = 0;
+ size_t length;
+ char *buf;
+ FILE *fp;
+ int ret;
+
+ /* buf is dynamically allocated by open_memstream. */
+ fp = open_memstream(&buf, &length);
+ if (fp == NULL) {
+ RTE_LOG(DEBUG, EAL, "Error: Failed to open memory stream\n");
+ return -1;
+ }
+
+ /* Get system memory stats in XML format. */
+ ret = malloc_info(0, fp);
+ fclose(fp);
+
+ if (ret != 0) {
+ RTE_LOG(DEBUG, EAL, "Error: malloc_info returned error\n");
+ return -1;
+ }
+
+ while (i < length) {
+ j = 0;
+ tag[j] = '\0';
+
+ /* Ignore newline and spaces. */
+ if ((buf[i] == '\n') || (buf[i] == ' ') || (buf[i] == '/') ||
+ (buf[i] == '>')) {
+ i++;
+ continue;
+ }
+
+ if (buf[i] == '<') {
+ i++;
+ while ((buf[i] != ' ') && (buf[i] != '>') &&
+ (j < MAX_TAG_CHAR)) {
+ tag[j++] = buf[i++];
+ }
+
+ if (strncmp(tag, "heap", strlen("heap")) == 0) {
+ old_mem_index = mem_index++;
+ if (mem_index >= MAX_SYS_MEM_ARENAS) {
+ RTE_LOG(DEBUG, EAL, "Memory arena "
+ "exceeded max limit: %d\n",
+ MAX_SYS_MEM_ARENAS);
+ goto done;
+ }
+ heap_stats->num_active_arena++;
+ }
+
+ continue;
+ }
+
+ if (mem_index < 0) {
+ i++;
+ continue;
+ }
+
+ if (parse_attr(buf, &i, tag, "type") < 0)
+ continue;
+
+ if (strncmp(tag, "fast", strlen("fast")) == 0) {
+ /* For the total of all arenas, the "heap" tag is not present
+ * in the XML. The check below handles that scenario.
+ *
+ * FIXME: mem_index increment shall be independent of
+ * the tag.
+ */
+ if (old_mem_index == mem_index) {
+ mem_index++;
+ if (mem_index >= MAX_SYS_MEM_ARENAS) {
+ RTE_LOG(DEBUG, EAL, "Memory arena "
+ "exceeded max limit: %d\n",
+ MAX_SYS_MEM_ARENAS);
+ goto done;
+ }
+ heap_stats->num_active_arena++;
+ }
+ old_mem_index = mem_index;
+
+ if (parse_attr(buf, &i, tag, "count") == 0)
+ heap_stats->stats[mem_index].fast_count =
+ strtoul(tag, NULL, 10);
+ if (parse_attr(buf, &i, tag, "size") == 0)
+ heap_stats->stats[mem_index].fast_size =
+ strtoul(tag, NULL, 10);
+ continue;
+ }
+
+ if (strncmp(tag, "rest", strlen("rest")) == 0) {
+ if (parse_attr(buf, &i, tag, "count") == 0)
+ heap_stats->stats[mem_index].rest_count =
+ strtoul(tag, NULL, 10);
+ if (parse_attr(buf, &i, tag, "size") == 0)
+ heap_stats->stats[mem_index].rest_size =
+ strtoul(tag, NULL, 10);
+ continue;
+ }
+
+ if (strncmp(tag, "current", strlen("current")) == 0) {
+ if (parse_attr(buf, &i, tag, "size") == 0)
+ heap_stats->stats[mem_index].curr_size =
+ strtoul(tag, NULL, 10);
+ continue;
+ }
+
+ if (strncmp(tag, "total", strlen("total")) == 0) {
+ if (parse_attr(buf, &i, tag, "size") == 0)
+ heap_stats->stats[mem_index].heap_size =
+ strtoul(tag, NULL, 10);
+ continue;
+ }
+
+ if (strncmp(tag, "subheaps", strlen("subheaps")) == 0) {
+ if (parse_attr(buf, &i, tag, "size") == 0)
+ heap_stats->stats[mem_index].heap_count =
+ strtoul(tag, NULL, 10);
+ continue;
+ }
+
+ if (strncmp(tag, "mmap", strlen("mmap")) == 0) {
+ if (parse_attr(buf, &i, tag, "count") == 0)
+ heap_stats->stats[mem_index].mmap_count =
+ strtoul(tag, NULL, 10);
+ if (parse_attr(buf, &i, tag, "size") == 0)
+ heap_stats->stats[mem_index].mmap_size =
+ strtoul(tag, NULL, 10);
+ continue;
+ }
+
+ i++;
+ }
+
+done:
+ /* All done! Let's free the buf. */
+ free(buf);
+ return 0;
+}
+
+static int
+handle_sysmem_list_request(const char *cmd __rte_unused,
+ const char *params __rte_unused,
+ struct rte_tel_data *d)
+{
+ struct rte_heap_mem_stats heap_mem_stats;
+ unsigned int num_arena;
+ unsigned int i;
+
+ memset(&heap_mem_stats, 0, sizeof(struct rte_heap_mem_stats));
+ if (parse_heap_mem_stats(&heap_mem_stats) != 0)
+ return -1;
+
+ /* Note:
+ * Total active arenas are (num_active_arena - 1). The last entry in
+ * the array is total of all arenas.
+ */
+ num_arena = heap_mem_stats.num_active_arena;
+
+ rte_tel_data_start_array(d, RTE_TEL_INT_VAL);
+ for (i = 0; i < num_arena; i++)
+ rte_tel_data_add_array_int(d, i);
+
+ return 0;
+}
+
+static int
+handle_sysmem_info_request(const char *cmd __rte_unused, const char *params,
+ struct rte_tel_data *d)
+{
+ struct rte_heap_mem_stats heap_mem_stats;
+ unsigned int arena_id;
+ uint64_t free_size;
+ uint64_t free_count;
+ uint64_t allocated_size;
+
+ if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+ return -1;
+
+ arena_id = (unsigned int)strtoul(params, NULL, 10);
+ if (arena_id > UINT32_MAX)
+ return -1;
+
+ if (arena_id >= MAX_SYS_MEM_ARENAS)
+ return -1;
+
+ memset(&heap_mem_stats, 0, sizeof(struct rte_heap_mem_stats));
+ if (parse_heap_mem_stats(&heap_mem_stats) != 0)
+ return -1;
+
+ if (arena_id >= heap_mem_stats.num_active_arena) {
+ RTE_LOG(DEBUG, EAL, "Invalid arena id %u, active arenas: %u\n",
+ arena_id, heap_mem_stats.num_active_arena);
+ return -1;
+ }
+
+ /* Fast and rest account for the total free memory. */
+ free_size = heap_mem_stats.stats[arena_id].fast_size +
+ heap_mem_stats.stats[arena_id].rest_size;
+
+ free_count = heap_mem_stats.stats[arena_id].fast_count +
+ heap_mem_stats.stats[arena_id].rest_count;
+
+ /* (System memory - free size) = allocated memory size. */
+ allocated_size = heap_mem_stats.stats[arena_id].curr_size - free_size;
+
+ rte_tel_data_start_dict(d);
+ rte_tel_data_add_dict_int(d, "Arena_id", arena_id);
+ rte_tel_data_add_dict_u64(d, "Allocated_size", allocated_size);
+ rte_tel_data_add_dict_u64(d, "Free_count", free_count);
+ rte_tel_data_add_dict_u64(d, "Free_size", free_size);
+ rte_tel_data_add_dict_u64(d, "Curr_size",
+ heap_mem_stats.stats[arena_id].curr_size);
+ rte_tel_data_add_dict_u64(d, "Mmap_count",
+ heap_mem_stats.stats[arena_id].mmap_count);
+ rte_tel_data_add_dict_u64(d, "Mmap_size",
+ heap_mem_stats.stats[arena_id].mmap_size);
+ rte_tel_data_add_dict_u64(d, "Heap_count",
+ heap_mem_stats.stats[arena_id].heap_count);
+ rte_tel_data_add_dict_u64(d, "Heap_size",
+ heap_mem_stats.stats[arena_id].heap_size);
+
+ return 0;
+}
+#endif
+
RTE_INIT(memory_telemetry)
{
rte_telemetry_register_cmd(
@@ -1716,5 +2037,14 @@ RTE_INIT(memory_telemetry)
rte_telemetry_register_cmd(EAL_ELEMENT_INFO_REQ,
handle_eal_element_info_request,
"Returns element info. Parameters: int heap_id, memseg_list_id, memseg_id, start_elem_id, end_elem_id");
+
+#ifdef RTE_EXEC_ENV_LINUX
+ rte_telemetry_register_cmd(SYSMEMORY_LIST_REQ,
+ handle_sysmem_list_request,
+ "Returns list of system memory arena ids. Takes no parameters");
+ rte_telemetry_register_cmd(SYSMEMORY_INFO_REQ,
+ handle_sysmem_info_request,
+ "Returns system memory arena information. Parameters: int arena_id");
+#endif
}
#endif