[v8,9/9] app/test: add memarea to malloc-perf-autotest
Commit Message
This patch adds memarea to malloc_perf_autotest.
Test platform: Kunpeng920
Test command: dpdk-test -a 0000:7d:00.3 -l 10-12
Test result:
USER1: Performance: rte_memarea
USER1:     Size (B)   Runs  Alloc (us)  Free (us)  Total (us)  memset (us)
USER1:           64  10000        0.03       0.03        0.06         0.01
USER1:          128  10000        0.02       0.03        0.05         0.01
USER1:         1024  10000        0.03       0.05        0.07         0.20
USER1:         4096  10000        0.03       0.05        0.07         0.34
USER1:        65536  10000        0.10       0.08        0.18         2.14
USER1:      1048576    644        0.10       0.04        0.14        29.07
USER1:      2097152    322        0.10       0.04        0.14        57.50
USER1:      4194304    161        0.12       0.04        0.15       114.50
USER1:     16777216     40        0.11       0.04        0.15       456.09
USER1:   1073741824  Interrupted: out of memory. [1]
Compared with rte_malloc:
USER1: Performance: rte_malloc
USER1:     Size (B)   Runs  Alloc (us)  Free (us)  Total (us)  memset (us)
USER1:           64  10000        0.14       0.07        0.21         0.01
USER1:          128  10000        0.10       0.05        0.15         0.01
USER1:         1024  10000        0.11       0.18        0.29         0.21
USER1:         4096  10000        0.13       0.39        0.53         0.35
USER1:        65536  10000        0.17       2.27        2.44         2.15
USER1:      1048576  10000       37.21      71.63      108.84        29.08
USER1:      2097152  10000     8831.15     160.02     8991.17        63.52
USER1:      4194304  10000    47131.88     413.75    47545.62       173.79
USER1:     16777216   4221   119604.60    2209.73   121814.34       964.42
USER1:   1073741824     31   335058.32  223369.31   558427.63     62440.87
[1] The total size of the memarea is restricted to avoid creation
failure, so the 1 GB case runs out of memory within the area.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
app/test/test_malloc_perf.c | 55 ++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
Comments
2022-10-11 12:17 (UTC+0000), Chengwen Feng:
> This patch adds memarea to malloc_perf_autotest.
>
> Test platform: Kunpeng920
> Test command: dpdk-test -a 0000:7d:00.3 -l 10-12
> Test result:
> USER1: Performance: rte_memarea
> USER1: Size (B) Runs Alloc (us) Free (us) Total (us) memset (us)
> USER1: 64 10000 0.03 0.03 0.06 0.01
> USER1: 128 10000 0.02 0.03 0.05 0.01
> USER1: 1024 10000 0.03 0.05 0.07 0.20
> USER1: 4096 10000 0.03 0.05 0.07 0.34
> USER1: 65536 10000 0.10 0.08 0.18 2.14
> USER1: 1048576 644 0.10 0.04 0.14 29.07
> USER1: 2097152 322 0.10 0.04 0.14 57.50
> USER1: 4194304 161 0.12 0.04 0.15 114.50
> USER1: 16777216 40 0.11 0.04 0.15 456.09
> USER1: 1073741824 Interrupted: out of memory. [1]
>
> Compared with rte_malloc:
> USER1: Performance: rte_malloc
> USER1: Size (B) Runs Alloc (us) Free (us) Total (us) memset (us)
> USER1: 64 10000 0.14 0.07 0.21 0.01
> USER1: 128 10000 0.10 0.05 0.15 0.01
> USER1: 1024 10000 0.11 0.18 0.29 0.21
> USER1: 4096 10000 0.13 0.39 0.53 0.35
> USER1: 65536 10000 0.17 2.27 2.44 2.15
> USER1: 1048576 10000 37.21 71.63 108.84 29.08
> USER1: 2097152 10000 8831.15 160.02 8991.17 63.52
> USER1: 4194304 10000 47131.88 413.75 47545.62 173.79
> USER1: 16777216 4221 119604.60 2209.73 121814.34 964.42
> USER1: 1073741824 31 335058.32 223369.31 558427.63 62440.87
>
> [1] The total-size of the memarea is restricted to avoid creation
> failed.
This is not a fair comparison:
rte_malloc time includes obtaining memory from the system.
I think that memarea should have a dedicated benchmark,
because eventually it will be interesting to compare memarea
with different sources and algorithms.
It will also be possible to add the DPDK allocator to the comparison
by running it on an isolated heap that doesn't grow.
(In some distant future it would be cool to make DPDK allocator pluggable!)
Some shared code between this benchmark and the new one can be factored out.
Hi Dmitry,
On 2022/10/11 23:58, Dmitry Kozlyuk wrote:
> 2022-10-11 12:17 (UTC+0000), Chengwen Feng:
>> This patch adds memarea to malloc_perf_autotest.
>>
>> Test platform: Kunpeng920
>> Test command: dpdk-test -a 0000:7d:00.3 -l 10-12
>> Test result:
>> USER1: Performance: rte_memarea
>> USER1: Size (B) Runs Alloc (us) Free (us) Total (us) memset (us)
>> USER1: 64 10000 0.03 0.03 0.06 0.01
>> USER1: 128 10000 0.02 0.03 0.05 0.01
>> USER1: 1024 10000 0.03 0.05 0.07 0.20
>> USER1: 4096 10000 0.03 0.05 0.07 0.34
>> USER1: 65536 10000 0.10 0.08 0.18 2.14
>> USER1: 1048576 644 0.10 0.04 0.14 29.07
>> USER1: 2097152 322 0.10 0.04 0.14 57.50
>> USER1: 4194304 161 0.12 0.04 0.15 114.50
>> USER1: 16777216 40 0.11 0.04 0.15 456.09
>> USER1: 1073741824 Interrupted: out of memory. [1]
>>
>> Compared with rte_malloc:
>> USER1: Performance: rte_malloc
>> USER1: Size (B) Runs Alloc (us) Free (us) Total (us) memset (us)
>> USER1: 64 10000 0.14 0.07 0.21 0.01
>> USER1: 128 10000 0.10 0.05 0.15 0.01
>> USER1: 1024 10000 0.11 0.18 0.29 0.21
>> USER1: 4096 10000 0.13 0.39 0.53 0.35
>> USER1: 65536 10000 0.17 2.27 2.44 2.15
>> USER1: 1048576 10000 37.21 71.63 108.84 29.08
>> USER1: 2097152 10000 8831.15 160.02 8991.17 63.52
>> USER1: 4194304 10000 47131.88 413.75 47545.62 173.79
>> USER1: 16777216 4221 119604.60 2209.73 121814.34 964.42
>> USER1: 1073741824 31 335058.32 223369.31 558427.63 62440.87
>>
>> [1] The total-size of the memarea is restricted to avoid creation
>> failed.
>
> This is not a fair comparison:
> rte_malloc time includes obtaining memory from the system.
Yes, but I want to keep this patch; at least it shows the difference.
> I think that memarea should have a dedicated benchmark,
> because eventually it will be interesting to compare memarea
> with different sources and algorithms.
It may take a long time to reach a benchmark that everyone agrees on.
I will try after this patch set is upstreamed.
> It will be also possible to add DPDK allocator to the comparison
> by running it for an isolated heap that doesn't grow.
> (In some distant future it would be cool to make DPDK allocator pluggable!)
> Some shared code between this benchmark and the new one can be factored out.
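
For readers unfamiliar with the algorithm selected below via
RTE_MEMAREA_ALG_NEXTFIT: next-fit resumes the free-block search from where
the previous allocation left off instead of rescanning from the start. A
greatly simplified plain-C sketch (nf_init/nf_alloc/nf_free are hypothetical
names; the real rte_memarea implementation additionally handles alignment,
coalescing, locking and a backup area):

```c
#include <stddef.h>
#include <stdint.h>

struct nf_hdr {
	uint32_t size;  /* payload bytes following this header */
	uint32_t used;  /* 1 if allocated */
};

static _Alignas(8) uint8_t nf_pool[1 << 16];
static struct nf_hdr *nf_rover; /* where the next search starts */

/* Call nf_init() once before any allocation. */
static void
nf_init(void)
{
	struct nf_hdr *h = (struct nf_hdr *)nf_pool;
	h->size = sizeof(nf_pool) - sizeof(*h);
	h->used = 0;
	nf_rover = h;
}

static struct nf_hdr *
nf_next(struct nf_hdr *h)
{
	uint8_t *p = (uint8_t *)h + sizeof(*h) + h->size;
	if (p >= nf_pool + sizeof(nf_pool))
		return (struct nf_hdr *)nf_pool; /* wrap around */
	return (struct nf_hdr *)p;
}

static void *
nf_alloc(uint32_t size)
{
	struct nf_hdr *h = nf_rover;
	do {
		if (!h->used && h->size >= size) {
			/* split when the remainder can hold another block */
			if (h->size >= size + sizeof(*h) + 8) {
				struct nf_hdr *rest = (struct nf_hdr *)
					((uint8_t *)h + sizeof(*h) + size);
				rest->size = h->size - size - sizeof(*h);
				rest->used = 0;
				h->size = size;
			}
			h->used = 1;
			nf_rover = nf_next(h); /* next-fit: remember position */
			return (uint8_t *)h + sizeof(*h);
		}
		h = nf_next(h);
	} while (h != nf_rover); /* scanned the whole pool once */
	return NULL;
}

static void
nf_free(void *p)
{
	struct nf_hdr *h = (struct nf_hdr *)((uint8_t *)p - sizeof(*h));
	h->used = 0; /* a real implementation would also coalesce neighbors */
}
```

Because the rover advances past each allocation, freeing an early block does
not make the next allocation reuse its hole; the search continues forward,
which keeps the common alloc path short.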
@@ -7,10 +7,12 @@
#include <rte_cycles.h>
#include <rte_errno.h>
#include <rte_malloc.h>
+#include <rte_memarea.h>
#include <rte_memzone.h>
#include "test.h"
+#define PERFTEST_MAX_RUNS 10000
#define TEST_LOG(level, ...) RTE_LOG(level, USER1, __VA_ARGS__)
typedef void * (alloc_t)(const char *name, size_t size, unsigned int align);
@@ -147,10 +149,52 @@ memzone_free(void *addr)
rte_memzone_free((struct rte_memzone *)addr);
}
+static struct rte_memarea *test_ma;
+
+static int
+memarea_pre_env(void)
+{
+ struct rte_memarea_param init = { 0 };
+ snprintf(init.name, sizeof(init.name), "perftest");
+ init.source = RTE_MEMAREA_SOURCE_HEAP;
+ init.alg = RTE_MEMAREA_ALG_NEXTFIT;
+ init.total_sz = PERFTEST_MAX_RUNS * KB * 66; /* test for max 64KB (add 2KB for meta) */
+ init.mt_safe = 1;
+ init.numa_socket = SOCKET_ID_ANY;
+ init.bak_memarea = NULL;
+ test_ma = rte_memarea_create(&init);
+ if (test_ma == NULL) {
+ fprintf(stderr, "memarea create failed, skip memarea perftest!\n");
+ return -1;
+ }
+ return 0;
+}
+
+static void
+memarea_clear_env(void)
+{
+ rte_memarea_destroy(test_ma);
+ test_ma = NULL;
+}
+
+static void *
+memarea_alloc(const char *name, size_t size, unsigned int align)
+{
+ RTE_SET_USED(name);
+ RTE_SET_USED(align);
+ return rte_memarea_alloc(test_ma, size, 0);
+}
+
+static void
+memarea_free(void *addr)
+{
+ rte_memarea_free(test_ma, addr);
+}
+
static int
test_malloc_perf(void)
{
- static const size_t MAX_RUNS = 10000;
+ static const size_t MAX_RUNS = PERFTEST_MAX_RUNS;
double memset_us_gb = 0;
@@ -168,6 +212,15 @@ test_malloc_perf(void)
NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0)
return -1;
+ if (memarea_pre_env() < 0)
+ return 0;
+ if (test_alloc_perf("rte_memarea", memarea_alloc, memarea_free,
+ memset, memset_us_gb, MAX_RUNS) < 0) {
+ memarea_clear_env();
+ return -1;
+ }
+ memarea_clear_env();
+
return 0;
}
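
As an aside, the total_sz chosen in memarea_pre_env() also explains the
reduced run counts for large sizes in the log above: the test allocates
until the area is exhausted, so at most total_sz / size allocations can
complete. A back-of-envelope check (expected_runs() is a hypothetical
helper, not part of the patch; per-block metadata is ignored):

```c
#include <stdint.h>

#define PERFTEST_MAX_RUNS 10000
#define KB (1 << 10)

/* Upper bound on how many blocks of 'size' bytes fit in the memarea. */
static uint64_t
expected_runs(uint64_t size)
{
	uint64_t total_sz = (uint64_t)PERFTEST_MAX_RUNS * KB * 66; /* ~676 MB */
	return total_sz / size;
}
/* expected_runs(1048576)  == 644  -- matches the 644 runs in the log
 * expected_runs(2097152)  == 322
 * expected_runs(4194304)  == 161
 * expected_runs(16777216) == 40 */
```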