[v8,9/9] app/test: add memarea to malloc-perf-autotest

Message ID: 20221011121720.2657-10-fengchengwen@huawei.com
State: Superseded, archived
Delegated to: Thomas Monjalon
Series: introduce memarea library

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/github-robot: build          fail     github build: failed
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-x86_64-unit-testing      fail     Testing issues
ci/iol-x86_64-compile-testing   fail     Testing issues
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/Intel-compilation            fail     Compilation issues
ci/intel-Testing                success  Testing PASS

Commit Message

fengchengwen Oct. 11, 2022, 12:17 p.m. UTC
  This patch adds memarea to malloc_perf_autotest.

Test platform: Kunpeng920
Test command: dpdk-test -a 0000:7d:00.3 -l 10-12
Test result:
USER1: Performance: rte_memarea
USER1:    Size (B)   Runs  Alloc (us) Free (us) Total (us)  memset (us)
USER1:          64  10000        0.03      0.03       0.06         0.01
USER1:         128  10000        0.02      0.03       0.05         0.01
USER1:        1024  10000        0.03      0.05       0.07         0.20
USER1:        4096  10000        0.03      0.05       0.07         0.34
USER1:       65536  10000        0.10      0.08       0.18         2.14
USER1:     1048576    644        0.10      0.04       0.14        29.07
USER1:     2097152    322        0.10      0.04       0.14        57.50
USER1:     4194304    161        0.12      0.04       0.15       114.50
USER1:    16777216     40        0.11      0.04       0.15       456.09
USER1:  1073741824 Interrupted: out of memory. [1]

Compared with rte_malloc:
USER1: Performance: rte_malloc
USER1:    Size (B)   Runs  Alloc (us) Free (us) Total (us)  memset (us)
USER1:          64  10000        0.14      0.07       0.21         0.01
USER1:         128  10000        0.10      0.05       0.15         0.01
USER1:        1024  10000        0.11      0.18       0.29         0.21
USER1:        4096  10000        0.13      0.39       0.53         0.35
USER1:       65536  10000        0.17      2.27       2.44         2.15
USER1:     1048576  10000       37.21     71.63     108.84        29.08
USER1:     2097152  10000     8831.15    160.02    8991.17        63.52
USER1:     4194304  10000    47131.88    413.75   47545.62       173.79
USER1:    16777216   4221   119604.60   2209.73  121814.34       964.42
USER1:  1073741824     31   335058.32 223369.31  558427.63     62440.87

[1] The total size of the memarea is restricted so that its creation
does not fail.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 app/test/test_malloc_perf.c | 55 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)
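
The Runs column in the rte_memarea results can be cross-checked against
the memarea size requested in the patch (PERFTEST_MAX_RUNS * KB * 66 bytes).
A minimal sketch of that arithmetic, assuming KB is 1024 and ignoring
per-object metadata; estimate_runs() is a hypothetical helper and not part
of the patch:

```c
#include <assert.h>
#include <stddef.h>

#define PERFTEST_MAX_RUNS 10000
#define KB 1024 /* assumption: 1 KB = 1024 bytes, as in the test file */

/* Total memarea size requested by the patch: 10000 * 1 KB * 66. */
static const size_t total_sz = (size_t)PERFTEST_MAX_RUNS * KB * 66;

/*
 * Upper bound on the number of `size`-byte objects that fit in the
 * memarea, capped at PERFTEST_MAX_RUNS. Per-object metadata is ignored
 * (an assumption), so this is an estimate rather than the exact
 * behaviour of the library.
 */
static size_t
estimate_runs(size_t size)
{
	size_t fit;

	assert(size > 0);
	fit = total_sz / size;
	return fit < PERFTEST_MAX_RUNS ? fit : PERFTEST_MAX_RUNS;
}
```

Under these assumptions the estimate gives 644, 322, 161 and 40 runs for
the 1 MiB, 2 MiB, 4 MiB and 16 MiB rows, and 0 for 1 GiB, which is
consistent with the reported table and with note [1].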
  

Comments

Dmitry Kozlyuk Oct. 11, 2022, 3:58 p.m. UTC | #1
2022-10-11 12:17 (UTC+0000), Chengwen Feng:
> This patch adds memarea to malloc_perf_autotest.

This is not a fair comparison:
rte_malloc time includes obtaining memory from the system.
I think that memarea should have a dedicated benchmark,
because eventually it will be interesting to compare memarea
with different sources and algorithms.
It will also be possible to add the DPDK allocator to the comparison
by running it on an isolated heap that doesn't grow.
(In some distant future it would be cool to make the DPDK allocator pluggable!)
Some shared code between this benchmark and the new one can be factored out.
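
To illustrate the idea of shared benchmark code with pluggable allocators,
here is a minimal self-contained sketch in plain C. It is not DPDK code:
struct bench_allocator, the fixed arena and bench_alloc() are all
hypothetical names, and clock() is used only as a portable stand-in for a
cycle counter. The arena is reserved before timing starts and never grows,
so the timed loop does not include obtaining memory from the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical allocator interface for this sketch (not a DPDK type). */
struct bench_allocator {
	const char *name;
	void *(*alloc)(size_t size);
	void (*release)(void *ptr);
};

/* Fixed arena reserved up front; it never grows. */
#define ARENA_SIZE ((size_t)1 << 20)
static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

static void *
arena_alloc(size_t size)
{
	size_t need = (size + 15) & ~(size_t)15; /* 16-byte granularity */

	if (need < size || need > ARENA_SIZE - arena_used)
		return NULL; /* exhausted: no fallback to the system */
	arena_used += need;
	return arena + arena_used - need;
}

static void
arena_release(void *ptr)
{
	(void)ptr; /* bump arena: space is reclaimed only by arena_reset() */
}

static void
arena_reset(void)
{
	arena_used = 0;
}

/*
 * Time up to max_runs allocations of `size` bytes, stopping early when
 * the allocator runs out. Returns the number of completed runs and
 * stores the average allocation time (in microseconds) in *alloc_us.
 */
static size_t
bench_alloc(const struct bench_allocator *a, size_t size, size_t max_runs,
	    double *alloc_us)
{
	void **ptrs;
	clock_t t0, t1;
	size_t runs, i;

	assert(a != NULL && alloc_us != NULL);
	*alloc_us = 0.0;
	ptrs = calloc(max_runs, sizeof(*ptrs));
	if (ptrs == NULL)
		return 0;

	t0 = clock();
	for (runs = 0; runs < max_runs; runs++) {
		ptrs[runs] = a->alloc(size);
		if (ptrs[runs] == NULL)
			break;
	}
	t1 = clock();

	if (runs > 0)
		*alloc_us = (double)(t1 - t0) * 1e6 / CLOCKS_PER_SEC / runs;
	for (i = 0; i < runs; i++)
		a->release(ptrs[i]);
	free(ptrs);
	return runs;
}
```

The same bench_alloc() can be pointed at { "malloc", malloc, free } or at
{ "arena", arena_alloc, arena_release }; only the second keeps system
memory acquisition outside the measurement.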
  
fengchengwen Oct. 12, 2022, 8:03 a.m. UTC | #2
Hi Dmitry,

On 2022/10/11 23:58, Dmitry Kozlyuk wrote:
> 2022-10-11 12:17 (UTC+0000), Chengwen Feng:
>> This patch adds memarea to malloc_perf_autotest.
> 
> This is not a fair comparison:
> rte_malloc time includes obtaining memory from the system.

Yes, but I would like to keep this patch; at least it shows the difference.

> I think that memarea should have a dedicated benchmark,
> because eventually it will be interesting to compare memarea
> with different sources and algorithms.

It may take a long time to arrive at a benchmark that everyone agrees on.
I will try after this patch set is upstreamed.

> It will also be possible to add the DPDK allocator to the comparison
> by running it on an isolated heap that doesn't grow.
> (In some distant future it would be cool to make the DPDK allocator pluggable!)
> Some shared code between this benchmark and the new one can be factored out.
  

Patch

diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
index ccec43ae84..a8b4531fe3 100644
--- a/app/test/test_malloc_perf.c
+++ b/app/test/test_malloc_perf.c
@@ -7,10 +7,12 @@ 
 #include <rte_cycles.h>
 #include <rte_errno.h>
 #include <rte_malloc.h>
+#include <rte_memarea.h>
 #include <rte_memzone.h>
 
 #include "test.h"
 
+#define PERFTEST_MAX_RUNS	10000
 #define TEST_LOG(level, ...) RTE_LOG(level, USER1, __VA_ARGS__)
 
 typedef void * (alloc_t)(const char *name, size_t size, unsigned int align);
@@ -147,10 +149,52 @@  memzone_free(void *addr)
 	rte_memzone_free((struct rte_memzone *)addr);
 }
 
+static struct rte_memarea *test_ma;
+
+static int
+memarea_pre_env(void)
+{
+	struct rte_memarea_param init = { 0 };
+	snprintf(init.name, sizeof(init.name), "perftest");
+	init.source = RTE_MEMAREA_SOURCE_HEAP;
+	init.alg = RTE_MEMAREA_ALG_NEXTFIT;
+	init.total_sz = PERFTEST_MAX_RUNS * KB * 66; /* test for max 64KB (add 2KB for meta) */
+	init.mt_safe = 1;
+	init.numa_socket = SOCKET_ID_ANY;
+	init.bak_memarea = NULL;
+	test_ma = rte_memarea_create(&init);
+	if (test_ma == NULL) {
+		fprintf(stderr, "memarea create failed, skip memarea perftest!\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+memarea_clear_env(void)
+{
+	rte_memarea_destroy(test_ma);
+	test_ma = NULL;
+}
+
+static void *
+memarea_alloc(const char *name, size_t size, unsigned int align)
+{
+	RTE_SET_USED(name);
+	RTE_SET_USED(align);
+	return rte_memarea_alloc(test_ma, size, 0);
+}
+
+static void
+memarea_free(void *addr)
+{
+	rte_memarea_free(test_ma, addr);
+}
+
 static int
 test_malloc_perf(void)
 {
-	static const size_t MAX_RUNS = 10000;
+	static const size_t MAX_RUNS = PERFTEST_MAX_RUNS;
 
 	double memset_us_gb = 0;
 
@@ -168,6 +212,15 @@  test_malloc_perf(void)
 			NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0)
 		return -1;
 
+	if (memarea_pre_env() < 0)
+		return 0;
+	if (test_alloc_perf("rte_memarea", memarea_alloc, memarea_free,
+			memset, memset_us_gb, MAX_RUNS) < 0) {
+		memarea_clear_env();
+		return -1;
+	}
+	memarea_clear_env();
+
 	return 0;
 }