[RFC] move memset out of hold lock when rte_free

Message ID 20230831111937.60975-1-changfengnan@bytedance.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Series [RFC] move memset out of hold lock when rte_free

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation fail Compilation issues
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

Fengnan Chang Aug. 31, 2023, 11:19 a.m. UTC
In rte_free, the most time-consuming part of the whole process is
the memset. We can do the memset without holding heap->lock; the
benefit is reduced lock contention when multiple threads try to
allocate or free.
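
In a minimal sketch the change looks like this (illustrative only, not
the actual patch; free_element_locked() is a hypothetical placeholder
for the list/join work that still has to run under the lock):

#include <string.h>
#include <rte_spinlock.h>

static void
heap_free_sketch(struct malloc_heap *heap, struct malloc_elem *elem,
		void *data, size_t data_len)
{
	/* Clear the (clean) element's data area before taking the lock,
	 * so other lcores allocating/freeing on this heap are not
	 * blocked behind the memset.
	 */
	memset(data, 0, data_len);

	rte_spinlock_lock(&heap->lock);
	free_element_locked(heap, elem);	/* hypothetical helper */
	rte_spinlock_unlock(&heap->lock);
}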

In my test on a 40-core machine, I added some code to measure the
whole function cost in test_align_overlap_per_lcore with different
allocation sizes, under legacy memory mode without pre-existing
hugepage files. These are the test results:

size (bytes)   with patch   without patch
64             119us        118us
128            124us        118us
1024           137us        127us
4096           137us        140us
8192           142us        158us
16384          138us        186us
65536          139us        375us
131072         133us        627us
524277         694us        2973us
1048576        2117us       7685us
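
The numbers come from instrumentation added inside
test_align_overlap_per_lcore; as a rough sketch of the kind of timing
used (illustrative only, not the actual test code):

#include <stdint.h>
#include <rte_cycles.h>
#include <rte_malloc.h>

/* Time one alloc/free pair of the given size, in microseconds. */
static double
time_alloc_free_us(size_t size)
{
	uint64_t start = rte_rdtsc();
	void *p = rte_malloc(NULL, size, 0);

	rte_free(p);
	return (double)(rte_rdtsc() - start) * 1e6 / rte_get_tsc_hz();
}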

Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
---
 lib/eal/common/malloc_elem.c | 16 ----------------
 lib/eal/common/malloc_heap.c | 26 ++++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 18 deletions(-)
  

Comments

Stephen Hemminger Sept. 6, 2023, 3:08 p.m. UTC | #1
On Thu, 31 Aug 2023 19:19:37 +0800
Fengnan Chang <changfengnan@bytedance.com> wrote:

> +#ifndef RTE_MALLOC_DEBUG
> +	if (internal_conf->legacy_mem) {
> +		/* If orig_elem is dirty, the joint element is clean, we need do memset now */
> +		if (elem->orig_elem->dirty && !elem->dirty)
> +			memset(ptr, 0, data_len);
> +	} else if (!elem->dirty) {
> +		memset(ptr, 0, data_len);
> +	}
> +#else
> +	/* Always poison the memory in debug mode. */
> +	memset(ptr, MALLOC_POISON, data_len);
> +#endif

The code reads better if the positive clause comes first.
I.e.

#ifdef RTE_MALLOC_DEBUG
	/* Always poison the memory in debug mode. */
	memset(ptr, MALLOC_POISON, data_len);
#else
...
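
For reference, the reordered block would presumably look like the
following (the same branches as in the patch below, just with the debug
case first; shown here only as a sketch, not part of the submitted
patch):

#ifdef RTE_MALLOC_DEBUG
	/* Always poison the memory in debug mode. */
	memset(ptr, MALLOC_POISON, data_len);
#else
	if (internal_conf->legacy_mem) {
		/* If orig_elem is dirty but the joined element is clean,
		 * we need to do the memset now.
		 */
		if (elem->orig_elem->dirty && !elem->dirty)
			memset(ptr, 0, data_len);
	} else if (!elem->dirty) {
		memset(ptr, 0, data_len);
	}
#endif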
  
Fengnan Chang Sept. 12, 2023, 2:44 a.m. UTC | #2
This patch still has a problem; I'll fix it in the next version.

Stephen Hemminger <stephen@networkplumber.org> wrote on Wed, Sep 6, 2023 at 23:08:
>
> On Thu, 31 Aug 2023 19:19:37 +0800
> Fengnan Chang <changfengnan@bytedance.com> wrote:
>
> > +#ifndef RTE_MALLOC_DEBUG
> > +     if (internal_conf->legacy_mem) {
> > +             /* If orig_elem is dirty, the joint element is clean, we need do memset now */
> > +             if (elem->orig_elem->dirty && !elem->dirty)
> > +                     memset(ptr, 0, data_len);
> > +     } else if (!elem->dirty) {
> > +             memset(ptr, 0, data_len);
> > +     }
> > +#else
> > +     /* Always poison the memory in debug mode. */
> > +     memset(ptr, MALLOC_POISON, data_len);
> > +#endif
>
> The code reads better if the positive clause comes first.

Got it, I'll do as you suggest in the next version.
> I.e.
>
> #ifdef RTE_MALLOC_DEBUG
>         /* Always poison the memory in debug mode. */
>         memset(ptr, MALLOC_POISON, data_len);
> #else
> ...
  
Fengnan Chang Sept. 12, 2023, 9:13 a.m. UTC | #3
This problem has been fixed in this patch:
http://patches.dpdk.org/project/dpdk/patch/20230912090415.48709-1-changfengnan@bytedance.com/
I'm running a long-term test, especially for rte_zmalloc.
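
A long-term rte_zmalloc test mostly has to confirm that zeroed
allocation still holds after many free/alloc cycles; a minimal sketch
of such a check (illustrative only, not the actual test code):

#include <assert.h>
#include <stddef.h>
#include <rte_malloc.h>

/* rte_zmalloc() must keep returning fully zeroed memory even when the
 * clearing is done outside the heap lock on the free path.
 */
static void
check_zmalloc_zeroed(size_t size)
{
	unsigned char *p = rte_zmalloc(NULL, size, 0);
	size_t i;

	assert(p != NULL);
	for (i = 0; i < size; i++)
		assert(p[i] == 0);
	rte_free(p);
}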

Fengnan Chang <changfengnan@bytedance.com> wrote on Tue, Sep 12, 2023 at 10:44:
>
> This patch still has a problem; I'll fix it in the next version.
  

Patch

diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c
index 35a2313d04..763bbe179b 100644
--- a/lib/eal/common/malloc_elem.c
+++ b/lib/eal/common/malloc_elem.c
@@ -569,12 +569,6 @@  malloc_elem_join_adjacent_free(struct malloc_elem *elem)
 struct malloc_elem *
 malloc_elem_free(struct malloc_elem *elem)
 {
-	void *ptr;
-	size_t data_len;
-
-	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
-	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
-
 	/*
 	 * Consider the element clean for the purposes of joining.
 	 * If both neighbors are clean or non-existent,
@@ -591,16 +585,6 @@  malloc_elem_free(struct malloc_elem *elem)
 
 	/* decrease heap's count of allocated elements */
 	elem->heap->alloc_count--;
-
-#ifndef RTE_MALLOC_DEBUG
-	/* Normally clear the memory when needed. */
-	if (!elem->dirty)
-		memset(ptr, 0, data_len);
-#else
-	/* Always poison the memory in debug mode. */
-	memset(ptr, MALLOC_POISON, data_len);
-#endif
-
 	return elem;
 }
 
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index d25bdc98f9..a5fdc4cc6f 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -862,6 +862,8 @@  malloc_heap_free(struct malloc_elem *elem)
 	unsigned int i, n_segs, before_space, after_space;
 	int ret;
 	bool unmapped = false;
+	void *ptr;
+	size_t data_len;
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
@@ -875,16 +877,36 @@  malloc_heap_free(struct malloc_elem *elem)
 	msl = elem->msl;
 	page_sz = (size_t)msl->page_sz;
 
-	rte_spinlock_lock(&(heap->lock));
-
 	void *asan_ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN + elem->pad);
 	size_t asan_data_len = elem->size - MALLOC_ELEM_OVERHEAD - elem->pad;
 
+	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
+	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
+
+	/* If orig_elem is clean, any child elem should be clean, so let's do memset
+	 * before hold lock.
+	 */
+	if (internal_conf->legacy_mem && !elem->orig_elem->dirty)
+		memset(ptr, 0, data_len);
+
+	rte_spinlock_lock(&(heap->lock));
 	/* mark element as free */
 	elem->state = ELEM_FREE;
 
 	elem = malloc_elem_free(elem);
 
+#ifndef RTE_MALLOC_DEBUG
+	if (internal_conf->legacy_mem) {
+		/* If orig_elem is dirty, the joint element is clean, we need do memset now */
+		if (elem->orig_elem->dirty && !elem->dirty)
+			memset(ptr, 0, data_len);
+	} else if (!elem->dirty) {
+		memset(ptr, 0, data_len);
+	}
+#else
+	/* Always poison the memory in debug mode. */
+	memset(ptr, MALLOC_POISON, data_len);
+#endif
 	/* anything after this is a bonus */
 	ret = 0;