dmadev: standardize alignment and allocation

Message ID 20240202090633.10816-1-pbhagavatula@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series dmadev: standardize alignment and allocation

Checks

Context                       Check    Description
ci/checkpatch                 success  coding style OK
ci/loongarch-compilation      success  Compilation OK
ci/loongarch-unit-testing     success  Unit Testing PASS
ci/Intel-compilation          success  Compilation OK
ci/intel-Testing              success  Testing PASS
ci/github-robot: build        success  github build: passed
ci/intel-Functional           success  Functional PASS
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-intel-Functional       success  Functional Testing PASS
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-abi-testing            success  Testing PASS
ci/iol-broadcom-Functional    success  Functional Testing PASS
ci/iol-unit-arm64-testing     success  Testing PASS
ci/iol-compile-amd64-testing  success  Testing PASS
ci/iol-unit-amd64-testing     success  Testing PASS
ci/iol-compile-arm64-testing  success  Testing PASS
ci/iol-sample-apps-testing    success  Testing PASS
ci/iol-mellanox-Performance   success  Performance Testing PASS

Commit Message

Pavan Nikhilesh Bhagavatula Feb. 2, 2024, 9:06 a.m. UTC
  From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Align fp_objects to the cache line size, and allocate the
devices and fp_objects memory from hugepages.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 lib/dmadev/rte_dmadev.c      | 6 ++----
 lib/dmadev/rte_dmadev_core.h | 2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)
  

Comments

fengchengwen Feb. 4, 2024, 1:38 a.m. UTC | #1
Hi Pavan,

Allocating fp_objects from rte_memory is a good idea, but it may leak
rte_memory, especially in the multi-process scenario.

Currently, there is no mechanism for releasing rte_memory that
doesn't belong to any driver.

So I suggest that we add an rte_mem_align API, which allocates from libc,
and use it in these cases.

BTW: rte_dma_devices is only used in the control path, so it doesn't need
the rte_memory API, but I think it could use the new rte_mem_align API.
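
For illustration, a libc-backed helper of the kind suggested above might look
roughly like the sketch below. rte_mem_align is only a proposal in this
thread, so the name mem_zalloc_aligned and its signature are hypothetical.
The sketch wraps posix_memalign() and zeroes the buffer, so the memory stays
private to the calling process and is released with plain free().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not an existing DPDK API: return zeroed memory
 * from libc with the requested alignment, or NULL on failure.
 */
void *
mem_zalloc_aligned(size_t size, size_t align)
{
	void *ptr = NULL;

	/* posix_memalign() needs a power of two that is a multiple of
	 * sizeof(void *). */
	assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

	if (size == 0)
		return NULL;
	if (posix_memalign(&ptr, align, size) != 0)
		return NULL;

	memset(ptr, 0, size);
	return ptr; /* caller releases it with free() */
}
```

A caller would use it as mem_zalloc_aligned(size, 64) and pair it with
free(), so nothing is left behind in shared hugepage memory when the
process exits.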

Thanks

  
Pavan Nikhilesh Bhagavatula Feb. 10, 2024, 6:20 a.m. UTC | #2
> Hi Pavan,
> 
> Allocating fp_objects from rte_memory is a good idea, but it may leak
> rte_memory, especially in the multi-process scenario.
> 
> Currently, there is no mechanism for releasing rte_memory that
> doesn't belong to any driver.
>

Yeah, a secondary process will leak rte_zmalloc allocations if they are not freed.
The only option currently is to use mmap and allocate non-shared memory
in the secondary process, which is not ideal.
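
As a rough sketch of the mmap option mentioned above, the snippet below
allocates process-private, zero-filled memory with an anonymous mapping.
The helper names private_zalloc and private_free are hypothetical, and the
sketch uses regular pages rather than hugepages. Anonymous mappings are
page-aligned, so the result is also cache-line aligned.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map zero-filled memory that is private to the
 * calling process. Returns NULL on failure.
 */
void *
private_zalloc(size_t size)
{
	void *ptr;

	assert(size > 0);
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return ptr == MAP_FAILED ? NULL : ptr;
}

/* Hypothetical helper: release a mapping created by private_zalloc(). */
void
private_free(void *ptr, size_t size)
{
	if (ptr != NULL)
		munmap(ptr, size);
}
```

The kernel drops such a mapping when the process exits, so an unfreed
allocation does not outlive the secondary process.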
 
> So I suggest that we add an rte_mem_align API, which allocates from libc,
> and use it in these cases.
>

Yeah, maybe in the future we could add something like rte_zmalloc_private, which
would create new mappings in the secondary process. But that is out of scope for
this patch.
 
I will send a v2 dropping the malloc changes and keeping the cache alignment changes.
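
With the malloc path kept, alignment of fp_objects continues to rely on the
over-allocate-and-round-up pattern visible in the context lines of the diff
(size plus one cache line, then RTE_PTR_ALIGN). A minimal libc-only sketch of
that pattern follows. CACHE_LINE_SIZE is an assumed stand-in for
RTE_CACHE_LINE_SIZE, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64 /* assumption: stand-in for RTE_CACHE_LINE_SIZE */

/* Round ptr up to the next multiple of align (align is a power of two). */
static inline void *
ptr_align_ceil(void *ptr, uintptr_t align)
{
	return (void *)(((uintptr_t)ptr + align - 1) & ~(align - 1));
}

/*
 * Hypothetical helper: allocate size zeroed bytes plus one spare cache
 * line, and return a cache-line-aligned pointer inside that buffer.
 * The unaligned pointer is stored in *raw because free() needs it.
 */
void *
zalloc_cache_aligned(size_t size, void **raw)
{
	assert(raw != NULL);

	*raw = calloc(1, size + CACHE_LINE_SIZE);
	if (*raw == NULL)
		return NULL;

	return ptr_align_ceil(*raw, CACHE_LINE_SIZE);
}
```

The spare cache line guarantees that size usable bytes remain after the
pointer is rounded up by at most CACHE_LINE_SIZE - 1 bytes.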

> BTW: rte_dma_devices is only used in the control path, so it doesn't need
> the rte_memory API, but I think it could use the new rte_mem_align API.
> 
> Thanks
> 

Thanks,
Pavan.

  

Patch

diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 67434c805f43..1fe1434019f0 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -143,10 +143,9 @@  dma_fp_data_prepare(void)
 	 */
 	size = dma_devices_max * sizeof(struct rte_dma_fp_object) +
 		RTE_CACHE_LINE_SIZE;
-	ptr = malloc(size);
+	ptr = rte_zmalloc("", size, RTE_CACHE_LINE_SIZE);
 	if (ptr == NULL)
 		return -ENOMEM;
-	memset(ptr, 0, size);
 
 	rte_dma_fp_objs = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
 	for (i = 0; i < dma_devices_max; i++)
@@ -164,10 +163,9 @@  dma_dev_data_prepare(void)
 		return 0;
 
 	size = dma_devices_max * sizeof(struct rte_dma_dev);
-	rte_dma_devices = malloc(size);
+	rte_dma_devices = rte_zmalloc("", size, RTE_CACHE_LINE_SIZE);
 	if (rte_dma_devices == NULL)
 		return -ENOMEM;
-	memset(rte_dma_devices, 0, size);
 
 	return 0;
 }
diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
index 064785686f7f..e8239c2d22b6 100644
--- a/lib/dmadev/rte_dmadev_core.h
+++ b/lib/dmadev/rte_dmadev_core.h
@@ -73,7 +73,7 @@  struct rte_dma_fp_object {
 	rte_dma_completed_t        completed;
 	rte_dma_completed_status_t completed_status;
 	rte_dma_burst_capacity_t   burst_capacity;
-} __rte_aligned(128);
+} __rte_cache_aligned;
 
 extern struct rte_dma_fp_object *rte_dma_fp_objs;
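
The effect of the last hunk can be sketched with standard C11 alignas. In
DPDK, __rte_cache_aligned aligns to RTE_CACHE_LINE_SIZE, which is 64 on
typical x86 builds and 128 on some other targets. The structs below are
hypothetical stand-ins with made-up fields, not the real rte_dma_fp_object,
and CACHE_LINE_SIZE is an assumed value.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: stand-in for RTE_CACHE_LINE_SIZE */

/* Fixed 128-byte alignment, as with __rte_aligned(128). */
struct fp_fixed128 {
	alignas(128) void *dev_private;
	void (*submit)(void);
	void (*completed)(void);
};

/* Cache-line alignment, as with __rte_cache_aligned. */
struct fp_cacheline {
	alignas(CACHE_LINE_SIZE) void *dev_private;
	void (*submit)(void);
	void (*completed)(void);
};

/* sizeof is padded up to a multiple of the alignment, so each array
 * element occupies a whole number of aligned slots. */
static_assert(alignof(struct fp_fixed128) == 128, "128-byte aligned");
static_assert(sizeof(struct fp_fixed128) == 128, "one 128-byte slot");
static_assert(alignof(struct fp_cacheline) == CACHE_LINE_SIZE,
	      "cache-line aligned");
static_assert(sizeof(struct fp_cacheline) == CACHE_LINE_SIZE,
	      "one cache-line slot");
```

Under the 64-byte assumption, each element of an array of the second struct
takes half the space of the first, while every element still starts on a
cache-line boundary.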