[v3] dmadev: fix structure alignment

Message ID 20240320072332.1433526-1-wenwux.ma@intel.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Headers
Series [v3] dmadev: fix structure alignment

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing fail Testing issues
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS

Commit Message

Ma, WenwuX March 20, 2024, 7:23 a.m. UTC
  The structure rte_dma_dev needs to be cache-line aligned, but the
return value of malloc() may not be. When memset() is used to clear the
rte_dma_dev objects, this can cause a segmentation fault with clang on
x86 platforms.

This is because clang uses the "vmovaps" assembly instruction to
implement memset(), and vmovaps requires its memory operand (the
rte_dma_dev objects) to be aligned on a 16-byte boundary, otherwise a
general-protection exception (#GP) is generated.

Therefore, either additional memory is allocated so the objects can be
re-aligned, or the cache line alignment requirement is dropped from
rte_dma_dev. This patch chooses the former option to fix the issue.

Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
Cc: stable@dpdk.org

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
v2:
 - Because of performance drop, adjust the code to
   no longer demand cache line alignment
v3:
 - back to v1 patch

---
 lib/dmadev/rte_dmadev.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
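
For readers, the fix boils down to over-allocating by one cache line and
rounding the returned pointer up. Below is a minimal standalone sketch of
that idea; ptr_align() and CACHE_LINE_SIZE are illustrative stand-ins for
RTE_PTR_ALIGN and RTE_CACHE_LINE_SIZE, and the raw malloc() pointer is
kept only because it would be needed if the memory were ever freed:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define CACHE_LINE_SIZE 64 /* stand-in for RTE_CACHE_LINE_SIZE */

    /* Round p up to the next multiple of align (a power of two),
     * mirroring what RTE_PTR_ALIGN does.
     */
    static void *
    ptr_align(void *p, uintptr_t align)
    {
        return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
    }

    static void *raw;     /* original malloc() pointer, needed for free() */
    static void *aligned; /* cache-line-aligned base actually used */

    static int
    alloc_aligned_array(size_t nb_objs, size_t obj_size)
    {
        size_t size = nb_objs * obj_size + CACHE_LINE_SIZE;

        raw = malloc(size);
        if (raw == NULL)
            return -1;
        memset(raw, 0, size);

        aligned = ptr_align(raw, CACHE_LINE_SIZE);
        return 0;
    }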
  

Comments

fengchengwen March 20, 2024, 9:31 a.m. UTC | #1
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>

On 2024/3/20 15:23, Wenwu Ma wrote:
> The structure rte_dma_dev needs to be aligned to the cache line, but
> the return value of malloc may not be aligned to the cache line. When
> we use memset to clear the rte_dma_dev object, it may cause a segmentation
> fault in clang-x86-platform.
> 
> This is because clang uses the "vmovaps" assembly instruction for
> memset, which requires that the operands (rte_dma_dev objects) must
> aligned on a 16-byte boundary or a general-protection exception (#GP)
> is generated.
> 
> Therefore, either additional memory is applied for re-alignment, or the
> rte_dma_dev object does not require cache line alignment. The patch
> chooses the former option to fix the issue.
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
> v2:
>  - Because of performance drop, adjust the code to
>    no longer demand cache line alignment
> v3:
>  - back to v1 patch
> 
> ---
>  lib/dmadev/rte_dmadev.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> index 5953a77bd6..61e106d574 100644
> --- a/lib/dmadev/rte_dmadev.c
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -160,15 +160,25 @@ static int
>  dma_dev_data_prepare(void)
>  {
>  	size_t size;
> +	void *ptr;
>  
>  	if (rte_dma_devices != NULL)
>  		return 0;
>  
> -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> -	rte_dma_devices = malloc(size);
> -	if (rte_dma_devices == NULL)
> +	/* The dma device object is expected to align cacheline, but
> +	 * the return value of malloc may not be aligned to the cache line.
> +	 * Therefore, extra memory is applied for realignment.
> +	 * note: We do not call posix_memalign/aligned_alloc because it is
> +	 * version dependent on libc.
> +	 */
> +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> +		RTE_CACHE_LINE_SIZE;
> +	ptr = malloc(size);
> +	if (ptr == NULL)
>  		return -ENOMEM;
> -	memset(rte_dma_devices, 0, size);
> +	memset(ptr, 0, size);
> +
> +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
>  
>  	return 0;
>  }
>
  
Thomas Monjalon March 20, 2024, 11:37 a.m. UTC | #2
20/03/2024 08:23, Wenwu Ma:
> The structure rte_dma_dev needs to be aligned to the cache line, but
> the return value of malloc may not be aligned to the cache line. When
> we use memset to clear the rte_dma_dev object, it may cause a segmentation
> fault in clang-x86-platform.
> 
> This is because clang uses the "vmovaps" assembly instruction for
> memset, which requires that the operands (rte_dma_dev objects) must
> aligned on a 16-byte boundary or a general-protection exception (#GP)
> is generated.
> 
> Therefore, either additional memory is applied for re-alignment, or the
> rte_dma_dev object does not require cache line alignment. The patch
> chooses the former option to fix the issue.
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
[..]
> -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> -	rte_dma_devices = malloc(size);
> -	if (rte_dma_devices == NULL)
> +	/* The dma device object is expected to align cacheline, but
> +	 * the return value of malloc may not be aligned to the cache line.
> +	 * Therefore, extra memory is applied for realignment.
> +	 * note: We do not call posix_memalign/aligned_alloc because it is
> +	 * version dependent on libc.
> +	 */
> +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> +		RTE_CACHE_LINE_SIZE;
> +	ptr = malloc(size);
> +	if (ptr == NULL)
>  		return -ENOMEM;
> -	memset(rte_dma_devices, 0, size);
> +	memset(ptr, 0, size);
> +
> +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);

Why not using aligned_alloc()?
https://en.cppreference.com/w/c/memory/aligned_alloc
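
For reference, the allocation in dma_dev_data_prepare() rewritten around
aligned_alloc() might look roughly like the sketch below; it is only an
illustration of the alternative being discussed, not part of the patch.
C11 requires the size to be an integral multiple of the alignment, so it
is rounded up first (RTE_ALIGN_CEIL from rte_common.h is assumed here):

    size_t size = dma_devices_max * sizeof(struct rte_dma_dev);

    /* C11 aligned_alloc() requires size to be a multiple of the alignment. */
    size = RTE_ALIGN_CEIL(size, RTE_CACHE_LINE_SIZE);

    rte_dma_devices = aligned_alloc(RTE_CACHE_LINE_SIZE, size);
    if (rte_dma_devices == NULL)
        return -ENOMEM;
    memset(rte_dma_devices, 0, size);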
  
Ma, WenwuX March 21, 2024, 1:25 a.m. UTC | #3
Hi, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, March 20, 2024 7:37 PM
> To: fengchengwen@huawei.com; Ma, WenwuX <wenwux.ma@intel.com>
> Cc: dev@dpdk.org; Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] dmadev: fix structure alignment
> 
> 20/03/2024 08:23, Wenwu Ma:
> > The structure rte_dma_dev needs to be aligned to the cache line, but
> > the return value of malloc may not be aligned to the cache line. When
> > we use memset to clear the rte_dma_dev object, it may cause a
> > segmentation fault in clang-x86-platform.
> >
> > This is because clang uses the "vmovaps" assembly instruction for
> > memset, which requires that the operands (rte_dma_dev objects) must
> > aligned on a 16-byte boundary or a general-protection exception (#GP)
> > is generated.
> >
> > Therefore, either additional memory is applied for re-alignment, or
> > the rte_dma_dev object does not require cache line alignment. The
> > patch chooses the former option to fix the issue.
> >
> > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> [..]
> > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > -	rte_dma_devices = malloc(size);
> > -	if (rte_dma_devices == NULL)
> > +	/* The dma device object is expected to align cacheline, but
> > +	 * the return value of malloc may not be aligned to the cache line.
> > +	 * Therefore, extra memory is applied for realignment.
> > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > +	 * version dependent on libc.
> > +	 */
> > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > +		RTE_CACHE_LINE_SIZE;
> > +	ptr = malloc(size);
> > +	if (ptr == NULL)
> >  		return -ENOMEM;
> > -	memset(rte_dma_devices, 0, size);
> > +	memset(ptr, 0, size);
> > +
> > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> 
> Why not using aligned_alloc()?
> https://en.cppreference.com/w/c/memory/aligned_alloc
> 
> 
Because its availability depends on the libc version.
  
Thomas Monjalon March 21, 2024, 8:30 a.m. UTC | #4
21/03/2024 02:25, Ma, WenwuX:
> Hi, Thomas
> 
> From: Thomas Monjalon <thomas@monjalon.net>
> > 20/03/2024 08:23, Wenwu Ma:
> > > The structure rte_dma_dev needs to be aligned to the cache line, but
> > > the return value of malloc may not be aligned to the cache line. When
> > > we use memset to clear the rte_dma_dev object, it may cause a
> > > segmentation fault in clang-x86-platform.
> > >
> > > This is because clang uses the "vmovaps" assembly instruction for
> > > memset, which requires that the operands (rte_dma_dev objects) must
> > > aligned on a 16-byte boundary or a general-protection exception (#GP)
> > > is generated.
> > >
> > > Therefore, either additional memory is applied for re-alignment, or
> > > the rte_dma_dev object does not require cache line alignment. The
> > > patch chooses the former option to fix the issue.
> > >
> > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > [..]
> > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > -	rte_dma_devices = malloc(size);
> > > -	if (rte_dma_devices == NULL)
> > > +	/* The dma device object is expected to align cacheline, but
> > > +	 * the return value of malloc may not be aligned to the cache line.
> > > +	 * Therefore, extra memory is applied for realignment.
> > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > +	 * version dependent on libc.
> > > +	 */
> > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > +		RTE_CACHE_LINE_SIZE;
> > > +	ptr = malloc(size);
> > > +	if (ptr == NULL)
> > >  		return -ENOMEM;
> > > -	memset(rte_dma_devices, 0, size);
> > > +	memset(ptr, 0, size);
> > > +
> > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > 
> > Why not using aligned_alloc()?
> > https://en.cppreference.com/w/c/memory/aligned_alloc
> > 
> > 
> because it is version dependent on libc.

Which libc is required?
  
Ma, WenwuX March 21, 2024, 8:57 a.m. UTC | #5
Hi, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, March 21, 2024 4:31 PM
> To: fengchengwen@huawei.com; Ma, WenwuX <wenwux.ma@intel.com>
> Cc: dev@dpdk.org; Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] dmadev: fix structure alignment
> 
> 21/03/2024 02:25, Ma, WenwuX:
> > Hi, Thomas
> >
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 20/03/2024 08:23, Wenwu Ma:
> > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > but the return value of malloc may not be aligned to the cache
> > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > cause a segmentation fault in clang-x86-platform.
> > > >
> > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > must aligned on a 16-byte boundary or a general-protection
> > > > exception (#GP) is generated.
> > > >
> > > > Therefore, either additional memory is applied for re-alignment,
> > > > or the rte_dma_dev object does not require cache line alignment.
> > > > The patch chooses the former option to fix the issue.
> > > >
> > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > [..]
> > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > -	rte_dma_devices = malloc(size);
> > > > -	if (rte_dma_devices == NULL)
> > > > +	/* The dma device object is expected to align cacheline, but
> > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > +	 * Therefore, extra memory is applied for realignment.
> > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > +	 * version dependent on libc.
> > > > +	 */
> > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > +		RTE_CACHE_LINE_SIZE;
> > > > +	ptr = malloc(size);
> > > > +	if (ptr == NULL)
> > > >  		return -ENOMEM;
> > > > -	memset(rte_dma_devices, 0, size);
> > > > +	memset(ptr, 0, size);
> > > > +
> > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > >
> > > Why not using aligned_alloc()?
> > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > >
> > >
> > because it is version dependent on libc.
> 
> Which libc is required?
> 
In the NOTE section of the link you gave there is this quote:

This function is not supported in Microsoft C Runtime library because its implementation of std::free is unable to handle aligned allocations of any kind. Instead, MS CRT provides _aligned_malloc (to be freed with _aligned_free).
  
Ma, WenwuX March 21, 2024, 9:18 a.m. UTC | #6
Hi, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, March 21, 2024 4:31 PM
> To: fengchengwen@huawei.com; Ma, WenwuX <wenwux.ma@intel.com>
> Cc: dev@dpdk.org; Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] dmadev: fix structure alignment
> 
> 21/03/2024 02:25, Ma, WenwuX:
> > Hi, Thomas
> >
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 20/03/2024 08:23, Wenwu Ma:
> > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > but the return value of malloc may not be aligned to the cache
> > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > cause a segmentation fault in clang-x86-platform.
> > > >
> > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > must aligned on a 16-byte boundary or a general-protection
> > > > exception (#GP) is generated.
> > > >
> > > > Therefore, either additional memory is applied for re-alignment,
> > > > or the rte_dma_dev object does not require cache line alignment.
> > > > The patch chooses the former option to fix the issue.
> > > >
> > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > [..]
> > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > -	rte_dma_devices = malloc(size);
> > > > -	if (rte_dma_devices == NULL)
> > > > +	/* The dma device object is expected to align cacheline, but
> > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > +	 * Therefore, extra memory is applied for realignment.
> > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > +	 * version dependent on libc.
> > > > +	 */
> > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > +		RTE_CACHE_LINE_SIZE;
> > > > +	ptr = malloc(size);
> > > > +	if (ptr == NULL)
> > > >  		return -ENOMEM;
> > > > -	memset(rte_dma_devices, 0, size);
> > > > +	memset(ptr, 0, size);
> > > > +
> > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > >
> > > Why not using aligned_alloc()?
> > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > >
> > >
> > because it is version dependent on libc.
> 
> Which libc is required?
> 

Running the 'man aligned_alloc' command gives the following description:

VERSIONS
       The functions memalign(), valloc(), and pvalloc() have been available in all Linux libc libraries.

       The function aligned_alloc() was added to glibc in version 2.16.

       The function posix_memalign() is available since glibc 2.1.91.
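
For completeness, the posix_memalign() variant of the discussed fix has a
different calling convention: it returns an error number instead of setting
errno, and the alignment must be a power of two that is also a multiple of
sizeof(void *). A rough sketch, again only illustrating the alternative:

    void *ptr;
    size_t size = dma_devices_max * sizeof(struct rte_dma_dev);

    /* Returns 0 on success, an error number otherwise; no NULL return. */
    if (posix_memalign(&ptr, RTE_CACHE_LINE_SIZE, size) != 0)
        return -ENOMEM;
    memset(ptr, 0, size);
    rte_dma_devices = ptr;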
  
Thomas Monjalon March 21, 2024, 10:06 a.m. UTC | #7
21/03/2024 10:18, Ma, WenwuX:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 21/03/2024 02:25, Ma, WenwuX:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 20/03/2024 08:23, Wenwu Ma:
> > > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > > but the return value of malloc may not be aligned to the cache
> > > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > > cause a segmentation fault in clang-x86-platform.
> > > > >
> > > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > > must aligned on a 16-byte boundary or a general-protection
> > > > > exception (#GP) is generated.
> > > > >
> > > > > Therefore, either additional memory is applied for re-alignment,
> > > > > or the rte_dma_dev object does not require cache line alignment.
> > > > > The patch chooses the former option to fix the issue.
> > > > >
> > > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > > [..]
> > > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > > -	rte_dma_devices = malloc(size);
> > > > > -	if (rte_dma_devices == NULL)
> > > > > +	/* The dma device object is expected to align cacheline, but
> > > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > > +	 * Therefore, extra memory is applied for realignment.
> > > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > > +	 * version dependent on libc.
> > > > > +	 */
> > > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > > +		RTE_CACHE_LINE_SIZE;
> > > > > +	ptr = malloc(size);
> > > > > +	if (ptr == NULL)
> > > > >  		return -ENOMEM;
> > > > > -	memset(rte_dma_devices, 0, size);
> > > > > +	memset(ptr, 0, size);
> > > > > +
> > > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > > >
> > > > Why not using aligned_alloc()?
> > > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > > >
> > > >
> > > because it is version dependent on libc.
> > 
> > Which libc is required?
> > 
> 
> using the 'man aligned_alloc' command, we has the following description:
> 
> VERSIONS
>        The functions memalign(), valloc(), and pvalloc() have been available in all Linux libc libraries.
> 
>        The function aligned_alloc() was added to glibc in version 2.16.

released in 2012-06-30

>        The function posix_memalign() is available since glibc 2.1.91.

I think we could bump our libc requirements for these functions.

I understand there is also a concern on Windows,
but an alternative exists there.
We may need a wrapper like "rte_alloc_align".
  
Tyler Retzlaff March 21, 2024, 4:05 p.m. UTC | #8
On Thu, Mar 21, 2024 at 11:06:34AM +0100, Thomas Monjalon wrote:
> 21/03/2024 10:18, Ma, WenwuX:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 21/03/2024 02:25, Ma, WenwuX:
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > 20/03/2024 08:23, Wenwu Ma:
> > > > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > > > but the return value of malloc may not be aligned to the cache
> > > > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > > > cause a segmentation fault in clang-x86-platform.
> > > > > >
> > > > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > > > must aligned on a 16-byte boundary or a general-protection
> > > > > > exception (#GP) is generated.
> > > > > >
> > > > > > Therefore, either additional memory is applied for re-alignment,
> > > > > > or the rte_dma_dev object does not require cache line alignment.
> > > > > > The patch chooses the former option to fix the issue.
> > > > > >
> > > > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > > > Cc: stable@dpdk.org
> > > > > >
> > > > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > > > [..]
> > > > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > > > -	rte_dma_devices = malloc(size);
> > > > > > -	if (rte_dma_devices == NULL)
> > > > > > +	/* The dma device object is expected to align cacheline, but
> > > > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > > > +	 * Therefore, extra memory is applied for realignment.
> > > > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > > > +	 * version dependent on libc.
> > > > > > +	 */
> > > > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > > > +		RTE_CACHE_LINE_SIZE;
> > > > > > +	ptr = malloc(size);
> > > > > > +	if (ptr == NULL)
> > > > > >  		return -ENOMEM;
> > > > > > -	memset(rte_dma_devices, 0, size);
> > > > > > +	memset(ptr, 0, size);
> > > > > > +
> > > > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > > > >
> > > > > Why not using aligned_alloc()?
> > > > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > > > >
> > > > >
> > > > because it is version dependent on libc.
> > > 
> > > Which libc is required?
> > > 
> > 
> > using the 'man aligned_alloc' command, we has the following description:
> > 
> > VERSIONS
> >        The functions memalign(), valloc(), and pvalloc() have been available in all Linux libc libraries.
> > 
> >        The function aligned_alloc() was added to glibc in version 2.16.
> 
> released in 2012-06-30

If we are using C11 we probably already implicitly depend on the glibc
that supports aligned_alloc (introduced in C11).

> 
> >        The function posix_memalign() is available since glibc 2.1.91.
> 
> I think we could bump our libc requirements for these functions.
> 
> I understand there is also a concern on Windows,
> but an alternative exists there.
> We may need a wrapper like "rte_alloc_align".

Yes, I'm afraid we would probably have to introduce
rte_aligned_alloc/rte_aligned_free. On Windows these would simply
forward to _aligned_malloc() and _aligned_free() respectively.

ty
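
One possible shape for such a wrapper, purely as a sketch of the suggestion
above: rte_aligned_alloc/rte_aligned_free are not an existing DPDK API, and
RTE_EXEC_ENV_WINDOWS is used here to select the MS CRT path.

    #include <stdlib.h>
    #ifdef RTE_EXEC_ENV_WINDOWS
    #include <malloc.h>
    #endif

    static inline void *
    rte_aligned_alloc(size_t align, size_t size)
    {
    #ifdef RTE_EXEC_ENV_WINDOWS
        /* MS CRT: memory from _aligned_malloc() must go to _aligned_free(). */
        return _aligned_malloc(size, align);
    #else
        void *ptr = NULL;

        if (posix_memalign(&ptr, align, size) != 0)
            return NULL;
        return ptr;
    #endif
    }

    static inline void
    rte_aligned_free(void *ptr)
    {
    #ifdef RTE_EXEC_ENV_WINDOWS
        _aligned_free(ptr);
    #else
        free(ptr);
    #endif
    }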
  

Patch

diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 5953a77bd6..61e106d574 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -160,15 +160,25 @@  static int
 dma_dev_data_prepare(void)
 {
 	size_t size;
+	void *ptr;
 
 	if (rte_dma_devices != NULL)
 		return 0;
 
-	size = dma_devices_max * sizeof(struct rte_dma_dev);
-	rte_dma_devices = malloc(size);
-	if (rte_dma_devices == NULL)
+	/* The DMA device objects are expected to be cache-line aligned, but
+	 * the return value of malloc may not be aligned to the cache line.
+	 * Therefore, extra memory is allocated for re-alignment.
+	 * Note: posix_memalign/aligned_alloc are not called here because
+	 * their availability depends on the libc version.
+	 */
+	size = dma_devices_max * sizeof(struct rte_dma_dev) +
+		RTE_CACHE_LINE_SIZE;
+	ptr = malloc(size);
+	if (ptr == NULL)
 		return -ENOMEM;
-	memset(rte_dma_devices, 0, size);
+	memset(ptr, 0, size);
+
+	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
 
 	return 0;
 }