[v2] dmadev: fix structure alignment

Message ID 20240315014331.1376720-1-wenwux.ma@intel.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Series [v2] dmadev: fix structure alignment

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS

Commit Message

Ma, WenwuX March 15, 2024, 1:43 a.m. UTC
The structure rte_dma_dev needs only 8-byte alignment.
This patch replaces the __rte_cache_aligned attribute of rte_dma_dev
with __rte_aligned(8).

Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
Cc: stable@dpdk.org

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
v2:
 - Because v1 caused a performance drop, adjust the code to
   no longer demand cache line alignment

---
 lib/dmadev/rte_dmadev_pmd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Tyler Retzlaff March 15, 2024, 6:02 a.m. UTC | #1
On Fri, Mar 15, 2024 at 09:43:31AM +0800, Wenwu Ma wrote:
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev
> with __rte_aligned(8).
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---

Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
  
fengchengwen March 15, 2024, 6:06 a.m. UTC | #2
Hi Wenwu,

On 2024/3/15 9:43, Wenwu Ma wrote:
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev
> with __rte_aligned(8).
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
> v2:
>  - Because of performance drop, adjust the code to
>    no longer demand cache line alignment

Between which two versions did you observe the performance drop, and in which benchmark?
Could you provide more information?

> 
> ---
>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> index 58729088ff..b569bb3502 100644
> --- a/lib/dmadev/rte_dmadev_pmd.h
> +++ b/lib/dmadev/rte_dmadev_pmd.h
> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>   * @internal
>   * The generic data structure associated with each DMA device.
>   */
> -struct __rte_cache_aligned rte_dma_dev {
> +struct __rte_aligned(8) rte_dma_dev {

The DMA fast path is implemented via rte_dma_fp_objs, not via rte_dma_dev,
so why is the alignment of rte_dma_dev a problem here?

Thanks

>  	/** Device info which supplied during device initialization. */
>  	struct rte_device *device;
>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data. */
>
  
Ma, WenwuX March 15, 2024, 6:25 a.m. UTC | #3
Hi Chengwen,

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 15, 2024 2:06 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> On 2024/3/15 9:43, Wenwu Ma wrote:
> > The structure rte_dma_dev needs only 8 byte alignment.
> > This patch replaces __rte_cache_aligned of rte_dma_dev with
> > __rte_aligned(8).
> >
> > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > ---
> > v2:
> >  - Because of performance drop, adjust the code to
> >    no longer demand cache line alignment
> 
> Which two versions observed performance drop? And which benchmark
> observed drop?
> Could you provide more information?
> 
> >
V1 patch:
https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/

To view detailed results, visit:
https://lab.dpdk.org/results/dashboard/patchsets/29472/

> > ---
> >  lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> > index 58729088ff..b569bb3502 100644
> > --- a/lib/dmadev/rte_dmadev_pmd.h
> > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >   * @internal
> >   * The generic data structure associated with each DMA device.
> >   */
> > -struct __rte_cache_aligned rte_dma_dev {
> > +struct __rte_aligned(8) rte_dma_dev {
> 
> The DMA fast-path was implemented by struct rte_dma_fp_objs, which is not
> rte_dma_dev? So why is it a problem here?
> 
> Thanks
> 
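The DMA device object is declared cache-line aligned, so clang may use the “vmovaps” instruction to access it.

That instruction requires an aligned memory operand (16 bytes for an XMM register, 32 bytes for a YMM register). If the object is not actually aligned that way, as happens in some environments, the access raises a general-protection fault, which the process sees as a segmentation fault.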


  
Ma, WenwuX March 15, 2024, 7:44 a.m. UTC | #4
Hi Chengwen,

> The DMA device object is expected to align cache line, so clang will use
> “vmovaps” assembly instruction,
> 
> And the instruction demands 16 bytes alignment or will cause segment fault in
> some environments.
> 
Test case:
1. Compile DPDK:
rm -rf x86_64-native-linuxapp-clang
CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
ninja -C x86_64-native-linuxapp-clang -j 72
2. Start dpdk-test:
/root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3
(Note: if the crash does not reproduce, try a different core list.)
3. Exit dpdk-test:
RTE>>quit
Segmentation fault (core dumped)

  
fengchengwen March 15, 2024, 8:31 a.m. UTC | #5
Hi Wenwu,

On 2024/3/15 15:44, Ma, WenwuX wrote:
> Test case:
> 1. compile dpdk 
> rm -rf x86_64-native-linuxapp-clang
> CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
> ninja -C x86_64-native-linuxapp-clang -j 72 
> 2. start dpdk-test
> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note: If it cannot be reproduced, please try using a different core)
> 3. exit dpdk-test
> RTE>>quit
> Segmentation fault (core dumped)

I will try to reproduce it, but one question remains: was the above test run with your patch [1] already applied, or does the current main branch have this problem?

[1] https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/

Thanks

  
Ma, WenwuX March 15, 2024, 9:27 a.m. UTC | #6
Hi Chengwen

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 15, 2024 4:32 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> I will try to reproduce, but still a question: does above test has already merged
> your patch [1] or the current main branch code has this problem?
> 
> [1] https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
> 
> Thanks
> 
The current main branch has this problem.

Both v1 and v2 of the patch fix it, but v1 has a performance issue.

  
Jiale, SongX March 19, 2024, 9:48 a.m. UTC | #7
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Friday, March 15, 2024 9:44 AM
> To: dev@dpdk.org; fengchengwen@huawei.com
> Cc: Jiale, SongX <songx.jiale@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>; stable@dpdk.org
> Subject: [PATCH v2] dmadev: fix structure alignment
> 
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev with
> __rte_aligned(8).
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
Tested-by: Jiale Song <songx.jiale@intel.com>
  
fengchengwen March 20, 2024, 4:11 a.m. UTC | #8
Hi Wenwu,

On 2024/3/15 17:27, Ma, WenwuX wrote:
>>> Test case:
>>> 1. compile dpdk
>>> rm -rf x86_64-native-linuxapp-clang
>>> CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
>>> ninja -C x86_64-native-linuxapp-clang -j 72
>>> 2. start dpdk-test
>>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note: If it cannot be reproduced, please try using a different core)
>>> 3. exit dpdk-test
>>> RTE>>quit
>>> Segmentation fault (core dumped)

I reproduced it with just --vdev=dma_skeleton.
Executing the quit command invokes rte_dma_close() -> dma_release(); please see my annotations (//) below:

void
dma_release(struct rte_dma_dev *dev)
{
	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		rte_free(dev->data->dev_private);
		memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
	}

	dma_fp_object_dummy(dev->fp_obj);
	memset(dev, 0, sizeof(struct rte_dma_dev));   // this memset is compiled to vmovaps:
						//  8c24da:       c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
						//  8c24de:       c5 fc 29 43 20          vmovaps %ymm0,0x20(%rbx)
						//  8c24e3:       c5 fc 29 03             vmovaps %ymm0,(%rbx)
						// but dev is not 32B aligned as the 256-bit vmovaps needs (in my env the rte_dma_devices addr is 0x15d39950, only 16B aligned)
}

>>
>> I will try to reproduce, but still a question: does above test has already merged
>> your patch [1] or the current main branch code has this problem?
>>
>> [1] https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
>>
>> Thanks
>>
> the current main branch code has this problem.
> 
> Both patch v1 and v2 are able to solve this problem, but v1 has a performance issue.

The performance issue shows up in an ethdev benchmark, which does not invoke any dmadev API, so I don't think the two are related.

So I prefer v1. Also, Pavan submitted a commit [1] to align the struct, but it was not a fix for the clang x86 platform.

[1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/

What's more, could you please send a v3? I hope it will describe the root cause of the segmentation fault and the possible solutions.

BTW: dmadev is the first library that dynamically allocates its device structs; more xxxdev libraries may do the same later, so I think this case is typical.
     Maybe we should add such a mem_align() function to the EAL library, but that could be done later.

Thanks
  
Ma, WenwuX March 20, 2024, 7:34 a.m. UTC | #9
Hi Chengwen,

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Wednesday, March 20, 2024 12:12 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> The performance issue is ethdev benchmark, it will not invoke any dmadev
> API, I don't think these two has any relations.
> 
> So I prefer v1, Plus Pavan also submit a commit [1] to align the struct, but it
> was not a fix for clang-x86-platform.
> 
The performance issue is puzzling, since it does not occur with the v2 patch.
So it may need more investigation.

> [1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/
> 
> What more, could you please send v3? I hope it will contain the root cause and
> optional solutions of the segment fault problem.
> 
I will submit a v3 patch later.

> BTW: dmadev is the first one which dynamic alloc dmadev struct, later maybe
> more xxxdev will use this type, I think that's typical.
>      Maybe we should add a such mem_align() function in eal library, but this
> could done later.
> 
> Thanks
  

Patch

diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
index 58729088ff..b569bb3502 100644
--- a/lib/dmadev/rte_dmadev_pmd.h
+++ b/lib/dmadev/rte_dmadev_pmd.h
@@ -122,7 +122,7 @@  enum rte_dma_dev_state {
  * @internal
  * The generic data structure associated with each DMA device.
  */
-struct __rte_cache_aligned rte_dma_dev {
+struct __rte_aligned(8) rte_dma_dev {
 	/** Device info which supplied during device initialization. */
 	struct rte_device *device;
 	struct rte_dma_dev_data *data; /**< Pointer to shared device data. */