[v2] dmadev: fix structure alignment
Checks
Commit Message
The structure rte_dma_dev needs only 8 byte alignment.
This patch replaces __rte_cache_aligned of rte_dma_dev
with __rte_aligned(8).
Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
Cc: stable@dpdk.org
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
v2:
- Because of performance drop, adjust the code to
no longer demand cache line alignment
---
lib/dmadev/rte_dmadev_pmd.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Fri, Mar 15, 2024 at 09:43:31AM +0800, Wenwu Ma wrote:
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev
> with __rte_aligned(8).
>
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Hi Wenwu,
On 2024/3/15 9:43, Wenwu Ma wrote:
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev
> with __rte_aligned(8).
>
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
> v2:
> - Because of performance drop, adjust the code to
> no longer demand cache line alignment
Which two versions observed performance drop? And which benchmark observed drop?
Could you provide more information?
>
> ---
> lib/dmadev/rte_dmadev_pmd.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> index 58729088ff..b569bb3502 100644
> --- a/lib/dmadev/rte_dmadev_pmd.h
> +++ b/lib/dmadev/rte_dmadev_pmd.h
> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> * @internal
> * The generic data structure associated with each DMA device.
> */
> -struct __rte_cache_aligned rte_dma_dev {
> +struct __rte_aligned(8) rte_dma_dev {
The DMA fast-path was implemented by struct rte_dma_fp_objs, which is not
rte_dma_dev? So why is it a problem here?
Thanks
> /** Device info which supplied during device initialization. */
> struct rte_device *device;
> struct rte_dma_dev_data *data; /**< Pointer to shared device data. */
>
Hi Chengwen,
> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 15, 2024 2:06 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>
> Hi Wenwu,
>
> On 2024/3/15 9:43, Wenwu Ma wrote:
> > The structure rte_dma_dev needs only 8 byte alignment.
> > This patch replaces __rte_cache_aligned of rte_dma_dev with
> > __rte_aligned(8).
> >
> > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > ---
> > v2:
> > - Because of performance drop, adjust the code to
> > no longer demand cache line alignment
>
> Which two versions observed performance drop? And which benchmark
> observed drop?
> Could you provide more information?
>
> >
V1 patch:
https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
To view detailed results, visit:
https://lab.dpdk.org/results/dashboard/patchsets/29472/
> > ---
> > lib/dmadev/rte_dmadev_pmd.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/dmadev/rte_dmadev_pmd.h
> b/lib/dmadev/rte_dmadev_pmd.h
> > index 58729088ff..b569bb3502 100644
> > --- a/lib/dmadev/rte_dmadev_pmd.h
> > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> > * @internal
> > * The generic data structure associated with each DMA device.
> > */
> > -struct __rte_cache_aligned rte_dma_dev {
> > +struct __rte_aligned(8) rte_dma_dev {
>
> The DMA fast-path was implemented by struct rte_dma_fp_objs, which is not
> rte_dma_dev? So why is it a problem here?
>
> Thanks
>
The DMA device object is expected to align cache line, so clang will use “vmovaps” assembly instruction,
And the instruction demands 16 bytes alignment or will cause segment fault in some environments.
> > /** Device info which supplied during device initialization. */
> > struct rte_device *device;
> > struct rte_dma_dev_data *data; /**< Pointer to shared device data.
> > */
> >
Hi Chengwen,
> -----Original Message-----
> From: Ma, WenwuX
> Sent: Friday, March 15, 2024 2:26 PM
> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>
> Hi Chengwen,
>
> > -----Original Message-----
> > From: fengchengwen <fengchengwen@huawei.com>
> > Sent: Friday, March 15, 2024 2:06 PM
> > To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> > Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> > Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >
> > Hi Wenwu,
> >
> > On 2024/3/15 9:43, Wenwu Ma wrote:
> > > The structure rte_dma_dev needs only 8 byte alignment.
> > > This patch replaces __rte_cache_aligned of rte_dma_dev with
> > > __rte_aligned(8).
> > >
> > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > ---
> > > v2:
> > > - Because of performance drop, adjust the code to
> > > no longer demand cache line alignment
> >
> > Which two versions observed performance drop? And which benchmark
> > observed drop?
> > Could you provide more information?
> >
> > >
> V1 patch:
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> 1-wenwux.ma@intel.com/
>
> To view detailed results, visit:
> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>
> > > ---
> > > lib/dmadev/rte_dmadev_pmd.h | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/dmadev/rte_dmadev_pmd.h
> > b/lib/dmadev/rte_dmadev_pmd.h
> > > index 58729088ff..b569bb3502 100644
> > > --- a/lib/dmadev/rte_dmadev_pmd.h
> > > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > > @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> > > * @internal
> > > * The generic data structure associated with each DMA device.
> > > */
> > > -struct __rte_cache_aligned rte_dma_dev {
> > > +struct __rte_aligned(8) rte_dma_dev {
> >
> > The DMA fast-path was implemented by struct rte_dma_fp_objs, which is
> > not rte_dma_dev? So why is it a problem here?
> >
> > Thanks
> >
> The DMA device object is expected to align cache line, so clang will use
> “vmovaps” assembly instruction,
>
> And the instruction demands 16 bytes alignment or will cause segment fault in
> some environments.
>
Test case:
1. compile dpdk
rm -rf x86_64-native-linuxapp-clang
CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
ninja -C x86_64-native-linuxapp-clang -j 72
2. start dpdk-test
/root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note: If it cannot be reproduced, please try using a different core)
3. exit dpdk-test
RTE>>quit
Segmentation fault (core dumped)
>
> > > /** Device info which supplied during device initialization. */
> > > struct rte_device *device;
> > > struct rte_dma_dev_data *data; /**< Pointer to shared device data.
> > > */
> > >
Hi Wenwu,
On 2024/3/15 15:44, Ma, WenwuX wrote:
> Hi Chengwen,
>
>> -----Original Message-----
>> From: Ma, WenwuX
>> Sent: Friday, March 15, 2024 2:26 PM
>> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>>
>> Hi Chengwen,
>>
>>> -----Original Message-----
>>> From: fengchengwen <fengchengwen@huawei.com>
>>> Sent: Friday, March 15, 2024 2:06 PM
>>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>>
>>> Hi Wenwu,
>>>
>>> On 2024/3/15 9:43, Wenwu Ma wrote:
>>>> The structure rte_dma_dev needs only 8 byte alignment.
>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
>>>> __rte_aligned(8).
>>>>
>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
>>>> ---
>>>> v2:
>>>> - Because of performance drop, adjust the code to
>>>> no longer demand cache line alignment
>>>
>>> Which two versions observed performance drop? And which benchmark
>>> observed drop?
>>> Could you provide more information?
>>>
>>>>
>> V1 patch:
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>> 1-wenwux.ma@intel.com/
>>
>> To view detailed results, visit:
>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>>
>>>> ---
>>>> lib/dmadev/rte_dmadev_pmd.h | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
>>> b/lib/dmadev/rte_dmadev_pmd.h
>>>> index 58729088ff..b569bb3502 100644
>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>>>> * @internal
>>>> * The generic data structure associated with each DMA device.
>>>> */
>>>> -struct __rte_cache_aligned rte_dma_dev {
>>>> +struct __rte_aligned(8) rte_dma_dev {
>>>
>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which is
>>> not rte_dma_dev? So why is it a problem here?
>>>
>>> Thanks
>>>
>> The DMA device object is expected to align cache line, so clang will use
>> “vmovaps” assembly instruction,
>>
>> And the instruction demands 16 bytes alignment or will cause segment fault in
>> some environments.
>>
> Test case:
> 1. compile dpdk
> rm -rf x86_64-native-linuxapp-clang
> CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
> ninja -C x86_64-native-linuxapp-clang -j 72
> 2. start dpdk-test
> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note: If it cannot be reproduced, please try using a different core)
> 3. exit dpdk-test
> RTE>>quit
> Segmentation fault (core dumped)
I will try to reproduce, but still a question: does above test has already merged your patch [1] or the current main branch code has this problem?
[1] https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
Thanks
>
>>
>>>> /** Device info which supplied during device initialization. */
>>>> struct rte_device *device;
>>>> struct rte_dma_dev_data *data; /**< Pointer to shared device data.
>>>> */
>>>>
Hi Chengwen
> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 15, 2024 4:32 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>
> Hi Wenwu,
>
> On 2024/3/15 15:44, Ma, WenwuX wrote:
> > Hi Chengwen,
> >
> >> -----Original Message-----
> >> From: Ma, WenwuX
> >> Sent: Friday, March 15, 2024 2:26 PM
> >> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
> >> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >> Subject: RE: [PATCH v2] dmadev: fix structure alignment
> >>
> >> Hi Chengwen,
> >>
> >>> -----Original Message-----
> >>> From: fengchengwen <fengchengwen@huawei.com>
> >>> Sent: Friday, March 15, 2024 2:06 PM
> >>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> >>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>>
> >>> Hi Wenwu,
> >>>
> >>> On 2024/3/15 9:43, Wenwu Ma wrote:
> >>>> The structure rte_dma_dev needs only 8 byte alignment.
> >>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
> >>>> __rte_aligned(8).
> >>>>
> >>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> >>>> ---
> >>>> v2:
> >>>> - Because of performance drop, adjust the code to
> >>>> no longer demand cache line alignment
> >>>
> >>> Which two versions observed performance drop? And which benchmark
> >>> observed drop?
> >>> Could you provide more information?
> >>>
> >>>>
> >> V1 patch:
> >>
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> >> 1-wenwux.ma@intel.com/
> >>
> >> To view detailed results, visit:
> >> https://lab.dpdk.org/results/dashboard/patchsets/29472/
> >>
> >>>> ---
> >>>> lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
> >>> b/lib/dmadev/rte_dmadev_pmd.h
> >>>> index 58729088ff..b569bb3502 100644
> >>>> --- a/lib/dmadev/rte_dmadev_pmd.h
> >>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
> >>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >>>> * @internal
> >>>> * The generic data structure associated with each DMA device.
> >>>> */
> >>>> -struct __rte_cache_aligned rte_dma_dev {
> >>>> +struct __rte_aligned(8) rte_dma_dev {
> >>>
> >>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
> >>> is not rte_dma_dev? So why is it a problem here?
> >>>
> >>> Thanks
> >>>
> >> The DMA device object is expected to align cache line, so clang will
> >> use “vmovaps” assembly instruction,
> >>
> >> And the instruction demands 16 bytes alignment or will cause segment
> >> fault in some environments.
> >>
> > Test case:
> > 1. compile dpdk
> > rm -rf x86_64-native-linuxapp-clang
> > CC=clang meson -Denable_kmods=True -Dlibdir=lib
> > --default-library=static x86_64-native-linuxapp-clang ninja -C
> > x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
> > /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
> > --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
> > If it cannot be reproduced, please try using a different core)
> > 3. exit dpdk-test
> > RTE>>quit
> > Segmentation fault (core dumped)
>
> I will try to reproduce, but still a question: does above test has already merged
> your patch [1] or the current main branch code has this problem?
>
> [1]
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> 1-wenwux.ma@intel.com/
>
> Thanks
>
the current main branch code has this problem.
Both patch v1 and v2 are able to solve this problem, but v1 has a performance issue.
> >
> >>
> >>>> /** Device info which supplied during device initialization. */
> >>>> struct rte_device *device;
> >>>> struct rte_dma_dev_data *data; /**< Pointer to shared device data.
> >>>> */
> >>>>
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Friday, March 15, 2024 9:44 AM
> To: dev@dpdk.org; fengchengwen@huawei.com
> Cc: Jiale, SongX <songx.jiale@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>; stable@dpdk.org
> Subject: [PATCH v2] dmadev: fix structure alignment
>
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev with
> __rte_aligned(8).
>
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
Tested-by: Jiale Song <songx.jiale@intel.com>
Hi Wenwu,
On 2024/3/15 17:27, Ma, WenwuX wrote:
> Hi Chengwen
>
>> -----Original Message-----
>> From: fengchengwen <fengchengwen@huawei.com>
>> Sent: Friday, March 15, 2024 4:32 PM
>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>
>> Hi Wenwu,
>>
>> On 2024/3/15 15:44, Ma, WenwuX wrote:
>>> Hi Chengwen,
>>>
>>>> -----Original Message-----
>>>> From: Ma, WenwuX
>>>> Sent: Friday, March 15, 2024 2:26 PM
>>>> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
>>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>>>>
>>>> Hi Chengwen,
>>>>
>>>>> -----Original Message-----
>>>>> From: fengchengwen <fengchengwen@huawei.com>
>>>>> Sent: Friday, March 15, 2024 2:06 PM
>>>>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
>>>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>>>>
>>>>> Hi Wenwu,
>>>>>
>>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
>>>>>> The structure rte_dma_dev needs only 8 byte alignment.
>>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
>>>>>> __rte_aligned(8).
>>>>>>
>>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
>>>>>> ---
>>>>>> v2:
>>>>>> - Because of performance drop, adjust the code to
>>>>>> no longer demand cache line alignment
>>>>>
>>>>> Which two versions observed performance drop? And which benchmark
>>>>> observed drop?
>>>>> Could you provide more information?
>>>>>
>>>>>>
>>>> V1 patch:
>>>>
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>>>> 1-wenwux.ma@intel.com/
>>>>
>>>> To view detailed results, visit:
>>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>>>>
>>>>>> ---
>>>>>> lib/dmadev/rte_dmadev_pmd.h | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
>>>>> b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> index 58729088ff..b569bb3502 100644
>>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
>>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>>>>>> * @internal
>>>>>> * The generic data structure associated with each DMA device.
>>>>>> */
>>>>>> -struct __rte_cache_aligned rte_dma_dev {
>>>>>> +struct __rte_aligned(8) rte_dma_dev {
>>>>>
>>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
>>>>> is not rte_dma_dev? So why is it a problem here?
>>>>>
>>>>> Thanks
>>>>>
>>>> The DMA device object is expected to align cache line, so clang will
>>>> use “vmovaps” assembly instruction,
>>>>
>>>> And the instruction demands 16 bytes alignment or will cause segment
>>>> fault in some environments.
>>>>
>>> Test case:
>>> 1. compile dpdk
>>> rm -rf x86_64-native-linuxapp-clang
>>> CC=clang meson -Denable_kmods=True -Dlibdir=lib
>>> --default-library=static x86_64-native-linuxapp-clang ninja -C
>>> x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
>>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
>>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
>>> If it cannot be reproduced, please try using a different core)
>>> 3. exit dpdk-test
>>> RTE>>quit
>>> Segmentation fault (core dumped)
I reproduce it just with --vdev=dma_skeleton.
When execute quit command, it will invoke rte_dma_close->dma_release, pls see my annotations (//) below:
void
dma_release(struct rte_dma_dev *dev)
{
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
rte_free(dev->data->dev_private);
memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
}
dma_fp_object_dummy(dev->fp_obj);
memset(dev, 0, sizeof(struct rte_dma_dev)); // this memset was compiles using vmovaps, its
// 8c24da: c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0
// 8c24de: c5 fc 29 43 20 vmovaps %ymm0,0x20(%rbx)
// 8c24e3: c5 fc 29 03 vmovaps %ymm0,(%rbx)
// but the dev is not align 16B (in my env the rte_dma_devices addr is 0x15d39950)
}
>>
>> I will try to reproduce, but still a question: does above test has already merged
>> your patch [1] or the current main branch code has this problem?
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>> 1-wenwux.ma@intel.com/
>>
>> Thanks
>>
> the current main branch code has this problem.
>
> Both patch v1 and v2 are able to solve this problem, but v1 has a performance issue.
The performance issue is ethdev benchmark, it will not invoke any dmadev API, I don't think these two has any relations.
So I prefer v1, Plus Pavan also submit a commit [1] to align the struct, but it was not a fix for clang-x86-platform.
[1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/
>
>>>
>>>>
>>>>>> /** Device info which supplied during device initialization. */
>>>>>> struct rte_device *device;
>>>>>> struct rte_dma_dev_data *data; /**< Pointer to shared device data.
>>>>>> */
>>>>>>
What more, could you please send v3? I hope it will contain the root cause and optional solutions of the segment fault problem.
BTW: dmadev is the first one which dynamic alloc dmadev struct, later maybe more xxxdev will use this type, I think that's typical.
Maybe we should add a such mem_align() function in eal library, but this could done later.
Thanks
Hi chengwen,
> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Wednesday, March 20, 2024 12:12 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>
> Hi Wenwu,
>
> On 2024/3/15 17:27, Ma, WenwuX wrote:
> > Hi Chengwen
> >
> >> -----Original Message-----
> >> From: fengchengwen <fengchengwen@huawei.com>
> >> Sent: Friday, March 15, 2024 4:32 PM
> >> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> >> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>
> >> Hi Wenwu,
> >>
> >> On 2024/3/15 15:44, Ma, WenwuX wrote:
> >>> Hi Chengwen,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ma, WenwuX
> >>>> Sent: Friday, March 15, 2024 2:26 PM
> >>>> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
> >>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
> >>>>
> >>>> Hi Chengwen,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: fengchengwen <fengchengwen@huawei.com>
> >>>>> Sent: Friday, March 15, 2024 2:06 PM
> >>>>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> >>>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>>>>
> >>>>> Hi Wenwu,
> >>>>>
> >>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
> >>>>>> The structure rte_dma_dev needs only 8 byte alignment.
> >>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
> >>>>>> __rte_aligned(8).
> >>>>>>
> >>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> >>>>>> Cc: stable@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> >>>>>> ---
> >>>>>> v2:
> >>>>>> - Because of performance drop, adjust the code to
> >>>>>> no longer demand cache line alignment
> >>>>>
> >>>>> Which two versions observed performance drop? And which
> benchmark
> >>>>> observed drop?
> >>>>> Could you provide more information?
> >>>>>
> >>>>>>
> >>>> V1 patch:
> >>>>
> >>
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> >>>> 1-wenwux.ma@intel.com/
> >>>>
> >>>> To view detailed results, visit:
> >>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
> >>>>
> >>>>>> ---
> >>>>>> lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
> >>>>> b/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> index 58729088ff..b569bb3502 100644
> >>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >>>>>> * @internal
> >>>>>> * The generic data structure associated with each DMA device.
> >>>>>> */
> >>>>>> -struct __rte_cache_aligned rte_dma_dev {
> >>>>>> +struct __rte_aligned(8) rte_dma_dev {
> >>>>>
> >>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
> >>>>> is not rte_dma_dev? So why is it a problem here?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>> The DMA device object is expected to align cache line, so clang
> >>>> will use “vmovaps” assembly instruction,
> >>>>
> >>>> And the instruction demands 16 bytes alignment or will cause
> >>>> segment fault in some environments.
> >>>>
> >>> Test case:
> >>> 1. compile dpdk
> >>> rm -rf x86_64-native-linuxapp-clang
> >>> CC=clang meson -Denable_kmods=True -Dlibdir=lib
> >>> --default-library=static x86_64-native-linuxapp-clang ninja -C
> >>> x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
> >>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
> >>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
> >>> If it cannot be reproduced, please try using a different core)
> >>> 3. exit dpdk-test
> >>> RTE>>quit
> >>> Segmentation fault (core dumped)
>
> I reproduce it just with --vdev=dma_skeleton.
> When execute quit command, it will invoke rte_dma_close->dma_release, pls
> see my annotations (//) below:
>
> void
> dma_release(struct rte_dma_dev *dev)
> {
> if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> rte_free(dev->data->dev_private);
> memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
> }
>
> dma_fp_object_dummy(dev->fp_obj);
> memset(dev, 0, sizeof(struct rte_dma_dev)); // this memset was
> compiles using vmovaps, its
> // 8c24da: c5 f8 57 c0
> vxorps %xmm0,%xmm0,%xmm0
> // 8c24de: c5 fc 29 43 20
> vmovaps %ymm0,0x20(%rbx)
> // 8c24e3: c5 fc 29 03
> vmovaps %ymm0,(%rbx)
> // but the dev is not align 16B
> (in my env the rte_dma_devices addr is 0x15d39950) }
>
> >>
> >> I will try to reproduce, but still a question: does above test has
> >> already merged your patch [1] or the current main branch code has this
> problem?
> >>
> >> [1]
> >>
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> >> 1-wenwux.ma@intel.com/
> >>
> >> Thanks
> >>
> > the current main branch code has this problem.
> >
> > Both patch v1 and v2 are able to solve this problem, but v1 has a
> performance issue.
>
> The performance issue is ethdev benchmark, it will not invoke any dmadev
> API, I don't think these two has any relations.
>
> So I prefer v1, Plus Pavan also submit a commit [1] to align the struct, but it
> was not a fix for clang-x86-platform.
>
The performance issue is subtle, as it doesn't occur in the v2 patch.
So, maybe it needs more investigation.
> [1] https://lore.kernel.org/all/20240210062758.1510-1-
> pbhagavatula@marvell.com/T/
>
> >
> >>>
> >>>>
> >>>>>> /** Device info which supplied during device initialization. */
> >>>>>> struct rte_device *device;
> >>>>>> struct rte_dma_dev_data *data; /**< Pointer to shared device
> data.
> >>>>>> */
> >>>>>>
>
> What more, could you please send v3? I hope it will contain the root cause and
> optional solutions of the segment fault problem.
>
I will submit v3 patch later.
> BTW: dmadev is the first one which dynamic alloc dmadev struct, later maybe
> more xxxdev will use this type, I think that's typical.
> Maybe we should add a such mem_align() function in eal library, but this
> could done later.
>
> Thanks
@@ -122,7 +122,7 @@ enum rte_dma_dev_state {
* @internal
* The generic data structure associated with each DMA device.
*/
-struct __rte_cache_aligned rte_dma_dev {
+struct __rte_aligned(8) rte_dma_dev {
/** Device info which supplied during device initialization. */
struct rte_device *device;
struct rte_dma_dev_data *data; /**< Pointer to shared device data. */