net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl
Commit Message
I hit a failure during ports drop queue RQ creation when my adapters
are on CPU socket ID 1 instead of socket ID 0:
....
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
EAL: set_mempolicy failed: Invalid argument
mlx5_common: Failed to allocate memory for RQ.
mlx5_net: Port 0 drop queue RQ creation failed.
mlx5_net: Cannot create drop RX queue
mlx5_net: probe of PCI device 0020:01:00.0 aborted after encountering an error: Success
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
EAL: set_mempolicy failed: Invalid argument
mlx5_common: Failed to allocate memory for RQ.
mlx5_net: Port 0 drop queue RQ creation failed.
mlx5_net: Cannot create drop RX queue
mlx5_net: probe of PCI device 0020:01:00.1 aborted after encountering an error: Success
TELEMETRY: No legacy callbacks, legacy socket not created
testpmd: No probed ethernet devices
...
The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
mlx5_devx_rq_create() with correct CPU socket ID.
Result with this patch:
......
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
......
Configuring Port 0 (socket 1)
Port 0: 0C:42:A1:ED:C1:20
Configuring Port 1 (socket 1)
Port 1: 0C:42:A1:ED:C1:21
Checking link statuses...
Done
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
---
drivers/net/mlx5/mlx5_devx.c | 2 ++
1 file changed, 2 insertions(+)
Comments
From: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> I hit a failure during ports drop queue RQ creation when my adapters are on CPU
> socket ID 1 instead of socket ID 0:
> ....
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
> EAL: set_mempolicy failed: Invalid argument
> mlx5_common: Failed to allocate memory for RQ.
> mlx5_net: Port 0 drop queue RQ creation failed.
> mlx5_net: Cannot create drop RX queue
> mlx5_net: probe of PCI device 0020:01:00.0 aborted after encountering an
> error: Success
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
> EAL: set_mempolicy failed: Invalid argument
> mlx5_common: Failed to allocate memory for RQ.
> mlx5_net: Port 0 drop queue RQ creation failed.
> mlx5_net: Cannot create drop RX queue
> mlx5_net: probe of PCI device 0020:01:00.1 aborted after encountering an
> error: Success
> TELEMETRY: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> ...
>
> The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before calling the
> mlx5_rxq_create_devx_rq_resources() which eventually calls
> mlx5_devx_rq_create() with correct CPU socket ID.
> Result with this patch:
> ......
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> ......
> Configuring Port 0 (socket 1)
> Port 0: 0C:42:A1:ED:C1:20
> Configuring Port 1 (socket 1)
> Port 1: 0C:42:A1:ED:C1:21
> Checking link statuses...
> Done
>
>
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Hi Thinh,
> -----Original Message-----
> From: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Sent: Tuesday, March 8, 2022 12:53 AM
> To: dev@dpdk.org
> Cc: drc@linux.vnet.ibm.com; Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Subject: [PATCH] net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl
>
This is a fix, so the commit title should start with "fix".
Something like this, maybe:
"net/mlx5: fix CPU socket ID for mlx5_rxq_ctrl"
> I hit a failure during ports drop queue RQ creation when my adapters
> are on CPU socket ID 1 instead of socket ID 0:
> ....
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
> EAL: set_mempolicy failed: Invalid argument
> mlx5_common: Failed to allocate memory for RQ.
> mlx5_net: Port 0 drop queue RQ creation failed.
> mlx5_net: Cannot create drop RX queue
> mlx5_net: probe of PCI device 0020:01:00.0 aborted after encountering an
> error: Success
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
> EAL: set_mempolicy failed: Invalid argument
> mlx5_common: Failed to allocate memory for RQ.
> mlx5_net: Port 0 drop queue RQ creation failed.
> mlx5_net: Cannot create drop RX queue
> mlx5_net: probe of PCI device 0020:01:00.1 aborted after encountering an
> error: Success
> TELEMETRY: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> ...
It's better to describe the issue rather than show an example alone.
Maybe something like this:
The socket ID determines the socket on which memory for mlx5_rxq_ctrl is
allocated. It is currently left at its default of 0, which mistakenly causes
the memory to always be allocated on socket 0.
>
> The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
> calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
> mlx5_devx_rq_create() with correct CPU socket ID.
> Result with this patch:
Set the correct CPU socket ID before the memory allocation so that the
allocation uses it.
> ......
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> ......
> Configuring Port 0 (socket 1)
> Port 0: 0C:42:A1:ED:C1:20
> Configuring Port 1 (socket 1)
> Port 1: 0C:42:A1:ED:C1:21
> Checking link statuses...
> Done
>
Missing:
Fixes tag:
Fixes: 5ceb3a02b000 ("net/mlx5: move Rx queue DevX resource")
Cc: xuemingl@nvidia.com
Missing Cc stable for backport.
Cc: stable@dpdk.org
>
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
> ---
Kindest regards,
Raslan Darawsheh
Hi Raslan,
> Missing:
> Fixes tag:
>
> Fixes: 5ceb3a02b000 ("net/mlx5: move Rx queue DevX resource")
> Cc: xuemingl@nvidia.com
I believe the bug originates from my earlier commit, not Xueming's one:
Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, March 8, 2022 2:23 PM
> To: Raslan Darawsheh <rasland@nvidia.com>
> Cc: Thinh Tran <thinhtr@linux.vnet.ibm.com>; dev@dpdk.org;
> drc@linux.vnet.ibm.com
> Subject: Re: [PATCH] net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl
>
> Hi Raslan,
>
> > Missing:
> > Fixes tag:
> >
> > Fixes: 5ceb3a02b000 ("net/mlx5: move Rx queue DevX resource")
> > Cc: xuemingl@nvidia.com
>
> I believe the bug originates from my earlier commit, not Xueming's one:
>
> Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
Yes I think you are correct, my mistake 😊
Kindest regards,
Raslan Darawsheh
Hi, Thinh
Thank you for the patch. The code looks OK to me, but the commit message is not compliant:
- it should contain the "fix" keyword in the title, like this:
"net/mlx5: fix CPU socket ID for Rx queue creation"
- could you please make the problem description less personal and less wordy?
"The default CPU socket ID was used while creating the Rx queue, and this caused
a creation failure if the hardware did not reside on the default socket.
The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
mlx5_devx_rq_create() with correct CPU socket ID."
- please add tags:
Cc: stable@dpdk.org
Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
With best regards,
Slava
Hi,
On 3/9/2022 2:50 AM, Slava Ovsiienko wrote:
> Hi, Thinh
>
> Thank you for the patch. The code looks OK to me, but the commit message is not compliant:
> - it should contain the "fix" keyword in the title, like this:
> "net/mlx5: fix CPU socket ID for Rx queue creation"
> - could you please make the problem description less personal and less wordy?
> "The default CPU socket ID was used while creating the Rx queue, and this caused
> a creation failure if the hardware did not reside on the default socket.
>
> The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
> calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
> mlx5_devx_rq_create() with correct CPU socket ID."
> - please add tags:
> Cc: stable@dpdk.org
> Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
>
> With best regards,
> Slava
>
I'll resubmit the patch with the suggestions above.
Regards,
Thinh
@@ -947,6 +947,8 @@ mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 		rte_errno = ENOMEM;
 		goto error;
 	}
+	/* set the CPU socket ID where the rxq_ctrl was allocated */
+	rxq_ctrl->socket = socket_id;
 	rxq_obj->rxq_ctrl = rxq_ctrl;
 	rxq_ctrl->is_hairpin = false;
 	rxq_ctrl->sh = priv->sh;
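
For readers who want to see the effect of the added assignment outside the driver, here is a minimal standalone sketch. It is not mlx5 code: struct demo_rxq_ctrl, demo_rq_create() and demo_drop_create() are hypothetical stand-ins. It assumes the control structure is zero-initialized, so its socket field starts at 0 as described in the review. It also assumes, as a simplification of the set_mempolicy failure in the log, that the allocation fails whenever the requested socket is not the device's socket.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mlx5_rxq_ctrl; only the field used here. */
struct demo_rxq_ctrl {
	unsigned int socket; /* NUMA socket used for later allocations */
};

/*
 * Hypothetical stand-in for the RQ allocation path. It models the failure
 * by rejecting any socket other than the one the device sits on. The real
 * allocator is more involved; this is an assumption for illustration.
 */
static int
demo_rq_create(const struct demo_rxq_ctrl *ctrl, unsigned int dev_socket)
{
	assert(ctrl != NULL);
	return ctrl->socket == dev_socket ? 0 : -1;
}

/* Create the drop queue control structure, with or without the fix. */
static int
demo_drop_create(unsigned int dev_socket, int apply_fix)
{
	/* calloc() zeroes the structure, so socket starts out as 0. */
	struct demo_rxq_ctrl *ctrl = calloc(1, sizeof(*ctrl));
	int ret;

	if (ctrl == NULL)
		return -1;
	if (apply_fix)
		ctrl->socket = dev_socket; /* mirrors the assignment in the patch */
	ret = demo_rq_create(ctrl, dev_socket);
	free(ctrl);
	return ret;
}
```

Under these assumptions, demo_drop_create(1, 0) returns -1 because the socket field is still 0 while the device is on socket 1, and demo_drop_create(1, 1) returns 0. A device on socket 0 succeeds either way, which is consistent with the problem only showing up when the adapters are on socket 1.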