net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl

Message ID 20220307225256.172328-1-thinhtr@linux.vnet.ibm.com (mailing list archive)
State Superseded, archived
Delegated to: Raslan Darawsheh
Series net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl

Checks

Context                         Check    Description
ci/checkpatch                   warning  coding style issues
ci/github-robot: build          success  github build: passed
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/iol-abi-testing              success  Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS

Commit Message

Thinh Tran March 7, 2022, 10:52 p.m. UTC
I hit a failure during the port's drop queue RQ creation when my adapters
are on CPU socket ID 1 instead of socket ID 0:
....
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
EAL: set_mempolicy failed: Invalid argument
mlx5_common: Failed to allocate memory for RQ.
mlx5_net: Port 0 drop queue RQ creation failed.
mlx5_net: Cannot create drop RX queue
mlx5_net: probe of PCI device 0020:01:00.0 aborted after encountering an error: Success
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
EAL: set_mempolicy failed: Invalid argument
mlx5_common: Failed to allocate memory for RQ.
mlx5_net: Port 0 drop queue RQ creation failed.
mlx5_net: Cannot create drop RX queue
mlx5_net: probe of PCI device 0020:01:00.1 aborted after encountering an error: Success
TELEMETRY: No legacy callbacks, legacy socket not created
testpmd: No probed ethernet devices
...

The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
calling mlx5_rxq_create_devx_rq_resources(), which eventually calls
mlx5_devx_rq_create() with the correct CPU socket ID.
Result with this patch:
......
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
......
Configuring Port 0 (socket 1)
Port 0: 0C:42:A1:ED:C1:20
Configuring Port 1 (socket 1)
Port 1: 0C:42:A1:ED:C1:21
Checking link statuses...
Done


Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
---
 drivers/net/mlx5/mlx5_devx.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Matan Azrad March 8, 2022, 10:23 a.m. UTC | #1
Acked-by: Matan Azrad <matan@nvidia.com>
  
Raslan Darawsheh March 8, 2022, 12:14 p.m. UTC | #2
Hi Thinh,

> -----Original Message-----
> From: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Sent: Tuesday, March 8, 2022 12:53 AM
> To: dev@dpdk.org
> Cc: drc@linux.vnet.ibm.com; Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Subject: [PATCH] net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl
> 
This is a fix, so it's better to start the commit title with "fix".
Something like this, maybe:
"net/mlx5: fix CPU socket ID for mlx5_rxq_ctrl"
> I hit a failure during ports drop queue RQ creation when my adapters
> are on CPU socket ID 1 instead of socket ID 0:
> ....
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
> EAL: set_mempolicy failed: Invalid argument
> mlx5_common: Failed to allocate memory for RQ.
> mlx5_net: Port 0 drop queue RQ creation failed.
> mlx5_net: Cannot create drop RX queue
> mlx5_net: probe of PCI device 0020:01:00.0 aborted after encountering an
> error: Success
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
> EAL: set_mempolicy failed: Invalid argument
> mlx5_common: Failed to allocate memory for RQ.
> mlx5_net: Port 0 drop queue RQ creation failed.
> mlx5_net: Cannot create drop RX queue
> mlx5_net: probe of PCI device 0020:01:00.1 aborted after encountering an
> error: Success
> TELEMETRY: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> ...
It's better to describe the issue rather than only show an example.
Maybe something like this:

The socket ID stored in mlx5_rxq_ctrl determines the socket on which the
Rx queue (RQ) memory is allocated. Currently it is left at 0 by default,
which mistakenly leads to always allocating that memory on socket 0.
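To make the root cause concrete, here is a minimal, self-contained C sketch; it is not the mlx5 code. It assumes, as the description above implies, that the control structure is zero-initialized and that the RQ allocation later reads its socket field. The names rxq_ctrl_sketch, alloc_on_socket() and create_rq_without_fix() are hypothetical, and the allocator simply models the reported machine, where only the adapter's socket can satisfy the request.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for mlx5_rxq_ctrl; only the socket field is modeled. */
struct rxq_ctrl_sketch {
	int socket; /* NUMA socket that the later RQ allocation will target */
};

/*
 * Hypothetical socket-aware allocator. It models the reported machine,
 * where only the adapter's socket can satisfy the request.
 */
static void *alloc_on_socket(size_t size, int socket, int usable_socket)
{
	if (socket != usable_socket)
		return NULL; /* models "Failed to allocate memory for RQ." */
	return calloc(1, size);
}

/*
 * Models the buggy path: the control structure is zero-initialized and
 * its socket field is never assigned before the RQ allocation reads it.
 * Returns 0 on success and -1 on allocation failure.
 */
static int create_rq_without_fix(int device_socket)
{
	struct rxq_ctrl_sketch *ctrl = calloc(1, sizeof(*ctrl));
	void *rq;
	int ret;

	if (ctrl == NULL)
		return -1;
	assert(ctrl->socket == 0); /* zeroed memory: the default is socket 0 */
	rq = alloc_on_socket(64, ctrl->socket, device_socket);
	ret = (rq != NULL) ? 0 : -1;
	free(rq);
	free(ctrl);
	return ret;
}
```

In this model, create_rq_without_fix(1) returns -1 while create_rq_without_fix(0) returns 0, which is consistent with the report that the failure only appears when the adapters sit on socket 1.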

> 
> The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
> calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
> mlx5_devx_rq_create() with correct CPU socket ID.
> Result with this patch:

This sets the correct CPU socket ID in mlx5_rxq_ctrl before the memory
allocation, so the memory is allocated on the device's socket.

> ......
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.0 (socket 1)
> EAL: Probe PCI driver: mlx5_pci (15b3:1019) device: 0020:01:00.1 (socket 1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> ......
> Configuring Port 0 (socket 1)
> Port 0: 0C:42:A1:ED:C1:20
> Configuring Port 1 (socket 1)
> Port 1: 0C:42:A1:ED:C1:21
> Checking link statuses...
> Done
> 
Missing Fixes tag:

Fixes: 5ceb3a02b000 ("net/mlx5: move Rx queue DevX resource")
Cc: xuemingl@nvidia.com

Also missing Cc to stable for backport:

Cc: stable@dpdk.org

> 
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
> ---

Kindest regards,
Raslan Darawsheh
  
Dmitry Kozlyuk March 8, 2022, 12:23 p.m. UTC | #3
Hi Raslan,

> Missing:
> Fixes tag:
>
> Fixes: 5ceb3a02b000 ("net/mlx5: move Rx queue DevX resource")
> Cc: xuemingl@nvidia.com

I believe the bug originates from my earlier commit, not Xueming's one:

    Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
  
Raslan Darawsheh March 8, 2022, 12:25 p.m. UTC | #4
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, March 8, 2022 2:23 PM
> To: Raslan Darawsheh <rasland@nvidia.com>
> Cc: Thinh Tran <thinhtr@linux.vnet.ibm.com>; dev@dpdk.org;
> drc@linux.vnet.ibm.com
> Subject: Re: [PATCH] net/mlx5: set correct CPU socket ID for mlx5_rxq_ctrl
> 
> Hi Raslan,
> 
> > Missing:
> > Fixes tag:
> >
> > Fixes: 5ceb3a02b000 ("net/mlx5: move Rx queue DevX resource")
> > Cc: xuemingl@nvidia.com
> 
> I believe the bug originates from my earlier commit, not Xueming's one:
> 
>     Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
Yes, I think you are correct, my mistake  😊

Kindest regards,
Raslan Darawsheh
  
Slava Ovsiienko March 9, 2022, 8:50 a.m. UTC | #5
Hi, Thinh 

Thank you for the patch. The code looks OK to me, but the commit message is not compliant:
- it should contain the "fix" keyword in the title, like this:
  "net/mlx5: fix CPU socket ID for Rx queue creation"
- could you please make the problem description less personal and less wordy?
 "The default CPU socket ID was used while creating the Rx queue, and this caused
creation failure in case the hardware did not reside on the default socket.

The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
calling mlx5_rxq_create_devx_rq_resources(), which eventually calls
mlx5_devx_rq_create() with the correct CPU socket ID."
- please add tags:
Cc: stable@dpdk.org
Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")

With best regards,
Slava

  
Thinh Tran March 9, 2022, 5:10 p.m. UTC | #6
Hi,

On 3/9/2022 2:50 AM, Slava Ovsiienko wrote:
> Hi, Thinh
> 
> Thank you for the patch, the code looks OK to me, but commit message is not compliant:
> - it should contain "fix" keyword in the title, like this:
>    "net/mlx5: fix CPU socket ID for Rx queue creation"
> - could you, please, make problem description less personal and less wordy?
>   "The default CPU socket ID was used while creating the Rx queue and this caused
> creation failure in case if hardware was not resided on the default socket.
> 
> The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
> calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
> mlx5_devx_rq_create() with correct CPU socket ID."
> - please add tags:
> Cc: stable@dpdk.org
> Fixes: bc5bee028ebc ("net/mlx5: create drop queue using DevX")
> 
> With best regards,
> Slava
>
I'll resubmit the patch with the suggestions above.

Regards,
Thinh

  

Patch

diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index af106bda50..5ab092a259 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -947,6 +947,8 @@  mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 		rte_errno = ENOMEM;
 		goto error;
 	}
+	/* set the CPU socket ID where the rxq_ctrl was allocated */
+	rxq_ctrl->socket = socket_id;
 	rxq_obj->rxq_ctrl = rxq_ctrl;
 	rxq_ctrl->is_hairpin = false;
 	rxq_ctrl->sh = priv->sh;
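The ordering the patch enforces can be sketched in plain C as follows. This is a simplified model, not the driver code: the structures, alloc_on_socket(), create_rq_resources(), drop_queue_create() and drop_queue_release() are hypothetical stand-ins, and the allocator assumes that only socket 1 has memory available. It shows the control structures being allocated on the device's socket, that socket being recorded in rxq_ctrl before the helper that reads it is called, and goto-based cleanup on failure, similar in shape to the error path visible in the diff context.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the mlx5 drop-queue structures. */
struct rxq_ctrl_sketch {
	int socket; /* socket that RQ resource allocations will use */
	void *rq;   /* RQ resources, allocated on ctrl->socket */
};

struct rxq_obj_sketch {
	struct rxq_ctrl_sketch *rxq_ctrl;
};

/* Assumption for this model: only this socket has memory available. */
static const int usable_socket = 1;

/* Hypothetical socket-aware allocator returning zeroed memory. */
static void *alloc_on_socket(size_t size, int socket)
{
	return socket == usable_socket ? calloc(1, size) : NULL;
}

/* Plays the role of the RQ resource helper: it trusts ctrl->socket. */
static int create_rq_resources(struct rxq_ctrl_sketch *ctrl)
{
	ctrl->rq = alloc_on_socket(64, ctrl->socket);
	return ctrl->rq != NULL ? 0 : -ENOMEM;
}

/*
 * Fixed ordering: allocate on the device's socket, record that socket in
 * rxq_ctrl, then call the helper that reads it. On failure, free
 * everything allocated so far.
 */
static int drop_queue_create(int socket_id, struct rxq_obj_sketch **out)
{
	struct rxq_ctrl_sketch *rxq_ctrl;
	struct rxq_obj_sketch *rxq_obj = NULL;
	int ret = -ENOMEM;

	rxq_ctrl = alloc_on_socket(sizeof(*rxq_ctrl), socket_id);
	if (rxq_ctrl == NULL)
		goto error;
	rxq_obj = alloc_on_socket(sizeof(*rxq_obj), socket_id);
	if (rxq_obj == NULL)
		goto error;
	rxq_ctrl->socket = socket_id; /* the assignment the patch adds */
	rxq_obj->rxq_ctrl = rxq_ctrl;
	ret = create_rq_resources(rxq_ctrl);
	if (ret != 0)
		goto error;
	*out = rxq_obj;
	return 0;
error:
	free(rxq_obj);
	free(rxq_ctrl);
	return ret;
}

/* Releases everything created by a successful drop_queue_create(). */
static void drop_queue_release(struct rxq_obj_sketch *rxq_obj)
{
	free(rxq_obj->rxq_ctrl->rq);
	free(rxq_obj->rxq_ctrl);
	free(rxq_obj);
}
```

In this model, drop_queue_create(1, &obj) succeeds and leaves obj->rxq_ctrl->socket equal to 1. Removing the socket assignment would leave the zeroed default of 0, so create_rq_resources() would fail with -ENOMEM, mirroring the reported failure.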