net/mlx5: fix keeping indirect RSS non-isolated mode

Message ID: 20211116073834.2413952-1-dkozlyuk@nvidia.com
State: Accepted, archived
Delegated to: Raslan Darawsheh
Series: net/mlx5: fix keeping indirect RSS non-isolated mode

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/github-robot: build          success  github build: passed
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-intel-Performance        fail     Performance Testing issues
ci/iol-intel-Functional         success  Functional Testing PASS

Commit Message

Dmitry Kozlyuk Nov. 16, 2021, 7:38 a.m. UTC
When a port starts in non-isolated mode,
an internal indirect RSS is created that includes all configured queues,
and a flow rule is created that references this indirect RSS.
If an indirect RSS that includes the same set of queues had been created
before switching to non-isolated mode, it would be reused at this point.
However, because the port had been stopped (or not yet started),
the TIR for this indirect RSS had been destroyed (or not yet created).
The flow rule could not be created and the port start failed.

Creation of TIRs is moved before configuring non-isolated mode flows,
but this alone is not enough because of the following issue.

Commit 0cedf34da78f ("net/mlx5: move Rx queue reference count")
changed mlx5_rxq_get() not to increment the RxQ control structure
reference count; mlx5_rxq_ref() was introduced for this purpose.
mlx5_ind_table_obj_attach() was not updated to use the new function,
so when the port was stopped, the control structure reference count
of an RxQ used in RSS reached zero and the structure was destroyed.

Use mlx5_rxq_ref() to keep the RxQ control structure,
which indirect RSS needs in order to persist across port restart.
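
The difference between a lookup that takes no reference and one that does
can be illustrated with a minimal sketch. This is a toy model, not the mlx5
code: every toy_* name is hypothetical, and it assumes that the port holds
one reference of its own and drops it on stop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RxQ control structure; not the mlx5 type. */
struct toy_rxq_ctrl {
	int refcnt; /* number of owners keeping the structure alive */
	int alive;  /* 1 until the structure is destroyed */
};

/* Lookup only: return the queue without taking a reference. */
static struct toy_rxq_ctrl *
toy_rxq_get(struct toy_rxq_ctrl *q)
{
	return q->alive ? q : NULL;
}

/* Lookup plus ownership: take a reference on the queue. */
static struct toy_rxq_ctrl *
toy_rxq_ref(struct toy_rxq_ctrl *q)
{
	if (!q->alive)
		return NULL;
	q->refcnt++;
	return q;
}

/* Drop one reference; destroy the structure when the count reaches zero. */
static void
toy_rxq_deref(struct toy_rxq_ctrl *q)
{
	assert(q->refcnt > 0);
	if (--q->refcnt == 0)
		q->alive = 0;
}

/*
 * Attach an indirection table to a queue with the given function, then
 * simulate a port stop. Return 1 if the queue structure survives.
 */
int
toy_survives_stop(struct toy_rxq_ctrl *(*attach)(struct toy_rxq_ctrl *))
{
	/* Assumption: the port itself holds exactly one reference. */
	struct toy_rxq_ctrl q = { .refcnt = 1, .alive = 1 };

	attach(&q);        /* indirection table attaches to the queue */
	toy_rxq_deref(&q); /* port stop drops the port's reference */
	return q.alive;
}
```

In this model, toy_survives_stop(toy_rxq_get) returns 0 because nothing
holds the structure after the stop, while toy_survives_stop(toy_rxq_ref)
returns 1 because the indirection table still owns a reference.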

Fixes: ec4e11d41d12 ("net/mlx5: preserve indirect actions on restart")
Fixes: 0cedf34da78f ("net/mlx5: move Rx queue reference count")
Cc: xuemingl@nvidia.com

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c     |  2 +-
 drivers/net/mlx5/mlx5_trigger.c | 19 +++++++++++--------
 2 files changed, 12 insertions(+), 9 deletions(-)
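
The reordering in mlx5_trigger.c can be modeled with a small sketch of the
start sequence. All toy_* names are hypothetical and the model is
deliberately simplified: it assumes that the default flows always reuse an
indirect RSS whose hardware object must already exist, which in the driver
only happens when an indirect RSS with the same set of queues was created
earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port; none of these names are mlx5 APIs. */
struct toy_port {
	bool tir_ready;     /* hardware object of the indirect RSS exists */
	bool default_flows; /* default (non-isolated mode) flows are set */
};

/* Recreate hardware objects of indirect actions detached on port stop. */
static int
toy_attach_indirect_actions(struct toy_port *p)
{
	p->tir_ready = true;
	return 0;
}

/* Create default flows; they reference the indirect RSS hardware object. */
static int
toy_traffic_enable(struct toy_port *p)
{
	if (!p->tir_ready)
		return -EINVAL; /* the flow rule cannot be created */
	p->default_flows = true;
	return 0;
}

/* Start the port; attach_first selects the ordering used after the patch. */
int
toy_port_start(struct toy_port *p, bool attach_first)
{
	int ret;

	assert(p != NULL);
	if (attach_first) {
		ret = toy_attach_indirect_actions(p);
		if (ret)
			return ret;
	}
	ret = toy_traffic_enable(p);
	if (ret)
		return ret;
	if (!attach_first)
		ret = toy_attach_indirect_actions(p);
	return ret;
}
```

With attach_first set to false, toy_port_start() fails with -EINVAL before
the attach step is ever reached, which mirrors the port start failure
described in the commit message. With attach_first set to true, the start
succeeds.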
  

Comments

Raslan Darawsheh Nov. 16, 2021, 1:06 p.m. UTC | #1
Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, November 16, 2021 9:39 AM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Ori Kam <orika@nvidia.com>; Raslan
> Darawsheh <rasland@nvidia.com>; Xueming(Steven) Li
> <xuemingl@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH] net/mlx5: fix keeping indirect RSS non-isolated mode

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 52b95d7070..d5a7155392 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2438,7 +2438,7 @@  mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
 		return ret;
 	}
 	for (i = 0; i < ind_tbl->queues_n; i++)
-		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+		mlx5_rxq_ref(dev, ind_tbl->queues[i]);
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 1952d68444..65caa5ac14 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1172,6 +1172,17 @@  mlx5_dev_start(struct rte_eth_dev *dev)
 		goto error;
 	}
 	mlx5_os_stats_init(dev);
+	/*
+	 * Attach indirection table objects detached on port stop.
+	 * They may be needed to create RSS in non-isolated mode.
+	 */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
@@ -1184,14 +1195,6 @@  mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
-	/* Attach indirection table objects detached on port stop. */
-	ret = mlx5_action_handle_attach(dev);
-	if (ret) {
-		DRV_LOG(ERR,
-			"port %u failed to attach indirect actions: %s",
-			dev->data->port_id, rte_strerror(rte_errno));
-		goto error;
-	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.