[v5,2/2] net/mlx5: support socket direct mode bonding

Message ID 20211026084830.440951-3-rongweil@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Series support socket direct mode bonding

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/iol-spell-check-testing       warning  Testing issues
ci/github-robot: build           success  github build: passed
ci/iol-broadcom-Functional       success  Functional Testing PASS
ci/iol-broadcom-Performance      success  Performance Testing PASS
ci/Intel-compilation             success  Compilation OK
ci/intel-Testing                 success  Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-mellanox-Performance      fail     Performance Testing issues
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/iol-aarch64-unit-testing      success  Testing PASS

Commit Message

Rongwei Liu Oct. 26, 2021, 8:48 a.m. UTC
In socket direct mode, it is possible to bind any two (maybe four
in the future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces no longer need to have
the same PCIe domain/bus/device ID.

The kernel driver uses "system_image_guid" to decide whether devices
can be bound together. The sysfs attribute "phys_switch_id" is used to
get the "system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst               |  4 +++
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 drivers/net/mlx5/linux/mlx5_os.c       | 43 ++++++++++++++++++++------
 3 files changed, 42 insertions(+), 9 deletions(-)
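
The sysfs lookup described in the commit message can be illustrated with
a minimal sketch. It is not the driver's mlx5_get_device_guid() (that
helper is not part of this patch); read_switch_id() and switch_id_path()
are hypothetical names, and the 1/0/-1 return convention is an assumption
inferred from how the patch branches on "ret > 0", "ret == 0" and
"ret < 0". The only real interface used is the Linux sysfs attribute
/sys/class/net/<ifname>/phys_switch_id.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not driver code).
 * Reads up to 16 characters of a phys_switch_id-style file into id.
 * Returns 1 when an ID was read, 0 when the attribute cannot be opened
 * or read (for example on a stack that does not expose it), and -1 on
 * invalid arguments.
 */
static int
read_switch_id(const char *path, char *id, size_t len)
{
	FILE *f;
	int n;

	if (path == NULL || id == NULL || len < 17)
		return -1;
	memset(id, 0, len);
	f = fopen(path, "r");
	if (f == NULL)
		return 0;
	n = fscanf(f, "%16s", id);
	fclose(f);
	return n == 1 ? 1 : 0;
}

/* Build the sysfs path of the attribute for a given interface name. */
static int
switch_id_path(const char *ifname, char *path, size_t len)
{
	int n = snprintf(path, len, "/sys/class/net/%s/phys_switch_id",
			 ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two interfaces would then be treated as candidates for the same bond
only when both reads return 1 and the two ID strings compare equal.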
  

Comments

Raslan Darawsheh Oct. 26, 2021, 11:26 a.m. UTC | #1
Hi,

> -----Original Message-----
> From: Rongwei Liu <rongweil@nvidia.com>
> Sent: Tuesday, October 26, 2021 11:49 AM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v5 2/2] net/mlx5: support socket direct mode bonding
> 
> In socket direct mode, it's possible to bind any two (maybe four in future)
> PCIe devices with IDs like xxxx:xx:xx.x and yyyy:yy:yy.y. Bonding member
> interfaces are unnecessary to have the same PCIe domain/bus/device ID
> anymore,
> 
> Kernel driver uses "system_image_guid" to identify if devices can be bound
> together or not. Sysfs "phys_switch_id" is used to get "system_image_guid"
> of each network interface.
> 
> OFED 5.4+ is required to support "phys_switch_id".
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/nics/mlx5.rst               |  4 +++
>  doc/guides/rel_notes/release_21_11.rst |  4 +++
>  drivers/net/mlx5/linux/mlx5_os.c       | 43 ++++++++++++++++++++------
>  3 files changed, 42 insertions(+), 9 deletions(-)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> 47709d93b3..45f44c97d7 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -468,6 +468,10 @@ Limitations
> 
>    - TXQ affinity subjects to HW hash once enabled.
> 
> +- Bonding under socket direct mode
> +
> +  - Needs OFED 5.4+.
> +
>  Statistics
>  ----------
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst
> b/doc/guides/rel_notes/release_21_11.rst
> index 1ccac87b73..2f46b27709 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -217,6 +217,10 @@ New Features
>    * Added PDCP short MAC-I support.
>    * Added raw vector datapath API support.
> 
> +* **Updated Mellanox mlx5 driver.**
> +
> +  * Added socket direct mode bonding support.
This part needs to go under the previously added mlx5 update entry in the release notes.
I will fix it during integration.

Kindest regards
Raslan Darawsheh
  

Patch

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 47709d93b3..45f44c97d7 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -468,6 +468,10 @@  Limitations
 
   - TXQ affinity subjects to HW hash once enabled.
 
+- Bonding under socket direct mode
+
+  - Needs OFED 5.4+.
+
 Statistics
 ----------
 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 1ccac87b73..2f46b27709 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -217,6 +217,10 @@  New Features
   * Added PDCP short MAC-I support.
   * Added raw vector datapath API support.
 
+* **Updated Mellanox mlx5 driver.**
+
+  * Added socket direct mode bonding support.
+
 * **Updated NXP dpaa2_sec crypto PMD.**
 
   * Added PDCP short MAC-I support.
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 72bbb665cf..3deae861d5 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1898,6 +1898,8 @@  mlx5_device_bond_pci_match(const char *ibdev_name,
 	FILE *bond_file = NULL, *file;
 	int pf = -1;
 	int ret;
+	uint8_t cur_guid[32] = {0};
+	uint8_t guid[32] = {0};
 
 	/*
 	 * Try to get master device name. If something goes wrong suppose
@@ -1911,6 +1913,8 @@  mlx5_device_bond_pci_match(const char *ibdev_name,
 	np = mlx5_nl_portnum(nl_rdma, ibdev_name);
 	if (!np)
 		return -1;
+	if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0)
+		return -1;
 	/*
 	 * The master device might not be on the predefined port(not on port
 	 * index 1, it is not guaranteed), we have to scan all Infiniband
@@ -1938,6 +1942,7 @@  mlx5_device_bond_pci_match(const char *ibdev_name,
 		char tmp_str[IF_NAMESIZE + 32];
 		struct rte_pci_addr pci_addr;
 		struct mlx5_switch_info	info;
+		int ret;
 
 		/* Process slave interface names in the loop. */
 		snprintf(tmp_str, sizeof(tmp_str),
@@ -1969,15 +1974,6 @@  mlx5_device_bond_pci_match(const char *ibdev_name,
 				tmp_str);
 			break;
 		}
-		/* Match PCI address, allows BDF0+pfx or BDFx+pfx. */
-		if (pci_dev->domain == pci_addr.domain &&
-		    pci_dev->bus == pci_addr.bus &&
-		    pci_dev->devid == pci_addr.devid &&
-		    ((pci_dev->function == 0 &&
-		      pci_dev->function + owner == pci_addr.function) ||
-		     (pci_dev->function == owner &&
-		      pci_addr.function == owner)))
-			pf = info.port_name;
 		/* Get ifindex. */
 		snprintf(tmp_str, sizeof(tmp_str),
 			 "/sys/class/net/%s/ifindex", ifname);
@@ -1994,6 +1990,30 @@  mlx5_device_bond_pci_match(const char *ibdev_name,
 		bond_info->ports[info.port_name].pci_addr = pci_addr;
 		bond_info->ports[info.port_name].ifindex = ifindex;
 		bond_info->n_port++;
+		/*
+		 * Under socket direct mode, bonding will use
+		 * system_image_guid as identification.
+		 * After OFED 5.4, guid is readable (ret > 0) under sysfs.
+		 * All bonding members should have the same guid even if driver
+		 * is using PCIe BDF.
+		 */
+		ret = mlx5_get_device_guid(&pci_addr, guid, sizeof(guid));
+		if (ret < 0)
+			break;
+		else if (ret > 0) {
+			if (!memcmp(guid, cur_guid, sizeof(guid)) &&
+			    owner == info.port_name &&
+			    (owner != 0 || (owner == 0 &&
+			    !rte_pci_addr_cmp(pci_dev, &pci_addr))))
+				pf = info.port_name;
+		} else if (pci_dev->domain == pci_addr.domain &&
+		    pci_dev->bus == pci_addr.bus &&
+		    pci_dev->devid == pci_addr.devid &&
+		    ((pci_dev->function == 0 &&
+		      pci_dev->function + owner == pci_addr.function) ||
+		     (pci_dev->function == owner &&
+		      pci_addr.function == owner)))
+			pf = info.port_name;
 	}
 	if (pf >= 0) {
 		/* Get bond interface info */
@@ -2006,6 +2026,11 @@  mlx5_device_bond_pci_match(const char *ibdev_name,
 			DRV_LOG(INFO, "PF device %u, bond device %u(%s)",
 				ifindex, bond_info->ifindex, bond_info->ifname);
 	}
+	if (owner == 0 && pf != 0) {
+		DRV_LOG(INFO, "PCIe instance %04x:%02x:%02x.%x isn't bonding owner",
+				pci_dev->domain, pci_dev->bus, pci_dev->devid,
+				pci_dev->function);
+	}
 	return pf;
 }
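
For readers following the last mlx5_os.c hunk, the per-member decision
it introduces can be restated as a standalone predicate. This is only a
sketch under stated assumptions: struct pci_addr stands in for
struct rte_pci_addr, bond_member_is_pf() is a hypothetical name, and
guid_ret represents the value returned by mlx5_get_device_guid() as the
patch uses it (negative: error, positive: GUID available, zero: fall
back to the PCIe BDF comparison).

```c
#include <stdint.h>
#include <string.h>

#define GUID_LEN 32 /* same buffer size the patch uses for guid[] */

/* Hypothetical stand-in for struct rte_pci_addr. */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

static int
pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->devid == b->devid && a->function == b->function;
}

/*
 * Return 1 when this bonding member identifies the PF for the probed
 * device, 0 otherwise. Mirrors the three branches in the patch.
 */
static int
bond_member_is_pf(const struct pci_addr *probed,
		  const struct pci_addr *member,
		  unsigned int owner, unsigned int port_name,
		  int guid_ret,
		  const uint8_t probed_guid[GUID_LEN],
		  const uint8_t member_guid[GUID_LEN])
{
	if (guid_ret < 0)
		return 0; /* the patch stops scanning members here */
	if (guid_ret > 0)
		/* GUID path: same GUID, and owner 0 must be the same device. */
		return memcmp(member_guid, probed_guid, GUID_LEN) == 0 &&
		       owner == port_name &&
		       (owner != 0 || pci_addr_equal(probed, member));
	/* Legacy path: same domain/bus/device, BDF0+pfx or BDFx+pfx. */
	return probed->domain == member->domain &&
	       probed->bus == member->bus &&
	       probed->devid == member->devid &&
	       ((probed->function == 0 &&
		 probed->function + owner == member->function) ||
		(probed->function == owner && member->function == owner));
}
```

With a matching GUID, a member on a different domain/bus/device can be
selected for a non-zero owner, which the legacy BDF comparison alone
would reject.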