[04/12] net/mlx5: add VF LAG mode bonding device recognition

Message ID 1569398015-6027-5-git-send-email-viacheslavo@mellanox.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Headers
Series net/mlx5: add bonding configuration support |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Slava Ovsiienko Sept. 25, 2019, 7:53 a.m. UTC
  The Mellanox NICs starting from ConnectX-5 support LAG over
NIC ports internally, implemented by the NIC firmware and hardware.
The multiport NIC presents multiple physical PCI functions (PF),
with SR-IOV multiple virtual PCI functions (VFs) might be presented.
With switchdev mode the VF representors are engaged and PFs and their
VFs are connected by internal E-Switch feature. Each PF and related VFs
have dedicated E-Switch and belong to dedicated switch domain.

If NIC ports are combined to support NIC the kernel drivers introduce
the single unified Infiniband multiport devices, and all only one
unified E-Switch with single switch domain combines master PF
all all VFs. No extra DPDK bonding device is needed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)
  

Comments

Ferruh Yigit Sept. 30, 2019, 10:34 a.m. UTC | #1
On 9/25/2019 8:53 AM, Viacheslav Ovsiienko wrote:
> The Mellanox NICs starting from ConnectX-5 support LAG over
> NIC ports internally, implemented by the NIC firmware and hardware.
> The multiport NIC presents multiple physical PCI functions (PF),
> with SR-IOV multiple virtual PCI functions (VFs) might be presented.
> With switchdev mode the VF representors are engaged and PFs and their
> VFs are connected by internal E-Switch feature. Each PF and related VFs
> have dedicated E-Switch and belong to dedicated switch domain.
> 
> If NIC ports are combined to support NIC the kernel drivers introduce
> the single unified Infiniband multiport devices, and all only one
> unified E-Switch with single switch domain combines master PF
> all all VFs. No extra DPDK bonding device is needed.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

<...>

> +
> +	/* Use safe format to check maximal buffer length. */
> +#pragma GCC diagnostic ignored "-Wformat-nonliteral"
> +	while (fscanf(file, format, ifname) == 1) {
> +#pragma GCC diagnostic error "-Wformat-nonliteral"
> +		char tmp_str[IF_NAMESIZE + 32];
> +		struct rte_pci_addr pci_addr;
> +		struct mlx5_switch_info	info;


Hi Slava,

The ICC is failing because of the pragma [1],
- Can you please explain why it is needed, can we escape from it?
- If we can't escape can you please add compiler checks around it? [2]
- And should we enable it as 'error' by default, what about following [3]?


[1]
.../drivers/net/mlx5/mlx5.c(2301): error #2282: unrecognized GCC pragma
  #pragma GCC diagnostic ignored "-Wformat-nonliteral"
                         ^

[2]
#ifdef __GNUC__
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 1)
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
#endif
#endif


[3]
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wformat-nonliteral"
<...>
 #pragma GCC diagnostic pop
  
Slava Ovsiienko Oct. 1, 2019, 9:02 a.m. UTC | #2
Hi, Ferruh

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Monday, September 30, 2019 13:35
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding
> device recognition
> 
> On 9/25/2019 8:53 AM, Viacheslav Ovsiienko wrote:
> > The Mellanox NICs starting from ConnectX-5 support LAG over NIC ports
> > internally, implemented by the NIC firmware and hardware.
> > The multiport NIC presents multiple physical PCI functions (PF), with
> > SR-IOV multiple virtual PCI functions (VFs) might be presented.
> > With switchdev mode the VF representors are engaged and PFs and their
> > VFs are connected by internal E-Switch feature. Each PF and related
> > VFs have dedicated E-Switch and belong to dedicated switch domain.
> >
> > If NIC ports are combined to support NIC the kernel drivers introduce
> > the single unified Infiniband multiport devices, and all only one
> > unified E-Switch with single switch domain combines master PF all all
> > VFs. No extra DPDK bonding device is needed.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> <...>
> 
> > +
> > +	/* Use safe format to check maximal buffer length. */ #pragma GCC
> > +diagnostic ignored "-Wformat-nonliteral"
> > +	while (fscanf(file, format, ifname) == 1) { #pragma GCC diagnostic
> > +error "-Wformat-nonliteral"
> > +		char tmp_str[IF_NAMESIZE + 32];
> > +		struct rte_pci_addr pci_addr;
> > +		struct mlx5_switch_info	info;
> 
> 
> Hi Slava,
> 
> The ICC is failing because of the pragma [1],
> - Can you please explain why it is needed, can we escape from it?

GCC generates error/warning, depending on the settings "-Werror=format-nonliteral",
for the fscanf(file, format, ...) call,  if the "format" parameter is not literal.

> - If we can't escape can you please add compiler checks around it? [2]


Sure, I will. And I'm sorry for missing this in the patch.
BTW , there are multiple usages of "#ifdef __INTEL_COMPILER"
around the GCC pragmas. I think RTE_TOOLCHAIN_GCC is more relevant for the
in case being discussed.

> - And should we enable it as 'error' by default, what about following [3]?
Yes, it would be better, thanks for the nice clue.

WBR, Slava
> 
> [1]
> .../drivers/net/mlx5/mlx5.c(2301): error #2282: unrecognized GCC pragma
>   #pragma GCC diagnostic ignored "-Wformat-nonliteral"
>                          ^
> 
> [2]
> #ifdef __GNUC__
> #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 1) #pragma
> GCC diagnostic ignored "-Wformat-nonliteral"
> #endif
> #endif
> 
> 
> [3]
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wformat-nonliteral"
> <...>
>  #pragma GCC diagnostic pop
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 701da7e..12eed13 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -169,6 +169,7 @@  struct mlx5_dev_spawn_data {
 	uint32_t ifindex; /**< Network interface index. */
 	uint32_t max_port; /**< IB device maximal port index. */
 	uint32_t ibv_port; /**< IB device physical port index. */
+	int pf_bond; /**< bonding device PF index. < 0 - no bonding */
 	struct mlx5_switch_info info; /**< Switch information. */
 	struct ibv_device *ibv_dev; /**< Associated IB device. */
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
@@ -2119,6 +2120,108 @@  struct mlx5_dev_spawn_data {
 }
 
 /**
+ * Match PCI information for possible slaves of bonding device.
+ *
+ * @param[in] ibv_dev
+ *   Pointer to Infiniband device structure.
+ * @param[in] pci_dev
+ *   Pointer to PCI device structure to match PCI address.
+ * @param[in] nl_rdma
+ *   Netlink RDMA group socket handle.
+ *
+ * @return
+ *   negative value if no bonding device found, otherwise
+ *   positive index of slave PF in bonding.
+ */
+static int
+mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
+			   const struct rte_pci_device *pci_dev,
+			   int nl_rdma)
+{
+	char ifname[IF_NAMESIZE + 1];
+	unsigned int ifindex;
+	unsigned int np, i;
+	FILE *file = NULL;
+	int pf = -1;
+
+	/*
+	 * Try to get master device name. If something goes
+	 * wrong suppose the lack of kernel support and no
+	 * bonding devices.
+	 */
+	if (nl_rdma < 0)
+		return -1;
+	if (!strstr(ibv_dev->name, "bond"))
+		return -1;
+	np = mlx5_nl_portnum(nl_rdma, ibv_dev->name);
+	if (!np)
+		return -1;
+	/*
+	 * The Master device might not be on the predefined
+	 * port (not on port index 1, it is not garanted),
+	 * we have to scan all Infiniband device port and
+	 * find master.
+	 */
+	for (i = 1; i <= np; ++i) {
+		/* Check whether Infiniband port is populated. */
+		ifindex = mlx5_nl_ifindex(nl_rdma, ibv_dev->name, i);
+		if (!ifindex)
+			continue;
+		if (!if_indextoname(ifindex, ifname))
+			continue;
+		/* Try to read bonding slave names from sysfs. */
+		MKSTR(slaves,
+		      "/sys/class/net/%s/master/bonding/slaves", ifname);
+		file = fopen(slaves, "r");
+		if (file)
+			break;
+	}
+	if (!file)
+		return -1;
+	MKSTR(format, "%c%us", '%', (unsigned int)(sizeof(ifname) - 1));
+
+	/* Use safe format to check maximal buffer length. */
+#pragma GCC diagnostic ignored "-Wformat-nonliteral"
+	while (fscanf(file, format, ifname) == 1) {
+#pragma GCC diagnostic error "-Wformat-nonliteral"
+		char tmp_str[IF_NAMESIZE + 32];
+		struct rte_pci_addr pci_addr;
+		struct mlx5_switch_info	info;
+
+		/* Process slave interface names in the loop. */
+		snprintf(tmp_str, sizeof(tmp_str),
+			 "/sys/class/net/%s", ifname);
+		if (mlx5_dev_to_pci_addr(tmp_str, &pci_addr)) {
+			DRV_LOG(WARNING, "can not get PCI address"
+					 " for netdev \"%s\"", ifname);
+			continue;
+		}
+		if (pci_dev->addr.domain != pci_addr.domain ||
+		    pci_dev->addr.bus != pci_addr.bus ||
+		    pci_dev->addr.devid != pci_addr.devid ||
+		    pci_dev->addr.function != pci_addr.function)
+			continue;
+		/* Slave interface PCI address match found. */
+		fclose(file);
+		snprintf(tmp_str, sizeof(tmp_str),
+			 "/sys/class/net/%s/phys_port_name", ifname);
+		file = fopen(tmp_str, "rb");
+		if (!file)
+			break;
+		info.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET;
+		if (fscanf(file, "%32s", tmp_str) == 1)
+			mlx5_translate_port_name(tmp_str, &info);
+		if (info.name_type == MLX5_PHYS_PORT_NAME_TYPE_LEGACY ||
+		    info.name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK)
+			pf = info.port_name;
+		break;
+	}
+	if (file)
+		fclose(file);
+	return pf;
+}
+
+/**
  * DPDK callback to register a PCI device.
  *
  * This function spawns Ethernet devices out of a given PCI device.
@@ -2154,6 +2257,12 @@  struct mlx5_dev_spawn_data {
 	 * Actually this is the number of iterations to spawn.
 	 */
 	unsigned int ns = 0;
+	/*
+	 * Bonding device
+	 *   < 0 - no bonding device (single one)
+	 *  >= 0 - bonding device (value is slave PF index)
+	 */
+	int bd = -1;
 	struct mlx5_dev_spawn_data *list = NULL;
 	struct mlx5_dev_config dev_config;
 	int ret;
@@ -2185,6 +2294,30 @@  struct mlx5_dev_spawn_data {
 		struct rte_pci_addr pci_addr;
 
 		DRV_LOG(DEBUG, "checking device \"%s\"", ibv_list[ret]->name);
+		bd = mlx5_device_bond_pci_match
+				(ibv_list[ret], pci_dev, nl_rdma);
+		if (bd >= 0) {
+			/*
+			 * Bonding device detected. Only one match is allowed,
+			 * the bonding is supported over multi-port IB device,
+			 * there should be no matches on representor PCI
+			 * functions or non VF LAG bonding devices with
+			 * specified address.
+			 */
+			if (nd) {
+				DRV_LOG(ERR,
+					"multiple PCI match on bonding device"
+					"\"%s\" found", ibv_list[ret]->name);
+				rte_errno = ENOENT;
+				ret = -rte_errno;
+				goto exit;
+			}
+			DRV_LOG(INFO, "PCI information matches for"
+				      " slave %d bonding device \"%s\"",
+				      bd, ibv_list[ret]->name);
+			ibv_match[nd++] = ibv_list[ret];
+			break;
+		}
 		if (mlx5_dev_to_pci_addr
 			(ibv_list[ret]->ibdev_path, &pci_addr))
 			continue;
@@ -2220,6 +2353,13 @@  struct mlx5_dev_spawn_data {
 		if (!np)
 			DRV_LOG(WARNING, "can not get IB device \"%s\""
 					 " ports number", ibv_match[0]->name);
+		if (bd >= 0 && !np) {
+			DRV_LOG(ERR, "can not get ports"
+				     " for bonding device");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
 	}
 	/*
 	 * Now we can determine the maximal
@@ -2235,7 +2375,7 @@  struct mlx5_dev_spawn_data {
 		ret = -rte_errno;
 		goto exit;
 	}
-	if (np > 1) {
+	if (bd >= 0 || np > 1) {
 		/*
 		 * Single IB device with multiple ports found,
 		 * it may be E-Switch master device and representors.
@@ -2244,12 +2384,14 @@  struct mlx5_dev_spawn_data {
 		assert(nl_rdma >= 0);
 		assert(ns == 0);
 		assert(nd == 1);
+		assert(np);
 		for (i = 1; i <= np; ++i) {
 			list[ns].max_port = np;
 			list[ns].ibv_port = i;
 			list[ns].ibv_dev = ibv_match[0];
 			list[ns].eth_dev = NULL;
 			list[ns].pci_dev = pci_dev;
+			list[ns].pf_bond = bd;
 			list[ns].ifindex = mlx5_nl_ifindex
 					(nl_rdma, list[ns].ibv_dev->name, i);
 			if (!list[ns].ifindex) {
@@ -2279,6 +2421,21 @@  struct mlx5_dev_spawn_data {
 						(list[ns].ifindex,
 						 &list[ns].info);
 			}
+			if (!ret && bd >= 0) {
+				switch (list[ns].info.name_type) {
+				case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+					if (list[ns].info.port_name == bd)
+						ns++;
+					break;
+				case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+					if (list[ns].info.pf_num == bd)
+						ns++;
+					break;
+				default:
+					break;
+				}
+				continue;
+			}
 			if (!ret && (list[ns].info.representor ^
 				     list[ns].info.master))
 				ns++;
@@ -2317,6 +2474,7 @@  struct mlx5_dev_spawn_data {
 			list[ns].ibv_dev = ibv_match[i];
 			list[ns].eth_dev = NULL;
 			list[ns].pci_dev = pci_dev;
+			list[ns].pf_bond = -1;
 			list[ns].ifindex = 0;
 			if (nl_rdma >= 0)
 				list[ns].ifindex = mlx5_nl_ifindex