[2/2] net/bonding: fix MAC address when one port resets

Message ID 20200225092903.38455-3-huwei013@chinasoftinc.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series fixes for bonding

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK
ci/travis-robot       success  Travis build: passed

Commit Message

Wei Hu (Xavier) Feb. 25, 2020, 9:29 a.m. UTC
  From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>

Currently, with an active-backup bond device, in the following two cases:
1) The primary port resets, and its link status changes from up to down.
2) One slave port resets while the active port is being switched.
a slave port becomes the new primary port, but its MAC address may not be
changed to the bond device's MAC address. As a result, the bond device can
no longer receive packets whose destination MAC address is the bond
device's MAC address.

The bonding PMD calls mac_address_slaves_update() to modify the MAC
addresses of all slave devices. In that function,
rte_eth_dev_default_mac_addr_set() is called in a for loop to set the MAC
address of each slave device in turn.

When one port is resetting, rte_eth_dev_default_mac_addr_set() fails
because the firmware does not respond to commands from the driver. The
function then returns from inside the loop, so the remaining slave devices
never get their MAC addresses updated.

This patch fixes the issue by not exiting the loop when
rte_eth_dev_default_mac_addr_set() fails.
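
The difference between the two loop shapes can be sketched in isolation. This is a minimal illustration, not the driver code: struct mock_port, mock_set_mac(), update_early_exit() and update_all() are hypothetical names, and mock_set_mac() only stands in for rte_eth_dev_default_mac_addr_set() by failing while a port is marked as resetting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slave port; not a DPDK structure. */
struct mock_port {
	bool in_reset; /* firmware does not answer while true */
	bool mac_set;  /* records whether the MAC update reached the port */
};

/* Stand-in for the MAC setter: 0 on success, negative on failure. */
static int mock_set_mac(struct mock_port *p)
{
	if (p->in_reset)
		return -1;
	p->mac_set = true;
	return 0;
}

/* Old shape: return on the first failure, so later ports are skipped. */
static int update_early_exit(struct mock_port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (mock_set_mac(&ports[i]) != 0)
			return -1;
	return 0;
}

/* Patched shape: visit every port and report failure only at the end. */
static int update_all(struct mock_port *ports, size_t n)
{
	bool all_set = true;

	for (size_t i = 0; i < n; i++)
		if (mock_set_mac(&ports[i]) != 0)
			all_set = false;
	return all_set ? 0 : -1;
}
```

With three ports where the middle one is resetting, update_early_exit() leaves the third port untouched, while update_all() still updates it and returns -1 afterwards.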

Fixes: 2efb58cbab6e ("bond: new link bonding library")
Cc: stable@dpdk.org

Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: Xuan Li <lixuan47@hisilicon.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
  

Comments

Chas Williams April 4, 2020, 2:07 p.m. UTC | #1
The behavior is probably going to be inconsistent. Only one of the calls
to mac_address_slaves_update() is checked for failure. The two calls
are in bond_ethdev_lsc_event_callback():

                 if (internals->active_slave_count < 1) {
                         mac_address_slaves_update(bonded_eth_dev);

and in bond_ethdev_start():

         /* Update all slave devices MACs*/
         if (mac_address_slaves_update(eth_dev) != 0)
                 goto out_err;

So if the bond device is running, you safely fail over to the backup device.
But if you stop and start the bond device, it will then fail to start?
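
A small sketch of that asymmetry, assuming nothing beyond the two call sites quoted above. struct mock_bond, mock_mac_update(), mock_lsc_callback() and mock_start() are hypothetical stand-ins, not the bonding PMD's functions.

```c
#include <stdbool.h>

/* Hypothetical bond state; the field names are illustrative only. */
struct mock_bond {
	bool mac_update_fails; /* simulates a port that is resetting */
	bool started;
};

/* Stand-in for mac_address_slaves_update(): 0 on success, -1 on failure. */
static int mock_mac_update(const struct mock_bond *b)
{
	return b->mac_update_fails ? -1 : 0;
}

/* Link-state-change path: the result is dropped, so failover proceeds. */
static void mock_lsc_callback(struct mock_bond *b)
{
	(void)mock_mac_update(b);
}

/* Start path: the result is checked, so the same failure aborts start. */
static int mock_start(struct mock_bond *b)
{
	if (mock_mac_update(b) != 0)
		return -1;
	b->started = true;
	return 0;
}
```

The same underlying failure is invisible on one path and fatal on the other, which is the inconsistency in question.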

What devices are you bonding together that aren't able to support
changing their MAC addresses reliably? ixgbe VF devices by any chance?
One solution is to always assume success or ignore the error from devices
that you know are not an issue. I agree that we shouldn't early exit
from mac_address_slaves_update(). I am not sure how to handle the error.

We probably shouldn't allow you to bond devices that don't support setting
MAC address. I don't think this check exists currently. That's possibly
one reason for this check.
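
Such a check might look roughly like the sketch below. It is only an illustration under assumptions: struct mock_dev, its mac_addr_set callback and mock_slave_add_check() are hypothetical, and a NULL callback is assumed to mean that the device cannot change its MAC address.

```c
#include <stddef.h>

/* Hypothetical device descriptor; a NULL callback means "not supported". */
struct mock_dev {
	int (*mac_addr_set)(struct mock_dev *dev, const unsigned char mac[6]);
};

/* Trivial setter used to model a device that supports the operation. */
static int mock_mac_set_ok(struct mock_dev *dev, const unsigned char mac[6])
{
	(void)dev;
	(void)mac;
	return 0;
}

/* Reject a would-be slave that cannot change its MAC address. */
static int mock_slave_add_check(const struct mock_dev *dev)
{
	return dev->mac_addr_set != NULL ? 0 : -1;
}
```

Note that this would only catch devices that never support the operation. It would not catch a device that supports it but fails temporarily.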

  
Wei Hu (Xavier) April 17, 2020, 5:56 a.m. UTC | #2
Hi, Chas



On 2020/4/4 22:07, Chas Williams wrote:
> The behavior is probably going to be inconsistent. Only one of the calls
> to mac_address_slaves_update() is checked for failure. The two calls
> are in bond_ethdev_lsc_event_callback()
> 
>                  if (internals->active_slave_count < 1) {
>                          mac_address_slaves_update(bonded_eth_dev);
> 
> and in bond_ethdev_start():
> 
>          /* Update all slave devices MACs*/
>          if (mac_address_slaves_update(eth_dev) != 0)
>                  goto out_err;
> 
> So if the bond device is running, you safely fail to the backup device.
> But if you stop and start the bond device, it will then fail to start?
We will make the modification shown below and send patch V2.
After the modification, the bond device will then fail to start if setting
the bond device's MAC address on the new primary port fails.

int
mac_address_slaves_update(struct rte_eth_dev *bonded_eth_dev)
{
     <snip>
     bool setted;

     <snip>
     switch (internals->mode) {
     <snip>
     default:
         /* Only a failure on the primary port is treated as fatal. */
         setted = true;
         for (i = 0; i < internals->slave_count; i++) {
             if (internals->slaves[i].port_id ==
                 internals->current_primary_port) {
                 if (rte_eth_dev_default_mac_addr_set(
                         internals->primary_port,
                         bonded_eth_dev->data->mac_addrs)) {
                     RTE_BOND_LOG(ERR, "Failed to xxx");
                     setted = false;
                 }
             } else {
                 /* Non-primary slave: log the failure and continue. */
                 if (rte_eth_dev_default_mac_addr_set(
                         internals->slaves[i].port_id,
                         &internals->slaves[i].persisted_mac_addr)) {
                     RTE_BOND_LOG(ERR, "Failed to xxx");
                 }
             }
         }
         if (!setted)
             return -1;
     }

     return 0;
}
I think we can ignore a failure to update the other slaves' MAC addresses
in mac_address_slaves_update(), because it does not affect the bond's
functionality.
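
The intended return-value policy can be exercised on its own with a mock. This is a sketch under assumptions rather than the V2 patch: struct mock_slave, mock_slave_set_mac() and update_primary_fatal() are hypothetical names, and the mock setter simply fails for a slave marked as resetting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slave record; not the bonding PMD's structure. */
struct mock_slave {
	int port_id;
	bool in_reset; /* the MAC update fails while true */
};

/* Stand-in for the MAC setter: 0 on success, -1 on failure. */
static int mock_slave_set_mac(const struct mock_slave *s)
{
	return s->in_reset ? -1 : 0;
}

/* Returns -1 only if the primary port could not be updated. */
static int update_primary_fatal(const struct mock_slave *slaves, size_t n,
				int primary_port)
{
	bool primary_ok = true;

	for (size_t i = 0; i < n; i++) {
		if (mock_slave_set_mac(&slaves[i]) == 0)
			continue;
		if (slaves[i].port_id == primary_port)
			primary_ok = false;
		/* A failure on any other slave would only be logged. */
	}
	return primary_ok ? 0 : -1;
}
```

With port 1 resetting, the function returns 0 when port 0 is primary and -1 when port 1 is primary.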

> 
> What devices are you bonding together that aren't able to support
> changing their MAC addresses reliably? ixgbe VF devices by any chance?
> One solution is to always assume success or ignore the error from devices
> that you know are not an issue. I agree that we shouldn't early exit
> from mac_address_slaves_update(). I am not sure how to handle the error.
> 
We are running on the hns3 network engine of Huawei's Kunpeng 920 SoC.
When the driver detects that the hardware status is abnormal, it executes
the reset process. After the hardware reset, the driver restores the
hardware configuration, including the MAC address, and the network engine
can then work normally again. During the hardware reset, commands sent to
the firmware time out.
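
A toy model of that sequence, to make the consequence for the bond concrete. Everything here is hypothetical (struct mock_nic, mock_nic_set_mac(), mock_nic_reset_done()) and does not describe the hns3 driver. It assumes that a MAC update rejected during reset is not remembered, so the restore step brings back the previously configured address.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical NIC state; not the hns3 driver's structures. */
struct mock_nic {
	bool in_reset;
	unsigned char wanted_mac[6]; /* configuration kept by the driver */
	unsigned char hw_mac[6];     /* address the hardware currently uses */
};

/* Fails while resetting, modelling a firmware command timeout. */
static int mock_nic_set_mac(struct mock_nic *nic, const unsigned char mac[6])
{
	if (nic->in_reset)
		return -1;
	memcpy(nic->wanted_mac, mac, 6);
	memcpy(nic->hw_mac, mac, 6);
	return 0;
}

/* End of reset: restore the configuration the driver had saved. */
static void mock_nic_reset_done(struct mock_nic *nic)
{
	nic->in_reset = false;
	memcpy(nic->hw_mac, nic->wanted_mac, 6);
}
```

In this model, a bond MAC that was set while the port was resetting is lost after the reset completes, so the caller has to set it again once the port is usable.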

> We probably shouldn't allow you to bond devices that don't support setting
> MAC address. I don't think this check exists currently. That's possibly
> one reason for this check.
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index ddae3518c..ba3f342e7 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1502,6 +1502,7 @@  int
 mac_address_slaves_update(struct rte_eth_dev *bonded_eth_dev)
 {
 	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
+	bool setted;
 	int i;
 
 	/* Update slave devices MAC addresses */
@@ -1529,6 +1530,7 @@  mac_address_slaves_update(struct rte_eth_dev *bonded_eth_dev)
 	case BONDING_MODE_TLB:
 	case BONDING_MODE_ALB:
 	default:
+		setted = true;
 		for (i = 0; i < internals->slave_count; i++) {
 			if (internals->slaves[i].port_id ==
 					internals->current_primary_port) {
@@ -1537,7 +1539,7 @@  mac_address_slaves_update(struct rte_eth_dev *bonded_eth_dev)
 						bonded_eth_dev->data->mac_addrs)) {
 					RTE_BOND_LOG(ERR, "Failed to update port Id %d MAC address",
 							internals->current_primary_port);
-					return -1;
+					setted = false;
 				}
 			} else {
 				if (rte_eth_dev_default_mac_addr_set(
@@ -1545,10 +1547,12 @@  mac_address_slaves_update(struct rte_eth_dev *bonded_eth_dev)
 						&internals->slaves[i].persisted_mac_addr)) {
 					RTE_BOND_LOG(ERR, "Failed to update port Id %d MAC address",
 							internals->slaves[i].port_id);
-					return -1;
+					setted = false;
 				}
 			}
 		}
+		if (!setted)
+			return -1;
 	}
 
 	return 0;