net/ixgbe: Retry SFP ID read field to handle misbehaving SFPs

Message ID 20220307223337.27876-1-jeffd@silicom-usa.com (mailing list archive)
State Superseded, archived
Delegated to: Qi Zhang
Headers
Series net/ixgbe: Retry SFP ID read field to handle misbehaving SFPs |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Jeff Daly March 7, 2022, 10:33 p.m. UTC
  From: Stephen Douthit <stephend@silicom-usa.com>

Some XGS-PON SFPs have been observed ACKing I2C reads and returning
uninitialized garbage while their uC boots.  This can lead to the SFP ID
code marking an otherwise working SFP module as unsupported if a bogus
ID value is read while its internal PHY/microcontroller is still
booting.

Retry the ID read several times looking not just for NAK, but also for a
valid ID field.

Since the device isn't NAKing the trasanction the existing longer retry
code in ixgbe_read_i2c_byte_generic_int() doesn't apply here.

Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
---
 drivers/net/ixgbe/base/ixgbe_phy.c | 31 ++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)
  

Comments

Wang, Haiyue March 17, 2022, 6:55 a.m. UTC | #1
> -----Original Message-----
> From: jeffd@silicom-usa.com <jeffd@silicom-usa.com>
> Sent: Tuesday, March 8, 2022 06:34
> To: dev@dpdk.org
> Cc: Stephen Douthit <stephend@silicom-usa.com>; Daly, Jeff <jeffd@silicom-usa.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: [PATCH] net/ixgbe: Retry SFP ID read field to handle misbehaving SFPs
> 
> From: Stephen Douthit <stephend@silicom-usa.com>
> 
> Some XGS-PON SFPs have been observed ACKing I2C reads and returning
> uninitialized garbage while their uC boots.  This can lead to the SFP ID
> code marking an otherwise working SFP module as unsupported if a bogus
> ID value is read while its internal PHY/microcontroller is still
> booting.
> 
> Retry the ID read several times looking not just for NAK, but also for a
> valid ID field.
> 
> Since the device isn't NAKing the trasanction the existing longer retry
> code in ixgbe_read_i2c_byte_generic_int() doesn't apply here.
> 
> Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
> Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
> ---
>  drivers/net/ixgbe/base/ixgbe_phy.c | 31 ++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c b/drivers/net/ixgbe/base/ixgbe_phy.c
> index d8d51d2c3f..27bce066a1 100644
> --- a/drivers/net/ixgbe/base/ixgbe_phy.c
> +++ b/drivers/net/ixgbe/base/ixgbe_phy.c
> @@ -1275,6 +1275,7 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
>  	u8 cable_tech = 0;
>  	u8 cable_spec = 0;
>  	u16 enforce_sfp = 0;
> +	u8 id_reads;

"u8 retries" is a better name, I think, to match the patch title.

> 
>  	DEBUGFUNC("ixgbe_identify_sfp_module_generic");
> 
> @@ -1287,12 +1288,34 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
>  	/* LAN ID is needed for I2C access */
>  	hw->mac.ops.set_lan_id(hw);
> 
> -	status = hw->phy.ops.read_i2c_eeprom(hw,
> -					     IXGBE_SFF_IDENTIFIER,
> -					     &identifier);
> +	/* Need to check this a couple of times for a sane value.
> +	 *
> +	 * SFPs that have a uC slaved to the I2C bus (vs. a dumb EEPROM) can be
> +	 * poorly designed such that they will ACK I2C reads and return
> +	 * whatever bogus data is in the SRAM (or whatever is backing the target
> +	 * device) before things are truly initialized.
> +	 *
> +	 * In a perfect world devices would NAK I2C requests until they were
> +	 * sane, but here we are.
> +	 *
> +	 * Give such devices a couple tries to get their act together before
> +	 * marking the device as unsupported.
> +	 */
> +	for (id_reads = 0; id_reads < 5; id_reads++) {
> +		status = hw->phy.ops.read_i2c_eeprom(hw,
> +						     IXGBE_SFF_IDENTIFIER,
> +						     &identifier);
> 
> -	if (status != IXGBE_SUCCESS)
> +		DEBUGOUT("status %d, id %d\n", status, identifier);
> +		if (!status &&
> +		    identifier == IXGBE_SFF_IDENTIFIER_SFP)

Let's not assume the ' IXGBE_SUCCESS' is '0', so change it to:

		if (status == IXGBE_SUCCESS &&
                identifier == IXGBE_SFF_IDENTIFIER_SFP)

> +			break;
> +	}
> +
> +	if (status != IXGBE_SUCCESS) {
> +		DEBUGOUT("Failed SFF ID read (%d attempts)\n", id_reads);
>  		goto err_read_i2c_eeprom;
> +	}
> 
>  	if (identifier != IXGBE_SFF_IDENTIFIER_SFP) {
>  		hw->phy.type = ixgbe_phy_sfp_unsupported;
> --
> 2.25.1
  
Wang, Haiyue March 17, 2022, 7:39 a.m. UTC | #2
> -----Original Message-----
> From: jeffd@silicom-usa.com <jeffd@silicom-usa.com>
> Sent: Tuesday, March 8, 2022 06:34
> To: dev@dpdk.org
> Cc: Stephen Douthit <stephend@silicom-usa.com>; Daly, Jeff <jeffd@silicom-usa.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: [PATCH] net/ixgbe: Retry SFP ID read field to handle misbehaving SFPs
> 
> From: Stephen Douthit <stephend@silicom-usa.com>
> 
> Some XGS-PON SFPs have been observed ACKing I2C reads and returning
> uninitialized garbage while their uC boots.  This can lead to the SFP ID
> code marking an otherwise working SFP module as unsupported if a bogus
> ID value is read while its internal PHY/microcontroller is still
> booting.
> 
> Retry the ID read several times looking not just for NAK, but also for a
> valid ID field.
> 
> Since the device isn't NAKing the trasanction the existing longer retry
> code in ixgbe_read_i2c_byte_generic_int() doesn't apply here.
> 
> Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
> Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
> ---
>  drivers/net/ixgbe/base/ixgbe_phy.c | 31 ++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c b/drivers/net/ixgbe/base/ixgbe_phy.c
> index d8d51d2c3f..27bce066a1 100644
> --- a/drivers/net/ixgbe/base/ixgbe_phy.c
> +++ b/drivers/net/ixgbe/base/ixgbe_phy.c
> @@ -1275,6 +1275,7 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)


> +	for (id_reads = 0; id_reads < 5; id_reads++) {
> +		status = hw->phy.ops.read_i2c_eeprom(hw,
> +						     IXGBE_SFF_IDENTIFIER,
> +						     &identifier);
> 
> -	if (status != IXGBE_SUCCESS)
> +		DEBUGOUT("status %d, id %d\n", status, identifier);

DEBUGOUT("status %d, SFF identifier 0x%x\n", status, identifier);

Since "#define IXGBE_SFF_IDENTIFIER_SFP	0x3" is hex value, and the ID is too simple.

> +		if (!status &&
> +		    identifier == IXGBE_SFF_IDENTIFIER_SFP)
> +			break;
> +	}
> +
> +	if (status != IXGBE_SUCCESS) {
> +		DEBUGOUT("Failed SFF ID read (%d attempts)\n", id_reads);
>  		goto err_read_i2c_eeprom;
> +	}

This DEBUGOUT can be removed, it's redundant, every call for hw->phy.ops.read_i2c_eeprom
has added the DEBUOUT.

> 
>  	if (identifier != IXGBE_SFF_IDENTIFIER_SFP) {
>  		hw->phy.type = ixgbe_phy_sfp_unsupported;
> --
> 2.25.1
  

Patch

diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c b/drivers/net/ixgbe/base/ixgbe_phy.c
index d8d51d2c3f..27bce066a1 100644
--- a/drivers/net/ixgbe/base/ixgbe_phy.c
+++ b/drivers/net/ixgbe/base/ixgbe_phy.c
@@ -1275,6 +1275,7 @@  s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
 	u8 cable_tech = 0;
 	u8 cable_spec = 0;
 	u16 enforce_sfp = 0;
+	u8 id_reads;
 
 	DEBUGFUNC("ixgbe_identify_sfp_module_generic");
 
@@ -1287,12 +1288,34 @@  s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
 	/* LAN ID is needed for I2C access */
 	hw->mac.ops.set_lan_id(hw);
 
-	status = hw->phy.ops.read_i2c_eeprom(hw,
-					     IXGBE_SFF_IDENTIFIER,
-					     &identifier);
+	/* Need to check this a couple of times for a sane value.
+	 *
+	 * SFPs that have a uC slaved to the I2C bus (vs. a dumb EEPROM) can be
+	 * poorly designed such that they will ACK I2C reads and return
+	 * whatever bogus data is in the SRAM (or whatever is backing the target
+	 * device) before things are truly initialized.
+	 *
+	 * In a perfect world devices would NAK I2C requests until they were
+	 * sane, but here we are.
+	 *
+	 * Give such devices a couple tries to get their act together before
+	 * marking the device as unsupported.
+	 */
+	for (id_reads = 0; id_reads < 5; id_reads++) {
+		status = hw->phy.ops.read_i2c_eeprom(hw,
+						     IXGBE_SFF_IDENTIFIER,
+						     &identifier);
 
-	if (status != IXGBE_SUCCESS)
+		DEBUGOUT("status %d, id %d\n", status, identifier);
+		if (!status &&
+		    identifier == IXGBE_SFF_IDENTIFIER_SFP)
+			break;
+	}
+
+	if (status != IXGBE_SUCCESS) {
+		DEBUGOUT("Failed SFF ID read (%d attempts)\n", id_reads);
 		goto err_read_i2c_eeprom;
+	}
 
 	if (identifier != IXGBE_SFF_IDENTIFIER_SFP) {
 		hw->phy.type = ixgbe_phy_sfp_unsupported;