[v2] net/ixgbe: fix probability of obtaining mailbox lock failure

Message ID: 20210906022208.9530-1-chenqiming_huawei@163.com (mailing list archive)
State: Rejected, archived
Delegated to: Qi Zhang
Series: [v2] net/ixgbe: fix probability of obtaining mailbox lock failure

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/github-robot: build          success  github build: passed
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-x86_64-unit-testing      fail     Testing issues
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                fail     Testing issues

Commit Message

Qiming Chen Sept. 6, 2021, 2:22 a.m. UTC
After the PF port is repeatedly brought up and down with ifconfig, the DPDK
VF driver may fail to obtain the mailbox lock, resulting in configuration
failure and functional failure. To make mailbox communication more
reliable, this patch retries obtaining the lock.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
---
v2:
  Modify fixes commit
---
 drivers/net/ixgbe/base/ixgbe_mbx.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
  

Comments

Wang, Haiyue Sept. 8, 2021, 3:33 a.m. UTC | #1
> -----Original Message-----
> From: Qiming Chen <chenqiming_huawei@163.com>
> Sent: Monday, September 6, 2021 10:22
> To: dev@dpdk.org
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; Qiming Chen <chenqiming_huawei@163.com>; stable@dpdk.org
> Subject: [PATCH v2] net/ixgbe: fix probability of obtaining mailbox lock failure
> 
> After the PF port is repeatedly brought up and down with ifconfig, the DPDK
> VF driver may fail to obtain the mailbox lock, resulting in configuration
> failure and functional failure. To make mailbox communication more
> reliable, this patch retries obtaining the lock.

What does your log look like with "--log-level=pmd.net.ixgbe.init:8 --log-level=pmd.net.ixgbe.driver:8"?

After "ifconfig PF down/up", I get only a few messages and no further function calls:

testpmd> ixgbevf_intr_disable():  >>
ixgbe_read_mbx(): ixgbe_read_mbx
ixgbe_read_mbx_vf(): ixgbe_read_mbx_vf
ixgbe_obtain_mbx_lock_vf(): ixgbe_obtain_mbx_lock_vf

Port 0: reset event
ixgbevf_intr_enable():  >>


> 
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
> ---
> v2:
>   Modify fixes commit
> ---
>  drivers/net/ixgbe/base/ixgbe_mbx.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/base/ixgbe_mbx.c b/drivers/net/ixgbe/base/ixgbe_mbx.c
> index 4dddff2c58..5a14fcc7b4 100644
> --- a/drivers/net/ixgbe/base/ixgbe_mbx.c
> +++ b/drivers/net/ixgbe/base/ixgbe_mbx.c
> @@ -370,15 +370,23 @@ STATIC s32 ixgbe_check_for_rst_vf(struct ixgbe_hw *hw, u16 mbx_id)
>  STATIC s32 ixgbe_obtain_mbx_lock_vf(struct ixgbe_hw *hw)
>  {
>  	s32 ret_val = IXGBE_ERR_MBX;
> +	s32 timeout = hw->mbx.timeout;
> +	s32 usec = hw->mbx.usec_delay;
> 
>  	DEBUGFUNC("ixgbe_obtain_mbx_lock_vf");
> 
> -	/* Take ownership of the buffer */
> -	IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
> +	do {
> +		/* Take ownership of the buffer */
> +		IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
> 
> -	/* reserve mailbox for vf use */
> -	if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU)
> -		ret_val = IXGBE_SUCCESS;
> +		/* reserve mailbox for vf use */
> +		if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU) {
> +			ret_val = IXGBE_SUCCESS;
> +			break;
> +		}
> +
> +		usec_delay(usec);
> +	} while (timeout--);
> 
>  	return ret_val;
>  }
> --
> 2.30.1.windows.1
  
Qiming Chen Sept. 9, 2021, 1:56 a.m. UTC | #2
This problem cannot be observed or located from the log. You can try the following steps to reproduce it:
1) Use kernel PF + DPDK VF mode.
2) Have the VF control plane keep setting or reading configuration, for example by creating a thread that polls the link status.
3) Write a script that repeatedly runs "ifconfig" down/up on the PF.

After a period of time, there is some probability that the mailbox lock cannot be obtained, which causes an abnormality.

I reproduced this problem locally with a demo application.
The probability is relatively small and it may not be easy to reproduce, but the problem does exist.
  
Wang, Haiyue Sept. 9, 2021, 2:13 a.m. UTC | #3
Again, please DON'T REPLY with rich text; it is hard to handle
in patchwork. And DON'T REPLY on top.

BR,
Haiyue

  
Wang, Haiyue Sept. 9, 2021, 2:55 a.m. UTC | #4
I have to say that the ixgbevf PMD has limitations in handling the reset
event, so in your demo application, when a link down/up event is detected,
the application needs to reset the ixgbevf port, as the kernel driver does
in the code below.

BTW, retrying doesn't make things better; you have to wait for the PF to
notify you that it is done.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c

static void ixgbevf_watchdog_update_link(struct ixgbevf_adapter *adapter)
{
	struct ixgbe_hw *hw = &adapter->hw;
	u32 link_speed = adapter->link_speed;
	bool link_up = adapter->link_up;
	s32 err;

	spin_lock_bh(&adapter->mbx_lock);

	err = hw->mac.ops.check_link(hw, &link_speed, &link_up, false);

	spin_unlock_bh(&adapter->mbx_lock);

	/* if check for link returns error we will need to reset */
	if (err && time_after(jiffies, adapter->last_reset + (10 * HZ))) {
		set_bit(__IXGBEVF_RESET_REQUESTED, &adapter->state);
		link_up = false;
	}

	adapter->link_up = link_up;
	adapter->link_speed = link_speed;
}

BR,
Haiyue
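
The kernel snippet above requests a reset only when the link check fails and at least ten seconds have passed since the last reset. A minimal, self-contained sketch of that rate-limited decision follows. Every name in it (vf_state, vf_update_link, RESET_HOLDOFF_MS) is hypothetical and is neither DPDK nor kernel API; time is passed in as a plain millisecond counter, and counter wraparound, which the kernel's time_after() handles, is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hold-off, mirroring the 10 * HZ window in the kernel code. */
#define RESET_HOLDOFF_MS 10000u

/* Hypothetical per-VF state; not a DPDK or kernel structure. */
struct vf_state {
	uint64_t last_reset_ms; /* timestamp of the most recent reset */
	bool reset_requested;   /* set when a reset should be scheduled */
	bool link_up;           /* last recorded link state */
};

/*
 * Record the result of one mailbox-based link check.
 * check_err is nonzero when the check failed; now_ms is a monotonic
 * timestamp supplied by the caller. A failed check requests a reset
 * only after the hold-off since the last reset has elapsed.
 */
static void vf_update_link(struct vf_state *vf, int check_err,
			   bool link_up, uint64_t now_ms)
{
	assert(vf != NULL);

	if (check_err && now_ms > vf->last_reset_ms + RESET_HOLDOFF_MS) {
		vf->reset_requested = true;
		link_up = false;
	}

	vf->link_up = link_up;
}
```

An application would call vf_update_link() from its periodic link poll and perform the actual port reset elsewhere when reset_requested is set, then update last_reset_ms.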

  
Qiming Chen Sept. 9, 2021, 3:34 a.m. UTC | #5
This problem was triggered on a live network, and I discovered it a long time ago. I used the link state only as an example.
The point is not that this is a link state problem, but that ixgbevf does fail with some probability.
The current modification, verified repeatedly, does solve the problem. The specific root cause has not been fully analyzed.
Since the mailbox itself has a reliability mechanism, why not use it here? As I understand it, the status of the VF mailbox is read from a register.
If the PF is reset repeatedly, could obtaining the lock fail transiently because the register value has not been initialized yet, and then succeed later?
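
The two positions in this thread can be compared with a small simulation. The sketch below assumes a simplified model in which a VF write to the mailbox is ignored while the PF still owns the buffer; the names sim_mbx, sim_write_vfu and sim_obtain_lock are hypothetical and model no real register map. Under that assumption, a bounded retry succeeds only if the PF lets go within the retry window, and fails otherwise, which is the case where waiting for the PF's notification would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox model; not the ixgbe register layout. */
struct sim_mbx {
	bool vf_owns;            /* stands in for the VFU bit */
	unsigned pf_busy_writes; /* VF write attempts the PF still blocks */
};

/* A VF ownership write only sticks once the PF no longer holds the buffer. */
static void sim_write_vfu(struct sim_mbx *m)
{
	if (m->pf_busy_writes > 0) {
		m->pf_busy_writes--; /* write ignored: PF owns the buffer */
		return;
	}
	m->vf_owns = true;
}

/*
 * Try once, then retry up to max_retries more times (keep max_retries
 * small). Returns the number of attempts used on success, or -1 if
 * every attempt failed.
 */
static int sim_obtain_lock(struct sim_mbx *m, unsigned max_retries)
{
	assert(m != NULL);

	for (unsigned attempt = 1; attempt <= max_retries + 1; attempt++) {
		sim_write_vfu(m);
		if (m->vf_owns)
			return (int)attempt;
		/* a real driver would delay here before the next attempt */
	}
	return -1;
}
```

With the PF blocking three writes and five retries allowed, the lock is obtained on the fourth attempt; with the PF blocking for longer than the retry budget, the function returns -1 no matter how the budget is spent.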
  

Patch

diff --git a/drivers/net/ixgbe/base/ixgbe_mbx.c b/drivers/net/ixgbe/base/ixgbe_mbx.c
index 4dddff2c58..5a14fcc7b4 100644
--- a/drivers/net/ixgbe/base/ixgbe_mbx.c
+++ b/drivers/net/ixgbe/base/ixgbe_mbx.c
@@ -370,15 +370,23 @@  STATIC s32 ixgbe_check_for_rst_vf(struct ixgbe_hw *hw, u16 mbx_id)
 STATIC s32 ixgbe_obtain_mbx_lock_vf(struct ixgbe_hw *hw)
 {
 	s32 ret_val = IXGBE_ERR_MBX;
+	s32 timeout = hw->mbx.timeout;
+	s32 usec = hw->mbx.usec_delay;
 
 	DEBUGFUNC("ixgbe_obtain_mbx_lock_vf");
 
-	/* Take ownership of the buffer */
-	IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
+	do {
+		/* Take ownership of the buffer */
+		IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
 
-	/* reserve mailbox for vf use */
-	if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU)
-		ret_val = IXGBE_SUCCESS;
+		/* reserve mailbox for vf use */
+		if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU) {
+			ret_val = IXGBE_SUCCESS;
+			break;
+		}
+
+		usec_delay(usec);
+	} while (timeout--);
 
 	return ret_val;
 }