net/ixgbe: fix link state timing issue on fiber ports

Message ID: 1584600111-17412-1-git-send-email-phil.yang@arm.com
State: Superseded, archived
Delegated to: xiaolong ye
Series: net/ixgbe: fix link state timing issue on fiber ports

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/travis-robot              success  Travis build: passed
ci/iol-testing               success  Testing PASS
ci/Intel-compilation         success  Compilation OK

Commit Message

Phil Yang March 19, 2020, 6:41 a.m. UTC
With some models of fiber ports (e.g. X520-2 device ID 0x10fb), it
is possible when a port is started to experience a timing issue
which prevents the link from ever being fully set up.

In ixgbe_dev_link_update_share(), if the media type is fiber and the
link is down, a flag (IXGBE_FLAG_NEED_LINK_CONFIG) is set. A callback
to ixgbe_dev_setup_link_thread_handler() is scheduled which should
try to set up the link and clear the flag afterwards.

If the device is started before the flag is cleared, the scheduled
callback is cancelled. This causes the flag to remain set and
subsequent calls to ixgbe_dev_link_update_share() return
without trying to retrieve the link state because the flag is set.

In ixgbe_dev_cancel_link_thread(), after cancelling the callback,
unset the flag on the device to avoid this condition.

Fixes: 819d0d1d57f1 ("net/ixgbe: fix blocking system events")
Cc: stable@dpdk.org

Bugzilla ID: 388

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Lijian Zhang <lijian.zhang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
  

Comments

Lijian Zhang March 19, 2020, 10:51 a.m. UTC | #1
This issue was first observed with an ixgbe NIC in the VPP project, a software switching application based on DPDK.
A daemon thread runs in the background and keeps polling the hardware link status using ixgbe_dev_link_update_share().
Once the IXGBE_FLAG_NEED_LINK_CONFIG flag is set, ixgbe_dev_link_update_share() just returns link-down status without actually polling the hardware.

In this issue, IXGBE_FLAG_NEED_LINK_CONFIG is set and never cleared, so ixgbe_dev_link_update_share() cannot read the hardware status and always reports link down.

The flag gets stuck as follows.

ixgbe_dev_link_update_share() runs continuously in the background. In the normal case:
1. Initially, IXGBE_FLAG_NEED_LINK_CONFIG is 0 and the link is down.
2. ixgbe_dev_link_update_share() sets IXGBE_FLAG_NEED_LINK_CONFIG to 1.
3. It then starts the ixgbe_dev_setup_link_thread_handler() thread to configure the interface.
4. At the end of the configuration thread, ixgbe_dev_setup_link_thread_handler() clears IXGBE_FLAG_NEED_LINK_CONFIG.
5. With IXGBE_FLAG_NEED_LINK_CONFIG cleared, ixgbe_dev_link_update_share() can poll the hardware link status on the next round.

However, when the user sets the interface link up or down from the CLI, ixgbe_dev_start() or ixgbe_dev_stop() is called. Both functions call ixgbe_dev_cancel_link_thread() to interrupt any running configuration thread (that is, one in step 3 or step 4 above) without clearing IXGBE_FLAG_NEED_LINK_CONFIG. The flag then stays set, and ixgbe_dev_link_update_share() can no longer read the hardware status.
Thanks.

> -----Original Message-----
> From: Phil Yang <phil.yang@arm.com>
> Sent: March 19, 2020 14:42
> To: dev@dpdk.org; konstantin.ananyev@intel.com; wenzhuo.lu@intel.com
> Cc: qi.z.zhang@intel.com; Lijian Zhang <Lijian.Zhang@arm.com>; Gavin Hu
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org
> Subject: [PATCH] net/ixgbe: fix link state timing issue on fiber ports
> <Snip>
  
Phil Yang May 8, 2020, 2:48 a.m. UTC | #2
> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix link state timing issue on fiber
> ports

Ping.

Thanks,
Phil

<Snip>
  
Xiaolong Ye May 8, 2020, 8:36 a.m. UTC | #3
On 05/08, Phil Yang wrote:
>> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix link state timing issue on fiber
>> ports
>
>Ping.

This fix makes sense to me, thanks for the work.
However, it does not seem to apply cleanly to the latest dpdk-next-net-intel;
could you rebase it?

Thanks,
Xiaolong

>
>Thanks,
>Phil
>
><Snip>
  
Phil Yang May 8, 2020, 10:31 a.m. UTC | #4
> -----Original Message-----
> From: Ye Xiaolong <xiaolong.ye@intel.com>
> Sent: Friday, May 8, 2020 4:36 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: dev@dpdk.org; konstantin.ananyev@intel.com; wenzhuo.lu@intel.com;
> qi.z.zhang@intel.com; Lijian Zhang <Lijian.Zhang@arm.com>; Gavin Hu
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix link state timing issue on fiber
> ports
> 
> On 05/08, Phil Yang wrote:
> >> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix link state timing issue on fiber
> >> ports
> >
> >Ping.
> 
> This fix makes sense to me, thanks for the work.
> However, it does not seem to apply cleanly to the latest dpdk-next-net-intel;
> could you rebase it?
> 
Thank you, Xiaolong.
I have sent an updated V2; please review it.

Thanks,
Phil
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 23b3f5b..2b65750 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4147,11 +4147,19 @@  static void
 ixgbe_dev_cancel_link_thread(struct rte_eth_dev *dev)
 {
 	struct ixgbe_adapter *ad = dev->data->dev_private;
+	struct ixgbe_interrupt *intr =
+		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
 	void *retval;
 
 	if (rte_atomic32_read(&ad->link_thread_running)) {
 		pthread_cancel(ad->link_thread_tid);
 		pthread_join(ad->link_thread_tid, &retval);
+		/* clear this flag once the thread has been
+		 * cancelled, to avoid link status error in
+		 * case unfinished threads cannot clean up
+		 * this flag.
+		 */
+		intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG;
 		rte_atomic32_clear(&ad->link_thread_running);
 	}
 }
@@ -4262,8 +4270,12 @@  ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
 
 	if (link_up == 0) {
 		if (ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
-			intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
 			if (rte_atomic32_test_and_set(&ad->link_thread_running)) {
+				/* To avoid race condition between threads, set
+				 * the IXGBE_FLAG_NEED_LINK_CONFIG flag only
+				 * when there is no link thread running.
+				 */
+				intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
 				if (rte_ctrl_thread_create(&ad->link_thread_tid,
 					"ixgbe-link-handler",
 					NULL,