[v2] net/ixgbe: fix blocking system events
Checks
Commit Message
From: Zhu Tao <taox.zhu@intel.com>
IXGBE link status task use rte alarm thread in old implementation.
Sometime ixgbe link status task takes up to 9 seconds. This will
severely affect the rte-alarm-thread-dependent a task in the
system, like interrupt or hotplug event. So replace with a
independent thread which has the same thread affinity settings
as rte interrupt.
Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link update")
Cc: stable@dpdk.org
Signed-off-by: Zhu Tao <taox.zhu@intel.com>
---
drivers/net/ixgbe/ixgbe_ethdev.c | 50 +++++++++++++++++++++++++++-----
drivers/net/ixgbe/ixgbe_ethdev.h | 2 ++
2 files changed, 44 insertions(+), 8 deletions(-)
Comments
> IXGBE link status task use rte alarm thread in old implementation.
> Sometime ixgbe link status task takes up to 9 seconds. This will
> severely affect the rte-alarm-thread-dependent a task in the
> system, like interrupt or hotplug event. So replace with a
> independent thread which has the same thread affinity settings
> as rte interrupt.
>
> Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link update")
> Cc: stable@dpdk.org
>
> Signed-off-by: Zhu Tao <taox.zhu@intel.com>
> ---
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 2.17.1
On 01/15, taox.zhu@intel.com wrote:
>From: Zhu Tao <taox.zhu@intel.com>
>
>IXGBE link status task use rte alarm thread in old implementation.
>Sometime ixgbe link status task takes up to 9 seconds. This will
>severely affect the rte-alarm-thread-dependent a task in the
What does "rte-alarm-thread-dependent a task" mean?
Thanks,
Xiaolong
>system, like interrupt or hotplug event. So replace with a
>independent thread which has the same thread affinity settings
>as rte interrupt.
>
>Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link update")
>Cc: stable@dpdk.org
>
>Signed-off-by: Zhu Tao <taox.zhu@intel.com>
On 01/15, taox.zhu@intel.com wrote:
>From: Zhu Tao <taox.zhu@intel.com>
>
>IXGBE link status task use rte alarm thread in old implementation.
s/use/uses
>Sometime ixgbe link status task takes up to 9 seconds. This will
>severely affect the rte-alarm-thread-dependent a task in the
>system, like interrupt or hotplug event. So replace with a
s/a/an
>independent thread which has the same thread affinity settings
>as rte interrupt.
>
>Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link update")
Should be:
Fixes: 0408f47ba4d6 ("net/ixgbe: fix busy polling while fiber link update")
>Cc: stable@dpdk.org
>
Applied to dpdk-next-net-intel with Konstantin's ack, Thanks.
On 2/15/2020 3:41 PM, Ye Xiaolong wrote:
> On 01/15, taox.zhu@intel.com wrote:
>> From: Zhu Tao <taox.zhu@intel.com>
>>
>> IXGBE link status task use rte alarm thread in old implementation.
>
> s/use/uses
>
>> Sometime ixgbe link status task takes up to 9 seconds. This will
>> severely affect the rte-alarm-thread-dependent a task in the
>> system, like interrupt or hotplug event. So replace with a
>
> s/a/an
>
>> independent thread which has the same thread affinity settings
>> as rte interrupt.
>>
>> Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link update")
>
> Should be:
>
> Fixes: 0408f47ba4d6 ("net/ixgbe: fix busy polling while fiber link update")
>
>> Cc: stable@dpdk.org
>>
>
> Applied to dpdk-next-net-intel with Konstantin's ack, Thanks.
>
Shared build is failing because of missing pthread library, fixing while merging
to next-net:
diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index 8b9d51f75..aec56a680 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -57,6 +57,7 @@ endif
LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
LDLIBS += -lrte_bus_pci
+LDLIBS += -lpthread
#
# Add extra flags for base driver files (also known as shared code)
17/02/2020 14:01, Ferruh Yigit:
> On 2/15/2020 3:41 PM, Ye Xiaolong wrote:
> > On 01/15, taox.zhu@intel.com wrote:
> >> From: Zhu Tao <taox.zhu@intel.com>
> >>
> >> IXGBE link status task use rte alarm thread in old implementation.
> >
> > s/use/uses
> >
> >> Sometime ixgbe link status task takes up to 9 seconds. This will
> >> severely affect the rte-alarm-thread-dependent a task in the
> >> system, like interrupt or hotplug event. So replace with a
> >
> > s/a/an
> >
> >> independent thread which has the same thread affinity settings
> >> as rte interrupt.
> >>
> >> Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link update")
> >
> > Should be:
> >
> > Fixes: 0408f47ba4d6 ("net/ixgbe: fix busy polling while fiber link update")
> >
> >> Cc: stable@dpdk.org
> >>
> >
> > Applied to dpdk-next-net-intel with Konstantin's ack, Thanks.
> >
>
> Shared build is failing because of missing pthread library, fixing while merging
> to next-net:
One more thing looks strange in this patch:
ixgbe_dev_cannel_link_thread
Should it be
ixgbe_dev_cancel_link_thread
?
Note: I looked at it because I am not sure multiplying the interrupt
threads is a good idea.
Basically the link status management is too long in this driver.
Instead of fixing the root cause, you move the annoying workload
somewhere else. But it is still there...
Please could you work on a long term fix?
Hi Thomas
Thank you for your correction spelling error of 'cancel'.
Indeed it is not the best solution by creating a thread. I refer to the same solution with Linux kernel driver. Linux kernel driver manages link status by using a thread. Maybe we can figure out another better solution to fix this problem but it may take much more time. At this time, 20.02 formal release is coming and this problem affect some basic library.
Tks for your understanding.
BR,
Zhu, Tao
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, February 20, 2020 11:37 PM
> To: Ye, Xiaolong <xiaolong.ye@intel.com>; Zhu, TaoX <taox.zhu@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> Lu, Wenzhuo <wenzhuo.lu@intel.com>; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/ixgbe: fix blocking system events
>
> 17/02/2020 14:01, Ferruh Yigit:
> > On 2/15/2020 3:41 PM, Ye Xiaolong wrote:
> > > On 01/15, taox.zhu@intel.com wrote:
> > >> From: Zhu Tao <taox.zhu@intel.com>
> > >>
> > >> IXGBE link status task use rte alarm thread in old implementation.
> > >
> > > s/use/uses
> > >
> > >> Sometime ixgbe link status task takes up to 9 seconds. This will
> > >> severely affect the rte-alarm-thread-dependent a task in the
> > >> system, like interrupt or hotplug event. So replace with a
> > >
> > > s/a/an
> > >
> > >> independent thread which has the same thread affinity settings as
> > >> rte interrupt.
> > >>
> > >> Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link
> > >> update")
> > >
> > > Should be:
> > >
> > > Fixes: 0408f47ba4d6 ("net/ixgbe: fix busy polling while fiber link
> > > update")
> > >
> > >> Cc: stable@dpdk.org
> > >>
> > >
> > > Applied to dpdk-next-net-intel with Konstantin's ack, Thanks.
> > >
> >
> > Shared build is failing because of missing pthread library, fixing
> > while merging to next-net:
>
> One more thing looks strange in this patch:
> ixgbe_dev_cannel_link_thread
> Should it be
> ixgbe_dev_cancel_link_thread
> ?
>
> Note: I looked at it because I am not sure multiplying the interrupt threads is
> a good idea.
> Basically the link status management is too long in this driver.
> Instead of fixing the root cause, you move the annoying workload
> somewhere else. But it is still there...
>
> Please could you work on a long term fix?
>
21/02/2020 09:19, Zhu, TaoX:
> Hi Thomas
>
> Thank you for your correction spelling error of 'cancel'.
>
> Indeed it is not the best solution by creating a thread. I refer to the same solution with Linux kernel driver. Linux kernel driver manages link status by using a thread. Maybe we can figure out another better solution to fix this problem but it may take much more time. At this time, 20.02 formal release is coming and this problem affect some basic library.
> Tks for your understanding.
I understand, that's why I already accepted this patch in mainline (while fixing typo).
Please it would be really appreciated to work on a better solution.
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, February 20, 2020 11:37 PM
> > To: Ye, Xiaolong <xiaolong.ye@intel.com>; Zhu, TaoX <taox.zhu@intel.com>;
> > Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > Lu, Wenzhuo <wenzhuo.lu@intel.com>; stable@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] net/ixgbe: fix blocking system events
> >
> > 17/02/2020 14:01, Ferruh Yigit:
> > > On 2/15/2020 3:41 PM, Ye Xiaolong wrote:
> > > > On 01/15, taox.zhu@intel.com wrote:
> > > >> From: Zhu Tao <taox.zhu@intel.com>
> > > >>
> > > >> IXGBE link status task use rte alarm thread in old implementation.
> > > >
> > > > s/use/uses
> > > >
> > > >> Sometime ixgbe link status task takes up to 9 seconds. This will
> > > >> severely affect the rte-alarm-thread-dependent a task in the
> > > >> system, like interrupt or hotplug event. So replace with a
> > > >
> > > > s/a/an
> > > >
> > > >> independent thread which has the same thread affinity settings as
> > > >> rte interrupt.
> > > >>
> > > >> Fixes: 0408f47b ("net/ixgbe: fix busy polling while fiber link
> > > >> update")
> > > >
> > > > Should be:
> > > >
> > > > Fixes: 0408f47ba4d6 ("net/ixgbe: fix busy polling while fiber link
> > > > update")
> > > >
> > > >> Cc: stable@dpdk.org
> > > >>
> > > >
> > > > Applied to dpdk-next-net-intel with Konstantin's ack, Thanks.
> > > >
> > >
> > > Shared build is failing because of missing pthread library, fixing
> > > while merging to next-net:
> >
> > One more thing looks strange in this patch:
> > ixgbe_dev_cannel_link_thread
> > Should it be
> > ixgbe_dev_cancel_link_thread
> > ?
> >
> > Note: I looked at it because I am not sure multiplying the interrupt threads is
> > a good idea.
> > Basically the link status management is too long in this driver.
> > Instead of fixing the root cause, you move the annoying workload
> > somewhere else. But it is still there...
> >
> > Please could you work on a long term fix?
> >
>
>
@@ -229,7 +229,8 @@ static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
static void ixgbe_dev_interrupt_handler(void *param);
static void ixgbe_dev_interrupt_delayed_handler(void *param);
-static void ixgbe_dev_setup_link_alarm_handler(void *param);
+static void *ixgbe_dev_setup_link_thread_handler(void *param);
+static void ixgbe_dev_cannel_link_thread(struct rte_eth_dev *dev);
static int ixgbe_add_rar(struct rte_eth_dev *dev,
struct rte_ether_addr *mac_addr,
@@ -1078,6 +1079,7 @@ ixgbe_swfw_lock_reset(struct ixgbe_hw *hw)
static int
eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
{
+ struct ixgbe_adapter *ad = eth_dev->data->dev_private;
struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
struct ixgbe_hw *hw =
@@ -1129,6 +1131,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
return 0;
}
+ rte_atomic32_clear(&ad->link_thread_running);
rte_eth_copy_pci_info(eth_dev, pci_dev);
/* Vendor and Device ID need to be set before init of shared code */
@@ -1567,6 +1570,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
{
int diag;
uint32_t tc, tcs;
+ struct ixgbe_adapter *ad = eth_dev->data->dev_private;
struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
struct ixgbe_hw *hw =
@@ -1607,6 +1611,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
return 0;
}
+ rte_atomic32_clear(&ad->link_thread_running);
ixgbevf_parse_devargs(eth_dev->data->dev_private,
pci_dev->device.devargs);
@@ -2563,7 +2568,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();
/* Stop the link setup handler before resetting the HW. */
- rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
+ ixgbe_dev_cannel_link_thread(dev);
/* disable uio/vfio intr/eventfd mapping */
rte_intr_disable(intr_handle);
@@ -2844,7 +2849,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();
- rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
+ ixgbe_dev_cannel_link_thread(dev);
/* disable interrupts */
ixgbe_disable_intr(hw);
@@ -4098,9 +4103,23 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
}
static void
-ixgbe_dev_setup_link_alarm_handler(void *param)
+ixgbe_dev_cannel_link_thread(struct rte_eth_dev *dev)
+{
+ struct ixgbe_adapter *ad = dev->data->dev_private;
+ void *retval;
+
+ if (rte_atomic32_read(&ad->link_thread_running)) {
+ pthread_cancel(ad->link_thread_tid);
+ pthread_join(ad->link_thread_tid, &retval);
+ rte_atomic32_clear(&ad->link_thread_running);
+ }
+}
+
+static void *
+ixgbe_dev_setup_link_thread_handler(void *param)
{
struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
+ struct ixgbe_adapter *ad = dev->data->dev_private;
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_interrupt *intr =
IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
@@ -4114,6 +4133,8 @@ ixgbe_dev_setup_link_alarm_handler(void *param)
ixgbe_setup_link(hw, speed, true);
intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG;
+ rte_atomic32_clear(&ad->link_thread_running);
+ return NULL;
}
/*
@@ -4153,6 +4174,7 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
int wait_to_complete, int vf)
{
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ struct ixgbe_adapter *ad = dev->data->dev_private;
struct rte_eth_link link;
ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_UNKNOWN;
struct ixgbe_interrupt *intr =
@@ -4198,8 +4220,20 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
if (link_up == 0) {
if (ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
- rte_eal_alarm_set(10,
- ixgbe_dev_setup_link_alarm_handler, dev);
+ if (rte_atomic32_test_and_set(&ad->link_thread_running)) {
+ if (rte_ctrl_thread_create(&ad->link_thread_tid,
+ "ixgbe-link-handler",
+ NULL,
+ ixgbe_dev_setup_link_thread_handler,
+ dev) < 0) {
+ PMD_DRV_LOG(ERR,
+ "Create link thread failed!");
+ rte_atomic32_clear(&ad->link_thread_running);
+ }
+ } else {
+ PMD_DRV_LOG(ERR,
+ "Other link thread is running now!");
+ }
}
return rte_eth_linkstatus_set(dev, &link);
}
@@ -5243,7 +5277,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();
/* Stop the link setup handler before resetting the HW. */
- rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
+ ixgbe_dev_cannel_link_thread(dev);
err = hw->mac.ops.reset_hw(hw);
if (err) {
@@ -5341,7 +5375,7 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();
- rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
+ ixgbe_dev_cannel_link_thread(dev);
ixgbevf_intr_disable(dev);
@@ -511,6 +511,8 @@ struct ixgbe_adapter {
* mailbox status) link status.
*/
uint8_t pflink_fullchk;
+ rte_atomic32_t link_thread_running;
+ pthread_t link_thread_tid;
};
struct ixgbe_vf_representor {