[dpdk-dev,v6,0/4] support reset of VF link

Message ID 1467908383.1762.22.camel@brocade.com
State Not Applicable, archived
Delegated to: Bruce Richardson

Commit Message

Luca Boccassi July 7, 2016, 4:19 p.m. UTC
On Thu, 2016-07-07 at 13:12 +0000, Lu, Wenzhuo wrote:
> > -----Original Message-----

> > From: Luca Boccassi [mailto:lboccass@Brocade.com]

> > Sent: Thursday, July 7, 2016 6:21 PM

> > To: Lu, Wenzhuo

> > Cc: dev@dpdk.org

> > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link

> > 

> > On Thu, 2016-07-07 at 01:09 +0000, Lu, Wenzhuo wrote:

> > > Hi Luca,

> > >

> > >

> > > > -----Original Message-----

> > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]

> > > > Sent: Thursday, July 7, 2016 12:23 AM

> > > > To: Lu, Wenzhuo

> > > > Cc: dev@dpdk.org

> > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link

> > > >

> > > > On Wed, 2016-07-06 at 00:45 +0000, Lu, Wenzhuo wrote:

> > > > > Hi Luca,

> > > > >

> > > > > > -----Original Message-----

> > > > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]

> > > > > > Sent: Tuesday, July 5, 2016 5:53 PM

> > > > > > To: Lu, Wenzhuo

> > > > > > Cc: dev@dpdk.org

> > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link

> > > > > >

> > > > > > On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:

> > > > > > > Hi Luca,

> > > > > > >

> > > > > > >

> > > > > > > > -----Original Message-----

> > > > > > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]

> > > > > > > > Sent: Monday, July 4, 2016 11:48 PM

> > > > > > > > To: Lu, Wenzhuo

> > > > > > > > Cc: dev@dpdk.org

> > > > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF

> > > > > > > > link

> > > > > > > >

> > > > > > > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:

> > > > > > > > > If the PF link goes down and comes back up, the VF link will
> > > > > > > > > not follow it. This patch set adds support for VF link reset,
> > > > > > > > > so when the VF receives the physical link down/up messages,
> > > > > > > > > the APP can reset the VF link and let it recover.
> > > > > > > > >
> > > > > > > > > PS: This patch set is split from a previous patch set,
> > > > > > > > > *automatic link recovery on ixgbe/igb VF*, and it is based
> > > > > > > > > on the patch set *support mailbox interruption on ixgbe/igb VF*.

> > > > > > > > >

> > > > > > > > > Wenzhuo Lu (3):

> > > > > > > > >   lib/librte_ether: support device reset

> > > > > > > > >   ixgbe: implement device reset on VF

> > > > > > > > >   igb: implement device reset on VF

> > > > > > > > >

> > > > > > > > > Zhe Tao (1):

> > > > > > > > >   i40e: implement device reset on VF

> > > > > > > > >

> > > > > > > > > v1:

> > > > > > > > > - Added the implementation for the VF reset functionality.

> > > > > > > > > v2:

> > > > > > > > > - Changed the i40e related operations during VF reset.

> > > > > > > > > v3:

> > > > > > > > > - Resent the patches because of the mail sent issue.

> > > > > > > > > v4:

> > > > > > > > > - Removed some VF reset emulation code.

> > > > > > > > > v5:

> > > > > > > > > - Removed all the code related with lock.

> > > > > > > > > v6:

> > > > > > > > > - Updated the NIC feature overview matrix.

> > > > > > > > > - Added more explanation in the doxygen comment of reset API.

> > > > > > > > >

> > > > > > > > >  doc/guides/nics/overview.rst           |  1 +
> > > > > > > > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > > > > > > > >  drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
> > > > > > > > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > > > > > > > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
> > > > > > > > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > > > > > > > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > > > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
> > > > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > > > > > > > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > > > > > > > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > > > > > > > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > > > > > > > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > > > > > > > >  13 files changed, 295 insertions(+), 5 deletions(-)

> > > > > > > >

> > > > > > > > Hello Wenzhuo,

> > > > > > > >

> > > > > > > > I'm testing this patchset, but I am sporadically running

> > > > > > > > into an issue where the VFs reset fails after the PF flaps.

> > > > > > > >

> > > > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.

> > > > > > > >

> > > > > > > > I am calling rte_eth_dev_reset in response to an
> > > > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors
> > > > > > > > appear in the log:
> > > > > > > >
> > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
> > > > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
> > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > > > > > >
> > > > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call
> > > > > > > > in ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out
> > > > > > > > of 2048. The application has ~500 2MB hugepages, and there's
> > > > > > > > 2GB of free memory available on top of that.
> > > > > > > >
> > > > > > > > Have you seen this before? Any pointer or suggestion for debugging?

> > > > > > > >

> > > > > > > > Thanks!

> > > > > > > >

> > > > > > > > --

> > > > > > > > Kind regards,

> > > > > > > > Luca Boccassi

> > > > > > > I think the problem is that the mbufs occupied by the packets are
> > > > > > > not released. This memory has to be released by the APP, so my
> > > > > > > patches haven't covered this. Actually an example is needed to
> > > > > > > show how to use the reset API. I plan to modify testpmd.
> > > > > > > You may notice this feature is postponed to 16.11. Would you
> > > > > > > like to wait for the new version that will include an example?
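To illustrate the mbuf exhaustion described above, here is a toy model in C. It does not use DPDK: the pool, the backlog structure and every function name are hypothetical stand-ins, and the pool size is an assumption chosen to match the "iteration 64 out of 2048" observation. In a real application the held buffers would presumably be returned with rte_pktmbuf_free() before calling the reset API. The sketch only shows that refilling an RX ring fails with -ENOMEM (-12, as in the log) while the application still holds most of the pool, and succeeds once those buffers are given back.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy counting pool standing in for an mbuf mempool (hypothetical). */
struct toy_pool {
	int free_count;
};

/* Buffers the application still holds, e.g. packets queued for TX. */
struct toy_backlog {
	int held;
};

/*
 * Fill an RX ring of ring_size entries from the pool.
 * All-or-nothing: on exhaustion, give back what was taken and fail
 * with -ENOMEM. If failed_at is non-NULL it receives the iteration
 * at which the allocation failed.
 */
static int
toy_refill_rx_ring(struct toy_pool *pool, int ring_size, int *failed_at)
{
	int i;

	assert(pool != NULL && ring_size > 0);
	for (i = 0; i < ring_size; i++) {
		if (pool->free_count == 0) {
			pool->free_count += i;	/* roll back partial fill */
			if (failed_at != NULL)
				*failed_at = i;
			return -ENOMEM;
		}
		pool->free_count--;
	}
	return 0;
}

/* Drop everything the application holds and return it to the pool. */
static void
toy_drop_backlog(struct toy_pool *pool, struct toy_backlog *backlog)
{
	pool->free_count += backlog->held;
	backlog->held = 0;
}
```

With 64 free buffers and 2048 held by the application, the first refill of a 2048-entry ring fails at iteration 64; after the backlog is dropped, the same refill succeeds.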

> > > > > >

> > > > > > Hi,

> > > > > >

> > > > > > Unfortunately we need the VF reset working sooner than that, so

> > > > > > one way or the other I'll need to sort it out. Given I've got a

> > > > > > use case where this is happening, if it can be helpful for you

> > > > > > I'm more than happy to help as a guinea pig. If you could please

> > > > > > give some guidance/guidelines with regards to which API to use
> > > > > > to sort the mbuf problem, I can try it out and give back some
> > > > > > feedback.

> > > > > >

> > > > > > Thanks!

> > > > > I made a stupid mistake and deleted all my code. So, I have to
> > > > > take some time to rewrite it :( Attached is the example I used to
> > > > > test the reset API. It's modified from the l2fwd example, so you
> > > > > can compare it with l2fwd to see what needs to be added.
> > > > > Hopefully it can help :)

> > > >

> > > > Thanks! That made me understand a couple of things more, and I've

> > > > got past the problem.

> > > >

> > > > Unfortunately now there's a bigger issue - rte_eth_dev_reset is a
> > > > blocking call. The _RESET event callback is fired when the PF goes
> > > > down, but when I call rte_eth_dev_reset it will block until the PF
> > > > goes back up. There is no way, as far as I can see, to know if the
> > > > PF is back up before calling rte_eth_dev_reset.

> > > >

> > > > This is a problem because, as far as I understand, I have to call

> > > > all the rte_eth_dev_ APIs from the same thread, in my case the

> > > > master thread, and I can't have that block potentially indefinitely.

> > > >

> > > > Would it be possible to have 2 events instead of 1, one when the PF

> > > > goes down and one when it goes up? This way an application would be

> > > > able to soft-stop the port (drain queues, etc) when the PF is down,

> > > > and then call the reset API when it goes back up.

> > > >

> > > > Thanks!

> > > Sorry, we cannot have 2 events now. There are 2 problems with having 2 events.
> > > 1. Normally we use the kernel driver for the PF. Currently the kernel
> > > driver has only one kind of message for link down and up, so we
> > > cannot tell whether it's down or up.
> > > 2. When the PF is down, if we don't reset the VF, the VF is not working.
> > > It cannot receive any message from the PF, so we cannot know when the
> > > PF is up. It means normally we have to reset the VF twice when the PF
> > > goes down and up. (Surely we can wait a while after we receive the
> > > message from the PF until the PF is up. But we cannot tell how long a
> > > wait is appropriate. So this *wait a while* may work for flash.)

> > 

> > Thanks for the clarification, I understand.

> > 

> > The problem with a blocking call is that we basically need to spawn one thread

> > per rte_eth_dev_reset call, since there is no way of knowing if a PF is down for

> > good or just flapping, and we can't have a single thread managing all the

> > interfaces being blocked forever (EG: PF 1 and 2 go down, thread blocks on PF 1

> > reset call but it never returns, meanwhile PF 2 goes back up but call is never

> > made).

> > 

> > A colleague of mine, Eric Kinzie, suggested adding a blocking boolean
> > parameter to the rte_eth_dev_reset API. If set to false, the call will not
> > block; it will just try once and return an error (EAGAIN ?). Would this be
> > an acceptable proposition?

> It's a good suggestion. 

> And I think if the parameter is set to false and the link is not up after trying once, it will be APP's responsibility to setup a timer or something like that to keep trying to bring up the link.


That seems reasonable. I've thrown together a quick diff and played with
it on top of your patches and DPDK 2.2; it seems to work as intended. I'm
attaching it for reference. Feel free to pick it up, adapt it or ignore
it :-)

Also I've noticed that ixgbe is the only one that actually blocks:
e1000 already returns immediately if the dev_start fails (perhaps it
should be changed to be consistent?) and i40e does weird things that
I'm not sure about, but I couldn't spot a loop in there :-)

Also I've used int instead of bool because
drivers/net/e1000/base/e1000_osdep.h redefines bool and true/false, so
compilation fails when including stdbool.h and using bool in
rte_ethdev.h
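
A minimal sketch of how an application might drive the non-blocking variant discussed above from a single thread, assuming the proposed rte_eth_dev_reset(port_id, blocking) signature from the attached diff. The bookkeeping arrays, MAX_PORTS, the helper names and the fake reset function are all hypothetical and exist only to make the example self-contained; a real application would pass a wrapper around the DPDK call and run the retry function from its own timer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_PORTS 4	/* hypothetical limit for this sketch */

/* Same shape as the proposed rte_eth_dev_reset(port_id, blocking). */
typedef int (*reset_fn)(uint8_t port_id, int blocking);

static int reset_pending[MAX_PORTS];	/* 1 = reset still needed */
static int reset_error[MAX_PORTS];	/* last non-EAGAIN result */

/* Call from the RESET event callback: only record the request. */
static void
mark_reset_pending(uint8_t port_id)
{
	assert(port_id < MAX_PORTS);
	reset_pending[port_id] = 1;
}

/*
 * Call periodically from the thread that owns the ports.
 * Tries each pending port once and returns how many are still pending,
 * so one port whose PF stays down cannot starve the others.
 */
static int
retry_pending_resets(reset_fn reset)
{
	int still_pending = 0;
	uint8_t port;

	for (port = 0; port < MAX_PORTS; port++) {
		int ret;

		if (!reset_pending[port])
			continue;
		ret = reset(port, 0);		/* 0 = do not block */
		if (ret == -EAGAIN) {
			still_pending++;	/* PF not up yet, try later */
			continue;
		}
		reset_pending[port] = 0;	/* done, or a hard error */
		reset_error[port] = ret;
	}
	return still_pending;
}

/* Demo-only fake: a port resets successfully once its PF is marked up. */
static int fake_pf_up[MAX_PORTS];

static int
fake_reset(uint8_t port_id, int blocking)
{
	(void)blocking;
	return fake_pf_up[port_id] ? 0 : -EAGAIN;
}
```

In the "PF 1 and 2 go down" scenario, marking both ports pending and bringing only PF 2 back leaves port 1 pending while port 2 is reset on the next pass.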

-- 
Kind regards,
Luca Boccassi

Comments

Lu, Wenzhuo July 8, 2016, 12:14 a.m. UTC | #1
Glad to know it's working now.  Thanks for your patch.  Surely I'll try to include it in the next version :)
Luca Boccassi July 8, 2016, 5:15 p.m. UTC | #2
On Fri, 2016-07-08 at 00:14 +0000, Lu, Wenzhuo wrote:
> Glad to know it's working now.  Thanks for your patch.  Surely I'll try to include it in the next version :)


Great, thanks!

Unfortunately I found one issue: if PF is down, and then the VF on the
guest is down as well (ip link down) and then goes back up before the
PF, then calling rte_eth_dev_reset will return 0 (success), even though
the PF is still down and it should fail. This is with ixgbe. Any idea
what could be the problem?

-- 
Kind regards,
Luca Boccassi
Lu, Wenzhuo July 11, 2016, 1:32 a.m. UTC | #3
> 

> Unfortunately I found one issue: if PF is down, and then the VF on the guest is

> down as well (ip link down) and then goes back up before the PF, then calling

> rte_eth_dev_reset will return 0 (success), even though the PF is still down and it

> should fail. This is with ixgbe. Any idea what could be the problem?

I've found this interesting thing. I believe it's a HW difference between igb and ixgbe. When the PF link is down, the ixgbe VF can be reset successfully but the igb VF cannot. The symptom is that the registers of the ixgbe VF can be accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then when the PF link comes up, we receive the message again and reset the VF link again.
But on igb, when the PF link is down, we cannot reset the VF link successfully, so when the PF link comes up, we cannot receive the message. There is no trigger for us to reset the VF link again. That's why on igb we have to try again and again until it succeeds, which means until the PF link is up.
So a return value of 0 from rte_eth_dev_reset means the reset succeeded; it does not mean rx/tx is ready. Rx/tx depends on the PF link being up.
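
One way to read the distinction above is as a small state machine in which a successful reset and a usable link are separate steps. The sketch below is an assumption-laden illustration, not driver behaviour: the state names, the function-pointer types and the fakes are hypothetical, and the link query stands in for something like rte_eth_link_get_nowait(). Whether the link status actually reports up again without a further reset is not established in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the reset call and a link-status query (hypothetical). */
typedef int (*reset_fn)(uint8_t port_id, int blocking);
typedef int (*link_up_fn)(uint8_t port_id);	/* non-zero = link up */

enum port_state {
	PORT_NEEDS_RESET,	/* reset not yet successful */
	PORT_WAIT_LINK,		/* reset returned 0, link still down */
	PORT_READY		/* reset done and link up: rx/tx may resume */
};

/* Advance one step; meant to be called periodically per port. */
static enum port_state
port_recover_step(enum port_state st, uint8_t port_id,
		  reset_fn reset, link_up_fn link_up)
{
	switch (st) {
	case PORT_NEEDS_RESET:
		if (reset(port_id, 0) != 0)
			return PORT_NEEDS_RESET;
		/* Reset succeeded; that alone does not mean rx/tx works. */
		return link_up(port_id) ? PORT_READY : PORT_WAIT_LINK;
	case PORT_WAIT_LINK:
		return link_up(port_id) ? PORT_READY : PORT_WAIT_LINK;
	default:
		return PORT_READY;
	}
}

/* Demo-only fakes controlled through two variables. */
static int fake_reset_ret;
static int fake_link;

static int
fake_reset(uint8_t port_id, int blocking)
{
	(void)port_id;
	(void)blocking;
	return fake_reset_ret;
}

static int
fake_link_up(uint8_t port_id)
{
	(void)port_id;
	return fake_link;
}
```

The application would resume forwarding only in PORT_READY, treating a return value of 0 from the reset as necessary but not sufficient.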

> 

> --

> Kind regards,

> Luca Boccassi
Luca Boccassi July 11, 2016, 12:02 p.m. UTC | #4
On Mon, 2016-07-11 at 01:32 +0000, Lu, Wenzhuo wrote:
> > 

> > Unfortunately I found one issue: if PF is down, and then the VF on the guest is

> > down as well (ip link down) and then goes back up before the PF, then calling

> > rte_eth_dev_reset will return 0 (success), even though the PF is still down and it

> > should fail. This is with ixgbe. Any idea what could be the problem?

> I've found this interesting thing. I believe it’s the HW difference between igb and ixgbe. When the link is down, ixgbe VF can be reset successfully but igb VF cannot. The expression is the  registers of the ixgbe VF can be accessed when the PF link is down but igb VF cannot.

> It means, on ixgbe, when PF link is down, we reset the VF link. Then PF link is up, we receive the message again and reset the VF link again. 


What message do you refer to here? I am seeing the RESET callback only
when the PF goes down, not when it goes up.

At the moment, with ixgbe, this happens:

PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
VF link has no-carrier, and traffic does NOT go through

The problem is that there is just no way of being notified that PF is
up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
need to run it again.

> But on igb, when PF link is down, we cannot reset VF link successfully, so when the PF link is up, we cannot receive the message. No trigger for us to reset the VF link again. That's why on igb we have to try again and again until it succeed, means until PF link is up.

> So the return 0 by rte_eth_dev_reset means the resetting succeeded, not mean the rx/tx is ready. Rx/tx has to depend on the PF link is up.


-- 
Kind regards,
Luca Boccassi

Patch

--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -260,7 +260,7 @@  static void eth_igb_configure_msix_intr(
 static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 					void *param);
 static void igbvf_mbx_process(struct rte_eth_dev *dev);
-static int igbvf_dev_reset(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev, int blocking);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -2598,7 +2598,7 @@  void igbvf_mbx_process(struct rte_eth_de
 }
 
 static int
-igbvf_dev_reset(struct rte_eth_dev *dev)
+igbvf_dev_reset(struct rte_eth_dev *dev, __rte_unused int blocking)
 {
 	struct e1000_hw *hw =
 		 E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -2626,12 +2626,12 @@  igbvf_dev_reset(struct rte_eth_dev *dev)
 		 rte_delay_ms(1000);
 
 		 diag = igbvf_dev_start(dev);
+		 dev->data->dev_started = 1;
 		 if (diag) {
 			  PMD_INIT_LOG(ERR, "Igb VF reset: "
 					 "Failed to start device.");
-			  return diag;
+			  return -EAGAIN;
 		 }
-		 dev->data->dev_started = 1;
 		 eth_igbvf_stats_reset(dev);
 		 if (dev->data->dev_conf.intr_conf.lsc == 0)
 			  diag = eth_igb_link_update(dev, 0);
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,7 +157,7 @@  static int i40evf_dev_init(struct rte_et
 static void i40evf_dev_close(struct rte_eth_dev *dev);
 static int i40evf_dev_start(struct rte_eth_dev *dev);
 static int i40evf_dev_configure(struct rte_eth_dev *dev);
-static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev, int blocking);
 
 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -1498,7 +1498,7 @@  i40e_vf_reset_dev(struct rte_eth_dev *de
 }
 
 static int
-i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+i40evf_handle_vf_reset(struct rte_eth_dev *dev, __rte_unused int blocking)
 {
 	struct i40e_adapter *adapter =
 		 I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
@@ -1518,7 +1518,7 @@  i40evf_emulate_vf_reset(uint8_t port_id)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 
-	i40evf_handle_vf_reset(dev);
+	i40evf_handle_vf_reset(dev, 0);
 }
 
 static int
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -379,7 +379,7 @@  static void ixgbevf_dev_interrupt_handle
 		(r) = (h)->bitmap[idx] >> bit & 1;\
 	}while(0)
 
-static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev, int blocking);
 
 /*
  * The set of PCI devices this driver supports
@@ -6227,7 +6227,7 @@  static void ixgbevf_mbx_process(struct r
 }
 
 static int
-ixgbevf_dev_reset(struct rte_eth_dev *dev)
+ixgbevf_dev_reset(struct rte_eth_dev *dev, int blocking)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	int diag = 0;
@@ -6256,7 +6256,12 @@  ixgbevf_dev_reset(struct rte_eth_dev *de
 		 if (diag) {
 			  PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
 					 "Failed to start device.");
-			  continue;
+			if (blocking)
+				continue;
+			else {
+				dev->data->dev_started = 1;
+				return -EAGAIN;
+			}
 		 }
 		 dev->data->dev_started = 1;
 		 ixgbevf_dev_stats_reset(dev);
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3370,7 +3370,7 @@  rte_eth_copy_pci_info(struct rte_eth_dev
 }
 
 int
-rte_eth_dev_reset(uint8_t port_id)
+rte_eth_dev_reset(uint8_t port_id, int blocking)
 {
 	struct rte_eth_dev *dev;
 	int diag;
@@ -3381,7 +3381,7 @@  rte_eth_dev_reset(uint8_t port_id)
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
 
-	diag = (*dev->dev_ops->dev_reset)(dev);
+	diag = (*dev->dev_ops->dev_reset)(dev, blocking);
 
 	return diag;
 }
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1262,7 +1262,7 @@  typedef int (*eth_set_eeprom_t)(struct r
 				struct rte_dev_eeprom_info *info);
 /**< @internal Program eeprom data  */
 
-typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev, int blocking);
 /**< @internal Function used to reset a configured Ethernet device. */
 
 #ifdef RTE_NIC_BYPASS
@@ -3927,17 +3927,21 @@  rte_eth_dma_zone_reserve(const struct rt
  * queues, restart the port.
  * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
  * APP can drop the packets and release the buffer instead of sending them.
+ * This call will block until the PF is up again, unless blocking is false.
  *
  * @param port_id
  *   The port identifier of the Ethernet device.
+ * @param blocking
+ *   Whether or not to block if the PF is not yet UP.
  *
  * @return
  *   - (0) if successful.
  *   - (-ENODEV) if port identifier is invalid.
  *   - (-ENOTSUP) if hardware doesn't support this function.
+ *   - (-EAGAIN) if PF is not up and blocking was false.
  */
 int
-rte_eth_dev_reset(uint8_t port_id);
+rte_eth_dev_reset(uint8_t port_id, int blocking);
 
 #ifdef __cplusplus
 }