bus/vmbus: fix race in sub channel creation

Message ID 20181130202457.10888-1-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series bus/vmbus: fix race in sub channel creation |

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS
ci/checkpatch warning coding style issues

Commit Message

Stephen Hemminger Nov. 30, 2018, 8:24 p.m. UTC
  When using multiple queues, there was a race with the kernel
in setting up the second channel. This is do due to a kernel change
whiche does not allow accessing sysfs files for Hyper-V
channels that are not opened.

The fix is simple, just move the logic to detect not ready
sub channels earlier in the existing loop.

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Reported-by:Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/bus/vmbus/linux/vmbus_uio.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
  

Comments

Mohammed Gamal Dec. 3, 2018, 6:02 a.m. UTC | #1
On Fri, 2018-11-30 at 12:24 -0800, Stephen Hemminger wrote:
> When using multiple queues, there was a race with the kernel
> in setting up the second channel. This is do due to a kernel change
> whiche does not allow accessing sysfs files for Hyper-V
> channels that are not opened.
> 
> The fix is simple, just move the logic to detect not ready
> sub channels earlier in the existing loop.
> 
> Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> Reported-by:Mohammed Gamal <mgamal@redhat.com>
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  drivers/bus/vmbus/linux/vmbus_uio.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c
> b/drivers/bus/vmbus/linux/vmbus_uio.c
> index 12e97e3a420a..38df4d724ed5 100644
> --- a/drivers/bus/vmbus/linux/vmbus_uio.c
> +++ b/drivers/bus/vmbus/linux/vmbus_uio.c
> @@ -357,6 +357,12 @@ int vmbus_uio_get_subchan(struct vmbus_channel
> *primary,
>  			continue;
>  		}
>  
> +		if (!vmbus_isnew_subchannel(primary, relid))
> +			continue;	/* Already know about you
> */
> +
> +		if (!vmbus_uio_ring_present(dev, relid))
> +			continue;	/* Ring may not be ready
> yet */
> +
>  		snprintf(subchan_path, sizeof(subchan_path),
> "%s/%lu",
>  			 chan_path, relid);
>  		err = vmbus_uio_sysfs_read(subchan_path,
> "subchannel_id",
> @@ -370,12 +376,6 @@ int vmbus_uio_get_subchan(struct vmbus_channel
> *primary,
>  		if (subid == 0)
>  			continue;	/* skip primary channel */
>  
> -		if (!vmbus_isnew_subchannel(primary, relid))
> -			continue;
> -
> -		if (!vmbus_uio_ring_present(dev, relid))
> -			continue;	/* Ring may not be ready
> yet */
> -
>  		err = vmbus_uio_sysfs_read(subchan_path,
> "monitor_id",
>  					   &monid, UINT8_MAX);
>  		if (err) {

With this patch I am now getting the following error:
[...]
Configuring Port 0 (socket 0)
hn_dev_configure():  >>
hn_rndis_link_status(): link status 0x40020006
hn_subchan_configure(): open 1 subchannels
hn_subchan_configure(): open subchannel failed: -2
hn_dev_configure(): subchannel configuration failed
Port0 dev_configure = -2
hn_dev_rx_queue_release():  >>
hn_dev_rx_queue_release():  >>
hn_dev_tx_queue_release():  >>
hn_dev_tx_queue_release():  >>
Fail to configure port 0
EAL: Error - exiting with code: 1
  Cause: Start ports failed

Apparently, no subchannels were ready. Anything I may have missed or
misconfigured?

Regards,
Mohammed
  
Stephen Hemminger Dec. 3, 2018, 4:48 p.m. UTC | #2
On Mon, 03 Dec 2018 07:02:55 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> On Fri, 2018-11-30 at 12:24 -0800, Stephen Hemminger wrote:
> > When using multiple queues, there was a race with the kernel
> > in setting up the second channel. This is do due to a kernel change
> > whiche does not allow accessing sysfs files for Hyper-V
> > channels that are not opened.
> > 
> > The fix is simple, just move the logic to detect not ready
> > sub channels earlier in the existing loop.
> > 
> > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> > Reported-by:Mohammed Gamal <mgamal@redhat.com>
> > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > ---
> >  drivers/bus/vmbus/linux/vmbus_uio.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c
> > b/drivers/bus/vmbus/linux/vmbus_uio.c
> > index 12e97e3a420a..38df4d724ed5 100644
> > --- a/drivers/bus/vmbus/linux/vmbus_uio.c
> > +++ b/drivers/bus/vmbus/linux/vmbus_uio.c
> > @@ -357,6 +357,12 @@ int vmbus_uio_get_subchan(struct vmbus_channel
> > *primary,
> >  			continue;
> >  		}
> >  
> > +		if (!vmbus_isnew_subchannel(primary, relid))
> > +			continue;	/* Already know about you
> > */
> > +
> > +		if (!vmbus_uio_ring_present(dev, relid))
> > +			continue;	/* Ring may not be ready
> > yet */
> > +
> >  		snprintf(subchan_path, sizeof(subchan_path),
> > "%s/%lu",
> >  			 chan_path, relid);
> >  		err = vmbus_uio_sysfs_read(subchan_path,
> > "subchannel_id",
> > @@ -370,12 +376,6 @@ int vmbus_uio_get_subchan(struct vmbus_channel
> > *primary,
> >  		if (subid == 0)
> >  			continue;	/* skip primary channel */
> >  
> > -		if (!vmbus_isnew_subchannel(primary, relid))
> > -			continue;
> > -
> > -		if (!vmbus_uio_ring_present(dev, relid))
> > -			continue;	/* Ring may not be ready
> > yet */
> > -
> >  		err = vmbus_uio_sysfs_read(subchan_path,
> > "monitor_id",
> >  					   &monid, UINT8_MAX);
> >  		if (err) {  
> 
> With this patch I am now getting the following error:
> [...]
> Configuring Port 0 (socket 0)
> hn_dev_configure():  >>  
> hn_rndis_link_status(): link status 0x40020006
> hn_subchan_configure(): open 1 subchannels
> hn_subchan_configure(): open subchannel failed: -2
> hn_dev_configure(): subchannel configuration failed
> Port0 dev_configure = -2
> hn_dev_rx_queue_release():  >>
> hn_dev_rx_queue_release():  >>
> hn_dev_tx_queue_release():  >>
> hn_dev_tx_queue_release():  >>  
> Fail to configure port 0
> EAL: Error - exiting with code: 1
>   Cause: Start ports failed
> 
> Apparently, no subchannels were ready. Anything I may have missed or
> misconfigured?
> 
> Regards,
> Mohammed

Could you check the kernel log?

The way sub channel configuration works is that the userspace code in DPDK
sends a message to the hypervisor that it would like N subchannels, then
the response from the hypervisor is processed by the kernel causing sysfs
files to be created. Meanwhile the userspace is polling waiting for the
sysfs files to show up (for 10 seconds). You could increas the timeout or
go looking in the sysfs directory  to see what is present.

There is no good way to handle errors here, the hypervisor doesn't really
give much feedback.
  
Mohammed Gamal Dec. 4, 2018, 11:59 a.m. UTC | #3
On Mon, 2018-12-03 at 08:48 -0800, Stephen Hemminger wrote:
> On Mon, 03 Dec 2018 07:02:55 +0100
> Mohammed Gamal <mgamal@redhat.com> wrote:
> 
> > On Fri, 2018-11-30 at 12:24 -0800, Stephen Hemminger wrote:
> > > When using multiple queues, there was a race with the kernel
> > > in setting up the second channel. This is do due to a kernel
> > > change
> > > whiche does not allow accessing sysfs files for Hyper-V
> > > channels that are not opened.
> > > 
> > > The fix is simple, just move the logic to detect not ready
> > > sub channels earlier in the existing loop.
> > > 
> > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus
> > > support")
> > > Reported-by:Mohammed Gamal <mgamal@redhat.com>
> > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > > ---
> > >  drivers/bus/vmbus/linux/vmbus_uio.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c
> > > b/drivers/bus/vmbus/linux/vmbus_uio.c
> > > index 12e97e3a420a..38df4d724ed5 100644
> > > --- a/drivers/bus/vmbus/linux/vmbus_uio.c
> > > +++ b/drivers/bus/vmbus/linux/vmbus_uio.c
> > > @@ -357,6 +357,12 @@ int vmbus_uio_get_subchan(struct
> > > vmbus_channel
> > > *primary,
> > >  			continue;
> > >  		}
> > >  
> > > +		if (!vmbus_isnew_subchannel(primary, relid))
> > > +			continue;	/* Already know about
> > > you
> > > */
> > > +
> > > +		if (!vmbus_uio_ring_present(dev, relid))
> > > +			continue;	/* Ring may not be
> > > ready
> > > yet */
> > > +
> > >  		snprintf(subchan_path, sizeof(subchan_path),
> > > "%s/%lu",
> > >  			 chan_path, relid);
> > >  		err = vmbus_uio_sysfs_read(subchan_path,
> > > "subchannel_id",
> > > @@ -370,12 +376,6 @@ int vmbus_uio_get_subchan(struct
> > > vmbus_channel
> > > *primary,
> > >  		if (subid == 0)
> > >  			continue;	/* skip primary channel
> > > */
> > >  
> > > -		if (!vmbus_isnew_subchannel(primary, relid))
> > > -			continue;
> > > -
> > > -		if (!vmbus_uio_ring_present(dev, relid))
> > > -			continue;	/* Ring may not be
> > > ready
> > > yet */
> > > -
> > >  		err = vmbus_uio_sysfs_read(subchan_path,
> > > "monitor_id",
> > >  					   &monid, UINT8_MAX);
> > >  		if (err) {  
> > 
> > With this patch I am now getting the following error:
> > [...]
> > Configuring Port 0 (socket 0)
> > hn_dev_configure():  >>  
> > hn_rndis_link_status(): link status 0x40020006
> > hn_subchan_configure(): open 1 subchannels
> > hn_subchan_configure(): open subchannel failed: -2
> > hn_dev_configure(): subchannel configuration failed
> > Port0 dev_configure = -2
> > hn_dev_rx_queue_release():  >>
> > hn_dev_rx_queue_release():  >>
> > hn_dev_tx_queue_release():  >>
> > hn_dev_tx_queue_release():  >>  
> > Fail to configure port 0
> > EAL: Error - exiting with code: 1
> >   Cause: Start ports failed
> > 
> > Apparently, no subchannels were ready. Anything I may have missed
> > or
> > misconfigured?
> > 
> > Regards,
> > Mohammed
> 
> Could you check the kernel log?
> 
I did. No relevant messages seem to be there.

> The way sub channel configuration works is that the userspace code in
> DPDK
> sends a message to the hypervisor that it would like N subchannels,
> then
> the response from the hypervisor is processed by the kernel causing
> sysfs
> files to be created. Meanwhile the userspace is polling waiting for
> the
> sysfs files to show up (for 10 seconds). You could increas the
> timeout or
> go looking in the sysfs directory  to see what is present.

Tried increasing that up to 100 seconds, still nothing. Could it be a
problem on my host? The VM I am using is on a local hyper-v instance.
 
> 
> There is no good way to handle errors here, the hypervisor doesn't
> really
> give much feedback.
  

Patch

diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c b/drivers/bus/vmbus/linux/vmbus_uio.c
index 12e97e3a420a..38df4d724ed5 100644
--- a/drivers/bus/vmbus/linux/vmbus_uio.c
+++ b/drivers/bus/vmbus/linux/vmbus_uio.c
@@ -357,6 +357,12 @@  int vmbus_uio_get_subchan(struct vmbus_channel *primary,
 			continue;
 		}
 
+		if (!vmbus_isnew_subchannel(primary, relid))
+			continue;	/* Already know about you */
+
+		if (!vmbus_uio_ring_present(dev, relid))
+			continue;	/* Ring may not be ready yet */
+
 		snprintf(subchan_path, sizeof(subchan_path), "%s/%lu",
 			 chan_path, relid);
 		err = vmbus_uio_sysfs_read(subchan_path, "subchannel_id",
@@ -370,12 +376,6 @@  int vmbus_uio_get_subchan(struct vmbus_channel *primary,
 		if (subid == 0)
 			continue;	/* skip primary channel */
 
-		if (!vmbus_isnew_subchannel(primary, relid))
-			continue;
-
-		if (!vmbus_uio_ring_present(dev, relid))
-			continue;	/* Ring may not be ready yet */
-
 		err = vmbus_uio_sysfs_read(subchan_path, "monitor_id",
 					   &monid, UINT8_MAX);
 		if (err) {