[2/2] virtio: fix PCI config err handling

Message ID 20180814143035.19640-2-bluca@debian.org (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [1/2] bus/pci: harmonize and document rte_pci_read_config return value

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Luca Boccassi Aug. 14, 2018, 2:30 p.m. UTC
  From: Brian Russell <brussell@brocade.com>

In virtio_read_caps, rte_pci_read_config returns the number of bytes
read from PCI config or < 0 on error.
If less than the expected number of bytes are read then log the
failure and return rather than carrying on with garbage.

Signed-off-by: Brian Russell <brussell@brocade.com>
---

Follow-up from:
http://mails.dpdk.org/archives/dev/2017-June/067278.html
https://patches.dpdk.org/patch/25056/

 drivers/net/virtio/virtio_pci.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
  

Comments

Tiwei Bie Aug. 15, 2018, 3:11 a.m. UTC | #1
On Tue, Aug 14, 2018 at 03:30:35PM +0100, Luca Boccassi wrote:
> From: Brian Russell <brussell@brocade.com>
> 
> In virtio_read_caps, rte_pci_read_config returns the number of bytes
> read from PCI config or < 0 on error.
> If less than the expected number of bytes are read then log the
> failure and return rather than carrying on with garbage.

Is this a fix or an improvement?
Or did you see anything broken without this patch?
If so, we may need a fixes line and Cc stable.

> 
> Signed-off-by: Brian Russell <brussell@brocade.com>
> ---
> 
> Follow-up from:
> http://mails.dpdk.org/archives/dev/2017-June/067278.html
> https://patches.dpdk.org/patch/25056/
> 
>  drivers/net/virtio/virtio_pci.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
> index 6bd22e54a6..a10698aed8 100644
> --- a/drivers/net/virtio/virtio_pci.c
> +++ b/drivers/net/virtio/virtio_pci.c
> @@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
>  	}
>  
>  	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> -	if (ret < 0) {
> -		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
> +	if (ret != 1) {
> +		PMD_INIT_LOG(DEBUG,
> +			     "failed to read pci capability list, ret %d", ret);
>  		return -1;
>  	}
>  
>  	while (pos) {
>  		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
> -		if (ret < 0) {
> -			PMD_INIT_LOG(ERR,
> -				"failed to read pci cap at pos: %x", pos);
> +		if (ret != sizeof(cap)) {
> +			PMD_INIT_LOG(DEBUG,

Why change the log level to DEBUG?

Thanks

> +				     "failed to read pci cap at pos: %x ret %d",
> +				     pos, ret);
>  			break;
>  		}
>  
> -- 
> 2.18.0
>
  
Luca Boccassi Aug. 15, 2018, 9:50 a.m. UTC | #2
On Wed, 2018-08-15 at 11:11 +0800, Tiwei Bie wrote:
> On Tue, Aug 14, 2018 at 03:30:35PM +0100, Luca Boccassi wrote:
> > From: Brian Russell <brussell@brocade.com>
> > 
> > In virtio_read_caps, rte_pci_read_config returns the number of bytes
> > read from PCI config or < 0 on error.
> > If less than the expected number of bytes are read then log the
> > failure and return rather than carrying on with garbage.
> 
> Is this a fix or an improvement?
> Or did you see anything broken without this patch?
> If so, we may need a fixes line and Cc stable.

It is a fix, as it was creating problems in production due to the
constant flux of errors in the logs.
But given patch 1/2 is effectively doing a small change in the BSD bus
API, and it's a requirement for 2/2, I don't think we can include it in
the stable releases unfortunately.

> > 
> > Signed-off-by: Brian Russell <brussell@brocade.com>
> > ---
> > 
> > Follow-up from:
> > http://mails.dpdk.org/archives/dev/2017-June/067278.html
> > https://patches.dpdk.org/patch/25056/
> > 
> >  drivers/net/virtio/virtio_pci.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
> > index 6bd22e54a6..a10698aed8 100644
> > --- a/drivers/net/virtio/virtio_pci.c
> > +++ b/drivers/net/virtio/virtio_pci.c
> > @@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> >  	}
> >  
> >  	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> > -	if (ret < 0) {
> > -		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
> > +	if (ret != 1) {
> > +		PMD_INIT_LOG(DEBUG,
> > +			     "failed to read pci capability list, ret %d", ret);
> >  		return -1;
> >  	}
> >  
> >  	while (pos) {
> >  		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
> > -		if (ret < 0) {
> > -			PMD_INIT_LOG(ERR,
> > -				"failed to read pci cap at pos: %x", pos);
> > +		if (ret != sizeof(cap)) {
> > +			PMD_INIT_LOG(DEBUG,
> 
> Why change the log level to DEBUG?
> 
> Thanks

Previously, reading fewer than the required number of bytes caused
problems in the code that follows, so printing errors was warranted. Now
the code does not go ahead without the right amount of data, so it is no
longer critical to inform the user.
The main issue is that the log gets very spammy with errors, and paying
customers don't like that :-)
  
Tiwei Bie Aug. 16, 2018, 6:46 a.m. UTC | #3
On Wed, Aug 15, 2018 at 10:50:57AM +0100, Luca Boccassi wrote:
> On Wed, 2018-08-15 at 11:11 +0800, Tiwei Bie wrote:
> > On Tue, Aug 14, 2018 at 03:30:35PM +0100, Luca Boccassi wrote:
> > > From: Brian Russell <brussell@brocade.com>
> > > 
> > > In virtio_read_caps, rte_pci_read_config returns the number of bytes
> > > read from PCI config or < 0 on error.
> > > If less than the expected number of bytes are read then log the
> > > failure and return rather than carrying on with garbage.
> > 
> > Is this a fix or an improvement?
> > Or did you see anything broken without this patch?
> > If so, we may need a fixes line and Cc stable.
> 
> It is a fix, as it was creating problems in production due to the
> constant flux of errors in the logs.

Could you be a bit more specific about which errors
were logged if possible?

If my understanding is correct, you mean the errors
were logged because less than the required amount of
bytes were read?

> But given patch 1/2 is effectively doing a small change in the BSD bus
> API, and it's a requirement for 2/2, I don't think we can include it in
> the stable releases unfortunately.

If it's a fix, we need a fixes line.

> 
> > > 
[...]
> > > @@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> > >  	}
> > >  
> > >  	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> > > -	if (ret < 0) {
> > > -		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
> > > +	if (ret != 1) {
> > > +		PMD_INIT_LOG(DEBUG,
> > > +			     "failed to read pci capability list, ret %d", ret);
> > >  		return -1;
> > >  	}
> > >  
> > >  	while (pos) {
> > >  		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
> > > -		if (ret < 0) {
> > > -			PMD_INIT_LOG(ERR,
> > > -				"failed to read pci cap at pos: %x", pos);
> > > +		if (ret != sizeof(cap)) {

The code above has to read a full virtio PCI
capability successfully on each read; otherwise it
gives up reading the other capabilities and may fall
back to legacy mode. In which cases will it fail to
read the requested number of bytes? Should we try
to read the generic PCI fields first?
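
A minimal sketch of what "read the generic PCI fields first" could look like. It is illustrative only and not taken from the driver: `mock_read_config`, `fake_cfg` and `count_vndr_caps` are hypothetical stand-ins, and the mock only imitates the "bytes read, or < 0 on error" return convention discussed in this thread. The walk reads the 2-byte generic header (capability ID, next pointer) of every entry and reads the full virtio capability only for vendor-specific entries (ID 0x09). Whether the short reads seen in production arise this way is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAP_ID_VNDR 0x09 /* PCI vendor-specific capability ID */

/* Layout of a virtio PCI capability, per the virtio specification. */
struct virtio_pci_cap {
	uint8_t cap_vndr;   /* generic PCI field: capability ID */
	uint8_t cap_next;   /* generic PCI field: next pointer */
	uint8_t cap_len;
	uint8_t cfg_type;
	uint8_t bar;
	uint8_t padding[3];
	uint32_t offset;
	uint32_t length;
};

/* Hypothetical stand-in for the device's 256-byte config space. */
static uint8_t fake_cfg[256];

/* Mock: returns bytes read (possibly fewer than asked), or -1 on error. */
static int
mock_read_config(void *buf, size_t len, size_t offset)
{
	assert(buf != NULL);
	if (offset >= sizeof(fake_cfg))
		return -1;
	if (len > sizeof(fake_cfg) - offset)
		len = sizeof(fake_cfg) - offset;
	memcpy(buf, fake_cfg + offset, len);
	return (int)len;
}

/* Count vendor-specific capabilities; returns -1 on a failed or short read. */
static int
count_vndr_caps(uint8_t pos)
{
	struct virtio_pci_cap cap;
	uint8_t hdr[2];
	int found = 0;
	int guard = 48; /* bound the walk in case the list loops */

	while (pos != 0 && guard-- > 0) {
		/* Generic fields first: only 2 bytes are needed to move on. */
		if (mock_read_config(hdr, sizeof(hdr), pos) != (int)sizeof(hdr))
			return -1;
		if (hdr[0] == CAP_ID_VNDR) {
			/* Full read only where the full structure is expected. */
			if (mock_read_config(&cap, sizeof(cap), pos) !=
			    (int)sizeof(cap))
				return -1;
			found++;
		}
		pos = hdr[1];
	}
	return found;
}
```

With this shape, a short non-vendor entry near the end of config space no longer aborts the walk, while a truncated vendor-specific entry is still rejected.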

Besides, you also need to update other calls to
rte_pci_read_config(), e.g.:

https://github.com/DPDK/dpdk/blob/76b9d9de5c7d/drivers/net/virtio/virtio_pci.c#L696

Thanks

> > > +			PMD_INIT_LOG(DEBUG,
> > 
> > Why change the log level to DEBUG?
> > 
> > Thanks
> 
> Previously, reading fewer than the required number of bytes caused
> problems in the code that follows, so printing errors was warranted. Now
> the code does not go ahead without the right amount of data, so it is no
> longer critical to inform the user.
> The main issue is that the log gets very spammy with errors, and paying
> customers don't like that :-)
> 
> -- 
> Kind regards,
> Luca Boccassi
  
Luca Boccassi Aug. 16, 2018, 10:27 a.m. UTC | #4
On Thu, 2018-08-16 at 14:46 +0800, Tiwei Bie wrote:
> On Wed, Aug 15, 2018 at 10:50:57AM +0100, Luca Boccassi wrote:
> > On Wed, 2018-08-15 at 11:11 +0800, Tiwei Bie wrote:
> > > On Tue, Aug 14, 2018 at 03:30:35PM +0100, Luca Boccassi wrote:
> > > > From: Brian Russell <brussell@brocade.com>
> > > > 
> > > > In virtio_read_caps, rte_pci_read_config returns the number of bytes
> > > > read from PCI config or < 0 on error.
> > > > If less than the expected number of bytes are read then log the
> > > > failure and return rather than carrying on with garbage.
> > > 
> > > Is this a fix or an improvement?
> > > Or did you see anything broken without this patch?
> > > If so, we may need a fixes line and Cc stable.
> > 
> > It is a fix, as it was creating problems in production due to the
> > constant flux of errors in the logs.
> 
> Could you be a bit more specific about which errors
> were logged if possible?
> 
> If my understanding is correct, you mean the errors
> were logged because less than the required amount of
> bytes were read?

Yes - rte_pci_read_config on Linux will return not just 0/-1, but the
actual number of bytes read. If it's less than the required amount, the
code then goes on and reads garbage, which causes errors later in the
execution. Checking that we actually got the amount of data we need
fixes this issue.
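
The check described above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the driver code: `mock_read_config`, `fake_cfg`, `fake_cfg_len` and `read_exact` are hypothetical names, and the mock merely follows the return convention described here (number of bytes read, or < 0 on error).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PCI config space; not part of DPDK. */
static uint8_t fake_cfg[64];
static size_t fake_cfg_len = sizeof(fake_cfg);

/* Mock: returns bytes read (truncated at fake_cfg_len), or -1 on error. */
static int
mock_read_config(void *buf, size_t len, size_t offset)
{
	assert(buf != NULL);
	if (offset >= fake_cfg_len)
		return -1;
	if (len > fake_cfg_len - offset)
		len = fake_cfg_len - offset;
	memcpy(buf, fake_cfg + offset, len);
	return (int)len;
}

/* Return 0 only if exactly len bytes were read, -1 otherwise. */
static int
read_exact(void *buf, size_t len, size_t offset)
{
	int ret = mock_read_config(buf, len, offset);

	/* ret < 0 is an error; 0 <= ret < len is a short read. */
	if (ret < 0 || (size_t)ret != len)
		return -1;
	return 0;
}
```

A caller that only tests `ret < 0` would accept the short read and go on to parse a partly uninitialized buffer, which is the failure mode described above.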

> > But given patch 1/2 is effectively doing a small change in the BSD bus
> > API, and it's a requirement for 2/2, I don't think we can include it in
> > the stable releases unfortunately.
> 
> If it's a fix, we need a fixes line.

Sure, will send a v2.

> > 
> > > > 
> 
> [...]
> > > > @@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> > > >  	}
> > > >  
> > > >  	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> > > > -	if (ret < 0) {
> > > > -		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
> > > > +	if (ret != 1) {
> > > > +		PMD_INIT_LOG(DEBUG,
> > > > +			     "failed to read pci capability list, ret %d", ret);
> > > >  		return -1;
> > > >  	}
> > > >  
> > > >  	while (pos) {
> > > >  		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
> > > > -		if (ret < 0) {
> > > > -			PMD_INIT_LOG(ERR,
> > > > -				"failed to read pci cap at pos: %x", pos);
> > > > +		if (ret != sizeof(cap)) {
> 
> The code above has to read a full virtio PCI
> capability successfully on each read; otherwise it
> gives up reading the other capabilities and may fall
> back to legacy mode. In which cases will it fail to
> read the requested number of bytes? Should we try
> to read the generic PCI fields first?

I do not know exactly what causes fewer than the required bytes to be
read, but we have seen it happen in production (not 100% of the time,
though, so I think it's worth keeping the structure as-is). As you said,
in that case it falls back to legacy mode which, in our experience in
production deployments, then succeeds. That's why the error-level print
is undesired: the code will actually work via the fallback, but the
customers will see scary errors in the logs and open escalations :-)
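
The log-level reasoning above can be sketched as follows. Everything here is hypothetical and for illustration only (`probe`, `log_msg`, the enums and the `modern_ok`/`legacy_ok` flags are not driver names): a failure that still has a working fallback is logged at debug level, and an error-level message is emitted only when no path succeeds.

```c
#include <assert.h>
#include <stdio.h>

enum log_level { LOG_ERR, LOG_DEBUG };
enum virtio_mode { MODE_NONE, MODE_MODERN, MODE_LEGACY };

/* Counts error-level lines, i.e. what a user would see by default. */
static int err_lines;

static void
log_msg(enum log_level lvl, const char *msg)
{
	assert(msg != NULL);
	if (lvl == LOG_ERR) {
		err_lines++;
		fprintf(stderr, "ERR: %s\n", msg);
	}
	/* Debug messages are suppressed by default in this sketch. */
}

/* modern_ok / legacy_ok stand in for the outcome of each setup path. */
static enum virtio_mode
probe(int modern_ok, int legacy_ok)
{
	if (modern_ok)
		return MODE_MODERN;
	/* Recoverable: a fallback exists, so do not alarm the user. */
	log_msg(LOG_DEBUG, "modern capability read failed, trying legacy");
	if (legacy_ok)
		return MODE_LEGACY;
	/* Nothing worked: this is the case that deserves an error. */
	log_msg(LOG_ERR, "neither modern nor legacy setup succeeded");
	return MODE_NONE;
}
```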

> Besides, you also need to update other calls to
> rte_pci_read_config(), e.g.:
> 
> https://github.com/DPDK/dpdk/blob/76b9d9de5c7d/drivers/net/virtio/virtio_pci.c#L696
> 
> Thanks

Sure I will apply the same changes in v2.
  

Patch

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 6bd22e54a6..a10698aed8 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -567,16 +567,18 @@  virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
 	}
 
 	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
-	if (ret < 0) {
-		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+	if (ret != 1) {
+		PMD_INIT_LOG(DEBUG,
+			     "failed to read pci capability list, ret %d", ret);
 		return -1;
 	}
 
 	while (pos) {
 		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
-		if (ret < 0) {
-			PMD_INIT_LOG(ERR,
-				"failed to read pci cap at pos: %x", pos);
+		if (ret != sizeof(cap)) {
+			PMD_INIT_LOG(DEBUG,
+				     "failed to read pci cap at pos: %x ret %d",
+				     pos, ret);
 			break;
 		}