bus/pci: fix mmap PCI resource

Message ID 20200708092435.9776-1-alvinx.zhang@intel.com (mailing list archive)
State Rejected, archived
Delegated to: David Marchand
Headers
Series bus/pci: fix mmap PCI resource |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/travis-robot success Travis build: passed
ci/iol-broadcom-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Alvin Zhang July 8, 2020, 9:24 a.m. UTC
  From: Alvin Zhang <alvinx.zhang@intel.com>

When mapping a PCI BAR containing an MSI-X table, some devices do not
need to actually map this BAR or only need to map part of them, which
may cause the mapping to fail. Now some checks are added and a non-NULL
initial value is set to the variable to avoid this situation.

Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
Cc: talshn@mellanox.com

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
---
 drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
  

Comments

David Marchand July 8, 2020, 9:38 a.m. UTC | #1
On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
>
> From: Alvin Zhang <alvinx.zhang@intel.com>
>
> When mapping a PCI BAR containing an MSI-X table, some devices do not
> need to actually map this BAR or only need to map part of them, which
> may cause the mapping to fail. Now some checks are added and a non-NULL
> initial value is set to the variable to avoid this situation.
>
> Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> Cc: talshn@mellanox.com
>
> Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>


I guess this had to do with:
https://bugs.dpdk.org/show_bug.cgi?id=503

Review please.
  
Chenbo Xia July 8, 2020, 1:58 p.m. UTC | #2
Hi Alvin,

CC the maintainers. Comments below.

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of alvinx.zhang@intel.com
> Sent: Wednesday, July 8, 2020 5:25 PM
> To: dev@dpdk.org
> Cc: Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>
> Subject: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> 
> From: Alvin Zhang <alvinx.zhang@intel.com>
> 
> When mapping a PCI BAR containing an MSI-X table, some devices do not need
> to actually map this BAR or only need to map part of them, which may cause the
> mapping to fail. Now some checks are added and a non-NULL initial value is set
> to the variable to avoid this situation.
> 
> Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> Cc: talshn@mellanox.com
> 
> Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index fdeb9a8..9143bfc 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -547,6 +547,14 @@
>  			bar_index,
>  			memreg[0].offset, memreg[0].size,
>  			memreg[1].offset, memreg[1].size);
> +
> +		if (memreg[0].size == 0 && memreg[1].size == 0) {
> +			/* No need to map this BAR */
> +			RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> +			bar->size = 0;
> +			bar->addr = 0;
> +			return 0;
> +		}

I'm not sure if this corner case will happen. If you confirmed it,
Just ignore this.

>  	} else {
>  		memreg[0].offset = bar->offset;
>  		memreg[0].size = bar->size;
> @@ -556,7 +564,9 @@
>  	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
>  			MAP_ANONYMOUS | additional_flags, -1, 0);
>  	if (bar_addr != MAP_FAILED) {
> -		void *map_addr = NULL;
> +		/* Set non NULL initial value for in case of no PCI mapping */
> +		void *map_addr = bar_addr;
> +

I see the issue that this patch wants to fix is based on an old kernel.
In older vfio-pci kernel module, MSI related reg cannot be mmaped
in userspace while in newer kernel it can be. That's why sometimes
it cannot be reproduced (https://bugs.dpdk.org/show_bug.cgi?id=503)

So under the condition of old kernel, there could be an example that
Memreg 0 has size 0 but Memreg 1 has non-zero size, which leads to
Memreg 1 cannot be mmaped.

So I'm fine with this part of code change. As this issue is blocking test,
we should do fast confirm and review.

Thanks,
Chenbo

>  		if (memreg[0].size) {
>  			/* actual map of first part */
>  			map_addr = pci_map_resource(bar_addr, vfio_dev_fd,
> --
> 1.8.3.1
  
Xiao, QimaiX July 9, 2020, 2:25 a.m. UTC | #3
Tested-by: Xiao Qimai <qimaix.xiao@intel.com>

Regards,
Xiao Qimai

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of alvinx.zhang@intel.com
> Sent: Wednesday, July 8, 2020 5:25 PM
> To: dev@dpdk.org
> Cc: Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>
> Subject: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> 
> From: Alvin Zhang <alvinx.zhang@intel.com>
> 
> When mapping a PCI BAR containing an MSI-X table, some devices do not
> need to actually map this BAR or only need to map part of them, which may
> cause the mapping to fail. Now some checks are added and a non-NULL initial
> value is set to the variable to avoid this situation.
> 
> Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> Cc: talshn@mellanox.com
> 
> Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index fdeb9a8..9143bfc 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -547,6 +547,14 @@
>  			bar_index,
>  			memreg[0].offset, memreg[0].size,
>  			memreg[1].offset, memreg[1].size);
> +
> +		if (memreg[0].size == 0 && memreg[1].size == 0) {
> +			/* No need to map this BAR */
> +			RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n",
> bar_index);
> +			bar->size = 0;
> +			bar->addr = 0;
> +			return 0;
> +		}
>  	} else {
>  		memreg[0].offset = bar->offset;
>  		memreg[0].size = bar->size;
> @@ -556,7 +564,9 @@
>  	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
>  			MAP_ANONYMOUS | additional_flags, -1, 0);
>  	if (bar_addr != MAP_FAILED) {
> -		void *map_addr = NULL;
> +		/* Set non NULL initial value for in case of no PCI mapping */
> +		void *map_addr = bar_addr;
> +
>  		if (memreg[0].size) {
>  			/* actual map of first part */
>  			map_addr = pci_map_resource(bar_addr,
> vfio_dev_fd,
> --
> 1.8.3.1
  
Alvin Zhang July 9, 2020, 5:13 a.m. UTC | #4
Hi Chenbo,

Thanks your comments.

> -----Original Message-----
> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Wednesday, July 8, 2020 9:58 PM
> To: Zhang, AlvinX <alvinx.zhang@intel.com>; dev@dpdk.org
> Cc: Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>; David
> Marchand <david.marchand@redhat.com>; Tal Shnaiderman
> <talshn@mellanox.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> thomas@monjalon.net
> Subject: RE: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> 
> Hi Alvin,
> 
> CC the maintainers. Comments below.
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of alvinx.zhang@intel.com
> > Sent: Wednesday, July 8, 2020 5:25 PM
> > To: dev@dpdk.org
> > Cc: Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>
> > Subject: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> >
> > From: Alvin Zhang <alvinx.zhang@intel.com>
> >
> > When mapping a PCI BAR containing an MSI-X table, some devices do not
> > need to actually map this BAR or only need to map part of them, which
> > may cause the mapping to fail. Now some checks are added and a
> > non-NULL initial value is set to the variable to avoid this situation.
> >
> > Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> > Cc: talshn@mellanox.com
> >
> > Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> > ---
> >  drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/bus/pci/linux/pci_vfio.c
> > b/drivers/bus/pci/linux/pci_vfio.c
> > index fdeb9a8..9143bfc 100644
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -547,6 +547,14 @@
> >  			bar_index,
> >  			memreg[0].offset, memreg[0].size,
> >  			memreg[1].offset, memreg[1].size);
> > +
> > +		if (memreg[0].size == 0 && memreg[1].size == 0) {
> > +			/* No need to map this BAR */
> > +			RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> > +			bar->size = 0;
> > +			bar->addr = 0;
> > +			return 0;
> > +		}
> 
> I'm not sure if this corner case will happen. If you confirmed it, Just ignore this.

In theory, it is entirely possible if the misx-table size is equal to the bar size.

> 
> >  	} else {
> >  		memreg[0].offset = bar->offset;
> >  		memreg[0].size = bar->size;
> > @@ -556,7 +564,9 @@
> >  	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> >  			MAP_ANONYMOUS | additional_flags, -1, 0);
> >  	if (bar_addr != MAP_FAILED) {
> > -		void *map_addr = NULL;
> > +		/* Set non NULL initial value for in case of no PCI mapping */
> > +		void *map_addr = bar_addr;
> > +
> 
> I see the issue that this patch wants to fix is based on an old kernel.
> In older vfio-pci kernel module, MSI related reg cannot be mmaped in userspace
> while in newer kernel it can be. That's why sometimes it cannot be reproduced
> (https://bugs.dpdk.org/show_bug.cgi?id=503)
> 
> So under the condition of old kernel, there could be an example that Memreg 0
> has size 0 but Memreg 1 has non-zero size, which leads to Memreg 1 cannot be
> mmaped.

Yes, it is.

> 
> So I'm fine with this part of code change. As this issue is blocking test, we should
> do fast confirm and review.
> 
> Thanks,
> Chenbo
> 
> >  		if (memreg[0].size) {
> >  			/* actual map of first part */
> >  			map_addr = pci_map_resource(bar_addr, vfio_dev_fd,
> > --
> > 1.8.3.1
  
Chenbo Xia July 9, 2020, 5:21 a.m. UTC | #5
> -----Original Message-----
> From: Zhang, AlvinX <alvinx.zhang@intel.com>
> Sent: Thursday, July 9, 2020 1:14 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org
> Cc: David Marchand <david.marchand@redhat.com>; Tal Shnaiderman
> <talshn@mellanox.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> thomas@monjalon.net
> Subject: RE: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> 
> Hi Chenbo,
> 
> Thanks your comments.
> 
> > -----Original Message-----
> > From: Xia, Chenbo <chenbo.xia@intel.com>
> > Sent: Wednesday, July 8, 2020 9:58 PM
> > To: Zhang, AlvinX <alvinx.zhang@intel.com>; dev@dpdk.org
> > Cc: Xing, Beilei <beilei.xing@intel.com>; Guo, Jia
> > <jia.guo@intel.com>; David Marchand <david.marchand@redhat.com>; Tal
> > Shnaiderman <talshn@mellanox.com>; Burakov, Anatoly
> > <anatoly.burakov@intel.com>; thomas@monjalon.net
> > Subject: RE: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> >
> > Hi Alvin,
> >
> > CC the maintainers. Comments below.
> >
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of alvinx.zhang@intel.com
> > > Sent: Wednesday, July 8, 2020 5:25 PM
> > > To: dev@dpdk.org
> > > Cc: Xing, Beilei <beilei.xing@intel.com>; Guo, Jia
> > > <jia.guo@intel.com>
> > > Subject: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> > >
> > > From: Alvin Zhang <alvinx.zhang@intel.com>
> > >
> > > When mapping a PCI BAR containing an MSI-X table, some devices do
> > > not need to actually map this BAR or only need to map part of them,
> > > which may cause the mapping to fail. Now some checks are added and a
> > > non-NULL initial value is set to the variable to avoid this situation.
> > >
> > > Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> > > Cc: talshn@mellanox.com
> > >
> > > Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> > > ---
> > >  drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/bus/pci/linux/pci_vfio.c
> > > b/drivers/bus/pci/linux/pci_vfio.c
> > > index fdeb9a8..9143bfc 100644
> > > --- a/drivers/bus/pci/linux/pci_vfio.c
> > > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > > @@ -547,6 +547,14 @@
> > >  			bar_index,
> > >  			memreg[0].offset, memreg[0].size,
> > >  			memreg[1].offset, memreg[1].size);
> > > +
> > > +		if (memreg[0].size == 0 && memreg[1].size == 0) {
> > > +			/* No need to map this BAR */
> > > +			RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> > > +			bar->size = 0;
> > > +			bar->addr = 0;
> > > +			return 0;
> > > +		}
> >
> > I'm not sure if this corner case will happen. If you confirmed it, Just ignore this.
> 
> In theory, it is entirely possible if the misx-table size is equal to the bar size.
> 
> >
> > >  	} else {
> > >  		memreg[0].offset = bar->offset;
> > >  		memreg[0].size = bar->size;
> > > @@ -556,7 +564,9 @@
> > >  	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> > >  			MAP_ANONYMOUS | additional_flags, -1, 0);
> > >  	if (bar_addr != MAP_FAILED) {
> > > -		void *map_addr = NULL;
> > > +		/* Set non NULL initial value for in case of no PCI mapping */
> > > +		void *map_addr = bar_addr;
> > > +
> >
> > I see the issue that this patch wants to fix is based on an old kernel.
> > In older vfio-pci kernel module, MSI related reg cannot be mmaped in
> > userspace while in newer kernel it can be. That's why sometimes it
> > cannot be reproduced
> > (https://bugs.dpdk.org/show_bug.cgi?id=503)
> >
> > So under the condition of old kernel, there could be an example that
> > Memreg 0 has size 0 but Memreg 1 has non-zero size, which leads to
> > Memreg 1 cannot be mmaped.
> 
> Yes, it is.
> 
> >
> > So I'm fine with this part of code change. As this issue is blocking
> > test, we should do fast confirm and review.
> >
> > Thanks,
> > Chenbo
> >
> > >  		if (memreg[0].size) {
> > >  			/* actual map of first part */
> > >  			map_addr = pci_map_resource(bar_addr, vfio_dev_fd,
> > > --
> > > 1.8.3.1

Acked-by: Chenbo Xia <chenbo.xia@intel.com>
  
David Marchand July 10, 2020, 9:54 a.m. UTC | #6
On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
>
> From: Alvin Zhang <alvinx.zhang@intel.com>
>
> When mapping a PCI BAR containing an MSI-X table, some devices do not
> need to actually map this BAR or only need to map part of them, which
> may cause the mapping to fail. Now some checks are added and a non-NULL
> initial value is set to the variable to avoid this situation.
>
> Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> Cc: talshn@mellanox.com
>
> Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index fdeb9a8..9143bfc 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -547,6 +547,14 @@
>                         bar_index,
>                         memreg[0].offset, memreg[0].size,
>                         memreg[1].offset, memreg[1].size);
> +
> +               if (memreg[0].size == 0 && memreg[1].size == 0) {
> +                       /* No need to map this BAR */
> +                       RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> +                       bar->size = 0;
> +                       bar->addr = 0;
> +                       return 0;
> +               }

We already have a check on bar size == 0.
Why would we have this condition?
Broken hw?


>         } else {
>                 memreg[0].offset = bar->offset;
>                 memreg[0].size = bar->size;
> @@ -556,7 +564,9 @@
>         bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
>                         MAP_ANONYMOUS | additional_flags, -1, 0);
>         if (bar_addr != MAP_FAILED) {
> -               void *map_addr = NULL;
> +               /* Set non NULL initial value for in case of no PCI mapping */
> +               void *map_addr = bar_addr;
> +

It took me some time to understand this code...
Anyway, we have a regression in the librte_pci.
This is where the fix should be.

We can cleanup this code later.

>                 if (memreg[0].size) {
>                         /* actual map of first part */
>                         map_addr = pci_map_resource(bar_addr, vfio_dev_fd,
> --
> 1.8.3.1
>


Thanks.
  
Thomas Monjalon July 10, 2020, 10:07 a.m. UTC | #7
10/07/2020 11:54, David Marchand:
> On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
> > From: Alvin Zhang <alvinx.zhang@intel.com>
> >
> > When mapping a PCI BAR containing an MSI-X table, some devices do not
> > need to actually map this BAR or only need to map part of them, which
> > may cause the mapping to fail. Now some checks are added and a non-NULL
> > initial value is set to the variable to avoid this situation.

Note: this regression would not have happened if we had some CI tests
for simple device probing.
Please let's invest more in CI.


> > Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> > Cc: talshn@mellanox.com

No he was not Cc in the thread. Same for Anatoly.
Adding more people in Cc...

> > Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> > ---
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -547,6 +547,14 @@
> >                         bar_index,
> >                         memreg[0].offset, memreg[0].size,
> >                         memreg[1].offset, memreg[1].size);
> > +
> > +               if (memreg[0].size == 0 && memreg[1].size == 0) {
> > +                       /* No need to map this BAR */
> > +                       RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> > +                       bar->size = 0;
> > +                       bar->addr = 0;
> > +                       return 0;
> > +               }
> 
> We already have a check on bar size == 0.
> Why would we have this condition?
> Broken hw?
> 
> 
> >         } else {
> >                 memreg[0].offset = bar->offset;
> >                 memreg[0].size = bar->size;
> > @@ -556,7 +564,9 @@
> >         bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> >                         MAP_ANONYMOUS | additional_flags, -1, 0);
> >         if (bar_addr != MAP_FAILED) {
> > -               void *map_addr = NULL;
> > +               /* Set non NULL initial value for in case of no PCI mapping */
> > +               void *map_addr = bar_addr;
> > +
> 
> It took me some time to understand this code...
> Anyway, we have a regression in the librte_pci.
> This is where the fix should be.

Yes, I am going to send a fix.

> We can cleanup this code later.

Yes please, this function isn't understandable and lack of comments.
Anatoly please?
  
Thomas Monjalon July 10, 2020, 12:31 p.m. UTC | #8
10/07/2020 12:07, Thomas Monjalon:
> 10/07/2020 11:54, David Marchand:
> > On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
> > > From: Alvin Zhang <alvinx.zhang@intel.com>
> > >
> > > When mapping a PCI BAR containing an MSI-X table, some devices do not
> > > need to actually map this BAR or only need to map part of them, which
> > > may cause the mapping to fail. Now some checks are added and a non-NULL
> > > initial value is set to the variable to avoid this situation.
[...]
> > > --- a/drivers/bus/pci/linux/pci_vfio.c
> > > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > > @@ -547,6 +547,14 @@
> > >                         bar_index,
> > >                         memreg[0].offset, memreg[0].size,
> > >                         memreg[1].offset, memreg[1].size);
> > > +
> > > +               if (memreg[0].size == 0 && memreg[1].size == 0) {
> > > +                       /* No need to map this BAR */
> > > +                       RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> > > +                       bar->size = 0;
> > > +                       bar->addr = 0;
> > > +                       return 0;
> > > +               }
> > 
> > We already have a check on bar size == 0.
> > Why would we have this condition?
> > Broken hw?
> > 
> > 
> > >         } else {
> > >                 memreg[0].offset = bar->offset;
> > >                 memreg[0].size = bar->size;
> > > @@ -556,7 +564,9 @@
> > >         bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> > >                         MAP_ANONYMOUS | additional_flags, -1, 0);
> > >         if (bar_addr != MAP_FAILED) {
> > > -               void *map_addr = NULL;
> > > +               /* Set non NULL initial value for in case of no PCI mapping */
> > > +               void *map_addr = bar_addr;
> > > +
> > 
> > It took me some time to understand this code...
> > Anyway, we have a regression in the librte_pci.
> > This is where the fix should be.
> 
> Yes, I am going to send a fix.

Patch sent: https://patches.dpdk.org/patch/73741/

This patch is marked as rejected, but please follow-up on cleanup.

> > We can cleanup this code later.
> 
> Yes please, this function isn't understandable and lack of comments.
> Anatoly please?
  
Lincoln Lavoie July 10, 2020, 12:43 p.m. UTC | #9
On Fri, Jul 10, 2020 at 6:08 AM Thomas Monjalon <thomas@monjalon.net> wrote:

> 10/07/2020 11:54, David Marchand:
> > On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
> > > From: Alvin Zhang <alvinx.zhang@intel.com>
> > >
> > > When mapping a PCI BAR containing an MSI-X table, some devices do not
> > > need to actually map this BAR or only need to map part of them, which
> > > may cause the mapping to fail. Now some checks are added and a non-NULL
> > > initial value is set to the variable to avoid this situation.
>
> Note: this regression would not have happened if we had some CI tests
> for simple device probing.
> Please let's invest more in CI.
>
> Are you referring to adding tests to specifically check these conditions,
or would this have been caught just from the continued expansion of testing
on real hardware / NICs, or both? It seems like the issue is caused by a
combination of hardware behaviors and "broken code".  My point is, without
having some of those behaviors in the CI, we might still not have caught
this issue, even with probing checks.  Of course, more checks are always a
good thing.

>
> > > Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> > > Cc: talshn@mellanox.com
>
> No he was not Cc in the thread. Same for Anatoly.
> Adding more people in Cc...
>
> > > Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> > > ---
> > > --- a/drivers/bus/pci/linux/pci_vfio.c
> > > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > > @@ -547,6 +547,14 @@
> > >                         bar_index,
> > >                         memreg[0].offset, memreg[0].size,
> > >                         memreg[1].offset, memreg[1].size);
> > > +
> > > +               if (memreg[0].size == 0 && memreg[1].size == 0) {
> > > +                       /* No need to map this BAR */
> > > +                       RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n",
> bar_index);
> > > +                       bar->size = 0;
> > > +                       bar->addr = 0;
> > > +                       return 0;
> > > +               }
> >
> > We already have a check on bar size == 0.
> > Why would we have this condition?
> > Broken hw?
> >
> >
> > >         } else {
> > >                 memreg[0].offset = bar->offset;
> > >                 memreg[0].size = bar->size;
> > > @@ -556,7 +564,9 @@
> > >         bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> > >                         MAP_ANONYMOUS | additional_flags, -1, 0);
> > >         if (bar_addr != MAP_FAILED) {
> > > -               void *map_addr = NULL;
> > > +               /* Set non NULL initial value for in case of no PCI
> mapping */
> > > +               void *map_addr = bar_addr;
> > > +
> >
> > It took me some time to understand this code...
> > Anyway, we have a regression in the librte_pci.
> > This is where the fix should be.
>
> Yes, I am going to send a fix.
>
> > We can cleanup this code later.
>
> Yes please, this function isn't understandable and lack of comments.
> Anatoly please?
>
>
>
  
Alvin Zhang July 11, 2020, 6:57 a.m. UTC | #10
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, July 10, 2020 5:54 PM
> To: Zhang, AlvinX <alvinx.zhang@intel.com>
> Cc: dev <dev@dpdk.org>; Xing, Beilei <beilei.xing@intel.com>; Guo, Jia
> <jia.guo@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] bus/pci: fix mmap PCI resource
> 
> On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
> >
> > From: Alvin Zhang <alvinx.zhang@intel.com>
> >
> > When mapping a PCI BAR containing an MSI-X table, some devices do not
> > need to actually map this BAR or only need to map part of them, which
> > may cause the mapping to fail. Now some checks are added and a
> > non-NULL initial value is set to the variable to avoid this situation.
> >
> > Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> > Cc: talshn@mellanox.com
> >
> > Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> > ---
> >  drivers/bus/pci/linux/pci_vfio.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/bus/pci/linux/pci_vfio.c
> > b/drivers/bus/pci/linux/pci_vfio.c
> > index fdeb9a8..9143bfc 100644
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -547,6 +547,14 @@
> >                         bar_index,
> >                         memreg[0].offset, memreg[0].size,
> >                         memreg[1].offset, memreg[1].size);
> > +
> > +               if (memreg[0].size == 0 && memreg[1].size == 0) {
> > +                       /* No need to map this BAR */
> > +                       RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n",
> bar_index);
> > +                       bar->size = 0;
> > +                       bar->addr = 0;
> > +                       return 0;
> > +               }
> 
> We already have a check on bar size == 0.
> Why would we have this condition?
> Broken hw?
> 

If the misx-table size is equal to the bar size, the memreg[0].size and memreg[1].size will be zero at same time although the bar size is not zero

> 
> >         } else {
> >                 memreg[0].offset = bar->offset;
> >                 memreg[0].size = bar->size; @@ -556,7 +564,9 @@
> >         bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> >                         MAP_ANONYMOUS | additional_flags, -1, 0);
> >         if (bar_addr != MAP_FAILED) {
> > -               void *map_addr = NULL;
> > +               /* Set non NULL initial value for in case of no PCI mapping
> */
> > +               void *map_addr = bar_addr;
> > +
> 
> It took me some time to understand this code...
> Anyway, we have a regression in the librte_pci.
> This is where the fix should be.
> 
> We can cleanup this code later.

The key cause is the value of memreg[0].size is 0


> 
> >                 if (memreg[0].size) {
> >                         /* actual map of first part */
> >                         map_addr = pci_map_resource(bar_addr,
> > vfio_dev_fd,
> > --
> > 1.8.3.1
> >
> 
> 
> Thanks.
> 
> --
> David Marchand
  
David Marchand Sept. 23, 2020, 7:34 a.m. UTC | #11
On Fri, Jul 10, 2020 at 12:08 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > It took me some time to understand this code...
> > Anyway, we have a regression in the librte_pci.
> > This is where the fix should be.
>
> Yes, I am going to send a fix.
>
> > We can cleanup this code later.
>
> Yes please, this function isn't understandable and lack of comments.
> Anatoly please?

Now that the workaround on the pci mapping API has been reverted, we
are back with this issue.
https://bugs.dpdk.org/show_bug.cgi?id=539

Volunteer(s) to fix and clean this code?

Thanks.
  

Patch

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fdeb9a8..9143bfc 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -547,6 +547,14 @@ 
 			bar_index,
 			memreg[0].offset, memreg[0].size,
 			memreg[1].offset, memreg[1].size);
+
+		if (memreg[0].size == 0 && memreg[1].size == 0) {
+			/* No need to map this BAR */
+			RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
+			bar->size = 0;
+			bar->addr = 0;
+			return 0;
+		}
 	} else {
 		memreg[0].offset = bar->offset;
 		memreg[0].size = bar->size;
@@ -556,7 +564,9 @@ 
 	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
 			MAP_ANONYMOUS | additional_flags, -1, 0);
 	if (bar_addr != MAP_FAILED) {
-		void *map_addr = NULL;
+		/* Set non NULL initial value for in case of no PCI mapping */
+		void *map_addr = bar_addr;
+
 		if (memreg[0].size) {
 			/* actual map of first part */
 			map_addr = pci_map_resource(bar_addr, vfio_dev_fd,