[v4] bus/pci: fix legacy device IO port map in secondary process

Message ID 20230830050713.58247-1-wenwux.ma@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series [v4] bus/pci: fix legacy device IO port map in secondary process |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/loongarch-compilation success Compilation OK
ci/github-robot: build success github build: passed
ci/loongarch-unit-testing success Unit Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

Ma, WenwuX Aug. 30, 2023, 5:07 a.m. UTC
  When doing IO port mapping for legacy device in secondary process, the
region information is missing, so, we need to refill it.

Fixes: 4b741542ecde ("bus/pci: avoid depending on private kernel value")
Cc: stable@dpdk.org

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
v4:
 - adjusting commit log
v3:
 - adjusting variable settings
v2:
 - add release of device in pci_vfio_ioport_unmap

---
 drivers/bus/pci/linux/pci_vfio.c | 43 ++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)
  

Comments

Gupta, Nipun Sept. 4, 2023, 3:06 p.m. UTC | #1
Addressed
On 8/30/2023 10:37 AM, Wenwu Ma wrote:
> When doing IO port mapping for legacy device in secondary process, the
> region information is missing, so, we need to refill it.
> 
> Fixes: 4b741542ecde ("bus/pci: avoid depending on private kernel value")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>

Acked-by: Nipun Gupta <nipun.gupta@amd.com>
  
Ling, WeiX Sept. 5, 2023, 3:10 a.m. UTC | #2
Addressed
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Wednesday, August 30, 2023 1:07 PM
> To: nipun.gupta@amd.com; dev@dpdk.org
> Cc: david.marchand@redhat.com; maxime.coquelin@redhat.com; Xia,
> Chenbo <chenbo.xia@intel.com>; Li, Miao <miao.li@intel.com>; Ling, WeiX
> <weix.ling@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>;
> stable@dpdk.org
> Subject: [PATCH v4] bus/pci: fix legacy device IO port map in secondary
> process
> 
> When doing IO port mapping for legacy device in secondary process, the
> region information is missing, so, we need to refill it.
> 
> Fixes: 4b741542ecde ("bus/pci: avoid depending on private kernel value")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>

Tested-by: Wei Ling <weix.ling@intel.com>
  
David Marchand Oct. 3, 2023, 10:21 a.m. UTC | #3
On Wed, Aug 30, 2023 at 7:07 AM Wenwu Ma <wenwux.ma@intel.com> wrote:
>
> When doing IO port mapping for legacy device in secondary process, the
> region information is missing, so, we need to refill it.
>
> Fixes: 4b741542ecde ("bus/pci: avoid depending on private kernel value")
> Cc: stable@dpdk.org
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 43 ++++++++++++++++++++++++++++++--
>  1 file changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index e634de8322..5ef26c98d1 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -1314,6 +1314,27 @@ pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
>                 return -1;
>         }
>
> +       if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +               struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> +               char pci_addr[PATH_MAX];
> +               int vfio_dev_fd;
> +               struct rte_pci_addr *loc = &dev->addr;
> +               int ret;
> +               /* store PCI address string */
> +               snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
> +                               loc->domain, loc->bus, loc->devid, loc->function);
> +
> +               ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
> +                                               &vfio_dev_fd, &device_info);

From a pci bus API pov, nothing prevents a driver from mixing memory
mapped with vfio and ioport resources (iow, calls to
rte_pci_map_resource() and rte_pci_ioport_map()).
IOW, it may not be the case with the net/virtio driver but, in theory,
rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
rte_pci_map_resource() call.

In a similar manner, from the API pov,
rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple
bars.

In summary, nothing in this patch checks that vfio has been configured
already and I think we need a refcount to handle those situations.

Nipun, Chenbo, WDYT?


> +               if (ret)
> +                       return -1;

ret value is not used, so there is no need for this variable.

if (rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
        &vfio_dev_fd, &device_info) != 0)
    return -1;

> +
> +               ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
> +               if (ret)
> +                       return -1;

Same here, ret is not needed.


> +
> +       }
> +
>         if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
>                 RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
>                 return -1;
> @@ -1361,8 +1382,26 @@ pci_vfio_ioport_write(struct rte_pci_ioport *p,
>  int
>  pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
>  {
> -       RTE_SET_USED(p);
> -       return -1;
> +       char pci_addr[PATH_MAX] = {0};
> +       struct rte_pci_addr *loc = &p->dev->addr;
> +       int ret, vfio_dev_fd;
> +
> +       /* store PCI address string */
> +       snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
> +                       loc->domain, loc->bus, loc->devid, loc->function);
> +
> +       vfio_dev_fd = rte_intr_dev_fd_get(p->dev->intr_handle);
> +       if (vfio_dev_fd < 0)
> +               return -1;

This check is odd and does not seem related.


> +
> +       ret = rte_vfio_release_device(rte_pci_get_sysfs_path(), pci_addr,
> +                                     vfio_dev_fd);
> +       if (ret < 0) {
> +               RTE_LOG(ERR, EAL, "Cannot release VFIO device\n");
> +               return ret;
> +       }
  
Ma, WenwuX Oct. 9, 2023, 3:06 a.m. UTC | #4
Hi,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: 2023年10月3日 18:21
> To: Ma, WenwuX <wenwux.ma@intel.com>; nipun.gupta@amd.com;
> chenbo.xia@outlook.com
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com; Li, Miao
> <miao.li@intel.com>; Ling, WeiX <weix.ling@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v4] bus/pci: fix legacy device IO port map in secondary
> process
> 
> On Wed, Aug 30, 2023 at 7:07 AM Wenwu Ma <wenwux.ma@intel.com>
> wrote:
> >
> > When doing IO port mapping for legacy device in secondary process, the
> > region information is missing, so, we need to refill it.
> >
> > Fixes: 4b741542ecde ("bus/pci: avoid depending on private kernel
> > value")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> >
> > ---
> >  drivers/bus/pci/linux/pci_vfio.c | 43
> > ++++++++++++++++++++++++++++++--
> >  1 file changed, 41 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/bus/pci/linux/pci_vfio.c
> > b/drivers/bus/pci/linux/pci_vfio.c
> > index e634de8322..5ef26c98d1 100644
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -1314,6 +1314,27 @@ pci_vfio_ioport_map(struct rte_pci_device
> *dev, int bar,
> >                 return -1;
> >         }
> >
> > +       if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> > +               struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> > +               char pci_addr[PATH_MAX];
> > +               int vfio_dev_fd;
> > +               struct rte_pci_addr *loc = &dev->addr;
> > +               int ret;
> > +               /* store PCI address string */
> > +               snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
> > +                               loc->domain, loc->bus, loc->devid,
> > + loc->function);
> > +
> > +               ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
> > +                                               &vfio_dev_fd,
> > + &device_info);
> 
> From a pci bus API pov, nothing prevents a driver from mixing memory
> mapped with vfio and ioport resources (iow, calls to
> rte_pci_map_resource() and rte_pci_ioport_map()).
> IOW, it may not be the case with the net/virtio driver but, in theory,
> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
> rte_pci_map_resource() call.
> 
> In a similar manner, from the API pov,
> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple bars.
> 
> In summary, nothing in this patch checks that vfio has been configured already
> and I think we need a refcount to handle those situations.
> 
We call rte_vfio_setup_device just to get device info, we can call rte_vfio_release_device as soon as pci_vfio_fill_regions is done.
This avoids reference counting operations, do you think it works?

> Nipun, Chenbo, WDYT?
> 
> 
> > +               if (ret)
> > +                       return -1;
> 
> ret value is not used, so there is no need for this variable.
> 
> if (rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
>         &vfio_dev_fd, &device_info) != 0)
>     return -1;
> 
> > +
> > +               ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
> > +               if (ret)
> > +                       return -1;
> 
> Same here, ret is not needed.
> 
> 
> > +
> > +       }
> > +
> >         if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
> >                 RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
> >                 return -1;
> > @@ -1361,8 +1382,26 @@ pci_vfio_ioport_write(struct rte_pci_ioport *p,
> > int  pci_vfio_ioport_unmap(struct rte_pci_ioport *p)  {
> > -       RTE_SET_USED(p);
> > -       return -1;
> > +       char pci_addr[PATH_MAX] = {0};
> > +       struct rte_pci_addr *loc = &p->dev->addr;
> > +       int ret, vfio_dev_fd;
> > +
> > +       /* store PCI address string */
> > +       snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
> > +                       loc->domain, loc->bus, loc->devid,
> > + loc->function);
> > +
> > +       vfio_dev_fd = rte_intr_dev_fd_get(p->dev->intr_handle);
> > +       if (vfio_dev_fd < 0)
> > +               return -1;
> 
> This check is odd and does not seem related.
> 
> 
> > +
> > +       ret = rte_vfio_release_device(rte_pci_get_sysfs_path(), pci_addr,
> > +                                     vfio_dev_fd);
> > +       if (ret < 0) {
> > +               RTE_LOG(ERR, EAL, "Cannot release VFIO device\n");
> > +               return ret;
> > +       }
> 
> 
> --
> David Marchand
  
David Marchand Oct. 18, 2023, 10:05 a.m. UTC | #5
On Mon, Oct 9, 2023 at 5:06 AM Ma, WenwuX <wenwux.ma@intel.com> wrote:
> > From a pci bus API pov, nothing prevents a driver from mixing memory
> > mapped with vfio and ioport resources (iow, calls to
> > rte_pci_map_resource() and rte_pci_ioport_map()).
> > IOW, it may not be the case with the net/virtio driver but, in theory,
> > rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
> > rte_pci_map_resource() call.
> >
> > In a similar manner, from the API pov,
> > rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple bars.
> >
> > In summary, nothing in this patch checks that vfio has been configured already
> > and I think we need a refcount to handle those situations.
> >
> We call rte_vfio_setup_device just to get device info, we can call rte_vfio_release_device as soon as pci_vfio_fill_regions is done.
> This avoids reference counting operations, do you think it works?

Afaics, rte_vfio_setup_device should not be called if a call to
rte_pci_map_device for this device was successful (rte_pci_map_device
itself calls rte_vfio_setup_device).
And as a consequence, calling rte_vfio_release_device cannot be done
unconditionnally neither.
  
Gupta, Nipun Oct. 18, 2023, 12:38 p.m. UTC | #6
On 10/18/2023 3:35 PM, David Marchand wrote:
> On Mon, Oct 9, 2023 at 5:06 AM Ma, WenwuX <wenwux.ma@intel.com> wrote:
>>>  From a pci bus API pov, nothing prevents a driver from mixing memory
>>> mapped with vfio and ioport resources (iow, calls to
>>> rte_pci_map_resource() and rte_pci_ioport_map()).
>>> IOW, it may not be the case with the net/virtio driver but, in theory,
>>> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
>>> rte_pci_map_resource() call.
>>>
>>> In a similar manner, from the API pov,
>>> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple bars.
>>>
>>> In summary, nothing in this patch checks that vfio has been configured already
>>> and I think we need a refcount to handle those situations.
>>>
>> We call rte_vfio_setup_device just to get device info, we can call rte_vfio_release_device as soon as pci_vfio_fill_regions is done.
>> This avoids reference counting operations, do you think it works?
> 
> Afaics, rte_vfio_setup_device should not be called if a call to
> rte_pci_map_device for this device was successful (rte_pci_map_device
> itself calls rte_vfio_setup_device).
> And as a consequence, calling rte_vfio_release_device cannot be done
> unconditionnally neither.

Hi David,

AFAIU, rte_vfio_setup_device() is written as re-entrant and does not 
create the DMA mapping again if it is already done for the iommu group.

When this API is called again either for a device within the same group 
or from the device for which it is already called, it mainly only does 
the work for device info get. Though not the best thing to use like 
this, but if this is called multiple times it should not have any 
negative impact.

As Wenmu mention that they need only device info from VFIO, a separate 
API to get device info can be added in eal_vfio.c/h. The device info 
portion of rte_vfio_setup_device() can be moved out to a new API, and 
rte_vfio_setup_device() can call this new API too?

Thanks,
Nipun
  
David Marchand Oct. 18, 2023, 1:44 p.m. UTC | #7
On Wed, Oct 18, 2023 at 2:39 PM Gupta, Nipun <nipun.gupta@amd.com> wrote:
> On 10/18/2023 3:35 PM, David Marchand wrote:
> > On Mon, Oct 9, 2023 at 5:06 AM Ma, WenwuX <wenwux.ma@intel.com> wrote:
> >>>  From a pci bus API pov, nothing prevents a driver from mixing memory
> >>> mapped with vfio and ioport resources (iow, calls to
> >>> rte_pci_map_resource() and rte_pci_ioport_map()).
> >>> IOW, it may not be the case with the net/virtio driver but, in theory,
> >>> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
> >>> rte_pci_map_resource() call.
> >>>
> >>> In a similar manner, from the API pov,
> >>> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple bars.
> >>>
> >>> In summary, nothing in this patch checks that vfio has been configured already
> >>> and I think we need a refcount to handle those situations.
> >>>
> >> We call rte_vfio_setup_device just to get device info, we can call rte_vfio_release_device as soon as pci_vfio_fill_regions is done.
> >> This avoids reference counting operations, do you think it works?
> >
> > Afaics, rte_vfio_setup_device should not be called if a call to
> > rte_pci_map_device for this device was successful (rte_pci_map_device
> > itself calls rte_vfio_setup_device).
> > And as a consequence, calling rte_vfio_release_device cannot be done
> > unconditionnally neither.
>
> Hi David,
>
> AFAIU, c() is written as re-entrant and does not
> create the DMA mapping again if it is already done for the iommu group.
>
> When this API is called again either for a device within the same group
> or from the device for which it is already called, it mainly only does
> the work for device info get. Though not the best thing to use like
> this, but if this is called multiple times it should not have any
> negative impact.

Even if rte_vfio_setup_device() is reentrant, there is still the
question when to call rte_vfio_release_device().


>
> As Wenmu mention that they need only device info from VFIO, a separate
> API to get device info can be added in eal_vfio.c/h. The device info
> portion of rte_vfio_setup_device() can be moved out to a new API, and
> rte_vfio_setup_device() can call this new API too?

Ok, I think I understand your suggestion.

Do we have a reference to the vfio device fd stored somewhere in the
pci device object?
I don't think it is the case, but if the pci layer keeps a reference
to it (it would be populated/reset during
rte_pci_map_device/rte_pci_unmap_device), then the ioport code can
call the VFIO_DEVICE_GET_INFO ioctl() similarly to what is done for
irq msi info, and  there is no need for a new EAL api.

For the case when this device fd is not available (no previous call to
rte_pci_map_device()), then the ioport code can call
rte_vfio_setup_device() / rte_vfio_release_device().

Is this what you have in mind?
  

Patch

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index e634de8322..5ef26c98d1 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1314,6 +1314,27 @@  pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		return -1;
 	}
 
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+		char pci_addr[PATH_MAX];
+		int vfio_dev_fd;
+		struct rte_pci_addr *loc = &dev->addr;
+		int ret;
+		/* store PCI address string */
+		snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+				loc->domain, loc->bus, loc->devid, loc->function);
+
+		ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
+						&vfio_dev_fd, &device_info);
+		if (ret)
+			return -1;
+
+		ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
+		if (ret)
+			return -1;
+
+	}
+
 	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
 		RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
 		return -1;
@@ -1361,8 +1382,26 @@  pci_vfio_ioport_write(struct rte_pci_ioport *p,
 int
 pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
 {
-	RTE_SET_USED(p);
-	return -1;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &p->dev->addr;
+	int ret, vfio_dev_fd;
+
+	/* store PCI address string */
+	snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	vfio_dev_fd = rte_intr_dev_fd_get(p->dev->intr_handle);
+	if (vfio_dev_fd < 0)
+		return -1;
+
+	ret = rte_vfio_release_device(rte_pci_get_sysfs_path(), pci_addr,
+				      vfio_dev_fd);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Cannot release VFIO device\n");
+		return ret;
+	}
+
+	return 0;
 }
 
 int