[0/7] vfio/pci: SR-IOV support

Message ID	158145472604.16827.15751375540102298130.stgit@gimli.home (mailing list archive)
Headers	From: Alex Williamson <alex.williamson@redhat.com> To: kvm@vger.kernel.org Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, dev@dpdk.org, mtosatti@redhat.com, thomas@monjalon.net, bluca@debian.org, jerinjacobk@gmail.com, bruce.richardson@intel.com, cohuck@redhat.com Date: Tue, 11 Feb 2020 16:05:16 -0700 Message-ID: <158145472604.16827.15751375540102298130.stgit@gimli.home> User-Agent: StGit/0.19-dirty MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Subject: [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>
Series	vfio/pci: SR-IOV support \| [0/7] vfio/pci: SR-IOV support [1/7] vfio: Include optional device match in vfio_device_ops callbacks [2/7] vfio/pci: Implement match ops [3/7] vfio/pci: Introduce VF token [4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user [5/7] vfio/pci: Add sriov_configure support [6/7] vfio/pci: Remove dev_fmt definition [7/7] vfio/pci: Cleanup .probe() exit paths

Message ID

158145472604.16827.15751375540102298130.stgit@gimli.home (mailing list archive)

Headers

From: Alex Williamson <alex.williamson@redhat.com>
To: kvm@vger.kernel.org
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, dev@dpdk.org,
 mtosatti@redhat.com, thomas@monjalon.net, bluca@debian.org,
 jerinjacobk@gmail.com, bruce.richardson@intel.com, cohuck@redhat.com
Date: Tue, 11 Feb 2020 16:05:16 -0700
Message-ID: <158145472604.16827.15751375540102298130.stgit@gimli.home>
User-Agent: StGit/0.19-dirty
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Subject: [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support
Precedence: list
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Series

vfio/pci: SR-IOV support |

Message

Alex Williamson Feb. 11, 2020, 11:05 p.m. UTC

  Given the mostly positive feedback from the RFC[1], here's a new
non-RFC revision.  Changes since RFC:

 - vfio_device_ops.match semantics refined
 - Use helpers for struct pci_dev.physfn to avoid breakage without
   CONFIG_PCI_IOV
 - Relax to allow SR-IOV configuration changes while PF is opened.
   There are potentially interesting use cases here, including
   perhaps QEMU emulating an SR-IOV capability and calling out
   to a privileged entity to manipulate sriov_numvfs and corral
   the resulting devices.
 - Retest vfio_device_feature.argsz to include uuid length.
 - Add Connie's R-b on 6/7

I still wish we had a solution to make it less opaque to the user
why a VFIO_GROUP_GET_DEVICE_FD() has failed if a VF token is
required, but this is still the best I've been able to come up with.
If there are objections or better ideas, please raise them now.

The synopsis of this series is that we have an ongoing desire to drive
PCIe SR-IOV PFs from userspace with VFIO.  There's an immediate need
for this with DPDK drivers and potentially interesting future use
cases in virtualization.  We've been reluctant to add this support
previously due to the dependency and trust relationship between the
VF device and PF driver.  Minimally the PF driver can induce a denial
of service to the VF, but depending on the specific implementation,
the PF driver might also be responsible for moving data between VFs
or have direct access to the state of the VF, including data or state
otherwise private to the VF or VF driver.

To help resolve these concerns, we introduce a VF token into the VFIO
PCI ABI, which acts as a shared secret key between drivers.  The
userspace PF driver is required to set the VF token to a known value
and userspace VF drivers are required to provide the token to access
the VF device.  If a PF driver is restarted with VF drivers in use, it
must also provide the current token in order to prevent a rogue
untrusted PF driver from replacing a known driver.  The degree to
which this new token is considered secret is left to the userspace
drivers, the kernel intentionally provides no means to retrieve the
current token.

Note that the above token is only required for this new model where
both the PF and VF devices are usable through vfio-pci.  Existing
models of VFIO drivers where the PF is used without SR-IOV enabled
or the VF is bound to a userspace driver with an in-kernel, host PF
driver are unaffected.

The latter configuration above also highlights a new inverted scenario
that is now possible, a userspace PF driver with in-kernel VF drivers.
I believe this is a scenario that should be allowed, but should not be
enabled by default.  This series includes code to set a default
driver_override for VFs sourced from a vfio-pci user owned PF, such
that the VFs are also bound to vfio-pci.  This model is compatible
with tools like driverctl and allows the system administrator to
decide if other bindings should be enabled.  The VF token interface
above exists only between vfio-pci PF and VF drivers, once a VF is
bound to another driver, the administrator has effectively pronounced
the device as trusted.  The vfio-pci driver will note alternate
binding in dmesg for logging and debugging purposes.

Please review, comment, and test.  The example QEMU implementation
provided with the RFC[2] is still current for this version.  Thanks,

Alex

[1] https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stgit@gimli.home/
[2] https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/
---

Alex Williamson (7):
      vfio: Include optional device match in vfio_device_ops callbacks
      vfio/pci: Implement match ops
      vfio/pci: Introduce VF token
      vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
      vfio/pci: Add sriov_configure support
      vfio/pci: Remove dev_fmt definition
      vfio/pci: Cleanup .probe() exit paths


 drivers/vfio/pci/vfio_pci.c         |  312 ++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   10 +
 drivers/vfio/vfio.c                 |   20 ++
 include/linux/vfio.h                |    4 
 include/uapi/linux/vfio.h           |   37 ++++
 5 files changed, 355 insertions(+), 28 deletions(-)

Comments

Alexey Kardashevskiy Feb. 14, 2020, 4:57 a.m. UTC | #1

On 12/02/2020 10:05, Alex Williamson wrote:
> Given the mostly positive feedback from the RFC[1], here's a new
> non-RFC revision.  Changes since RFC:
> 
>  - vfio_device_ops.match semantics refined
>  - Use helpers for struct pci_dev.physfn to avoid breakage without
>    CONFIG_PCI_IOV
>  - Relax to allow SR-IOV configuration changes while PF is opened.
>    There are potentially interesting use cases here, including
>    perhaps QEMU emulating an SR-IOV capability and calling out
>    to a privileged entity to manipulate sriov_numvfs and corral
>    the resulting devices.
>  - Retest vfio_device_feature.argsz to include uuid length.
>  - Add Connie's R-b on 6/7
> 
> I still wish we had a solution to make it less opaque to the user
> why a VFIO_GROUP_GET_DEVICE_FD() has failed if a VF token is
> required, but this is still the best I've been able to come up with.
> If there are objections or better ideas, please raise them now.
> 
> The synopsis of this series is that we have an ongoing desire to drive
> PCIe SR-IOV PFs from userspace with VFIO.  There's an immediate need
> for this with DPDK drivers and potentially interesting future use
> cases in virtualization.  We've been reluctant to add this support
> previously due to the dependency and trust relationship between the
> VF device and PF driver.  Minimally the PF driver can induce a denial
> of service to the VF, but depending on the specific implementation,
> the PF driver might also be responsible for moving data between VFs
> or have direct access to the state of the VF, including data or state
> otherwise private to the VF or VF driver.
> 
> To help resolve these concerns, we introduce a VF token into the VFIO
> PCI ABI, which acts as a shared secret key between drivers.  The
> userspace PF driver is required to set the VF token to a known value
> and userspace VF drivers are required to provide the token to access
> the VF device.  If a PF driver is restarted with VF drivers in use, it
> must also provide the current token in order to prevent a rogue
> untrusted PF driver from replacing a known driver.  The degree to
> which this new token is considered secret is left to the userspace
> drivers, the kernel intentionally provides no means to retrieve the
> current token.
> 
> Note that the above token is only required for this new model where
> both the PF and VF devices are usable through vfio-pci.  Existing
> models of VFIO drivers where the PF is used without SR-IOV enabled
> or the VF is bound to a userspace driver with an in-kernel, host PF
> driver are unaffected.
> 
> The latter configuration above also highlights a new inverted scenario
> that is now possible, a userspace PF driver with in-kernel VF drivers.
> I believe this is a scenario that should be allowed, but should not be
> enabled by default.  This series includes code to set a default
> driver_override for VFs sourced from a vfio-pci user owned PF, such
> that the VFs are also bound to vfio-pci.  This model is compatible
> with tools like driverctl and allows the system administrator to
> decide if other bindings should be enabled.  The VF token interface
> above exists only between vfio-pci PF and VF drivers, once a VF is
> bound to another driver, the administrator has effectively pronounced
> the device as trusted.  The vfio-pci driver will note alternate
> binding in dmesg for logging and debugging purposes.
> 
> Please review, comment, and test.  The example QEMU implementation
> provided with the RFC[2] is still current for this version.  Thanks,


It is a cool feature. One question - what device have you tested it with?

Does not a PF want to control/manage VFs on a PF driver side? I am
thinking of Mellanox CX5 or similar NIC and it acts as an managed
ethernet switch which might want to do something to VFs and VFs may not
work as expected without PF's native driver doing things to it, or this
is not a concern, is it? Thanks,


> 
> Alex
> 
> [1] https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stgit@gimli.home/
> [2] https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/
> ---
> 
> Alex Williamson (7):
>       vfio: Include optional device match in vfio_device_ops callbacks
>       vfio/pci: Implement match ops
>       vfio/pci: Introduce VF token
>       vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
>       vfio/pci: Add sriov_configure support
>       vfio/pci: Remove dev_fmt definition
>       vfio/pci: Cleanup .probe() exit paths
> 
> 
>  drivers/vfio/pci/vfio_pci.c         |  312 ++++++++++++++++++++++++++++++++---
>  drivers/vfio/pci/vfio_pci_private.h |   10 +
>  drivers/vfio/vfio.c                 |   20 ++
>  include/linux/vfio.h                |    4 
>  include/uapi/linux/vfio.h           |   37 ++++
>  5 files changed, 355 insertions(+), 28 deletions(-)
>

Alex Williamson Feb. 14, 2020, 3:27 p.m. UTC | #2

On Fri, 14 Feb 2020 15:57:04 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 12/02/2020 10:05, Alex Williamson wrote:
> > Given the mostly positive feedback from the RFC[1], here's a new
> > non-RFC revision.  Changes since RFC:
> > 
> >  - vfio_device_ops.match semantics refined
> >  - Use helpers for struct pci_dev.physfn to avoid breakage without
> >    CONFIG_PCI_IOV
> >  - Relax to allow SR-IOV configuration changes while PF is opened.
> >    There are potentially interesting use cases here, including
> >    perhaps QEMU emulating an SR-IOV capability and calling out
> >    to a privileged entity to manipulate sriov_numvfs and corral
> >    the resulting devices.
> >  - Retest vfio_device_feature.argsz to include uuid length.
> >  - Add Connie's R-b on 6/7
> > 
> > I still wish we had a solution to make it less opaque to the user
> > why a VFIO_GROUP_GET_DEVICE_FD() has failed if a VF token is
> > required, but this is still the best I've been able to come up with.
> > If there are objections or better ideas, please raise them now.
> > 
> > The synopsis of this series is that we have an ongoing desire to drive
> > PCIe SR-IOV PFs from userspace with VFIO.  There's an immediate need
> > for this with DPDK drivers and potentially interesting future use
> > cases in virtualization.  We've been reluctant to add this support
> > previously due to the dependency and trust relationship between the
> > VF device and PF driver.  Minimally the PF driver can induce a denial
> > of service to the VF, but depending on the specific implementation,
> > the PF driver might also be responsible for moving data between VFs
> > or have direct access to the state of the VF, including data or state
> > otherwise private to the VF or VF driver.
> > 
> > To help resolve these concerns, we introduce a VF token into the VFIO
> > PCI ABI, which acts as a shared secret key between drivers.  The
> > userspace PF driver is required to set the VF token to a known value
> > and userspace VF drivers are required to provide the token to access
> > the VF device.  If a PF driver is restarted with VF drivers in use, it
> > must also provide the current token in order to prevent a rogue
> > untrusted PF driver from replacing a known driver.  The degree to
> > which this new token is considered secret is left to the userspace
> > drivers, the kernel intentionally provides no means to retrieve the
> > current token.
> > 
> > Note that the above token is only required for this new model where
> > both the PF and VF devices are usable through vfio-pci.  Existing
> > models of VFIO drivers where the PF is used without SR-IOV enabled
> > or the VF is bound to a userspace driver with an in-kernel, host PF
> > driver are unaffected.
> > 
> > The latter configuration above also highlights a new inverted scenario
> > that is now possible, a userspace PF driver with in-kernel VF drivers.
> > I believe this is a scenario that should be allowed, but should not be
> > enabled by default.  This series includes code to set a default
> > driver_override for VFs sourced from a vfio-pci user owned PF, such
> > that the VFs are also bound to vfio-pci.  This model is compatible
> > with tools like driverctl and allows the system administrator to
> > decide if other bindings should be enabled.  The VF token interface
> > above exists only between vfio-pci PF and VF drivers, once a VF is
> > bound to another driver, the administrator has effectively pronounced
> > the device as trusted.  The vfio-pci driver will note alternate
> > binding in dmesg for logging and debugging purposes.
> > 
> > Please review, comment, and test.  The example QEMU implementation
> > provided with the RFC[2] is still current for this version.  Thanks,  
> 
> 
> It is a cool feature. One question - what device have you tested it with?
> 
> Does not a PF want to control/manage VFs on a PF driver side? I am
> thinking of Mellanox CX5 or similar NIC and it acts as an managed
> ethernet switch which might want to do something to VFs and VFs may not
> work as expected without PF's native driver doing things to it, or this
> is not a concern, is it? Thanks,

TBH, I'm starting with the premise that a userspace PF driver already
works.  The DPDK folks have produced some "interesting" code that
allows SR-IOV to be enabled on a PF underneath vfio-pci.  There's also
a non-upstream igb-uio driver associated with DPDK that seems to be
recommended for SR-IOV PF driver use cases, particularly for an FPGA
device.  The testing I've done, and what's provided by the QEMU patch I
reference, is really only unit testing the vf_token support and
DEVICE_FEATURE ioctl provided here.  I used this with an Intel 82576
(igb) where the PF driver doesn't particularly like being assigned to a
VM with SR-IOV enabled.  Likewise, I can prove that the interfaces here
provide the correct restrictions for the VF, but the VF doesn't work in
a VM due to the state of the PF.  I'm hoping we'll have some
confirmation from the DPDK folks that this provides what they need to
abandon the non-upstream drivers and more nefarious hacks.  There's a
lot more virtualization work to be done in QEMU before I'd propose
patch I reference above upstream.

To your specific question regarding CX5, I think there are very few
SR-IOV devices where the PF doesn't act as some kind of packet router
or ring management engine.  The Amazon device listed in the pci-pf-stub
driver seems to be one of the few SR-IOV devices which claim the PF has
no special interfaces other than exposing the SR-IOV capability itself.
So I think we generally expect a device specific SR-IOV aware driver
running on the PF via this interface.  That's certainly the case for
the DPDK code for the FPGA device above.  Thanks,

Alex