mbox series

[RFC,0/4] Support VFIO sparse mmap in PCI bus

Message ID 20230418053012.10667-1-chenbo.xia@intel.com (mailing list archive)
Headers
Series Support VFIO sparse mmap in PCI bus |

Message

Chenbo Xia April 18, 2023, 5:30 a.m. UTC
  This series introduces a VFIO standard capability, called sparse
mmap to PCI bus. In linux kernel, it's defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
mmap whole BAR region into DPDK process, only mmap part of the
BAR region after getting sparse mmap information from kernel.
For the rest of BAR region that is not mmap-ed, DPDK process
can use pread/pwrite system calls to access. Sparse mmap is
useful when kernel does not want userspace to mmap whole BAR
region, or kernel wants to control over access to specific BAR
region. Vendors can choose to enable this feature or not for
their devices in their specific kernel modules. 

In this patchset:

Patch 1-3 is mainly for introducing BAR access APIs so that
driver could use them to access specific BAR using pread/pwrite
system calls when part of the BAR is not mmap-able.

Patch 4 adds the VFIO sparse mmap support finally. A question
is for all sparse mmap regions, should they be mapped to a
continuous virtual address region that follows device-specific
BAR layout or not. In theory, there could be three options to
support this feature.

Option 1: Map sparse mmap regions independently
======================================================
In this approach, we mmap each sparse mmap region one by one
and each region could be located anywhere in process address
space. But accessing the mmaped BAR will not be as easy as
'bar_base_address + bar_offset', driver needs to check the
sparse mmap information to access specific BAR register. 

Patch 4 in this patchset adopts this option. Driver API change
is introduced in bus_pci_driver.h. Corresponding changes in
all drivers are also done and currently I am assuming drivers
do not support this feature so they will not check the
'is_sparse' flag but assumes it to be false. Note that it will
not break any driver and each vendor can add related logic when
they start to support this feature. This is only because I don't
want to introduce complexity to drivers that do not want to
support this feature.

Option 2: Map sparse mmap regions based on device-specific BAR layout 
======================================================================
In this approach, the sparse mmap regions are mapped to continuous
virtual address region that follows device-specific BAR layout.
For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
mmaped. Region #1 will be mapped at 'base_addr' and region #2
will be mapped at 'base_addr + 0x3000'. The good thing is if
we implement like this, driver can still access all BAR registers
using 'bar_base_address + bar_offset' way and we don't need
to introduce any driver API change. But the address space
range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
be reserved so it could result in waste of address space or memory
(when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
range). Meanwhile, driver needs to know which part of BAR is
mmaped (this is possible since the range is defined by vendor's
specific kernel module).

Option 3: Support both option 1 & 2 
===================================
We could define a driver flag to let driver choose which way it
perfers since either option has its own Pros & Cons.

Please share your comments, Thanks!


Chenbo Xia (4):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write
  bus/pci: add VFIO sparse mmap support

 drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
 drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
 .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
 drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
 drivers/bus/pci/bsd/pci.c                     |  43 +-
 drivers/bus/pci/bus_pci_driver.h              |  24 +-
 drivers/bus/pci/linux/pci.c                   |  91 +++-
 drivers/bus/pci/linux/pci_init.h              |  14 +-
 drivers/bus/pci/linux/pci_uio.c               |  34 +-
 drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
 drivers/bus/pci/pci_common.c                  |  57 ++-
 drivers/bus/pci/pci_common_uio.c              |  12 +-
 drivers/bus/pci/private.h                     |  25 +-
 drivers/bus/pci/rte_bus_pci.h                 |  48 ++
 drivers/bus/pci/version.map                   |   3 +
 drivers/common/cnxk/roc_dev.c                 |   4 +-
 drivers/common/cnxk/roc_dpi.c                 |   2 +-
 drivers/common/cnxk/roc_ml.c                  |  22 +-
 drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
 drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
 drivers/common/sfc_efx/sfc_efx.c              |   2 +-
 drivers/compress/octeontx/otx_zip.c           |   4 +-
 drivers/crypto/ccp/ccp_dev.c                  |   4 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
 drivers/crypto/nitrox/nitrox_device.c         |   4 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
 drivers/crypto/virtio/virtio_pci.c            |   6 +-
 drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
 drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
 drivers/dma/idxd/idxd_pci.c                   |   4 +-
 drivers/dma/ioat/ioat_dmadev.c                |   2 +-
 drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
 drivers/event/octeontx/ssovf_probe.c          |  38 +-
 drivers/event/octeontx/timvf_probe.c          |  18 +-
 drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
 drivers/net/ark/ark_ethdev.c                  |   4 +-
 drivers/net/atlantic/atl_ethdev.c             |   2 +-
 drivers/net/avp/avp_ethdev.c                  |  20 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
 drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
 drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
 drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
 drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
 drivers/net/cxgbe/cxgbe_main.c                |   2 +-
 drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
 drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
 drivers/net/e1000/em_ethdev.c                 |   4 +-
 drivers/net/e1000/igb_ethdev.c                |   4 +-
 drivers/net/ena/ena_ethdev.c                  |   4 +-
 drivers/net/enetc/enetc_ethdev.c              |   2 +-
 drivers/net/enic/enic_main.c                  |   4 +-
 drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
 drivers/net/gve/gve_ethdev.c                  |   4 +-
 drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
 drivers/net/hns3/hns3_ethdev.c                |   2 +-
 drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
 drivers/net/hns3/hns3_rxtx.c                  |   4 +-
 drivers/net/i40e/i40e_ethdev.c                |   2 +-
 drivers/net/iavf/iavf_ethdev.c                |   2 +-
 drivers/net/ice/ice_dcf.c                     |   2 +-
 drivers/net/ice/ice_ethdev.c                  |   2 +-
 drivers/net/idpf/idpf_ethdev.c                |   4 +-
 drivers/net/igc/igc_ethdev.c                  |   2 +-
 drivers/net/ionic/ionic_dev_pci.c             |   2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
 drivers/net/liquidio/lio_ethdev.c             |   4 +-
 drivers/net/nfp/nfp_ethdev.c                  |   2 +-
 drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
 drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
 drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
 drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
 drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
 drivers/net/qede/qede_main.c                  |   6 +-
 drivers/net/sfc/sfc.c                         |   2 +-
 drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
 drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
 drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
 drivers/net/virtio/virtio_pci.c               |   6 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
 drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
 drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
 drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
 drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
 drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
 lib/eal/include/rte_vfio.h                    |   1 -
 90 files changed, 853 insertions(+), 352 deletions(-)
  

Comments

David Marchand April 18, 2023, 7:46 a.m. UTC | #1
Hello Chenbo,

On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
>
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.

Sorry, I did not take the time to look into the details.
Could you summarize what would be the benefit of this series?


>
> In this patchset:
>
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able.
>
> Patch 4 adds the VFIO sparse mmap support finally. A question
> is for all sparse mmap regions, should they be mapped to a
> continuous virtual address region that follows device-specific
> BAR layout or not. In theory, there could be three options to
> support this feature.
>
> Option 1: Map sparse mmap regions independently
> ======================================================
> In this approach, we mmap each sparse mmap region one by one
> and each region could be located anywhere in process address
> space. But accessing the mmaped BAR will not be as easy as
> 'bar_base_address + bar_offset', driver needs to check the
> sparse mmap information to access specific BAR register.
>
> Patch 4 in this patchset adopts this option. Driver API change
> is introduced in bus_pci_driver.h. Corresponding changes in
> all drivers are also done and currently I am assuming drivers
> do not support this feature so they will not check the
> 'is_sparse' flag but assumes it to be false. Note that it will
> not break any driver and each vendor can add related logic when
> they start to support this feature. This is only because I don't
> want to introduce complexity to drivers that do not want to
> support this feature.
>
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped to continuous
> virtual address region that follows device-specific BAR layout.
> For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> mmaped. Region #1 will be mapped at 'base_addr' and region #2
> will be mapped at 'base_addr + 0x3000'. The good thing is if
> we implement like this, driver can still access all BAR registers
> using 'bar_base_address + bar_offset' way and we don't need
> to introduce any driver API change. But the address space
> range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> be reserved so it could result in waste of address space or memory
> (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> range). Meanwhile, driver needs to know which part of BAR is
> mmaped (this is possible since the range is defined by vendor's
> specific kernel module).
>
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let driver choose which way it
> perfers since either option has its own Pros & Cons.
>
> Please share your comments, Thanks!
>
>
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device

I think this first patch main motivation was to avoid ABI issues.
Since v22.11, the rte_pci_device object is opaque to applications.

So, do we still need this patch?


>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
>
>  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
>  drivers/bus/pci/bsd/pci.c                     |  43 +-
>  drivers/bus/pci/bus_pci_driver.h              |  24 +-
>  drivers/bus/pci/linux/pci.c                   |  91 +++-
>  drivers/bus/pci/linux/pci_init.h              |  14 +-
>  drivers/bus/pci/linux/pci_uio.c               |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                  |  57 ++-
>  drivers/bus/pci/pci_common_uio.c              |  12 +-
>  drivers/bus/pci/private.h                     |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
>  drivers/bus/pci/version.map                   |   3 +
>  drivers/common/cnxk/roc_dev.c                 |   4 +-
>  drivers/common/cnxk/roc_dpi.c                 |   2 +-
>  drivers/common/cnxk/roc_ml.c                  |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
>  drivers/compress/octeontx/otx_zip.c           |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
>  drivers/crypto/virtio/virtio_pci.c            |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
>  drivers/dma/idxd/idxd_pci.c                   |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
>  drivers/event/octeontx/ssovf_probe.c          |  38 +-
>  drivers/event/octeontx/timvf_probe.c          |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
>  drivers/net/ark/ark_ethdev.c                  |   4 +-
>  drivers/net/atlantic/atl_ethdev.c             |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
>  drivers/net/e1000/em_ethdev.c                 |   4 +-
>  drivers/net/e1000/igb_ethdev.c                |   4 +-
>  drivers/net/ena/ena_ethdev.c                  |   4 +-
>  drivers/net/enetc/enetc_ethdev.c              |   2 +-
>  drivers/net/enic/enic_main.c                  |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
>  drivers/net/gve/gve_ethdev.c                  |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
>  drivers/net/hns3/hns3_ethdev.c                |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                |   2 +-
>  drivers/net/iavf/iavf_ethdev.c                |   2 +-
>  drivers/net/ice/ice_dcf.c                     |   2 +-
>  drivers/net/ice/ice_ethdev.c                  |   2 +-
>  drivers/net/idpf/idpf_ethdev.c                |   4 +-
>  drivers/net/igc/igc_ethdev.c                  |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
>  drivers/net/liquidio/lio_ethdev.c             |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
>  drivers/net/qede/qede_main.c                  |   6 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
>  drivers/net/virtio/virtio_pci.c               |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
>  lib/eal/include/rte_vfio.h                    |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)
  
Chenbo Xia April 18, 2023, 9:27 a.m. UTC | #2
Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> Hello Chenbo,
> 
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> 
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?

It could be different benefit for different vendor. There was one discussion:
http://inbox.dpdk.org/dev/CO6PR18MB386016A2634AF375F5B4BA8CB4899@CO6PR18MB3860.namprd18.prod.outlook.com/

Above problem is some device has very large BAR, and we don't want DPDK to map
the whole BAR.

For Intel devices, one benefit is that we want our kernel module to control over
access to specific BAR region so we will let DPDK process unable to mmap that region.
(Because after mmap, kernel will not know if userspace is accessing device BAR).

So that's why I summarize as 'Sparse mmap is useful when kernel does not want
userspace to mmap whole BAR region, or kernel wants to control over access to
specific BAR region'. It could be more usage for other vendors that I have not realized

Thanks,
Chenbo

> 
> 
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question
> > is for all sparse mmap regions, should they be mapped to a
> > continuous virtual address region that follows device-specific
> > BAR layout or not. In theory, there could be three options to
> > support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one
> > and each region could be located anywhere in process address
> > space. But accessing the mmaped BAR will not be as easy as
> > 'bar_base_address + bar_offset', driver needs to check the
> > sparse mmap information to access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes in
> > all drivers are also done and currently I am assuming drivers
> > do not support this feature so they will not check the
> > 'is_sparse' flag but assumes it to be false. Note that it will
> > not break any driver and each vendor can add related logic when
> > they start to support this feature. This is only because I don't
> > want to introduce complexity to drivers that do not want to
> > support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> > mmaped. Region #1 will be mapped at 'base_addr' and region #2
> > will be mapped at 'base_addr + 0x3000'. The good thing is if
> > we implement like this, driver can still access all BAR registers
> > using 'bar_base_address + bar_offset' way and we don't need
> > to introduce any driver API change. But the address space
> > range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> > be reserved so it could result in waste of address space or memory
> > (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> > range). Meanwhile, driver needs to know which part of BAR is
> > mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > perfers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> 
> I think this first patch main motivation was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
> 
> So, do we still need this patch?
> 
> 
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> 
> --
> David Marchand
  
Chenbo Xia April 18, 2023, 9:33 a.m. UTC | #3
David,

Sorry that I missed one comment...

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> Hello Chenbo,
> 
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> 
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?
> 
> 
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question
> > is for all sparse mmap regions, should they be mapped to a
> > continuous virtual address region that follows device-specific
> > BAR layout or not. In theory, there could be three options to
> > support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one
> > and each region could be located anywhere in process address
> > space. But accessing the mmaped BAR will not be as easy as
> > 'bar_base_address + bar_offset', driver needs to check the
> > sparse mmap information to access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes in
> > all drivers are also done and currently I am assuming drivers
> > do not support this feature so they will not check the
> > 'is_sparse' flag but assumes it to be false. Note that it will
> > not break any driver and each vendor can add related logic when
> > they start to support this feature. This is only because I don't
> > want to introduce complexity to drivers that do not want to
> > support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> > mmaped. Region #1 will be mapped at 'base_addr' and region #2
> > will be mapped at 'base_addr + 0x3000'. The good thing is if
> > we implement like this, driver can still access all BAR registers
> > using 'bar_base_address + bar_offset' way and we don't need
> > to introduce any driver API change. But the address space
> > range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> > be reserved so it could result in waste of address space or memory
> > (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> > range). Meanwhile, driver needs to know which part of BAR is
> > mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > perfers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> 
> I think this first patch main motivation was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
> 
> So, do we still need this patch?

I think it could be good to reduce unnecessary driver APIs..
Hiding these region information could be friendly to driver developer?

Thanks,
Chenbo

> 
> 
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> 
> --
> David Marchand
  
Chenbo Xia May 8, 2023, 2:13 a.m. UTC | #4
Gentle ping for some comments..

After rethink, personally I may choose option 2 as it will have no driver API change for
PMDs and as I look at current code, that's how we do when MSI-X table can't be mmap-ed.

Thanks,
Chenbo

> -----Original Message-----
> From: Chenbo Xia <chenbo.xia@intel.com>
> Sent: Tuesday, April 18, 2023 1:30 PM
> To: dev@dpdk.org
> Cc: skori@marvell.com
> Subject: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.
> 
> In this patchset:
> 
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able.
> 
> Patch 4 adds the VFIO sparse mmap support finally. A question
> is for all sparse mmap regions, should they be mapped to a
> continuous virtual address region that follows device-specific
> BAR layout or not. In theory, there could be three options to
> support this feature.
> 
> Option 1: Map sparse mmap regions independently
> ======================================================
> In this approach, we mmap each sparse mmap region one by one
> and each region could be located anywhere in process address
> space. But accessing the mmaped BAR will not be as easy as
> 'bar_base_address + bar_offset', driver needs to check the
> sparse mmap information to access specific BAR register.
> 
> Patch 4 in this patchset adopts this option. Driver API change
> is introduced in bus_pci_driver.h. Corresponding changes in
> all drivers are also done and currently I am assuming drivers
> do not support this feature so they will not check the
> 'is_sparse' flag but assumes it to be false. Note that it will
> not break any driver and each vendor can add related logic when
> they start to support this feature. This is only because I don't
> want to introduce complexity to drivers that do not want to
> support this feature.
> 
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped to continuous
> virtual address region that follows device-specific BAR layout.
> For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> mmaped. Region #1 will be mapped at 'base_addr' and region #2
> will be mapped at 'base_addr + 0x3000'. The good thing is if
> we implement like this, driver can still access all BAR registers
> using 'bar_base_address + bar_offset' way and we don't need
> to introduce any driver API change. But the address space
> range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> be reserved so it could result in waste of address space or memory
> (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> range). Meanwhile, driver needs to know which part of BAR is
> mmaped (this is possible since the range is defined by vendor's
> specific kernel module).
> 
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let driver choose which way it
> perfers since either option has its own Pros & Cons.
> 
> Please share your comments, Thanks!
> 
> 
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device
>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
> 
>  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
>  drivers/bus/pci/bsd/pci.c                     |  43 +-
>  drivers/bus/pci/bus_pci_driver.h              |  24 +-
>  drivers/bus/pci/linux/pci.c                   |  91 +++-
>  drivers/bus/pci/linux/pci_init.h              |  14 +-
>  drivers/bus/pci/linux/pci_uio.c               |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                  |  57 ++-
>  drivers/bus/pci/pci_common_uio.c              |  12 +-
>  drivers/bus/pci/private.h                     |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
>  drivers/bus/pci/version.map                   |   3 +
>  drivers/common/cnxk/roc_dev.c                 |   4 +-
>  drivers/common/cnxk/roc_dpi.c                 |   2 +-
>  drivers/common/cnxk/roc_ml.c                  |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
>  drivers/compress/octeontx/otx_zip.c           |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
>  drivers/crypto/virtio/virtio_pci.c            |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
>  drivers/dma/idxd/idxd_pci.c                   |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
>  drivers/event/octeontx/ssovf_probe.c          |  38 +-
>  drivers/event/octeontx/timvf_probe.c          |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
>  drivers/net/ark/ark_ethdev.c                  |   4 +-
>  drivers/net/atlantic/atl_ethdev.c             |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
>  drivers/net/e1000/em_ethdev.c                 |   4 +-
>  drivers/net/e1000/igb_ethdev.c                |   4 +-
>  drivers/net/ena/ena_ethdev.c                  |   4 +-
>  drivers/net/enetc/enetc_ethdev.c              |   2 +-
>  drivers/net/enic/enic_main.c                  |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
>  drivers/net/gve/gve_ethdev.c                  |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
>  drivers/net/hns3/hns3_ethdev.c                |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                |   2 +-
>  drivers/net/iavf/iavf_ethdev.c                |   2 +-
>  drivers/net/ice/ice_dcf.c                     |   2 +-
>  drivers/net/ice/ice_ethdev.c                  |   2 +-
>  drivers/net/idpf/idpf_ethdev.c                |   4 +-
>  drivers/net/igc/igc_ethdev.c                  |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
>  drivers/net/liquidio/lio_ethdev.c             |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
>  drivers/net/qede/qede_main.c                  |   6 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
>  drivers/net/virtio/virtio_pci.c               |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
>  lib/eal/include/rte_vfio.h                    |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> --
> 2.17.1
  
Sunil Kumar Kori May 8, 2023, 3:04 a.m. UTC | #5
+1 for option 2.

Thanks & Regards
Sunil Kumar Kori

> -----Original Message-----
> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Monday, May 8, 2023 7:43 AM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori <skori@marvell.com>; techboard@dpdk.org;
> thomas@monjalon.net; Richardson, Bruce <bruce.richardson@intel.com>;
> ferruh.yigit@amd.com; david.marchand@redhat.com; Cao, Yahui
> <yahui.cao@intel.com>; Li, Miao <miao.li@intel.com>
> Subject: [EXT] RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> External Email
> 
> ----------------------------------------------------------------------
> Gentle ping for some comments..
> 
> After rethink, personally I may choose option 2 as it will have no driver API
> change for PMDs and as I look at current code, that's how we do when MSI-X
> table can't be mmap-ed.
> 
> Thanks,
> Chenbo
> 
> > -----Original Message-----
> > From: Chenbo Xia <chenbo.xia@intel.com>
> > Sent: Tuesday, April 18, 2023 1:30 PM
> > To: dev@dpdk.org
> > Cc: skori@marvell.com
> > Subject: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> >
> > This series introduces a VFIO standard capability, called sparse mmap
> > to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap
> > whole BAR region into DPDK process, only mmap part of the BAR region
> > after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process can use
> > pread/pwrite system calls to access. Sparse mmap is useful when kernel
> > does not want userspace to mmap whole BAR region, or kernel wants to
> > control over access to specific BAR region. Vendors can choose to
> > enable this feature or not for their devices in their specific kernel
> > modules.
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that driver
> > could use them to access specific BAR using pread/pwrite system calls
> > when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question is for
> > all sparse mmap regions, should they be mapped to a continuous virtual
> > address region that follows device-specific BAR layout or not. In
> > theory, there could be three options to support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one and each
> > region could be located anywhere in process address space. But
> > accessing the mmaped BAR will not be as easy as 'bar_base_address +
> > bar_offset', driver needs to check the sparse mmap information to
> > access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change is
> > introduced in bus_pci_driver.h. Corresponding changes in all drivers
> > are also done and currently I am assuming drivers do not support this
> > feature so they will not check the 'is_sparse' flag but assumes it to
> > be false. Note that it will not break any driver and each vendor can
> > add related logic when they start to support this feature. This is
> > only because I don't want to introduce complexity to drivers that do
> > not want to support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> >
> ================================================================
> ======
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be mmaped.
> > Region #1 will be mapped at 'base_addr' and region #2 will be mapped
> > at 'base_addr + 0x3000'. The good thing is if we implement like this,
> > driver can still access all BAR registers using 'bar_base_address +
> > bar_offset' way and we don't need to introduce any driver API change.
> > But the address space range 'base_addr + 0x1000' to 'base_addr +
> > 0x3000' may need to be reserved so it could result in waste of address
> > space or memory (when we use MAP_ANONYMOUS and MAP_PRIVATE flag
> to
> > reserve this range). Meanwhile, driver needs to know which part of BAR
> > is mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > perfers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> >
> > --
> > 2.17.1