[v2,3/5] bus/pci: use IOVAs check when setting IOVA mode
Checks
Commit Message
Although VT-d emulation currently only supports 39 bits, it could
be iovas being within that supported range. This patch allows
IOVA mode in such a case.
Indeed, memory initialization code can be modified for using lower
virtual addresses than those used by the kernel for 64 bits processes
by default, and therefore memsegs iovas can use 39 bits or less for
most system. And this is likely 100% true for VMs.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
Comments
On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> Although VT-d emulation currently only supports 39 bits, it could
> be iovas being within that supported range. This patch allows
> IOVA mode in such a case.
>
> Indeed, memory initialization code can be modified for using lower
> virtual addresses than those used by the kernel for 64 bits processes
> by default, and therefore memsegs iovas can use 39 bits or less for
> most system. And this is likely 100% true for VMs.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> ---
> drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 04648ac..215dc10 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -588,10 +588,11 @@
> fclose(fp);
>
> mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
> - if (mgaw < X86_VA_WIDTH)
> - return false;
>
> - return true;
> + if (!rte_eal_check_dma_mask(mgaw))
> + return true;
> + else
> + return false;
return rte_eal_check_dma_mask(mgaw) == 0; ?
> }
> #elif defined(RTE_ARCH_PPC_64)
> static bool
> @@ -615,13 +616,17 @@
> {
> struct rte_pci_device *dev = NULL;
> struct rte_pci_driver *drv = NULL;
> + int iommu_dma_mask_check_done = 0;
>
> FOREACH_DRIVER_ON_PCIBUS(drv) {
> FOREACH_DEVICE_ON_PCIBUS(dev) {
> if (!rte_pci_match(drv, dev))
> continue;
> - if (!pci_one_device_iommu_support_va(dev))
> - return false;
> + if (!iommu_dma_mask_check_done) {
> + if (!pci_one_device_iommu_support_va(dev))
> + return false;
> + iommu_dma_mask_check_done = 1;
> + }
> }
The commit message doesn't explain why are we only checking a single
device. Indeed, i am not 100% clear as to why, so some explanation in
the commit message and preferably a comment in code would be more than
welcome :)
> }
> return true;
>
On Wed, Oct 3, 2018 at 1:56 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> > Although VT-d emulation currently only supports 39 bits, it could
> > be iovas being within that supported range. This patch allows
> > IOVA mode in such a case.
> >
> > Indeed, memory initialization code can be modified for using lower
> > virtual addresses than those used by the kernel for 64 bits processes
> > by default, and therefore memsegs iovas can use 39 bits or less for
> > most system. And this is likely 100% true for VMs.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > ---
> > drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> > index 04648ac..215dc10 100644
> > --- a/drivers/bus/pci/linux/pci.c
> > +++ b/drivers/bus/pci/linux/pci.c
> > @@ -588,10 +588,11 @@
> > fclose(fp);
> >
> > mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) +
> 1;
> > - if (mgaw < X86_VA_WIDTH)
> > - return false;
> >
> > - return true;
> > + if (!rte_eal_check_dma_mask(mgaw))
> > + return true;
> > + else
> > + return false;
>
> return rte_eal_check_dma_mask(mgaw) == 0; ?
>
I guess that works and is more elegant.
Thanks.
>
> > }
> > #elif defined(RTE_ARCH_PPC_64)
> > static bool
> > @@ -615,13 +616,17 @@
> > {
> > struct rte_pci_device *dev = NULL;
> > struct rte_pci_driver *drv = NULL;
> > + int iommu_dma_mask_check_done = 0;
> >
> > FOREACH_DRIVER_ON_PCIBUS(drv) {
> > FOREACH_DEVICE_ON_PCIBUS(dev) {
> > if (!rte_pci_match(drv, dev))
> > continue;
> > - if (!pci_one_device_iommu_support_va(dev))
> > - return false;
> > + if (!iommu_dma_mask_check_done) {
> > + if (!pci_one_device_iommu_support_va(dev))
> > + return false;
> > + iommu_dma_mask_check_done = 1;
> > + }
> > }
>
> The commit message doesn't explain why are we only checking a single
> device. Indeed, i am not 100% clear as to why, so some explanation in
> the commit message and preferably a comment in code would be more than
> welcome :)
>
>
Because the pci_one_device_iommu_support_va function does always the same
whatever the device is used in the call.
The code uses the device for looking at /sys/bus/pci/devices/ but then it
uses a link to iommu which will be the same for all
the devices. Note that some can refer to dmar0 and others to dmar1, but the
IOMMU capabilities are the same.
The limitation here is not a PCI device but the IOMMU hardware itself. The
first call to pci_one_device_iommu_support_va will check
if all the hugepages addresses are within the supported DMA range by the
IOMMU hw. If it fails, that is.
Now that I'm explaining this, I notice it is the same for any case. If the
check is good, no more checks are needed. This assumes there is just one
IOMMU hardware or if more than one (I have NUMA systems with one IOMMU unit
per socket) they are all the same hardware version. Adding Maxime in the
thread for confirming this and asking him about my previous statement.
> > }
> > return true;
> >
>
>
> --
> Thanks,
> Anatoly
>
On 04-Oct-18 2:35 PM, Alejandro Lucero wrote:
>
>
> On Wed, Oct 3, 2018 at 1:56 PM Burakov, Anatoly
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>
> On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> > Although VT-d emulation currently only supports 39 bits, it could
> > be iovas being within that supported range. This patch allows
> > IOVA mode in such a case.
> >
> > Indeed, memory initialization code can be modified for using lower
> > virtual addresses than those used by the kernel for 64 bits processes
> > by default, and therefore memsegs iovas can use 39 bits or less for
> > most system. And this is likely 100% true for VMs.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com
> <mailto:alejandro.lucero@netronome.com>>
> > ---
> > drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/bus/pci/linux/pci.c
> b/drivers/bus/pci/linux/pci.c
> > index 04648ac..215dc10 100644
> > --- a/drivers/bus/pci/linux/pci.c
> > +++ b/drivers/bus/pci/linux/pci.c
> > @@ -588,10 +588,11 @@
> > fclose(fp);
> >
> > mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >>
> VTD_CAP_MGAW_SHIFT) + 1;
> > - if (mgaw < X86_VA_WIDTH)
> > - return false;
> >
> > - return true;
> > + if (!rte_eal_check_dma_mask(mgaw))
> > + return true;
> > + else
> > + return false;
>
> return rte_eal_check_dma_mask(mgaw) == 0; ?
>
>
> I guess that works and is more elegant.
> Thanks.
>
>
> > }
> > #elif defined(RTE_ARCH_PPC_64)
> > static bool
> > @@ -615,13 +616,17 @@
> > {
> > struct rte_pci_device *dev = NULL;
> > struct rte_pci_driver *drv = NULL;
> > + int iommu_dma_mask_check_done = 0;
> >
> > FOREACH_DRIVER_ON_PCIBUS(drv) {
> > FOREACH_DEVICE_ON_PCIBUS(dev) {
> > if (!rte_pci_match(drv, dev))
> > continue;
> > - if (!pci_one_device_iommu_support_va(dev))
> > - return false;
> > + if (!iommu_dma_mask_check_done) {
> > + if
> (!pci_one_device_iommu_support_va(dev))
> > + return false;
> > + iommu_dma_mask_check_done = 1;
> > + }
> > }
>
> The commit message doesn't explain why are we only checking a single
> device. Indeed, i am not 100% clear as to why, so some explanation in
> the commit message and preferably a comment in code would be more than
> welcome :)
>
>
> Because the pci_one_device_iommu_support_va function does always the
> same whatever the device is used in the call.
So, this code was always wrong and needlessly checked each device when
it could've checked it a single time? OK, that makes it a bit clearer.
Still, needs to be documented in comments/commit message :) The commit
message IMO looks quite irrelevant to what happens in the commit. It
almost feels like this commit should be split in two - first change the
mgaw check, and then fix the PCI bus code to not check needlessly.
On Thu, Oct 4, 2018 at 4:49 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 04-Oct-18 2:35 PM, Alejandro Lucero wrote:
> >
> >
> > On Wed, Oct 3, 2018 at 1:56 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >
> > On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> > > Although VT-d emulation currently only supports 39 bits, it could
> > > be iovas being within that supported range. This patch allows
> > > IOVA mode in such a case.
> > >
> > > Indeed, memory initialization code can be modified for using lower
> > > virtual addresses than those used by the kernel for 64 bits
> processes
> > > by default, and therefore memsegs iovas can use 39 bits or less
> for
> > > most system. And this is likely 100% true for VMs.
> > >
> > > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com
> > <mailto:alejandro.lucero@netronome.com>>
> > > ---
> > > drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
> > > 1 file changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/bus/pci/linux/pci.c
> > b/drivers/bus/pci/linux/pci.c
> > > index 04648ac..215dc10 100644
> > > --- a/drivers/bus/pci/linux/pci.c
> > > +++ b/drivers/bus/pci/linux/pci.c
> > > @@ -588,10 +588,11 @@
> > > fclose(fp);
> > >
> > > mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >>
> > VTD_CAP_MGAW_SHIFT) + 1;
> > > - if (mgaw < X86_VA_WIDTH)
> > > - return false;
> > >
> > > - return true;
> > > + if (!rte_eal_check_dma_mask(mgaw))
> > > + return true;
> > > + else
> > > + return false;
> >
> > return rte_eal_check_dma_mask(mgaw) == 0; ?
> >
> >
> > I guess that works and is more elegant.
> > Thanks.
> >
> >
> > > }
> > > #elif defined(RTE_ARCH_PPC_64)
> > > static bool
> > > @@ -615,13 +616,17 @@
> > > {
> > > struct rte_pci_device *dev = NULL;
> > > struct rte_pci_driver *drv = NULL;
> > > + int iommu_dma_mask_check_done = 0;
> > >
> > > FOREACH_DRIVER_ON_PCIBUS(drv) {
> > > FOREACH_DEVICE_ON_PCIBUS(dev) {
> > > if (!rte_pci_match(drv, dev))
> > > continue;
> > > - if (!pci_one_device_iommu_support_va(dev))
> > > - return false;
> > > + if (!iommu_dma_mask_check_done) {
> > > + if
> > (!pci_one_device_iommu_support_va(dev))
> > > + return false;
> > > + iommu_dma_mask_check_done = 1;
> > > + }
> > > }
> >
> > The commit message doesn't explain why are we only checking a single
> > device. Indeed, i am not 100% clear as to why, so some explanation in
> > the commit message and preferably a comment in code would be more
> than
> > welcome :)
> >
> >
> > Because the pci_one_device_iommu_support_va function does always the
> > same whatever the device is used in the call.
>
> So, this code was always wrong and needlessly checked each device when
> it could've checked it a single time? OK, that makes it a bit clearer.
> Still, needs to be documented in comments/commit message :) The commit
> message IMO looks quite irrelevant to what happens in the commit. It
> almost feels like this commit should be split in two - first change the
> mgaw check, and then fix the PCI bus code to not check needlessly.
>
>
Ok. Maybe that's better. I will do that in next version.
Thanks
> --
> Thanks,
> Anatoly
>
@@ -588,10 +588,11 @@
fclose(fp);
mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
- if (mgaw < X86_VA_WIDTH)
- return false;
- return true;
+ if (!rte_eal_check_dma_mask(mgaw))
+ return true;
+ else
+ return false;
}
#elif defined(RTE_ARCH_PPC_64)
static bool
@@ -615,13 +616,17 @@
{
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
+ int iommu_dma_mask_check_done = 0;
FOREACH_DRIVER_ON_PCIBUS(drv) {
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (!rte_pci_match(drv, dev))
continue;
- if (!pci_one_device_iommu_support_va(dev))
- return false;
+ if (!iommu_dma_mask_check_done) {
+ if (!pci_one_device_iommu_support_va(dev))
+ return false;
+ iommu_dma_mask_check_done = 1;
+ }
}
}
return true;