[v2,3/5] bus/pci: use IOVAs check when setting IOVA mode

Message ID 1535719857-19092-4-git-send-email-alejandro.lucero@netronome.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series use IOVAs check based on DMA mask |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Alejandro Lucero Aug. 31, 2018, 12:50 p.m. UTC
  Although VT-d emulation currently only supports 39 bits, it could
be iovas being within that supported range. This patch allows
IOVA mode in such a case.

Indeed, memory initialization code can be modified for using lower
virtual addresses than those used by the kernel for 64 bits processes
by default, and therefore memsegs iovas can use 39 bits or less for
most system. And this is likely 100% true for VMs.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
  

Comments

Burakov, Anatoly Oct. 3, 2018, 12:55 p.m. UTC | #1
On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> Although VT-d emulation currently only supports 39 bits, it could
> be iovas being within that supported range. This patch allows
> IOVA mode in such a case.
> 
> Indeed, memory initialization code can be modified for using lower
> virtual addresses than those used by the kernel for 64 bits processes
> by default, and therefore memsegs iovas can use 39 bits or less for
> most system. And this is likely 100% true for VMs.
> 
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> ---
>   drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 04648ac..215dc10 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -588,10 +588,11 @@
>   	fclose(fp);
>   
>   	mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
> -	if (mgaw < X86_VA_WIDTH)
> -		return false;
>   
> -	return true;
> +	if (!rte_eal_check_dma_mask(mgaw))
> +		return true;
> +	else
> +		return false;

return rte_eal_check_dma_mask(mgaw) == 0; ?

>   }
>   #elif defined(RTE_ARCH_PPC_64)
>   static bool
> @@ -615,13 +616,17 @@
>   {
>   	struct rte_pci_device *dev = NULL;
>   	struct rte_pci_driver *drv = NULL;
> +	int iommu_dma_mask_check_done = 0;
>   
>   	FOREACH_DRIVER_ON_PCIBUS(drv) {
>   		FOREACH_DEVICE_ON_PCIBUS(dev) {
>   			if (!rte_pci_match(drv, dev))
>   				continue;
> -			if (!pci_one_device_iommu_support_va(dev))
> -				return false;
> +			if (!iommu_dma_mask_check_done) {
> +				if (!pci_one_device_iommu_support_va(dev))
> +					return false;
> +				iommu_dma_mask_check_done  = 1;
> +			}
>   		}

The commit message doesn't explain why are we only checking a single 
device. Indeed, i am not 100% clear as to why, so some explanation in 
the commit message and preferably a comment in code would be more than 
welcome :)

>   	}
>   	return true;
>
  
Alejandro Lucero Oct. 4, 2018, 1:35 p.m. UTC | #2
On Wed, Oct 3, 2018 at 1:56 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> > Although VT-d emulation currently only supports 39 bits, it could
> > be iovas being within that supported range. This patch allows
> > IOVA mode in such a case.
> >
> > Indeed, memory initialization code can be modified for using lower
> > virtual addresses than those used by the kernel for 64 bits processes
> > by default, and therefore memsegs iovas can use 39 bits or less for
> > most system. And this is likely 100% true for VMs.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > ---
> >   drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
> >   1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> > index 04648ac..215dc10 100644
> > --- a/drivers/bus/pci/linux/pci.c
> > +++ b/drivers/bus/pci/linux/pci.c
> > @@ -588,10 +588,11 @@
> >       fclose(fp);
> >
> >       mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) +
> 1;
> > -     if (mgaw < X86_VA_WIDTH)
> > -             return false;
> >
> > -     return true;
> > +     if (!rte_eal_check_dma_mask(mgaw))
> > +             return true;
> > +     else
> > +             return false;
>
> return rte_eal_check_dma_mask(mgaw) == 0; ?
>

I guess that works and is more elegant.
Thanks.


>
> >   }
> >   #elif defined(RTE_ARCH_PPC_64)
> >   static bool
> > @@ -615,13 +616,17 @@
> >   {
> >       struct rte_pci_device *dev = NULL;
> >       struct rte_pci_driver *drv = NULL;
> > +     int iommu_dma_mask_check_done = 0;
> >
> >       FOREACH_DRIVER_ON_PCIBUS(drv) {
> >               FOREACH_DEVICE_ON_PCIBUS(dev) {
> >                       if (!rte_pci_match(drv, dev))
> >                               continue;
> > -                     if (!pci_one_device_iommu_support_va(dev))
> > -                             return false;
> > +                     if (!iommu_dma_mask_check_done) {
> > +                             if (!pci_one_device_iommu_support_va(dev))
> > +                                     return false;
> > +                             iommu_dma_mask_check_done  = 1;
> > +                     }
> >               }
>
> The commit message doesn't explain why are we only checking a single
> device. Indeed, i am not 100% clear as to why, so some explanation in
> the commit message and preferably a comment in code would be more than
> welcome :)
>
>
Because the pci_one_device_iommu_support_va function does always the same
whatever the device is used in the call.
The code uses the device for looking at /sys/bus/pci/devices/   but then it
uses a link to iommu which will be the same for all
the devices. Note that some can refer to dmar0 and others to dmar1, but the
IOMMU capabilities are the same.

The limitation here is not a PCI device but the IOMMU hardware itself. The
first call to pci_one_device_iommu_support_va will check
if all the hugepages addresses are within the supported DMA range by the
IOMMU hw. If it fails, that is.

Now that I'm explaining this, I notice it is the same for any case. If the
check is good, no more checks are needed. This assumes there is just one
IOMMU hardware or if more than one (I have NUMA systems with one IOMMU unit
per socket) they are all the same hardware version. Adding Maxime in the
thread for confirming this and asking him about my previous statement.


> >       }
> >       return true;
> >
>
>
> --
> Thanks,
> Anatoly
>
  
Burakov, Anatoly Oct. 4, 2018, 3:49 p.m. UTC | #3
On 04-Oct-18 2:35 PM, Alejandro Lucero wrote:
> 
> 
> On Wed, Oct 3, 2018 at 1:56 PM Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
>      > Although VT-d emulation currently only supports 39 bits, it could
>      > be iovas being within that supported range. This patch allows
>      > IOVA mode in such a case.
>      >
>      > Indeed, memory initialization code can be modified for using lower
>      > virtual addresses than those used by the kernel for 64 bits processes
>      > by default, and therefore memsegs iovas can use 39 bits or less for
>      > most system. And this is likely 100% true for VMs.
>      >
>      > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com
>     <mailto:alejandro.lucero@netronome.com>>
>      > ---
>      >   drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
>      >   1 file changed, 10 insertions(+), 5 deletions(-)
>      >
>      > diff --git a/drivers/bus/pci/linux/pci.c
>     b/drivers/bus/pci/linux/pci.c
>      > index 04648ac..215dc10 100644
>      > --- a/drivers/bus/pci/linux/pci.c
>      > +++ b/drivers/bus/pci/linux/pci.c
>      > @@ -588,10 +588,11 @@
>      >       fclose(fp);
>      >
>      >       mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >>
>     VTD_CAP_MGAW_SHIFT) + 1;
>      > -     if (mgaw < X86_VA_WIDTH)
>      > -             return false;
>      >
>      > -     return true;
>      > +     if (!rte_eal_check_dma_mask(mgaw))
>      > +             return true;
>      > +     else
>      > +             return false;
> 
>     return rte_eal_check_dma_mask(mgaw) == 0; ?
> 
> 
> I guess that works and is more elegant.
> Thanks.
> 
> 
>      >   }
>      >   #elif defined(RTE_ARCH_PPC_64)
>      >   static bool
>      > @@ -615,13 +616,17 @@
>      >   {
>      >       struct rte_pci_device *dev = NULL;
>      >       struct rte_pci_driver *drv = NULL;
>      > +     int iommu_dma_mask_check_done = 0;
>      >
>      >       FOREACH_DRIVER_ON_PCIBUS(drv) {
>      >               FOREACH_DEVICE_ON_PCIBUS(dev) {
>      >                       if (!rte_pci_match(drv, dev))
>      >                               continue;
>      > -                     if (!pci_one_device_iommu_support_va(dev))
>      > -                             return false;
>      > +                     if (!iommu_dma_mask_check_done) {
>      > +                             if
>     (!pci_one_device_iommu_support_va(dev))
>      > +                                     return false;
>      > +                             iommu_dma_mask_check_done  = 1;
>      > +                     }
>      >               }
> 
>     The commit message doesn't explain why are we only checking a single
>     device. Indeed, i am not 100% clear as to why, so some explanation in
>     the commit message and preferably a comment in code would be more than
>     welcome :)
> 
> 
> Because the pci_one_device_iommu_support_va function does always the 
> same whatever the device is used in the call.

So, this code was always wrong and needlessly checked each device when 
it could've checked it a single time? OK, that makes it a bit clearer. 
Still, needs to be documented in comments/commit message :) The commit 
message IMO looks quite irrelevant to what happens in the commit. It 
almost feels like this commit should be split in two - first change the 
mgaw check, and then fix the PCI bus code to not check needlessly.
  
Alejandro Lucero Oct. 4, 2018, 5:59 p.m. UTC | #4
On Thu, Oct 4, 2018 at 4:49 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 04-Oct-18 2:35 PM, Alejandro Lucero wrote:
> >
> >
> > On Wed, Oct 3, 2018 at 1:56 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >
> >     On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> >      > Although VT-d emulation currently only supports 39 bits, it could
> >      > be iovas being within that supported range. This patch allows
> >      > IOVA mode in such a case.
> >      >
> >      > Indeed, memory initialization code can be modified for using lower
> >      > virtual addresses than those used by the kernel for 64 bits
> processes
> >      > by default, and therefore memsegs iovas can use 39 bits or less
> for
> >      > most system. And this is likely 100% true for VMs.
> >      >
> >      > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com
> >     <mailto:alejandro.lucero@netronome.com>>
> >      > ---
> >      >   drivers/bus/pci/linux/pci.c | 15 ++++++++++-----
> >      >   1 file changed, 10 insertions(+), 5 deletions(-)
> >      >
> >      > diff --git a/drivers/bus/pci/linux/pci.c
> >     b/drivers/bus/pci/linux/pci.c
> >      > index 04648ac..215dc10 100644
> >      > --- a/drivers/bus/pci/linux/pci.c
> >      > +++ b/drivers/bus/pci/linux/pci.c
> >      > @@ -588,10 +588,11 @@
> >      >       fclose(fp);
> >      >
> >      >       mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >>
> >     VTD_CAP_MGAW_SHIFT) + 1;
> >      > -     if (mgaw < X86_VA_WIDTH)
> >      > -             return false;
> >      >
> >      > -     return true;
> >      > +     if (!rte_eal_check_dma_mask(mgaw))
> >      > +             return true;
> >      > +     else
> >      > +             return false;
> >
> >     return rte_eal_check_dma_mask(mgaw) == 0; ?
> >
> >
> > I guess that works and is more elegant.
> > Thanks.
> >
> >
> >      >   }
> >      >   #elif defined(RTE_ARCH_PPC_64)
> >      >   static bool
> >      > @@ -615,13 +616,17 @@
> >      >   {
> >      >       struct rte_pci_device *dev = NULL;
> >      >       struct rte_pci_driver *drv = NULL;
> >      > +     int iommu_dma_mask_check_done = 0;
> >      >
> >      >       FOREACH_DRIVER_ON_PCIBUS(drv) {
> >      >               FOREACH_DEVICE_ON_PCIBUS(dev) {
> >      >                       if (!rte_pci_match(drv, dev))
> >      >                               continue;
> >      > -                     if (!pci_one_device_iommu_support_va(dev))
> >      > -                             return false;
> >      > +                     if (!iommu_dma_mask_check_done) {
> >      > +                             if
> >     (!pci_one_device_iommu_support_va(dev))
> >      > +                                     return false;
> >      > +                             iommu_dma_mask_check_done  = 1;
> >      > +                     }
> >      >               }
> >
> >     The commit message doesn't explain why are we only checking a single
> >     device. Indeed, i am not 100% clear as to why, so some explanation in
> >     the commit message and preferably a comment in code would be more
> than
> >     welcome :)
> >
> >
> > Because the pci_one_device_iommu_support_va function does always the
> > same whatever the device is used in the call.
>
> So, this code was always wrong and needlessly checked each device when
> it could've checked it a single time? OK, that makes it a bit clearer.
> Still, needs to be documented in comments/commit message :) The commit
> message IMO looks quite irrelevant to what happens in the commit. It
> almost feels like this commit should be split in two - first change the
> mgaw check, and then fix the PCI bus code to not check needlessly.
>
>
Ok. Maybe that's better. I will do that in next version.
Thanks


> --
> Thanks,
> Anatoly
>
  

Patch

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac..215dc10 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -588,10 +588,11 @@ 
 	fclose(fp);
 
 	mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
-	if (mgaw < X86_VA_WIDTH)
-		return false;
 
-	return true;
+	if (!rte_eal_check_dma_mask(mgaw))
+		return true;
+	else
+		return false;
 }
 #elif defined(RTE_ARCH_PPC_64)
 static bool
@@ -615,13 +616,17 @@ 
 {
 	struct rte_pci_device *dev = NULL;
 	struct rte_pci_driver *drv = NULL;
+	int iommu_dma_mask_check_done = 0;
 
 	FOREACH_DRIVER_ON_PCIBUS(drv) {
 		FOREACH_DEVICE_ON_PCIBUS(dev) {
 			if (!rte_pci_match(drv, dev))
 				continue;
-			if (!pci_one_device_iommu_support_va(dev))
-				return false;
+			if (!iommu_dma_mask_check_done) {
+				if (!pci_one_device_iommu_support_va(dev))
+					return false;
+				iommu_dma_mask_check_done  = 1;
+			}
 		}
 	}
 	return true;