[v4,0/4] Fixes on IOVA mode selection

Message ID 1563800213-29839-1-git-send-email-david.marchand@redhat.com (mailing list archive)

Message

David Marchand July 22, 2019, 12:56 p.m. UTC
  Following the issues reported by Jerin and the discussion that emerged
from it, here are fixes to restore and document the behavior of the EAL
and the pci bus driver.

I pondered all the arguments and tried to make as few changes as
possible.
I can't find a need for a flag to just announce support of physical
addresses from the pmd point of view.
So it ended up with something really close to what Jerin had suggested.

But the problem is that this is still unfinished wrt the documentation.
I will be offline for 10 days and we need this to move forward, so
sending anyway.

Changelog since v3:
- fixed typos in patch 2,
- updated patch 3 title,
- moved and reworded comments in the note section in patch 4.

Changelog since v2 (Jerin):
- Patch 2/4 - Removed personal appeals in log messages (Anatoly)
- Patch 4/4 - Added documentation (Anatoly)

Changelog since v1 (Jerin):
- Renamed the RTE_PCI_DRV_IOVA_AS_VA flag to RTE_PCI_DRV_NEED_IOVA_AS_VA
  (patch 3/4)
- Changed the IOVA mode to VA for the default case (patch 4/4), with documentation
- Tested the patch series on octeontx2 platform
  

Comments

Thomas Monjalon July 22, 2019, 3:53 p.m. UTC | #1
Applied, thanks Jerin, Anatoly and David for converging
on a documented solution together.
  
Stojaczyk, Dariusz July 23, 2019, 3:35 a.m. UTC | #2
This introduces a regression when uio-bound devices are attached
to a DPDK app at runtime.

When there are no devices attached at initialization, the only safe
default should be RTE_IOVA_PA. With RTE_IOVA_VA we just
won't be able to do any DMA to uio-bound PCI devices.

Can we revert this patch?

D.

  
Jerin Jacob Kollanukkaran July 23, 2019, 4:18 a.m. UTC | #3
> -----Original Message-----
> From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>
> Sent: Tuesday, July 23, 2019 9:06 AM
> To: Thomas Monjalon <thomas@monjalon.net>; David Marchand
> <david.marchand@redhat.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>
> Cc: dev@dpdk.org
> Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection
> 
> This introduces a regression when uio-bound devices are attached to a DPDK
> app at runtime.

Just to understand the requirements:
# Is this a requirement for SPDK?
# Is a brand new PCI device scanned and attached to DPDK at runtime?
# Is there any specific reason for using uio rather than vfio?

If it is for SPDK,
# How about introducing rte_eal_init_with_mode(enum rte_iova_mode)?
# How about adding a dummy bus in the SPDK code base which returns RTE_IOVA_PA from bus_get_iommus_class()?

  
Stojaczyk, Dariusz July 23, 2019, 4:54 a.m. UTC | #4
> -----Original Message-----
> From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com]
> Sent: Tuesday, July 23, 2019 6:19 AM
> 
> Just to understand the requirements;
> # Is this requirement for SPDK?
> # Is brand new PCI device scanned and attached to DPDK at runtime?
> # Any specific reason for using uio vs vfio?

Jerin,

It came up in SPDK tests, but it's certainly nothing SPDK-specific. I can't
give you the steps, but it should be reproducible even with testpmd.

The PCI device could simply have been hotplugged to the system after
DPDK app start. DPDK didn't know about it at initialization, so it picked
RTE_IOVA_VA and would then fail to attach any UIO-bound device
ever after:

EAL:   Expecting 'PA' IOVA mode but current mode is 'VA', not initializing
EAL: Driver cannot attach the device (0000:00:09.0)
EAL: Failed to attach device on primary process

UIO is commonly used on systems without an IOMMU, including VMs.

> 
> If it is for SPDK,
> # How about introducing rte_eal_init_with_mode(enum rte_iova_mode)?
> # How about adding dummy bus which returns RTE_IOVA_PA in the
> bus_get_iommus_class() in SPDK code base?

There's already an --iova-mode option in DPDK that forces the IOVA mode.
I'm not concerned about configurability, but about the regression in the
default behavior.

I can add workarounds to SPDK, sure, but that wouldn't be a very healthy
approach.

D.

  
Jerin Jacob Kollanukkaran July 23, 2019, 5:27 a.m. UTC | #5
> -----Original Message-----
> From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>
> Sent: Tuesday, July 23, 2019 10:24 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Thomas Monjalon
> <thomas@monjalon.net>; David Marchand <david.marchand@redhat.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>
> Cc: dev@dpdk.org
> Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection
> 

Stojaczyk,

The reasons to choose VA in case the bus detects DC (don't care) are the following:

- All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
      physical address availability.
- By default, the mempool first asks for IOVA-contiguous memory using
      ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may
      affect the application boot time.
- It is easy to enable use cases needing large amounts of IOVA-contiguous
      memory with IOVA in VA mode.

> 
> It came up in SPDK tests, but it's certainly nothing SPDK-specific, I can't give
> you the steps but it should be reproducible even with testpmd.
> 
> The PCI device could have been simply hotplugged to the system after DPDK
> app start. DPDK didn't know about it at initialization, so it picked
> RTE_IOVA_VA and then would fail to attach any UIO-bound device ever
> after:
> 
> EAL:   Expecting 'PA' IOVA mode but current mode is 'VA', not initializing

We have RTE_PCI_DRV_NEED_IOVA_AS_VA devices in DPDK, which can work
only in VA mode. If we default to 'PA' in case of DC, then what do we do about hotplugging those devices?


> EAL: Driver cannot attach the device (0000:00:09.0)
> EAL: Failed to attach device on primary process
> 
> UIO is commonly used on systems without IOMMU- including VMs.

The latest machines have an IOMMU. Which machines are you testing against?
Can we detect the machines without an IOMMU and switch to PA?

> 
> >
> > If it is for SPDK,
> > # How about introducing rte_eal_init_with_mode(enum rte_iova_mode)?
> > # How about adding dummy bus which returns RTE_IOVA_PA in the
> > bus_get_iommus_class() in SPDK code base?
> 
> There's already an --iova=mode option in DPDK that forces the iova mode.
> I'm not concerned about configurability, but the regression in the default
> behavior.
> 
> I can add workarounds to SPDK, sure, but that wouldn't be a very healthy
> approach.

I don't mean a workaround; I am looking for options for expressing
the requirement for PA.


  
Thomas Monjalon July 23, 2019, 7:21 a.m. UTC | #6
23/07/2019 07:27, Jerin Jacob Kollanukkaran:
> From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>
> > From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com]
> > > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>
> > > >
> > > > This introduces a regression when uio-bound devices are attached to
> > > > a DPDK app at runtime.

Yes, it is a regression on purpose.
We can also call it a behaviour change (more below).

> > >
[...]
> The reasons to choose VA in case the bus detects DC (don't care) are the following:
> 
> - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>       physical address availability.
> - By default, the mempool first asks for IOVA-contiguous memory using
>       ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may
>       affect the application boot time.
> - It is easy to enable use cases needing large amounts of IOVA-contiguous
>       memory with IOVA in VA mode.
> 
[...]
> > The PCI device could have been simply hotplugged to the system after DPDK
> > app start. DPDK didn't know about it at initialization, so it picked
> > RTE_IOVA_VA and then would fail to attach any UIO-bound device ever
> > after:
> > 
> > EAL:   Expecting 'PA' IOVA mode but current mode is 'VA', not initializing
> 
> We have RTE_PCI_DRV_NEED_IOVA_AS_VA devices in DPDK, which can work
> only in VA mode. If we default to 'PA' in case of DC, then what do we do about hotplugging those devices?
[...]
> > > > When there are no devices attached at initialization, the only safe
> > > > default should be RTE_IOVA_PA. With RTE_IOVA_VA we just won't be
> > > > able to do any DMA to uio-bound PCI devices.

As Jerin explained, there is no safe default.
There are two cases which cannot work together:
	1/ no IOMMU
	2/ driver supporting only IOMMU addresses (named IOVA_AS_VA)

In the past we were defaulting to physical addressing,
which favored case 1.
Now we have decided to switch to IOMMU addresses by default,
which favors case 2.
As explained above by Jerin, this is considered an improvement.
We should explain this change in the known issues of the release notes.

The only real fix would be to allow both address types at the same time,
with separate memory allocators.
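
To make the conflict between the two cases concrete, here is a minimal
sketch in C. It is only an illustration: the enums and both helper
functions are hypothetical stand-ins invented for this example, not the
actual EAL or PCI bus code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the EAL IOVA modes (DC = don't care). */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* What a device needs at attach time (illustrative only). */
enum dev_need { NEED_PA, NEED_VA, NEED_ANY };

/*
 * Resolve the mode at init: keep what the buses asked for, and fall back
 * to the chosen default only when they reported "don't care".
 */
static enum iova_mode
resolve_iova_mode(enum iova_mode bus_mode, enum iova_mode dflt)
{
	return bus_mode == IOVA_DC ? dflt : bus_mode;
}

/* A device hotplugged later can attach only if the fixed mode suits it. */
static bool
can_attach(enum iova_mode current, enum dev_need need)
{
	if (need == NEED_ANY)
		return true;
	if (need == NEED_PA)
		return current == IOVA_PA;
	return current == IOVA_VA;
}
```

With a VA default, can_attach(IOVA_VA, NEED_PA) is false, which is the
uio hotplug failure reported in this thread. With a PA default,
can_attach(IOVA_PA, NEED_VA) is false instead, which is the case of the
drivers that need IOVA as VA. Whichever default is picked, one of the
two hotplug scenarios fails.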
  
Burakov, Anatoly July 23, 2019, 9:57 a.m. UTC | #7
On 23-Jul-19 6:27 AM, Jerin Jacob Kollanukkaran wrote:
> The latest machines have an IOMMU. Which machines are you testing against?
> Can we detect the machines without an IOMMU and switch to PA?

A machine without an IOMMU shouldn't have picked IOVA as VA in the first
place. Perhaps this is something we could fix? I'm not sure how to
detect that condition though; I don't think there's a mechanism to
know that for sure. Some kernels create "iommu" sysfs directories, but
I'm not too sure if they're 1) there for older kernels we support, and
2) always there.

On machines with IOMMU, VFIO should be the default, and we should 
discourage people from using igb_uio. Is there any reason why SPDK is 
not using VFIO by default?

On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is 
enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in 
both cases, but is empty when IOMMU is disabled). Perhaps we could go 
off that?
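
A rough sketch of such a check, assuming a plain stat() on the path is
enough. The helper name is made up for illustration, and whether the
presence of "/sys/devices/virtual/iommu" is a reliable signal on other
kernels is exactly what still needs checking.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Return true if path exists and is a directory. */
static bool
path_is_dir(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;
	return S_ISDIR(st.st_mode);
}
```

A caller would test path_is_dir("/sys/devices/virtual/iommu") and treat
a false result as "no IOMMU enabled", keeping in mind that this mapping
has only been observed on one machine so far.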
  
Thomas Monjalon July 23, 2019, 10:25 a.m. UTC | #8
23/07/2019 11:57, Burakov, Anatoly:
> On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is 
> enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in 
> both cases, but is empty when IOMMU is disabled). Perhaps we could go 
> off that?

Yes, good idea.
We need to check how these sysfs entries are managed,
and how old they are by looking at Linux code history.
  
Burakov, Anatoly July 23, 2019, 1:56 p.m. UTC | #9
On 23-Jul-19 11:25 AM, Thomas Monjalon wrote:
> Yes, good idea.
> We need to check how these sysfs entries are managed,
> and how old they are by looking at Linux code history.
> 

A quick (and by no means thorough) Google search reveals that the IOMMU driver's
sysfs-related code dates back as far as kernel version 3.17:

https://elixir.bootlin.com/linux/v3.17.8/source/drivers/iommu/iommu-sysfs.c

I'm not a kernel code expert, but the code *looks* like it's creating an
IOMMU-related entry in sysfs. So, I take it we can be reasonably sure of
these entries' presence from v3.17 onwards? Do we support
kernels which don't have this code?
  
Jerin Jacob Kollanukkaran July 23, 2019, 2:24 p.m. UTC | #10
I checked on an x86 and an arm64 machine. I could not see "/sys/devices/virtual/iommu",
but it looks like "/sys/class/iommu/" has entries when an IOMMU is present.

$ uname -a
Linux jerin-lab 5.1.15-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 25 04:49:39 UTC 2019 x86_64 GNU/Linux

$ ls /sys/devices/virtual/
bdi  dmi  drm  graphics  mem  misc  msr  net  powercap  thermal  tty  vc  vtconsole  workqueue

# ls /sys/class/iommu/

# uname -a                                                      
Linux alarm 4.14.76-5.0.0-g12f0519 #63 SMP PREEMPT Thu Jul 11 17:43:54 IST 2019 aarch64 GNU/Linux 

# ls /sys/devices/virtual/
bdi  block  graphics  input  mem  misc  net  otx-bphy-ctr  otx-gpio-ctr  ppp  tty  vc  vfio  vtconsole  workqueue

# ls /sys/class/iommu/
smmu3.0x0000830000000000

  
Burakov, Anatoly July 23, 2019, 2:29 p.m. UTC | #11
After a short chat with Ferruh, I think we have an even better way to
determine whether IOMMU is enabled - the /sys/kernel/iommu filesystem.
Those entries are created whenever it is possible for VFIO to run, even if
the VFIO driver itself is not loaded. These have been there since kernel 3.6, so
our minimum requirements are met with this approach, I believe.
  
Jerin Jacob Kollanukkaran July 23, 2019, 2:36 p.m. UTC | #12
> After a short chat with Ferruh, I think we have an even better way to determine
> whether IOMMU is enabled - the /sys/kernel/iommu filesystem.
> Those entries are created whenever it is possible for VFIO to run, even if the
> VFIO driver itself is not loaded. These have been there since kernel 3.6, so our
> minimum requirements are met with this approach, I believe.

I can see /sys/kernel/iommu_groups/ on IOMMU systems, not /sys/kernel/iommu.

  
Burakov, Anatoly July 23, 2019, 3:47 p.m. UTC | #13
On 23-Jul-19 3:36 PM, Jerin Jacob Kollanukkaran wrote:
> I can see /sys/kernel/iommu_groups/ on IOMMU systems, not /sys/kernel/iommu.

Sorry, yes, a typo. It's /sys/kernel/iommu_groups/.
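
For illustration only, a probe along these lines could count the
entries under that directory. The function name is hypothetical, and
reading "at least one entry" as "IOMMU is usable" is an assumption
taken from this discussion, not something verified across kernels.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
static int
count_dir_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int count = 0;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		count++;
	}
	closedir(dir);
	return count;
}
```

A result greater than zero for "/sys/kernel/iommu_groups" would then
hint that IOMMU groups exist, while 0 or -1 would suggest falling back
to PA.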
