[dpdk-dev] eal: map io resources for non x86 architectures

Message ID 1450269064-23608-1-git-send-email-david.marchand@6wind.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

David Marchand Dec. 16, 2015, 12:31 p.m. UTC
  x86 requires a special set of instructions to access ioports, but other
architectures let you remap io resources.
So let eal remap io resources by accepting IORESOURCE_IO flag for
architectures other than x86.

Signed-off-by: David Marchand <david.marchand@6wind.com>
---
 lib/librte_eal/common/include/rte_pci.h |    3 ++-
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   21 +++++++++++++++------
 2 files changed, 17 insertions(+), 7 deletions(-)
  

Comments

Yuanhan Liu Dec. 16, 2015, 12:48 p.m. UTC | #1
On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> x86 requires a special set of instructions to access ioports, but other
> architectures let you remap io resources.
> So let eal remap io resources by accepting IORESOURCE_IO flag for
> architectures other than x86.

One question: this patch could be a replacement of the igbuio_iomap patch
from Santosh? If so, I like it: It's more elegant.

	--yliu

> 
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> ---
>  lib/librte_eal/common/include/rte_pci.h |    3 ++-
>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   21 +++++++++++++++------
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 334c12e..8aaab4a 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -105,7 +105,8 @@ extern struct pci_device_list pci_device_list; /**< Global list of PCI devices.
>  /** Nb. of values in PCI resource format. */
>  #define PCI_RESOURCE_FMT_NVAL 3
>  
> -/** IO resource type: memory address space */
> +/** IO resource type: */
> +#define IORESOURCE_IO         0x00000100
>  #define IORESOURCE_MEM        0x00000200
>  
>  /**
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index bc5b5be..9c4651d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -236,12 +236,21 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
>  			goto error;
>  		}
>  
> -		if (flags & IORESOURCE_MEM) {
> -			dev->mem_resource[i].phys_addr = phys_addr;
> -			dev->mem_resource[i].len = end_addr - phys_addr + 1;
> -			/* not mapped for now */
> -			dev->mem_resource[i].addr = NULL;
> -		}
> +		/* we only care about IORESOURCE_IO or IORESOURCE_MEM */
> +		if (!(flags & IORESOURCE_IO) &&
> +		    !(flags & IORESOURCE_MEM))
> +			continue;
> +
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
> +		/* x86 can not remap ioports, so skip it, remapping code will
> +		 * look at dev->mem_resource[i].phys_addr == 0 and skip it */
> +		if (flags & IORESOURCE_IO)
> +			continue;
> +#endif
> +		dev->mem_resource[i].phys_addr = phys_addr;
> +		dev->mem_resource[i].len = end_addr - phys_addr + 1;
> +		/* not mapped for now */
> +		dev->mem_resource[i].addr = NULL;
>  	}
>  	fclose(f);
>  	return 0;
> -- 
> 1.7.10.4
  
Bruce Richardson Dec. 16, 2015, 1:15 p.m. UTC | #2
On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> x86 requires a special set of instructions to access ioports, but other
> architectures let you remap io resources.
> So let eal remap io resources by accepting IORESOURCE_IO flag for
> architectures other than x86.
> 
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> ---
>  lib/librte_eal/common/include/rte_pci.h |    3 ++-
>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   21 +++++++++++++++------
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 334c12e..8aaab4a 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -105,7 +105,8 @@ extern struct pci_device_list pci_device_list; /**< Global list of PCI devices.
>  /** Nb. of values in PCI resource format. */
>  #define PCI_RESOURCE_FMT_NVAL 3
>  
> -/** IO resource type: memory address space */
> +/** IO resource type: */
> +#define IORESOURCE_IO         0x00000100
>  #define IORESOURCE_MEM        0x00000200
>  
>  /**
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index bc5b5be..9c4651d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -236,12 +236,21 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
>  			goto error;
>  		}
>  
> -		if (flags & IORESOURCE_MEM) {
> -			dev->mem_resource[i].phys_addr = phys_addr;
> -			dev->mem_resource[i].len = end_addr - phys_addr + 1;
> -			/* not mapped for now */
> -			dev->mem_resource[i].addr = NULL;
> -		}
> +		/* we only care about IORESOURCE_IO or IORESOURCE_MEM */
> +		if (!(flags & IORESOURCE_IO) &&
> +		    !(flags & IORESOURCE_MEM))
> +			continue;
> +
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
> +		/* x86 can not remap ioports, so skip it, remapping code will
> +		 * look at dev->mem_resource[i].phys_addr == 0 and skip it */
> +		if (flags & IORESOURCE_IO)
> +			continue;
> +#endif

As a tangential comment: We maybe could look to make certain preprocessor
defines available as C globals as well. There is no reason that the ifdef here
could not be implemented as a runtime check in C code.

/Bruce
  
David Marchand Dec. 16, 2015, 1:29 p.m. UTC | #3
Bruce,

On Wed, Dec 16, 2015 at 2:15 PM, Bruce Richardson <
bruce.richardson@intel.com> wrote:

>
> > +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
> > +             /* x86 can not remap ioports, so skip it, remapping code
> will
> > +              * look at dev->mem_resource[i].phys_addr == 0 and skip it
> */
> > +             if (flags & IORESOURCE_IO)
> > +                     continue;
> > +#endif
>
> As a tangential comment: We maybe could look to make certain preprocessor
> defines available as C globals as well. There is no reason that the ifdef
> here
> could not be implemented as a runtime check in C code.
>
>
Well, instead of having the same information as the preprocessor define,
maybe some capability per arch/cpu would be better "arch supports io remap".
Maybe we can extend the cpuflags ?
  
David Marchand Dec. 16, 2015, 1:34 p.m. UTC | #4
Yuanhan,

On Wed, Dec 16, 2015 at 1:48 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> > x86 requires a special set of instructions to access ioports, but other
> > architectures let you remap io resources.
> > So let eal remap io resources by accepting IORESOURCE_IO flag for
> > architectures other than x86.
>
> One question: this patch could be a replacement of the igbuio_iomap patch
> from Santosh? If so, I like it: It's more elegant.
>

Well, yes, unless I missed something since I am no guru :-).
  
Yuanhan Liu Dec. 16, 2015, 1:42 p.m. UTC | #5
On Wed, Dec 16, 2015 at 02:34:35PM +0100, David Marchand wrote:
> Yuanhan,
> 
> On Wed, Dec 16, 2015 at 1:48 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>     > x86 requires a special set of instructions to access ioports, but other
>     > architectures let you remap io resources.
>     > So let eal remap io resources by accepting IORESOURCE_IO flag for
>     > architectures other than x86.
> 
>     One question: this patch could be a replacement of the igbuio_iomap patch
>     from Santosh? If so, I like it: It's more elegant.
> 
> 
> Well, yes, unless I missed something since I am no guru :-).

Great then. If there is a test-by, I could give my Ack :)
(I have no arm or other platform for testing).

	--yliu
  
Santosh Shukla Dec. 16, 2015, 1:51 p.m. UTC | #6
On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> x86 requires a special set of instructions to access ioports, but other
>> architectures let you remap io resources.
>> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> architectures other than x86.
>
> One question: this patch could be a replacement of the igbuio_iomap patch
> from Santosh? If so, I like it: It's more elegant.
>
>         --yliu
>

I did tried similar in past but not in parse_sysfs (such that
mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
pci_map_resource not able to map address hence segfault at tespmd
initialization.

i was getting these:
EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
Invalid argument (0xffffffffffffffff)

after enabling RTE_PCI_NEED_DRV_MAPPING flags in virtio_ethdev. I
guess patch assume that flag enabled for driver right?



>>
>> Signed-off-by: David Marchand <david.marchand@6wind.com>
>> ---
>>  lib/librte_eal/common/include/rte_pci.h |    3 ++-
>>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   21 +++++++++++++++------
>>  2 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
>> index 334c12e..8aaab4a 100644
>> --- a/lib/librte_eal/common/include/rte_pci.h
>> +++ b/lib/librte_eal/common/include/rte_pci.h
>> @@ -105,7 +105,8 @@ extern struct pci_device_list pci_device_list; /**< Global list of PCI devices.
>>  /** Nb. of values in PCI resource format. */
>>  #define PCI_RESOURCE_FMT_NVAL 3
>>
>> -/** IO resource type: memory address space */
>> +/** IO resource type: */
>> +#define IORESOURCE_IO         0x00000100
>>  #define IORESOURCE_MEM        0x00000200
>>
>>  /**
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index bc5b5be..9c4651d 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -236,12 +236,21 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
>>                       goto error;
>>               }
>>
>> -             if (flags & IORESOURCE_MEM) {
>> -                     dev->mem_resource[i].phys_addr = phys_addr;
>> -                     dev->mem_resource[i].len = end_addr - phys_addr + 1;
>> -                     /* not mapped for now */
>> -                     dev->mem_resource[i].addr = NULL;
>> -             }
>> +             /* we only care about IORESOURCE_IO or IORESOURCE_MEM */
>> +             if (!(flags & IORESOURCE_IO) &&
>> +                 !(flags & IORESOURCE_MEM))
>> +                     continue;
>> +
>> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
>> +             /* x86 can not remap ioports, so skip it, remapping code will
>> +              * look at dev->mem_resource[i].phys_addr == 0 and skip it */
>> +             if (flags & IORESOURCE_IO)
>> +                     continue;
>> +#endif
>> +             dev->mem_resource[i].phys_addr = phys_addr;
>> +             dev->mem_resource[i].len = end_addr - phys_addr + 1;
>> +             /* not mapped for now */
>> +             dev->mem_resource[i].addr = NULL;
>>       }
>>       fclose(f);
>>       return 0;
>> --
>> 1.7.10.4
  
Yuanhan Liu Dec. 17, 2015, 9:38 a.m. UTC | #7
On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> >> x86 requires a special set of instructions to access ioports, but other
> >> architectures let you remap io resources.
> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
> >> architectures other than x86.
> >
> > One question: this patch could be a replacement of the igbuio_iomap patch
> > from Santosh? If so, I like it: It's more elegant.
> >
> >         --yliu
> >
> 
> I did tried similar in past but not in parse_sysfs (such that
> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
> pci_map_resource not able to map address hence segfault at tespmd
> initialization.
> 
> i was getting these:
> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
> Invalid argument (0xffffffffffffffff)

That's because ARM (at least the kernel) doesn't allow an IO map:

arch/arm/kernel/bios32.c
------------------------
618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
619                         enum pci_mmap_state mmap_state, int write_combine)
620 {
621         if (mmap_state == pci_mmap_io)
622                 return -EINVAL;

And with a quick glimpse of powerpc, I see no such limitation. Hence,
this peice of code may work only on Powerpc platform (and maybe a few
others we don't care).

So, apparently, this will not work for ARM.

	--yliu
  
Santosh Shukla Dec. 17, 2015, 10:01 a.m. UTC | #8
On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>> <yuanhan.liu@linux.intel.com> wrote:
>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> >> x86 requires a special set of instructions to access ioports, but other
>> >> architectures let you remap io resources.
>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> >> architectures other than x86.
>> >
>> > One question: this patch could be a replacement of the igbuio_iomap patch
>> > from Santosh? If so, I like it: It's more elegant.
>> >
>> >         --yliu
>> >
>>
>> I did tried similar in past but not in parse_sysfs (such that
>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>> pci_map_resource not able to map address hence segfault at tespmd
>> initialization.
>>
>> i was getting these:
>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
>> Invalid argument (0xffffffffffffffff)
>
> That's because ARM (at least the kernel) doesn't allow an IO map:
>
> arch/arm/kernel/bios32.c
> ------------------------
> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
> 619                         enum pci_mmap_state mmap_state, int write_combine)
> 620 {
> 621         if (mmap_state == pci_mmap_io)
> 622                 return -EINVAL;
>
> And with a quick glimpse of powerpc, I see no such limitation. Hence,
> this peice of code may work only on Powerpc platform (and maybe a few
> others we don't care).
>
> So, apparently, this will not work for ARM.
>

Right and I did shared detailed explanation on why it wont work on
this link [1], infact this patch shouldn;t work for mips too.

As I mentioned earlier I did tried similar approach and so to get
everything working like iomem is currently in dpdk; we need to add
something like pci_remap_iospace --> ioremap_page_range() but this api
not really pci_mmap_page_range types. user need to write more code on
top so to use this api efficiently, also this api looks like meant to
use by arch file only in kernel space.


>         --yliu
  
Santosh Shukla Dec. 17, 2015, 10:02 a.m. UTC | #9
On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>>> <yuanhan.liu@linux.intel.com> wrote:
>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>>> >> x86 requires a special set of instructions to access ioports, but other
>>> >> architectures let you remap io resources.
>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>>> >> architectures other than x86.
>>> >
>>> > One question: this patch could be a replacement of the igbuio_iomap patch
>>> > from Santosh? If so, I like it: It's more elegant.
>>> >
>>> >         --yliu
>>> >
>>>
>>> I did tried similar in past but not in parse_sysfs (such that
>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>>> pci_map_resource not able to map address hence segfault at tespmd
>>> initialization.
>>>
>>> i was getting these:
>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
>>> Invalid argument (0xffffffffffffffff)
>>
>> That's because ARM (at least the kernel) doesn't allow an IO map:
>>
>> arch/arm/kernel/bios32.c
>> ------------------------
>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>> 619                         enum pci_mmap_state mmap_state, int write_combine)
>> 620 {
>> 621         if (mmap_state == pci_mmap_io)
>> 622                 return -EINVAL;
>>
>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>> this peice of code may work only on Powerpc platform (and maybe a few
>> others we don't care).
>>
>> So, apparently, this will not work for ARM.
>>
>
> Right and I did shared detailed explanation on why it wont work on
> this link [1], infact this patch shouldn;t work for mips too.
>
> As I mentioned earlier I did tried similar approach and so to get
> everything working like iomem is currently in dpdk; we need to add
> something like pci_remap_iospace --> ioremap_page_range() but this api
> not really pci_mmap_page_range types. user need to write more code on
> top so to use this api efficiently, also this api looks like meant to
> use by arch file only in kernel space.
>
>
missed link;

[1] http://dpdk.org/dev/patchwork/patch/9365/

>>         --yliu
  
Santosh Shukla Dec. 17, 2015, 10:07 a.m. UTC | #10
On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla <sshukla@mvista.com> wrote:
>> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>> <yuanhan.liu@linux.intel.com> wrote:
>>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>>>> <yuanhan.liu@linux.intel.com> wrote:
>>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>>>> >> x86 requires a special set of instructions to access ioports, but other
>>>> >> architectures let you remap io resources.
>>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>>>> >> architectures other than x86.
>>>> >
>>>> > One question: this patch could be a replacement of the igbuio_iomap patch
>>>> > from Santosh? If so, I like it: It's more elegant.
>>>> >
>>>> >         --yliu
>>>> >
>>>>
>>>> I did tried similar in past but not in parse_sysfs (such that
>>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>>>> pci_map_resource not able to map address hence segfault at tespmd
>>>> initialization.
>>>>
>>>> i was getting these:
>>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
>>>> Invalid argument (0xffffffffffffffff)
>>>
>>> That's because ARM (at least the kernel) doesn't allow an IO map:
>>>
>>> arch/arm/kernel/bios32.c
>>> ------------------------
>>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>>> 619                         enum pci_mmap_state mmap_state, int write_combine)
>>> 620 {
>>> 621         if (mmap_state == pci_mmap_io)
>>> 622                 return -EINVAL;
>>>
>>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>>> this peice of code may work only on Powerpc platform (and maybe a few
>>> others we don't care).
>>>
>>> So, apparently, this will not work for ARM.
>>>
>>
>> Right and I did shared detailed explanation on why it wont work on
>> this link [1], infact this patch shouldn;t work for mips too.
>>
>> As I mentioned earlier I did tried similar approach and so to get
>> everything working like iomem is currently in dpdk; we need to add
>> something like pci_remap_iospace --> ioremap_page_range() but this api
>> not really pci_mmap_page_range types. user need to write more code on
>> top so to use this api efficiently, also this api looks like meant to
>> use by arch file only in kernel space.
>>
>>
> missed link;
>
> [1] http://dpdk.org/dev/patchwork/patch/9365/
>

IMO, it is worth keeping one special device file who could work across
archs like arm/arm64/powerpc and others, who could map iopci bar to
dpdk user-space. also this approach has no kernel version dependency
too. BTW; I did mentioned in second approach in to add /dev/ioport
interface in drivers/char/mem.c which could read more than byte in one
single operation, but that has kernel dependency. However that
approach too is arch agnostic.

Let me know!

>>>         --yliu
  
Thomas Monjalon Dec. 17, 2015, 10:14 a.m. UTC | #11
Hi,

2015-12-17 15:37, Santosh Shukla:
> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
> >> <yuanhan.liu@linux.intel.com> wrote:
> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
> >>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
> >>>> <yuanhan.liu@linux.intel.com> wrote:
> >>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> >>>> >> x86 requires a special set of instructions to access ioports, but other
> >>>> >> architectures let you remap io resources.
> >>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
> >>>> >> architectures other than x86.
> >>>> >
> >>>> > One question: this patch could be a replacement of the igbuio_iomap patch
> >>>> > from Santosh? If so, I like it: It's more elegant.
> >>>> >
> >>>> >         --yliu
> >>>> >
> >>>>
> >>>> I did tried similar in past but not in parse_sysfs (such that
> >>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
> >>>> pci_map_resource not able to map address hence segfault at tespmd
> >>>> initialization.
> >>>>
> >>>> i was getting these:
> >>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
> >>>> Invalid argument (0xffffffffffffffff)
> >>>
> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
> >>>
> >>> arch/arm/kernel/bios32.c
> >>> ------------------------
> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
> >>> 619                         enum pci_mmap_state mmap_state, int write_combine)
> >>> 620 {
> >>> 621         if (mmap_state == pci_mmap_io)
> >>> 622                 return -EINVAL;
> >>>
> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
> >>> this peice of code may work only on Powerpc platform (and maybe a few
> >>> others we don't care).
> >>>
> >>> So, apparently, this will not work for ARM.
> >>>
> >>
> >> Right and I did shared detailed explanation on why it wont work on
> >> this link [1], infact this patch shouldn;t work for mips too.
> >>
> >> As I mentioned earlier I did tried similar approach and so to get
> >> everything working like iomem is currently in dpdk; we need to add
> >> something like pci_remap_iospace --> ioremap_page_range() but this api
> >> not really pci_mmap_page_range types. user need to write more code on
> >> top so to use this api efficiently, also this api looks like meant to
> >> use by arch file only in kernel space.
> >>
> >>
> > missed link;
> >
> > [1] http://dpdk.org/dev/patchwork/patch/9365/
> >
> 
> IMO, it is worth keeping one special device file who could work across
> archs like arm/arm64/powerpc and others, who could map iopci bar to
> dpdk user-space. also this approach has no kernel version dependency
> too. BTW; I did mentioned in second approach in to add /dev/ioport
> interface in drivers/char/mem.c which could read more than byte in one
> single operation, but that has kernel dependency. However that
> approach too is arch agnostic.

Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
really say there is no kernel dependency.
We should try to remove the need for any out-of-tree kernel module.
That's why the Linux upstream approach is a better solution.
  
Santosh Shukla Dec. 17, 2015, 10:21 a.m. UTC | #12
On Thu, Dec 17, 2015 at 3:44 PM, Thomas Monjalon
<thomas.monjalon@6wind.com> wrote:
> Hi,
>
> 2015-12-17 15:37, Santosh Shukla:
>> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla <sshukla@mvista.com> wrote:
>> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla <sshukla@mvista.com> wrote:
>> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>> >> <yuanhan.liu@linux.intel.com> wrote:
>> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>> >>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>> >>>> <yuanhan.liu@linux.intel.com> wrote:
>> >>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> >>>> >> x86 requires a special set of instructions to access ioports, but other
>> >>>> >> architectures let you remap io resources.
>> >>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> >>>> >> architectures other than x86.
>> >>>> >
>> >>>> > One question: this patch could be a replacement of the igbuio_iomap patch
>> >>>> > from Santosh? If so, I like it: It's more elegant.
>> >>>> >
>> >>>> >         --yliu
>> >>>> >
>> >>>>
>> >>>> I did tried similar in past but not in parse_sysfs (such that
>> >>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>> >>>> pci_map_resource not able to map address hence segfault at tespmd
>> >>>> initialization.
>> >>>>
>> >>>> i was getting these:
>> >>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
>> >>>> Invalid argument (0xffffffffffffffff)
>> >>>
>> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
>> >>>
>> >>> arch/arm/kernel/bios32.c
>> >>> ------------------------
>> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>> >>> 619                         enum pci_mmap_state mmap_state, int write_combine)
>> >>> 620 {
>> >>> 621         if (mmap_state == pci_mmap_io)
>> >>> 622                 return -EINVAL;
>> >>>
>> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>> >>> this peice of code may work only on Powerpc platform (and maybe a few
>> >>> others we don't care).
>> >>>
>> >>> So, apparently, this will not work for ARM.
>> >>>
>> >>
>> >> Right and I did shared detailed explanation on why it wont work on
>> >> this link [1], infact this patch shouldn;t work for mips too.
>> >>
>> >> As I mentioned earlier I did tried similar approach and so to get
>> >> everything working like iomem is currently in dpdk; we need to add
>> >> something like pci_remap_iospace --> ioremap_page_range() but this api
>> >> not really pci_mmap_page_range types. user need to write more code on
>> >> top so to use this api efficiently, also this api looks like meant to
>> >> use by arch file only in kernel space.
>> >>
>> >>
>> > missed link;
>> >
>> > [1] http://dpdk.org/dev/patchwork/patch/9365/
>> >
>>
>> IMO, it is worth keeping one special device file who could work across
>> archs like arm/arm64/powerpc and others, who could map iopci bar to
>> dpdk user-space. also this approach has no kernel version dependency
>> too. BTW; I did mentioned in second approach in to add /dev/ioport
>> interface in drivers/char/mem.c which could read more than byte in one
>> single operation, but that has kernel dependency. However that
>> approach too is arch agnostic.
>
> Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
> really say there is no kernel dependency.

Agree but I mentioned kernel __version__ dependency.

> We should try to remove the need for any out-of-tree kernel module.
> That's why the Linux upstream approach is a better solution.

IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
pci_mmap_page_range()?
  
Thomas Monjalon Dec. 17, 2015, 10:33 a.m. UTC | #13
2015-12-17 15:51, Santosh Shukla:
> On Thu, Dec 17, 2015 at 3:44 PM, Thomas Monjalon
> <thomas.monjalon@6wind.com> wrote:
> > Hi,
> >
> > 2015-12-17 15:37, Santosh Shukla:
> >> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> >> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> >> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
> >> >> <yuanhan.liu@linux.intel.com> wrote:
> >> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
> >> >>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
> >> >>>> <yuanhan.liu@linux.intel.com> wrote:
> >> >>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> >> >>>> >> x86 requires a special set of instructions to access ioports, but other
> >> >>>> >> architectures let you remap io resources.
> >> >>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
> >> >>>> >> architectures other than x86.
> >> >>>> >
> >> >>>> > One question: this patch could be a replacement of the igbuio_iomap patch
> >> >>>> > from Santosh? If so, I like it: It's more elegant.
> >> >>>> >
> >> >>>> >         --yliu
> >> >>>> >
> >> >>>>
> >> >>>> I did tried similar in past but not in parse_sysfs (such that
> >> >>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
> >> >>>> pci_map_resource not able to map address hence segfault at tespmd
> >> >>>> initialization.
> >> >>>>
> >> >>>> i was getting these:
> >> >>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
> >> >>>> Invalid argument (0xffffffffffffffff)
> >> >>>
> >> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
> >> >>>
> >> >>> arch/arm/kernel/bios32.c
> >> >>> ------------------------
> >> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
> >> >>> 619                         enum pci_mmap_state mmap_state, int write_combine)
> >> >>> 620 {
> >> >>> 621         if (mmap_state == pci_mmap_io)
> >> >>> 622                 return -EINVAL;
> >> >>>
> >> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
> >> >>> this peice of code may work only on Powerpc platform (and maybe a few
> >> >>> others we don't care).
> >> >>>
> >> >>> So, apparently, this will not work for ARM.
> >> >>>
> >> >>
> >> >> Right and I did shared detailed explanation on why it wont work on
> >> >> this link [1], infact this patch shouldn;t work for mips too.
> >> >>
> >> >> As I mentioned earlier I did tried similar approach and so to get
> >> >> everything working like iomem is currently in dpdk; we need to add
> >> >> something like pci_remap_iospace --> ioremap_page_range() but this api
> >> >> not really pci_mmap_page_range types. user need to write more code on
> >> >> top so to use this api efficiently, also this api looks like meant to
> >> >> use by arch file only in kernel space.
> >> >>
> >> >>
> >> > missed link;
> >> >
> >> > [1] http://dpdk.org/dev/patchwork/patch/9365/
> >> >
> >>
> >> IMO, it is worth keeping one special device file who could work across
> >> archs like arm/arm64/powerpc and others, who could map iopci bar to
> >> dpdk user-space. also this approach has no kernel version dependency
> >> too. BTW; I did mentioned in second approach in to add /dev/ioport
> >> interface in drivers/char/mem.c which could read more than byte in one
> >> single operation, but that has kernel dependency. However that
> >> approach too is arch agnostic.
> >
> > Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
> > really say there is no kernel dependency.
> 
> Agree but I mentioned kernel __version__ dependency.

Yes you did.
One of the main issue with out-of-tree kernel modules is the version
dependency. Probably that igb_uio from DPDK 2.3 will not compile with
the kernel 5.0.

> > We should try to remove the need for any out-of-tree kernel module.
> > That's why the Linux upstream approach is a better solution.
> 
> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
> pci_mmap_page_range()?

I don't know what is the best solution in the kernel.
First we need to be sure that there is absolutely no solution without
kernel changes.
Then we can try a pci_mmap solution or, as you suggest, an interface in
drivers/char/mem.c
  
Santosh Shukla Dec. 17, 2015, 11:22 a.m. UTC | #14
On Thu, Dec 17, 2015 at 4:03 PM, Thomas Monjalon
<thomas.monjalon@6wind.com> wrote:
> 2015-12-17 15:51, Santosh Shukla:
>> On Thu, Dec 17, 2015 at 3:44 PM, Thomas Monjalon
>> <thomas.monjalon@6wind.com> wrote:
>> > Hi,
>> >
>> > 2015-12-17 15:37, Santosh Shukla:
>> >> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla <sshukla@mvista.com> wrote:
>> >> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla <sshukla@mvista.com> wrote:
>> >> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>> >> >> <yuanhan.liu@linux.intel.com> wrote:
>> >> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>> >> >>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>> >> >>>> <yuanhan.liu@linux.intel.com> wrote:
>> >> >>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> >> >>>> >> x86 requires a special set of instructions to access ioports, but other
>> >> >>>> >> architectures let you remap io resources.
>> >> >>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> >> >>>> >> architectures other than x86.
>> >> >>>> >
>> >> >>>> > One question: this patch could be a replacement of the igbuio_iomap patch
>> >> >>>> > from Santosh? If so, I like it: It's more elegant.
>> >> >>>> >
>> >> >>>> >         --yliu
>> >> >>>> >
>> >> >>>>
>> >> >>>> I did tried similar in past but not in parse_sysfs (such that
>> >> >>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>> >> >>>> pci_map_resource not able to map address hence segfault at tespmd
>> >> >>>> initialization.
>> >> >>>>
>> >> >>>> i was getting these:
>> >> >>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c00000, 0x20, 0x0):
>> >> >>>> Invalid argument (0xffffffffffffffff)
>> >> >>>
>> >> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
>> >> >>>
>> >> >>> arch/arm/kernel/bios32.c
>> >> >>> ------------------------
>> >> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>> >> >>> 619                         enum pci_mmap_state mmap_state, int write_combine)
>> >> >>> 620 {
>> >> >>> 621         if (mmap_state == pci_mmap_io)
>> >> >>> 622                 return -EINVAL;
>> >> >>>
>> >> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>> >> >>> this peice of code may work only on Powerpc platform (and maybe a few
>> >> >>> others we don't care).
>> >> >>>
>> >> >>> So, apparently, this will not work for ARM.
>> >> >>>
>> >> >>
>> >> >> Right and I did shared detailed explanation on why it wont work on
>> >> >> this link [1], infact this patch shouldn;t work for mips too.
>> >> >>
>> >> >> As I mentioned earlier I did tried similar approach and so to get
>> >> >> everything working like iomem is currently in dpdk; we need to add
>> >> >> something like pci_remap_iospace --> ioremap_page_range() but this api
>> >> >> not really pci_mmap_page_range types. user need to write more code on
>> >> >> top so to use this api efficiently, also this api looks like meant to
>> >> >> use by arch file only in kernel space.
>> >> >>
>> >> >>
>> >> > missed link;
>> >> >
>> >> > [1] http://dpdk.org/dev/patchwork/patch/9365/
>> >> >
>> >>
>> >> IMO, it is worth keeping one special device file who could work across
>> >> archs like arm/arm64/powerpc and others, who could map iopci bar to
>> >> dpdk user-space. also this approach has no kernel version dependency
>> >> too. BTW; I did mentioned in second approach in to add /dev/ioport
>> >> interface in drivers/char/mem.c which could read more than byte in one
>> >> single operation, but that has kernel dependency. However that
>> >> approach too is arch agnostic.
>> >
>> > Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
>> > really say there is no kernel dependency.
>>
>> Agree but I mentioned kernel __version__ dependency.
>
> Yes you did.
> One of the main issue with out-of-tree kernel modules is the version
> dependency. Probably that igb_uio from DPDK 2.3 will not compile with
> the kernel 5.0.
>

don't know kernel 5.0 feature list so I guess your may be right. is
uio obsoleted for 5.0 kernel?

>> > We should try to remove the need for any out-of-tree kernel module.
>> > That's why the Linux upstream approach is a better solution.
>>
>> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
>> pci_mmap_page_range()?
>
> I don't know what is the best solution in the kernel.
> First we need to be sure that there is absolutely no solution without
> kernel changes.

I guess we have done enough evaluation / investigation that suggest -
so to map iopci region to userspace in arch agnostic-way -

# either we need to modify kernel
               - Make sure all the non-x86 arch to support mapping for
iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
approach though.
            or
               - include /dev/ioport char-mem device file who could do
more than byte operation, Note that this implementation does not exist
in kernel.  I could send an RFC to lkml.

OR keep device file in user space (current approach)
Right now Virtio-for-arm patches are blocked on this, let me know if
someone has better approach/thought in mind.

Thanks.

> Then we can try a pci_mmap solution or, as you suggest, an interface in
> drivers/char/mem.c
>
  
Yuanhan Liu Dec. 18, 2015, 5:30 a.m. UTC | #15
On Thu, Dec 17, 2015 at 04:52:00PM +0530, Santosh Shukla wrote:
> >> >> IMO, it is worth keeping one special device file who could work across
> >> >> archs like arm/arm64/powerpc and others, who could map iopci bar to
> >> >> dpdk user-space. also this approach has no kernel version dependency
> >> >> too. BTW; I did mentioned in second approach in to add /dev/ioport
> >> >> interface in drivers/char/mem.c which could read more than byte in one
> >> >> single operation, but that has kernel dependency. However that
> >> >> approach too is arch agnostic.
> >> >
> >> > Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
> >> > really say there is no kernel dependency.
> >>
> >> Agree but I mentioned kernel __version__ dependency.
> >
> > Yes you did.
> > One of the main issue with out-of-tree kernel modules is the version
> > dependency. Probably that igb_uio from DPDK 2.3 will not compile with
> > the kernel 5.0.
> >
> 
> don't know kernel 5.0 feature list so I guess your may be right. is
> uio obsoleted for 5.0 kernel?

Nope, Thomas meant to say that we should keep the old DPDK will work
with newer kernel, not just the newest DPDK work on newest linux kernel
only. The out-of-tree kernel makes no garantee on that.

> >> > We should try to remove the need for any out-of-tree kernel module.
> >> > That's why the Linux upstream approach is a better solution.
> >>
> >> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
> >> pci_mmap_page_range()?
> >
> > I don't know what is the best solution in the kernel.
> > First we need to be sure that there is absolutely no solution without
> > kernel changes.
> 
> I guess we have done enough evaluation / investigation that suggest -
> so to map iopci region to userspace in arch agnostic-way -
> 
> # either we need to modify kernel
>                - Make sure all the non-x86 arch to support mapping for
> iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
> approach though.
>             or
>                - include /dev/ioport char-mem device file who could do
> more than byte operation, Note that this implementation does not exist
> in kernel.  I could send an RFC to lkml.

Maybe you could propose the two to lkml, to get some feedbacks from
those kernel/ARM gurus? Please cc me if you do so.

	--yliu
> 
> OR keep device file in user space (current approach)
> Right now Virtio-for-arm patches are blocked on this, let me know if
> someone has better approach/thought in mind.
> 
> Thanks.
> 
> > Then we can try a pci_mmap solution or, as you suggest, an interface in
> > drivers/char/mem.c
> >
  
Jerin Jacob Dec. 18, 2015, 6:34 a.m. UTC | #16
On Fri, Dec 18, 2015 at 01:30:53PM +0800, Yuanhan Liu wrote:
> On Thu, Dec 17, 2015 at 04:52:00PM +0530, Santosh Shukla wrote:
> > >> >> IMO, it is worth keeping one special device file who could work across
> > >> >> archs like arm/arm64/powerpc and others, who could map iopci bar to
> > >> >> dpdk user-space. also this approach has no kernel version dependency
> > >> >> too. BTW; I did mentioned in second approach in to add /dev/ioport
> > >> >> interface in drivers/char/mem.c which could read more than byte in one
> > >> >> single operation, but that has kernel dependency. However that
> > >> >> approach too is arch agnostic.
> > >> >
> > >> > Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
> > >> > really say there is no kernel dependency.
> > >>
> > >> Agree but I mentioned kernel __version__ dependency.
> > >
> > > Yes you did.
> > > One of the main issue with out-of-tree kernel modules is the version
> > > dependency. Probably that igb_uio from DPDK 2.3 will not compile with
> > > the kernel 5.0.
> > >
> >
> > don't know kernel 5.0 feature list so I guess your may be right. is
> > uio obsoleted for 5.0 kernel?
>
> Nope, Thomas meant to say that we should keep the old DPDK will work
> with newer kernel, not just the newest DPDK work on newest linux kernel
> only. The out-of-tree kernel makes no garantee on that.
>
> > >> > We should try to remove the need for any out-of-tree kernel module.
> > >> > That's why the Linux upstream approach is a better solution.
> > >>
> > >> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
> > >> pci_mmap_page_range()?
> > >
> > > I don't know what is the best solution in the kernel.
> > > First we need to be sure that there is absolutely no solution without
> > > kernel changes.
> >
> > I guess we have done enough evaluation / investigation that suggest -
> > so to map iopci region to userspace in arch agnostic-way -
> >
> > # either we need to modify kernel
> >                - Make sure all the non-x86 arch to support mapping for
> > iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
> > approach though.
> >             or
> >                - include /dev/ioport char-mem device file who could do
> > more than byte operation, Note that this implementation does not exist
> > in kernel.  I could send an RFC to lkml.
>
> Maybe you could propose the two to lkml, to get some feedbacks from
> those kernel/ARM gurus? Please cc me if you do so.

IMO, We need to support DPDK-ARM for the old kernels also. If there is no means
to expose iomap region to userspace in the existing kernel then we add that
support through out-of-tree driver like igb_uio and when make become part of
the kernel then we can turn off of the out-of-tree driver based
on the kernel major and minor version numbers.

Jerin


>
> 	--yliu
> >
> > OR keep device file in user space (current approach)
> > Right now Virtio-for-arm patches are blocked on this, let me know if
> > someone has better approach/thought in mind.
> >
> > Thanks.
> >
> > > Then we can try a pci_mmap solution or, as you suggest, an interface in
> > > drivers/char/mem.c
> > >
  
Santosh Shukla Dec. 18, 2015, 7:54 a.m. UTC | #17
On Fri, Dec 18, 2015 at 11:00 AM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> On Thu, Dec 17, 2015 at 04:52:00PM +0530, Santosh Shukla wrote:
>> >> >> IMO, it is worth keeping one special device file who could work across
>> >> >> archs like arm/arm64/powerpc and others, who could map iopci bar to
>> >> >> dpdk user-space. also this approach has no kernel version dependency
>> >> >> too. BTW; I did mentioned in second approach in to add /dev/ioport
>> >> >> interface in drivers/char/mem.c which could read more than byte in one
>> >> >> single operation, but that has kernel dependency. However that
>> >> >> approach too is arch agnostic.
>> >> >
>> >> > Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
>> >> > really say there is no kernel dependency.
>> >>
>> >> Agree but I mentioned kernel __version__ dependency.
>> >
>> > Yes you did.
>> > One of the main issue with out-of-tree kernel modules is the version
>> > dependency. Probably that igb_uio from DPDK 2.3 will not compile with
>> > the kernel 5.0.
>> >
>>
>> don't know kernel 5.0 feature list so I guess your may be right. is
>> uio obsoleted for 5.0 kernel?
>
> Nope, Thomas meant to say that we should keep the old DPDK will work
> with newer kernel, not just the newest DPDK work on newest linux kernel
> only. The out-of-tree kernel makes no garantee on that.
>
>> >> > We should try to remove the need for any out-of-tree kernel module.
>> >> > That's why the Linux upstream approach is a better solution.
>> >>
>> >> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
>> >> pci_mmap_page_range()?
>> >
>> > I don't know what is the best solution in the kernel.
>> > First we need to be sure that there is absolutely no solution without
>> > kernel changes.
>>
>> I guess we have done enough evaluation / investigation that suggest -
>> so to map iopci region to userspace in arch agnostic-way -
>>
>> # either we need to modify kernel
>>                - Make sure all the non-x86 arch to support mapping for
>> iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
>> approach though.
>>             or
>>                - include /dev/ioport char-mem device file who could do
>> more than byte operation, Note that this implementation does not exist
>> in kernel.  I could send an RFC to lkml.
>
> Maybe you could propose the two to lkml, to get some feedbacks from
> those kernel/ARM gurus? Please cc me if you do so.
>

The latter one I already shared old lkml thread, Pl. revisit my v1 0/6
patch [1] and in that refer [2].
Josh has already proposed to lkml but for some reason thread didn't
went far. I can restart that discussion giving dpdk use-case as an
example/ requirement.

And for the former one, I'll have to check with linux-arm why iopci
region not mappable. By grepping kernel source all i could see two
commit - latest one is 415ae101 and older one 1da177e. Both has
nothing to explain why
if (mmap_state == pci_mmap_io)
      return -EINVAL;

Setting pci io region to -EINVAL, should have fundamental reason for
sure. But we'll have to check, for that I could post as a query rather
a patch to lkml.

Note that dpdk already has out-of-tree implementation for
dom0/xen-case too, it creates special device file which maps pci
resources. so keep one more igb_uio char device so to map iopci region
wont hurt much though!

[1] http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/29531

>         --yliu
>>
>> OR keep device file in user space (current approach)
>> Right now Virtio-for-arm patches are blocked on this, let me know if
>> someone has better approach/thought in mind.
>>
>> Thanks.
>>
>> > Then we can try a pci_mmap solution or, as you suggest, an interface in
>> > drivers/char/mem.c
>> >
  
Yuanhan Liu Dec. 18, 2015, 7:55 a.m. UTC | #18
On Fri, Dec 18, 2015 at 12:04:48PM +0530, Jerin Jacob wrote:
> > > # either we need to modify kernel
> > >                - Make sure all the non-x86 arch to support mapping for
> > > iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
> > > approach though.
> > >             or
> > >                - include /dev/ioport char-mem device file who could do
> > > more than byte operation, Note that this implementation does not exist
> > > in kernel.  I could send an RFC to lkml.
> >
> > Maybe you could propose the two to lkml, to get some feedbacks from
> > those kernel/ARM gurus? Please cc me if you do so.
>
> IMO, We need to support DPDK-ARM for the old kernels also. If there is no means
> to expose iomap region to userspace in the existing kernel then we add that
> support through out-of-tree driver like igb_uio and when make become part of
> the kernel then we can turn off of the out-of-tree driver based
> on the kernel major and minor version numbers.

Agreed.


(BTW, I have no idea why the CC list of my last reply was shortened
to Santosh only. I double checked with my sent box, the mail header
sent out looks normal:

    Date: Fri, 18 Dec 2015 13:30:53 +0800                                                                                                                                                                              From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
    To: Santosh Shukla <sshukla@mvista.com>
    Cc: Thomas Monjalon <thomas.monjalon@6wind.com>, dev@dpdk.org, David Marchand <david.marchand@6wind.com>

Not sure it's a bug from my mutt email client, or the mailing list)

	--yliu
  
Yuanhan Liu Dec. 18, 2015, 8:21 a.m. UTC | #19
On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla wrote:
> >> I guess we have done enough evaluation / investigation that suggest -
> >> so to map iopci region to userspace in arch agnostic-way -
> >>
> >> # either we need to modify kernel
> >>                - Make sure all the non-x86 arch to support mapping for
> >> iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
> >> approach though.
> >>             or
> >>                - include /dev/ioport char-mem device file who could do
> >> more than byte operation, Note that this implementation does not exist
> >> in kernel.  I could send an RFC to lkml.
> >
> > Maybe you could propose the two to lkml, to get some feedbacks from
> > those kernel/ARM gurus? Please cc me if you do so.
> >
> 
> The latter one I already shared old lkml thread, Pl. revisit my v1 0/6
> patch [1] and in that refer [2].

Oops, sorry, a bit busy, that I didn't look at it carefully. My bad,
anyway.

> Josh has already proposed to lkml but for some reason thread didn't
> went far. I can restart that discussion giving dpdk use-case as an
> example/ requirement.

I had a quick go through of the discussion. Both hpa and Arnd seem
to be fine with the ioctl interface on /dev/port. Have you tried
that?  And if you want to restart it, ioctl might be a better option
than /dev/ioport, judging from the discussion.

> 
> And for the former one, I'll have to check with linux-arm why iopci
> region not mappable. By grepping kernel source all i could see two
> commit - latest one is 415ae101 and older one 1da177e. Both has
> nothing to explain why
> if (mmap_state == pci_mmap_io)
>       return -EINVAL;
> 
> Setting pci io region to -EINVAL, should have fundamental reason for
> sure. But we'll have to check, for that I could post as a query rather
> a patch to lkml.

No idea, such check should have been there in the very beginning
for some reason. And here is a quote from Arnd:

    Only powerpc, microblaze, alpha, sparc and xtensa allow
    users to mmap I/O space, even though a lot of others could. 

> 
> Note that dpdk already has out-of-tree implementation for
> dom0/xen-case too, it creates special device file which maps pci
> resources. so keep one more igb_uio char device so to map iopci region
> wont hurt much though!

Xen is not a good example here; I'm even not sure it still works
or not :)

	--yliu
> [1] http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/29531
> 
> >         --yliu
> >>
> >> OR keep device file in user space (current approach)
> >> Right now Virtio-for-arm patches are blocked on this, let me know if
> >> someone has better approach/thought in mind.
> >>
> >> Thanks.
> >>
> >> > Then we can try a pci_mmap solution or, as you suggest, an interface in
> >> > drivers/char/mem.c
> >> >
  
Thomas Monjalon Dec. 18, 2015, 9:37 a.m. UTC | #20
2015-12-18 12:04, Jerin Jacob:
> On Fri, Dec 18, 2015 at 01:30:53PM +0800, Yuanhan Liu wrote:
> > On Thu, Dec 17, 2015 at 04:52:00PM +0530, Santosh Shukla wrote:
> > > > One of the main issue with out-of-tree kernel modules is the version
> > > > dependency. Probably that igb_uio from DPDK 2.3 will not compile with
> > > > the kernel 5.0.
> > >
> > > don't know kernel 5.0 feature list so I guess your may be right. is
> > > uio obsoleted for 5.0 kernel?
> >
> > Nope, Thomas meant to say that we should keep the old DPDK will work
> > with newer kernel, not just the newest DPDK work on newest linux kernel
> > only. The out-of-tree kernel makes no garantee on that.
> >
> > > >> > We should try to remove the need for any out-of-tree kernel module.
> > > >> > That's why the Linux upstream approach is a better solution.
> > > >>
> > > >> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
> > > >> pci_mmap_page_range()?
> > > >
> > > > I don't know what is the best solution in the kernel.
> > > > First we need to be sure that there is absolutely no solution without
> > > > kernel changes.
> > >
> > > I guess we have done enough evaluation / investigation that suggest -
> > > so to map iopci region to userspace in arch agnostic-way -
> > >
> > > # either we need to modify kernel
> > >                - Make sure all the non-x86 arch to support mapping for
> > > iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
> > > approach though.
> > >             or
> > >                - include /dev/ioport char-mem device file who could do
> > > more than byte operation, Note that this implementation does not exist
> > > in kernel.  I could send an RFC to lkml.
> >
> > Maybe you could propose the two to lkml, to get some feedbacks from
> > those kernel/ARM gurus? Please cc me if you do so.
> 
> IMO, We need to support DPDK-ARM for the old kernels also. If there is no means
> to expose iomap region to userspace in the existing kernel then we add that
> support through out-of-tree driver like igb_uio and when make become part of
> the kernel then we can turn off of the out-of-tree driver based
> on the kernel major and minor version numbers.

Of course, we have to support DPDK for old/current kernels.
But if we continue to implement features in out-of-tree modules without trying
to have a solution with upstream kernel, we are stuck with our legacy forever.
So, as sane policy, we should not accept new changes in our kernel modules
if there is no alternative in Linus' repo.
  
Santosh Shukla Dec. 18, 2015, 12:55 p.m. UTC | #21
On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla wrote:
>> >> I guess we have done enough evaluation / investigation that suggest -
>> >> so to map iopci region to userspace in arch agnostic-way -
>> >>
>> >> # either we need to modify kernel
>> >>                - Make sure all the non-x86 arch to support mapping for
>> >> iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
>> >> approach though.
>> >>             or
>> >>                - include /dev/ioport char-mem device file who could do
>> >> more than byte operation, Note that this implementation does not exist
>> >> in kernel.  I could send an RFC to lkml.
>> >
>> > Maybe you could propose the two to lkml, to get some feedbacks from
>> > those kernel/ARM gurus? Please cc me if you do so.
>> >
>>
>> The latter one I already shared old lkml thread, Pl. revisit my v1 0/6
>> patch [1] and in that refer [2].
>
> Oops, sorry, a bit busy, that I didn't look at it carefully. My bad,
> anyway.
>
>> Josh has already proposed to lkml but for some reason thread didn't
>> went far. I can restart that discussion giving dpdk use-case as an
>> example/ requirement.
>
> I had a quick go through of the discussion. Both hpa and Arnd seem
> to be fine with the ioctl interface on /dev/port. Have you tried
> that?  And if you want to restart it, ioctl might be a better option
> than /dev/ioport, judging from the discussion.
>

I tried legacy patch and re-writing with ioctl-way; doing changes in
dpdk port as well in kernel, had to test on quite other hw not only
arm64 though! so it will take time for me, I am travelling tomorrow so
bit delayed, We'll post patch to lkml and share dpdk-virtio feedback
perhaps by Monday.

>>
>> And for the former one, I'll have to check with linux-arm why iopci
>> region not mappable. By grepping kernel source all i could see two
>> commit - latest one is 415ae101 and older one 1da177e. Both has
>> nothing to explain why
>> if (mmap_state == pci_mmap_io)
>>       return -EINVAL;
>>
>> Setting pci io region to -EINVAL, should have fundamental reason for
>> sure. But we'll have to check, for that I could post as a query rather
>> a patch to lkml.
>
> No idea, such check should have been there in the very beginning
> for some reason. And here is a quote from Arnd:
>
>     Only powerpc, microblaze, alpha, sparc and xtensa allow
>     users to mmap I/O space, even though a lot of others could.
>
>>
>> Note that dpdk already has out-of-tree implementation for
>> dom0/xen-case too, it creates special device file which maps pci
>> resources. so keep one more igb_uio char device so to map iopci region
>> wont hurt much though!
>
> Xen is not a good example here; I'm even not sure it still works
> or not :)
>
>         --yliu
>> [1] http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/29531
>>
>> >         --yliu
>> >>
>> >> OR keep device file in user space (current approach)
>> >> Right now Virtio-for-arm patches are blocked on this, let me know if
>> >> someone has better approach/thought in mind.
>> >>
>> >> Thanks.
>> >>
>> >> > Then we can try a pci_mmap solution or, as you suggest, an interface in
>> >> > drivers/char/mem.c
>> >> >
  
Santosh Shukla Dec. 29, 2015, 5:56 a.m. UTC | #22
On Fri, Dec 18, 2015 at 6:25 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
>> On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla wrote:
>>> >> I guess we have done enough evaluation / investigation that suggest -
>>> >> so to map iopci region to userspace in arch agnostic-way -
>>> >>
>>> >> # either we need to modify kernel
>>> >>                - Make sure all the non-x86 arch to support mapping for
>>> >> iopci region (i.e. pci_mmap_page_range). I don;t think its a correct
>>> >> approach though.
>>> >>             or
>>> >>                - include /dev/ioport char-mem device file who could do
>>> >> more than byte operation, Note that this implementation does not exist
>>> >> in kernel.  I could send an RFC to lkml.
>>> >
>>> > Maybe you could propose the two to lkml, to get some feedbacks from
>>> > those kernel/ARM gurus? Please cc me if you do so.
>>> >
>>>
>>> The latter one I already shared old lkml thread, Pl. revisit my v1 0/6
>>> patch [1] and in that refer [2].
>>
>> Oops, sorry, a bit busy, that I didn't look at it carefully. My bad,
>> anyway.
>>
>>> Josh has already proposed to lkml but for some reason thread didn't
>>> went far. I can restart that discussion giving dpdk use-case as an
>>> example/ requirement.
>>
>> I had a quick go through of the discussion. Both hpa and Arnd seem
>> to be fine with the ioctl interface on /dev/port. Have you tried
>> that?  And if you want to restart it, ioctl might be a better option
>> than /dev/ioport, judging from the discussion.
>>
>
> I tried legacy patch and re-writing with ioctl-way; doing changes in
> dpdk port as well in kernel, had to test on quite other hw not only
> arm64 though! so it will take time for me, I am travelling tomorrow so
> bit delayed, We'll post patch to lkml and share dpdk-virtio feedback
> perhaps by Monday.
>

I posted a query about /dev/ioports approach in lkml thread [1], and
Arnd suggested to use vfio framework but it looks like vfio too does
not map ioresource_io region. Same communicated to Arnd and waiting
for his reply.

In mean time I like to ask general question;
- Has anyone tried vfio/non-iommu approach for virtio pmd driver? If
not then Is there any plan? Someone planning to take up.
[1] https://lkml.org/lkml/2015/12/23/145
  
Anatoly Burakov Dec. 29, 2015, 9:56 a.m. UTC | #23
Hi Santosh,

> On Fri, Dec 18, 2015 at 6:25 PM, Santosh Shukla <sshukla@mvista.com>

> wrote:

> > On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu

> > <yuanhan.liu@linux.intel.com> wrote:

> >> On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla wrote:

> >>> >> I guess we have done enough evaluation / investigation that

> >>> >> suggest - so to map iopci region to userspace in arch

> >>> >> agnostic-way -

> >>> >>

> >>> >> # either we need to modify kernel

> >>> >>                - Make sure all the non-x86 arch to support

> >>> >> mapping for iopci region (i.e. pci_mmap_page_range). I don;t

> >>> >> think its a correct approach though.

> >>> >>             or

> >>> >>                - include /dev/ioport char-mem device file who

> >>> >> could do more than byte operation, Note that this implementation

> >>> >> does not exist in kernel.  I could send an RFC to lkml.

> >>> >

> >>> > Maybe you could propose the two to lkml, to get some feedbacks

> >>> > from those kernel/ARM gurus? Please cc me if you do so.

> >>> >

> >>>

> >>> The latter one I already shared old lkml thread, Pl. revisit my v1

> >>> 0/6 patch [1] and in that refer [2].

> >>

> >> Oops, sorry, a bit busy, that I didn't look at it carefully. My bad,

> >> anyway.

> >>

> >>> Josh has already proposed to lkml but for some reason thread didn't

> >>> went far. I can restart that discussion giving dpdk use-case as an

> >>> example/ requirement.

> >>

> >> I had a quick go through of the discussion. Both hpa and Arnd seem to

> >> be fine with the ioctl interface on /dev/port. Have you tried that?

> >> And if you want to restart it, ioctl might be a better option than

> >> /dev/ioport, judging from the discussion.

> >>

> >

> > I tried legacy patch and re-writing with ioctl-way; doing changes in

> > dpdk port as well in kernel, had to test on quite other hw not only

> > arm64 though! so it will take time for me, I am travelling tomorrow so

> > bit delayed, We'll post patch to lkml and share dpdk-virtio feedback

> > perhaps by Monday.

> >

> 

> I posted a query about /dev/ioports approach in lkml thread [1], and Arnd

> suggested to use vfio framework but it looks like vfio too does not map

> ioresource_io region. Same communicated to Arnd and waiting for his reply.

> 

> In mean time I like to ask general question;

> - Has anyone tried vfio/non-iommu approach for virtio pmd driver? If not

> then Is there any plan? Someone planning to take up.

> [1] https://lkml.org/lkml/2015/12/23/145


I have submitted a patch to support no-iommu mode, but I'm not aware of anyone trying VFIO-noiommu at all. That's probably expected since it's Christmas/New Year in a lot of places, and everyone is on a break.

That said, I'm not sure I completely understand what is it that you're asking about. The code you're referring to (in vfio_pci.c, line 854 as of kernel 4.3) is checking if a PCI BAR is available for IO (hence checking if the IORESOURCE_MEM bit is set). There isn't any "ioresource_mem" region as far as VFIO is concerned, there are only BARs, ROM, VGA and PCI config regions (linux/vfio.h, line 314 as of kernel 4.3). So if you're missing some PCI regions for VFIO to map, they would first need to be added to VFIO kernel implementation before they can be used by DPDK. That is, unless I'm misunderstanding something :)

Thanks,
Anatoly
  
Santosh Shukla Dec. 29, 2015, 10:47 a.m. UTC | #24
On Tue, Dec 29, 2015 at 3:26 PM, Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
> Hi Santosh,
>
>> On Fri, Dec 18, 2015 at 6:25 PM, Santosh Shukla <sshukla@mvista.com>
>> wrote:
>> > On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu
>> > <yuanhan.liu@linux.intel.com> wrote:
>> >> On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla wrote:
>> >>> >> I guess we have done enough evaluation / investigation that
>> >>> >> suggest - so to map iopci region to userspace in arch
>> >>> >> agnostic-way -
>> >>> >>
>> >>> >> # either we need to modify kernel
>> >>> >>                - Make sure all the non-x86 arch to support
>> >>> >> mapping for iopci region (i.e. pci_mmap_page_range). I don;t
>> >>> >> think its a correct approach though.
>> >>> >>             or
>> >>> >>                - include /dev/ioport char-mem device file who
>> >>> >> could do more than byte operation, Note that this implementation
>> >>> >> does not exist in kernel.  I could send an RFC to lkml.
>> >>> >
>> >>> > Maybe you could propose the two to lkml, to get some feedbacks
>> >>> > from those kernel/ARM gurus? Please cc me if you do so.
>> >>> >
>> >>>
>> >>> The latter one I already shared old lkml thread, Pl. revisit my v1
>> >>> 0/6 patch [1] and in that refer [2].
>> >>
>> >> Oops, sorry, a bit busy, that I didn't look at it carefully. My bad,
>> >> anyway.
>> >>
>> >>> Josh has already proposed to lkml but for some reason thread didn't
>> >>> went far. I can restart that discussion giving dpdk use-case as an
>> >>> example/ requirement.
>> >>
>> >> I had a quick go through of the discussion. Both hpa and Arnd seem to
>> >> be fine with the ioctl interface on /dev/port. Have you tried that?
>> >> And if you want to restart it, ioctl might be a better option than
>> >> /dev/ioport, judging from the discussion.
>> >>
>> >
>> > I tried legacy patch and re-writing with ioctl-way; doing changes in
>> > dpdk port as well in kernel, had to test on quite other hw not only
>> > arm64 though! so it will take time for me, I am travelling tomorrow so
>> > bit delayed, We'll post patch to lkml and share dpdk-virtio feedback
>> > perhaps by Monday.
>> >
>>
>> I posted a query about /dev/ioports approach in lkml thread [1], and Arnd
>> suggested to use vfio framework but it looks like vfio too does not map
>> ioresource_io region. Same communicated to Arnd and waiting for his reply.
>>
>> In mean time I like to ask general question;
>> - Has anyone tried vfio/non-iommu approach for virtio pmd driver? If not
>> then Is there any plan? Someone planning to take up.
>> [1] https://lkml.org/lkml/2015/12/23/145
>
> I have submitted a patch to support no-iommu mode, but I'm not aware of anyone trying VFIO-noiommu at all. That's probably expected since it's Christmas/New Year in a lot of places, and everyone is on a break.
>
> That said, I'm not sure I completely understand what is it that you're asking about. The code you're referring to (in vfio_pci.c, line 854 as of kernel 4.3) is checking if a PCI BAR is available for IO (hence checking if the IORESOURCE_MEM


Thanks for reply! You comment might help to move this discuss to next level.

Look at kernel/resource.c, it exports two symbol ioport_resource and
iomem_resource and sets appropriate flag type i.e.. IORESOURCE_IO and
IORESOURCE_MEM. In virtio-net case; it creates both pci region i.e..
_io bar and _mem bar. dpdk virtio pmd driver (<= 0.95 virtio spec)
uses pci _io bar region for device initialization as virtio headers
are locate at pci _io bar region. Since it uses pci _iobar region so
likely it update pci_resource.[index].flag = IORESOURCE_IO.  and vfio
mmap function wont handle ioresource_io (i guess). And that is why I
asked same to lkml thread.


bit is set). There isn't any "ioresource_mem" region as far as VFIO is
concerned, there are only BARs, ROM, VGA and PCI
config regions (linux/vfio.h, line 314 as of kernel 4.3). So if you're
missing some PCI regions for VFIO to map, they would first need to be
added to VFIO kernel implementation before they can be used by DPDK.
That is, unless I'm misunderstanding something :)
>

Agree. As mentioned above, I guess ioresource_io pci bar
implementation missing in vfio kernel? but before adding code support
in kernel I would like to hear from experts example, You, Alex!
(looping alex)

> Thanks,
> Anatoly
  
Anatoly Burakov Dec. 29, 2015, 11:06 a.m. UTC | #25
Hi Santosh,

> Look at kernel/resource.c, it exports two symbol ioport_resource and

> iomem_resource and sets appropriate flag type i.e.. IORESOURCE_IO and

> IORESOURCE_MEM. In virtio-net case; it creates both pci region i.e..

> _io bar and _mem bar. dpdk virtio pmd driver (<= 0.95 virtio spec) uses pci

> _io bar region for device initialization as virtio headers are locate at pci _io bar

> region. Since it uses pci _iobar region so likely it update

> pci_resource.[index].flag = IORESOURCE_IO.  and vfio mmap function wont

> handle ioresource_io (i guess). And that is why I asked same to lkml thread.


Yes, I can see that this is what I was misunderstanding (that IORESOURCE_IO isn't a BAR but rather a flag). Technically, if you also set IORESOURCE_MEM along with IORESOURCE_IO, then the BAR would be mapped by VFIO, but this sounds like a dangerous hack :) other than that, yes, I'm afraid it's up to kernel guys to add support for it. Once done, provided IO BARs are meant to be worked with the same way as MEM BARs, there's a good chance it would work out of the box with DPDK.

Thanks,
Anatoly
  
Santosh Shukla Dec. 29, 2015, 12:23 p.m. UTC | #26
On Tue, Dec 29, 2015 at 4:36 PM, Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
> Hi Santosh,
>
>> Look at kernel/resource.c, it exports two symbol ioport_resource and
>> iomem_resource and sets appropriate flag type i.e.. IORESOURCE_IO and
>> IORESOURCE_MEM. In virtio-net case; it creates both pci region i.e..
>> _io bar and _mem bar. dpdk virtio pmd driver (<= 0.95 virtio spec) uses pci
>> _io bar region for device initialization as virtio headers are locate at pci _io bar
>> region. Since it uses pci _iobar region so likely it update
>> pci_resource.[index].flag = IORESOURCE_IO.  and vfio mmap function wont
>> handle ioresource_io (i guess). And that is why I asked same to lkml thread.
>
> Yes, I can see that this is what I was misunderstanding (that IORESOURCE_IO isn't a BAR but rather a flag). Technically, if you also set IORESOURCE_MEM along with IORESOURCE_IO, then the BAR would be mapped by VFIO, but this sounds like a dangerous hack :) other than that, yes, I'm afraid it's up to kernel guys to add support for it. Once done, provided IO BARs are meant to be worked with the same way as MEM BARs, there's a good chance it would work out of the box with DPDK.
>

I guess so, I'll give it a try; although before that I need to port /
use your vfio/noiommu patch for virtio pmd driver and perhaps do
dependant changes, I will post my test feedback.

Thanks
> Thanks,
> Anatoly
  
Alex Williamson Dec. 29, 2015, 2:04 p.m. UTC | #27
On Tue, 2015-12-29 at 16:17 +0530, Santosh Shukla wrote:
> On Tue, Dec 29, 2015 at 3:26 PM, Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> > Hi Santosh,
> > 
> > > On Fri, Dec 18, 2015 at 6:25 PM, Santosh Shukla <sshukla@mvista.c
> > > om>
> > > wrote:
> > > > On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu
> > > > <yuanhan.liu@linux.intel.com> wrote:
> > > > > On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla
> > > > > wrote:
> > > > > > > > I guess we have done enough evaluation / investigation
> > > > > > > > that
> > > > > > > > suggest - so to map iopci region to userspace in arch
> > > > > > > > agnostic-way -
> > > > > > > > 
> > > > > > > > # either we need to modify kernel
> > > > > > > >                - Make sure all the non-x86 arch to
> > > > > > > > support
> > > > > > > > mapping for iopci region (i.e. pci_mmap_page_range). I
> > > > > > > > don;t
> > > > > > > > think its a correct approach though.
> > > > > > > >             or
> > > > > > > >                - include /dev/ioport char-mem device
> > > > > > > > file who
> > > > > > > > could do more than byte operation, Note that this
> > > > > > > > implementation
> > > > > > > > does not exist in kernel.  I could send an RFC to lkml.
> > > > > > > 
> > > > > > > Maybe you could propose the two to lkml, to get some
> > > > > > > feedbacks
> > > > > > > from those kernel/ARM gurus? Please cc me if you do so.
> > > > > > > 
> > > > > > 
> > > > > > The latter one I already shared old lkml thread, Pl.
> > > > > > revisit my v1
> > > > > > 0/6 patch [1] and in that refer [2].
> > > > > 
> > > > > Oops, sorry, a bit busy, that I didn't look at it carefully.
> > > > > My bad,
> > > > > anyway.
> > > > > 
> > > > > > Josh has already proposed to lkml but for some reason
> > > > > > thread didn't
> > > > > > went far. I can restart that discussion giving dpdk use-
> > > > > > case as an
> > > > > > example/ requirement.
> > > > > 
> > > > > I had a quick go through of the discussion. Both hpa and Arnd
> > > > > seem to
> > > > > be fine with the ioctl interface on /dev/port. Have you tried
> > > > > that?
> > > > > And if you want to restart it, ioctl might be a better option
> > > > > than
> > > > > /dev/ioport, judging from the discussion.
> > > > > 
> > > > 
> > > > I tried legacy patch and re-writing with ioctl-way; doing
> > > > changes in
> > > > dpdk port as well in kernel, had to test on quite other hw not
> > > > only
> > > > arm64 though! so it will take time for me, I am travelling
> > > > tomorrow so
> > > > bit delayed, We'll post patch to lkml and share dpdk-virtio
> > > > feedback
> > > > perhaps by Monday.
> > > > 
> > > 
> > > I posted a query about /dev/ioports approach in lkml thread [1],
> > > and Arnd
> > > suggested to use vfio framework but it looks like vfio too does
> > > not map
> > > ioresource_io region. Same communicated to Arnd and waiting for
> > > his reply.
> > > 
> > > In mean time I like to ask general question;
> > > - Has anyone tried vfio/non-iommu approach for virtio pmd driver?
> > > If not
> > > then Is there any plan? Someone planning to take up.
> > > [1] https://lkml.org/lkml/2015/12/23/145
> > 
> > I have submitted a patch to support no-iommu mode, but I'm not
> > aware of anyone trying VFIO-noiommu at all. That's probably
> > expected since it's Christmas/New Year in a lot of places, and
> > everyone is on a break.
> > 
> > That said, I'm not sure I completely understand what is it that
> > you're asking about. The code you're referring to (in vfio_pci.c,
> > line 854 as of kernel 4.3) is checking if a PCI BAR is available
> > for IO (hence checking if the IORESOURCE_MEM
> 
> 
> Thanks for reply! You comment might help to move this discuss to next
> level.
> 
> Look at kernel/resource.c, it exports two symbol ioport_resource and
> iomem_resource and sets appropriate flag type i.e.. IORESOURCE_IO and
> IORESOURCE_MEM. In virtio-net case; it creates both pci region i.e..
> _io bar and _mem bar. dpdk virtio pmd driver (<= 0.95 virtio spec)
> uses pci _io bar region for device initialization as virtio headers
> are locate at pci _io bar region. Since it uses pci _iobar region so
> likely it update pci_resource.[index].flag = IORESOURCE_IO.  and vfio
> mmap function wont handle ioresource_io (i guess). And that is why I
> asked same to lkml thread.
> 
> 
> bit is set). There isn't any "ioresource_mem" region as far as VFIO
> is
> concerned, there are only BARs, ROM, VGA and PCI
> config regions (linux/vfio.h, line 314 as of kernel 4.3). So if
> you're
> missing some PCI regions for VFIO to map, they would first need to be
> added to VFIO kernel implementation before they can be used by DPDK.
> That is, unless I'm misunderstanding something :)
> > 
> 
> Agree. As mentioned above, I guess ioresource_io pci bar
> implementation missing in vfio kernel? but before adding code support
> in kernel I would like to hear from experts example, You, Alex!
> (looping alex)

MMIO and I/O port space are simply regions as far as VFIO is concerned.
 The userspace API supports the concept of I/O port regions that can be
mmap'd by userspace, but the implementation does not since I/O port
regions cannot be mmap'd on x86.  Someone needs to propose some code
for vfio-pci to support it.  Thanks,

Alex
  
Santosh Shukla Dec. 29, 2015, 2:51 p.m. UTC | #28
On Tue, Dec 29, 2015 at 7:34 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2015-12-29 at 16:17 +0530, Santosh Shukla wrote:
>> On Tue, Dec 29, 2015 at 3:26 PM, Burakov, Anatoly
>> <anatoly.burakov@intel.com> wrote:
>> > Hi Santosh,
>> >
>> > > On Fri, Dec 18, 2015 at 6:25 PM, Santosh Shukla <sshukla@mvista.c
>> > > om>
>> > > wrote:
>> > > > On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu
>> > > > <yuanhan.liu@linux.intel.com> wrote:
>> > > > > On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla
>> > > > > wrote:
>> > > > > > > > I guess we have done enough evaluation / investigation
>> > > > > > > > that
>> > > > > > > > suggest - so to map iopci region to userspace in arch
>> > > > > > > > agnostic-way -
>> > > > > > > >
>> > > > > > > > # either we need to modify kernel
>> > > > > > > >                - Make sure all the non-x86 arch to
>> > > > > > > > support
>> > > > > > > > mapping for iopci region (i.e. pci_mmap_page_range). I
>> > > > > > > > don;t
>> > > > > > > > think its a correct approach though.
>> > > > > > > >             or
>> > > > > > > >                - include /dev/ioport char-mem device
>> > > > > > > > file who
>> > > > > > > > could do more than byte operation, Note that this
>> > > > > > > > implementation
>> > > > > > > > does not exist in kernel.  I could send an RFC to lkml.
>> > > > > > >
>> > > > > > > Maybe you could propose the two to lkml, to get some
>> > > > > > > feedbacks
>> > > > > > > from those kernel/ARM gurus? Please cc me if you do so.
>> > > > > > >
>> > > > > >
>> > > > > > The latter one I already shared old lkml thread, Pl.
>> > > > > > revisit my v1
>> > > > > > 0/6 patch [1] and in that refer [2].
>> > > > >
>> > > > > Oops, sorry, a bit busy, that I didn't look at it carefully.
>> > > > > My bad,
>> > > > > anyway.
>> > > > >
>> > > > > > Josh has already proposed to lkml but for some reason
>> > > > > > thread didn't
>> > > > > > went far. I can restart that discussion giving dpdk use-
>> > > > > > case as an
>> > > > > > example/ requirement.
>> > > > >
>> > > > > I had a quick go through of the discussion. Both hpa and Arnd
>> > > > > seem to
>> > > > > be fine with the ioctl interface on /dev/port. Have you tried
>> > > > > that?
>> > > > > And if you want to restart it, ioctl might be a better option
>> > > > > than
>> > > > > /dev/ioport, judging from the discussion.
>> > > > >
>> > > >
>> > > > I tried legacy patch and re-writing with ioctl-way; doing
>> > > > changes in
>> > > > dpdk port as well in kernel, had to test on quite other hw not
>> > > > only
>> > > > arm64 though! so it will take time for me, I am travelling
>> > > > tomorrow so
>> > > > bit delayed, We'll post patch to lkml and share dpdk-virtio
>> > > > feedback
>> > > > perhaps by Monday.
>> > > >
>> > >
>> > > I posted a query about /dev/ioports approach in lkml thread [1],
>> > > and Arnd
>> > > suggested to use vfio framework but it looks like vfio too does
>> > > not map
>> > > ioresource_io region. Same communicated to Arnd and waiting for
>> > > his reply.
>> > >
>> > > In mean time I like to ask general question;
>> > > - Has anyone tried vfio/non-iommu approach for virtio pmd driver?
>> > > If not
>> > > then Is there any plan? Someone planning to take up.
>> > > [1] https://lkml.org/lkml/2015/12/23/145
>> >
>> > I have submitted a patch to support no-iommu mode, but I'm not
>> > aware of anyone trying VFIO-noiommu at all. That's probably
>> > expected since it's Christmas/New Year in a lot of places, and
>> > everyone is on a break.
>> >
>> > That said, I'm not sure I completely understand what is it that
>> > you're asking about. The code you're referring to (in vfio_pci.c,
>> > line 854 as of kernel 4.3) is checking if a PCI BAR is available
>> > for IO (hence checking if the IORESOURCE_MEM
>>
>>
>> Thanks for reply! You comment might help to move this discuss to next
>> level.
>>
>> Look at kernel/resource.c, it exports two symbol ioport_resource and
>> iomem_resource and sets appropriate flag type i.e.. IORESOURCE_IO and
>> IORESOURCE_MEM. In virtio-net case; it creates both pci region i.e..
>> _io bar and _mem bar. dpdk virtio pmd driver (<= 0.95 virtio spec)
>> uses pci _io bar region for device initialization as virtio headers
>> are locate at pci _io bar region. Since it uses pci _iobar region so
>> likely it update pci_resource.[index].flag = IORESOURCE_IO.  and vfio
>> mmap function wont handle ioresource_io (i guess). And that is why I
>> asked same to lkml thread.
>>
>>
>> bit is set). There isn't any "ioresource_mem" region as far as VFIO
>> is
>> concerned, there are only BARs, ROM, VGA and PCI
>> config regions (linux/vfio.h, line 314 as of kernel 4.3). So if
>> you're
>> missing some PCI regions for VFIO to map, they would first need to be
>> added to VFIO kernel implementation before they can be used by DPDK.
>> That is, unless I'm misunderstanding something :)
>> >
>>
>> Agree. As mentioned above, I guess ioresource_io pci bar
>> implementation missing in vfio kernel? but before adding code support
>> in kernel I would like to hear from experts example, You, Alex!
>> (looping alex)
>
> MMIO and I/O port space are simply regions as far as VFIO is concerned.
>  The userspace API supports the concept of I/O port regions that can be
> mmap'd by userspace, but the implementation does not since I/O port
> regions cannot be mmap'd on x86.  Someone needs to propose some code
> for vfio-pci to support it.  Thanks,
>

I will work on this and post an RFC soon to LKML, That RFC targetted
to map IOport region for non-x86 platform in particular. Thanks for
confirming the vfio behaviour.

> Alex
  
Santosh Shukla Dec. 31, 2015, 2:27 p.m. UTC | #29
On Tue, Dec 29, 2015 at 8:21 PM, Santosh Shukla <sshukla@mvista.com> wrote:
> On Tue, Dec 29, 2015 at 7:34 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
>> On Tue, 2015-12-29 at 16:17 +0530, Santosh Shukla wrote:
>>> On Tue, Dec 29, 2015 at 3:26 PM, Burakov, Anatoly
>>> <anatoly.burakov@intel.com> wrote:
>>> > Hi Santosh,
>>> >
>>> > > On Fri, Dec 18, 2015 at 6:25 PM, Santosh Shukla <sshukla@mvista.c
>>> > > om>
>>> > > wrote:
>>> > > > On Fri, Dec 18, 2015 at 1:51 PM, Yuanhan Liu
>>> > > > <yuanhan.liu@linux.intel.com> wrote:
>>> > > > > On Fri, Dec 18, 2015 at 01:24:41PM +0530, Santosh Shukla
>>> > > > > wrote:
>>> > > > > > > > I guess we have done enough evaluation / investigation
>>> > > > > > > > that
>>> > > > > > > > suggest - so to map iopci region to userspace in arch
>>> > > > > > > > agnostic-way -
>>> > > > > > > >
>>> > > > > > > > # either we need to modify kernel
>>> > > > > > > >                - Make sure all the non-x86 arch to
>>> > > > > > > > support
>>> > > > > > > > mapping for iopci region (i.e. pci_mmap_page_range). I
>>> > > > > > > > don;t
>>> > > > > > > > think its a correct approach though.
>>> > > > > > > >             or
>>> > > > > > > >                - include /dev/ioport char-mem device
>>> > > > > > > > file who
>>> > > > > > > > could do more than byte operation, Note that this
>>> > > > > > > > implementation
>>> > > > > > > > does not exist in kernel.  I could send an RFC to lkml.
>>> > > > > > >
>>> > > > > > > Maybe you could propose the two to lkml, to get some
>>> > > > > > > feedbacks
>>> > > > > > > from those kernel/ARM gurus? Please cc me if you do so.
>>> > > > > > >
>>> > > > > >
>>> > > > > > The latter one I already shared old lkml thread, Pl.
>>> > > > > > revisit my v1
>>> > > > > > 0/6 patch [1] and in that refer [2].
>>> > > > >
>>> > > > > Oops, sorry, a bit busy, that I didn't look at it carefully.
>>> > > > > My bad,
>>> > > > > anyway.
>>> > > > >
>>> > > > > > Josh has already proposed to lkml but for some reason
>>> > > > > > thread didn't
>>> > > > > > went far. I can restart that discussion giving dpdk use-
>>> > > > > > case as an
>>> > > > > > example/ requirement.
>>> > > > >
>>> > > > > I had a quick go through of the discussion. Both hpa and Arnd
>>> > > > > seem to
>>> > > > > be fine with the ioctl interface on /dev/port. Have you tried
>>> > > > > that?
>>> > > > > And if you want to restart it, ioctl might be a better option
>>> > > > > than
>>> > > > > /dev/ioport, judging from the discussion.
>>> > > > >
>>> > > >
>>> > > > I tried legacy patch and re-writing with ioctl-way; doing
>>> > > > changes in
>>> > > > dpdk port as well in kernel, had to test on quite other hw not
>>> > > > only
>>> > > > arm64 though! so it will take time for me, I am travelling
>>> > > > tomorrow so
>>> > > > bit delayed, We'll post patch to lkml and share dpdk-virtio
>>> > > > feedback
>>> > > > perhaps by Monday.
>>> > > >
>>> > >
>>> > > I posted a query about /dev/ioports approach in lkml thread [1],
>>> > > and Arnd
>>> > > suggested to use vfio framework but it looks like vfio too does
>>> > > not map
>>> > > ioresource_io region. Same communicated to Arnd and waiting for
>>> > > his reply.
>>> > >
>>> > > In mean time I like to ask general question;
>>> > > - Has anyone tried vfio/non-iommu approach for virtio pmd driver?
>>> > > If not
>>> > > then Is there any plan? Someone planning to take up.
>>> > > [1] https://lkml.org/lkml/2015/12/23/145
>>> >
>>> > I have submitted a patch to support no-iommu mode, but I'm not
>>> > aware of anyone trying VFIO-noiommu at all. That's probably
>>> > expected since it's Christmas/New Year in a lot of places, and
>>> > everyone is on a break.
>>> >
>>> > That said, I'm not sure I completely understand what is it that
>>> > you're asking about. The code you're referring to (in vfio_pci.c,
>>> > line 854 as of kernel 4.3) is checking if a PCI BAR is available
>>> > for IO (hence checking if the IORESOURCE_MEM
>>>
>>>
>>> Thanks for reply! You comment might help to move this discuss to next
>>> level.
>>>
>>> Look at kernel/resource.c, it exports two symbol ioport_resource and
>>> iomem_resource and sets appropriate flag type i.e.. IORESOURCE_IO and
>>> IORESOURCE_MEM. In virtio-net case; it creates both pci region i.e..
>>> _io bar and _mem bar. dpdk virtio pmd driver (<= 0.95 virtio spec)
>>> uses pci _io bar region for device initialization as virtio headers
>>> are locate at pci _io bar region. Since it uses pci _iobar region so
>>> likely it update pci_resource.[index].flag = IORESOURCE_IO.  and vfio
>>> mmap function wont handle ioresource_io (i guess). And that is why I
>>> asked same to lkml thread.
>>>
>>>
>>> bit is set). There isn't any "ioresource_mem" region as far as VFIO
>>> is
>>> concerned, there are only BARs, ROM, VGA and PCI
>>> config regions (linux/vfio.h, line 314 as of kernel 4.3). So if
>>> you're
>>> missing some PCI regions for VFIO to map, they would first need to be
>>> added to VFIO kernel implementation before they can be used by DPDK.
>>> That is, unless I'm misunderstanding something :)
>>> >
>>>
>>> Agree. As mentioned above, I guess ioresource_io pci bar
>>> implementation missing in vfio kernel? but before adding code support
>>> in kernel I would like to hear from experts example, You, Alex!
>>> (looping alex)
>>
>> MMIO and I/O port space are simply regions as far as VFIO is concerned.
>>  The userspace API supports the concept of I/O port regions that can be
>> mmap'd by userspace, but the implementation does not since I/O port
>> regions cannot be mmap'd on x86.  Someone needs to propose some code
>> for vfio-pci to support it.  Thanks,
>>
>
> I will work on this and post an RFC soon to LKML, That RFC targetted
> to map IOport region for non-x86 platform in particular. Thanks for
> confirming the vfio behaviour.
>

Some update:

Patches used for setup -
- Used Alex(s) vfio-noiommu linux-next patch
- then used Anatoly(s) vfio-noiommu dpdk patch [1]
- Added ioport pci bar map code support in kernel
- Did similar change at dpdk(s) eal_pci_vfio.c file

I could able to run testpmd on virtio-net dpdk's pmd driver..
I am cleaning the patches / changes in kernel and dpdk, soon posting
the v3 patchset..

Thanks for the input!

[1] http://dpdk.org/dev/patchwork/patch/9619/

>> Alex
  

Patch

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 334c12e..8aaab4a 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -105,7 +105,8 @@  extern struct pci_device_list pci_device_list; /**< Global list of PCI devices.
 /** Nb. of values in PCI resource format. */
 #define PCI_RESOURCE_FMT_NVAL 3
 
-/** IO resource type: memory address space */
+/** IO resource type: */
+#define IORESOURCE_IO         0x00000100
 #define IORESOURCE_MEM        0x00000200
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bc5b5be..9c4651d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -236,12 +236,21 @@  pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
 			goto error;
 		}
 
-		if (flags & IORESOURCE_MEM) {
-			dev->mem_resource[i].phys_addr = phys_addr;
-			dev->mem_resource[i].len = end_addr - phys_addr + 1;
-			/* not mapped for now */
-			dev->mem_resource[i].addr = NULL;
-		}
+		/* we only care about IORESOURCE_IO or IORESOURCE_MEM */
+		if (!(flags & IORESOURCE_IO) &&
+		    !(flags & IORESOURCE_MEM))
+			continue;
+
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
+		/* x86 can not remap ioports, so skip it, remapping code will
+		 * look at dev->mem_resource[i].phys_addr == 0 and skip it */
+		if (flags & IORESOURCE_IO)
+			continue;
+#endif
+		dev->mem_resource[i].phys_addr = phys_addr;
+		dev->mem_resource[i].len = end_addr - phys_addr + 1;
+		/* not mapped for now */
+		dev->mem_resource[i].addr = NULL;
 	}
 	fclose(f);
 	return 0;