[v2] bus/pci: resolve multiple NICs address conflicts

Message ID 78A93308629D474AA53B84C5879E84D24B102602@DGGEMM533-MBX.china.huawei.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series [v2] bus/pci: resolve multiple NICs address conflicts |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK
ci/travis-robot success Travis build: passed
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-compilation success Compile Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Wangyu (Eric) Nov. 5, 2019, 7:26 a.m. UTC
  On 64K-pagesize systems, NIC addresses conflict when multiple NICs are used:
the system mmaps a full 64K page for each NIC BAR,
but dev->mem_resource[i].len is only 16K.

Signed-off-by: Beard-627 <dengxiaofeng@huawei.com>
Signed-off-by: Eric wang <seven.wangyu@huawei.com>
Acked-by: Wei Hu <xavier.huwei@huawei.com>
Acked-by: Min Hu <humin29@huawei.com>
---
 drivers/bus/pci/linux/pci.c | 5 +++++
 1 file changed, 5 insertions(+)

--
1.8.3.1
  

Comments

David Marchand Nov. 5, 2019, 2:33 p.m. UTC | #1
On Tue, Nov 5, 2019 at 8:27 AM Wangyu (Turing Solution Development
Dep) <seven.wangyu@huawei.com> wrote:
>
>
> NIC address conflicts on 64K pagesize when using multiple NICs,
> as system will mmap 64K pagesize for NIC,
> but dev->mem_resource[i].len is 16K.

Please, can you describe the problem you want to fix?
Is this a problem specific to a pci device you are using?

Thanks.

>
> Signed-off-by: Beard-627 <dengxiaofeng@huawei.com>
> Signed-off-by: Eric wang <seven.wangyu@huawei.com>
> Acked-by: Wei Hu <xavier.huwei@huawei.com>
> Acked-by: Min Hu <humin29@huawei.com>
> ---
>  drivers/bus/pci/linux/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 43debaa..afaa68d 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -201,6 +201,11 @@
>                 if (flags & IORESOURCE_MEM) {
>                         dev->mem_resource[i].phys_addr = phys_addr;
>                         dev->mem_resource[i].len = end_addr - phys_addr + 1;
> +                       if (dev->mem_resource[i].len <
> +                               (unsigned int)getpagesize())
> +
> +                               dev->mem_resource[i].len =
> +                                       (unsigned int)getpagesize();
>                         /* not mapped for now */
>                         dev->mem_resource[i].addr = NULL;
>                 }
> --
> 1.8.3.1
  
Wangyu (Eric) Nov. 6, 2019, 6:15 a.m. UTC | #2
On a 64K-pagesize system, DPDK reads the size a NIC needs from uio/uio1/maps/map1/size. When that size is smaller than the page size (e.g., 16K for the 82599), dev->mem_resource[i].len is 16K, but mmap() reserves at least one full page, which is 64K.
When the second NIC is then mapped, its start address is the first NIC's address + 16K, which is already in use by the first NIC's mapping.
So if we change the next address to the first NIC's address + 64K, the problem is solved.

-----Original Message-----
From: David Marchand [mailto:david.marchand@redhat.com]
Sent: November 5, 2019 22:33
To: Wangyu (Turing Solution Development Dep) <seven.wangyu@huawei.com>
Cc: dev@dpdk.org; ferruh.yigit@intel.com; Linuxarm <linuxarm@huawei.com>; humin (Q) <humin29@huawei.com>; Liyuan (Larry) <Larry.T@huawei.com>; dengxiaofeng <dengxiaofeng@huawei.com>
Subject: Re: [dpdk-dev] [PATCH v2] bus/pci: resolve multiple NICs address conflicts

On Tue, Nov 5, 2019 at 8:27 AM Wangyu (Turing Solution Development
Dep) <seven.wangyu@huawei.com> wrote:
>
>
> NIC address conflicts on 64K pagesize when using multiple NICs, as 
> system will mmap 64K pagesize for NIC, but dev->mem_resource[i].len is 
> 16K.

Please, can you describe the problem you want to fix?
Is this a problem specific to a pci device you are using?

Thanks.

>
> Signed-off-by: Beard-627 <dengxiaofeng@huawei.com>
> Signed-off-by: Eric wang <seven.wangyu@huawei.com>
> Acked-by: Wei Hu <xavier.huwei@huawei.com>
> Acked-by: Min Hu <humin29@huawei.com>
> ---
>  drivers/bus/pci/linux/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c 
> index 43debaa..afaa68d 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -201,6 +201,11 @@
>                 if (flags & IORESOURCE_MEM) {
>                         dev->mem_resource[i].phys_addr = phys_addr;
>                         dev->mem_resource[i].len = end_addr - 
> phys_addr + 1;
> +                       if (dev->mem_resource[i].len <
> +                               (unsigned int)getpagesize())
> +
> +                               dev->mem_resource[i].len =
> +                                       (unsigned int)getpagesize();
>                         /* not mapped for now */
>                         dev->mem_resource[i].addr = NULL;
>                 }
> --
> 1.8.3.1


--
David Marchand
  
David Marchand Nov. 6, 2019, 7:37 a.m. UTC | #3
On Wed, Nov 6, 2019 at 7:16 AM Wangyu (Eric) <seven.wangyu@huawei.com> wrote:
>
>
> In 64K pagesize system, DPDK will read the size NIC need in uio/uio1/maps/map1/size,  when the size small than pagesize(e.g.,82599 is 16K), dev->mem_resource[i].len will be 16K, but the mmap function applies for at least 1 page size, which is 64K.
> Then second NIC mmap, start address is first NIC address + 16K, which already used by first NIC.

Do you see this issue with vfio?


> So if change the size to first NIC address + 64K, problem solved.

You are hacking a description of the device resources to workaround a problem.
This patch is a no go for me.

Maybe there is something to do with the hint passed to mmap in uio case.
Adding Anatoly to the loop.
  
Burakov, Anatoly Nov. 6, 2019, 10:29 a.m. UTC | #4
On 06-Nov-19 6:15 AM, Wangyu (Eric) wrote:
> 
> In 64K pagesize system, DPDK will read the size NIC need in uio/uio1/maps/map1/size,  when the size small than pagesize(e.g.,82599 is 16K), dev->mem_resource[i].len will be 16K, but the mmap function applies for at least 1 page size, which is 64K.
> Then second NIC mmap, start address is first NIC address + 16K, which already used by first NIC.
> So if change the size to first NIC address + 64K, problem solved.
> 

It seems like something that should be fixed at the mapping stage, 
rather than modifying length of the resources.
  
Burakov, Anatoly Nov. 6, 2019, 10:35 a.m. UTC | #5
On 06-Nov-19 7:37 AM, David Marchand wrote:
> On Wed, Nov 6, 2019 at 7:16 AM Wangyu (Eric) <seven.wangyu@huawei.com> wrote:
>>
>>
>> In 64K pagesize system, DPDK will read the size NIC need in uio/uio1/maps/map1/size,  when the size small than pagesize(e.g.,82599 is 16K), dev->mem_resource[i].len will be 16K, but the mmap function applies for at least 1 page size, which is 64K.
>> Then second NIC mmap, start address is first NIC address + 16K, which already used by first NIC.
> 
> Do you see this issue with vfio?
> 
> 
>> So if change the size to first NIC address + 64K, problem solved.
> 
> You are hacking a description of the device resources to workaround a problem.
> This patch is a no go for me.
> 
> Maybe there is something to do with the hint passed to mmap in uio case.
> Adding Anatoly to the loop.
> 

We map BARs with arbitrary addresses, so I don't think we can do much 
here - if mmap() returns page-unaligned addresses on platforms with 64K 
pages, it's not our fault.

That said, we could simply page-align mapping requests.
  
Burakov, Anatoly Nov. 6, 2019, 11:14 a.m. UTC | #6
On 06-Nov-19 7:37 AM, David Marchand wrote:
> On Wed, Nov 6, 2019 at 7:16 AM Wangyu (Eric) <seven.wangyu@huawei.com> wrote:
>>
>>
>> In 64K pagesize system, DPDK will read the size NIC need in uio/uio1/maps/map1/size,  when the size small than pagesize(e.g.,82599 is 16K), dev->mem_resource[i].len will be 16K, but the mmap function applies for at least 1 page size, which is 64K.
>> Then second NIC mmap, start address is first NIC address + 16K, which already used by first NIC.
> 
> Do you see this issue with vfio?
> 
> 
>> So if change the size to first NIC address + 64K, problem solved.
> 
> You are hacking a description of the device resources to workaround a problem.
> This patch is a no go for me.
> 
> Maybe there is something to do with the hint passed to mmap in uio case.
> Adding Anatoly to the loop.
> 
> 

I did a quick code inspection for VFIO and UIO. We do the same thing in 
both, so both code paths can be for all intents and purposes considered 
equivalent.

To reserve mappings for addresses, we start at some arbitrary address 
(find_max_va_end()), and start mapping from there. Then, we do an mmap() 
*and overwrite* whatever address we expected to get, and then the next 
address is (current.addr + current.len).

The mmap() is called without MAP_FIXED, so we get an address the kernel 
feels comfortable for us to get. Meaning, even if the initial address 
hint was not page-aligned, the return value from mmap() will be 
page-aligned. It seems to me that your platform/kernel does not do that, 
and allows mmap() to return page-unaligned addresses. I would strongly 
suggest checking the mmap() return address on your platform (in either 
UIO or VFIO - they both do it about the same way).

We could work around that by doing (next_addr = 
RTE_PTR_ALIGN(current.addr + current.len, pagesize)), but to me it seems 
like a bug in your kernel/mmap() implementation. This is an easy fix 
though, and I'm sure we can put in a workaround like I described.
  
Wangyu (Eric) Nov. 7, 2019, 3:17 a.m. UTC | #7
Hi, Anatoly

> We map BARs with arbitrary addresses, so I don't think we can do much here - if mmap() returns page-unaligned addresses on platforms with 64K pages, it's not our fault.

Maybe we have some misunderstanding here. mmap() returns page-aligned addresses on my system, but the mmap() man page says: "offset must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE)".
So mmap() must reserve one or more full pages, which on a 64K system is larger than 16K. When the second NIC's address is calculated (first NIC address + 16K), the problem arises. I would rather describe it as a mismatch between the software and the system.

We are using CentOS 7.6.1810 (AltArch) on an arm server, where the default page size is 64K, so we can't do much about that here.


Best Regards,
Xiaofeng Deng

Turing Business Unit, HiSilicon, HUAWEI
Mobile: (+86)18219206880
Email: dengxiaofeng@huawei.com

-----Original Message-----
From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
Sent: November 6, 2019 18:35
To: David Marchand <david.marchand@redhat.com>; Wangyu (Eric) <seven.wangyu@huawei.com>
Cc: dengxiaofeng <dengxiaofeng@huawei.com>; dev@dpdk.org; ferruh.yigit@intel.com; Linuxarm <linuxarm@huawei.com>; humin (Q) <humin29@huawei.com>; Liyuan (Larry) <Larry.T@huawei.com>
Subject: Re: [dpdk-dev] [PATCH v2] bus/pci: resolve multiple NICs address conflicts

On 06-Nov-19 7:37 AM, David Marchand wrote:
> On Wed, Nov 6, 2019 at 7:16 AM Wangyu (Eric) <seven.wangyu@huawei.com> wrote:
>>
>>
>> In 64K pagesize system, DPDK will read the size NIC need in uio/uio1/maps/map1/size,  when the size small than pagesize(e.g.,82599 is 16K), dev->mem_resource[i].len will be 16K, but the mmap function applies for at least 1 page size, which is 64K.
>> Then second NIC mmap, start address is first NIC address + 16K, which already used by first NIC.
> 
> Do you see this issue with vfio?
> 
> 
>> So if change the size to first NIC address + 64K, problem solved.
> 
> You are hacking a description of the device resources to workaround a problem.
> This patch is a no go for me.
> 
> Maybe there is something to do with the hint passed to mmap in uio case.
> Adding Anatoly to the loop.
>
  
Wangyu (Eric) Nov. 7, 2019, 5:44 a.m. UTC | #8
Hi, Anatoly

Thank you for the advice. This problem can happen with both VFIO and UIO; I will modify both according to your advice and test them.

I did some tests with mmap() on my system: when I provided an address that was not page-aligned, mmap() still returned a page-aligned address, but the code then failed because the address mmap() returned was not equal to the address I provided (the problem occurs in pci_uio_map_secondary()).


-----Original Message-----
From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
Sent: November 6, 2019 19:15
To: David Marchand <david.marchand@redhat.com>; Wangyu (Eric) <seven.wangyu@huawei.com>
Cc: dengxiaofeng <dengxiaofeng@huawei.com>; dev@dpdk.org; ferruh.yigit@intel.com; Linuxarm <linuxarm@huawei.com>; humin (Q) <humin29@huawei.com>; Liyuan (Larry) <Larry.T@huawei.com>
Subject: Re: [dpdk-dev] [PATCH v2] bus/pci: resolve multiple NICs address conflicts

On 06-Nov-19 7:37 AM, David Marchand wrote:
> On Wed, Nov 6, 2019 at 7:16 AM Wangyu (Eric) <seven.wangyu@huawei.com> wrote:
>>
>>
>> In 64K pagesize system, DPDK will read the size NIC need in uio/uio1/maps/map1/size,  when the size small than pagesize(e.g.,82599 is 16K), dev->mem_resource[i].len will be 16K, but the mmap function applies for at least 1 page size, which is 64K.
>> Then second NIC mmap, start address is first NIC address + 16K, which already used by first NIC.
> 
> Do you see this issue with vfio?
> 
> 
>> So if change the size to first NIC address + 64K, problem solved.
> 
> You are hacking a description of the device resources to workaround a problem.
> This patch is a no go for me.
> 
> Maybe there is something to do with the hint passed to mmap in uio case.
> Adding Anatoly to the loop.
> 
> 

I did a quick code inspection for VFIO and UIO. We do the same thing in both, so both code paths can be for all intents and purposes considered equivalent.

To reserve mappings for addresses, we start at some arbitrary address (find_max_va_end()), and start mapping from there. Then, we do an mmap() *and overwrite* whatever address we expected to get, and then the next address is (current.addr + current.len).

The mmap() is called without MAP_FIXED, so we get an address the kernel feels comfortable for us to get. Meaning, even if the initial address hint was not page-aligned, the return value from mmap() will be page-aligned. It seems to me that your platform/kernel does not do that, and allows mmap() to return page-unaligned addresses. I would strongly suggest checking the mmap() return address on your platform (in either UIO or VFIO - they both do it about the same way).

We could work around that by doing (next_addr = RTE_PTR_ALIGN(current.addr + current.len, pagesize)), but to me it seems like a bug in your kernel/mmap() implementation. This is an easy fix though, and i'm sure we can put in a workaround like i described.

--
Thanks,
Anatoly
  
Burakov, Anatoly Nov. 7, 2019, 12:24 p.m. UTC | #9
On 07-Nov-19 5:44 AM, Wangyu (Eric) wrote:
> Hi, Anatoly
> 
> Thank you for advices. This problem will happen in both VFIO and UIO, I will modify both according to your advices and test them.
> 
> I did some tests with mmap() on my system, when I provided address not page-aligned, mmap() could return page-aligned address too, but the code will return fault because mmap() return address was not equal with address I provided(problem occurs in pci_uio_map_secondary()).
> 

I still don't understand how you get addresses aligned on a 16K 
boundary with a 64K page size.

The mapping process is as follows:

0) start with max_va_end, or with the previous address + previous len
1) reserve virtual area with mmap() (accepts any return address)
2) map the BAR with MAP_FIXED (checks return address, but should work 
because we already have that area reserved)

The error you're referring to would've happened at step 2 (MAP_FIXED 
with unaligned addresses will cause the mmap() to fail), but at that 
point we already have a valid virtual area for the BAR. If you get a 
16K-aligned address for the BAR, you get it at step 1, not step 2.

So, if, by your own admission, your mmap() implementation does return a 
64K-aligned address... What exactly is the issue then? How does your BAR 
end up with an invalid address?
  
Wangyu (Eric) Nov. 11, 2019, 9:37 a.m. UTC | #10
Sorry, I didn't explain it clearly; let me walk through the problem step by step.

Precondition: we have a 64K-page-size system and two 82599 NICs. The memory required for each NIC is as follows:
Map0: size = 0x0000000000400000
Map1: size = 0x0000000000004000

1. The primary process starts and mmap()s the first NIC's map0: input address 0x8202000000, output address 0x8202000000, size 0x0000000000400000, next_addr 0x8202400000. mmap() executes correctly.

2. The primary process mmap()s the first NIC's map1: input address 0x8202400000, output address 0x8202400000, size 0x0000000000004000, next_addr 0x8202404000. mmap() actually maps from 0x8202400000 to 0x8202410000 (because the page size is 64K), yet next_addr is only 0x8202404000. mmap() still executes correctly.

3. The primary process mmap()s the second NIC's map0: the input address 0x8202404000 conflicts with the previous mapping, so the output address is 0xffffbcdc0000 (assigned by the system), size 0x0000000000400000, next_addr 0xffffbd1c0000. The address is now abnormal, although mmap() executed correctly.

4. The primary process mmap()s the second NIC's map1: input address 0xffffbd1c0000, output address 0xffffbcdb0000 (assigned by the system), size 0x0000000000004000. The address is now abnormal, although mmap() executed correctly.

5. The secondary process starts and mmap()s the first NIC's map0; this works normally.

6. The secondary process mmap()s the first NIC's map1; this works normally.

7. The secondary process mmap()s the second NIC's map0: the input address 0xffffbcdc0000 conflicts in the secondary process, so it gets a different address. The secondary process checks whether the input address equals the output address; since they differ, it exits with "Cannot mmap device resource file %s to address: %p".


Now I use (next_addr = RTE_PTR_ALIGN(current.addr + current.len, pagesize)) to solve the problem, and it works. If this is right, I will submit a patch later.

By the way, I made a mistake: the problem won't happen with VFIO, because VFIO doesn't map the 16K region separately; it only maps the 4M region (map0). But I think VFIO should also be modified.


-----Original Message-----
From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
Sent: November 7, 2019 20:25
To: Wangyu (Eric) <seven.wangyu@huawei.com>; David Marchand <david.marchand@redhat.com>
Cc: dev@dpdk.org; ferruh.yigit@intel.com; Linuxarm <linuxarm@huawei.com>; humin (Q) <humin29@huawei.com>; Liyuan (Larry) <Larry.T@huawei.com>; dengxiaofeng <dengxiaofeng@huawei.com>
Subject: Re: Re: [dpdk-dev] [PATCH v2] bus/pci: resolve multiple NICs address conflicts

On 07-Nov-19 5:44 AM, Wangyu (Eric) wrote:
> Hi, Anatoly
> 
> Thank you for advices. This problem will happen in both VFIO and UIO, I will modify both according to your advices and test them.
> 
> I did some tests with mmap() on my system, when I provided address not page-aligned, mmap() could return page-aligned address too, but the code will return fault because mmap() return address was not equal with address I provided(problem occurs in pci_uio_map_secondary()).
> 

I still don't understand how do you get addresses aligned on a 16K boundary with 64K page size.

The mapping process is as follows:

0) start with max_va_end, or with previous addres + previous len
1) reserve virtual area with mmap() (accepts any return address)
2) map the BAR with MAP_FIXED (checks return address, but should work because we already have that area reserved)

The error you're referring to would've happened at step 2 (MAP_FIXED with unaligned addresses will cause the mmap() to fail), but at that point we already have a valid virtual area for the bar. If you get a 16K-aligned page address for the BAR, you get it on step 1, not step 2.

So, if, by your own admission, your mmap() implementation does return a 64K-aligned address... What exactly is the issue then? How does your BAR end up with an invalid address?

--
Thanks,
Anatoly
  
Burakov, Anatoly Nov. 11, 2019, 1:07 p.m. UTC | #11
On 11-Nov-19 9:37 AM, Wangyu (Eric) wrote:
> 
> Sorry, I didn't explain it clearly, and I will explain this problem step by step.
> 
> Precondition, we have a 64K page size system and two 82599 NICs. The memory required for each NIC is as follows:
> Map0 : size = 0x0000000000400000
> Map1 : size = 0x0000000000004000
> 
> 1. Primary process start, process mmap() first NIC map0, mmap()'s input address is 0x8202000000, and output address is 0x8202000000, size is 0x0000000000400000, next_addr is 0x8202400000, mmap() executed correctly.
> 
> 2. Primary mmap() first NIC map1, mmap()'s input address is 0x8202400000, and output address is 0x8202400000, size is 0x0000000000004000, next_addr is 0x8202404000, now mmap() applied from 0x8202400000 to 0x8202410000 actually(because page size is 64K), and next_addr is 0x8202404000, but mmap() executed correctly.
> 
> 3. Primary mmap() second NIC map0, mmap()'s input address is 0x8202404000, but it's conflict, so output address is 0xffffbcdc0000(system assigned), size is 0x0000000000400000, next_addr is 0xffffbd1c0000, now the address is abnormal, and mmap() executed correctly.
> 
> 4. Primary mmap() second NIC map1, mmap()'s input address is 0xffffbd1c0000, and output address is 0xffffbcdb0000 (system assigned), size is 0x0000000000004000, now the address is abnormal, and mmap() executed correctly.
> 
> 5. Secondary process start, process mmap() first NIC map0, it's normal.
> 
> 6. Secondary process mmap() first NIC map1, it's normal.
> 
> 7. Secondary process mmap() second NIC map0, mmap()'s input address is 0xffffbcdc0000, but it's conflict on secondary process, so we get another address, but secondary will check if the input address is equal with output address, it's not equal, so secondary will exit with " Cannot mmap device resource file %s to address: %p ".
> 
> 
> Now I use (next_addr = RTE_PTR_ALIGN(current.addr + current.len, pagesize)) to solve the problem, and it worked. If it is right, I will submit a patch later.
> 
> By the way, I made a mistake, the problem won't happen on VFIO, because VFIO don't apply for 16K memory, only apply for 4M size(map0).But I think VFIO also needs to be modified.
> 

OK, that makes more sense. In that case, aligning the next address on a 
page boundary is the right approach.
  
Gavin Hu Nov. 12, 2019, 6:37 a.m. UTC | #12
Hi Eric,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Wangyu (Eric)
> Sent: Monday, November 11, 2019 5:38 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev@dpdk.org; ferruh.yigit@intel.com; Linuxarm
> <linuxarm@huawei.com>; humin (Q) <humin29@huawei.com>; Liyuan
> (Larry) <Larry.T@huawei.com>; dengxiaofeng <dengxiaofeng@huawei.com>
> Subject: [dpdk-dev] Re: Re: [PATCH v2] bus/pci: resolve multiple NICs
> address conflicts
> 
> 
> Sorry, I didn't explain it clearly, and I will explain this problem step by step.
> 
> Precondition, we have a 64K page size system and two 82599 NICs. The
> memory required for each NIC is as follows:
> Map0 : size = 0x0000000000400000
> Map1 : size = 0x0000000000004000
> 
> 1. Primary process start, process mmap() first NIC map0, mmap()'s input
> address is 0x8202000000, and output address is 0x8202000000, size is
> 0x0000000000400000, next_addr is 0x8202400000, mmap() executed
> correctly.
> 
> 2. Primary mmap() first NIC map1, mmap()'s input address is 0x8202400000,
> and output address is 0x8202400000, size is 0x0000000000004000,
> next_addr is 0x8202404000, now mmap() applied from 0x8202400000 to
> 0x8202410000 actually(because page size is 64K), and next_addr is
> 0x8202404000, but mmap() executed correctly.
So the problem begins here: next_addr should be 0x8202400000 + 64K (rather than 16K) = 0x8202410000, taking the real mapping size (the page size) into account.
> 
> 3. Primary mmap() second NIC map0, mmap()'s input address is
> 0x8202404000, but it's conflict, so output address is 0xffffbcdc0000(system
> assigned), size is 0x0000000000400000, next_addr is 0xffffbd1c0000, now
> the address is abnormal, and mmap() executed correctly.
If step 2 computed next_addr correctly, this step would proceed with correct addresses.
> 
> 4. Primary mmap() second NIC map1, mmap()'s input address is
> 0xffffbd1c0000, and output address is 0xffffbcdb0000 (system assigned),
> size is 0x0000000000004000, now the address is abnormal, and mmap()
> executed correctly.
> 
> 5. Secondary process start, process mmap() first NIC map0, it's normal.
> 
> 6. Secondary process mmap() first NIC map1, it's normal.
> 
> 7. Secondary process mmap() second NIC map0, mmap()'s input address is
> 0xffffbcdc0000, but it's conflict on secondary process, so we get another
> address, but secondary will check if the input address is equal with output
> address, it's not equal, so secondary will exit with " Cannot mmap device
> resource file %s to address: %p ".
> 
> 
> Now I use (next_addr = RTE_PTR_ALIGN(current.addr + current.len,
> pagesize)) to solve the problem, and it worked. If it is right, I will submit a
> patch later.
> 
> By the way, I made a mistake, the problem won't happen on VFIO, because
> VFIO don't apply for 16K memory, only apply for 4M size(map0).But I think
> VFIO also needs to be modified.
> 
> 
> -----Original Message-----
> From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
> Sent: November 7, 2019 20:25
> To: Wangyu (Eric) <seven.wangyu@huawei.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev@dpdk.org; ferruh.yigit@intel.com; Linuxarm
> <linuxarm@huawei.com>; humin (Q) <humin29@huawei.com>; Liyuan
> (Larry) <Larry.T@huawei.com>; dengxiaofeng <dengxiaofeng@huawei.com>
> Subject: Re: Re: [dpdk-dev] [PATCH v2] bus/pci: resolve multiple NICs
> address conflicts
> 
> On 07-Nov-19 5:44 AM, Wangyu (Eric) wrote:
> > Hi, Anatoly
> >
> > Thank you for advices. This problem will happen in both VFIO and UIO, I
> will modify both according to your advices and test them.
> >
> > I did some tests with mmap() on my system, when I provided address not
> page-aligned, mmap() could return page-aligned address too, but the code
> will return fault because mmap() return address was not equal with address
> I provided(problem occurs in pci_uio_map_secondary()).
> >
> 
> I still don't understand how do you get addresses aligned on a 16K boundary
> with 64K page size.
> 
> The mapping process is as follows:
> 
> 0) start with max_va_end, or with previous addres + previous len
> 1) reserve virtual area with mmap() (accepts any return address)
> 2) map the BAR with MAP_FIXED (checks return address, but should work
> because we already have that area reserved)
> 
> The error you're referring to would've happened at step 2 (MAP_FIXED with
> unaligned addresses will cause the mmap() to fail), but at that point we
> already have a valid virtual area for the bar. If you get a 16K-aligned page
> address for the BAR, you get it on step 1, not step 2.
> 
> So, if, by your own admission, your mmap() implementation does return a
> 64K-aligned address... What exactly is the issue then? How does your BAR
> end up with an invalid address?
> 
> --
> Thanks,
> Anatoly
  

Patch

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 43debaa..afaa68d 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -201,6 +201,11 @@ 
 		if (flags & IORESOURCE_MEM) {
 			dev->mem_resource[i].phys_addr = phys_addr;
 			dev->mem_resource[i].len = end_addr - phys_addr + 1;
+			if (dev->mem_resource[i].len <
+				(unsigned int)getpagesize())
+
+				dev->mem_resource[i].len =
+					(unsigned int)getpagesize();
 			/* not mapped for now */
 			dev->mem_resource[i].addr = NULL;
 		}