[dpdk-dev] vhost-user technical issues

Message ID 546579A3.3010804@igel.co.jp (mailing list archive)
State Rejected, archived

Commit Message

Tetsuya Mukawa Nov. 14, 2014, 3:40 a.m. UTC
  Hi Lin,

(2014/11/14 12:13), Linhaifeng wrote:
>
> size should be same as mmap and
> guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
>

Thanks, it should be.
How about the following patch?

-------------------------------------------------------
#define QEMU_CMD_CHR " -chardev socket,id=chr0,path=%s"
#define QEMU_CMD_NETDEV " -netdev
vhost-user,id=net0,chardev=chr0,vhostforce"
@@ -221,13 +221,16 @@ static void read_guest_mem(void)

/* check for sanity */
g_assert_cmpint(fds_num, >, 0);
- g_assert_cmpint(fds_num, ==, memory.nregions);
+ //g_assert_cmpint(fds_num, ==, memory.nregions);

+ fprintf(stderr, "%s(%d)\n", __func__, __LINE__);
/* iterate all regions */
for (i = 0; i < fds_num; i++) {
+ int ret = 0;

/* We'll check only the region statring at 0x0*/
- if (memory.regions[i].guest_phys_addr != 0x0) {
+ if (memory.regions[i].guest_phys_addr == 0x0) {
+ close(fds[i]);
continue;
}

@@ -237,6 +240,7 @@ static void read_guest_mem(void)

guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fds[i], 0);
+ fprintf(stderr, "region=%d, mmap=%p, size=%lu\n", i, guest_mem, size);

g_assert(guest_mem != MAP_FAILED);
guest_mem += (memory.regions[i].mmap_offset / sizeof(*guest_mem));
@@ -247,8 +251,10 @@ static void read_guest_mem(void)

g_assert_cmpint(a, ==, b);
}
-
- munmap(guest_mem, memory.regions[i].memory_size);
+ guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
+ ret = munmap(guest_mem, memory.regions[i].memory_size);
+ fprintf(stderr, "region=%d, munmap=%p, size=%lu, ret=%d\n",
+ i, guest_mem, size, ret);
}

g_assert_cmpint(1, ==, 1);
-------------------------------------------------------
I am using a 1GB hugepage size.

$ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
region=0, mmap=0x2aaac0000000, size=6291456000
region=0, munmap=0x2aaac0000000, size=6291456000, ret=-1 << failed

6291456000 is not aligned to 1GB.
When I specify 4096MB as the guest memory size, munmap() does not return
an error, as shown below.

$ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
region=0, mmap=0x2aaac0000000, size=4294967296
region=0, munmap=0x2aaac0000000, size=4294967296, ret=0
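
The difference between the two runs can be checked with plain arithmetic. This is only a minimal sketch, assuming the 1GB hugepage size mentioned above; is_aligned() and HUGEPAGE_SIZE_1G are hypothetical names used for illustration and are not part of vhost-user-test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1GB hugepages, as in the runs shown above. */
#define HUGEPAGE_SIZE_1G (UINT64_C(1) << 30)

/* Hypothetical helper: true if size is a multiple of align.
 * align must be a non-zero power of two. */
static bool is_aligned(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size & (align - 1)) == 0;
}
```

6000MB is 6291456000 bytes, which is 5 * 1GB plus 922746880 bytes, so is_aligned(6291456000, HUGEPAGE_SIZE_1G) is false. 4096MB is exactly 4 * 1GB, so the same check is true for 4294967296.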

Thanks,
Tetsuya
  

Comments

Tetsuya Mukawa Nov. 14, 2014, 4:05 a.m. UTC | #1
(2014/11/14 12:40), Tetsuya Mukawa wrote:
> I am using 1GB hugepage size.
>
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac0000000, size=6291456000
> region=0, munmap=0x2aaac0000000, size=6291456000, ret=-1 << failed
>
> 6291456000 is not aligned by 1GB.
> When I specify 4096MB as guest memory size, munmap() doesn't return
> error like following.
>
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac0000000, size=4294967296
> region=0, munmap=0x2aaac0000000, size=4294967296, ret=0
>
>
I have also checked the mmap2 and munmap implementations in the current Linux kernel.
When a file on hugetlbfs is mapped, 'size' is aligned to the hugepage
size in some cases.
But when munmap is called, 'size' is aligned to PAGE_SIZE.
This means we cannot use the same 'size' value for mmap and munmap in some cases.
I guess this implementation or specification causes the munmap issue.

Thanks,
Tetsuya
  
Linhaifeng Nov. 14, 2014, 4:42 a.m. UTC | #2
On 2014/11/14 11:40, Tetsuya Mukawa wrote:
> Hi Lin,
> 
> (2014/11/14 12:13), Linhaifeng wrote:
>>
>> size should be same as mmap and
>> guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
>>
> 
> Thanks. It should be.
> How about following patch?
> 
> -------------------------------------------------------
> diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
> index 75fedf0..be4b171 100644
> --- a/tests/vhost-user-test.c
> +++ b/tests/vhost-user-test.c
> @@ -37,7 +37,7 @@
> #endif
> 
> #define QEMU_CMD_ACCEL " -machine accel=tcg"
> -#define QEMU_CMD_MEM " -m 512 -object
> memory-backend-file,id=mem,size=512M,"\
> +#define QEMU_CMD_MEM " -m 6000 -object
> memory-backend-file,id=mem,size=6000M,"\
> "mem-path=%s,share=on -numa node,memdev=mem"
> #define QEMU_CMD_CHR " -chardev socket,id=chr0,path=%s"
> #define QEMU_CMD_NETDEV " -netdev
> vhost-user,id=net0,chardev=chr0,vhostforce"
> @@ -221,13 +221,16 @@ static void read_guest_mem(void)
> 
> /* check for sanity */
> g_assert_cmpint(fds_num, >, 0);
> - g_assert_cmpint(fds_num, ==, memory.nregions);
> + //g_assert_cmpint(fds_num, ==, memory.nregions);
> 
> + fprintf(stderr, "%s(%d)\n", __func__, __LINE__);
> /* iterate all regions */
> for (i = 0; i < fds_num; i++) {
> + int ret = 0;
> 
> /* We'll check only the region statring at 0x0*/
> - if (memory.regions[i].guest_phys_addr != 0x0) {
> + if (memory.regions[i].guest_phys_addr == 0x0) {
> + close(fds[i]);
> continue;
> }
> 
> @@ -237,6 +240,7 @@ static void read_guest_mem(void)
> 
> guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,


How large is size? Is it memory_size + mmap_offset?


> MAP_SHARED, fds[i], 0);
> + fprintf(stderr, "region=%d, mmap=%p, size=%lu\n", i, guest_mem, size);
> 
> g_assert(guest_mem != MAP_FAILED);
> guest_mem += (memory.regions[i].mmap_offset / sizeof(*guest_mem));
> @@ -247,8 +251,10 @@ static void read_guest_mem(void)
> 
> g_assert_cmpint(a, ==, b);
> }
> -
> - munmap(guest_mem, memory.regions[i].memory_size);
> + guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
> + ret = munmap(guest_mem, memory.regions[i].memory_size);

memory.regions[i].memory_size --> memory.regions[i].memory_size + memory.regions[i].mmap_offset

Check that you have applied QEMU's patch: [PATCH] vhost-user: fix mmap offset calculation

> + fprintf(stderr, "region=%d, munmap=%p, size=%lu, ret=%d\n",
> + i, guest_mem, size, ret);
> }
> 
> g_assert_cmpint(1, ==, 1);
> -------------------------------------------------------
> I am using 1GB hugepage size.
> 
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac0000000, size=6291456000
> region=0, munmap=0x2aaac0000000, size=6291456000, ret=-1 << failed
> 
> 6291456000 is not aligned by 1GB.
> When I specify 4096MB as guest memory size, munmap() doesn't return
> error like following.
> 
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac0000000, size=4294967296
> region=0, munmap=0x2aaac0000000, size=4294967296, ret=0
> 
> Thanks,
> Tetsuya
> 
> .
>
  
Tetsuya Mukawa Nov. 14, 2014, 5:12 a.m. UTC | #3
Hi Lin,

(2014/11/14 13:42), Linhaifeng wrote:
>
> On 2014/11/14 11:40, Tetsuya Mukawa wrote:
>> Hi Lin,
>>
>> (2014/11/14 12:13), Linhaifeng wrote:
>>> size should be same as mmap and
>>> guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
>>>
>> Thanks. It should be.
>> How about following patch?
>>
>> -------------------------------------------------------
>> diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
>> index 75fedf0..be4b171 100644
>> --- a/tests/vhost-user-test.c
>> +++ b/tests/vhost-user-test.c
>> @@ -37,7 +37,7 @@
>> #endif
>>
>> #define QEMU_CMD_ACCEL " -machine accel=tcg"
>> -#define QEMU_CMD_MEM " -m 512 -object
>> memory-backend-file,id=mem,size=512M,"\
>> +#define QEMU_CMD_MEM " -m 6000 -object
>> memory-backend-file,id=mem,size=6000M,"\
>> "mem-path=%s,share=on -numa node,memdev=mem"
>> #define QEMU_CMD_CHR " -chardev socket,id=chr0,path=%s"
>> #define QEMU_CMD_NETDEV " -netdev
>> vhost-user,id=net0,chardev=chr0,vhostforce"
>> @@ -221,13 +221,16 @@ static void read_guest_mem(void)
>>
>> /* check for sanity */
>> g_assert_cmpint(fds_num, >, 0);
>> - g_assert_cmpint(fds_num, ==, memory.nregions);
>> + //g_assert_cmpint(fds_num, ==, memory.nregions);
>>
>> + fprintf(stderr, "%s(%d)\n", __func__, __LINE__);
>> /* iterate all regions */
>> for (i = 0; i < fds_num; i++) {
>> + int ret = 0;
>>
>> /* We'll check only the region statring at 0x0*/
>> - if (memory.regions[i].guest_phys_addr != 0x0) {
>> + if (memory.regions[i].guest_phys_addr == 0x0) {
>> + close(fds[i]);
>> continue;
>> }
>>
>> @@ -237,6 +240,7 @@ static void read_guest_mem(void)
>>
>> guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,
>
> How many is size? mmap_size + mmap_offset ?
In this case, the guest memory length is the size.
I included the messages printed by this program in my last email.
Could you please check them as well?

>
>
>> MAP_SHARED, fds[i], 0);
>> + fprintf(stderr, "region=%d, mmap=%p, size=%lu\n", i, guest_mem, size);
>>
>> g_assert(guest_mem != MAP_FAILED);
>> guest_mem += (memory.regions[i].mmap_offset / sizeof(*guest_mem));
>> @@ -247,8 +251,10 @@ static void read_guest_mem(void)
>>
>> g_assert_cmpint(a, ==, b);
>> }
>> -
>> - munmap(guest_mem, memory.regions[i].memory_size);
>> + guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
>> + ret = munmap(guest_mem, memory.regions[i].memory_size);
> memory.regions[i].memory_size --> memory.regions[i].memory_size + memory.regions[i].memory_offset
>
> check you have apply qemu's patch: [PATCH] vhost-user: fix mmap offset calculation
I checked it using the latest QEMU code,
so the patch you mentioned is included.

I guess you can munmap the file because your 'size' is aligned to the
hugepage size, like 2GB.
Could you please try another value like 6000MB?

Thanks,
Tetsuya
  
Linhaifeng Nov. 14, 2014, 5:30 a.m. UTC | #4
On 2014/11/14 13:12, Tetsuya Mukawa wrote:
> Could you please try another value like 6000MB?

I have tried the value 6000MB, and munmap succeeds.

If you mmap with size "memory_size + mmap_offset", you should also munmap with that size.
  
Tetsuya Mukawa Nov. 14, 2014, 6:57 a.m. UTC | #5
Hi Lin,
(2014/11/14 14:30), Linhaifeng wrote:
>
> On 2014/11/14 13:12, Tetsuya Mukawa wrote:
>> ease try another value like 6000MB
> i have try this value 6000MB.I can munmap success.
>
> you mmap with size "memory_size + memory_offset" should also munmap with this size.
>
I appreciate your testing and suggestions. :)
I am not sure what the difference is between your environment and
mine.

Here is my code and the messages it prints.
---------------------------------------------
[code]
---------------------------------------------
size = memory.regions[i].memory_size + memory.regions[i].mmap_offset;

guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fds[i], 0);

fprintf(stderr, "region=%d, mmap=%p, size=%lu\n", i, guest_mem, size);

g_assert(guest_mem != MAP_FAILED);

ret = munmap(guest_mem, size);

fprintf(stderr, "region=%d, munmap=%p, size=%lu, ret=%d\n",
i, guest_mem, size, ret);

---------------------------------------------
[messages]
---------------------------------------------
region=0, mmap=0x2aaac0000000, size=6291456000
region=0, munmap=0x2aaac0000000, size=6291456000, ret=-1

In your environment, 'ret' will be 0.
In my environment, 'size' must be aligned to avoid the error.
Anyway, it is nice to keep the implementation simpler.
If the munmap failure occurs, let's think about it again.

Thanks,
Tetsuya
  
Huawei Xie Nov. 14, 2014, 10:59 a.m. UTC | #6
I tested with the latest QEMU (with the offset fix) in the vhost app (not with the test case); munmap succeeds only when the size is aligned to 1GB (the hugepage size).

Another important point: could we call mmap(0, region[i].memory_size, PROT_XX, mmap_offset) rather than mapping from offset 0? For a region above 4GB, mapping from offset 0 wastes 4GB of address space. Otherwise, we at least need to round the offset down to the nearest 1GB boundary and round the memory size up to the next 1GB boundary to reduce the wasted address space.

Anyway, this is ugly. The kernel does not take care of this for us by doing the alignment automatically.
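
The round-down/round-up idea can be sketched as pure arithmetic. This is a minimal illustration under the assumption that the hugepage size is a power of two; struct map_window and compute_window() are hypothetical names and not part of any existing vhost code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of aligning one region to hugepage boundaries. */
struct map_window {
    uint64_t file_offset; /* hugepage-aligned offset to pass to mmap() */
    uint64_t length;      /* hugepage-aligned length for mmap()/munmap() */
    uint64_t delta;       /* add to the mmap() result to reach the region */
};

static struct map_window compute_window(uint64_t mmap_offset,
                                        uint64_t memory_size,
                                        uint64_t hugepage_size)
{
    struct map_window w;
    uint64_t mask = hugepage_size - 1;
    uint64_t end;

    assert(hugepage_size != 0 && (hugepage_size & mask) == 0);

    /* Round the start down and the end up to hugepage boundaries. */
    w.file_offset = mmap_offset & ~mask;
    end = (mmap_offset + memory_size + mask) & ~mask;
    w.length = end - w.file_offset;
    w.delta = mmap_offset - w.file_offset;
    return w;
}
```

For example, a 2000MB region at mmap_offset 4GB + 4096 with 1GB hugepages gives file_offset = 4GB, length = 2GB and delta = 4096, so only 2GB of address space is mapped instead of roughly 6GB when mapping from offset 0.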

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Thursday, November 13, 2014 11:57 PM
> To: Linhaifeng; Xie, Huawei
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] vhost-user technical isssues
> 
> Hi Lin,
> (2014/11/14 14:30), Linhaifeng wrote:
> >
> > On 2014/11/14 13:12, Tetsuya Mukawa wrote:
> >> ease try another value like 6000MB
> > i have try this value 6000MB.I can munmap success.
> >
> > you mmap with size "memory_size + memory_offset" should also munmap
> with this size.
> >
> I appreciate for your testing and sugesstions. :)
> I am not sure what is difference between your environment and my
> environment.
> 
> Here is my code and message from the code.
> ---------------------------------------------
> [code]
> ---------------------------------------------
> size = memory.regions[i].memory_size + memory.regions[i].mmap_offset;
> 
> guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,
> MAP_SHARED, fds[i], 0);
> 
> fprintf(stderr, "region=%d, mmap=%p, size=%lu\n", i, guest_mem, size);
> 
> g_assert(guest_mem != MAP_FAILED);
> 
> ret = munmap(guest_mem, size);
> 
> fprintf(stderr, "region=%d, munmap=%p, size=%lu, ret=%d\n",
> i, guest_mem, size, ret);
> 
> ---------------------------------------------
> [messages]
> ---------------------------------------------
> region=0, mmap=0x2aaac0000000, size=6291456000
> region=0, munmap=0x2aaac0000000, size=6291456000, ret=-1
> 
> With your environment, 'ret' will be 0.
> In my environment, 'size' should be aligned not to get error.
> Anyway, it's nice to implement more simple.
> When munmap failure occurs, let's think it again.
> 
> Thanks,
> Tetsuya
  
Tetsuya Mukawa Nov. 17, 2014, 6:14 a.m. UTC | #7
Hi Xie,

(2014/11/14 19:59), Xie, Huawei wrote:
> I tested with latest qemu(with offset fix) in vhost app(not with test case), unmap succeeds only when the size is aligned to 1GB(hugepage size).
I appreciate your testing.

> Another important thing is  could we do mmap(0, region[i].memory_size, PROT_XX, mmap_offset) rather than with offset 0? With the region above 4GB, we will waste 4GB address space. Or we at least need to round down offset to nearest 1GB, and round up memory size to upper 1GB, to save some address space waste.
>
> Anyway, this is ugly. Kernel doesn't take care of us, do those alignment for us automatically.
>

It seems 'offset' should also be aligned to the hugepage size.
But that might be part of the mmap specification. The mmap manpage says
'offset' must be a multiple of the page size returned by sysconf(_SC_PAGE_SIZE).
If the target file is on hugetlbfs, I guess the hugepage size is used as
the alignment size.
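
If that guess is right, the alignment to use for a given file descriptor could be queried at run time. The following is a Linux-specific sketch: fstatfs() and sysconf() are real calls, and hugetlbfs reports its hugepage size in f_bsize, but fd_alignment() is a hypothetical helper, and whether this value is the right one for both 'offset' and 'size' is an assumption.

```c
#include <assert.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6 /* same value as in <linux/magic.h> */
#endif

/* Hypothetical helper: return the hugepage size if fd lives on hugetlbfs,
 * otherwise fall back to the normal page size. */
static long fd_alignment(int fd)
{
    struct statfs fs;
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    if (fstatfs(fd, &fs) == 0 &&
        (unsigned int)fs.f_type == (unsigned int)HUGETLBFS_MAGIC) {
        return (long)fs.f_bsize;
    }
    return page;
}
```

The returned value could then be used as the alignment when rounding 'offset' down and 'size' up before calling mmap() and munmap().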

Thanks,
Tetsuya
  

Patch

diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 75fedf0..be4b171 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -37,7 +37,7 @@ 
#endif

#define QEMU_CMD_ACCEL " -machine accel=tcg"
-#define QEMU_CMD_MEM " -m 512 -object
memory-backend-file,id=mem,size=512M,"\
+#define QEMU_CMD_MEM " -m 6000 -object
memory-backend-file,id=mem,size=6000M,"\
"mem-path=%s,share=on -numa node,memdev=mem"