[dpdk-dev,v2,2/5] EAL: Add new EAL "--qtest-virtio" option

Message ID 1455075613-3605-3-git-send-email-mukawa@igel.co.jp (mailing list archive)
State Superseded, archived

Commit Message

Tetsuya Mukawa Feb. 10, 2016, 3:40 a.m. UTC
  To work with the qtest virtio-net PMD, the virtual addresses that map
hugepages should be between (1 << 31) and (1 << 44). This patch adds one more
option to map hugepages this way. Also, all hugepages should consist of one
file. Because of this, the option works only when the '--single-file' option
is specified.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_eal/common/eal_common_options.c | 10 ++++
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 +
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 81 +++++++++++++++++++++++++++++-
 4 files changed, 93 insertions(+), 1 deletion(-)
  

Comments

Jianfeng Tan Feb. 15, 2016, 7:52 a.m. UTC | #1
Hi Tetsuya,

On 2/10/2016 11:40 AM, Tetsuya Mukawa wrote:
> To work with qtest virtio-net PMD, virtual address that maps hugepages
> should be between (1 << 31) to (1 << 44). This patch adds one more option

Is there any reference about this limitation? And is it also true for 32 
bit machine?

Thanks,
Jianfeng
  
Tetsuya Mukawa Feb. 16, 2016, 1:32 a.m. UTC | #2
On 2016/02/15 16:52, Tan, Jianfeng wrote:
> Hi Tetsuya,
>
> On 2/10/2016 11:40 AM, Tetsuya Mukawa wrote:
>> To work with qtest virtio-net PMD, virtual address that maps hugepages
>> should be between (1 << 31) to (1 << 44). This patch adds one more
>> option
>
> Is there any reference about this limitation? And is it also true for
> 32 bit machine?
>

Hi Jianfeng,

The 44-bit limitation comes from the virtio legacy device spec.
The queue address register of a legacy virtio device is 32 bits wide,
and we have to write a page number into this register.
As a result, EAL memory should sit below 1 << 44.

This patch series only supports virtio modern devices, so we could
relax this limitation a bit. (The next limit would be 47 bits; it seems
to come from the QEMU implementation.)
But I guess the 44-bit limitation is still not so hard to meet, and it
also leaves open the possibility of supporting legacy devices.

The 31-bit limitation comes from the current memory mapping of the QTest
QEMU guest, shown below:

 * ------------------------------------------------------------
 * Memory mapping of qtest guest
 * ------------------------------------------------------------
 * 0x00000000_00000000 - 0x00000000_3fffffff : not used
 * 0x00000000_40000000 - 0x00000000_40000fff : virtio-net(BAR1)
 * 0x00000000_40001000 - 0x00000000_40ffffff : not used
 * 0x00000000_41000000 - 0x00000000_417fffff : virtio-net(BAR4)
 * 0x00000000_41800000 - 0x00000000_41ffffff : not used
 * 0x00000000_42000000 - 0x00000000_420000ff : ivshmem(BAR0)
 * 0x00000000_42000100 - 0x00000000_42ffffff : not used
 * 0x00000000_80000000 - 0xffffffff_ffffffff : ivshmem(BAR2)

Thanks,
Tetsuya


> Thanks,
> Jianfeng
  
David Marchand Feb. 16, 2016, 5:53 a.m. UTC | #3
On Wed, Feb 10, 2016 at 4:40 AM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
> To work with qtest virtio-net PMD, virtual address that maps hugepages
> should be between (1 << 31) to (1 << 44). This patch adds one more option
> to map like this. Also all hugepages should consists of one file.
> Because of this, the option will work only when '--single-file' option is
> specified.

This patch is pure virtio stuff.
Please, rework this so that we have a generic api in eal (asking for a
free region could be of use for something else).
Then you can call this api from virtio pmd.

If you need to pass options to virtio pmd, add some devargs for it.


Thanks.
  
Jianfeng Tan Feb. 16, 2016, 11:36 a.m. UTC | #4
Hi David,

On 2/16/2016 1:53 PM, David Marchand wrote:
> On Wed, Feb 10, 2016 at 4:40 AM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>> To work with qtest virtio-net PMD, virtual address that maps hugepages
>> should be between (1 << 31) to (1 << 44). This patch adds one more option
>> to map like this. Also all hugepages should consists of one file.
>> Because of this, the option will work only when '--single-file' option is
>> specified.
> This patch is pure virtio stuff.
> Please, rework this so that we have a generic api in eal (asking for a
> free region could be of use for something else).
> Then you can call this api from virtio pmd.
>
> If you need to pass options to virtio pmd, add some devargs for it.
>

It seems hard on my side to fold this option into
--vdev="eth_qtest_virtio0...", because memory initialization happens
before the vdev options are parsed.

Can we make use of "--base-virtaddr" to achieve what this option does?

Thanks,
Jianfeng
  
Tetsuya Mukawa Feb. 22, 2016, 8:17 a.m. UTC | #5
These patches work on top of the patch series below.
 - [PATCH v2 0/5] virtio support for container

[Changes]
v3 changes:
 - Rebase on latest master.
 - Remove the "--qtest-virtio" option, then add "--range-virtaddr" and
   "--align-memsize" options.
 - Fix typos in qtest.c

v2 changes:
 - Rebase on the above patch series.
 - Rebase on master
 - Add "--qtest-virtio" EAL option.
 - Fixes in qtest.c
  - Fix error handling for the case qtest connection is closed.
  - Use eventfd for interrupt messaging.
  - Use linux header for PCI register definitions.
  - Fix qtest_raw_send/recv to handle error correctly.
  - Fix bit mask of PCI_CONFIG_ADDR.
  - Describe memory and ioport usage of qtest guest in qtest.c
  - Remove loop that is for finding PCI devices.


[Abstraction]

Normally, the virtio-net PMD only works in a VM, because there is no virtio-net device on the host.
This patch series extends the virtio-net PMD so it can also work on the host as a virtual PMD.
But we didn't implement a virtio-net device as part of the virtio-net PMD.
To prepare a virtio-net device for the PMD, start a QEMU process in the special QTest mode, then connect to it from the virtio-net PMD through a unix domain socket.

The PMD can connect to anything a QEMU virtio-net device can.
For example, the PMD can connect to the vhost-net kernel module or a vhost-user backend application.
As with the virtio-net PMD on QEMU, the memory of the application that uses the virtio-net PMD is shared with the vhost backend application.
But the vhost backend application's memory is not shared.

The main targets of this PMD are containers such as docker, rkt and lxc.
We can isolate the related processes (the virtio-net PMD process, QEMU and the vhost-user backend process) in containers.
But to communicate through the unix domain socket, a shared directory is needed.


[How to use]

 Please use QEMU-2.5.1 or above.
 (So far, QEMU-2.5.1 hasn't been released yet, so please check out master from the QEMU repository.)

 - Compile
 Set "CONFIG_RTE_VIRTIO_VDEV_QTEST=y" in config/common_linuxapp.
 Then compile it.

 - Start QEMU like below.
 $ qemu-system-x86_64 \
              -machine pc-i440fx-1.4,accel=qtest \
              -display none -qtest-log /dev/null \
              -qtest unix:/tmp/socket,server \
              -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
              -device virtio-net-pci,netdev=net0,mq=on,disable-modern=false,addr=3 \
              -chardev socket,id=chr1,path=/tmp/ivshmem,server \
              -device ivshmem,size=1G,chardev=chr1,vectors=1,addr=4

 - Start DPDK application like below
 $ testpmd -c f -n 1 -m 1024 --no-pci --single-file --qtest-virtio \
             --vdev="eth_qtest_virtio0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem"\
             -- --disable-hw-vlan --txqflags=0xf00 -i

(*1) Please specify the same memory size in the QEMU and DPDK command lines.
(*2) Use qemu-2.5.1 or above.
(*3) One QEMU process is needed per port.
(*4) Only virtio-1.0 devices are supported.
(*5) Vhost backends such as vhost-net and vhost-user can be specified.
(*6) In most cases, the above command is enough, but you can also
     specify other QEMU virtio-net options.
(*7) Only the "pc-i440fx-1.4" machine has been checked, but other
     machines may work. It depends on whether the machine has a piix3
     south bridge. If it doesn't, the virtio-net PMD cannot receive
     status change interrupts.
(*8) Do not add "--enable-kvm" to the QEMU command line.


[Detailed Description]

 - virtio-net device implementation
The PMD uses a QEMU virtio-net device. To do that, the QEMU QTest functionality is used.
QTest is a test framework for QEMU devices. It allows us to implement a device driver outside of QEMU.
With QTest, we can implement a DPDK application and the virtio-net PMD as a standalone process on the host.
When QEMU is invoked in QTest mode, no guest code runs.
To learn more about QTest, see below.
http://wiki.qemu.org/Features/QTest

 - probing devices
QTest provides a unix domain socket. Through this socket, the driver process can access the I/O ports and memory of the QEMU virtual machine.
The PMD sends I/O port accesses to probe PCI devices.
If it finds the virtio-net and ivshmem devices, it initializes them.
Also, I/O port accesses of the virtio-net PMD are sent through the socket, so the virtio-net PMD can correctly initialize the virtio-net device on QEMU.

 - ivshmem device to share memory
To share the memory that the virtio-net PMD process uses, an ivshmem device is used.
Because an ivshmem device can only handle one file descriptor, the shared memory should consist of one file.
To allocate such memory, EAL has a new option called "--single-file".
Also, the hugepages should be mapped between "1 << 31" and "1 << 44".
To map them like this, EAL has one more new option called "--qtest-virtio".
While initializing the ivshmem device, we can set its BAR (Base Address Register).
It specifies at which address QEMU vCPUs access this shared memory.
We specify the host virtual address of the shared memory as this address.
This is very useful because we don't need to patch QEMU to calculate address offsets.
(For example, if the virtio-net PMD process allocates memory from the shared memory and then writes its virtual address into a virtio-net register, the QEMU virtio-net device can understand it without calculating an address offset.)


Tetsuya Mukawa (6):
  virtio: Retrieve driver name from eth_dev
  vhost: Add a function to check virtio device type
  EAL: Add new EAL "--range-virtaddr" option
  EAL: Add a new "--align-memsize" option
  virtio: Add support for qtest virtio-net PMD
  docs: add release note for qtest virtio container support

 config/common_linuxapp                     |    1 +
 doc/guides/rel_notes/release_16_04.rst     |    3 +
 drivers/net/virtio/Makefile                |    4 +
 drivers/net/virtio/qtest.c                 | 1342 ++++++++++++++++++++++++++++
 drivers/net/virtio/qtest.h                 |   65 ++
 drivers/net/virtio/virtio_ethdev.c         |  433 ++++++++-
 drivers/net/virtio/virtio_ethdev.h         |   32 +
 drivers/net/virtio/virtio_pci.c            |  364 +++++++-
 drivers/net/virtio/virtio_pci.h            |    5 +-
 lib/librte_eal/common/eal_common_options.c |   17 +
 lib/librte_eal/common/eal_internal_cfg.h   |    3 +
 lib/librte_eal/common/eal_options.h        |    4 +
 lib/librte_eal/linuxapp/eal/eal.c          |   43 +
 lib/librte_eal/linuxapp/eal/eal_memory.c   |   91 +-
 14 files changed, 2338 insertions(+), 69 deletions(-)
 create mode 100644 drivers/net/virtio/qtest.c
 create mode 100644 drivers/net/virtio/qtest.h
  

Patch

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 65bccbd..34c8bd1 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -96,6 +96,7 @@  eal_long_options[] = {
 	{OPT_VMWARE_TSC_MAP,    0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
 	{OPT_XEN_DOM0,          0, NULL, OPT_XEN_DOM0_NUM         },
 	{OPT_SINGLE_FILE,       0, NULL, OPT_SINGLE_FILE_NUM      },
+	{OPT_QTEST_VIRTIO,      0, NULL, OPT_QTEST_VIRTIO_NUM     },
 	{0,                     0, NULL, 0                        }
 };
 
@@ -902,6 +903,10 @@  eal_parse_common_option(int opt, const char *optarg,
 		conf->single_file = 1;
 		break;
 
+	case OPT_QTEST_VIRTIO_NUM:
+		conf->qtest_virtio = 1;
+		break;
+
 	/* don't know what to do, leave this to caller */
 	default:
 		return 1;
@@ -971,6 +976,11 @@  eal_check_common_options(struct internal_config *internal_cfg)
 			"be specified together with --"OPT_SINGLE_FILE"\n");
 		return -1;
 	}
+	if (internal_cfg->qtest_virtio && !internal_cfg->single_file) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_QTEST_VIRTIO" cannot "
+			"be specified without --"OPT_SINGLE_FILE"\n");
+		return -1;
+	}
 
 	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 9117ed9..7f3df39 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -71,6 +71,7 @@  struct internal_config {
 	volatile unsigned no_hpet;        /**< true to disable HPET */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
+	volatile unsigned qtest_virtio;    /**< mmap hugepages to fit qtest virtio PMD */
 	volatile unsigned no_shconf;      /**< true if there is no shared config */
 	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
 	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index e5da14a..b33a3c3 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -85,6 +85,8 @@  enum {
 	OPT_XEN_DOM0_NUM,
 #define OPT_SINGLE_FILE       "single-file"
 	OPT_SINGLE_FILE_NUM,
+#define OPT_QTEST_VIRTIO      "qtest-virtio"
+	OPT_QTEST_VIRTIO_NUM,
 	OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index a6b3616..677d6a7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1092,6 +1092,73 @@  calc_num_pages_per_socket(uint64_t * memory,
 }
 
 /*
+ * Find memory space that fits qtest virtio-net PMD.
+ */
+static void *
+rte_eal_get_free_region(uint64_t alloc_size, uint64_t pagesz)
+{
+	uint64_t start, end, next_start;
+	uint64_t high_limit, low_limit;
+	char buf[1024], *p;
+	FILE *fp;
+	void *addr = NULL;
+
+	/* all hugepages should be mapped between below values */
+	low_limit = 1UL << 31;
+	high_limit = 1UL << 44;
+
+	/* allocation size should be aligned by page size */
+	if (alloc_size != RTE_ALIGN_CEIL(alloc_size, pagesz)) {
+		rte_panic("Invalid allocation size 0x%lx\n", alloc_size);
+		return NULL;
+	}
+
+	/*
+	 * address should be aligned by allocation size because
+	 * BAR register requiers such an address
+	 */
+	low_limit = RTE_ALIGN_CEIL(low_limit, alloc_size);
+	high_limit = RTE_ALIGN_FLOOR(high_limit, alloc_size);
+
+	fp = fopen("/proc/self/maps", "r");
+	if (fp == NULL) {
+		rte_panic("Cannot open /proc/self/maps\n");
+		return NULL;
+	}
+
+	next_start = 0;
+	do {
+		start = next_start;
+
+		if ((p = fgets(buf, sizeof(buf), fp)) != NULL) {
+			if (sscanf(p, "%lx-%lx ", &end, &next_start) < 2)
+				break;
+
+			next_start = RTE_ALIGN_CEIL(next_start, alloc_size);
+			end = RTE_ALIGN_CEIL(end, alloc_size) - 1;
+		} else
+			end = UINT64_MAX;
+
+		if (start >= high_limit)
+			break;
+		if (end < low_limit)
+			continue;
+
+		start = RTE_MAX(start, low_limit);
+		end = RTE_MIN(end, high_limit - 1);
+
+		if (end - start >= alloc_size - 1) {
+			addr = (void *)start;
+			break;
+		}
+	} while (end != UINT64_MAX);
+
+	fclose(fp);
+
+	return addr;
+}
+
+/*
  * Prepare physical memory mapping: fill configuration structure with
  * these infos, return 0 on success.
  *  1. map N huge pages in separate files in hugetlbfs
@@ -1132,6 +1199,7 @@  rte_eal_hugepage_init(void)
 		uint64_t pagesize;
 		unsigned socket_id = rte_socket_id();
 		char filepath[MAX_HUGEPAGE_PATH];
+		void *fixed;
 
 		if (internal_config.no_hugetlbfs) {
 			eal_get_hugefile_path(filepath, sizeof(filepath),
@@ -1158,7 +1226,18 @@  rte_eal_hugepage_init(void)
 			return -1;
 		}
 
-		addr = mmap(NULL, internal_config.memory,
+		if (internal_config.qtest_virtio) {
+			fixed = rte_eal_get_free_region(
+					internal_config.memory, pagesize);
+			if (fixed == NULL) {
+				RTE_LOG(ERR, EAL, "no free space to mmap %s\n",
+						filepath);
+				return -1;
+			}
+		} else
+			fixed = NULL;
+
+		addr = mmap(fixed, internal_config.memory,
 			    PROT_READ | PROT_WRITE,
 			    MAP_SHARED | MAP_POPULATE, fd, 0);
 		if (addr == MAP_FAILED) {