From patchwork Thu Aug 23 16:51:47 2018
X-Patchwork-Submitter: Maxime Coquelin
X-Patchwork-Id: 43822
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: dev@dpdk.org, tiwei.bie@intel.com, zhihong.wang@intel.com, jfreimann@redhat.com
Cc: dgilbert@redhat.com, Maxime Coquelin
Date: Thu, 23 Aug 2018 18:51:47 +0200
Message-Id: <20180823165157.30001-1-maxime.coquelin@redhat.com>
Subject: [dpdk-dev] [RFC 00/10] vhost: add postcopy live-migration support

This RFC adds support for postcopy live-migration.

With classic live-migration, the VM runs on the source while its content
is being migrated to the destination. When pages already migrated to the
destination are dirtied by the source, they get copied again, until source
and destination memory converge. At that point, the source is stopped and
the destination is started.

With postcopy live-migration, the VM is started on the destination before
all of its memory has been migrated. When the VM tries to access a page
that hasn't been migrated yet, a page fault is triggered and handled by
userfaultfd, which pauses the thread. A Qemu thread in charge of postcopy
requests the missing page from the source. Once the page is received and
mapped, the paused thread is resumed.

Userfaultfd supports handling faults from a different process, and Qemu
supports postcopy with vhost-user backends since v2.12.

One problem encountered with classic live-migration of VMs relying on
vhost-user backends is that when the traffic is high (e.g. PVP), migration
may never converge, as pages get dirtied at a faster rate than they are
copied to the destination. Postcopy is expected to solve this problem, as
ring memory and buffers are copied only once, when the destination
page-faults on them.

My current test bench is limited, so I could not test the above scenario.
I just flooded the guest using testpmd's txonly forwarding mode, with the
virtio-net device bound either to its kernel driver or to the DPDK driver
in the guest.
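As an aside, here is a minimal sketch of the userfaultfd API the postcopy
mechanism described above relies on: registering an already-mmapped region
and reading one fault notification. It is only illustrative and not code
from this series; region_addr and region_size are placeholders for a guest
memory region shared with Qemu, and in the vhost-user postcopy case the
fault events are actually consumed by Qemu, which receives the fd from the
backend.

#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Sketch only: register a region with userfaultfd and observe a fault. */
static int register_uffd(void *region_addr, size_t region_size)
{
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg;
	struct uffd_msg msg;
	int uffd;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	if (uffd < 0)
		return -1;

	if (ioctl(uffd, UFFDIO_API, &api) < 0)
		goto err;

	memset(&reg, 0, sizeof(reg));
	reg.range.start = (unsigned long)region_addr;
	reg.range.len = region_size;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto err;

	/*
	 * A fault on a missing page shows up as a message on the fd and the
	 * faulting thread stays paused. With postcopy, Qemu owns this fd and
	 * resolves the fault by fetching the page from the source.
	 */
	if (read(uffd, &msg, sizeof(msg)) == sizeof(msg) &&
	    msg.event == UFFD_EVENT_PAGEFAULT)
		printf("fault at 0x%llx\n",
		       (unsigned long long)msg.arg.pagefault.address);

	return uffd;
err:
	close(uffd);
	return -1;
}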
In my setup, migration is done on the same machine. Results are the
average of 5 runs.

A. Flooding virtio-net kernel driver (~3Mpps):
 1. Classic live-migration:
    - Total time: 12592ms
    - Downtime: 53ms
 2. Postcopy live-migration:
    - Total time: 2324ms
    - Downtime: 48ms

B. Flooding Virtio PMD (~15Mpps):
 1. Classic live-migration:
    - Total time: 22101ms
    - Downtime: 35ms
 2. Postcopy live-migration:
    - Total time: 2995ms
    - Downtime: 47ms

Note that the Total time reported by Qemu is really steady across runs,
whereas the Downtime is very unsteady.

One problem remaining to be fixed is memory locking. Indeed, userfaultfd
requires that the registered memory is neither mmapped with the
MAP_POPULATE flag nor mlocked. For the former, the series addresses this
by not advertising the postcopy feature when dequeue zero-copy, which
requires MAP_POPULATE, is enabled (a small sketch of this gating is
appended at the end of this mail). For the latter, this is really
problematic because the vhost-user backend is a library, so unlike Qemu it
cannot prevent the application from calling mlockall(). When using
testpmd, one just has to append --no-mlockall to the command line, but
forgetting it results in non-trivial warnings in the Qemu logs.

Steps to test postcopy:

1. Run DPDK's Testpmd application on source:
./install/bin/testpmd -m 512 --file-prefix=src -l 0,2 -n 4 \
   --vdev 'net_vhost0,iface=/tmp/vu-src' -- --portmask=1 -i \
   --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \
   --no-mlockall

2. Run DPDK's Testpmd application on destination:
./install/bin/testpmd -m 512 --file-prefix=dst -l 0,2 -n 4 \
   --vdev 'net_vhost0,iface=/tmp/vu-dst' -- --portmask=1 -i \
   --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \
   --no-mlockall

3. Launch VM on source:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \
   -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
   -numa node,memdev=mem -mem-prealloc \
   -chardev socket,id=char0,path=/tmp/vu-src \
   -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
   -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \
   -net none -vnc :0 -monitor stdio

4. Launch VM on destination:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \
   -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
   -numa node,memdev=mem -mem-prealloc \
   -chardev socket,id=char0,path=/tmp/vu-dst \
   -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
   -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \
   -net none -vnc :1 -monitor stdio -incoming tcp::8888

5. In both testpmd prompts, start flooding the virtio-net device:
testpmd> set fwd txonly
testpmd> start

6. In destination's Qemu monitor, enable postcopy:
(qemu) migrate_set_capability postcopy-ram on
7. In source's Qemu monitor, enable postcopy and launch the migration:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:0:8888
(qemu) migrate_start_postcopy

Maxime Coquelin (10):
  vhost: define postcopy protocol flag
  vhost: add number of fds to vhost-user messages and use it
  vhost: enable fds passing when sending vhost-user messages
  vhost: introduce postcopy's advise message
  vhost: add support for postcopy's listen message
  vhost: register new regions with userfaultfd
  vhost: avoid useless VhostUserMemory copy
  vhost: send userfault range addresses back to qemu
  vhost: add support to postcopy's end request
  vhost: enable postcopy protocol feature

 lib/librte_vhost/rte_vhost.h  |   4 +
 lib/librte_vhost/socket.c     |  21 +++-
 lib/librte_vhost/vhost.h      |   3 +
 lib/librte_vhost/vhost_user.c | 201 +++++++++++++++++++++++++++++-----
 lib/librte_vhost/vhost_user.h |  12 +-
 5 files changed, 206 insertions(+), 35 deletions(-)
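As mentioned above for the dequeue zero-copy case, the idea is simply to
mask the postcopy protocol feature when zero-copy is enabled. Below is a
minimal sketch of that gating; the VHOST_USER_PROTOCOL_F_PAGEFAULT name,
its bit number and the helper are assumptions for illustration, not
necessarily what this series defines.

#include <stdbool.h>
#include <stdint.h>

/* Assumed name/bit for the postcopy protocol feature. */
#define VHOST_USER_PROTOCOL_F_PAGEFAULT	8

/*
 * Sketch: dequeue zero-copy needs the guest memory mmapped with
 * MAP_POPULATE, which userfaultfd does not allow, so the postcopy
 * feature bit is cleared from the advertised protocol features when
 * zero-copy is enabled.
 */
static uint64_t
vhost_user_protocol_features(uint64_t supported, bool dequeue_zero_copy)
{
	if (dequeue_zero_copy)
		supported &= ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);

	return supported;
}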