[00/28] vhost: add virtio-vhost-user transport

Message ID 1560957293-17294-1-git-send-email-ndragazis@arrikto.com

Nikos Dragazis June 19, 2019, 3:14 p.m. UTC
Hi everyone,

this patch series introduces the concept of the virtio-vhost-user
transport. It is a revised version of an earlier RFC implementation
that was proposed by Stefan Hajnoczi [1]. Though this is a great
feature, the work seems to have stalled, so I’d like to restart the
conversation and hopefully get it merged with your help. Let me give
you an overview.

The virtio-vhost-user transport is a vhost-user transport implementation
that is based on the virtio-vhost-user device. Its key difference from
the existing transport is that it allows deploying vhost-user targets
inside dedicated Storage Appliance VMs instead of host user space. In
other words, it allows having guests that act as vhost-user backends for
other guests.

The virtio-vhost-user device implements the vhost-user control plane
(master-slave communication) as follows:

1. it parses the vhost-user messages from the vhost-user unix domain
   socket and forwards them to the slave guest through virtqueues

2. it maps the vhost memory regions in QEMU’s process address space and
   exposes them to the slave guest as a RAM-backed PCI MMIO region

3. it hooks up doorbells to the callfds. The slave guest can use these
   doorbells to interrupt the master guest driver

The device code has not yet been merged into upstream QEMU, but this is
definitely the end goal. The current state is that we are awaiting
approval of the virtio spec.

I have CCed Darek from the SPDK community who has helped me a lot by
reviewing this series. Note that any device type could be implemented
over this new transport. So, adding the virtio-vhost-user transport in
DPDK would allow using it from SPDK as well.

Getting into the code internals, this patch series makes the following
changes:

1. introduce a generic interface for the transport-specific operations.
   Each of the two available transports, the pre-existing AF_UNIX
   transport and the virtio-vhost-user transport, is going to implement
   this interface. The AF_UNIX-specific code has been extracted from the
   core vhost-user code and is now part of the AF_UNIX transport
   implementation in trans_af_unix.c.

2. introduce the virtio-vhost-user transport. The virtio-vhost-user
   transport requires a driver for the virtio-vhost-user devices. The
   driver, along with the transport implementation, has been packed into
   a separate library in `drivers/virtio_vhost_user/`. The necessary
   virtio-pci code has been copied from `drivers/net/virtio/`. Some
   additional changes have been made so that the driver can utilize the
   additional resources of the virtio-vhost-user device.

3. update librte_vhost public API to enable choosing transport for each
   new vhost device. Extend the vhost net driver and vhost-scsi example
   application to export this new API to the end user.

The primary changes I did to Stefan’s RFC implementation are the
following:

1. moved postcopy live migration code into trans_af_unix.c. Postcopy
   live migration relies on the userfault fd mechanism, which cannot be
   supported by virtio-vhost-user.

2. moved setup of the log memory region into trans_af_unix.c. Setting up
   the log memory region involves mapping/unmapping guest memory. This
   is an AF_UNIX transport-specific operation.

3. introduced a vhost transport operation for
   process_slave_message_reply()

4. moved the virtio-vhost-user transport/driver into a separate library
   in `drivers/virtio_vhost_user/`. This required making vhost.h and
   vhost_user.h part of librte_vhost public API and exporting some
   private symbols via the version script. This looks better to me than
   just moving the entire librte_vhost into `drivers/`. I am not sure if
   this is the most appropriate solution. I am looking forward to your
   suggestions on this.

5. made use of the virtio PCI capabilities for the additional device
   resources (doorbells, shared memory). This required changes in
   virtio_pci.c and trans_virtio_vhost_user.c.

6. [minor] changed some commit headlines to comply with
   check-git-log.sh.
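
As an illustration of the version-script mechanism mentioned in point 4,
a DPDK `.map` file is a GNU ld version script along these lines (the
symbol names below are placeholders, not the ones this series actually
exports):

```
EXPERIMENTAL {
	global:

	vhost_example_symbol_a;
	vhost_example_symbol_b;

	local: *;
};
```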

Please, have a look and let me know about your thoughts. Any
reviews/pointers/suggestions are welcome.

Best regards,
Nikos

[1] http://mails.dpdk.org/archives/dev/2018-January/088155.html


Nikos Dragazis (23):
  vhost: introduce vhost transport operations structure
  vhost: move socket management code
  vhost: move socket fd and un sockaddr
  vhost: move vhost-user connection
  vhost: move vhost-user reconnection
  vhost: move vhost-user fdset
  vhost: propagate vhost transport operations
  vhost: use a single structure for the device state
  vhost: extract socket I/O into transport
  vhost: move slave request fd and lock
  vhost: move mmap/munmap
  vhost: move setup of the log memory region
  vhost: remove main fd parameter from msg handlers
  vhost: move postcopy live migration code
  vhost: support registering additional vhost-user transports
  drivers/virtio_vhost_user: add virtio PCI framework
  drivers: add virtio-vhost-user transport
  drivers/virtio_vhost_user: use additional device resources
  vhost: add flag for choosing vhost-user transport
  net/vhost: add virtio-vhost-user support
  mk: link apps with virtio-vhost-user driver
  config: add option for the virtio-vhost-user transport
  usertools: add virtio-vhost-user devices to dpdk-devbind.py

Stefan Hajnoczi (5):
  vhost: allocate per-socket transport state
  vhost: move start server/client calls
  vhost: add index field in vhost virtqueues
  examples/vhost_scsi: add --socket-file argument
  examples/vhost_scsi: add virtio-vhost-user support

 config/common_base                                 |    6 +
 config/common_linux                                |    1 +
 drivers/Makefile                                   |    5 +
 drivers/net/vhost/rte_eth_vhost.c                  |   13 +
 drivers/virtio_vhost_user/Makefile                 |   27 +
 .../rte_virtio_vhost_user_version.map              |    4 +
 .../virtio_vhost_user/trans_virtio_vhost_user.c    | 1077 +++++++++++++++++++
 drivers/virtio_vhost_user/virtio_pci.c             |  520 ++++++++++
 drivers/virtio_vhost_user/virtio_pci.h             |  289 ++++++
 drivers/virtio_vhost_user/virtio_vhost_user.h      |   18 +
 drivers/virtio_vhost_user/virtqueue.h              |  181 ++++
 examples/vhost_scsi/vhost_scsi.c                   |  103 +-
 lib/librte_vhost/Makefile                          |    4 +-
 lib/librte_vhost/rte_vhost.h                       |    1 +
 lib/librte_vhost/rte_vhost_version.map             |   11 +
 lib/librte_vhost/socket.c                          |  685 +-----------
 lib/librte_vhost/trans_af_unix.c                   | 1094 ++++++++++++++++++++
 lib/librte_vhost/vhost.c                           |   22 +-
 lib/librte_vhost/vhost.h                           |  298 +++++-
 lib/librte_vhost/vhost_user.c                      |  474 ++-------
 lib/librte_vhost/vhost_user.h                      |   10 +-
 mk/rte.app.mk                                      |    6 +
 usertools/dpdk-devbind.py                          |    7 +
 23 files changed, 3764 insertions(+), 1092 deletions(-)
 create mode 100644 drivers/virtio_vhost_user/Makefile
 create mode 100644 drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
 create mode 100644 drivers/virtio_vhost_user/trans_virtio_vhost_user.c
 create mode 100644 drivers/virtio_vhost_user/virtio_pci.c
 create mode 100644 drivers/virtio_vhost_user/virtio_pci.h
 create mode 100644 drivers/virtio_vhost_user/virtio_vhost_user.h
 create mode 100644 drivers/virtio_vhost_user/virtqueue.h
 create mode 100644 lib/librte_vhost/trans_af_unix.c

Comments

Ilya Maximets June 20, 2019, 11:32 a.m. UTC | #1
On 19.06.2019 18:14, Nikos Dragazis wrote:
> Hi everyone,

Hi. I didn't look at the code, just a few comments inline.

> [...]
>
> 1. moved postcopy live migration code into trans_af_unix.c. Postcopy
>    live migration relies on the userfault fd mechanism, which cannot be
>    supported by virtio-vhost-user.
> 
> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>    the log memory region involves mapping/unmapping guest memory. This
>    is an AF_UNIX transport-specific operation.

Logging dirty pages is the main concept of live migration support. Does it
mean that the live migration is not supported for virtio-vhost-user at all?

> 
> 3. introduced a vhost transport operation for
>    process_slave_message_reply()
> 
> 4. moved the virtio-vhost-user transport/driver into a separate library
>    in `drivers/virtio_vhost_user/`. This required making vhost.h and
>    vhost_user.h part of librte_vhost public API and exporting some
>    private symbols via the version script. This looks better to me that
>    just moving the entire librte_vhost into `drivers/`. I am not sure if
>    this is the most appropriate solution. I am looking forward to your
>    suggestions on this.

Moving the virtio-vhost-user code to a separate driver looks strange for me.
What is the purpose?

Exporting a lot of vhost internal structures will lead to a frequent API/ABI
breakages and will slow down accepting changes to releases in general.

It looks inconsistent to have 'trans_af_unix.c' in 'lib/librte_vhost/' and
'trans_virtio_vhost_user.c' in 'drivers/virtio_vhost_user/' because these
files should be similar in provided functionality, hence, should be located
in similar places.

> [...]
Maxime Coquelin June 20, 2019, 11:35 a.m. UTC | #2
Hi Nikos,

On 6/19/19 5:14 PM, Nikos Dragazis wrote:
> Hi everyone,
> 
> this patch series introduces the concept of the virtio-vhost-user
> transport. This is actually a revised version of an earlier RFC
> implementation that has been proposed by Stefan Hajnoczi [1]. Though
> this is a great feature, it seems to have been stalled, so I’d like to
> restart the conversation on this and hopefully get it merged with your
> help. Let me give you an overview.

Thanks for taking over the series!

I think you are already aware of that, but it arrives too late to
consider it for v19.08, as the proposal deadline is over by almost 3
weeks.

That said, it is good that you sent it early, so that we can work to
make it in for v19.11.

> [...]
> 
> The device code has not yet been merged into upstream QEMU, but this is
> definitely the end goal.

Could you provide a pointer to the QEMU series, and instructions to test
this new device?

> The current state is that we are awaiting for
> the approval of the virtio spec.

Ditto, a link to the spec patches would be useful.

> [...]
>
> 4. moved the virtio-vhost-user transport/driver into a separate library
>     in `drivers/virtio_vhost_user/`. This required making vhost.h and
>     vhost_user.h part of librte_vhost public API and exporting some
>     private symbols via the version script. This looks better to me that
>     just moving the entire librte_vhost into `drivers/`. I am not sure if
>     this is the most appropriate solution. I am looking forward to your
>     suggestions on this.

I'm not sure this is the right place to put it.

> [...]

Maxime
Nikos Dragazis June 20, 2019, 11:44 p.m. UTC | #3
On 20/6/19 2:32 p.m., Ilya Maximets wrote:
> On 19.06.2019 18:14, Nikos Dragazis wrote:
>> Hi everyone,
> Hi. I didn't look at the code, just a few comments inline.
>
>> [...]
>>
>> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>>    the log memory region involves mapping/unmapping guest memory. This
>>    is an AF_UNIX transport-specific operation.
> Logging dirty pages is the main concept of live migration support. Does it
> mean that the live migration is not supported for virtio-vhost-user at all?

No, it is supported. To be more precise, it can be supported; it is part
of the ongoing virtio device specification:

https://lists.oasis-open.org/archives/virtio-dev/201905/msg00022.html

and it is in my TODO list for the device code. Here is how it works:

the log memory region, just like the other vhost memory regions, is a
portion of the master guest memory. In case of the AF_UNIX transport,
the master sends a VHOST_USER_SET_LOG_BASE message and the vhost target
mmaps the log memory region into the process' virtual address space. In
case of the virtio-vhost-user transport, the virtio-vhost-user device
parses the VHOST_USER_SET_LOG_BASE message and maps the log memory
region into QEMU's process address space. It then exposes the log memory
region to the slave guest as a RAM-backed PCI MMIO region.

So, from the vhost target's viewpoint, the only difference is the means
of accessing the log memory region. In case of the AF_UNIX transport,
the vhost target receives a file descriptor and mmaps this file. In case
of the virtio-vhost-user transport, the device exports the log memory
region to the vhost target (running in slave guest user space) via an
MMIO BAR.

To recap, just to make sure it is clear, the virtio-vhost-user transport
does support precopy live migration, but it doesn't support postcopy
live migration.

>
>> 3. introduced a vhost transport operation for
>>    process_slave_message_reply()
>>
>> 4. moved the virtio-vhost-user transport/driver into a separate library
>>    in `drivers/virtio_vhost_user/`. This required making vhost.h and
>>    vhost_user.h part of librte_vhost public API and exporting some
>>    private symbols via the version script. This looks better to me that
>>    just moving the entire librte_vhost into `drivers/`. I am not sure if
>>    this is the most appropriate solution. I am looking forward to your
>>    suggestions on this.
> Moving the virtio-vhost-user code to a separate driver looks strange for me.
> What is the purpose?

The virtio-vhost-user transport is based on the virtio-vhost-user
device. This means that we need a user space driver for this device
type.  Currently, the driver and the transport implementation are packed
together into `trans_virtio_vhost_user.c`. There are 2 reasons why I
moved all the virtio-vhost-user code (driver + transport implementation)
into `drivers/`:

1. my understanding is that all drivers should be located in `drivers/`.

2. the driver depends on `rte_bus_pci.h`, which gets exported when
   `drivers/` is built, whereas librte_vhost is built before `drivers/`.
   However, I think this could be overcome by just adding this line:

   CFLAGS += -I$(RTE_SDK)/drivers/bus/pci

   in the librte_vhost Makefile.

>
> Exporting a lot of vhost internal structures will lead to a frequent API/ABI
> breakages and will slow down accepting changes to releases in general.

I understand that there are many parameters that we have to consider. My
implementation is just a proposal. I am looking forward to hearing any
other suggestions from you and the community.

>
> It looks inconsistent to have 'trans_af_unix.c' in 'lib/librte_vhost/' and
> 'trans_virtio_vhost_user.c' in 'drivers/virtio_vhost_user/' because these
> files should be similar in provided functionality, hence, should be located
> in similar places.

I agree. Actually, I was thinking about separating the driver from the
transport implementation and moving just the driver into `drivers/`. But
this would require exporting some low-level virtio PCI functions. And
this didn't look appropriate to me.

>
>> [...]
Nikos Dragazis June 22, 2019, 8:26 p.m. UTC | #4
On 20/6/19 2:35 PM, Maxime Coquelin wrote:
> Hi Nikos,
>
> On 6/19/19 5:14 PM, Nikos Dragazis wrote:
>> Hi everyone,
>>
>> this patch series introduces the concept of the virtio-vhost-user
>> transport. This is actually a revised version of an earlier RFC
>> implementation that has been proposed by Stefan Hajnoczi [1]. Though
>> this is a great feature, it seems to have been stalled, so I’d like to
>> restart the conversation on this and hopefully get it merged with your
>> help. Let me give you an overview.
>
> Thanks for taking over the series!
>
> I think you are already aware of this, but it arrives too late to be
> considered for v19.08, as the proposal deadline passed almost 3 weeks
> ago.
>
> That said, it is good that you sent it early, so that we can work to
> make it in for v19.11.

That's totally fine.

>
>> The virtio-vhost-user transport is a vhost-user transport implementation
>> that is based on the virtio-vhost-user device. Its key difference with
>> the existing transport is that it allows deploying vhost-user targets
>> inside dedicated Storage Appliance VMs instead of host user space. In
>> other words, it allows having guests that act as vhost-user backends for
>> other guests.
>>
>> The virtio-vhost-user device implements the vhost-user control plane
>> (master-slave communication) as follows:
>>
>> 1. it parses the vhost-user messages from the vhost-user unix domain
>>     socket and forwards them to the slave guest through virtqueues
>>
>> 2. it maps the vhost memory regions in QEMU’s process address space and
>>     exposes them to the slave guest as a RAM-backed PCI MMIO region
>>
>> 3. it hooks up doorbells to the callfds. The slave guest can use these
>>     doorbells to interrupt the master guest driver
>>
>> The device code has not yet been merged into upstream QEMU, but this is
>> definitely the end goal.
>
> Could you provide a pointer to the QEMU series, and instructions to test
> this new device?

Of course. Please have a look at the following step-by-step guide:

https://github.com/ndragazis/ndragazis.github.io/blob/master/dpdk-vhost-vvu-demo.md

If you run into any problems or find that something breaks (I hope not
:) ), please let me know. I haven't done thorough testing, so I may have
missed something. Please also bear in mind that the device code is not
finished; there are still some things left to do.

>
>> The current state is that we are awaiting
>> the approval of the virtio spec.
>
> Ditto, a link to the spec patches would be useful.

You can find the patches here:

https://lists.oasis-open.org/archives/virtio-dev/201905/msg00022.html

and an HTML version is available here:

https://ndragazis.github.io/virtio-v1.1-wd02.html#x1-41000011

>
>> I have Cced Darek from the SPDK community who has helped me a lot by
>> reviewing this series. Note that any device type could be implemented
>> over this new transport. So, adding the virtio-vhost-user transport in
>> DPDK would allow using it from SPDK as well.
>>
>> Getting into the code internals, this patch series makes the following
>> changes:
>>
>> 1. introduce a generic interface for the transport-specific operations.
>>     Each of the two available transports, the pre-existing AF_UNIX
>>     transport and the virtio-vhost-user transport, is going to implement
>>     this interface. The AF_UNIX-specific code has been extracted from the
>>     core vhost-user code and is now part of the AF_UNIX transport
>>     implementation in trans_af_unix.c.
>>
>> 2. introduce the virtio-vhost-user transport. The virtio-vhost-user
>>     transport requires a driver for the virtio-vhost-user devices. The
>>     driver along with the transport implementation have been packed into
>>     a separate library in `drivers/virtio_vhost_user/`. The necessary
>>     virtio-pci code has been copied from `drivers/net/virtio/`. Some
>>     additional changes have been made so that the driver can utilize the
>>     additional resources of the virtio-vhost-user device.
>>
>> 3. update librte_vhost public API to enable choosing transport for each
>>     new vhost device. Extend the vhost net driver and vhost-scsi example
>>     application to export this new API to the end user.
>>
>> The primary changes I did to Stefan’s RFC implementation are the
>> following:
>>
>> 1. moved postcopy live migration code into trans_af_unix.c. Postcopy
>>     live migration relies on the userfault fd mechanism, which cannot be
>>     supported by virtio-vhost-user.
>>
>> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>>     the log memory region involves mapping/unmapping guest memory. This
>>     is an AF_UNIX transport-specific operation.
>>
>> 3. introduced a vhost transport operation for
>>     process_slave_message_reply()
>>
>> 4. moved the virtio-vhost-user transport/driver into a separate library
>>     in `drivers/virtio_vhost_user/`. This required making vhost.h and
>>     vhost_user.h part of librte_vhost public API and exporting some
>>     private symbols via the version script. This looks better to me than
>>     just moving the entire librte_vhost into `drivers/`. I am not sure if
>>     this is the most appropriate solution. I am looking forward to your
>>     suggestions on this.
>
> I'm not sure this is the right place to put it.

Okay. Is there something specific that you think doesn't fit nicely? Do
you have something else in mind?

>
>> 5. made use of the virtio PCI capabilities for the additional device
>>     resources (doorbells, shared memory). This required changes in
>>     virtio_pci.c and trans_virtio_vhost_user.c.
>>
>> 6. [minor] changed some commit headlines to comply with
>>     check-git-log.sh.
>>
>> Please, have a look and let me know about your thoughts. Any
>> reviews/pointers/suggestions are welcome.
>
> Maxime