
[v5,00/26] Add VDUSE support to Vhost library

Message ID 20230606081852.71003-1-maxime.coquelin@redhat.com (mailing list archive)

Maxime Coquelin June 6, 2023, 8:18 a.m. UTC
  This series introduces a new type of backend, VDUSE,
to the Vhost library.

VDUSE stands for vDPA Device in Userspace. It enables
implementing a Virtio device in userspace and attaching
it to the Kernel vDPA bus.

Once attached to the vDPA bus, it can be used by Kernel
Virtio drivers, like virtio-net in our case, via the
virtio-vdpa driver. In that case, the device is visible
to the Kernel networking stack and is exposed to userspace
as a regular netdev.

Alternatively, it can be exposed to userspace via the
vhost-vdpa driver, as a vhost-vdpa chardev that can be
passed to QEMU or the Virtio-user PMD.

While VDUSE support is already available in the upstream
Kernel, a couple of patches are required to support the
network device type:

https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc

In order to attach the created VDUSE device to the vDPA
bus, a recent iproute2 version containing the vdpa tool is
required.
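
For applications that consume the Vhost library directly rather than through
the Vhost PMD used in the Usage section below, creating a VDUSE device is
expected to follow the usual rte_vhost_driver_*() flow, with the registered
path pointing at /dev/vduse/ instead of a vhost-user socket. The snippet
below is only a rough sketch based on this series (in particular,
rte_vhost_driver_set_max_queue_num() is the new API added here), not the
exact code from the patches:

#include <rte_vhost.h>

/* Sketch only: callback and registration flow assumed from the existing
 * rte_vhost API plus the set-max-queue-pairs API added in this series. */
static int new_device(int vid) { (void)vid; return 0; }  /* datapath may start */
static void destroy_device(int vid) { (void)vid; }       /* datapath must stop */

static const struct rte_vhost_device_ops ops = {
	.new_device = new_device,
	.destroy_device = destroy_device,
};

static int register_vduse_device(void)
{
	const char *path = "/dev/vduse/vduse0";

	if (rte_vhost_driver_register(path, 0) < 0)
		return -1;
	/* New API in this series: limit the queue pairs advertised by the
	 * VDUSE device (4 queue pairs, matching the testpmd example below). */
	if (rte_vhost_driver_set_max_queue_num(path, 4) < 0)
		return -1;
	if (rte_vhost_driver_callback_register(path, &ops) < 0)
		return -1;
	return rte_vhost_driver_start(path);
}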

Benchmark results:
==================

The PVP reference benchmark has been run (on v2 of this series) and compared
with Vhost-user.

When doing macswap forwarding in the workload, no difference is seen.
When doing io forwarding in the workload, we see a 4% performance
degradation with VDUSE compared to Vhost-user/Virtio-user. This is
explained by the use of the IOTLB layer in the Vhost library when using
VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
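
The cost difference comes from datapath address translation: with VDUSE,
every buffer address is an IOVA that must be resolved through the IOTLB
cache (with a miss/retry path when the entry is not yet present), whereas
Vhost-user with a static memory table resolves guest addresses with simple
offset arithmetic. Conceptually (hypothetical structures and names, not the
library's actual code):

#include <stddef.h>
#include <stdint.h>

/* Conceptual illustration only -- hypothetical types and names. */
struct mem_region {
	uint64_t iova;
	uint64_t size;
	uint64_t host_va;
};

/* Vhost-user style: one static region table, translation is offset math. */
static uint64_t gpa_to_vva(const struct mem_region *r, uint64_t gpa)
{
	return r->host_va + (gpa - r->iova);
}

/* VDUSE style: each address goes through an IOTLB lookup; a miss means
 * asking the kernel for the mapping and retrying, which is the overhead
 * seen in the io forwarding case. */
static uint64_t iotlb_to_vva(const struct mem_region *cache, size_t n,
			     uint64_t iova)
{
	for (size_t i = 0; i < n; i++) {
		if (iova >= cache[i].iova &&
		    iova < cache[i].iova + cache[i].size)
			return cache[i].host_va + (iova - cache[i].iova);
	}
	return 0; /* IOTLB miss */
}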

Usage:
======

1. Probe required Kernel modules
# modprobe vdpa
# modprobe vduse
# modprobe virtio-vdpa

2. Build (requires VDUSE kernel headers to be available)
# meson build
# ninja -C build

3. Create a VDUSE device (vduse0) using the Vhost PMD with
testpmd (4 queue pairs in this example)
# ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
 
4. Attach the VDUSE device to the vDPA bus
# vdpa dev add name vduse0 mgmtdev vduse
=> The virtio-net netdev shows up (eth0 here)
# ip l show eth0
21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff

5. Start/stop traffic in testpmd
testpmd> start
testpmd> show port stats 0
  ######################## NIC statistics for port 0  ########################
  RX-packets: 11         RX-missed: 0          RX-bytes:  1482
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1          TX-errors: 0          TX-bytes:  62

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################
testpmd> stop

6. Detach the VDUSE device from the vDPA bus
# vdpa dev del vduse0

7. Quit testpmd
testpmd> quit

Known issues & remaining work:
==============================
- Fix issue in FD manager (still polling while FD has been removed)
- Add Netlink support in Vhost library
- Support device reconnection
 -> a temporary patch to support reconnection via a tmpfs file is available;
    the upstream solution would be in-kernel and is being developed.
 -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
- Support packed ring
- Provide more performance benchmark results

Changes in v5:
==============
- Delay starting/stopping the device until after having replied to the VDUSE
  event, in order to avoid a deadlock encountered when testing with OVS.
- Mention the lack of reconnection support in the release notes.

Changes in v4:
==============
- Applied patch 1 and patch 2 from v3
- Rebased on top of Eelco series
- Fix coredump clear in IOTLB cache removal (David)
- Remove unneeded ret variable in vhost_vring_inject_irq (David)
- Fixed release note (David, Chenbo)

Changes in v2/v3:
=================
- Fixed mem_set_dump() parameter (patch 4)
- Fixed accidental comment change (patch 7, Chenbo)
- Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
- Move change from patch 12 to 13 (Chenbo)
- Enable locks annotation for control queue (Patch 17)
- Send control queue notification when used descriptors enqueued (Patch 17)
- Lock control queue IOTLB lock (Patch 17)
- Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
- Set VDUSE dev FD as NONBLOCK (Patch 18)
- Enable more Virtio features (Patch 18)
- Remove calls to pthread_setcancelstate() (Patch 22)
- Add calls to fdset_pipe_notify() when adding and deleting FDs from a set (Patch 22)
- Use RTE_DIM() to get requests string array size (Patch 22)
- Set reply result for IOTLB update message (Patch 25, Chenbo)
- Fix queues enablement with multiqueue (Patch 26)
- Move kickfd creation for better logging (Patch 26)
- Improve logging (Patch 26)
- Uninstall cvq kickfd in case of handler installation failure (Patch 27)
- Enable CVQ notifications once handler is installed (Patch 27)
- Don't advertise multiqueue and control queue if the app only requests a single queue pair (Patch 27)
- Add release notes

Maxime Coquelin (26):
  vhost: fix IOTLB entries overlap check with previous entry
  vhost: add helper of IOTLB entries coredump
  vhost: add helper for IOTLB entries shared page check
  vhost: don't dump unneeded pages with IOTLB
  vhost: change to single IOTLB cache per device
  vhost: add offset field to IOTLB entries
  vhost: add page size info to IOTLB entry
  vhost: retry translating IOVA after IOTLB miss
  vhost: introduce backend ops
  vhost: add IOTLB cache entry removal callback
  vhost: add helper for IOTLB misses
  vhost: add helper for interrupt injection
  vhost: add API to set max queue pairs
  net/vhost: use API to set max queue pairs
  vhost: add control virtqueue support
  vhost: add VDUSE device creation and destruction
  vhost: add VDUSE callback for IOTLB miss
  vhost: add VDUSE callback for IOTLB entry removal
  vhost: add VDUSE callback for IRQ injection
  vhost: add VDUSE events handler
  vhost: add support for virtqueue state get event
  vhost: add support for VDUSE status set event
  vhost: add support for VDUSE IOTLB update event
  vhost: add VDUSE device startup
  vhost: add multiqueue support to VDUSE
  vhost: add VDUSE device stop

 doc/guides/prog_guide/vhost_lib.rst    |   4 +
 doc/guides/rel_notes/release_23_07.rst |  12 +
 drivers/net/vhost/rte_eth_vhost.c      |   3 +
 lib/vhost/iotlb.c                      | 333 +++++++------
 lib/vhost/iotlb.h                      |  45 +-
 lib/vhost/meson.build                  |   5 +
 lib/vhost/rte_vhost.h                  |  17 +
 lib/vhost/socket.c                     |  72 ++-
 lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
 lib/vhost/vduse.h                      |  33 ++
 lib/vhost/version.map                  |   1 +
 lib/vhost/vhost.c                      |  70 ++-
 lib/vhost/vhost.h                      |  57 ++-
 lib/vhost/vhost_user.c                 |  51 +-
 lib/vhost/vhost_user.h                 |   2 +-
 lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
 lib/vhost/virtio_net_ctrl.h            |  10 +
 17 files changed, 1409 insertions(+), 238 deletions(-)
 create mode 100644 lib/vhost/vduse.c
 create mode 100644 lib/vhost/vduse.h
 create mode 100644 lib/vhost/virtio_net_ctrl.c
 create mode 100644 lib/vhost/virtio_net_ctrl.h
  

Comments

Chenbo Xia June 7, 2023, 6:48 a.m. UTC | #1
Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 6, 2023 4:18 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
> 
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
> 
> Benchmark results:
> ==================
> 
> On this v2, PVP reference benchmark has been run & compared with
> Vhost-user.
> 
> When doing macswap forwarding in the worload, no difference is seen.
> When doing io forwarding in the workload, we see 4% performance
> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost-library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> 
> Usage:
> ======
> 
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
> 
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
> 
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --
> vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --
> txq=4 --rxq=4
> 
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> mode DEFAULT group default qlen 1000
>     link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> 
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>   ######################## NIC statistics for port 0
> ########################
>   RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>   RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 1          TX-errors: 0          TX-bytes:  62
> 
>   Throughput (since last show)
>   Rx-pps:            0          Rx-bps:            0
>   Tx-pps:            0          Tx-bps:            0
> 
> ##########################################################################
> ##
> testpmd> stop
> 
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
> 
> 7. Quit testpmd
> testpmd> quit
> 
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
>  -> a temporary patch to support reconnection via a tmpfs file is
> available,
>     upstream solution would be in-kernel and is being developed.
>  -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-
> /commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results
> 
> Changes in v5:
> ==============
> - Delay starting/stopping the device to after having replied to the VDUSE
>   event in order to avoid a deadlock encountered when testing with OVS.

Could you explain more to help me understand the deadlock issue?

Thanks,
Chenbo

> - Mention reconnection support lack in the release note.
> 
> Changes in v4:
> ==============
> - Applied patch 1 and patch 2 from v3
> - Rebased on top of Eelco series
> - Fix coredump clear in IOTLB cache removal (David)
> - Remove uneeded ret variable in vhost_vring_inject_irq (David)
> - Fixed release note (David, Chenbo)
> 
> Changes in v2/v3:
> =================
> - Fixed mem_set_dump() parameter (patch 4)
> - Fixed accidental comment change (patch 7, Chenbo)
> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> - move change from patch 12 to 13 (Chenbo)
> - Enable locks annotation for control queue (Patch 17)
> - Send control queue notification when used descriptors enqueued (Patch 17)
> - Lock control queue IOTLB lock (Patch 17)
> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> - Enable more Virtio features (Patch 18)
> - Remove calls to pthread_setcancelstate() (Patch 22)
> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
> (Patch 22)
> - Use RTE_DIM() to get requests string array size (Patch 22)
> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> - Fix queues enablement with multiqueue (Patch 26)
> - Move kickfd creation for better logging (Patch 26)
> - Improve logging (Patch 26)
> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
> - Enable CVQ notifications once handler is installed (Patch 27)
> - Don't advertise multiqueue and control queue if app only request single
> queue pair (Patch 27)
> - Add release notes
> 
> Maxime Coquelin (26):
>   vhost: fix IOTLB entries overlap check with previous entry
>   vhost: add helper of IOTLB entries coredump
>   vhost: add helper for IOTLB entries shared page check
>   vhost: don't dump unneeded pages with IOTLB
>   vhost: change to single IOTLB cache per device
>   vhost: add offset field to IOTLB entries
>   vhost: add page size info to IOTLB entry
>   vhost: retry translating IOVA after IOTLB miss
>   vhost: introduce backend ops
>   vhost: add IOTLB cache entry removal callback
>   vhost: add helper for IOTLB misses
>   vhost: add helper for interrupt injection
>   vhost: add API to set max queue pairs
>   net/vhost: use API to set max queue pairs
>   vhost: add control virtqueue support
>   vhost: add VDUSE device creation and destruction
>   vhost: add VDUSE callback for IOTLB miss
>   vhost: add VDUSE callback for IOTLB entry removal
>   vhost: add VDUSE callback for IRQ injection
>   vhost: add VDUSE events handler
>   vhost: add support for virtqueue state get event
>   vhost: add support for VDUSE status set event
>   vhost: add support for VDUSE IOTLB update event
>   vhost: add VDUSE device startup
>   vhost: add multiqueue support to VDUSE
>   vhost: add VDUSE device stop
> 
>  doc/guides/prog_guide/vhost_lib.rst    |   4 +
>  doc/guides/rel_notes/release_23_07.rst |  12 +
>  drivers/net/vhost/rte_eth_vhost.c      |   3 +
>  lib/vhost/iotlb.c                      | 333 +++++++------
>  lib/vhost/iotlb.h                      |  45 +-
>  lib/vhost/meson.build                  |   5 +
>  lib/vhost/rte_vhost.h                  |  17 +
>  lib/vhost/socket.c                     |  72 ++-
>  lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>  lib/vhost/vduse.h                      |  33 ++
>  lib/vhost/version.map                  |   1 +
>  lib/vhost/vhost.c                      |  70 ++-
>  lib/vhost/vhost.h                      |  57 ++-
>  lib/vhost/vhost_user.c                 |  51 +-
>  lib/vhost/vhost_user.h                 |   2 +-
>  lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>  lib/vhost/virtio_net_ctrl.h            |  10 +
>  17 files changed, 1409 insertions(+), 238 deletions(-)
>  create mode 100644 lib/vhost/vduse.c
>  create mode 100644 lib/vhost/vduse.h
>  create mode 100644 lib/vhost/virtio_net_ctrl.c
>  create mode 100644 lib/vhost/virtio_net_ctrl.h
> 
> --
> 2.40.1
  
David Marchand June 7, 2023, 8:05 a.m. UTC | #2
On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
>
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
>
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
>
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
>
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
>
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
>
> Benchmark results:
> ==================
>
> On this v2, PVP reference benchmark has been run & compared with
> Vhost-user.
>
> When doing macswap forwarding in the worload, no difference is seen.
> When doing io forwarding in the workload, we see 4% performance
> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost-library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>
> Usage:
> ======
>
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
>
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
>
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4

9 is a nice but undefined value. 8 is enough.
In general, I prefer "human readable" strings, like *:debug ;-).


>
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>     link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>   ######################## NIC statistics for port 0  ########################
>   RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>   RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 1          TX-errors: 0          TX-bytes:  62
>
>   Throughput (since last show)
>   Rx-pps:            0          Rx-bps:            0
>   Tx-pps:            0          Tx-bps:            0
>   ############################################################################
> testpmd> stop
>
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
>
> 7. Quit testpmd
> testpmd> quit
>
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
>  -> a temporary patch to support reconnection via a tmpfs file is available,
>     upstream solution would be in-kernel and is being developed.
>  -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results

We are missing a reference to the kernel patches required to have
vduse accept net devices.

I had played with the patches at v1 and it was working ok.
I did not review in depth the latest revisions, but I followed your
series from the PoC/start.
Overall, the series lgtm.

For the series,
Acked-by: David Marchand <david.marchand@redhat.com>
  
Maxime Coquelin June 7, 2023, 2:58 p.m. UTC | #3
On 6/7/23 08:48, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, June 6, 2023 4:18 PM
>> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com; lulu@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
>>
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace, it enables
>> implementing a Virtio device in userspace and have it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>>
>> Benchmark results:
>> ==================
>>
>> On this v2, PVP reference benchmark has been run & compared with
>> Vhost-user.
>>
>> When doing macswap forwarding in the worload, no difference is seen.
>> When doing io forwarding in the workload, we see 4% performance
>> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
>> explained by the use of the IOTLB layer in the Vhost-library when using
>> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>>
>> Usage:
>> ======
>>
>> 1. Probe required Kernel modules
>> # modprobe vdpa
>> # modprobe vduse
>> # modprobe virtio-vdpa
>>
>> 2. Build (require vduse kernel headers to be available)
>> # meson build
>> # ninja -C build
>>
>> 3. Create a VDUSE device (vduse0) using Vhost PMD with
>> testpmd (with 4 queue pairs in this example)
>> # ./build/app/dpdk-testpmd --no-pci --
>> vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --
>> txq=4 --rxq=4
>>
>> 4. Attach the VDUSE device to the vDPA bus
>> # vdpa dev add name vduse0 mgmtdev vduse
>> => The virtio-net netdev shows up (eth0 here)
>> # ip l show eth0
>> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>> mode DEFAULT group default qlen 1000
>>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>>
>> 5. Start/stop traffic in testpmd
>> testpmd> start
>> testpmd> show port stats 0
>>    ######################## NIC statistics for port 0
>> ########################
>>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>>    RX-errors: 0
>>    RX-nombuf:  0
>>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
>>
>>    Throughput (since last show)
>>    Rx-pps:            0          Rx-bps:            0
>>    Tx-pps:            0          Tx-bps:            0
>>
>> ##########################################################################
>> ##
>> testpmd> stop
>>
>> 6. Detach the VDUSE device from the vDPA bus
>> # vdpa dev del vduse0
>>
>> 7. Quit testpmd
>> testpmd> quit
>>
>> Known issues & remaining work:
>> ==============================
>> - Fix issue in FD manager (still polling while FD has been removed)
>> - Add Netlink support in Vhost library
>> - Support device reconnection
>>   -> a temporary patch to support reconnection via a tmpfs file is
>> available,
>>      upstream solution would be in-kernel and is being developed.
>>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-
>> /commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
>> - Support packed ring
>> - Provide more performance benchmark results
>>
>> Changes in v5:
>> ==============
>> - Delay starting/stopping the device to after having replied to the VDUSE
>>    event in order to avoid a deadlock encountered when testing with OVS.
> 
> Could you explain more to help me understand the deadlock issue?

Sure.

The v5 fixes an ABBA deadlock involving the OVS mutex and the kernel
rtnl_lock(), two OVS threads and the vdpa tool process.

We have an OVS bridge with a mlx5 port already added.
We add the vduse port to the same bridge.
Then we use the iproute2 vdpa tool to attach the vduse device to the
kernel vdpa bus. When doing this, the rtnl lock is taken while the
virtio-net device is probed, and VDUSE_SET_STATUS gets sent and waits
for its reply.

This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
handler, and if the DRIVER_OK bit is set the Vhost .new_device()
callback is called, which triggers a bridge reconfiguration.

On bridge reconfiguration, the mlx5 port takes the OVS mutex and
performs an ioctl() which tries to take the rtnl lock, but it is
already owned by the vdpa tool.

The vduse_events thread is stuck waiting for the OVS mutex, so the
reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
process is stuck for 30 seconds, until a timeout happens.

When the timeout happens, everything is unblocked, but the VDUSE device
has been marked as broken, and so is not usable anymore.

I could reproduce and provide you the backtraces of the different
threads if you wish.

Anyway, I think it makes sense to perform the device startup after
having replied to the VDUSE_SET_STATUS request, as it just means the
device has taken the new driver status into account.

Hope it clarifies, let me know if you need more details.

Thanks,
Maxime

> Thanks,
> Chenbo
> 
>> - Mention reconnection support lack in the release note.
>>
>> Changes in v4:
>> ==============
>> - Applied patch 1 and patch 2 from v3
>> - Rebased on top of Eelco series
>> - Fix coredump clear in IOTLB cache removal (David)
>> - Remove uneeded ret variable in vhost_vring_inject_irq (David)
>> - Fixed release note (David, Chenbo)
>>
>> Changes in v2/v3:
>> =================
>> - Fixed mem_set_dump() parameter (patch 4)
>> - Fixed accidental comment change (patch 7, Chenbo)
>> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
>> - move change from patch 12 to 13 (Chenbo)
>> - Enable locks annotation for control queue (Patch 17)
>> - Send control queue notification when used descriptors enqueued (Patch 17)
>> - Lock control queue IOTLB lock (Patch 17)
>> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
>> - Set VDUSE dev FD as NONBLOCK (Patch 18)
>> - Enable more Virtio features (Patch 18)
>> - Remove calls to pthread_setcancelstate() (Patch 22)
>> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
>> (Patch 22)
>> - Use RTE_DIM() to get requests string array size (Patch 22)
>> - Set reply result for IOTLB update message (Patch 25, Chenbo)
>> - Fix queues enablement with multiqueue (Patch 26)
>> - Move kickfd creation for better logging (Patch 26)
>> - Improve logging (Patch 26)
>> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
>> - Enable CVQ notifications once handler is installed (Patch 27)
>> - Don't advertise multiqueue and control queue if app only request single
>> queue pair (Patch 27)
>> - Add release notes
>>
>> Maxime Coquelin (26):
>>    vhost: fix IOTLB entries overlap check with previous entry
>>    vhost: add helper of IOTLB entries coredump
>>    vhost: add helper for IOTLB entries shared page check
>>    vhost: don't dump unneeded pages with IOTLB
>>    vhost: change to single IOTLB cache per device
>>    vhost: add offset field to IOTLB entries
>>    vhost: add page size info to IOTLB entry
>>    vhost: retry translating IOVA after IOTLB miss
>>    vhost: introduce backend ops
>>    vhost: add IOTLB cache entry removal callback
>>    vhost: add helper for IOTLB misses
>>    vhost: add helper for interrupt injection
>>    vhost: add API to set max queue pairs
>>    net/vhost: use API to set max queue pairs
>>    vhost: add control virtqueue support
>>    vhost: add VDUSE device creation and destruction
>>    vhost: add VDUSE callback for IOTLB miss
>>    vhost: add VDUSE callback for IOTLB entry removal
>>    vhost: add VDUSE callback for IRQ injection
>>    vhost: add VDUSE events handler
>>    vhost: add support for virtqueue state get event
>>    vhost: add support for VDUSE status set event
>>    vhost: add support for VDUSE IOTLB update event
>>    vhost: add VDUSE device startup
>>    vhost: add multiqueue support to VDUSE
>>    vhost: add VDUSE device stop
>>
>>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
>>   doc/guides/rel_notes/release_23_07.rst |  12 +
>>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
>>   lib/vhost/iotlb.c                      | 333 +++++++------
>>   lib/vhost/iotlb.h                      |  45 +-
>>   lib/vhost/meson.build                  |   5 +
>>   lib/vhost/rte_vhost.h                  |  17 +
>>   lib/vhost/socket.c                     |  72 ++-
>>   lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>>   lib/vhost/vduse.h                      |  33 ++
>>   lib/vhost/version.map                  |   1 +
>>   lib/vhost/vhost.c                      |  70 ++-
>>   lib/vhost/vhost.h                      |  57 ++-
>>   lib/vhost/vhost_user.c                 |  51 +-
>>   lib/vhost/vhost_user.h                 |   2 +-
>>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>>   lib/vhost/virtio_net_ctrl.h            |  10 +
>>   17 files changed, 1409 insertions(+), 238 deletions(-)
>>   create mode 100644 lib/vhost/vduse.c
>>   create mode 100644 lib/vhost/vduse.h
>>   create mode 100644 lib/vhost/virtio_net_ctrl.c
>>   create mode 100644 lib/vhost/virtio_net_ctrl.h
>>
>> --
>> 2.40.1
>
  
Chenbo Xia June 8, 2023, 1:53 a.m. UTC | #4
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, June 7, 2023 10:59 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Subject: Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
> 
> 
> 
> On 6/7/23 08:48, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, June 6, 2023 4:18 PM
> >> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> >> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie,
> Yongji
> >> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> >> amorenoz@redhat.com; lulu@redhat.com
> >> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
> >>
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA device in Userspace, it enables
> >> implementing a Virtio device in userspace and have it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. Doing that, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or Virtio-user PMD.
> >>
> >> While VDUSE support is already available in upstream
> >> Kernel, a couple of patches are required to support
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >>
> >> Benchmark results:
> >> ==================
> >>
> >> On this v2, PVP reference benchmark has been run & compared with
> >> Vhost-user.
> >>
> >> When doing macswap forwarding in the worload, no difference is seen.
> >> When doing io forwarding in the workload, we see 4% performance
> >> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
> >> explained by the use of the IOTLB layer in the Vhost-library when using
> >> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> >>
> >> Usage:
> >> ======
> >>
> >> 1. Probe required Kernel modules
> >> # modprobe vdpa
> >> # modprobe vduse
> >> # modprobe virtio-vdpa
> >>
> >> 2. Build (require vduse kernel headers to be available)
> >> # meson build
> >> # ninja -C build
> >>
> >> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> >> testpmd (with 4 queue pairs in this example)
> >> # ./build/app/dpdk-testpmd --no-pci --
> >> vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i
> --
> >> txq=4 --rxq=4
> >>
> >> 4. Attach the VDUSE device to the vDPA bus
> >> # vdpa dev add name vduse0 mgmtdev vduse
> >> => The virtio-net netdev shows up (eth0 here)
> >> # ip l show eth0
> >> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> >> mode DEFAULT group default qlen 1000
> >>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> >>
> >> 5. Start/stop traffic in testpmd
> >> testpmd> start
> >> testpmd> show port stats 0
> >>    ######################## NIC statistics for port 0
> >> ########################
> >>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
> >>    RX-errors: 0
> >>    RX-nombuf:  0
> >>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
> >>
> >>    Throughput (since last show)
> >>    Rx-pps:            0          Rx-bps:            0
> >>    Tx-pps:            0          Tx-bps:            0
> >>
> >>
> ##########################################################################
> >> ##
> >> testpmd> stop
> >>
> >> 6. Detach the VDUSE device from the vDPA bus
> >> # vdpa dev del vduse0
> >>
> >> 7. Quit testpmd
> >> testpmd> quit
> >>
> >> Known issues & remaining work:
> >> ==============================
> >> - Fix issue in FD manager (still polling while FD has been removed)
> >> - Add Netlink support in Vhost library
> >> - Support device reconnection
> >>   -> a temporary patch to support reconnection via a tmpfs file is
> >> available,
> >>      upstream solution would be in-kernel and is being developed.
> >>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-
> >> /commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> >> - Support packed ring
> >> - Provide more performance benchmark results
> >>
> >> Changes in v5:
> >> ==============
> >> - Delay starting/stopping the device to after having replied to the
> VDUSE
> >>    event in order to avoid a deadlock encountered when testing with OVS.
> >
> > Could you explain more to help me understand the deadlock issue?
> 
> Sure.
> 
> The V5 fixes an ABBA deadlock involving OVS mutex and kernel
> rtnl_lock(), two OVS threads and the vdpa tool process.
> 
> We have an OVS bridge with a mlx5 port already added.
> We add the vduse port to the same bridge.
> Then we use the iproute2 vdpa tool to attach the vduse device the the
> kernel vdpa bus. when doing this the rtnl lock is taken when the virtio-
> net device is probed, and VDUSE_SET_STATUS gets sent and waits for its
> reply.
> 
> This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
> handler, and if DRIVER_OK bit is set the Vhsot .new_device() callback is
> called, which triggers a bridge reconfiguration.
> 
> On bridge reconfiguration, the mlx5 port takes the OVS mutex and
> performs an ioctl() which tries to take the rtnl lock, but is is already
> owned by the vdpa tool.
> 
> The vduse_events thread is stucked waiting for the OVS mutex, so the
> reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
> process is stucked for 30 seconds, until a timeout happens.
> 
> When the timeourt happen, everything is unblocked, but the VDUSE device
> has been marked as broken, and so not usable anymore.
> 
> I could reproduce and provide you the backtraces of the different
> threads if you wish.
> 
> Anyway, I think it makes sense to perform the device startup after
> having replied to VDUSE_SET_STATUS request, as it just mean the device
> has taken into account the new status of the driver.
> 
> Hope it clarifies, let me know if you need more details.

It's very clear! Thanks Maxime for the explanation!

/Chenbo

> 
> Thanks,
> Maxime
> 
> > Thanks,
> > Chenbo
> >
> >> - Mention reconnection support lack in the release note.
> >>
> >> Changes in v4:
> >> ==============
> >> - Applied patch 1 and patch 2 from v3
> >> - Rebased on top of Eelco series
> >> - Fix coredump clear in IOTLB cache removal (David)
> >> - Remove uneeded ret variable in vhost_vring_inject_irq (David)
> >> - Fixed release note (David, Chenbo)
> >>
> >> Changes in v2/v3:
> >> =================
> >> - Fixed mem_set_dump() parameter (patch 4)
> >> - Fixed accidental comment change (patch 7, Chenbo)
> >> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> >> - move change from patch 12 to 13 (Chenbo)
> >> - Enable locks annotation for control queue (Patch 17)
> >> - Send control queue notification when used descriptors enqueued (Patch
> 17)
> >> - Lock control queue IOTLB lock (Patch 17)
> >> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> >> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> >> - Enable more Virtio features (Patch 18)
> >> - Remove calls to pthread_setcancelstate() (Patch 22)
> >> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a
> set
> >> (Patch 22)
> >> - Use RTE_DIM() to get requests string array size (Patch 22)
> >> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> >> - Fix queues enablement with multiqueue (Patch 26)
> >> - Move kickfd creation for better logging (Patch 26)
> >> - Improve logging (Patch 26)
> >> - Uninstall cvq kickfd in case of handler installation failure (Patch
> 27)
> >> - Enable CVQ notifications once handler is installed (Patch 27)
> >> - Don't advertise multiqueue and control queue if app only request
> single
> >> queue pair (Patch 27)
> >> - Add release notes
> >>
> >> Maxime Coquelin (26):
> >>    vhost: fix IOTLB entries overlap check with previous entry
> >>    vhost: add helper of IOTLB entries coredump
> >>    vhost: add helper for IOTLB entries shared page check
> >>    vhost: don't dump unneeded pages with IOTLB
> >>    vhost: change to single IOTLB cache per device
> >>    vhost: add offset field to IOTLB entries
> >>    vhost: add page size info to IOTLB entry
> >>    vhost: retry translating IOVA after IOTLB miss
> >>    vhost: introduce backend ops
> >>    vhost: add IOTLB cache entry removal callback
> >>    vhost: add helper for IOTLB misses
> >>    vhost: add helper for interrupt injection
> >>    vhost: add API to set max queue pairs
> >>    net/vhost: use API to set max queue pairs
> >>    vhost: add control virtqueue support
> >>    vhost: add VDUSE device creation and destruction
> >>    vhost: add VDUSE callback for IOTLB miss
> >>    vhost: add VDUSE callback for IOTLB entry removal
> >>    vhost: add VDUSE callback for IRQ injection
> >>    vhost: add VDUSE events handler
> >>    vhost: add support for virtqueue state get event
> >>    vhost: add support for VDUSE status set event
> >>    vhost: add support for VDUSE IOTLB update event
> >>    vhost: add VDUSE device startup
> >>    vhost: add multiqueue support to VDUSE
> >>    vhost: add VDUSE device stop
> >>
> >>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
> >>   doc/guides/rel_notes/release_23_07.rst |  12 +
> >>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
> >>   lib/vhost/iotlb.c                      | 333 +++++++------
> >>   lib/vhost/iotlb.h                      |  45 +-
> >>   lib/vhost/meson.build                  |   5 +
> >>   lib/vhost/rte_vhost.h                  |  17 +
> >>   lib/vhost/socket.c                     |  72 ++-
> >>   lib/vhost/vduse.c                      | 646
> +++++++++++++++++++++++++
> >>   lib/vhost/vduse.h                      |  33 ++
> >>   lib/vhost/version.map                  |   1 +
> >>   lib/vhost/vhost.c                      |  70 ++-
> >>   lib/vhost/vhost.h                      |  57 ++-
> >>   lib/vhost/vhost_user.c                 |  51 +-
> >>   lib/vhost/vhost_user.h                 |   2 +-
> >>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
> >>   lib/vhost/virtio_net_ctrl.h            |  10 +
> >>   17 files changed, 1409 insertions(+), 238 deletions(-)
> >>   create mode 100644 lib/vhost/vduse.c
> >>   create mode 100644 lib/vhost/vduse.h
> >>   create mode 100644 lib/vhost/virtio_net_ctrl.c
> >>   create mode 100644 lib/vhost/virtio_net_ctrl.h
> >>
> >> --
> >> 2.40.1
> >
  
Maxime Coquelin June 8, 2023, 9:17 a.m. UTC | #5
On 6/7/23 10:05, David Marchand wrote:
> On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace, it enables
>> implementing a Virtio device in userspace and have it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>>
>> Benchmark results:
>> ==================
>>
>> On this v2, PVP reference benchmark has been run & compared with
>> Vhost-user.
>>
>> When doing macswap forwarding in the worload, no difference is seen.
>> When doing io forwarding in the workload, we see 4% performance
>> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
>> explained by the use of the IOTLB layer in the Vhost-library when using
>> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>>
>> Usage:
>> ======
>>
>> 1. Probe required Kernel modules
>> # modprobe vdpa
>> # modprobe vduse
>> # modprobe virtio-vdpa
>>
>> 2. Build (require vduse kernel headers to be available)
>> # meson build
>> # ninja -C build
>>
>> 3. Create a VDUSE device (vduse0) using Vhost PMD with
>> testpmd (with 4 queue pairs in this example)
>> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
> 
> 9 is a nice but undefined value. 8 is enough.
> In general, I prefer "human readable" strings, like *:debug ;-).
> 
> 
>>
>> 4. Attach the VDUSE device to the vDPA bus
>> # vdpa dev add name vduse0 mgmtdev vduse
>> => The virtio-net netdev shows up (eth0 here)
>> # ip l show eth0
>> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>>
>> 5. Start/stop traffic in testpmd
>> testpmd> start
>> testpmd> show port stats 0
>>    ######################## NIC statistics for port 0  ########################
>>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>>    RX-errors: 0
>>    RX-nombuf:  0
>>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
>>
>>    Throughput (since last show)
>>    Rx-pps:            0          Rx-bps:            0
>>    Tx-pps:            0          Tx-bps:            0
>>    ############################################################################
>> testpmd> stop
>>
>> 6. Detach the VDUSE device from the vDPA bus
>> # vdpa dev del vduse0
>>
>> 7. Quit testpmd
>> testpmd> quit
>>
>> Known issues & remaining work:
>> ==============================
>> - Fix issue in FD manager (still polling while FD has been removed)
>> - Add Netlink support in Vhost library
>> - Support device reconnection
>>   -> a temporary patch to support reconnection via a tmpfs file is available,
>>      upstream solution would be in-kernel and is being developed.
>>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
>> - Support packed ring
>> - Provide more performance benchmark results
> 
> We are missing a reference to the kernel patches required to have
> vduse accept net devices.

Right, I mention it in the cover letter, but it should be in the release
note also. I propose to append this to the release note:
"While VDUSE support is already available in upstream Kernel, a couple
of patches are required to support network device type, which are being
upstreamed: 
https://lore.kernel.org/all/20230419134329.346825-1-maxime.coquelin@redhat.com/"

Does that sound good to you?

Thanks,
Maxime

> 
> I had played with the patches at v1 and it was working ok.
> I did not review in depth the latest revisions, but I followed your
> series from the PoC/start.
> Overall, the series lgtm.
> 
> For the series,
> Acked-by: David Marchand <david.marchand@redhat.com>
> 
>
  
David Marchand June 8, 2023, 12:44 p.m. UTC | #6
On Thu, Jun 8, 2023 at 11:17 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 6/7/23 10:05, David Marchand wrote:
> > On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
> > <maxime.coquelin@redhat.com> wrote:
> >>
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA device in Userspace, it enables
> >> implementing a Virtio device in userspace and have it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. Doing that, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or Virtio-user PMD.
> >>
> >> While VDUSE support is already available in upstream
> >> Kernel, a couple of patches are required to support
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >>
> >> Benchmark results:
> >> ==================
> >>
> >> On this v2, PVP reference benchmark has been run & compared with
> >> Vhost-user.
> >>
> >> When doing macswap forwarding in the worload, no difference is seen.
> >> When doing io forwarding in the workload, we see 4% performance
> >> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
> >> explained by the use of the IOTLB layer in the Vhost-library when using
> >> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> >>
> >> Usage:
> >> ======
> >>
> >> 1. Probe required Kernel modules
> >> # modprobe vdpa
> >> # modprobe vduse
> >> # modprobe virtio-vdpa
> >>
> >> 2. Build (require vduse kernel headers to be available)
> >> # meson build
> >> # ninja -C build
> >>
> >> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> >> testpmd (with 4 queue pairs in this example)
> >> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
> >
> > 9 is a nice but undefined value. 8 is enough.
> > In general, I prefer "human readable" strings, like *:debug ;-).
> >
> >
> >>
> >> 4. Attach the VDUSE device to the vDPA bus
> >> # vdpa dev add name vduse0 mgmtdev vduse
> >> => The virtio-net netdev shows up (eth0 here)
> >> # ip l show eth0
> >> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
> >>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> >>
> >> 5. Start/stop traffic in testpmd
> >> testpmd> start
> >> testpmd> show port stats 0
> >>    ######################## NIC statistics for port 0  ########################
> >>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
> >>    RX-errors: 0
> >>    RX-nombuf:  0
> >>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
> >>
> >>    Throughput (since last show)
> >>    Rx-pps:            0          Rx-bps:            0
> >>    Tx-pps:            0          Tx-bps:            0
> >>    ############################################################################
> >> testpmd> stop
> >>
> >> 6. Detach the VDUSE device from the vDPA bus
> >> # vdpa dev del vduse0
> >>
> >> 7. Quit testpmd
> >> testpmd> quit
> >>
> >> Known issues & remaining work:
> >> ==============================
> >> - Fix issue in FD manager (still polling while FD has been removed)
> >> - Add Netlink support in Vhost library
> >> - Support device reconnection
> >>   -> a temporary patch to support reconnection via a tmpfs file is available,
> >>      upstream solution would be in-kernel and is being developed.
> >>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> >> - Support packed ring
> >> - Provide more performance benchmark results
> >
> > We are missing a reference to the kernel patches required to have
> > vduse accept net devices.
>
> Right, I mention it in the cover letter, but it should be in the release
> note also. I propose to append this to the release note:
> "While VDUSE support is already available in upstream Kernel, a couple
> of patches are required to support network device type, which are being
> upstreamed:
> https://lore.kernel.org/all/20230419134329.346825-1-maxime.coquelin@redhat.com/"
>
> Does that sound good to you?

Ok for me.
Thanks.
  
Maxime Coquelin June 8, 2023, 2:29 p.m. UTC | #7
On 6/6/23 10:18, Maxime Coquelin wrote:
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
> 
> Benchmark results:
> ==================
> 
> On this v2, PVP reference benchmark has been run & compared with
> Vhost-user.
> 
> When doing macswap forwarding in the worload, no difference is seen.
> When doing io forwarding in the workload, we see 4% performance
> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost-library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> 
> Usage:
> ======
> 
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
> 
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
> 
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
>   
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> 
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>    ######################## NIC statistics for port 0  ########################
>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>    RX-errors: 0
>    RX-nombuf:  0
>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
> 
>    Throughput (since last show)
>    Rx-pps:            0          Rx-bps:            0
>    Tx-pps:            0          Tx-bps:            0
>    ############################################################################
> testpmd> stop
> 
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
> 
> 7. Quit testpmd
> testpmd> quit
> 
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
>   -> a temporary patch to support reconnection via a tmpfs file is available,
>      upstream solution would be in-kernel and is being developed.
>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results
> 
> Changes in v5:
> ==============
> - Delay starting/stopping the device to after having replied to the VDUSE
>    event in order to avoid a deadlock encountered when testing with OVS.
> - Mention reconnection support lack in the release note.
> 
> Changes in v4:
> ==============
> - Applied patch 1 and patch 2 from v3
> - Rebased on top of Eelco series
> - Fix coredump clear in IOTLB cache removal (David)
> - Remove uneeded ret variable in vhost_vring_inject_irq (David)
> - Fixed release note (David, Chenbo)
> 
> Changes in v2/v3:
> =================
> - Fixed mem_set_dump() parameter (patch 4)
> - Fixed accidental comment change (patch 7, Chenbo)
> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> - move change from patch 12 to 13 (Chenbo)
> - Enable locks annotation for control queue (Patch 17)
> - Send control queue notification when used descriptors enqueued (Patch 17)
> - Lock control queue IOTLB lock (Patch 17)
> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> - Enable more Virtio features (Patch 18)
> - Remove calls to pthread_setcancelstate() (Patch 22)
> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set (Patch 22)
> - Use RTE_DIM() to get requests string array size (Patch 22)
> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> - Fix queues enablement with multiqueue (Patch 26)
> - Move kickfd creation for better logging (Patch 26)
> - Improve logging (Patch 26)
> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
> - Enable CVQ notifications once handler is installed (Patch 27)
> - Don't advertise multiqueue and control queue if app only request single queue pair (Patch 27)
> - Add release notes
> 
> Maxime Coquelin (26):
>    vhost: fix IOTLB entries overlap check with previous entry
>    vhost: add helper of IOTLB entries coredump
>    vhost: add helper for IOTLB entries shared page check
>    vhost: don't dump unneeded pages with IOTLB
>    vhost: change to single IOTLB cache per device
>    vhost: add offset field to IOTLB entries
>    vhost: add page size info to IOTLB entry
>    vhost: retry translating IOVA after IOTLB miss
>    vhost: introduce backend ops
>    vhost: add IOTLB cache entry removal callback
>    vhost: add helper for IOTLB misses
>    vhost: add helper for interrupt injection
>    vhost: add API to set max queue pairs
>    net/vhost: use API to set max queue pairs
>    vhost: add control virtqueue support
>    vhost: add VDUSE device creation and destruction
>    vhost: add VDUSE callback for IOTLB miss
>    vhost: add VDUSE callback for IOTLB entry removal
>    vhost: add VDUSE callback for IRQ injection
>    vhost: add VDUSE events handler
>    vhost: add support for virtqueue state get event
>    vhost: add support for VDUSE status set event
>    vhost: add support for VDUSE IOTLB update event
>    vhost: add VDUSE device startup
>    vhost: add multiqueue support to VDUSE
>    vhost: add VDUSE device stop
> 
>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
>   doc/guides/rel_notes/release_23_07.rst |  12 +
>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
>   lib/vhost/iotlb.c                      | 333 +++++++------
>   lib/vhost/iotlb.h                      |  45 +-
>   lib/vhost/meson.build                  |   5 +
>   lib/vhost/rte_vhost.h                  |  17 +
>   lib/vhost/socket.c                     |  72 ++-
>   lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>   lib/vhost/vduse.h                      |  33 ++
>   lib/vhost/version.map                  |   1 +
>   lib/vhost/vhost.c                      |  70 ++-
>   lib/vhost/vhost.h                      |  57 ++-
>   lib/vhost/vhost_user.c                 |  51 +-
>   lib/vhost/vhost_user.h                 |   2 +-
>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>   lib/vhost/virtio_net_ctrl.h            |  10 +
>   17 files changed, 1409 insertions(+), 238 deletions(-)
>   create mode 100644 lib/vhost/vduse.c
>   create mode 100644 lib/vhost/vduse.h
>   create mode 100644 lib/vhost/virtio_net_ctrl.c
>   create mode 100644 lib/vhost/virtio_net_ctrl.h
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime