[v4,3/3] vhost: fix offload flags in Rx path
Commit Message
The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.
This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure its devices to support
every offload in use (IP/TCP checksumming, TSO, ...), and the various
l2/l3/l4 lengths must be set following any processing that happened in
the application itself.
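For illustration (this sketch is not part of the patch), the application-side
setup that Tx offloading requires, using the pre-21.11 mbuf API; header sizes
assume no IP/TCP options:

    #include <rte_mbuf.h>
    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_tcp.h>

    /* Request TSO on a TCP/IPv4 packet; the port must have been configured
     * with the matching DEV_TX_OFFLOAD_* capabilities, and the TCP checksum
     * field must hold the pseudo-header checksum (rte_ipv4_phdr_cksum()). */
    static void
    request_tcp_tso(struct rte_mbuf *m)
    {
        m->l2_len = sizeof(struct rte_ether_hdr);
        m->l3_len = sizeof(struct rte_ipv4_hdr);
        m->l4_len = sizeof(struct rte_tcp_hdr);
        m->tso_segsz = 1448;
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
    }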
On the other hand, the received packets are not marked with regard to
their l3/l4 checksum status.
Copy the virtio Rx processing to fix those offload flags, with some
differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
the virtio spec),
Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.
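As a minimal sketch (socket path hypothetical), an application opts in at
registration time:

    #include <rte_vhost.h>

    /* Ask vhost to fill Rx offload metadata instead of Tx metadata. */
    static int
    register_compliant_vhost(const char *path)
    {
        return rte_vhost_driver_register(path,
                RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS);
    }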
The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.
Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Changes since v3:
- rebased on next-virtio,
Changes since v2:
- introduced a new flag to keep existing behavior as the default,
- packets with unrecognised offload are passed to the application with no
offload metadata rather than dropped,
- ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that
the virtio driver is not allowed to use this flag when transmitting
packets,
Changes since v1:
- updated vhost example,
- restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
- restored log on buggy offload request,
---
doc/guides/prog_guide/vhost_lib.rst | 12 ++
doc/guides/rel_notes/release_21_05.rst | 6 +
drivers/net/vhost/rte_eth_vhost.c | 2 +-
examples/vhost/main.c | 44 +++---
lib/vhost/rte_vhost.h | 1 +
lib/vhost/socket.c | 5 +-
lib/vhost/vhost.c | 6 +-
lib/vhost/vhost.h | 14 +-
lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++---
9 files changed, 222 insertions(+), 53 deletions(-)
Comments
On Mon, May 03, 2021 at 06:43:44PM +0200, David Marchand wrote:
> The vhost library currently configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
>
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
>
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
>
> Copy virtio rx processing to fix those offload flags with some
> differences:
> - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
> the virtio spec),
>
> Some applications might rely on the current behavior, so it is left
> untouched by default.
> A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
> new behavior.
>
> The vhost example has been updated for the new behavior: TSO is applied to
> any packet marked LRO.
>
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> Cc: stable@dpdk.org
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> Changes since v3:
> - rebased on next-virtio,
>
> Changes since v2:
> - introduced a new flag to keep existing behavior as the default,
> - packets with unrecognised offload are passed to the application with no
> offload metadata rather than dropped,
> - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that
> the virtio driver is not allowed to use this flag when transmitting
> packets,
>
> Changes since v1:
> - updated vhost example,
> - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
> - restored log on buggy offload request,
>
> ---
> doc/guides/prog_guide/vhost_lib.rst | 12 ++
> doc/guides/rel_notes/release_21_05.rst | 6 +
> drivers/net/vhost/rte_eth_vhost.c | 2 +-
> examples/vhost/main.c | 44 +++---
> lib/vhost/rte_vhost.h | 1 +
> lib/vhost/socket.c | 5 +-
> lib/vhost/vhost.c | 6 +-
> lib/vhost/vhost.h | 14 +-
> lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++---
> 9 files changed, 222 insertions(+), 53 deletions(-)
>
> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
> index 7afa351675..d18fb98910 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions:
>
> It is disabled by default.
>
> + - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
> +
> + Since v16.04, the vhost library forwards checksum and gso requests for
> + packets received from a virtio driver by filling Tx offload metadata in
> + the mbuf. This behavior is inconsistent with other drivers but it is left
> + untouched for existing applications that might rely on it.
> +
> + This flag disables the legacy behavior and instead asks vhost to simply
> + populate Rx offload metadata in the mbuf.
> +
> + It is disabled by default.
> +
> * ``rte_vhost_driver_set_features(path, features)``
>
> This function sets the feature bits the vhost-user driver supports. The
> diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
> index a5f21f8425..6b7b0810a5 100644
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -337,6 +337,12 @@ API Changes
> ``policer_action_recolor_supported`` and ``policer_action_drop_supported``
> have been removed.
>
> +* vhost: The vhost library currently populates received mbufs from a virtio
> + driver with Tx offload flags while not filling Rx offload flags.
> + While this behavior is arguable, it is kept untouched.
> + A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask
> + for a behavior compliant with the mbuf offload API.
> +
>
> ABI Changes
> -----------
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index d198fc8a8e..281379d6a3 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
> int ret = 0;
> char *iface_name;
> uint16_t queues;
> - uint64_t flags = 0;
> + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
> uint64_t disable_flags = 0;
> int client_mode = 0;
> int iommu_support = 0;
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 0bee1f3321..d2179eadb9 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -19,6 +19,7 @@
> #include <rte_log.h>
> #include <rte_string_fns.h>
> #include <rte_malloc.h>
> +#include <rte_net.h>
> #include <rte_vhost.h>
> #include <rte_ip.h>
> #include <rte_tcp.h>
> @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
> return 0;
> }
>
> -static uint16_t
> -get_psd_sum(void *l3_hdr, uint64_t ol_flags)
> -{
> - if (ol_flags & PKT_TX_IPV4)
> - return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
> - else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> - return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
> -}
> -
> static void virtio_tx_offload(struct rte_mbuf *m)
> {
> + struct rte_net_hdr_lens hdr_lens;
> + struct rte_ipv4_hdr *ipv4_hdr;
> + struct rte_tcp_hdr *tcp_hdr;
> + uint32_t ptype;
> void *l3_hdr;
> - struct rte_ipv4_hdr *ipv4_hdr = NULL;
> - struct rte_tcp_hdr *tcp_hdr = NULL;
> - struct rte_ether_hdr *eth_hdr =
> - rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
>
> - l3_hdr = (char *)eth_hdr + m->l2_len;
> + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> + m->l2_len = hdr_lens.l2_len;
> + m->l3_len = hdr_lens.l3_len;
> + m->l4_len = hdr_lens.l4_len;
>
> - if (m->ol_flags & PKT_TX_IPV4) {
> + l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
> + tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
> + m->l2_len + m->l3_len);
> +
> + m->ol_flags |= PKT_TX_TCP_SEG;
> + if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
> + m->ol_flags |= PKT_TX_IPV4;
> + m->ol_flags |= PKT_TX_IP_CKSUM;
> ipv4_hdr = l3_hdr;
> ipv4_hdr->hdr_checksum = 0;
> - m->ol_flags |= PKT_TX_IP_CKSUM;
> + tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags);
> + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> + m->ol_flags |= PKT_TX_IPV6;
> + tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags);
> }
> -
> - tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
> - tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
> }
>
> static __rte_always_inline void
> @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
> m->vlan_tci = vlan_tag;
> }
>
> - if (m->ol_flags & PKT_TX_TCP_SEG)
> + if (m->ol_flags & PKT_RX_LRO)
> virtio_tx_offload(m);
>
> tx_q->m_table[tx_q->len++] = m;
> @@ -1633,7 +1635,7 @@ main(int argc, char *argv[])
> int ret, i;
> uint16_t portid;
> static pthread_t tid;
> - uint64_t flags = 0;
> + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
>
> signal(SIGINT, sigint_handler);
>
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index d0a8ae31f2..8d875e9322 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -36,6 +36,7 @@ extern "C" {
> /* support only linear buffers (no chained mbufs) */
> #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6)
> #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7)
> +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8)
>
> /* Features. */
> #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index 0169d36481..5d0d728d52 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -42,6 +42,7 @@ struct vhost_user_socket {
> bool extbuf;
> bool linearbuf;
> bool async_copy;
> + bool net_compliant_ol_flags;
>
> /*
> * The "supported_features" indicates the feature bits the
> @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
> size = strnlen(vsocket->path, PATH_MAX);
> vhost_set_ifname(vid, vsocket->path, size);
>
> - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
> + vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
> + vsocket->net_compliant_ol_flags);
>
> vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
>
> @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
> vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
> vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
> vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> + vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
>
> if (vsocket->async_copy &&
> (flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index c9b6379f73..9abfc0bfe7 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
> }
>
> void
> -vhost_set_builtin_virtio_net(int vid, bool enable)
> +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
> {
> struct virtio_net *dev = get_device(vid);
>
> @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
> dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> else
> dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> + if (!compliant_ol_flags)
> + dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
> + else
> + dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
> }
>
> void
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index b303635645..8078ddff79 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -27,15 +27,17 @@
> #include "rte_vhost_async.h"
>
> /* Used to indicate that the device is running on a data core */
> -#define VIRTIO_DEV_RUNNING 1
> +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
> /* Used to indicate that the device is ready to operate */
> -#define VIRTIO_DEV_READY 2
> +#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
> /* Used to indicate that the built-in vhost net device backend is enabled */
> -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
> +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
> /* Used to indicate that the device has its own data path and configured */
> -#define VIRTIO_DEV_VDPA_CONFIGURED 8
> +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
> /* Used to indicate that the feature negotiation failed */
> -#define VIRTIO_DEV_FEATURES_FAILED 16
> +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
> +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
> +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
>
> /* Backend value set by guest. */
> #define VIRTIO_DEV_STOPPED -1
> @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
> void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
>
> void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
> -void vhost_set_builtin_virtio_net(int vid, bool enable);
> +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
> void vhost_enable_extbuf(int vid);
> void vhost_enable_linearbuf(int vid);
> int vhost_enable_guest_notification(struct virtio_net *dev,
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 1a34867f3c..8e36f4c340 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -8,6 +8,7 @@
>
> #include <rte_mbuf.h>
> #include <rte_memcpy.h>
> +#include <rte_net.h>
> #include <rte_ether.h>
> #include <rte_ip.h>
> #include <rte_vhost.h>
> @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
> }
>
> static __rte_always_inline void
> -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> {
> uint16_t l4_proto = 0;
> void *l4_hdr = NULL;
> struct rte_tcp_hdr *tcp_hdr = NULL;
>
> - if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> - return;
> -
> parse_ethernet(m, &l4_proto, &l4_hdr);
> if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> if (hdr->csum_start == (m->l2_len + m->l3_len)) {
> @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> }
> }
>
> +static __rte_always_inline void
> +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
> + bool legacy_ol_flags)
> +{
> + struct rte_net_hdr_lens hdr_lens;
> + int l4_supported = 0;
> + uint32_t ptype;
> +
> + if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> + return;
> +
> + if (legacy_ol_flags) {
> + vhost_dequeue_offload_legacy(hdr, m);
> + return;
> + }
> +
> + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> + m->packet_type = ptype;
My _impression_ is that calling rte_net_get_ptype() makes the
receiving process a bit more expensive than without the patch,
and it is not optional. However, the original parsing code was
limited and could be considered a bug.
Anyway, calling it has the nice side effect of providing the
packet_type, which was not provided before the patch.
Acked-by: Flavio Leitner <fbl@sysclose.org>
(though this just got merged)
Thanks David, great work!
fbl
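As an illustration of that side effect, a minimal sketch (not from this
thread) of an application inspecting the compliant Rx metadata after
rte_vhost_dequeue_burst(); handle_lro_packet() is hypothetical:

    #include <rte_mbuf.h>

    static void handle_lro_packet(struct rte_mbuf *m, uint16_t segsz);

    static void
    inspect_rx_offload(struct rte_mbuf *m)
    {
        /* packet_type now comes from rte_net_get_ptype(). */
        if ((m->packet_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP &&
            (m->ol_flags & PKT_RX_LRO) != 0)
            handle_lro_packet(m, m->tso_segsz); /* coalesced TCP */
    }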
> + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> + l4_supported = 1;
> +
> + /* According to Virtio 1.1 spec, the device only needs to look at
> + * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path.
> + * This differs from the processing incoming packets path where the
> + * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by the
> + * device.
> + *
> + * 5.1.6.2.1 Driver Requirements: Packet Transmission
> + * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and
> + * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
> + *
> + * 5.1.6.2.2 Device Requirements: Packet Transmission
> + * The device MUST ignore flag bits that it does not recognize.
> + */
> + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> + uint32_t hdrlen;
> +
> + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> + if (hdr->csum_start <= hdrlen && l4_supported != 0) {
> + m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> + } else {
> + /* Unknown proto or tunnel, do sw cksum. We can assume
> + * the cksum field is in the first segment since the
> + * buffers we provided to the host are large enough.
> + * In case of SCTP, this will be wrong since it's a CRC
> + * but there's nothing we can do.
> + */
> + uint16_t csum = 0, off;
> +
> + if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> + rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0)
> + return;
> + if (likely(csum != 0xffff))
> + csum = ~csum;
> + off = hdr->csum_offset + hdr->csum_start;
> + if (rte_pktmbuf_data_len(m) >= off + 1)
> + *rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum;
> + }
> + }
> +
> + if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> + if (hdr->gso_size == 0)
> + return;
> +
> + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> + case VIRTIO_NET_HDR_GSO_TCPV4:
> + case VIRTIO_NET_HDR_GSO_TCPV6:
> + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP)
> + break;
> + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
> + m->tso_segsz = hdr->gso_size;
> + break;
> + case VIRTIO_NET_HDR_GSO_UDP:
> + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP)
> + break;
> + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
> + m->tso_segsz = hdr->gso_size;
> + break;
> + default:
> + break;
> + }
> + }
> +}
> +
> static __rte_noinline void
> copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
> struct buf_vector *buf_vec)
> @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
> static __rte_always_inline int
> copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> struct buf_vector *buf_vec, uint16_t nr_vec,
> - struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
> + struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
> + bool legacy_ol_flags)
> {
> uint32_t buf_avail, buf_offset;
> uint64_t buf_addr, buf_len;
> @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> m->pkt_len += mbuf_offset;
>
> if (hdr)
> - vhost_dequeue_offload(hdr, m);
> + vhost_dequeue_offload(hdr, m, legacy_ol_flags);
>
> out:
>
> @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
> return pkt;
> }
>
> -static __rte_noinline uint16_t
> +__rte_always_inline
> +static uint16_t
> virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> + bool legacy_ol_flags)
> {
> uint16_t i;
> uint16_t free_entries;
> @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> }
>
> err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
> - mbuf_pool);
> + mbuf_pool, legacy_ol_flags);
> if (unlikely(err)) {
> rte_pktmbuf_free(pkts[i]);
> if (!allocerr_warned) {
> @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> return (i - dropped);
> }
>
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_split_legacy(struct virtio_net *dev,
> + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count)
> +{
> + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
> +}
> +
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_split_compliant(struct virtio_net *dev,
> + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count)
> +{
> + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
> +}
> +
> static __rte_always_inline int
> vhost_reserve_avail_batch_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev,
> static __rte_always_inline int
> virtio_dev_tx_batch_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> - struct rte_mbuf **pkts)
> + struct rte_mbuf **pkts,
> + bool legacy_ol_flags)
> {
> uint16_t avail_idx = vq->last_avail_idx;
> uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev,
> if (virtio_net_with_host_offload(dev)) {
> vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
> - vhost_dequeue_offload(hdr, pkts[i]);
> + vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
> }
> }
>
> @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
> struct rte_mempool *mbuf_pool,
> struct rte_mbuf *pkts,
> uint16_t *buf_id,
> - uint16_t *desc_count)
> + uint16_t *desc_count,
> + bool legacy_ol_flags)
> {
> struct buf_vector buf_vec[BUF_VECTOR_MAX];
> uint32_t buf_len;
> @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
> }
>
> err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
> - mbuf_pool);
> + mbuf_pool, legacy_ol_flags);
> if (unlikely(err)) {
> if (!allocerr_warned) {
> VHOST_LOG_DATA(ERR,
> @@ -2859,14 +2968,15 @@ static __rte_always_inline int
> virtio_dev_tx_single_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> struct rte_mempool *mbuf_pool,
> - struct rte_mbuf *pkts)
> + struct rte_mbuf *pkts,
> + bool legacy_ol_flags)
> {
>
> uint16_t buf_id, desc_count = 0;
> int ret;
>
> ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id,
> - &desc_count);
> + &desc_count, legacy_ol_flags);
>
> if (likely(desc_count > 0)) {
> if (virtio_net_is_inorder(dev))
> @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev,
> return ret;
> }
>
> -static __rte_noinline uint16_t
> +__rte_always_inline
> +static uint16_t
> virtio_dev_tx_packed(struct virtio_net *dev,
> struct vhost_virtqueue *__rte_restrict vq,
> struct rte_mempool *mbuf_pool,
> struct rte_mbuf **__rte_restrict pkts,
> - uint32_t count)
> + uint32_t count,
> + bool legacy_ol_flags)
> {
> uint32_t pkt_idx = 0;
>
> @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev,
>
> if (count - pkt_idx >= PACKED_BATCH_SIZE) {
> if (!virtio_dev_tx_batch_packed(dev, vq,
> - &pkts[pkt_idx])) {
> + &pkts[pkt_idx],
> + legacy_ol_flags)) {
> pkt_idx += PACKED_BATCH_SIZE;
> continue;
> }
> }
>
> if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
> - pkts[pkt_idx]))
> + pkts[pkt_idx],
> + legacy_ol_flags))
> break;
> pkt_idx++;
> } while (pkt_idx < count);
> @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
> return pkt_idx;
> }
>
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_packed_legacy(struct virtio_net *dev,
> + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> +{
> + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
> +}
> +
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_packed_compliant(struct virtio_net *dev,
> + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> +{
> + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
> +}
> +
> uint16_t
> rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> count -= 1;
> }
>
> - if (vq_is_packed(dev))
> - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
> - else
> - count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
> + if (vq_is_packed(dev)) {
> + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> + count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count);
> + else
> + count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count);
> + } else {
> + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> + count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count);
> + else
> + count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count);
> + }
>
> out:
> if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> --
> 2.23.0
>
Hi David,
May I know how to configure Tx offloading with testpmd? Could you help provide an example case?
I added a case which needs the vhost Tx offload (TSO/cksum) function, and it can't work with the patch; could you use this case as the example if possible? (A guess at the needed testpmd setup follows the steps below.)
For example: VM2VM split ring vhost-user/virtio-net test with TCP traffic
=========================================================================
1. Launch the Vhost sample on socket 0 with the below commands::
rm -rf vhost-net*
./dpdk-testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost --vdev 'net_vhost0,iface=vhost-net0,queues=1' \
--vdev 'net_vhost1,iface=vhost-net1,queues=1' -- -i --nb-cores=2 --txd=1024 --rxd=1024
testpmd>start
2. Launch VM1 and VM2 on socket 1::
taskset -c 32 qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -smp 1 -m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04.img \
-chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
-device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
-monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
-netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \
-chardev socket,id=char0,path=./vhost-net0 \
-netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :10
taskset -c 33 qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -smp 1 -m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04-2.img \
-chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
-device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
-monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
-netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \
-chardev socket,id=char0,path=./vhost-net1 \
-netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :12
3. On VM1, set the virtio device IP and add a static ARP entry::
ifconfig ens5 1.1.1.2
arp -s 1.1.1.8 52:54:00:00:00:02
4. On VM2, set the virtio device IP and add a static ARP entry::
ifconfig ens5 1.1.1.8
arp -s 1.1.1.2 52:54:00:00:00:01
5. Check the iperf performance with different packet sizes between the two VMs with the below commands::
Under VM1, run: `iperf -s -i 1`
Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60`
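My guess (not verified) is that with the compliant mode the application has
to request Tx offloads explicitly, e.g. in testpmd:

    testpmd> port stop all
    testpmd> csum set ip hw 0
    testpmd> csum set tcp hw 0
    testpmd> tso set 1448 0
    testpmd> set fwd csum
    testpmd> port start all
    testpmd> start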
BR,
Yinan
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of David Marchand
> Sent: May 4, 2021 0:44
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo
> <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>
> Subject: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
>
> [full patch quoted above, snipped]
Hi David,
Since vhost Tx offload can't work now, we reported a Bugzilla issue as below; could you help take a look?
https://bugs.dpdk.org/show_bug.cgi?id=702
We also tried the vhost example with a VM2VM iperf test; large packets also can't be forwarded.
BR,
Yinan
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Wang, Yinan
> Sent: 2021?5?8? 14:24
> To: David Marchand <david.marchand@redhat.com>; dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo
> <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
>
> Hi David,
>
> May I know how to configures Tx offloading by testpmd, could you help to
> provide an example case?
> I add a case which need vhost tx offload (TSO/cksum) function, this case
> can't work with the patch, could you use this case as the example if possible?
>
> For example: VM2VM split ring vhost-user/virtio-net test with tcp traffic
> ==========================================================
> ===============
>
> 1. Launch the Vhost sample on socket 0 by below commands::
>
> rm -rf vhost-net*
> ./dpdk-testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost --vdev
> 'net_vhost0,iface=vhost-net0,queues=1' \
> --vdev 'net_vhost1,iface=vhost-net1,queues=1' -- -i --nb-cores=2 --
> txd=1024 --rxd=1024
> testpmd>start
>
> 2. Launch VM1 and VM2 on socket 1::
>
> taskset -c 32 qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -
> smp 1 -m 4096 \
> -object memory-backend-file,id=mem,size=4096M,mem-
> path=/mnt/huge,share=on \
> -numa node,memdev=mem -mem-prealloc -drive
> file=/home/osimg/ubuntu20-04.img \
> -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0
> -device virtio-serial \
> -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2
> -daemonize \
> -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device
> e1000,netdev=nttsip1 \
> -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \
> -chardev socket,id=char0,path=./vhost-net0 \
> -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
> -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-
> modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest
> _tso4=on,guest_ecn=on -vnc :10
>
> taskset -c 33 qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -
> smp 1 -m 4096 \
> -object memory-backend-file,id=mem,size=4096M,mem-
> path=/mnt/huge,share=on \
> -numa node,memdev=mem -mem-prealloc -drive
> file=/home/osimg/ubuntu20-04-2.img \
> -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0
> -device virtio-serial \
> -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2
> -daemonize \
> -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device
> e1000,netdev=nttsip1 \
> -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \
> -chardev socket,id=char0,path=./vhost-net1 \
> -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
> -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-
> modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest
> _tso4=on,guest_ecn=on -vnc :12
>
> 3. On VM1, set virtio device IP and run arp protocal::
>
> ifconfig ens5 1.1.1.2
> arp -s 1.1.1.8 52:54:00:00:00:02
>
> 4. On VM2, set virtio device IP and run arp protocal::
>
> ifconfig ens5 1.1.1.8
> arp -s 1.1.1.2 52:54:00:00:00:01
>
> 5. Check the iperf performance with different packet size between two VMs
> by below commands::
>
> Under VM1, run: `iperf -s -i 1`
> Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60`
>
> BR,
> Yinan
On Wed, May 12, 2021 at 5:30 AM Wang, Yinan <yinan.wang@intel.com> wrote:
>
> Hi David,
>
> Since vhost tx offload can’t work now, we reported a Bugzilla as below; could you help take a look?
> https://bugs.dpdk.org/show_bug.cgi?id=702
(I only discovered your mail from 05/08 today, now that I got a new
mail; might be a PEBCAK on my side, sorry...)
- Looking at the bz, there is a first issue/misconception.
testpmd only does TSO or any kind of tx offloading with the csum forward engine.
The iofwd engine won't make TSO possible.
- Let's say we use the csum fwd engine: testpmd configures drivers
through the ethdev API.
The ethdev API states that no offloading is enabled unless requested
by the application.
TSO and l3/l4 checksum offloads are documented at:
https://doc.dpdk.org/guides/nics/features.html#l3-checksum-offload
https://doc.dpdk.org/guides/nics/features.html#lro
But the vhost pmd does not report such capabilities.
https://git.dpdk.org/dpdk/tree/drivers/net/vhost/rte_eth_vhost.c#n1276
So we can't expect testpmd to have tso working with net/vhost pmd.
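For illustration, a minimal sketch (not from the thread; the helper name,
port id and queue counts are placeholders) of how an application requests
Tx offloads through the ethdev API, using the 21.05-era DEV_TX_OFFLOAD_*
names::

#include <errno.h>
#include <rte_ethdev.h>

/* Hypothetical helper: request TSO + checksum Tx offloads on a port.
 * A PMD that does not advertise these capabilities (like net/vhost
 * today) makes the request fail, so testpmd cannot do TSO on it. */
static int
configure_port_with_tso(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_conf conf = { 0 };
	uint64_t wanted = DEV_TX_OFFLOAD_IPV4_CKSUM |
			DEV_TX_OFFLOAD_TCP_CKSUM |
			DEV_TX_OFFLOAD_TCP_TSO;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;
	/* Offloads are opt-in: only enable what the PMD reports. */
	if ((dev_info.tx_offload_capa & wanted) != wanted)
		return -ENOTSUP;
	conf.txmode.offloads = wanted;
	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}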
- The csum offloading engine swaps mac addresses.
I would expect issues with inter vm traffic.
In summary, I think this is a bad test.
If it worked with the commands in the bugzilla before my change (which
I doubt), it was wrong.
> We also tried the vhost example with a VM2VM iperf test, large pkts also can't be forwarded.
"large pkts", can you give details?
I tried to use this example, without/with my change, but:
When I try to start this example with a physical port and two vhosts,
I get a crash (division by 0 in the VMDq setup).
When I start it without a physical port, I get a complaint about no
port being enabled.
Passing a portmask of 0x1 seems to work: the example starts, but then no
traffic is forwarded (not even ARP).
Hooking gdb, I never see a packet dequeued from vhost.
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: May 12, 2021 23:20
> To: Wang, Yinan <yinan.wang@intel.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com;
> olivier.matz@6wind.com; fbl@sysclose.org; i.maximets@ovn.org; Xia,
> Chenbo <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
Thanks for your kind explanation.
Before this patch, vhost could declare TSO offload: if we configured TSO/csum in QEMU, the Tx offload flags got marked, so VM2VM could forward large pkts (64k when using iperf) with iofwd.
Now I understand this case will not work anymore; we can move to using vswitch.
>
> > We also tried the vhost example with a VM2VM iperf test, large pkts also can't
> > be forwarded.
>
> "large pkts", can you give details?
>
> I tried to use this example, without/with my change, but:
>
> When I try to start this example with a physical port and two vhosts,
> I get a crash (division by 0 in the VMDq setup).
> When I start it without a physical port, I get a complaint about no
> port being enabled.
> Passing a portmask of 0x1 seems to work: the example starts, but then no
> traffic is forwarded (not even ARP).
> Hooking gdb, I never see a packet dequeued from vhost.
I re-tested with vswitch; the VM2VM iperf test works both w/ and w/o this patch. Sorry for the wrong result with the vhost example before.
There is some special configuration in the vswitch sample. Working test steps are below:
1. Modify the vhost example code as follows::
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -29,7 +29,7 @@
#include "main.h"
#ifndef MAX_QUEUES
-#define MAX_QUEUES 128
+#define MAX_QUEUES 512
#endif
/* the maximum number of external ports supported */
2. Bind one physical port to vfio-pci, then launch dpdk-vhost with the command below::
./dpdk-vhost -l 26-28 -n 4 -- -p 0x1 --mergeable 1 --vm2vm 1 --socket-file /tmp/vhost-net0 --socket-file /tmp/vhost-net1
3. Start VM1::
/home/qemu-install/qemu-4.2.1/bin/qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -smp 4 -m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04.img \
-chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
-device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
-monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
-netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \
-chardev socket,id=char0,path=/tmp/vhost-net0 \
-netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-modern=true,mrg_rxbuf=off,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :10
4. Start VM2::
/home/qemu-install/qemu-4.2.1/bin/qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -smp 4 -m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04-2.img \
-chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
-device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
-monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
-netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \
-chardev socket,id=char0,path=/tmp/vhost-net1 \
-netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-modern=true,mrg_rxbuf=off,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :12
5. On VM1, set the virtio device IP and add a static ARP entry::
ifconfig ens5 1.1.1.2
arp -s 1.1.1.8 52:54:00:00:00:02
6. On VM2, set the virtio device IP and add a static ARP entry::
ifconfig ens5 1.1.1.8
arp -s 1.1.1.2 52:54:00:00:00:01
7. Check iperf performance with different packet sizes between the two VMs using the commands below::
Under VM1, run: `iperf -s -i 1`
Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60`
diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions:
It is disabled by default.
+ - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
+
+ Since v16.04, the vhost library forwards checksum and gso requests for
+ packets received from a virtio driver by filling Tx offload metadata in
+ the mbuf. This behavior is inconsistent with other drivers but it is left
+ untouched for existing applications that might rely on it.
+
+ This flag disables the legacy behavior and instead asks vhost to simply
+ populate Rx offload metadata in the mbuf.
+
+ It is disabled by default.
+
* ``rte_vhost_driver_set_features(path, features)``
This function sets the feature bits the vhost-user driver supports. The
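As a usage illustration (not part of the patch; the socket path and helper
name are placeholders), an application opts in to the compliant behavior
when registering its vhost-user socket::

#include <rte_vhost.h>

/* Sketch: register a vhost-user socket with compliant offload flags.
 * Without RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS, the legacy Tx-flags
 * behavior described above is kept. */
int
register_compliant_vhost_socket(void)
{
	return rte_vhost_driver_register("/tmp/vhost-net0",
			RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS);
}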
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -337,6 +337,12 @@ API Changes
``policer_action_recolor_supported`` and ``policer_action_drop_supported``
have been removed.
+* vhost: The vhost library currently populates received mbufs from a virtio
+ driver with Tx offload flags while not filling Rx offload flags.
+ While this behavior is arguable, it is kept untouched.
+ A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask
+ for a behavior compliant with the mbuf offload API.
+
ABI Changes
-----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
int ret = 0;
char *iface_name;
uint16_t queues;
- uint64_t flags = 0;
+ uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
uint64_t disable_flags = 0;
int client_mode = 0;
int iommu_support = 0;
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -19,6 +19,7 @@
#include <rte_log.h>
#include <rte_string_fns.h>
#include <rte_malloc.h>
+#include <rte_net.h>
#include <rte_vhost.h>
#include <rte_ip.h>
#include <rte_tcp.h>
@@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
return 0;
}
-static uint16_t
-get_psd_sum(void *l3_hdr, uint64_t ol_flags)
-{
- if (ol_flags & PKT_TX_IPV4)
- return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
- else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
- return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
static void virtio_tx_offload(struct rte_mbuf *m)
{
+ struct rte_net_hdr_lens hdr_lens;
+ struct rte_ipv4_hdr *ipv4_hdr;
+ struct rte_tcp_hdr *tcp_hdr;
+ uint32_t ptype;
void *l3_hdr;
- struct rte_ipv4_hdr *ipv4_hdr = NULL;
- struct rte_tcp_hdr *tcp_hdr = NULL;
- struct rte_ether_hdr *eth_hdr =
- rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
- l3_hdr = (char *)eth_hdr + m->l2_len;
+ ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+ m->l2_len = hdr_lens.l2_len;
+ m->l3_len = hdr_lens.l3_len;
+ m->l4_len = hdr_lens.l4_len;
- if (m->ol_flags & PKT_TX_IPV4) {
+ l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
+ tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
+ m->l2_len + m->l3_len);
+
+ m->ol_flags |= PKT_TX_TCP_SEG;
+ if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
+ m->ol_flags |= PKT_TX_IPV4;
+ m->ol_flags |= PKT_TX_IP_CKSUM;
ipv4_hdr = l3_hdr;
ipv4_hdr->hdr_checksum = 0;
- m->ol_flags |= PKT_TX_IP_CKSUM;
+ tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags);
+ } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
+ m->ol_flags |= PKT_TX_IPV6;
+ tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags);
}
-
- tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
- tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
}
static __rte_always_inline void
@@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
m->vlan_tci = vlan_tag;
}
- if (m->ol_flags & PKT_TX_TCP_SEG)
+ if (m->ol_flags & PKT_RX_LRO)
virtio_tx_offload(m);
tx_q->m_table[tx_q->len++] = m;
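For reference, a condensed sketch of the same TSO preparation for the IPv4
case (not part of the patch; prepare_ipv4_tso() and the mss parameter are
hypothetical), assuming l2/l3/l4 lengths were already filled from
rte_net_get_ptype()::

#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_tcp.h>

static void
prepare_ipv4_tso(struct rte_mbuf *m, uint16_t mss)
{
	struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
			struct rte_ipv4_hdr *, m->l2_len);
	struct rte_tcp_hdr *tcp = rte_pktmbuf_mtod_offset(m,
			struct rte_tcp_hdr *, m->l2_len + m->l3_len);

	m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
	m->tso_segsz = mss;
	ip->hdr_checksum = 0;
	/* TSO expects the TCP checksum field to hold the pseudo-header
	 * checksum, computed over headers only. */
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
}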
@@ -1633,7 +1635,7 @@ main(int argc, char *argv[])
int ret, i;
uint16_t portid;
static pthread_t tid;
- uint64_t flags = 0;
+ uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
signal(SIGINT, sigint_handler);
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -36,6 +36,7 @@ extern "C" {
/* support only linear buffers (no chained mbufs) */
#define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6)
#define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7)
+#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8)
/* Features. */
#ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -42,6 +42,7 @@ struct vhost_user_socket {
bool extbuf;
bool linearbuf;
bool async_copy;
+ bool net_compliant_ol_flags;
/*
* The "supported_features" indicates the feature bits the
@@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
size = strnlen(vsocket->path, PATH_MAX);
vhost_set_ifname(vid, vsocket->path, size);
- vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
+ vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
+ vsocket->net_compliant_ol_flags);
vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
@@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
+ vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
if (vsocket->async_copy &&
(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
}
void
-vhost_set_builtin_virtio_net(int vid, bool enable)
+vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
{
struct virtio_net *dev = get_device(vid);
@@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
else
dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
+ if (!compliant_ol_flags)
+ dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
+ else
+ dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
}
void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -27,15 +27,17 @@
#include "rte_vhost_async.h"
/* Used to indicate that the device is running on a data core */
-#define VIRTIO_DEV_RUNNING 1
+#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
/* Used to indicate that the device is ready to operate */
-#define VIRTIO_DEV_READY 2
+#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
/* Used to indicate that the built-in vhost net device backend is enabled */
-#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
+#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
/* Used to indicate that the device has its own data path and configured */
-#define VIRTIO_DEV_VDPA_CONFIGURED 8
+#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
/* Used to indicate that the feature negotiation failed */
-#define VIRTIO_DEV_FEATURES_FAILED 16
+#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
+/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
+#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
/* Backend value set by guest. */
#define VIRTIO_DEV_STOPPED -1
@@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
-void vhost_set_builtin_virtio_net(int vid, bool enable);
+void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
void vhost_enable_extbuf(int vid);
void vhost_enable_linearbuf(int vid);
int vhost_enable_guest_notification(struct virtio_net *dev,
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -8,6 +8,7 @@
#include <rte_mbuf.h>
#include <rte_memcpy.h>
+#include <rte_net.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_vhost.h>
@@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
}
static __rte_always_inline void
-vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
+vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
{
uint16_t l4_proto = 0;
void *l4_hdr = NULL;
struct rte_tcp_hdr *tcp_hdr = NULL;
- if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
- return;
-
parse_ethernet(m, &l4_proto, &l4_hdr);
if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
if (hdr->csum_start == (m->l2_len + m->l3_len)) {
@@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
}
}
+static __rte_always_inline void
+vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
+ bool legacy_ol_flags)
+{
+ struct rte_net_hdr_lens hdr_lens;
+ int l4_supported = 0;
+ uint32_t ptype;
+
+ if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+ return;
+
+ if (legacy_ol_flags) {
+ vhost_dequeue_offload_legacy(hdr, m);
+ return;
+ }
+
+ m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+ ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+ m->packet_type = ptype;
+ if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+ (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+ (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+ l4_supported = 1;
+
+ /* According to Virtio 1.1 spec, the device only needs to look at
+ * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path.
+ * This differs from the incoming packets processing path, where the
+ * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by the
+ * device.
+ *
+ * 5.1.6.2.1 Driver Requirements: Packet Transmission
+ * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and
+ * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
+ *
+ * 5.1.6.2.2 Device Requirements: Packet Transmission
+ * The device MUST ignore flag bits that it does not recognize.
+ */
+ if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+ uint32_t hdrlen;
+
+ hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+ if (hdr->csum_start <= hdrlen && l4_supported != 0) {
+ m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+ } else {
+ /* Unknown proto or tunnel, do sw cksum. We can assume
+ * the cksum field is in the first segment since the
+ * buffers we provided to the host are large enough.
+ * In case of SCTP, this will be wrong since it's a CRC
+ * but there's nothing we can do.
+ */
+ uint16_t csum = 0, off;
+
+ if (rte_raw_cksum_mbuf(m, hdr->csum_start,
+ rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0)
+ return;
+ if (likely(csum != 0xffff))
+ csum = ~csum;
+ off = hdr->csum_offset + hdr->csum_start;
+ if (rte_pktmbuf_data_len(m) >= off + 1)
+ *rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum;
+ }
+ }
+
+ if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+ if (hdr->gso_size == 0)
+ return;
+
+ switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+ case VIRTIO_NET_HDR_GSO_TCPV4:
+ case VIRTIO_NET_HDR_GSO_TCPV6:
+ if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP)
+ break;
+ m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+ m->tso_segsz = hdr->gso_size;
+ break;
+ case VIRTIO_NET_HDR_GSO_UDP:
+ if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP)
+ break;
+ m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+ m->tso_segsz = hdr->gso_size;
+ break;
+ default:
+ break;
+ }
+ }
+}
+
static __rte_noinline void
copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
struct buf_vector *buf_vec)
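To illustrate what a compliant-mode application sees after
rte_vhost_dequeue_burst(), a minimal consumer-side sketch (not part of the
patch; inspect_rx_offloads() is hypothetical)::

#include <stdio.h>
#include <rte_mbuf.h>

/* Sketch: inspect the Rx offload metadata that the compliant path
 * fills on dequeued packets. */
static void
inspect_rx_offloads(const struct rte_mbuf *m)
{
	if (m->ol_flags & PKT_RX_LRO) {
		/* Coalesced by the guest: the application must segment
		 * it (e.g. with the GSO library) before sending it out. */
		printf("LRO packet, gso size %u\n", (unsigned)m->tso_segsz);
	}
	if ((m->ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_NONE) {
		/* Guest did not compute the L4 checksum: complete it in
		 * software or via Tx checksum offload before forwarding. */
		printf("L4 checksum still to be completed\n");
	}
}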
@@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
static __rte_always_inline int
copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
struct buf_vector *buf_vec, uint16_t nr_vec,
- struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
+ struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
+ bool legacy_ol_flags)
{
uint32_t buf_avail, buf_offset;
uint64_t buf_addr, buf_len;
@@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
m->pkt_len += mbuf_offset;
if (hdr)
- vhost_dequeue_offload(hdr, m);
+ vhost_dequeue_offload(hdr, m, legacy_ol_flags);
out:
@@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
return pkt;
}
-static __rte_noinline uint16_t
+__rte_always_inline
+static uint16_t
virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
- struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ bool legacy_ol_flags)
{
uint16_t i;
uint16_t free_entries;
@@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
}
err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
- mbuf_pool);
+ mbuf_pool, legacy_ol_flags);
if (unlikely(err)) {
rte_pktmbuf_free(pkts[i]);
if (!allocerr_warned) {
@@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
return (i - dropped);
}
+__rte_noinline
+static uint16_t
+virtio_dev_tx_split_legacy(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_split_compliant(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
+}
+
static __rte_always_inline int
vhost_reserve_avail_batch_packed(struct virtio_net *dev,
struct vhost_virtqueue *vq,
@@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev,
static __rte_always_inline int
virtio_dev_tx_batch_packed(struct virtio_net *dev,
struct vhost_virtqueue *vq,
- struct rte_mbuf **pkts)
+ struct rte_mbuf **pkts,
+ bool legacy_ol_flags)
{
uint16_t avail_idx = vq->last_avail_idx;
uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
@@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev,
if (virtio_net_with_host_offload(dev)) {
vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
- vhost_dequeue_offload(hdr, pkts[i]);
+ vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
}
}
@@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
struct rte_mempool *mbuf_pool,
struct rte_mbuf *pkts,
uint16_t *buf_id,
- uint16_t *desc_count)
+ uint16_t *desc_count,
+ bool legacy_ol_flags)
{
struct buf_vector buf_vec[BUF_VECTOR_MAX];
uint32_t buf_len;
@@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
}
err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
- mbuf_pool);
+ mbuf_pool, legacy_ol_flags);
if (unlikely(err)) {
if (!allocerr_warned) {
VHOST_LOG_DATA(ERR,
@@ -2859,14 +2968,15 @@ static __rte_always_inline int
virtio_dev_tx_single_packed(struct virtio_net *dev,
struct vhost_virtqueue *vq,
struct rte_mempool *mbuf_pool,
- struct rte_mbuf *pkts)
+ struct rte_mbuf *pkts,
+ bool legacy_ol_flags)
{
uint16_t buf_id, desc_count = 0;
int ret;
ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id,
- &desc_count);
+ &desc_count, legacy_ol_flags);
if (likely(desc_count > 0)) {
if (virtio_net_is_inorder(dev))
@@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev,
return ret;
}
-static __rte_noinline uint16_t
+__rte_always_inline
+static uint16_t
virtio_dev_tx_packed(struct virtio_net *dev,
struct vhost_virtqueue *__rte_restrict vq,
struct rte_mempool *mbuf_pool,
struct rte_mbuf **__rte_restrict pkts,
- uint32_t count)
+ uint32_t count,
+ bool legacy_ol_flags)
{
uint32_t pkt_idx = 0;
@@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev,
if (count - pkt_idx >= PACKED_BATCH_SIZE) {
if (!virtio_dev_tx_batch_packed(dev, vq,
- &pkts[pkt_idx])) {
+ &pkts[pkt_idx],
+ legacy_ol_flags)) {
pkt_idx += PACKED_BATCH_SIZE;
continue;
}
}
if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
- pkts[pkt_idx]))
+ pkts[pkt_idx],
+ legacy_ol_flags))
break;
pkt_idx++;
} while (pkt_idx < count);
@@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
return pkt_idx;
}
+__rte_noinline
+static uint16_t
+virtio_dev_tx_packed_legacy(struct virtio_net *dev,
+ struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **__rte_restrict pkts, uint32_t count)
+{
+ return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_packed_compliant(struct virtio_net *dev,
+ struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **__rte_restrict pkts, uint32_t count)
+{
+ return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
+}
+
uint16_t
rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
count -= 1;
}
- if (vq_is_packed(dev))
- count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
- else
- count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
+ if (vq_is_packed(dev)) {
+ if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+ count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count);
+ else
+ count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count);
+ } else {
+ if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+ count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count);
+ else
+ count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count);
+ }
out:
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))