[5/5] vhost: fix offload flags in Rx path

Message ID 20210401095243.18211-6-david.marchand@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series Offload flags fixes

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/Intel-compilation         success  Compilation OK
ci/intel-Testing             success  Testing PASS
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-abi-testing           success  Testing PASS
ci/iol-testing               success  Testing PASS

Commit Message

David Marchand April 1, 2021, 9:52 a.m. UTC
The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every offload used (IP and TCP checksumming, TSO, ...), and the
various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.

On the other hand, the received packets are not marked with the current
l3/l4 checksum status of the packet.

Copy the virtio Rx processing to fix those offload flags.

The vhost example needs reworking as it was built with the assumption
that mbuf TSO configuration is set up by the vhost library.
This is not done in this patch for now, so TSO activation is forcibly
refused.

Fixes: 859b480d5afd ("vhost: add guest offload setting")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 examples/vhost/main.c         |   6 ++
 lib/librte_vhost/virtio_net.c | 148 ++++++++++++++--------------------
 2 files changed, 67 insertions(+), 87 deletions(-)
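
For readers who want the Rx-side decision in isolation, the following is a
minimal sketch of how the virtio header flags map to an L4 checksum status,
under stated assumptions. It is plain C and does not use DPDK: the enum
l4_cksum_status, its values and the function rx_l4_cksum_status() are
hypothetical names made up for illustration, and the parsed header length and
l4_supported inputs stand for what rte_net_get_ptype() provides in the patch.
Only the two VIRTIO_NET_HDR_F_* values come from the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values from the virtio specification. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_F_DATA_VALID 2

static_assert((VIRTIO_NET_HDR_F_NEEDS_CSUM & VIRTIO_NET_HDR_F_DATA_VALID) == 0,
	      "the two flags must be distinct bits");

/* Hypothetical status values, standing in for the PKT_RX_L4_CKSUM_* flags. */
enum l4_cksum_status {
	L4_CKSUM_UNKNOWN,   /* nothing can be said about the checksum */
	L4_CKSUM_NONE,      /* checksum not filled in, but packet is known good */
	L4_CKSUM_GOOD,      /* checksum was validated by the sender side */
	L4_CKSUM_SW_FILLED, /* caller must compute the checksum in software */
};

/* Hypothetical helper mirroring the branch structure of the patch. */
static enum l4_cksum_status
rx_l4_cksum_status(uint8_t flags, uint16_t csum_start,
		   uint32_t parsed_hdr_len, bool l4_supported)
{
	if (flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
		/* Known L4 protocol and checksum inside the parsed headers. */
		if (csum_start <= parsed_hdr_len && l4_supported)
			return L4_CKSUM_NONE;
		/* Unknown protocol or tunnel: fall back to software. */
		return L4_CKSUM_SW_FILLED;
	}
	if ((flags & VIRTIO_NET_HDR_F_DATA_VALID) && l4_supported)
		return L4_CKSUM_GOOD;
	return L4_CKSUM_UNKNOWN;
}
```

For example, a TCP/IPv4 packet with NEEDS_CSUM set and csum_start at 34 (with
54 bytes of parsed headers) ends up as L4_CKSUM_NONE, while the same flag on a
packet whose L4 protocol was not recognized takes the software path.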
  

Comments

Olivier Matz April 8, 2021, 8:28 a.m. UTC | #1
Hi David,

On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> The vhost library current configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags.
> 
> The vhost example needs a reworking as it was built with the assumption
> that mbuf TSO configuration is set up by the vhost library.
> This is not done in this patch for now so TSO activation is forcibly
> refused.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

Reviewed-by: Olivier Matz <olivier.matz@6wind.com>

LGTM, just one little comment below.

<...>

> +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->packet_type = ptype;
> +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> +		l4_supported = 1;
> +
> +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> +		if (hdr->csum_start <= hdrlen && l4_supported) {
> +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> +		} else {
> +			/* Unknown proto or tunnel, do sw cksum. We can assume
> +			 * the cksum field is in the first segment since the
> +			 * buffers we provided to the host are large enough.
> +			 * In case of SCTP, this will be wrong since it's a CRC
> +			 * but there's nothing we can do.
> +			 */
> +			uint16_t csum = 0, off;
> +
> +			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> +				rte_pktmbuf_pkt_len(m) - hdr->csum_start,
> +				&csum) < 0)
> +				return -EINVAL;
> +			if (likely(csum != 0xffff))
> +				csum = ~csum;

I was trying to remember the reason for this last test (which is also
present in net/virtio).

If this is a UDP checksum (on top of an unrecognized tunnel), it's
indeed needed to do that, because we don't want to set the checksum to 0
in the packet (which means "no checksum" for UDPv4, or is forbidden for
UDPv6).

If this is something other than UDP, it shouldn't hurt to have a 0xffff in
the packet instead of 0.

Maybe it deserves a comment here, like:

  /* avoid 0 checksum for UDP, shouldn't hurt for other protocols */

What do you think?
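
To make the reasoning about this test concrete, here is a minimal,
self-contained C sketch of the checksum folding and of the 0xffff guard. It
does not use DPDK: raw_cksum() is a hypothetical plain-C stand-in for what
rte_raw_cksum_mbuf() computes, applied to a flat buffer, and finalize_cksum()
is a hypothetical name for the two lines under discussion. Byte order is
simplified to big-endian words for readability and may differ from the real
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement sum of 16-bit big-endian words with end-around carry.
 * The 32-bit accumulator is enough for packet-sized buffers only.
 */
static uint16_t raw_cksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with 0 */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);  /* fold the carries */
	return (uint16_t)sum;
}

/* Value to store in the checksum field. Complementing a raw sum of
 * 0xffff would give 0, which means "no checksum" for UDPv4 and is
 * forbidden for UDPv6, so that one value is stored as is.
 */
static uint16_t finalize_cksum(uint16_t raw)
{
	if (raw != 0xffff)
		raw = (uint16_t)~raw;
	return raw;
}
```

With this guard the stored field is never 0. Since 0x0000 and 0xffff both
represent zero in one's complement arithmetic, a receiver summing the packet
should reach the same result with either value, which is why it should not
hurt for protocols other than UDP.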
  
Flavio Leitner April 8, 2021, 6:38 p.m. UTC | #2
On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> The vhost library current configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags.
> 
> The vhost example needs a reworking as it was built with the assumption
> that mbuf TSO configuration is set up by the vhost library.
> This is not done in this patch for now so TSO activation is forcibly
> refused.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")

There is a change here: before, ECN was ignored, and now it is treated as
invalid. I think that's the right way to go, but I am not sure whether
virtio blocks the negotiation of that feature.

Reviewed-by: Flavio Leitner <fbl@sysclose.org>

fbl
  
Maxime Coquelin April 13, 2021, 3:27 p.m. UTC | #3
On 4/8/21 8:38 PM, Flavio Leitner wrote:
> On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
>> The vhost library current configures Tx offloading (PKT_TX_*) on any
>> packet received from a guest virtio device which asks for some offloading.
>>
>> This is problematic, as Tx offloading is something that the application
>> must ask for: the application needs to configure devices
>> to support every used offloads (ip, tcp checksumming, tso..), and the
>> various l2/l3/l4 lengths must be set following any processing that
>> happened in the application itself.
>>
>> On the other hand, the received packets are not marked wrt current
>> packet l3/l4 checksumming info.
>>
>> Copy virtio rx processing to fix those offload flags.
>>
>> The vhost example needs a reworking as it was built with the assumption
>> that mbuf TSO configuration is set up by the vhost library.
>> This is not done in this patch for now so TSO activation is forcibly
>> refused.
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> There is change that before ECN was ignored and now it is invalid.
> I think that's the right way to go, but not sure if virtio blocks
> the negotiation of that feature.

No, I just tested and the feature gets negotiated.

Disabling it in the Vhost lib should be avoided, so as not to break
live-migration.

It might be safer to revert to the older behavior for it, i.e. just
ignore the bit. I don't think it is ever set, because otherwise we would
have had lots of reports since the Vhost log would be flooded with:

VHOST_LOG_DATA(WARNING,
	"unsupported gso type %u.\n", hdr->gso_type);

David, what do you think?

> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> 
> fbl
>
  
David Marchand April 27, 2021, 5:09 p.m. UTC | #4
On Tue, Apr 13, 2021 at 5:27 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 4/8/21 8:38 PM, Flavio Leitner wrote:
> > On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> >> The vhost library current configures Tx offloading (PKT_TX_*) on any
> >> packet received from a guest virtio device which asks for some offloading.
> >>
> >> This is problematic, as Tx offloading is something that the application
> >> must ask for: the application needs to configure devices
> >> to support every used offloads (ip, tcp checksumming, tso..), and the
> >> various l2/l3/l4 lengths must be set following any processing that
> >> happened in the application itself.
> >>
> >> On the other hand, the received packets are not marked wrt current
> >> packet l3/l4 checksumming info.
> >>
> >> Copy virtio rx processing to fix those offload flags.
> >>
> >> The vhost example needs a reworking as it was built with the assumption
> >> that mbuf TSO configuration is set up by the vhost library.
> >> This is not done in this patch for now so TSO activation is forcibly
> >> refused.
> >>
> >> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> >
> > There is change that before ECN was ignored and now it is invalid.
> > I think that's the right way to go, but not sure if virtio blocks
> > the negotiation of that feature.
>
> No, I just tested and the feature gets negotiated.

I suppose you tested with testpmd, because I can see ECN is disabled
by default with OVS.


>
> Disabling it in Vhost lib should be avoided to avoid breaking
> live-migration.
>
> It might be safer to revert back to older behavior for it, i.e. just
> ignore the bit. I don't think it is ever set, because otherwise we would
> have had lots of reports since the Vhost log would be flooded with:

- The VIRTIO_NET_HDR_GSO_ECN bit is supposed to be coupled with TSO bits.
Copying a bit more of this code:
   switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
...
   default:
>
> VHOST_LOG_DATA(WARNING,
>         "unsupported gso type %u.\n", hdr->gso_type);

The absence of this log does not mean the guest is not sending packets
with VIRTIO_NET_HDR_GSO_ECN set.
Otoh, getting this log instead indicates a bug in the virtio driver
(as we discussed offlist).


- It is not clear to me how widely deployed the ECN feature is.
I think the Linux kernel won't try to start a TCP connection unless
explicitly configuring it on a socket (but I am a bit lost).

By default, VIRTIO_NET_F_HOST_ECN is announced as supported by vhost-user.
So in theory, a guest virtio netdevice with NETIF_F_TSO_ECN can
transmit packets (with SKB_GSO_TCP_ECN translated to
VIRTIO_NET_HDR_GSO_ECN in virtio_net_hdr_from_skb) to a vhost-user
backend.


- Treating ECN with GSO requires special handling:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0da8537037f337103348f239ad901477e907aa8

I can see some change in the i40e kernel driver at least.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=059dab69652da3525d320d77ac5422ec708ced14
The ixgbe kernel driver is not flagged with NETIF_F_TSO_ECN.

We don't have such a distinction in DPDK: neither a per mbuf flag to
mark packets, nor a device offloading flag/capability.
And the rte_gso library probably does not handle CWR correctly.
About the i40e driver, I can't find the same configuration as in the
kernel driver.



- Now, about the next step...

The "good" news (I suppose you might disagree here) is that this
feature is disabled in OVS:
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L5162

About handling TSO + ECN, this is a generic problem with the DPDK API
and we have been living with it for a long time.
I understand that passing such packets to hw that does not handle this
correctly makes the ECN feature not work properly.
But "normal" TSO works.

I agree, we can let such packets be received by vhost like it was done
before my patch.

Investigating the other side (GUEST_ECN + the virtio pmd) could be
worthwhile later, as I think GSO+ECN packets are dropped in the current
code.
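
As a way to compare the two ECN behaviors discussed in this thread, here is a
small sketch in plain C. It is not the vhost code: classify_gso(), the enum
gso_verdict and the reject_ecn switch are hypothetical names introduced for
illustration. The VIRTIO_NET_HDR_GSO_* values are those of the virtio
specification. With reject_ecn set, the function follows the patch as posted
(ECN bit or a zero gso_size is invalid); with it cleared, the ECN bit is
masked out and ignored, as before the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GSO type values from the virtio specification. */
#define VIRTIO_NET_HDR_GSO_NONE  0
#define VIRTIO_NET_HDR_GSO_TCPV4 1
#define VIRTIO_NET_HDR_GSO_UDP   3
#define VIRTIO_NET_HDR_GSO_TCPV6 4
#define VIRTIO_NET_HDR_GSO_ECN   0x80

/* ECN is a modifier bit and must not overlap the base GSO types. */
static_assert((VIRTIO_NET_HDR_GSO_ECN &
	       (VIRTIO_NET_HDR_GSO_TCPV4 | VIRTIO_NET_HDR_GSO_UDP |
		VIRTIO_NET_HDR_GSO_TCPV6)) == 0,
	      "ECN bit overlaps a GSO type");

/* Hypothetical verdicts for illustration. */
enum gso_verdict { GSO_NOT_REQUESTED, GSO_TCP, GSO_INVALID };

static enum gso_verdict
classify_gso(uint8_t gso_type, uint16_t gso_size, bool reject_ecn)
{
	if (gso_type == VIRTIO_NET_HDR_GSO_NONE)
		return GSO_NOT_REQUESTED;
	if (reject_ecn && (gso_type & VIRTIO_NET_HDR_GSO_ECN))
		return GSO_INVALID;
	if (gso_size == 0)
		return GSO_INVALID;

	/* Mask the ECN modifier before looking at the base type. */
	switch (gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
	case VIRTIO_NET_HDR_GSO_TCPV4:
	case VIRTIO_NET_HDR_GSO_TCPV6:
		return GSO_TCP;
	default:
		/* UDP and anything else is refused, as in the posted patch. */
		return GSO_INVALID;
	}
}
```

A TCPv4 packet carrying the ECN bit is therefore refused when reject_ecn is
true and accepted as plain TCP segmentation when it is false, which is the
difference being debated above.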
  
David Marchand April 27, 2021, 5:19 p.m. UTC | #5
On Tue, Apr 27, 2021 at 7:09 PM David Marchand
<david.marchand@redhat.com> wrote:
> Investigating the other side (GUEST_ECN + the virtio pmd) could be
> worth later, as I think GSO+ECN packets are dropped in the current
> code.

Errr, but that would be a problem only for vhost-kernel -> virtio pmd.
Not sure this is a usecase we care about.
  

Patch

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 2ca7d98c58..819cd9909f 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -607,6 +607,12 @@  us_vhost_parse_args(int argc, char **argv)
 				us_vhost_usage(prgname);
 				return -1;
 			}
+			/* FIXME: tso support is broken */
+			if (ret != 0) {
+				RTE_LOG(INFO, VHOST_CONFIG, "TSO support is broken\n");
+				us_vhost_usage(prgname);
+				return -1;
+			}
 			enable_tso = ret;
 			break;
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 583bf379c6..06089a4206 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -8,6 +8,7 @@ 
 
 #include <rte_mbuf.h>
 #include <rte_memcpy.h>
+#include <rte_net.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
 #include <rte_vhost.h>
@@ -1821,105 +1822,75 @@  virtio_net_with_host_offload(struct virtio_net *dev)
 	return false;
 }
 
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
-{
-	struct rte_ipv4_hdr *ipv4_hdr;
-	struct rte_ipv6_hdr *ipv6_hdr;
-	void *l3_hdr = NULL;
-	struct rte_ether_hdr *eth_hdr;
-	uint16_t ethertype;
-
-	eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
-
-	m->l2_len = sizeof(struct rte_ether_hdr);
-	ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
-
-	if (ethertype == RTE_ETHER_TYPE_VLAN) {
-		struct rte_vlan_hdr *vlan_hdr =
-			(struct rte_vlan_hdr *)(eth_hdr + 1);
-
-		m->l2_len += sizeof(struct rte_vlan_hdr);
-		ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
-	}
-
-	l3_hdr = (char *)eth_hdr + m->l2_len;
-
-	switch (ethertype) {
-	case RTE_ETHER_TYPE_IPV4:
-		ipv4_hdr = l3_hdr;
-		*l4_proto = ipv4_hdr->next_proto_id;
-		m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
-		*l4_hdr = (char *)l3_hdr + m->l3_len;
-		m->ol_flags |= PKT_TX_IPV4;
-		break;
-	case RTE_ETHER_TYPE_IPV6:
-		ipv6_hdr = l3_hdr;
-		*l4_proto = ipv6_hdr->proto;
-		m->l3_len = sizeof(struct rte_ipv6_hdr);
-		*l4_hdr = (char *)l3_hdr + m->l3_len;
-		m->ol_flags |= PKT_TX_IPV6;
-		break;
-	default:
-		m->l3_len = 0;
-		*l4_proto = 0;
-		*l4_hdr = NULL;
-		break;
-	}
-}
-
-static __rte_always_inline void
+static __rte_always_inline int
 vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 {
-	uint16_t l4_proto = 0;
-	void *l4_hdr = NULL;
-	struct rte_tcp_hdr *tcp_hdr = NULL;
+	struct rte_net_hdr_lens hdr_lens;
+	uint32_t hdrlen, ptype;
+	int l4_supported = 0;
 
+	/* nothing to do */
 	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
-		return;
-
-	parse_ethernet(m, &l4_proto, &l4_hdr);
-	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
-			switch (hdr->csum_offset) {
-			case (offsetof(struct rte_tcp_hdr, cksum)):
-				if (l4_proto == IPPROTO_TCP)
-					m->ol_flags |= PKT_TX_TCP_CKSUM;
-				break;
-			case (offsetof(struct rte_udp_hdr, dgram_cksum)):
-				if (l4_proto == IPPROTO_UDP)
-					m->ol_flags |= PKT_TX_UDP_CKSUM;
-				break;
-			case (offsetof(struct rte_sctp_hdr, cksum)):
-				if (l4_proto == IPPROTO_SCTP)
-					m->ol_flags |= PKT_TX_SCTP_CKSUM;
-				break;
-			default:
-				break;
-			}
+		return 0;
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum = 0, off;
+
+			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
+				rte_pktmbuf_pkt_len(m) - hdr->csum_start,
+				&csum) < 0)
+				return -EINVAL;
+			if (likely(csum != 0xffff))
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 1)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *,
+					off) = csum;
 		}
+	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
+		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
 	}
 
-	if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+	/* GSO request, save required information in mbuf */
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		/* Check unsupported modes */
+		if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
+		    (hdr->gso_size == 0)) {
+			return -EINVAL;
+		}
+
+		/* Update mss lengths in mbuf */
+		m->tso_segsz = hdr->gso_size;
 		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 		case VIRTIO_NET_HDR_GSO_TCPV6:
-			tcp_hdr = l4_hdr;
-			m->ol_flags |= PKT_TX_TCP_SEG;
-			m->tso_segsz = hdr->gso_size;
-			m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
-			break;
-		case VIRTIO_NET_HDR_GSO_UDP:
-			m->ol_flags |= PKT_TX_UDP_SEG;
-			m->tso_segsz = hdr->gso_size;
-			m->l4_len = sizeof(struct rte_udp_hdr);
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
 			break;
 		default:
-			VHOST_LOG_DATA(WARNING,
-				"unsupported gso type %u.\n", hdr->gso_type);
-			break;
+			return -EINVAL;
 		}
 	}
+
+	return 0;
 }
 
 static __rte_noinline void
@@ -2078,8 +2049,11 @@  copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(hdr, m);
+	if (hdr && vhost_dequeue_offload(hdr, m) < 0) {
+		VHOST_LOG_DATA(ERR, "Packet with invalid offloads.\n");
+		error = -1;
+		goto out;
+	}
 
 out: