[v3] vhost: add header check in dequeue offload
Commit Message
When parsing the virtio net header and packet header for dequeue offload,
we need to perform sanity checks on the packet header to ensure:
- No out-of-bounds memory access.
- The packet header and virtio_net header are valid and aligned.
Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v3:
- Check data_len before calling rte_pktmbuf_mtod. (David)
v2:
- Allow empty L4 payload for cksum offload. (Konstantin)
---
lib/librte_vhost/virtio_net.c | 49 +++++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 6 deletions(-)
Comments
On Wed, Mar 17, 2021 at 7:50 AM Xiao Wang <xiao.w.wang@intel.com> wrote:
>
> When parsing the virtio net header and packet header for dequeue offload,
> we need to perform sanity checks on the packet header to ensure:
> - No out-of-bounds memory access.
> - The packet header and virtio_net header are valid and aligned.
>
> Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
I spent some time digging on this topic.
Afaiu the offload API, vhost is not supposed to populate tx offloads.
I would drop this whole parse_ethernet function and replace
vhost_dequeue_offload with what virtio does on the rx side.
Please have a look at this series (especially the last patch):
http://patchwork.dpdk.org/project/dpdk/list/?series=16052
Thanks.
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, April 1, 2021 8:04 PM
> Subject: Re: [PATCH v3] vhost: add header check in dequeue offload
> [...]
+Yang, Yi into this loop, who may have comments, especially from an OVS perspective on CKSUM/TSO/TSO-in-tunnel, etc.
I think the original vhost implementation here is meant to help pass virtio's offload request on to the next output port, whether a physical device or a virtio device.
If we go with series http://patchwork.dpdk.org/project/dpdk/list/?series=16052, then the virtual switch needs to do an extra translation on the flags:
e.g. PKT_RX_LRO --> PKT_TX_TCP_SEG. The question is: a packet marked with PKT_RX_LRO may come from different types of ports (non-vhost), so how can the vSwitch tell whether the TSO request should be set for this packet at transmission?
If I think from an endpoint app's perspective, I'm inclined to agree with your series. If I think from a switch/router's perspective, I'm inclined to keep the current implementation. Maybe we can add the PKT_RX_L4_CKSUM_NONE/PKT_RX_LRO flags into the current implementation; this method seems to cover both scenarios.
BRs,
Xiao
Hi,
> -----Original Message-----
> From: Wang, Xiao W
> Sent: Friday, April 2, 2021 4:39 PM
> Subject: RE: [PATCH v3] vhost: add header check in dequeue offload
> [...]
Considering that the major consumer of the vhost API is a virtual switch/router, I tend to keep the current implementation and apply this fix patch.
Any comments?
BRs,
Xiao
On Mon, Apr 12, 2021 at 11:09 AM Wang, Xiao W <xiao.w.wang@intel.com> wrote:
> Considering that the major consumer of the vhost API is a virtual switch/router, I tend to keep the current implementation and apply this fix patch.
> Any comments?
This is just a hack that bypasses the vswitch's control.
It happens to work when the vswitch does nothing.
If anything is done, like popping a VLAN header, the vswitch needs to
update the l3 offset.
On 4/12/21 11:33 AM, David Marchand wrote:
> On Mon, Apr 12, 2021 at 11:09 AM Wang, Xiao W <xiao.w.wang@intel.com> wrote:
>> Considering the major consumer of vhost API is virtual switch/router, I tend to keep the current implementation and apply this fix patch.
>> Any comments?
>
> This is just a hack that bypasses the vswitch control.
>
> It happens to work when the vswitch does nothing.
> If anything is done, like popping a vlan header, the vswitch needs to
> update l3 offset.
>
>
I agree with David, the current behavior is wrong.
Furthermore, when the lib is used via the Vhost PMD, the application
should not have to handle it differently depending on whether it is the
Vhost PMD or any physical NIC PMD.
Hi Maxime and David,
I see the patch "vhost: fix offload flags in Rx path" (http://patches.dpdk.org/project/dpdk/patch/20210503164344.27916-4-david.marchand@redhat.com/) has been merged, and the legacy implementation is kept. Do you think we still need to fix the header check for the legacy implementation?
BRs,
Xiao
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, April 13, 2021 10:31 PM
> Subject: Re: [PATCH v3] vhost: add header check in dequeue offload
> [...]
@@ -1821,44 +1821,64 @@ virtio_net_with_host_offload(struct virtio_net *dev)
return false;
}
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
+static int
+parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
+ uint16_t *len)
{
struct rte_ipv4_hdr *ipv4_hdr;
struct rte_ipv6_hdr *ipv6_hdr;
void *l3_hdr = NULL;
struct rte_ether_hdr *eth_hdr;
uint16_t ethertype;
+ uint16_t data_len = m->data_len;
+
+ if (data_len <= sizeof(struct rte_ether_hdr))
+ return -EINVAL;
eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
m->l2_len = sizeof(struct rte_ether_hdr);
ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
+ data_len -= sizeof(struct rte_ether_hdr);
if (ethertype == RTE_ETHER_TYPE_VLAN) {
+ if (data_len <= sizeof(struct rte_vlan_hdr))
+ return -EINVAL;
+
struct rte_vlan_hdr *vlan_hdr =
(struct rte_vlan_hdr *)(eth_hdr + 1);
m->l2_len += sizeof(struct rte_vlan_hdr);
ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
+ data_len -= sizeof(struct rte_vlan_hdr);
}
l3_hdr = (char *)eth_hdr + m->l2_len;
switch (ethertype) {
case RTE_ETHER_TYPE_IPV4:
+ if (data_len <= sizeof(struct rte_ipv4_hdr))
+ return -EINVAL;
ipv4_hdr = l3_hdr;
*l4_proto = ipv4_hdr->next_proto_id;
m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
+ if (data_len <= m->l3_len) {
+ m->l3_len = 0;
+ return -EINVAL;
+ }
*l4_hdr = (char *)l3_hdr + m->l3_len;
m->ol_flags |= PKT_TX_IPV4;
+ data_len -= m->l3_len;
break;
case RTE_ETHER_TYPE_IPV6:
+ if (data_len <= sizeof(struct rte_ipv6_hdr))
+ return -EINVAL;
ipv6_hdr = l3_hdr;
*l4_proto = ipv6_hdr->proto;
m->l3_len = sizeof(struct rte_ipv6_hdr);
*l4_hdr = (char *)l3_hdr + m->l3_len;
m->ol_flags |= PKT_TX_IPV6;
+ data_len -= m->l3_len;
break;
default:
m->l3_len = 0;
@@ -1866,6 +1886,9 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
*l4_hdr = NULL;
break;
}
+
+ *len = data_len;
+ return 0;
}
static __rte_always_inline void
@@ -1874,24 +1897,30 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
uint16_t l4_proto = 0;
void *l4_hdr = NULL;
struct rte_tcp_hdr *tcp_hdr = NULL;
+ uint16_t len = 0;
if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
return;
- parse_ethernet(m, &l4_proto, &l4_hdr);
+ if (parse_ethernet(m, &l4_proto, &l4_hdr, &len) < 0)
+ return;
+
if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
if (hdr->csum_start == (m->l2_len + m->l3_len)) {
switch (hdr->csum_offset) {
case (offsetof(struct rte_tcp_hdr, cksum)):
- if (l4_proto == IPPROTO_TCP)
+ if (l4_proto == IPPROTO_TCP &&
+ len >= sizeof(struct rte_tcp_hdr))
m->ol_flags |= PKT_TX_TCP_CKSUM;
break;
case (offsetof(struct rte_udp_hdr, dgram_cksum)):
- if (l4_proto == IPPROTO_UDP)
+ if (l4_proto == IPPROTO_UDP &&
+ len >= sizeof(struct rte_udp_hdr))
m->ol_flags |= PKT_TX_UDP_CKSUM;
break;
case (offsetof(struct rte_sctp_hdr, cksum)):
- if (l4_proto == IPPROTO_SCTP)
+ if (l4_proto == IPPROTO_SCTP &&
+ len >= sizeof(struct rte_sctp_hdr))
m->ol_flags |= PKT_TX_SCTP_CKSUM;
break;
default:
@@ -1904,12 +1933,20 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
case VIRTIO_NET_HDR_GSO_TCPV4:
case VIRTIO_NET_HDR_GSO_TCPV6:
+ if (l4_proto != IPPROTO_TCP ||
+ len <= sizeof(struct rte_tcp_hdr))
+ break;
tcp_hdr = l4_hdr;
+ if (len <= (tcp_hdr->data_off & 0xf0) >> 2)
+ break;
m->ol_flags |= PKT_TX_TCP_SEG;
m->tso_segsz = hdr->gso_size;
m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ if (l4_proto != IPPROTO_UDP ||
+ len <= sizeof(struct rte_udp_hdr))
+ break;
m->ol_flags |= PKT_TX_UDP_SEG;
m->tso_segsz = hdr->gso_size;
m->l4_len = sizeof(struct rte_udp_hdr);