[v3] gso: add VXLAN UDP GSO support

Message ID 20201116011112.13676-1-yang_y_yi@163.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v3] gso: add VXLAN UDP GSO support |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-testing success Testing PASS
ci/iol-intel-Functional fail Functional Testing issues
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed

Commit Message

yang_y_yi Nov. 16, 2020, 1:11 a.m. UTC
  From: Yi Yang <yangyi01@inspur.com>

Many NICs can't offload VXLAN UFO, so it is very important
to do VXLAN UDP GSO by software to improve VM-to-VM UDP
performance, especially for the case that VM MTU is just
1500 but not 9000.

With this enabled in DPDK, OVS DPDK can leverage it to
improve VM-to-VM UDP performance, performance gain is very
huge, over 2 times.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
---
Changelog:

v2 -> v3:
  - Correct gso type check for UDP TSO.

v1 -> v2:
  - Remove condition check for outer udp header because it
    is always true for VXLAN.
  - Remove inner udp header update because it is wrong and
    unnecessary.

---
 .../generic_segmentation_offload_lib.rst           | 14 ++--
 doc/guides/rel_notes/release_20_11.rst             |  4 +
 lib/librte_gso/gso_common.h                        |  5 ++
 lib/librte_gso/gso_tunnel_udp4.c                   | 97 ++++++++++++++++++++++
 lib/librte_gso/gso_tunnel_udp4.h                   | 44 ++++++++++
 lib/librte_gso/meson.build                         |  2 +-
 lib/librte_gso/rte_gso.c                           |  8 ++
 7 files changed, 166 insertions(+), 8 deletions(-)
 create mode 100644 lib/librte_gso/gso_tunnel_udp4.c
 create mode 100644 lib/librte_gso/gso_tunnel_udp4.h
  

Comments

Hu, Jiayu Nov. 19, 2020, 5:37 a.m. UTC | #1
> -----Original Message-----
> From: yang_y_yi@163.com <yang_y_yi@163.com>
> Sent: Monday, November 16, 2020 9:11 AM
> To: dev@dpdk.org
> Cc: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; thomas@monjalon.net;
> yangyi01@inspur.com; yang_y_yi@163.com
> Subject: [PATCH v3] gso: add VXLAN UDP GSO support
> 
> From: Yi Yang <yangyi01@inspur.com>
> 
> Many NICs can't offload VXLAN UFO, so it is very important
> to do VXLAN UDP GSO by software to improve VM-to-VM UDP
> performance, especially for the case that VM MTU is just
> 1500 but not 9000.
> 
> With this enabled in DPDK, OVS DPDK can leverage it to
> improve VM-to-VM UDP performance, performance gain is very
> huge, over 2 times.

GSO means software segmentation, so we don't say "VXLAN UDP GSO
by software".

I think it's better to make the commit message and log more accurate.
For example:
gso: add VxLAN UDP/IPv4 support

As some NICs do not support segmentation for VxLAN-encapsulated
UDP/IPv4 packets, this patch adds VxLAN UDP/IPv4 GSO support.
With leveraging VxLAN UDP/IPv4 GSO, OVS DPDK can significantly
improve performance for VxLAN UDP/IPv4 traffic.

Thanks,
Jiayu

> 
> Signed-off-by: Yi Yang <yangyi01@inspur.com>
> ---
> Changelog:
> 
> v2 -> v3:
>   - Correct gso type check for UDP TSO.
> 
> v1 -> v2:
>   - Remove condition check for outer udp header because it
>     is always true for VXLAN.
>   - Remove inner udp header update because it is wrong and
>     unnecessary.
> 
> ---
>  .../generic_segmentation_offload_lib.rst           | 14 ++--
>  doc/guides/rel_notes/release_20_11.rst             |  4 +
>  lib/librte_gso/gso_common.h                        |  5 ++
>  lib/librte_gso/gso_tunnel_udp4.c                   | 97 ++++++++++++++++++++++
>  lib/librte_gso/gso_tunnel_udp4.h                   | 44 ++++++++++
>  lib/librte_gso/meson.build                         |  2 +-
>  lib/librte_gso/rte_gso.c                           |  8 ++
>  7 files changed, 166 insertions(+), 8 deletions(-)
>  create mode 100644 lib/librte_gso/gso_tunnel_udp4.c
>  create mode 100644 lib/librte_gso/gso_tunnel_udp4.h
> 
> diff --git a/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
> b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
> index ad91c6e..c1d06e1 100644
> --- a/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
> +++ b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
> @@ -46,7 +46,7 @@ Limitations
>   - TCP
>   - UDP
>   - VxLAN
> - - GRE
> + - GRE TCP
> 
>    See `Supported GSO Packet Types`_ for further details.
> 
> @@ -157,14 +157,14 @@ does not modify it during segmentation. Therefore,
> after UDP GSO, only the
>  first output packet has the original UDP header, and others just have l2
>  and l3 headers.
> 
> -VxLAN GSO
> -~~~~~~~~~
> +VxLAN IPv4 GSO
> +~~~~~~~~~~~~~~
>  VxLAN packets GSO supports segmentation of suitably large VxLAN packets,
> -which contain an outer IPv4 header, inner TCP/IPv4 headers, and optional
> -inner and/or outer VLAN tag(s).
> +which contain an outer IPv4 header, inner TCP/IPv4 or UDP/IPv4 headers,
> and
> +optional inner and/or outer VLAN tag(s).
> 
> -GRE GSO
> -~~~~~~~
> +GRE TCP/IPv4 GSO
> +~~~~~~~~~~~~~~~~
>  GRE GSO supports segmentation of suitably large GRE packets, which
> contain
>  an outer IPv4 header, inner TCP/IPv4 headers, and an optional VLAN tag.
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst
> b/doc/guides/rel_notes/release_20_11.rst
> index 24cedba..bd13c5f 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -422,6 +422,10 @@ New Features
>    leverage IOAT DMA channel with vhost asynchronous APIs.
>    See the :doc:`../sample_app_ug/vhost` for more details.
> 
> +* **Added VxLAN UDP/IPv4 GSO support.**
> +
> +  Added inner UDP/IPv4 support for VxLAN IPv4 GSO.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
> index a0b8343..4d5f303 100644
> --- a/lib/librte_gso/gso_common.h
> +++ b/lib/librte_gso/gso_common.h
> @@ -26,6 +26,11 @@
>  		(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
>  		 PKT_TX_TUNNEL_VXLAN))
> 
> +#define IS_IPV4_VXLAN_UDP4(flag) (((flag) & (PKT_TX_UDP_SEG |
> PKT_TX_IPV4 | \
> +				PKT_TX_OUTER_IPV4 |
> PKT_TX_TUNNEL_MASK)) == \
> +		(PKT_TX_UDP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
> +		 PKT_TX_TUNNEL_VXLAN))
> +
>  #define IS_IPV4_GRE_TCP4(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4
> | \
>  				PKT_TX_OUTER_IPV4 |
> PKT_TX_TUNNEL_MASK)) == \
>  		(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
> diff --git a/lib/librte_gso/gso_tunnel_udp4.c
> b/lib/librte_gso/gso_tunnel_udp4.c
> new file mode 100644
> index 0000000..1fc7a8d
> --- /dev/null
> +++ b/lib/librte_gso/gso_tunnel_udp4.c
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Inspur Corporation
> + */
> +
> +#include "gso_common.h"
> +#include "gso_tunnel_udp4.h"
> +
> +#define IPV4_HDR_MF_BIT (1U << 13)
> +
> +static void
> +update_tunnel_ipv4_udp_headers(struct rte_mbuf *pkt, struct rte_mbuf
> **segs,
> +			       uint16_t nb_segs)
> +{
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	uint16_t outer_id, inner_id, tail_idx, i, length;
> +	uint16_t outer_ipv4_offset, inner_ipv4_offset;
> +	uint16_t outer_udp_offset;
> +	uint16_t frag_offset = 0, is_mf;
> +
> +	outer_ipv4_offset = pkt->outer_l2_len;
> +	outer_udp_offset = outer_ipv4_offset + pkt->outer_l3_len;
> +	inner_ipv4_offset = outer_udp_offset + pkt->l2_len;
> +
> +	/* Outer IPv4 header. */
> +	ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			outer_ipv4_offset);
> +	outer_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +
> +	/* Inner IPv4 header. */
> +	ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			inner_ipv4_offset);
> +	inner_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +
> +	tail_idx = nb_segs - 1;
> +
> +	for (i = 0; i < nb_segs; i++) {
> +		update_ipv4_header(segs[i], outer_ipv4_offset, outer_id);
> +		update_udp_header(segs[i], outer_udp_offset);
> +		update_ipv4_header(segs[i], inner_ipv4_offset, inner_id);
> +		/* For the case inner packet is UDP, we must keep UDP
> +		 * datagram boundary, it must be handled as IP fragment.
> +		 *
> +		 * Set IP fragment offset for inner IP header.
> +		 */
> +		ipv4_hdr = (struct rte_ipv4_hdr *)
> +			(rte_pktmbuf_mtod(segs[i], char *) +
> +				inner_ipv4_offset);
> +		is_mf = i < tail_idx ? IPV4_HDR_MF_BIT : 0;
> +		ipv4_hdr->fragment_offset =
> +			rte_cpu_to_be_16(frag_offset | is_mf);
> +		length = segs[i]->pkt_len - inner_ipv4_offset - pkt->l3_len;
> +		frag_offset += (length >> 3);
> +		outer_id++;
> +	}
> +}
> +
> +int
> +gso_tunnel_udp4_segment(struct rte_mbuf *pkt,
> +		uint16_t gso_size,
> +		struct rte_mempool *direct_pool,
> +		struct rte_mempool *indirect_pool,
> +		struct rte_mbuf **pkts_out,
> +		uint16_t nb_pkts_out)
> +{
> +	struct rte_ipv4_hdr *inner_ipv4_hdr;
> +	uint16_t pyld_unit_size, hdr_offset, frag_off;
> +	int ret;
> +
> +	hdr_offset = pkt->outer_l2_len + pkt->outer_l3_len + pkt->l2_len;
> +	inner_ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> char *) +
> +			hdr_offset);
> +	/*
> +	 * Don't process the packet whose MF bit or offset in the inner
> +	 * IPv4 header are non-zero.
> +	 */
> +	frag_off = rte_be_to_cpu_16(inner_ipv4_hdr->fragment_offset);
> +	if (unlikely(IS_FRAGMENTED(frag_off)))
> +		return 0;
> +
> +	hdr_offset += pkt->l3_len;
> +	/* Don't process the packet without data */
> +	if ((hdr_offset + pkt->l4_len) >= pkt->pkt_len)
> +		return 0;
> +
> +	/* pyld_unit_size must be a multiple of 8 because frag_off
> +	 * uses 8 bytes as unit.
> +	 */
> +	pyld_unit_size = (gso_size - hdr_offset) & ~7U;
> +
> +	/* Segment the payload */
> +	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
> +			indirect_pool, pkts_out, nb_pkts_out);
> +	if (ret > 1)
> +		update_tunnel_ipv4_udp_headers(pkt, pkts_out, ret);
> +
> +	return ret;
> +}
> diff --git a/lib/librte_gso/gso_tunnel_udp4.h
> b/lib/librte_gso/gso_tunnel_udp4.h
> new file mode 100644
> index 0000000..c49b43f
> --- /dev/null
> +++ b/lib/librte_gso/gso_tunnel_udp4.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Inspur Corporation
> + */
> +
> +#ifndef _GSO_TUNNEL_UDP4_H_
> +#define _GSO_TUNNEL_UDP4_H_
> +
> +#include <stdint.h>
> +#include <rte_mbuf.h>
> +
> +/**
> + * Segment a tunneling packet with inner UDP/IPv4 headers. This function
> + * does not check if the input packet has correct checksums, and does not
> + * update checksums for output GSO segments. Furthermore, it does not
> + * process IP fragment packets.
> + *
> + * @param pkt
> + *  The packet mbuf to segment.
> + * @param gso_size
> + *  The max length of a GSO segment, measured in bytes.
> + * @param direct_pool
> + *  MBUF pool used for allocating direct buffers for output segments.
> + * @param indirect_pool
> + *  MBUF pool used for allocating indirect buffers for output segments.
> + * @param pkts_out
> + *  Pointer array used to store the MBUF addresses of output GSO
> + *  segments, when it succeeds. If the memory space in pkts_out is
> + *  insufficient, it fails and returns -EINVAL.
> + * @param nb_pkts_out
> + *  The max number of items that 'pkts_out' can keep.
> + *
> + * @return
> + *   - The number of GSO segments filled in pkts_out on success.
> + *   - Return 0 if it needn't GSO.
> + *   - Return -ENOMEM if run out of memory in MBUF pools.
> + *   - Return -EINVAL for invalid parameters.
> + */
> +int gso_tunnel_udp4_segment(struct rte_mbuf *pkt,
> +		uint16_t gso_size,
> +		struct rte_mempool *direct_pool,
> +		struct rte_mempool *indirect_pool,
> +		struct rte_mbuf **pkts_out,
> +		uint16_t nb_pkts_out);
> +#endif
> diff --git a/lib/librte_gso/meson.build b/lib/librte_gso/meson.build
> index ad8dd85..05904f2 100644
> --- a/lib/librte_gso/meson.build
> +++ b/lib/librte_gso/meson.build
> @@ -2,6 +2,6 @@
>  # Copyright(c) 2017 Intel Corporation
> 
>  sources = files('gso_common.c', 'gso_tcp4.c', 'gso_udp4.c',
> - 		'gso_tunnel_tcp4.c', 'rte_gso.c')
> +		'gso_tunnel_tcp4.c', 'gso_tunnel_udp4.c', 'rte_gso.c')
>  headers = files('rte_gso.h')
>  deps += ['ethdev']
> diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> index 896350e..0d02ec3 100644
> --- a/lib/librte_gso/rte_gso.c
> +++ b/lib/librte_gso/rte_gso.c
> @@ -11,6 +11,7 @@
>  #include "gso_common.h"
>  #include "gso_tcp4.h"
>  #include "gso_tunnel_tcp4.h"
> +#include "gso_tunnel_udp4.h"
>  #include "gso_udp4.h"
> 
>  #define ILLEGAL_UDP_GSO_CTX(ctx) \
> @@ -60,6 +61,13 @@
>  		ret = gso_tunnel_tcp4_segment(pkt, gso_size, ipid_delta,
>  				direct_pool, indirect_pool,
>  				pkts_out, nb_pkts_out);
> +	} else if (IS_IPV4_VXLAN_UDP4(pkt->ol_flags) &&
> +			(gso_ctx->gso_types &
> DEV_TX_OFFLOAD_VXLAN_TNL_TSO) &&
> +			(gso_ctx->gso_types & DEV_TX_OFFLOAD_UDP_TSO))
> {
> +		pkt->ol_flags &= (~PKT_TX_UDP_SEG);
> +		ret = gso_tunnel_udp4_segment(pkt, gso_size,
> +				direct_pool, indirect_pool,
> +				pkts_out, nb_pkts_out);
>  	} else if (IS_IPV4_TCP(pkt->ol_flags) &&
>  			(gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO))
> {
>  		pkt->ol_flags &= (~PKT_TX_TCP_SEG);
> --
> 1.8.3.1
  

Patch

diff --git a/doc/guides/prog_guide/generic_segmentation_offload_lib.rst b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
index ad91c6e..c1d06e1 100644
--- a/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
+++ b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
@@ -46,7 +46,7 @@  Limitations
  - TCP
  - UDP
  - VxLAN
- - GRE
+ - GRE TCP
 
   See `Supported GSO Packet Types`_ for further details.
 
@@ -157,14 +157,14 @@  does not modify it during segmentation. Therefore, after UDP GSO, only the
 first output packet has the original UDP header, and others just have l2
 and l3 headers.
 
-VxLAN GSO
-~~~~~~~~~
+VxLAN IPv4 GSO
+~~~~~~~~~~~~~~
 VxLAN packets GSO supports segmentation of suitably large VxLAN packets,
-which contain an outer IPv4 header, inner TCP/IPv4 headers, and optional
-inner and/or outer VLAN tag(s).
+which contain an outer IPv4 header, inner TCP/IPv4 or UDP/IPv4 headers, and
+optional inner and/or outer VLAN tag(s).
 
-GRE GSO
-~~~~~~~
+GRE TCP/IPv4 GSO
+~~~~~~~~~~~~~~~~
 GRE GSO supports segmentation of suitably large GRE packets, which contain
 an outer IPv4 header, inner TCP/IPv4 headers, and an optional VLAN tag.
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 24cedba..bd13c5f 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -422,6 +422,10 @@  New Features
   leverage IOAT DMA channel with vhost asynchronous APIs.
   See the :doc:`../sample_app_ug/vhost` for more details.
 
+* **Added VxLAN UDP/IPv4 GSO support.**
+
+  Added inner UDP/IPv4 support for VxLAN IPv4 GSO.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index a0b8343..4d5f303 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -26,6 +26,11 @@ 
 		(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
 		 PKT_TX_TUNNEL_VXLAN))
 
+#define IS_IPV4_VXLAN_UDP4(flag) (((flag) & (PKT_TX_UDP_SEG | PKT_TX_IPV4 | \
+				PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_MASK)) == \
+		(PKT_TX_UDP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
+		 PKT_TX_TUNNEL_VXLAN))
+
 #define IS_IPV4_GRE_TCP4(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4 | \
 				PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_MASK)) == \
 		(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
diff --git a/lib/librte_gso/gso_tunnel_udp4.c b/lib/librte_gso/gso_tunnel_udp4.c
new file mode 100644
index 0000000..1fc7a8d
--- /dev/null
+++ b/lib/librte_gso/gso_tunnel_udp4.c
@@ -0,0 +1,97 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Inspur Corporation
+ */
+
+#include "gso_common.h"
+#include "gso_tunnel_udp4.h"
+
+#define IPV4_HDR_MF_BIT (1U << 13)
+
+static void
+update_tunnel_ipv4_udp_headers(struct rte_mbuf *pkt, struct rte_mbuf **segs,
+			       uint16_t nb_segs)
+{
+	struct rte_ipv4_hdr *ipv4_hdr;
+	uint16_t outer_id, inner_id, tail_idx, i, length;
+	uint16_t outer_ipv4_offset, inner_ipv4_offset;
+	uint16_t outer_udp_offset;
+	uint16_t frag_offset = 0, is_mf;
+
+	outer_ipv4_offset = pkt->outer_l2_len;
+	outer_udp_offset = outer_ipv4_offset + pkt->outer_l3_len;
+	inner_ipv4_offset = outer_udp_offset + pkt->l2_len;
+
+	/* Outer IPv4 header. */
+	ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			outer_ipv4_offset);
+	outer_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+
+	/* Inner IPv4 header. */
+	ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			inner_ipv4_offset);
+	inner_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+
+	tail_idx = nb_segs - 1;
+
+	for (i = 0; i < nb_segs; i++) {
+		update_ipv4_header(segs[i], outer_ipv4_offset, outer_id);
+		update_udp_header(segs[i], outer_udp_offset);
+		update_ipv4_header(segs[i], inner_ipv4_offset, inner_id);
+		/* For the case inner packet is UDP, we must keep UDP
+		 * datagram boundary, it must be handled as IP fragment.
+		 *
+		 * Set IP fragment offset for inner IP header.
+		 */
+		ipv4_hdr = (struct rte_ipv4_hdr *)
+			(rte_pktmbuf_mtod(segs[i], char *) +
+				inner_ipv4_offset);
+		is_mf = i < tail_idx ? IPV4_HDR_MF_BIT : 0;
+		ipv4_hdr->fragment_offset =
+			rte_cpu_to_be_16(frag_offset | is_mf);
+		length = segs[i]->pkt_len - inner_ipv4_offset - pkt->l3_len;
+		frag_offset += (length >> 3);
+		outer_id++;
+	}
+}
+
+int
+gso_tunnel_udp4_segment(struct rte_mbuf *pkt,
+		uint16_t gso_size,
+		struct rte_mempool *direct_pool,
+		struct rte_mempool *indirect_pool,
+		struct rte_mbuf **pkts_out,
+		uint16_t nb_pkts_out)
+{
+	struct rte_ipv4_hdr *inner_ipv4_hdr;
+	uint16_t pyld_unit_size, hdr_offset, frag_off;
+	int ret;
+
+	hdr_offset = pkt->outer_l2_len + pkt->outer_l3_len + pkt->l2_len;
+	inner_ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			hdr_offset);
+	/*
+	 * Don't process the packet whose MF bit or offset in the inner
+	 * IPv4 header are non-zero.
+	 */
+	frag_off = rte_be_to_cpu_16(inner_ipv4_hdr->fragment_offset);
+	if (unlikely(IS_FRAGMENTED(frag_off)))
+		return 0;
+
+	hdr_offset += pkt->l3_len;
+	/* Don't process the packet without data */
+	if ((hdr_offset + pkt->l4_len) >= pkt->pkt_len)
+		return 0;
+
+	/* pyld_unit_size must be a multiple of 8 because frag_off
+	 * uses 8 bytes as unit.
+	 */
+	pyld_unit_size = (gso_size - hdr_offset) & ~7U;
+
+	/* Segment the payload */
+	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
+			indirect_pool, pkts_out, nb_pkts_out);
+	if (ret > 1)
+		update_tunnel_ipv4_udp_headers(pkt, pkts_out, ret);
+
+	return ret;
+}
diff --git a/lib/librte_gso/gso_tunnel_udp4.h b/lib/librte_gso/gso_tunnel_udp4.h
new file mode 100644
index 0000000..c49b43f
--- /dev/null
+++ b/lib/librte_gso/gso_tunnel_udp4.h
@@ -0,0 +1,44 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Inspur Corporation
+ */
+
+#ifndef _GSO_TUNNEL_UDP4_H_
+#define _GSO_TUNNEL_UDP4_H_
+
+#include <stdint.h>
+#include <rte_mbuf.h>
+
+/**
+ * Segment a tunneling packet with inner UDP/IPv4 headers. This function
+ * does not check if the input packet has correct checksums, and does not
+ * update checksums for output GSO segments. Furthermore, it does not
+ * process IP fragment packets.
+ *
+ * @param pkt
+ *  The packet mbuf to segment.
+ * @param gso_size
+ *  The max length of a GSO segment, measured in bytes.
+ * @param direct_pool
+ *  MBUF pool used for allocating direct buffers for output segments.
+ * @param indirect_pool
+ *  MBUF pool used for allocating indirect buffers for output segments.
+ * @param pkts_out
+ *  Pointer array used to store the MBUF addresses of output GSO
+ *  segments, when it succeeds. If the memory space in pkts_out is
+ *  insufficient, it fails and returns -EINVAL.
+ * @param nb_pkts_out
+ *  The max number of items that 'pkts_out' can keep.
+ *
+ * @return
+ *   - The number of GSO segments filled in pkts_out on success.
+ *   - Return 0 if it needn't GSO.
+ *   - Return -ENOMEM if run out of memory in MBUF pools.
+ *   - Return -EINVAL for invalid parameters.
+ */
+int gso_tunnel_udp4_segment(struct rte_mbuf *pkt,
+		uint16_t gso_size,
+		struct rte_mempool *direct_pool,
+		struct rte_mempool *indirect_pool,
+		struct rte_mbuf **pkts_out,
+		uint16_t nb_pkts_out);
+#endif
diff --git a/lib/librte_gso/meson.build b/lib/librte_gso/meson.build
index ad8dd85..05904f2 100644
--- a/lib/librte_gso/meson.build
+++ b/lib/librte_gso/meson.build
@@ -2,6 +2,6 @@ 
 # Copyright(c) 2017 Intel Corporation
 
 sources = files('gso_common.c', 'gso_tcp4.c', 'gso_udp4.c',
- 		'gso_tunnel_tcp4.c', 'rte_gso.c')
+		'gso_tunnel_tcp4.c', 'gso_tunnel_udp4.c', 'rte_gso.c')
 headers = files('rte_gso.h')
 deps += ['ethdev']
diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
index 896350e..0d02ec3 100644
--- a/lib/librte_gso/rte_gso.c
+++ b/lib/librte_gso/rte_gso.c
@@ -11,6 +11,7 @@ 
 #include "gso_common.h"
 #include "gso_tcp4.h"
 #include "gso_tunnel_tcp4.h"
+#include "gso_tunnel_udp4.h"
 #include "gso_udp4.h"
 
 #define ILLEGAL_UDP_GSO_CTX(ctx) \
@@ -60,6 +61,13 @@ 
 		ret = gso_tunnel_tcp4_segment(pkt, gso_size, ipid_delta,
 				direct_pool, indirect_pool,
 				pkts_out, nb_pkts_out);
+	} else if (IS_IPV4_VXLAN_UDP4(pkt->ol_flags) &&
+			(gso_ctx->gso_types & DEV_TX_OFFLOAD_VXLAN_TNL_TSO) &&
+			(gso_ctx->gso_types & DEV_TX_OFFLOAD_UDP_TSO)) {
+		pkt->ol_flags &= (~PKT_TX_UDP_SEG);
+		ret = gso_tunnel_udp4_segment(pkt, gso_size,
+				direct_pool, indirect_pool,
+				pkts_out, nb_pkts_out);
 	} else if (IS_IPV4_TCP(pkt->ol_flags) &&
 			(gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO)) {
 		pkt->ol_flags &= (~PKT_TX_TCP_SEG);