[dpdk-dev,v3,2/5] gso: add TCP/IPv4 GSO support

Message ID: 1505184211-36728-3-git-send-email-jiayu.hu@intel.com (mailing list archive)
State: Superseded, archived

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  fail     Compilation issues

Commit Message

Hu, Jiayu Sept. 12, 2017, 2:43 a.m. UTC
  This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
packets have correct checksums, and doesn't update checksums for output
packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
---
 lib/librte_eal/common/include/rte_log.h |   1 +
 lib/librte_gso/Makefile                 |   2 +
 lib/librte_gso/gso_common.c             | 202 ++++++++++++++++++++++++++++++++
 lib/librte_gso/gso_common.h             | 113 ++++++++++++++++++
 lib/librte_gso/gso_tcp4.c               |  83 +++++++++++++
 lib/librte_gso/gso_tcp4.h               |  76 ++++++++++++
 lib/librte_gso/rte_gso.c                |  41 ++++++-
 7 files changed, 515 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_gso/gso_common.c
 create mode 100644 lib/librte_gso/gso_common.h
 create mode 100644 lib/librte_gso/gso_tcp4.c
 create mode 100644 lib/librte_gso/gso_tcp4.h
  

Comments

Ananyev, Konstantin Sept. 12, 2017, 11:17 a.m. UTC | #1
Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Tuesday, September 12, 2017 3:43 AM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> This patch adds GSO support for TCP/IPv4 packets. Supported packets
> may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
> packets have correct checksums, and doesn't update checksums for output
> packets (the responsibility for this lies with the application).

It probably shouldn't say that the checksums have to be valid, right?
Since the lib doesn't update checksum(s), their validity on input probably doesn't matter.

> Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
> 
> TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
> MBUF, to organize an output packet. Note that we refer to these two
> chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
> header, while the indirect mbuf simply points to a location within the
> original packet's payload. Consequently, use of the GSO library requires
> multi-segment MBUF support in the TX functions of the NIC driver.
> 
> If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
> result, when all of its GSOed segments are freed, the packet is freed
> automatically.
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> ---
>  lib/librte_eal/common/include/rte_log.h |   1 +
>  lib/librte_gso/Makefile                 |   2 +
>  lib/librte_gso/gso_common.c             | 202 ++++++++++++++++++++++++++++++++
>  lib/librte_gso/gso_common.h             | 113 ++++++++++++++++++
>  lib/librte_gso/gso_tcp4.c               |  83 +++++++++++++
>  lib/librte_gso/gso_tcp4.h               |  76 ++++++++++++
>  lib/librte_gso/rte_gso.c                |  41 ++++++-
>  7 files changed, 515 insertions(+), 3 deletions(-)
>  create mode 100644 lib/librte_gso/gso_common.c
>  create mode 100644 lib/librte_gso/gso_common.h
>  create mode 100644 lib/librte_gso/gso_tcp4.c
>  create mode 100644 lib/librte_gso/gso_tcp4.h
> 
> diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
> index ec8dba7..2fa1199 100644
> --- a/lib/librte_eal/common/include/rte_log.h
> +++ b/lib/librte_eal/common/include/rte_log.h
> @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs;
>  #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
>  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
>  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> +#define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> 
>  /* these log types can be used in an application */
>  #define RTE_LOGTYPE_USER1     24 /**< User-defined log type 1. */
> diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
> index aeaacbc..2be64d1 100644
> --- a/lib/librte_gso/Makefile
> +++ b/lib/librte_gso/Makefile
> @@ -42,6 +42,8 @@ LIBABIVER := 1
> 
>  #source files
>  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
> 
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
> diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
> new file mode 100644
> index 0000000..7c32e03
> --- /dev/null
> +++ b/lib/librte_gso/gso_common.c
> @@ -0,0 +1,202 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <stdbool.h>
> +#include <errno.h>
> +
> +#include <rte_memcpy.h>
> +#include <rte_mempool.h>
> +#include <rte_ether.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "gso_common.h"
> +
> +static inline void
> +hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
> +		uint16_t pkt_hdr_offset)
> +{
> +	/* Copy MBUF metadata */
> +	hdr_segment->nb_segs = 1;
> +	hdr_segment->port = pkt->port;
> +	hdr_segment->ol_flags = pkt->ol_flags;
> +	hdr_segment->packet_type = pkt->packet_type;
> +	hdr_segment->pkt_len = pkt_hdr_offset;
> +	hdr_segment->data_len = pkt_hdr_offset;
> +	hdr_segment->tx_offload = pkt->tx_offload;
> +
> +	/* Copy the packet header */
> +	rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *),
> +			rte_pktmbuf_mtod(pkt, char *),
> +			pkt_hdr_offset);
> +}
> +
> +static inline void
> +free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < nb_pkts; i++)
> +		rte_pktmbuf_free(pkts[i]);
> +}
> +
> +int
> +gso_do_segment(struct rte_mbuf *pkt,
> +		uint16_t pkt_hdr_offset,
> +		uint16_t pyld_unit_size,
> +		struct rte_mempool *direct_pool,
> +		struct rte_mempool *indirect_pool,
> +		struct rte_mbuf **pkts_out,
> +		uint16_t nb_pkts_out)
> +{
> +	struct rte_mbuf *pkt_in;
> +	struct rte_mbuf *hdr_segment, *pyld_segment, *prev_segment;
> +	uint16_t pkt_in_data_pos, segment_bytes_remaining;
> +	uint16_t pyld_len, nb_segs;
> +	bool more_in_pkt, more_out_segs;
> +
> +	pkt_in = pkt;
> +	nb_segs = 0;
> +	more_in_pkt = 1;
> +	pkt_in_data_pos = pkt_hdr_offset;
> +
> +	while (more_in_pkt) {
> +		if (unlikely(nb_segs >= nb_pkts_out)) {
> +			free_gso_segment(pkts_out, nb_segs);
> +			return -EINVAL;
> +		}
> +
> +		/* Allocate a direct MBUF */
> +		hdr_segment = rte_pktmbuf_alloc(direct_pool);
> +		if (unlikely(hdr_segment == NULL)) {
> +			free_gso_segment(pkts_out, nb_segs);
> +			return -ENOMEM;
> +		}
> +		/* Fill the packet header */
> +		hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset);
> +
> +		prev_segment = hdr_segment;
> +		segment_bytes_remaining = pyld_unit_size;
> +		more_out_segs = 1;
> +
> +		while (more_out_segs && more_in_pkt) {
> +			/* Allocate an indirect MBUF */
> +			pyld_segment = rte_pktmbuf_alloc(indirect_pool);
> +			if (unlikely(pyld_segment == NULL)) {
> +				rte_pktmbuf_free(hdr_segment);
> +				free_gso_segment(pkts_out, nb_segs);
> +				return -ENOMEM;
> +			}
> +			/* Attach to current MBUF segment of pkt */
> +			rte_pktmbuf_attach(pyld_segment, pkt_in);
> +
> +			prev_segment->next = pyld_segment;
> +			prev_segment = pyld_segment;
> +
> +			pyld_len = segment_bytes_remaining;
> +			if (pyld_len + pkt_in_data_pos > pkt_in->data_len)
> +				pyld_len = pkt_in->data_len - pkt_in_data_pos;
> +
> +			pyld_segment->data_off = pkt_in_data_pos +
> +				pkt_in->data_off;
> +			pyld_segment->data_len = pyld_len;
> +
> +			/* Update header segment */
> +			hdr_segment->pkt_len += pyld_len;
> +			hdr_segment->nb_segs++;
> +
> +			pkt_in_data_pos += pyld_len;
> +			segment_bytes_remaining -= pyld_len;
> +
> +			/* Finish processing a MBUF segment of pkt */
> +			if (pkt_in_data_pos == pkt_in->data_len) {
> +				pkt_in = pkt_in->next;
> +				pkt_in_data_pos = 0;
> +				if (pkt_in == NULL)
> +					more_in_pkt = 0;
> +			}
> +
> +			/* Finish generating a GSO segment */
> +			if (segment_bytes_remaining == 0)
> +				more_out_segs = 0;
> +		}
> +		pkts_out[nb_segs++] = hdr_segment;
> +	}
> +	return nb_segs;
> +}
> +
> +static inline void
> +update_inner_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
> +		struct rte_mbuf **segs, uint16_t nb_segs)
> +{
> +	struct tcp_hdr *tcp_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct rte_mbuf *seg;
> +	uint32_t sent_seq;
> +	uint16_t inner_l2_offset;
> +	uint16_t id, i;
> +
> +	inner_l2_offset = pkt->outer_l2_len + pkt->outer_l3_len + pkt->l2_len;

Shouldn't it be just pkt->l2_len here?
Or, probably even better, pass l2_len as an input parameter.

> +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			inner_l2_offset);
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> +	id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	for (i = 0; i < nb_segs; i++) {
> +		seg = segs[i];
> +		/* Update the inner IPv4 header */
> +		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(seg, char *) +
> +				inner_l2_offset);
> +		ipv4_hdr->total_length = rte_cpu_to_be_16(seg->pkt_len -
> +				inner_l2_offset);
> +		ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
> +		id += ipid_delta;
> +
> +		/* Update the inner TCP header */
> +		tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + seg->l3_len);
> +		tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq);
> +		if (likely(i < nb_segs - 1))
> +			tcp_hdr->tcp_flags &= (~(TCP_HDR_PSH_MASK |
> +						TCP_HDR_FIN_MASK));
> +		sent_seq += (seg->pkt_len - seg->data_len);
> +	}
> +}
> +
> +void
> +gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
> +		struct rte_mbuf **segs, uint16_t nb_segs)
> +{
> +	if (is_ipv4_tcp(pkt->packet_type))
> +		update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
> +}
> diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
> new file mode 100644
> index 0000000..3c76520
> --- /dev/null
> +++ b/lib/librte_gso/gso_common.h
> @@ -0,0 +1,113 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _GSO_COMMON_H_
> +#define _GSO_COMMON_H_
> +
> +#include <stdint.h>
> +#include <rte_mbuf.h>
> +
> +#define IPV4_HDR_DF_SHIFT 14

We have that already defined in librte_net/rte_ip.h

> +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> +
> +#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
> +#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
> +
> +#define ETHER_TCP_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L4_TCP)
> +#define ETHER_VLAN_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | RTE_PTYPE_L4_TCP)
> +static inline uint8_t is_ipv4_tcp(uint32_t ptype)
> +{
> +	switch (ptype & (~RTE_PTYPE_L3_MASK)) {
> +	case ETHER_VLAN_TCP_PKT:
> +	case ETHER_TCP_PKT:

Why not just:
return RTE_ETH_IS_IPV4_HDR(ptype) && (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP;
?
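
For illustration, a minimal self-contained sketch of that single-expression check. The DEMO_* names are local stand-ins, not DPDK symbols, and their values are only assumed to mirror the RTE_PTYPE_* bit layout; real code would use RTE_ETH_IS_IPV4_HDR() and RTE_PTYPE_L4_MASK directly.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; values assumed to mirror the RTE_PTYPE_* layout. */
#define DEMO_PTYPE_L2_ETHER      0x00000001u
#define DEMO_PTYPE_L2_ETHER_VLAN 0x00000006u
#define DEMO_PTYPE_L3_IPV4       0x00000010u
#define DEMO_PTYPE_L3_IPV6       0x00000040u
#define DEMO_PTYPE_L4_TCP        0x00000100u
#define DEMO_PTYPE_L4_UDP        0x00000200u
#define DEMO_PTYPE_L4_MASK       0x00000f00u

/* Stand-in for RTE_ETH_IS_IPV4_HDR(): true when the IPv4 bit is set. */
#define DEMO_IS_IPV4_HDR(ptype) (((ptype) & DEMO_PTYPE_L3_IPV4) != 0)

/* Returns 1 for an IPv4 packet whose L4 type is exactly TCP, else 0. */
static inline int
demo_is_ipv4_tcp(uint32_t ptype)
{
	return DEMO_IS_IPV4_HDR(ptype) &&
		(ptype & DEMO_PTYPE_L4_MASK) == DEMO_PTYPE_L4_TCP;
}

/* Usage: plain and VLAN-tagged TCP/IPv4 match; UDP and IPv6 do not. */
static void
demo_ptype_usage(void)
{
	assert(demo_is_ipv4_tcp(DEMO_PTYPE_L2_ETHER |
			DEMO_PTYPE_L3_IPV4 | DEMO_PTYPE_L4_TCP));
	assert(demo_is_ipv4_tcp(DEMO_PTYPE_L2_ETHER_VLAN |
			DEMO_PTYPE_L3_IPV4 | DEMO_PTYPE_L4_TCP));
	assert(!demo_is_ipv4_tcp(DEMO_PTYPE_L2_ETHER |
			DEMO_PTYPE_L3_IPV4 | DEMO_PTYPE_L4_UDP));
	assert(!demo_is_ipv4_tcp(DEMO_PTYPE_L2_ETHER |
			DEMO_PTYPE_L3_IPV6 | DEMO_PTYPE_L4_TCP));
}
```

Unlike the switch in the patch, this form puts no constraint on the L2 bits, so whether that is acceptable here is a separate question.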

> +		return RTE_ETH_IS_IPV4_HDR(ptype);
> +	default:
> +		return 0;
> +	}
> +}
> +
> +/**
> + * Internal function which updates relevant packet headers, following
> + * segmentation. This is required to update, for example, the IPv4
> + * 'total_length' field, to reflect the reduced length of the now-
> + * segmented packet.
> + *
> + * @param pkt
> + *  The original packet.
> + * @param ipid_delta
> + *  The increasing uint of IP ids.
> + * @param segs
> + *  Pointer array used for storing mbuf addresses for GSO segments.
> + * @param nb_segs
> + *  The number of GSO segments placed in segs.
> + */
> +void gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
> +		struct rte_mbuf **segs, uint16_t nb_segs);
> +
> +/**
> + * Internal function which divides the input packet into small segments.
> + * Each of the newly-created segments is organized as a two-segment MBUF,
> + * where the first segment is a standard mbuf, which stores a copy of
> + * packet header, and the second is an indirect mbuf which points to a
> + * section of data in the input packet.
> + *
> + * @param pkt
> + *  Packet to segment.
> + * @param pkt_hdr_offset
> + *  Packet header offset, measured in bytes.
> + * @param pyld_unit_size
> + *  The max payload length of a GSO segment.
> + * @param direct_pool
> + *  MBUF pool used for allocating direct buffers for output segments.
> + * @param indirect_pool
> + *  MBUF pool used for allocating indirect buffers for output segments.
> + * @param pkts_out
> + *  Pointer array used to keep the mbuf addresses of output segments. If
> + *  the memory space in pkts_out is insufficient, gso_do_segment() fails
> + *  and returns -EINVAL.
> + * @param nb_pkts_out
> + *  The max number of items that pkts_out can keep.
> + *
> + * @return
> + *  - The number of segments created in the event of success.
> + *  - Return -ENOMEM if run out of memory in MBUF pools.
> + *  - Return -EINVAL for invalid parameters.
> + */
> +int gso_do_segment(struct rte_mbuf *pkt,
> +		uint16_t pkt_hdr_offset,
> +		uint16_t pyld_unit_size,
> +		struct rte_mempool *direct_pool,
> +		struct rte_mempool *indirect_pool,
> +		struct rte_mbuf **pkts_out,
> +		uint16_t nb_pkts_out);
> +#endif
> diff --git a/lib/librte_gso/gso_tcp4.c b/lib/librte_gso/gso_tcp4.c
> new file mode 100644
> index 0000000..8d4bfb2
> --- /dev/null
> +++ b/lib/librte_gso/gso_tcp4.c
> @@ -0,0 +1,83 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +
> +#include <rte_ether.h>
> +#include <rte_ip.h>
> +
> +#include "gso_common.h"
> +#include "gso_tcp4.h"
> +
> +int
> +gso_tcp4_segment(struct rte_mbuf *pkt,
> +		uint16_t gso_size,
> +		uint8_t ipid_delta,
> +		struct rte_mempool *direct_pool,
> +		struct rte_mempool *indirect_pool,
> +		struct rte_mbuf **pkts_out,
> +		uint16_t nb_pkts_out)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	uint16_t tcp_dl;
> +	uint16_t pyld_unit_size;
> +	uint16_t hdr_offset;
> +	int ret = 1;
> +
> +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			pkt->l2_len);
> +	/* Don't process the fragmented packet */
> +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> +						IPV4_HDR_DF_MASK)) == 0)) {


This is not a check for a fragmented packet; it checks whether fragmentation is allowed for that packet.
The mask should be IPV4_HDR_DF_MASK - 1, I think.
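
To make the difference concrete, here is a minimal sketch, assuming the field has already been converted to host byte order (the header itself stores it big-endian). The DEMO_* names are hypothetical stand-ins; the bit positions follow the IPv4 header layout, where DF is bit 14, MF is bit 13 and the low 13 bits hold the fragment offset.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IPV4_DF_SHIFT 14
#define DEMO_IPV4_DF_MASK  (1u << DEMO_IPV4_DF_SHIFT) /* 0x4000 */

/* What the patch tests: DF bit clear, i.e. fragmentation is allowed. */
static inline int
demo_df_is_clear(uint16_t frag_off_host)
{
	return (frag_off_host & DEMO_IPV4_DF_MASK) == 0;
}

/*
 * DF_MASK - 1 == 0x3fff covers the MF flag and the fragment offset.
 * Either one being non-zero means the packet really is a fragment.
 */
static inline int
demo_is_fragment(uint16_t frag_off_host)
{
	return (frag_off_host & (DEMO_IPV4_DF_MASK - 1)) != 0;
}

static void
demo_frag_usage(void)
{
	/* Unfragmented, DF clear: the DF test fires, the fragment test doesn't. */
	assert(demo_df_is_clear(0x0000) && !demo_is_fragment(0x0000));
	/* First fragment: MF set, offset 0. */
	assert(demo_is_fragment(0x2000));
	/* Last fragment: MF clear, offset 185 (in 8-byte units). */
	assert(demo_is_fragment(0x00b9));
	/* Unfragmented, DF set. */
	assert(!demo_is_fragment(0x4000));
}
```

With the DF-only test, an ordinary unfragmented packet that merely has DF clear would be skipped by GSO.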

> +		pkts_out[0] = pkt;
> +		return ret;
> +	}
> +
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> +		pkt->l4_len;

Why not use pkt->pkt_len - pkt->l2_len - pkt->l3_len - pkt->l4_len?
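
As a sketch of that alternative, assuming pkt_len covers exactly the L2, L3 and L4 headers plus the TCP payload (no trailing padding or CRC in the mbuf). struct demo_meta is a hypothetical stand-in for the relevant rte_mbuf fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_mbuf length fields used here. */
struct demo_meta {
	uint32_t pkt_len;
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t l4_len;
};

/* TCP payload length derived from mbuf metadata only. */
static inline uint32_t
demo_tcp_payload_len(const struct demo_meta *m)
{
	return m->pkt_len - m->l2_len - m->l3_len - m->l4_len;
}

static void
demo_payload_usage(void)
{
	struct demo_meta m = {
		.pkt_len = 9014, .l2_len = 14, .l3_len = 20, .l4_len = 20,
	};

	assert(demo_tcp_payload_len(&m) == 8960);

	/* Header-only packet (e.g. a bare ACK): nothing to segment. */
	m.pkt_len = 54;
	assert(demo_tcp_payload_len(&m) == 0);
}
```

This would avoid reading and byte-swapping total_length from the packet data.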

> +	/* Don't process the packet without data */
> +	if (unlikely(tcp_dl == 0)) {
> +		pkts_out[0] = pkt;
> +		return ret;
> +	}
> +
> +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;

Hmm, why do we need to count CRC_LEN here?
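
The arithmetic in question, as a small worked sketch. The helper names and the example sizes are assumptions for illustration; crc_len is a parameter so that both variants can be compared side by side.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bytes that fit into one output segment. */
static inline uint16_t
demo_pyld_unit_size(uint16_t gso_size, uint16_t hdr_offset, uint16_t crc_len)
{
	return (uint16_t)(gso_size - hdr_offset - crc_len);
}

/* Output segments needed: ceil(payload_len / unit); unit must be non-zero. */
static inline uint32_t
demo_nb_segments(uint32_t payload_len, uint16_t unit)
{
	return (payload_len + unit - 1) / unit;
}

static void
demo_unit_usage(void)
{
	const uint16_t hdr = 14 + 20 + 20; /* Ethernet + IPv4 + TCP, no options */

	assert(demo_pyld_unit_size(1518, hdr, 4) == 1460); /* CRC subtracted */
	assert(demo_pyld_unit_size(1518, hdr, 0) == 1464); /* CRC ignored */
	assert(demo_nb_segments(8960, 1460) == 7);
	assert(demo_nb_segments(2920, 1460) == 2);
}
```

With these example numbers the CRC term costs 4 payload bytes per segment; whether gso_size is meant to include the CRC is exactly the open question.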

> +
> +	/* Segment the payload */
> +	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
> +			indirect_pool, pkts_out, nb_pkts_out);
> +	if (ret > 1)
> +		gso_update_pkt_headers(pkt, ipid_delta, pkts_out, ret);
> +
> +	return ret;
> +}
> diff --git a/lib/librte_gso/gso_tcp4.h b/lib/librte_gso/gso_tcp4.h
> new file mode 100644
> index 0000000..9c07984
> --- /dev/null
> +++ b/lib/librte_gso/gso_tcp4.h
> @@ -0,0 +1,76 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _GSO_TCP4_H_
> +#define _GSO_TCP4_H_
> +
> +#include <stdint.h>
> +#include <rte_mbuf.h>
> +
> +/**
> + * Segment an IPv4/TCP packet. This function assumes the input packet has
> + * correct checksums and doesn't update checksums for GSO segment.
> + * Furthermore, it doesn't process IP fragment packets.
> + *
> + * @param pkt
> + *  The packet mbuf to segment.
> + * @param gso_size
> + *  The max length of a GSO segment, measured in bytes.
> + * @param ipid_delta
> + *  The increasing uint of IP ids.
> + * @param direct_pool
> + *  MBUF pool used for allocating direct buffers for output segments.
> + * @param indirect_pool
> + *  MBUF pool used for allocating indirect buffers for output segments.
> + * @param pkts_out
> + *  Pointer array used to store the MBUF addresses of output GSO
> + *  segments, when gso_tcp4_segment() successes. If the memory space in
> + *  pkts_out is insufficient, gso_tcp4_segment() fails and returns
> + *  -EINVAL.
> + * @param nb_pkts_out
> + *  The max number of items that 'pkts_out' can keep.
> + *
> + * @return
> + *   - The number of GSO segments filled in pkts_out on success.
> + *   - Return -ENOMEM if run out of memory in MBUF pools.
> + *   - Return -EINVAL for invalid parameters.
> + */
> +int gso_tcp4_segment(struct rte_mbuf *pkt,
> +		uint16_t gso_size,
> +		uint8_t ip_delta,
> +		struct rte_mempool *direct_pool,
> +		struct rte_mempool *indirect_pool,
> +		struct rte_mbuf **pkts_out,
> +		uint16_t nb_pkts_out);
> +
> +#endif
> diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> index dda50ee..95f6ea6 100644
> --- a/lib/librte_gso/rte_gso.c
> +++ b/lib/librte_gso/rte_gso.c
> @@ -33,18 +33,53 @@
> 
>  #include <errno.h>
> 
> +#include <rte_log.h>
> +
>  #include "rte_gso.h"
> +#include "gso_common.h"
> +#include "gso_tcp4.h"
> 
>  int
>  rte_gso_segment(struct rte_mbuf *pkt,
> -		struct rte_gso_ctx gso_ctx __rte_unused,
> +		struct rte_gso_ctx gso_ctx,
>  		struct rte_mbuf **pkts_out,
>  		uint16_t nb_pkts_out)
>  {
> +	struct rte_mempool *direct_pool, *indirect_pool;
> +	struct rte_mbuf *pkt_seg;
> +	uint16_t gso_size;
> +	uint8_t ipid_delta;
> +	int ret = 1;
> +
>  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
>  		return -EINVAL;
> 
> -	pkts_out[0] = pkt;
> +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> +			(pkt->packet_type & gso_ctx.gso_types) !=
> +			pkt->packet_type) {
> +		pkts_out[0] = pkt;
> +		return ret;
> +	}
> +
> +	direct_pool = gso_ctx.direct_pool;
> +	indirect_pool = gso_ctx.indirect_pool;
> +	gso_size = gso_ctx.gso_size;
> +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> +
> +	if (is_ipv4_tcp(pkt->packet_type)) {

We probably need this here:
if (is_ipv4_tcp(pkt->packet_type) && (gso_ctx.gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...

> +		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
> +				direct_pool, indirect_pool,
> +				pkts_out, nb_pkts_out);
> +	} else
> +		RTE_LOG(WARNING, GSO, "Unsupported packet type\n");

Shouldn't we do pkts_out[0] = pkt; here?
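
A minimal sketch of the control flow I have in mind, with hypothetical stand-ins throughout: struct demo_pkt replaces struct rte_mbuf, DEMO_GSO_TCP4 replaces the real gso_types bit, and demo_tcp4_segment() is a stub rather than the real segmenter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GSO_TCP4 0x1u /* hypothetical "TCP/IPv4 GSO enabled" bit */

struct demo_pkt {          /* hypothetical stand-in for struct rte_mbuf */
	int is_ipv4_tcp;
};

/* Stub segmenter: pretends the packet was split into two segments. */
static int
demo_tcp4_segment(struct demo_pkt *pkt, struct demo_pkt **pkts_out,
		uint16_t nb_pkts_out)
{
	if (nb_pkts_out < 2)
		return -1;
	pkts_out[0] = pkt;
	pkts_out[1] = pkt;
	return 2;
}

static int
demo_gso_dispatch(struct demo_pkt *pkt, uint32_t gso_types,
		struct demo_pkt **pkts_out, uint16_t nb_pkts_out)
{
	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
		return -1;

	/* Segment only when the type matches AND the user enabled it. */
	if (pkt->is_ipv4_tcp && (gso_types & DEMO_GSO_TCP4) != 0)
		return demo_tcp4_segment(pkt, pkts_out, nb_pkts_out);

	/* Unsupported or disabled: hand the packet back untouched. */
	pkts_out[0] = pkt;
	return 1;
}

static void
demo_dispatch_usage(void)
{
	struct demo_pkt tcp = { .is_ipv4_tcp = 1 };
	struct demo_pkt other = { .is_ipv4_tcp = 0 };
	struct demo_pkt *out[2] = { NULL, NULL };

	assert(demo_gso_dispatch(&tcp, DEMO_GSO_TCP4, out, 2) == 2);
	out[0] = NULL;
	assert(demo_gso_dispatch(&other, DEMO_GSO_TCP4, out, 2) == 1);
	assert(out[0] == &other);
	out[0] = NULL;
	assert(demo_gso_dispatch(&tcp, 0, out, 2) == 1);
	assert(out[0] == &tcp);
}
```

The point is that every path returning 1 also fills pkts_out[0], so the caller never reads an uninitialized slot.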

> +
> +	if (ret > 1) {
> +		pkt_seg = pkt;
> +		while (pkt_seg) {
> +			rte_mbuf_refcnt_update(pkt_seg, -1);
> +			pkt_seg = pkt_seg->next;
> +		}
> +	}
> 
> -	return 1;
> +	return ret;
>  }
> --
> 2.7.4
  
Ananyev, Konstantin Sept. 12, 2017, 2:17 p.m. UTC | #2
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, September 12, 2017 12:18 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Tuesday, September 12, 2017 3:43 AM
> > To: dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> > Subject: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > This patch adds GSO support for TCP/IPv4 packets. Supported packets
> > may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
> > packets have correct checksums, and doesn't update checksums for output
> > packets (the responsibility for this lies with the application).
> 
> It probably shouldn't say that the checksums have to be valid, right?
> Since the lib doesn't update checksum(s), their validity on input probably doesn't matter.
> 
> > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
> >
> > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
> > MBUF, to organize an output packet. Note that we refer to these two
> > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
> > header, while the indirect mbuf simply points to a location within the
> > original packet's payload. Consequently, use of the GSO library requires
> > multi-segment MBUF support in the TX functions of the NIC driver.
> >
> > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
> > result, when all of its GSOed segments are freed, the packet is freed
> > automatically.
> >
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> > ---
> >  lib/librte_eal/common/include/rte_log.h |   1 +
> >  lib/librte_gso/Makefile                 |   2 +
> >  lib/librte_gso/gso_common.c             | 202 ++++++++++++++++++++++++++++++++
> >  lib/librte_gso/gso_common.h             | 113 ++++++++++++++++++
> >  lib/librte_gso/gso_tcp4.c               |  83 +++++++++++++
> >  lib/librte_gso/gso_tcp4.h               |  76 ++++++++++++
> >  lib/librte_gso/rte_gso.c                |  41 ++++++-
> >  7 files changed, 515 insertions(+), 3 deletions(-)
> >  create mode 100644 lib/librte_gso/gso_common.c
> >  create mode 100644 lib/librte_gso/gso_common.h
> >  create mode 100644 lib/librte_gso/gso_tcp4.c
> >  create mode 100644 lib/librte_gso/gso_tcp4.h
> >
> > diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
> > index ec8dba7..2fa1199 100644
> > --- a/lib/librte_eal/common/include/rte_log.h
> > +++ b/lib/librte_eal/common/include/rte_log.h
> > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs;
> >  #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
> >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > +#define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> >
> >  /* these log types can be used in an application */
> >  #define RTE_LOGTYPE_USER1     24 /**< User-defined log type 1. */
> > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
> > index aeaacbc..2be64d1 100644
> > --- a/lib/librte_gso/Makefile
> > +++ b/lib/librte_gso/Makefile
> > @@ -42,6 +42,8 @@ LIBABIVER := 1
> >
> >  #source files
> >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
> >
> >  # install this header file
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
> > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
> > new file mode 100644
> > index 0000000..7c32e03
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_common.c
> > @@ -0,0 +1,202 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <stdbool.h>
> > +#include <errno.h>
> > +
> > +#include <rte_memcpy.h>
> > +#include <rte_mempool.h>
> > +#include <rte_ether.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "gso_common.h"
> > +
> > +static inline void
> > +hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
> > +		uint16_t pkt_hdr_offset)
> > +{
> > +	/* Copy MBUF metadata */
> > +	hdr_segment->nb_segs = 1;
> > +	hdr_segment->port = pkt->port;
> > +	hdr_segment->ol_flags = pkt->ol_flags;
> > +	hdr_segment->packet_type = pkt->packet_type;
> > +	hdr_segment->pkt_len = pkt_hdr_offset;
> > +	hdr_segment->data_len = pkt_hdr_offset;
> > +	hdr_segment->tx_offload = pkt->tx_offload;
> > +
> > +	/* Copy the packet header */
> > +	rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *),
> > +			rte_pktmbuf_mtod(pkt, char *),
> > +			pkt_hdr_offset);
> > +}
> > +
> > +static inline void
> > +free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts)
> > +{
> > +	uint16_t i;
> > +
> > +	for (i = 0; i < nb_pkts; i++)
> > +		rte_pktmbuf_free(pkts[i]);
> > +}
> > +
> > +int
> > +gso_do_segment(struct rte_mbuf *pkt,
> > +		uint16_t pkt_hdr_offset,
> > +		uint16_t pyld_unit_size,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out)
> > +{
> > +	struct rte_mbuf *pkt_in;
> > +	struct rte_mbuf *hdr_segment, *pyld_segment, *prev_segment;
> > +	uint16_t pkt_in_data_pos, segment_bytes_remaining;
> > +	uint16_t pyld_len, nb_segs;
> > +	bool more_in_pkt, more_out_segs;
> > +
> > +	pkt_in = pkt;
> > +	nb_segs = 0;
> > +	more_in_pkt = 1;
> > +	pkt_in_data_pos = pkt_hdr_offset;
> > +
> > +	while (more_in_pkt) {
> > +		if (unlikely(nb_segs >= nb_pkts_out)) {
> > +			free_gso_segment(pkts_out, nb_segs);
> > +			return -EINVAL;
> > +		}
> > +
> > +		/* Allocate a direct MBUF */
> > +		hdr_segment = rte_pktmbuf_alloc(direct_pool);
> > +		if (unlikely(hdr_segment == NULL)) {
> > +			free_gso_segment(pkts_out, nb_segs);
> > +			return -ENOMEM;
> > +		}
> > +		/* Fill the packet header */
> > +		hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset);
> > +
> > +		prev_segment = hdr_segment;
> > +		segment_bytes_remaining = pyld_unit_size;
> > +		more_out_segs = 1;
> > +
> > +		while (more_out_segs && more_in_pkt) {
> > +			/* Allocate an indirect MBUF */
> > +			pyld_segment = rte_pktmbuf_alloc(indirect_pool);
> > +			if (unlikely(pyld_segment == NULL)) {
> > +				rte_pktmbuf_free(hdr_segment);
> > +				free_gso_segment(pkts_out, nb_segs);
> > +				return -ENOMEM;
> > +			}
> > +			/* Attach to current MBUF segment of pkt */
> > +			rte_pktmbuf_attach(pyld_segment, pkt_in);
> > +
> > +			prev_segment->next = pyld_segment;
> > +			prev_segment = pyld_segment;
> > +
> > +			pyld_len = segment_bytes_remaining;
> > +			if (pyld_len + pkt_in_data_pos > pkt_in->data_len)
> > +				pyld_len = pkt_in->data_len - pkt_in_data_pos;
> > +
> > +			pyld_segment->data_off = pkt_in_data_pos +
> > +				pkt_in->data_off;
> > +			pyld_segment->data_len = pyld_len;
> > +
> > +			/* Update header segment */
> > +			hdr_segment->pkt_len += pyld_len;
> > +			hdr_segment->nb_segs++;
> > +
> > +			pkt_in_data_pos += pyld_len;
> > +			segment_bytes_remaining -= pyld_len;
> > +
> > +			/* Finish processing a MBUF segment of pkt */
> > +			if (pkt_in_data_pos == pkt_in->data_len) {
> > +				pkt_in = pkt_in->next;
> > +				pkt_in_data_pos = 0;
> > +				if (pkt_in == NULL)
> > +					more_in_pkt = 0;
> > +			}
> > +
> > +			/* Finish generating a GSO segment */
> > +			if (segment_bytes_remaining == 0)
> > +				more_out_segs = 0;
> > +		}
> > +		pkts_out[nb_segs++] = hdr_segment;
> > +	}
> > +	return nb_segs;
> > +}
> > +
> > +static inline void
> > +update_inner_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
> > +		struct rte_mbuf **segs, uint16_t nb_segs)
> > +{
> > +	struct tcp_hdr *tcp_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct rte_mbuf *seg;
> > +	uint32_t sent_seq;
> > +	uint16_t inner_l2_offset;
> > +	uint16_t id, i;
> > +
> > +	inner_l2_offset = pkt->outer_l2_len + pkt->outer_l3_len + pkt->l2_len;
> 
> Shouldn't it be: pkt->l2_len here?
> Or probably even better to pass l2_len as an input parameter.
> 
> > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> > +			inner_l2_offset);
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> > +	id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	for (i = 0; i < nb_segs; i++) {
> > +		seg = segs[i];
> > +		/* Update the inner IPv4 header */
> > +		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(seg, char *) +
> > +				inner_l2_offset);
> > +		ipv4_hdr->total_length = rte_cpu_to_be_16(seg->pkt_len -
> > +				inner_l2_offset);
> > +		ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
> > +		id += ipid_delta;
> > +
> > +		/* Update the inner TCP header */
> > +		tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + seg->l3_len);
> > +		tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq);
> > +		if (likely(i < nb_segs - 1))
> > +			tcp_hdr->tcp_flags &= (~(TCP_HDR_PSH_MASK |
> > +						TCP_HDR_FIN_MASK));
> > +		sent_seq += (seg->pkt_len - seg->data_len);
> > +	}
> > +}
> > +
> > +void
> > +gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
> > +		struct rte_mbuf **segs, uint16_t nb_segs)
> > +{
> > +	if (is_ipv4_tcp(pkt->packet_type))
> > +		update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
> > +}
> > diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
> > new file mode 100644
> > index 0000000..3c76520
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_common.h
> > @@ -0,0 +1,113 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _GSO_COMMON_H_
> > +#define _GSO_COMMON_H_
> > +
> > +#include <stdint.h>
> > +#include <rte_mbuf.h>
> > +
> > +#define IPV4_HDR_DF_SHIFT 14
> 
> We have that already defined in librte_net/rte_ip.h
> 
> > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > +
> > +#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
> > +#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
> > +
> > +#define ETHER_TCP_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L4_TCP)
> > +#define ETHER_VLAN_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | RTE_PTYPE_L4_TCP)
> > +static inline uint8_t is_ipv4_tcp(uint32_t ptype)
> > +{
> > +	switch (ptype & (~RTE_PTYPE_L3_MASK)) {
> > +	case ETHER_VLAN_TCP_PKT:
> > +	case ETHER_TCP_PKT:
> 
> Why not just:
> return RTE_ETH_IS_IPV4_HDR(ptype) && (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP;
> ?
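
For illustration only (not part of the patch): a minimal standalone sketch of the one-line check suggested above. The `EX_PTYPE_*` constants and `ex_is_ipv4_tcp()` are hypothetical stand-ins so that the snippet builds without DPDK headers; their values are assumed to mirror the layout in rte_mbuf_ptype.h and should be checked against the real header.

```c
#include <stdint.h>

/* Stand-in packet-type bits (assumed values, for illustration only). */
#define EX_PTYPE_L2_ETHER      0x00000001u
#define EX_PTYPE_L2_ETHER_VLAN 0x00000006u
#define EX_PTYPE_L3_IPV4       0x00000010u
#define EX_PTYPE_L3_IPV6       0x00000040u
#define EX_PTYPE_L4_TCP        0x00000100u
#define EX_PTYPE_L4_UDP        0x00000200u
#define EX_PTYPE_L4_MASK       0x00000f00u

/*
 * Return 1 when the packet type describes TCP over IPv4, 0 otherwise.
 * The L3 test plays the role of RTE_ETH_IS_IPV4_HDR(); the L4 test
 * compares the whole L4 field, so UDP, SCTP, etc. are rejected.
 */
static inline int
ex_is_ipv4_tcp(uint32_t ptype)
{
	return (ptype & EX_PTYPE_L3_IPV4) != 0 &&
		(ptype & EX_PTYPE_L4_MASK) == EX_PTYPE_L4_TCP;
}
```

Unlike the switch in the patch, this form does not look at the L2 bits at all, so it accepts any L2 type (plain Ethernet, VLAN, or others) as long as L3 is IPv4 and L4 is TCP.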
> 
> > +		return RTE_ETH_IS_IPV4_HDR(ptype);
> > +	default:
> > +		return 0;
> > +	}
> > +}
> > +
> > +/**
> > + * Internal function which updates relevant packet headers, following
> > + * segmentation. This is required to update, for example, the IPv4
> > + * 'total_length' field, to reflect the reduced length of the now-
> > + * segmented packet.
> > + *
> > + * @param pkt
> > + *  The original packet.
> > + * @param ipid_delta
> > + *  The increasing unit of IP ids.
> > + * @param segs
> > + *  Pointer array used for storing mbuf addresses for GSO segments.
> > + * @param nb_segs
> > + *  The number of GSO segments placed in segs.
> > + */
> > +void gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
> > +		struct rte_mbuf **segs, uint16_t nb_segs);
> > +
> > +/**
> > + * Internal function which divides the input packet into small segments.
> > + * Each of the newly-created segments is organized as a two-segment MBUF,
> > + * where the first segment is a standard mbuf, which stores a copy of
> > + * packet header, and the second is an indirect mbuf which points to a
> > + * section of data in the input packet.
> > + *
> > + * @param pkt
> > + *  Packet to segment.
> > + * @param pkt_hdr_offset
> > + *  Packet header offset, measured in bytes.
> > + * @param pyld_unit_size
> > + *  The max payload length of a GSO segment.
> > + * @param direct_pool
> > + *  MBUF pool used for allocating direct buffers for output segments.
> > + * @param indirect_pool
> > + *  MBUF pool used for allocating indirect buffers for output segments.
> > + * @param pkts_out
> > + *  Pointer array used to keep the mbuf addresses of output segments. If
> > + *  the memory space in pkts_out is insufficient, gso_do_segment() fails
> > + *  and returns -EINVAL.
> > + * @param nb_pkts_out
> > + *  The max number of items that pkts_out can keep.
> > + *
> > + * @return
> > + *  - The number of segments created in the event of success.
> > + *  - Return -ENOMEM if run out of memory in MBUF pools.
> > + *  - Return -EINVAL for invalid parameters.
> > + */
> > +int gso_do_segment(struct rte_mbuf *pkt,
> > +		uint16_t pkt_hdr_offset,
> > +		uint16_t pyld_unit_size,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out);
> > +#endif
> > diff --git a/lib/librte_gso/gso_tcp4.c b/lib/librte_gso/gso_tcp4.c
> > new file mode 100644
> > index 0000000..8d4bfb2
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_tcp4.c
> > @@ -0,0 +1,83 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +
> > +#include <rte_ether.h>
> > +#include <rte_ip.h>
> > +
> > +#include "gso_common.h"
> > +#include "gso_tcp4.h"
> > +
> > +int
> > +gso_tcp4_segment(struct rte_mbuf *pkt,
> > +		uint16_t gso_size,
> > +		uint8_t ipid_delta,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	uint16_t tcp_dl;
> > +	uint16_t pyld_unit_size;
> > +	uint16_t hdr_offset;
> > +	int ret = 1;
> > +
> > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> > +			pkt->l2_len);
> > +	/* Don't process the fragmented packet */
> > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> > +						IPV4_HDR_DF_MASK)) == 0)) {
> 
> 
> This is not a check for a fragmented packet; it checks whether fragmentation is allowed for that packet (DF bit clear).
> It should use IPV4_HDR_DF_MASK - 1 as the mask (the MF flag plus the fragment offset bits), I think.
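
For illustration only (not part of the patch): a minimal sketch of a "is this an IP fragment" test along the lines suggested above. It assumes the 16-bit flags/fragment-offset field has already been converted to host byte order; `EX_IPV4_*` and `ex_ipv4_is_fragment()` are hypothetical names. The bit layout itself (DF = 0x4000, MF = 0x2000, 13-bit offset) comes from the IPv4 header definition.

```c
#include <stdint.h>

#define EX_IPV4_DF_FLAG     0x4000u /* Don't Fragment (bit 14) */
#define EX_IPV4_MF_FLAG     0x2000u /* More Fragments (bit 13) */
#define EX_IPV4_OFFSET_MASK 0x1fffu /* fragment offset, in 8-byte units */

/*
 * Return 1 if the packet is a fragment: either more fragments follow
 * (MF set) or this is a later fragment (non-zero offset).
 * frag_off_host is the flags/offset field in host byte order.
 */
static inline int
ex_ipv4_is_fragment(uint16_t frag_off_host)
{
	return (frag_off_host &
		(EX_IPV4_MF_FLAG | EX_IPV4_OFFSET_MASK)) != 0;
}
```

Note that `EX_IPV4_DF_FLAG - 1` is 0x3fff, which is exactly the MF flag combined with the offset mask, so the two spellings of the mask are equivalent.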
> 
> > +		pkts_out[0] = pkt;
> > +		return ret;
> > +	}
> > +
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> > +		pkt->l4_len;
> 
> Why not use pkt->pkt_len - pkt->l2_len - pkt->l3_len - pkt->l4_len?
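
For illustration only (not part of the patch): a minimal sketch of deriving the TCP payload length from the mbuf length fields, as proposed above, instead of reading the IPv4 total_length field. `struct ex_pkt_meta` is a hypothetical stand-in for the relevant rte_mbuf fields, and the guard against underflow is an addition of this sketch.

```c
#include <stdint.h>

/* Mock of the mbuf length metadata used here (not the real rte_mbuf). */
struct ex_pkt_meta {
	uint32_t pkt_len; /* total packet length across all segments */
	uint16_t l2_len;  /* Ethernet (+ VLAN) header length */
	uint16_t l3_len;  /* IPv4 header length */
	uint16_t l4_len;  /* TCP header length */
};

/*
 * TCP payload length = packet length minus all header lengths.
 * Returns 0 if the headers alone cover the whole packet.
 */
static inline uint32_t
ex_tcp_payload_len(const struct ex_pkt_meta *m)
{
	uint32_t hdr_len = (uint32_t)m->l2_len + m->l3_len + m->l4_len;

	return m->pkt_len > hdr_len ? m->pkt_len - hdr_len : 0;
}
```

This relies on the application having filled in l2_len, l3_len and l4_len correctly, which the library needs anyway to locate the headers.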
> 
> > +	/* Don't process the packet without data */
> > +	if (unlikely(tcp_dl == 0)) {
> > +		pkts_out[0] = pkt;
> > +		return ret;
> > +	}
> > +
> > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> 
> Hmm, why do we need to count CRC_LEN here?
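
For illustration only (not part of the patch): a small sketch of how the per-segment payload size and the resulting number of output packets relate. Whether the 4-byte CRC should be subtracted is exactly the open question here, so the sketch takes the trailer length as a parameter. `ex_pyld_unit_size()` and `ex_nb_gso_segs()` are hypothetical helpers, and the underflow guard is an addition of this sketch.

```c
#include <stdint.h>

/*
 * Payload bytes that fit in one output segment of at most gso_size
 * bytes, after reserving hdr_len header bytes and trailer_len trailer
 * bytes (0, or 4 if the CRC is to be counted).
 * Returns 0 if gso_size cannot even hold the headers.
 */
static inline uint16_t
ex_pyld_unit_size(uint16_t gso_size, uint16_t hdr_len, uint16_t trailer_len)
{
	uint32_t overhead = (uint32_t)hdr_len + trailer_len;

	return gso_size > overhead ? (uint16_t)(gso_size - overhead) : 0;
}

/* Number of output packets needed: ceil(payload_len / unit). */
static inline uint32_t
ex_nb_gso_segs(uint32_t payload_len, uint16_t unit)
{
	if (unit == 0)
		return 0;
	return (payload_len + unit - 1) / unit;
}
```

With gso_size = 1514 and 54 bytes of headers, a 2920-byte payload fits in two segments if the CRC is not counted (unit 1460) but needs three if it is (unit 1456).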
> 
> > +
> > +	/* Segment the payload */
> > +	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
> > +			indirect_pool, pkts_out, nb_pkts_out);
> > +	if (ret > 1)
> > +		gso_update_pkt_headers(pkt, ipid_delta, pkts_out, ret);
> > +
> > +	return ret;
> > +}
> > diff --git a/lib/librte_gso/gso_tcp4.h b/lib/librte_gso/gso_tcp4.h
> > new file mode 100644
> > index 0000000..9c07984
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_tcp4.h
> > @@ -0,0 +1,76 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _GSO_TCP4_H_
> > +#define _GSO_TCP4_H_
> > +
> > +#include <stdint.h>
> > +#include <rte_mbuf.h>
> > +
> > +/**
> > + * Segment an IPv4/TCP packet. This function assumes the input packet has
> > + * correct checksums and doesn't update checksums for GSO segment.
> > + * Furthermore, it doesn't process IP fragment packets.
> > + *
> > + * @param pkt
> > + *  The packet mbuf to segment.
> > + * @param gso_size
> > + *  The max length of a GSO segment, measured in bytes.
> > + * @param ipid_delta
> > + *  The increasing unit of IP ids.
> > + * @param direct_pool
> > + *  MBUF pool used for allocating direct buffers for output segments.
> > + * @param indirect_pool
> > + *  MBUF pool used for allocating indirect buffers for output segments.
> > + * @param pkts_out
> > + *  Pointer array used to store the MBUF addresses of output GSO
> > + *  segments, when gso_tcp4_segment() succeeds. If the memory space in
> > + *  pkts_out is insufficient, gso_tcp4_segment() fails and returns
> > + *  -EINVAL.
> > + * @param nb_pkts_out
> > + *  The max number of items that 'pkts_out' can keep.
> > + *
> > + * @return
> > + *   - The number of GSO segments filled in pkts_out on success.
> > + *   - Return -ENOMEM if run out of memory in MBUF pools.
> > + *   - Return -EINVAL for invalid parameters.
> > + */
> > +int gso_tcp4_segment(struct rte_mbuf *pkt,
> > +		uint16_t gso_size,
> > +		uint8_t ip_delta,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out);
> > +
> > +#endif
> > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > index dda50ee..95f6ea6 100644
> > --- a/lib/librte_gso/rte_gso.c
> > +++ b/lib/librte_gso/rte_gso.c
> > @@ -33,18 +33,53 @@
> >
> >  #include <errno.h>
> >
> > +#include <rte_log.h>
> > +
> >  #include "rte_gso.h"
> > +#include "gso_common.h"
> > +#include "gso_tcp4.h"
> >
> >  int
> >  rte_gso_segment(struct rte_mbuf *pkt,
> > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > +		struct rte_gso_ctx gso_ctx,
> >  		struct rte_mbuf **pkts_out,
> >  		uint16_t nb_pkts_out)
> >  {
> > +	struct rte_mempool *direct_pool, *indirect_pool;
> > +	struct rte_mbuf *pkt_seg;
> > +	uint16_t gso_size;
> > +	uint8_t ipid_delta;
> > +	int ret = 1;
> > +
> >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> >  		return -EINVAL;
> >
> > -	pkts_out[0] = pkt;
> > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > +			pkt->packet_type) {
> > +		pkts_out[0] = pkt;
> > +		return ret;
> > +	}
> > +
> > +	direct_pool = gso_ctx.direct_pool;
> > +	indirect_pool = gso_ctx.indirect_pool;
> > +	gso_size = gso_ctx.gso_size;
> > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > +
> > +	if (is_ipv4_tcp(pkt->packet_type)) {
> 
> Probably we need here:
> if (is_ipv4_tcp(pkt->packet_type) && (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...

Sorry, actually it probably should be:
if ((pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4)) == PKT_TX_IPV4 &&
      (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
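
A note on this kind of condition, as a minimal standalone sketch rather than the patch's code. In C, `==` binds more tightly than `&`, so a masked flag comparison needs the masked value in its own parentheses. The sketch assumes the intent is "both the TCP-segmentation flag and the IPv4 flag are set", which the thread does not confirm; `EX_TX_TCP_SEG`, `EX_TX_IPV4` and `ex_wants_tcp4_gso()` are hypothetical stand-ins with placeholder bit positions, not the real rte_mbuf definitions.

```c
#include <stdint.h>

/* Placeholder offload flag bits (assumed positions, illustration only). */
#define EX_TX_TCP_SEG (1ULL << 50)
#define EX_TX_IPV4    (1ULL << 55)

/*
 * Return 1 only if both flags are set in ol_flags.
 * The mask is applied first, then compared; without the inner
 * parentheses the comparison would be evaluated before the '&'.
 */
static inline int
ex_wants_tcp4_gso(uint64_t ol_flags)
{
	const uint64_t need = EX_TX_TCP_SEG | EX_TX_IPV4;

	return (ol_flags & need) == need;
}
```

Keying the decision on the TX offload flags rather than on packet_type lets the application request segmentation explicitly per packet.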

Konstantin

> 
> > +		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
> > +				direct_pool, indirect_pool,
> > +				pkts_out, nb_pkts_out);
> > +	} else
> > +		RTE_LOG(WARNING, GSO, "Unsupported packet type\n");
> 
> Shouldn't we do pkts_out[0] = pkt; here?
> 
> > +
> > +	if (ret > 1) {
> > +		pkt_seg = pkt;
> > +		while (pkt_seg) {
> > +			rte_mbuf_refcnt_update(pkt_seg, -1);
> > +			pkt_seg = pkt_seg->next;
> > +		}
> > +	}
> >
> > -	return 1;
> > +	return ret;
> >  }
> > --
> > 2.7.4
  
Hu, Jiayu Sept. 13, 2017, 2:48 a.m. UTC | #3
Hi Konstantin,

On Tue, Sep 12, 2017 at 07:17:49PM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Tuesday, September 12, 2017 3:43 AM
> > To: dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> > Subject: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > 
> > This patch adds GSO support for TCP/IPv4 packets. Supported packets
> > may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
> > packets have correct checksums, and doesn't update checksums for output
> > packets (the responsibility for this lies with the application).
> 
> Probably it shouldn't say that the checksums have to be valid, right?
> As you don't update the checksum(s) inside the lib, it probably doesn't matter.

Yes, you are right. It's better to use:
"TCP/IPv4 GSO doesn't check if checksums are correct and doesn't update
checksums for output packets".

> 
> > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
> > 
> > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
> > MBUF, to organize an output packet. Note that we refer to these two
> > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
> > header, while the indirect mbuf simply points to a location within the
> > original packet's payload. Consequently, use of the GSO library requires
> > multi-segment MBUF support in the TX functions of the NIC driver.
> > 
> > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
> > result, when all of its GSOed segments are freed, the packet is freed
> > automatically.
> > 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> > ---
> >  lib/librte_eal/common/include/rte_log.h |   1 +
> >  lib/librte_gso/Makefile                 |   2 +
> >  lib/librte_gso/gso_common.c             | 202 ++++++++++++++++++++++++++++++++
> >  lib/librte_gso/gso_common.h             | 113 ++++++++++++++++++
> >  lib/librte_gso/gso_tcp4.c               |  83 +++++++++++++
> >  lib/librte_gso/gso_tcp4.h               |  76 ++++++++++++
> >  lib/librte_gso/rte_gso.c                |  41 ++++++-
> >  7 files changed, 515 insertions(+), 3 deletions(-)
> >  create mode 100644 lib/librte_gso/gso_common.c
> >  create mode 100644 lib/librte_gso/gso_common.h
> >  create mode 100644 lib/librte_gso/gso_tcp4.c
> >  create mode 100644 lib/librte_gso/gso_tcp4.h
> > 
> > diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
> > index ec8dba7..2fa1199 100644
> > --- a/lib/librte_eal/common/include/rte_log.h
> > +++ b/lib/librte_eal/common/include/rte_log.h
> > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs;
> >  #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
> >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > +#define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > 
> >  /* these log types can be used in an application */
> >  #define RTE_LOGTYPE_USER1     24 /**< User-defined log type 1. */
> > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
> > index aeaacbc..2be64d1 100644
> > --- a/lib/librte_gso/Makefile
> > +++ b/lib/librte_gso/Makefile
> > @@ -42,6 +42,8 @@ LIBABIVER := 1
> > 
> >  #source files
> >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
> > 
> >  # install this header file
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
> > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
> > new file mode 100644
> > index 0000000..7c32e03
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_common.c
> > @@ -0,0 +1,202 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <stdbool.h>
> > +#include <errno.h>
> > +
> > +#include <rte_memcpy.h>
> > +#include <rte_mempool.h>
> > +#include <rte_ether.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "gso_common.h"
> > +
> > +static inline void
> > +hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
> > +		uint16_t pkt_hdr_offset)
> > +{
> > +	/* Copy MBUF metadata */
> > +	hdr_segment->nb_segs = 1;
> > +	hdr_segment->port = pkt->port;
> > +	hdr_segment->ol_flags = pkt->ol_flags;
> > +	hdr_segment->packet_type = pkt->packet_type;
> > +	hdr_segment->pkt_len = pkt_hdr_offset;
> > +	hdr_segment->data_len = pkt_hdr_offset;
> > +	hdr_segment->tx_offload = pkt->tx_offload;
> > +
> > +	/* Copy the packet header */
> > +	rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *),
> > +			rte_pktmbuf_mtod(pkt, char *),
> > +			pkt_hdr_offset);
> > +}
> > +
> > +static inline void
> > +free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts)
> > +{
> > +	uint16_t i;
> > +
> > +	for (i = 0; i < nb_pkts; i++)
> > +		rte_pktmbuf_free(pkts[i]);
> > +}
> > +
> > +int
> > +gso_do_segment(struct rte_mbuf *pkt,
> > +		uint16_t pkt_hdr_offset,
> > +		uint16_t pyld_unit_size,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out)
> > +{
> > +	struct rte_mbuf *pkt_in;
> > +	struct rte_mbuf *hdr_segment, *pyld_segment, *prev_segment;
> > +	uint16_t pkt_in_data_pos, segment_bytes_remaining;
> > +	uint16_t pyld_len, nb_segs;
> > +	bool more_in_pkt, more_out_segs;
> > +
> > +	pkt_in = pkt;
> > +	nb_segs = 0;
> > +	more_in_pkt = 1;
> > +	pkt_in_data_pos = pkt_hdr_offset;
> > +
> > +	while (more_in_pkt) {
> > +		if (unlikely(nb_segs >= nb_pkts_out)) {
> > +			free_gso_segment(pkts_out, nb_segs);
> > +			return -EINVAL;
> > +		}
> > +
> > +		/* Allocate a direct MBUF */
> > +		hdr_segment = rte_pktmbuf_alloc(direct_pool);
> > +		if (unlikely(hdr_segment == NULL)) {
> > +			free_gso_segment(pkts_out, nb_segs);
> > +			return -ENOMEM;
> > +		}
> > +		/* Fill the packet header */
> > +		hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset);
> > +
> > +		prev_segment = hdr_segment;
> > +		segment_bytes_remaining = pyld_unit_size;
> > +		more_out_segs = 1;
> > +
> > +		while (more_out_segs && more_in_pkt) {
> > +			/* Allocate an indirect MBUF */
> > +			pyld_segment = rte_pktmbuf_alloc(indirect_pool);
> > +			if (unlikely(pyld_segment == NULL)) {
> > +				rte_pktmbuf_free(hdr_segment);
> > +				free_gso_segment(pkts_out, nb_segs);
> > +				return -ENOMEM;
> > +			}
> > +			/* Attach to current MBUF segment of pkt */
> > +			rte_pktmbuf_attach(pyld_segment, pkt_in);
> > +
> > +			prev_segment->next = pyld_segment;
> > +			prev_segment = pyld_segment;
> > +
> > +			pyld_len = segment_bytes_remaining;
> > +			if (pyld_len + pkt_in_data_pos > pkt_in->data_len)
> > +				pyld_len = pkt_in->data_len - pkt_in_data_pos;
> > +
> > +			pyld_segment->data_off = pkt_in_data_pos +
> > +				pkt_in->data_off;
> > +			pyld_segment->data_len = pyld_len;
> > +
> > +			/* Update header segment */
> > +			hdr_segment->pkt_len += pyld_len;
> > +			hdr_segment->nb_segs++;
> > +
> > +			pkt_in_data_pos += pyld_len;
> > +			segment_bytes_remaining -= pyld_len;
> > +
> > +			/* Finish processing a MBUF segment of pkt */
> > +			if (pkt_in_data_pos == pkt_in->data_len) {
> > +				pkt_in = pkt_in->next;
> > +				pkt_in_data_pos = 0;
> > +				if (pkt_in == NULL)
> > +					more_in_pkt = 0;
> > +			}
> > +
> > +			/* Finish generating a GSO segment */
> > +			if (segment_bytes_remaining == 0)
> > +				more_out_segs = 0;
> > +		}
> > +		pkts_out[nb_segs++] = hdr_segment;
> > +	}
> > +	return nb_segs;
> > +}
> > +
> > +static inline void
> > +update_inner_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
> > +		struct rte_mbuf **segs, uint16_t nb_segs)
> > +{
> > +	struct tcp_hdr *tcp_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct rte_mbuf *seg;
> > +	uint32_t sent_seq;
> > +	uint16_t inner_l2_offset;
> > +	uint16_t id, i;
> > +
> > +	inner_l2_offset = pkt->outer_l2_len + pkt->outer_l3_len + pkt->l2_len;
> 
> Shouldn't it be: pkt->l2_len here?
> Or probably even better to pass l2_len as an input parameter.

Oh, yes. Applications don't guarantee that outer_l2_len and outer_l3_len
are 0 for non-tunnelling packets. I will add l2_len as a parameter instead.
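
For illustration only, a rough sketch of why the explicit parameter helps. `struct ex_mbuf_lens` is a hypothetical mock of the relevant mbuf length fields, and both helpers are made-up names; the first mirrors the offset computation in this version of the patch, the second the parameterised form discussed here.

```c
#include <stdint.h>

/* Mock of the mbuf header-length fields (not the real rte_mbuf). */
struct ex_mbuf_lens {
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Offset of the IPv4 header as computed in this version of the patch. */
static inline uint16_t
ex_ip_offset_from_mbuf(const struct ex_mbuf_lens *m)
{
	return (uint16_t)(m->outer_l2_len + m->outer_l3_len + m->l2_len);
}

/*
 * Offset of the IPv4 header when the caller passes the L2 length
 * explicitly; stale outer_* fields can no longer affect the result.
 */
static inline uint16_t
ex_ip_offset_from_param(uint16_t l2_len)
{
	return l2_len;
}
```

If an application leaves outer_l2_len = 14 and outer_l3_len = 20 behind on a plain (non-tunnelled) packet with l2_len = 14, the first form yields 48 while the second yields the intended 14.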

> 
> > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> > +			inner_l2_offset);
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> > +	id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	for (i = 0; i < nb_segs; i++) {
> > +		seg = segs[i];
> > +		/* Update the inner IPv4 header */
> > +		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(seg, char *) +
> > +				inner_l2_offset);
> > +		ipv4_hdr->total_length = rte_cpu_to_be_16(seg->pkt_len -
> > +				inner_l2_offset);
> > +		ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
> > +		id += ipid_delta;
> > +
> > +		/* Update the inner TCP header */
> > +		tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + seg->l3_len);
> > +		tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq);
> > +		if (likely(i < nb_segs - 1))
> > +			tcp_hdr->tcp_flags &= (~(TCP_HDR_PSH_MASK |
> > +						TCP_HDR_FIN_MASK));
> > +		sent_seq += (seg->pkt_len - seg->data_len);
> > +	}
> > +}
> > +
> > +void
> > +gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
> > +		struct rte_mbuf **segs, uint16_t nb_segs)
> > +{
> > +	if (is_ipv4_tcp(pkt->packet_type))
> > +		update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
> > +}
> > diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
> > new file mode 100644
> > index 0000000..3c76520
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_common.h
> > @@ -0,0 +1,113 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _GSO_COMMON_H_
> > +#define _GSO_COMMON_H_
> > +
> > +#include <stdint.h>
> > +#include <rte_mbuf.h>
> > +
> > +#define IPV4_HDR_DF_SHIFT 14
> 
> We have that already defined in librte_net/rte_ip.h

Yes. I will remove it here.

> 
> > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > +
> > +#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
> > +#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
> > +
> > +#define ETHER_TCP_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L4_TCP)
> > +#define ETHER_VLAN_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | RTE_PTYPE_L4_TCP)
> > +static inline uint8_t is_ipv4_tcp(uint32_t ptype)
> > +{
> > +	switch (ptype & (~RTE_PTYPE_L3_MASK)) {
> > +	case ETHER_VLAN_TCP_PKT:
> > +	case ETHER_TCP_PKT:
> 
> Why not just:
> return RTE_ETH_IS_IPV4_HDR(ptype) && (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP;
> ?

Yes, we don't need to check if the packet is vlan encapsulated.
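
A minimal sketch of the simplified check suggested above. The EX_-prefixed constants are hypothetical stand-ins for the RTE_PTYPE_* masks, defined here only so the snippet compiles without DPDK headers; the library itself would use the real definitions from rte_mbuf_ptype.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for RTE_PTYPE_* values (assumptions for this sketch). */
#define EX_PTYPE_L2_ETHER      0x00000001u
#define EX_PTYPE_L2_ETHER_VLAN 0x00000006u
#define EX_PTYPE_L3_IPV4       0x00000010u
#define EX_PTYPE_L3_IPV6       0x00000040u
#define EX_PTYPE_L4_TCP        0x00000100u
#define EX_PTYPE_L4_UDP        0x00000200u
#define EX_PTYPE_L4_MASK       0x00000f00u

/* Stand-in for RTE_ETH_IS_IPV4_HDR(): non-zero if the IPv4 bit is set. */
#define EX_IS_IPV4_HDR(ptype) ((ptype) & EX_PTYPE_L3_IPV4)

/*
 * Return 1 for any IPv4 packet whose L4 type is TCP, regardless of the
 * L2 type (plain Ethernet or VLAN-tagged), and 0 otherwise.
 */
static inline int
ex_is_ipv4_tcp(uint32_t ptype)
{
	return EX_IS_IPV4_HDR(ptype) &&
		(ptype & EX_PTYPE_L4_MASK) == EX_PTYPE_L4_TCP;
}
```

Because the L2 bits are never inspected, the VLAN and non-VLAN cases fall out of the same expression and the switch statement is no longer needed.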

> 
> > +		return RTE_ETH_IS_IPV4_HDR(ptype);
> > +	default:
> > +		return 0;
> > +	}
> > +}
> > +
> > +/**
> > + * Internal function which updates relevant packet headers, following
> > + * segmentation. This is required to update, for example, the IPv4
> > + * 'total_length' field, to reflect the reduced length of the now-
> > + * segmented packet.
> > + *
> > + * @param pkt
> > + *  The original packet.
> > + * @param ipid_delta
> > + *  The increasing uint of IP ids.
> > + * @param segs
> > + *  Pointer array used for storing mbuf addresses for GSO segments.
> > + * @param nb_segs
> > + *  The number of GSO segments placed in segs.
> > + */
> > +void gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
> > +		struct rte_mbuf **segs, uint16_t nb_segs);
> > +
> > +/**
> > + * Internal function which divides the input packet into small segments.
> > + * Each of the newly-created segments is organized as a two-segment MBUF,
> > + * where the first segment is a standard mbuf, which stores a copy of
> > + * packet header, and the second is an indirect mbuf which points to a
> > + * section of data in the input packet.
> > + *
> > + * @param pkt
> > + *  Packet to segment.
> > + * @param pkt_hdr_offset
> > + *  Packet header offset, measured in bytes.
> > + * @param pyld_unit_size
> > + *  The max payload length of a GSO segment.
> > + * @param direct_pool
> > + *  MBUF pool used for allocating direct buffers for output segments.
> > + * @param indirect_pool
> > + *  MBUF pool used for allocating indirect buffers for output segments.
> > + * @param pkts_out
> > + *  Pointer array used to keep the mbuf addresses of output segments. If
> > + *  the memory space in pkts_out is insufficient, gso_do_segment() fails
> > + *  and returns -EINVAL.
> > + * @param nb_pkts_out
> > + *  The max number of items that pkts_out can keep.
> > + *
> > + * @return
> > + *  - The number of segments created in the event of success.
> > + *  - Return -ENOMEM if run out of memory in MBUF pools.
> > + *  - Return -EINVAL for invalid parameters.
> > + */
> > +int gso_do_segment(struct rte_mbuf *pkt,
> > +		uint16_t pkt_hdr_offset,
> > +		uint16_t pyld_unit_size,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out);
> > +#endif
> > diff --git a/lib/librte_gso/gso_tcp4.c b/lib/librte_gso/gso_tcp4.c
> > new file mode 100644
> > index 0000000..8d4bfb2
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_tcp4.c
> > @@ -0,0 +1,83 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +
> > +#include <rte_ether.h>
> > +#include <rte_ip.h>
> > +
> > +#include "gso_common.h"
> > +#include "gso_tcp4.h"
> > +
> > +int
> > +gso_tcp4_segment(struct rte_mbuf *pkt,
> > +		uint16_t gso_size,
> > +		uint8_t ipid_delta,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	uint16_t tcp_dl;
> > +	uint16_t pyld_unit_size;
> > +	uint16_t hdr_offset;
> > +	int ret = 1;
> > +
> > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> > +			pkt->l2_len);
> > +	/* Don't process the fragmented packet */
> > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> > +						IPV4_HDR_DF_MASK)) == 0)) {
> 
> 
> It is not a check for fragmented packet - it is a check that fragmentation is allowed for that packet.
> Should be IPV4_HDR_DF_MASK - 1,  I think.

IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. It's a
little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
So the value of DF bit should be "ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.

> 
> > +		pkts_out[0] = pkt;
> > +		return ret;
> > +	}
> > +
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> > +		pkt->l4_len;
> 
> Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?

Yes, we can use pkt->pkt_len - pkt->l2_len - pkt->l3_len - pkt->l4_len here.

> 
> > +	/* Don't process the packet without data */
> > +	if (unlikely(tcp_dl == 0)) {
> > +		pkts_out[0] = pkt;
> > +		return ret;
> > +	}
> > +
> > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> 
> Hmm, why do we need to count CRC_LEN here?

Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
included in gso_size.

> 
> > +
> > +	/* Segment the payload */
> > +	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
> > +			indirect_pool, pkts_out, nb_pkts_out);
> > +	if (ret > 1)
> > +		gso_update_pkt_headers(pkt, ipid_delta, pkts_out, ret);
> > +
> > +	return ret;
> > +}
> > diff --git a/lib/librte_gso/gso_tcp4.h b/lib/librte_gso/gso_tcp4.h
> > new file mode 100644
> > index 0000000..9c07984
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_tcp4.h
> > @@ -0,0 +1,76 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _GSO_TCP4_H_
> > +#define _GSO_TCP4_H_
> > +
> > +#include <stdint.h>
> > +#include <rte_mbuf.h>
> > +
> > +/**
> > + * Segment an IPv4/TCP packet. This function assumes the input packet has
> > + * correct checksums and doesn't update checksums for GSO segment.
> > + * Furthermore, it doesn't process IP fragment packets.
> > + *
> > + * @param pkt
> > + *  The packet mbuf to segment.
> > + * @param gso_size
> > + *  The max length of a GSO segment, measured in bytes.
> > + * @param ipid_delta
> > + *  The increasing uint of IP ids.
> > + * @param direct_pool
> > + *  MBUF pool used for allocating direct buffers for output segments.
> > + * @param indirect_pool
> > + *  MBUF pool used for allocating indirect buffers for output segments.
> > + * @param pkts_out
> > + *  Pointer array used to store the MBUF addresses of output GSO
> > + *  segments, when gso_tcp4_segment() successes. If the memory space in
> > + *  pkts_out is insufficient, gso_tcp4_segment() fails and returns
> > + *  -EINVAL.
> > + * @param nb_pkts_out
> > + *  The max number of items that 'pkts_out' can keep.
> > + *
> > + * @return
> > + *   - The number of GSO segments filled in pkts_out on success.
> > + *   - Return -ENOMEM if run out of memory in MBUF pools.
> > + *   - Return -EINVAL for invalid parameters.
> > + */
> > +int gso_tcp4_segment(struct rte_mbuf *pkt,
> > +		uint16_t gso_size,
> > +		uint8_t ip_delta,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out);
> > +
> > +#endif
> > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > index dda50ee..95f6ea6 100644
> > --- a/lib/librte_gso/rte_gso.c
> > +++ b/lib/librte_gso/rte_gso.c
> > @@ -33,18 +33,53 @@
> > 
> >  #include <errno.h>
> > 
> > +#include <rte_log.h>
> > +
> >  #include "rte_gso.h"
> > +#include "gso_common.h"
> > +#include "gso_tcp4.h"
> > 
> >  int
> >  rte_gso_segment(struct rte_mbuf *pkt,
> > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > +		struct rte_gso_ctx gso_ctx,
> >  		struct rte_mbuf **pkts_out,
> >  		uint16_t nb_pkts_out)
> >  {
> > +	struct rte_mempool *direct_pool, *indirect_pool;
> > +	struct rte_mbuf *pkt_seg;
> > +	uint16_t gso_size;
> > +	uint8_t ipid_delta;
> > +	int ret = 1;
> > +
> >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> >  		return -EINVAL;
> > 
> > -	pkts_out[0] = pkt;
> > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > +			pkt->packet_type) {
> > +		pkts_out[0] = pkt;
> > +		return ret;
> > +	}
> > +
> > +	direct_pool = gso_ctx.direct_pool;
> > +	indirect_pool = gso_ctx.indirect_pool;
> > +	gso_size = gso_ctx.gso_size;
> > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > +
> > +	if (is_ipv4_tcp(pkt->packet_type)) {
> 
> Probably we need here:
> If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> 
> > +		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
> > +				direct_pool, indirect_pool,
> > +				pkts_out, nb_pkts_out);
> > +	} else
> > +		RTE_LOG(WARNING, GSO, "Unsupported packet type\n");
> 
> Shouldn't we do pkt_out[0] = pkt; here?

Yes, we need to add it here. Thanks for the reminder.

> 
> > +
> > +	if (ret > 1) {
> > +		pkt_seg = pkt;
> > +		while (pkt_seg) {
> > +			rte_mbuf_refcnt_update(pkt_seg, -1);
> > +			pkt_seg = pkt_seg->next;
> > +		}
> > +	}
> > 
> > -	return 1;
> > +	return ret;
> >  }
> > --
> > 2.7.4
  
Ananyev, Konstantin Sept. 13, 2017, 9:38 a.m. UTC | #4
> > > +
> > > +int
> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> > > +		uint16_t gso_size,
> > > +		uint8_t ipid_delta,
> > > +		struct rte_mempool *direct_pool,
> > > +		struct rte_mempool *indirect_pool,
> > > +		struct rte_mbuf **pkts_out,
> > > +		uint16_t nb_pkts_out)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t tcp_dl;
> > > +	uint16_t pyld_unit_size;
> > > +	uint16_t hdr_offset;
> > > +	int ret = 1;
> > > +
> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> > > +			pkt->l2_len);
> > > +	/* Don't process the fragmented packet */
> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> > > +						IPV4_HDR_DF_MASK)) == 0)) {
> >
> >
> > It is not a check for fragmented packet - it is a check that fragmentation is allowed for that packet.
> > Should be IPV4_HDR_DF_MASK - 1,  I think.

The DF bit doesn't indicate whether a packet is fragmented or not.
It forbids fragmenting the packet any further.
To check whether a packet is already fragmented, you have to check the MF bit and frag_offset.
Both have to be zero for unfragmented packets.
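
A minimal sketch of the fragment test described above, assuming the 16-bit flags/offset word is read in network byte order straight from the IPv4 header. The EX_-prefixed names are hypothetical and defined locally so the snippet is self-contained; librte_net/rte_ip.h is believed to provide equivalent masks. Note that (1 << 14) - 1 equals 0x3fff, which is exactly the MF bit plus the 13-bit offset, so this matches the "IPV4_HDR_DF_MASK - 1" suggestion.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohs() */

/* Host-order masks for the IPv4 flags/fragment-offset word (assumed names). */
#define EX_IPV4_MF_FLAG     0x2000u	/* More Fragments bit */
#define EX_IPV4_OFFSET_MASK 0x1fffu	/* 13-bit fragment offset */

/*
 * Return 1 if the packet is an IP fragment, i.e. MF is set or the
 * fragment offset is non-zero. The DF bit is deliberately ignored.
 */
static inline int
ex_ipv4_is_fragment(uint16_t fragment_offset_be)
{
	uint16_t v = ntohs(fragment_offset_be);

	return (v & (EX_IPV4_MF_FLAG | EX_IPV4_OFFSET_MASK)) != 0;
}
```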

> 
> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. It's a
> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
> So the value of DF bit should be "ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
> 
> >
> > > +		pkts_out[0] = pkt;
> > > +		return ret;
> > > +	}
> > > +
> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> > > +		pkt->l4_len;
> >
> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> 
> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len here.
> 
> >
> > > +	/* Don't process the packet without data */
> > > +	if (unlikely(tcp_dl == 0)) {
> > > +		pkts_out[0] = pkt;
> > > +		return ret;
> > > +	}
> > > +
> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> >
> > Hmm, why do we need to count CRC_LEN here?
> 
> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> included in gso_size.

Why?
What is the point of accounting for the CRC length in this computation?
Why not just assume that gso_size is already max_frame_size - crc_len?
As I remember, when we RX a packet the CRC bytes will already be stripped,
and when the user populates the packet, he doesn't care about the CRC bytes either.

Konstantin
  
Hu, Jiayu Sept. 13, 2017, 10:23 a.m. UTC | #5
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, September 13, 2017 5:38 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> 
> 
> > > > +
> > > > +int
> > > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> > > > +		uint16_t gso_size,
> > > > +		uint8_t ipid_delta,
> > > > +		struct rte_mempool *direct_pool,
> > > > +		struct rte_mempool *indirect_pool,
> > > > +		struct rte_mbuf **pkts_out,
> > > > +		uint16_t nb_pkts_out)
> > > > +{
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	uint16_t tcp_dl;
> > > > +	uint16_t pyld_unit_size;
> > > > +	uint16_t hdr_offset;
> > > > +	int ret = 1;
> > > > +
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> > > > +			pkt->l2_len);
> > > > +	/* Don't process the fragmented packet */
> > > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> > > > +						IPV4_HDR_DF_MASK)) == 0))
> {
> > >
> > >
> > > > It is not a check for fragmented packet - it is a check that fragmentation is allowed for that packet.
> > > Should be IPV4_HDR_DF_MASK - 1,  I think.
> 
> DF bit doesn't indicate is packet fragmented or not.
> It forbids to fragment packet any further.
> To check is packet already fragmented or not, you have to check MF bit and
> frag_offset.
> Both have to be zero for un-fragmented packets.

Yes, you are right. I checked the RFC and I had misunderstood the meaning of the DF bit.
When the DF bit is set to 1, the packet isn't IP fragmented. When the DF bit is 0, the packet
may or may not be fragmented. So it can't indicate whether the packet is an IP fragment.
Only when both MF and the offset are 0 is the packet not fragmented.

> 
> >
> > IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. It's a
> > little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
> > So the value of DF bit should be "ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> > IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
> >
> > >
> > > > +		pkts_out[0] = pkt;
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> > > > +		pkt->l4_len;
> > >
> > > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> >
> > Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len here.
> >
> > >
> > > > +	/* Don't process the packet without data */
> > > > +	if (unlikely(tcp_dl == 0)) {
> > > > +		pkts_out[0] = pkt;
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> > > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> > >
> > > Hmm, why do we need to count CRC_LEN here?
> >
> > Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> > included in gso_size.
> 
> Why?
> What is the point to account crc len into this computation?
> Why not just assume that gso_size is already a max_frame_size - crc_len
> As I remember, when we RX packet crc bytes will be already stripped,
> when user populates the packet, he doesn't care about crc bytes too.

Sorry, maybe I didn't make it clear. I don't mean that applications must count
the CRC when setting gso_segsz. Whether to count the CRC in gso_segsz depends on
the specific scenario, IMO. The GSO library shouldn't be aware of the CRC, and should
just use gso_segsz to split packets.
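
To make the arithmetic under discussion concrete, here is a minimal sketch that assumes the library subtracts only the header length from the configured segment size and leaves the CRC entirely to the caller. All ex_-prefixed helpers are hypothetical, and the sketch assumes hdr_len is smaller than gso_size.

```c
#include <stdint.h>

#define EX_ETHER_CRC_LEN 4u	/* Ethernet FCS length in bytes */

/* Payload bytes carried by each output segment (CRC not considered). */
static inline uint16_t
ex_pyld_unit_size(uint16_t gso_size, uint16_t hdr_len)
{
	return (uint16_t)(gso_size - hdr_len);
}

/* Number of output segments needed for payload_len bytes (rounds up). */
static inline uint32_t
ex_nb_segs(uint32_t payload_len, uint16_t unit)
{
	return (payload_len + unit - 1) / unit;
}

/* Frame length on the wire once the NIC appends the CRC. */
static inline uint32_t
ex_wire_len(uint16_t gso_size)
{
	return (uint32_t)gso_size + EX_ETHER_CRC_LEN;
}
```

With Ethernet (14B), IPv4 (20B) and TCP (20B) headers and a segment size of 1514, each segment carries 1460 payload bytes and is 1518 bytes on the wire. A segment size of 1518 would give 1522 bytes on the wire, which is the case the CRC subtraction was meant to avoid.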

Thanks,
Jiayu
> 
> Konstantin
  
Hu, Jiayu Sept. 13, 2017, 10:44 a.m. UTC | #6
Hi Konstantin,

On Tue, Sep 12, 2017 at 10:17:27PM +0800, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, September 12, 2017 12:18 PM
> > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > 
> > > result, when all of its GSOed segments are freed, the packet is freed
> > > automatically.
> > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > index dda50ee..95f6ea6 100644
> > > --- a/lib/librte_gso/rte_gso.c
> > > +++ b/lib/librte_gso/rte_gso.c
> > > @@ -33,18 +33,53 @@
> > >
> > >  #include <errno.h>
> > >
> > > +#include <rte_log.h>
> > > +
> > >  #include "rte_gso.h"
> > > +#include "gso_common.h"
> > > +#include "gso_tcp4.h"
> > >
> > >  int
> > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > +		struct rte_gso_ctx gso_ctx,
> > >  		struct rte_mbuf **pkts_out,
> > >  		uint16_t nb_pkts_out)
> > >  {
> > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > +	struct rte_mbuf *pkt_seg;
> > > +	uint16_t gso_size;
> > > +	uint8_t ipid_delta;
> > > +	int ret = 1;
> > > +
> > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > >  		return -EINVAL;
> > >
> > > -	pkts_out[0] = pkt;
> > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > +			pkt->packet_type) {
> > > +		pkts_out[0] = pkt;
> > > +		return ret;
> > > +	}
> > > +
> > > +	direct_pool = gso_ctx.direct_pool;
> > > +	indirect_pool = gso_ctx.indirect_pool;
> > > +	gso_size = gso_ctx.gso_size;
> > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > +
> > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > 
> > Probably we need here:
> > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> 
> Sorry, actually it probably should be:
> If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4 &&
>       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...

I don't quite understand why the GSO library should be aware of whether the TSO
flag is set or not. Applications can query device TSO capability before
they call the GSO library. Do I misunderstand anything?

Additionally, don't we need to check whether the packet is a TCP/IPv4 packet here?

Thanks,
Jiayu
> 
> Konstantin
> 
> > 
> > > +		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
> > > +				direct_pool, indirect_pool,
> > > +				pkts_out, nb_pkts_out);
> > > +	} else
> > > +		RTE_LOG(WARNING, GSO, "Unsupported packet type\n");
> > 
> > Shouldn't we do pkt_out[0] = pkt; here?
> > 
> > > +
> > > +	if (ret > 1) {
> > > +		pkt_seg = pkt;
> > > +		while (pkt_seg) {
> > > +			rte_mbuf_refcnt_update(pkt_seg, -1);
> > > +			pkt_seg = pkt_seg->next;
> > > +		}
> > > +	}
> > >
> > > -	return 1;
> > > +	return ret;
> > >  }
> > > --
> > > 2.7.4
  
Mark Kavanagh Sept. 13, 2017, 2:52 p.m. UTC | #7
>From: Ananyev, Konstantin
>Sent: Wednesday, September 13, 2017 10:38 AM
>To: Hu, Jiayu <jiayu.hu@intel.com>
>Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
><jianfeng.tan@intel.com>
>Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>
>
>> > > +
>> > > +int
>> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
>> > > +		uint16_t gso_size,
>> > > +		uint8_t ipid_delta,
>> > > +		struct rte_mempool *direct_pool,
>> > > +		struct rte_mempool *indirect_pool,
>> > > +		struct rte_mbuf **pkts_out,
>> > > +		uint16_t nb_pkts_out)
>> > > +{
>> > > +	struct ipv4_hdr *ipv4_hdr;
>> > > +	uint16_t tcp_dl;
>> > > +	uint16_t pyld_unit_size;
>> > > +	uint16_t hdr_offset;
>> > > +	int ret = 1;
>> > > +
>> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
>> > > +			pkt->l2_len);
>> > > +	/* Don't process the fragmented packet */
>> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
>> > > +						IPV4_HDR_DF_MASK)) == 0)) {
>> >
>> >
>> > It is not a check for fragmented packet - it is a check that fragmentation is allowed for that packet.
>> > Should be IPV4_HDR_DF_MASK - 1,  I think.
>
>DF bit doesn't indicate is packet fragmented or not.
>It forbids to fragment packet any further.
>To check is packet already fragmented or not, you have to check MF bit and
>frag_offset.
>Both have to be zero for un-fragmented packets.
>
>>
>> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. It's a
>> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
>> So the value of DF bit should be "ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
>> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
>>
>> >
>> > > +		pkts_out[0] = pkt;
>> > > +		return ret;
>> > > +	}
>> > > +
>> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
>> > > +		pkt->l4_len;
>> >
>> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
>>
>> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len here.
>>
>> >
>> > > +	/* Don't process the packet without data */
>> > > +	if (unlikely(tcp_dl == 0)) {
>> > > +		pkts_out[0] = pkt;
>> > > +		return ret;
>> > > +	}
>> > > +
>> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
>> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
>> >
>> > Hmm, why do we need to count CRC_LEN here?
>>
>> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
>> included in gso_size.
>
>Why?
>What is the point to account crc len into this computation?
>Why not just assume that gso_size is already a max_frame_size - crc_len
>As I remember, when we RX packet crc bytes will be already stripped,
>when user populates the packet, he doesn't care about crc bytes too.

Hi Konstantin,

When a packet is tx'd, the 4B for the CRC are added back into the packet; if the payload is already at max capacity, then the actual segment size will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
To prevent that from happening, we account for the CRC len in this calculation.

If I've missed anything, please do let me know!

Thanks,
Mark 

>
>Konstantin
  
Ananyev, Konstantin Sept. 13, 2017, 3:13 p.m. UTC | #8
Hi Mark,

> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Wednesday, September 13, 2017 3:52 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> >From: Ananyev, Konstantin
> >Sent: Wednesday, September 13, 2017 10:38 AM
> >To: Hu, Jiayu <jiayu.hu@intel.com>
> >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> ><jianfeng.tan@intel.com>
> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >
> >
> >> > > +
> >> > > +int
> >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> >> > > +		uint16_t gso_size,
> >> > > +		uint8_t ipid_delta,
> >> > > +		struct rte_mempool *direct_pool,
> >> > > +		struct rte_mempool *indirect_pool,
> >> > > +		struct rte_mbuf **pkts_out,
> >> > > +		uint16_t nb_pkts_out)
> >> > > +{
> >> > > +	struct ipv4_hdr *ipv4_hdr;
> >> > > +	uint16_t tcp_dl;
> >> > > +	uint16_t pyld_unit_size;
> >> > > +	uint16_t hdr_offset;
> >> > > +	int ret = 1;
> >> > > +
> >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> >> > > +			pkt->l2_len);
> >> > > +	/* Don't process the fragmented packet */
> >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> >> > > +						IPV4_HDR_DF_MASK)) == 0)) {
> >> >
> >> >
> >> > It is not a check for fragmented packet - it is a check that fragmentation is allowed for that packet.
> >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
> >
> >DF bit doesn't indicate is packet fragmented or not.
> >It forbids to fragment packet any further.
> >To check is packet already fragmented or not, you have to check MF bit and
> >frag_offset.
> >Both have to be zero for un-fragmented packets.
> >
> >>
> >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. It's a
> >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
> >> So the value of DF bit should be "ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
> >>
> >> >
> >> > > +		pkts_out[0] = pkt;
> >> > > +		return ret;
> >> > > +	}
> >> > > +
> >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> >> > > +		pkt->l4_len;
> >> >
> >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> >>
> >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len here.
> >>
> >> >
> >> > > +	/* Don't process the packet without data */
> >> > > +	if (unlikely(tcp_dl == 0)) {
> >> > > +		pkts_out[0] = pkt;
> >> > > +		return ret;
> >> > > +	}
> >> > > +
> >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> >> >
> >> > Hmm, why do we need to count CRC_LEN here?
> >>
> >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> >> included in gso_size.
> >
> >Why?
> >What is the point to account crc len into this computation?
> >Why not just assume that gso_size is already a max_frame_size - crc_len
> >As I remember, when we RX packet crc bytes will be already stripped,
> >when user populates the packet, he doesn't care about crc bytes too.
> 
> Hi Konstantin,
> 
> When packet is tx'd, the 4B for CRC are added back into the packet; if the payload is already at max capacity, then the actual segment size
> will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
> To prevent that from happening, we account for the CRC len in this calculation.


OK, and what prevents you from setting gso_ctx.gso_size = 1514;  /* Ethernet frame size without CRC bytes */ ?
Konstantin 

> 
> If I've missed anything, please do let me know!
> 
> Thanks,
> Mark
> 
> >
> >Konstantin
  
Ananyev, Konstantin Sept. 13, 2017, 10:10 p.m. UTC | #9
Hi Jiayu,

> >
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > > > result, when all of its GSOed segments are freed, the packet is freed
> > > > automatically.
> > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > > index dda50ee..95f6ea6 100644
> > > > --- a/lib/librte_gso/rte_gso.c
> > > > +++ b/lib/librte_gso/rte_gso.c
> > > > @@ -33,18 +33,53 @@
> > > >
> > > >  #include <errno.h>
> > > >
> > > > +#include <rte_log.h>
> > > > +
> > > >  #include "rte_gso.h"
> > > > +#include "gso_common.h"
> > > > +#include "gso_tcp4.h"
> > > >
> > > >  int
> > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > > +		struct rte_gso_ctx gso_ctx,
> > > >  		struct rte_mbuf **pkts_out,
> > > >  		uint16_t nb_pkts_out)
> > > >  {
> > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > > +	struct rte_mbuf *pkt_seg;
> > > > +	uint16_t gso_size;
> > > > +	uint8_t ipid_delta;
> > > > +	int ret = 1;
> > > > +
> > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > > >  		return -EINVAL;
> > > >
> > > > -	pkts_out[0] = pkt;
> > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > > +			pkt->packet_type) {
> > > > +		pkts_out[0] = pkt;
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	direct_pool = gso_ctx.direct_pool;
> > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > > +	gso_size = gso_ctx.gso_size;
> > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > > +
> > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > >
> > > Probably we need here:
> > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> >
> > Sorry, actually it probably should be:
> > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4 &&
> >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> 
> I don't quite understand why the GSO library should be aware if the TSO
> flag is set or not. Applications can query device TSO capability before
> they call the GSO library. Do I misundertsand anything?
> 
> Additionally, we don't need to check if the packet is a TCP/IPv4 packet here?

Well, right now the PMD doesn't rely on ptype to figure out what type of packet it has and
what TX offload has to be performed.
Instead it looks at the TX part of ol_flags.
My thought was that, as what we are doing is actually TSO in SW, it would be good
to use the same API here too.
Also, with that approach, by setting ol_flags properly the user can use the same gso_ctx and still
specify what segmentation to perform on a per-packet basis.

The alternative is to rely on ptype to decide whether segmentation should be performed on that packet or not.
The only advantage I see there is that if someone would like to add GSO for some new protocol,
they wouldn't need to introduce a new TX flag value for mbuf.ol_flags.
They would still need to update the TX_OFFLOAD_* capabilities and probably the packet_type definitions, though.

So from my perspective the first variant (use the HW TSO API) is preferable.
What are your and Mark's opinions here?
Konstantin
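
For illustration, the ol_flags-based selection described above could look roughly like the sketch below. It is not code from the patch: the SKETCH_* constants are stand-ins for the real PKT_TX_* and DEV_TX_OFFLOAD_* definitions, and sketch_wants_tcp4_gso() is a hypothetical helper. It also assumes the intent is that both PKT_TX_TCP_SEG and PKT_TX_IPV4 are set on the packet, whereas the condition quoted earlier compares against PKT_TX_IPV4 alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins only: the real PKT_TX_* and DEV_TX_OFFLOAD_*
 * values are defined by DPDK, not here. */
#define SKETCH_PKT_TX_TCP_SEG (1ULL << 0)
#define SKETCH_PKT_TX_IPV4 (1ULL << 1)
#define SKETCH_OFFLOAD_TCP_TSO (1ULL << 0)

/* Hypothetical helper: true when the packet requests TCP/IPv4 segmentation
 * through its TX flags and the GSO context has that type enabled. */
static bool
sketch_wants_tcp4_gso(uint64_t ol_flags, uint64_t gso_types)
{
	const uint64_t need = SKETCH_PKT_TX_TCP_SEG | SKETCH_PKT_TX_IPV4;

	/* The parentheses matter: in C, == binds tighter than &. */
	return (ol_flags & need) == need &&
		(gso_types & SKETCH_OFFLOAD_TCP_TSO) != 0;
}
```

With this shape, one gso_ctx can be shared by all packets, and a packet that does not carry the segmentation flag is simply passed through unsegmented.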
  
Hu, Jiayu Sept. 14, 2017, 12:59 a.m. UTC | #10
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, September 13, 2017 11:13 PM
> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
> <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Mark,
> 
> > -----Original Message-----
> > From: Kavanagh, Mark B
> > Sent: Wednesday, September 13, 2017 3:52 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
> <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > >From: Ananyev, Konstantin
> > >Sent: Wednesday, September 13, 2017 10:38 AM
> > >To: Hu, Jiayu <jiayu.hu@intel.com>
> > >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> Tan, Jianfeng
> > ><jianfeng.tan@intel.com>
> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > >
> > >
> > >> > > +
> > >> > > +int
> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> > >> > > +		uint16_t gso_size,
> > >> > > +		uint8_t ipid_delta,
> > >> > > +		struct rte_mempool *direct_pool,
> > >> > > +		struct rte_mempool *indirect_pool,
> > >> > > +		struct rte_mbuf **pkts_out,
> > >> > > +		uint16_t nb_pkts_out)
> > >> > > +{
> > >> > > +	struct ipv4_hdr *ipv4_hdr;
> > >> > > +	uint16_t tcp_dl;
> > >> > > +	uint16_t pyld_unit_size;
> > >> > > +	uint16_t hdr_offset;
> > >> > > +	int ret = 1;
> > >> > > +
> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
> +
> > >> > > +			pkt->l2_len);
> > >> > > +	/* Don't process the fragmented packet */
> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> > >> > > +
> 	IPV4_HDR_DF_MASK)) == 0)) {
> > >> >
> > >> >
> > >> > It is not a check for fragmented packet - it is a check that
> fragmentation
> > >is allowed for that packet.
> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
> > >
> > >DF bit doesn't indicate is packet fragmented or not.
> > >It forbids to fragment packet any further.
> > >To check is packet already fragmented or not, you have to check MF bit
> and
> > >frag_offset.
> > >Both have to be zero for un-fragmented packets.
> > >
> > >>
> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit.
> It's a
> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
> > >rte_cpu_to_be_16(
> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
> > >>
> > >> >
> > >> > > +		pkts_out[0] = pkt;
> > >> > > +		return ret;
> > >> > > +	}
> > >> > > +
> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
> >l3_len -
> > >> > > +		pkt->l4_len;
> > >> >
> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> > >>
> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len here.
> > >>
> > >> >
> > >> > > +	/* Don't process the packet without data */
> > >> > > +	if (unlikely(tcp_dl == 0)) {
> > >> > > +		pkts_out[0] = pkt;
> > >> > > +		return ret;
> > >> > > +	}
> > >> > > +
> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> > >> >
> > >> > Hmm, why do we need to count CRC_LEN here?
> > >>
> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> > >> included in gso_size.
> > >
> > >Why?
> > >What is the point to account crc len into this computation?
> > >Why not just assume that gso_size is already a max_frame_size - crc_len
> > >As I remember, when we RX packet crc bytes will be already stripped,
> > >when user populates the packet, he doesn't care about crc bytes too.
> >
> > Hi Konstantin,
> >
> > When packet is tx'd, the 4B for CRC are added back into the packet; if the
> payload is already at max capacity, then the actual segment size
> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
> > To prevent that from happening, we account for the CRC len in this
> calculation.
> 
> 
> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether frame
> size without crc bytes */
> ?

Exactly, applications can set gso_segsz to 1514 instead of 1518 if the lower layer
will add the CRC to the packet.

Jiayu

> Konstantin
> 
> >
> > If I've missed anything, please do let me know!
> >
> > Thanks,
> > Mark
> >
> > >
> > >Konstantin
  
Hu, Jiayu Sept. 14, 2017, 6:07 a.m. UTC | #11
Hi Konstantin,

On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> 
> Hi Jiayu,
> 
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >
> > > > > result, when all of its GSOed segments are freed, the packet is freed
> > > > > automatically.
> > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > > > index dda50ee..95f6ea6 100644
> > > > > --- a/lib/librte_gso/rte_gso.c
> > > > > +++ b/lib/librte_gso/rte_gso.c
> > > > > @@ -33,18 +33,53 @@
> > > > >
> > > > >  #include <errno.h>
> > > > >
> > > > > +#include <rte_log.h>
> > > > > +
> > > > >  #include "rte_gso.h"
> > > > > +#include "gso_common.h"
> > > > > +#include "gso_tcp4.h"
> > > > >
> > > > >  int
> > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > > > +		struct rte_gso_ctx gso_ctx,
> > > > >  		struct rte_mbuf **pkts_out,
> > > > >  		uint16_t nb_pkts_out)
> > > > >  {
> > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > > > +	struct rte_mbuf *pkt_seg;
> > > > > +	uint16_t gso_size;
> > > > > +	uint8_t ipid_delta;
> > > > > +	int ret = 1;
> > > > > +
> > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > > > >  		return -EINVAL;
> > > > >
> > > > > -	pkts_out[0] = pkt;
> > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > > > +			pkt->packet_type) {
> > > > > +		pkts_out[0] = pkt;
> > > > > +		return ret;
> > > > > +	}
> > > > > +
> > > > > +	direct_pool = gso_ctx.direct_pool;
> > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > > > +	gso_size = gso_ctx.gso_size;
> > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > > > +
> > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > > >
> > > > Probably we need here:
> > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > >
> > > Sorry, actually it probably should be:
> > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4 &&
> > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > 
> > I don't quite understand why the GSO library should be aware if the TSO
> > flag is set or not. Applications can query device TSO capability before
> > they call the GSO library. Do I misundertsand anything?
> > 
> > Additionally, we don't need to check if the packet is a TCP/IPv4 packet here?
> 
> Well, right now  PMD we doesn't rely on ptype to figure out what type of packet and
> what TX offload have to be performed.
> Instead it looks at TX part of ol_flags, and 
> My thought was that as what we doing is actually TSO in SW, it would be good
> to use the same API here too.
> Also with that approach, by setting ol_flags properly user can use the same gso_ctx and still
> specify what segmentation to perform on a per-packet basis.
> 
> Alternative way is to rely on ptype to distinguish should segmentation be performed on that package or not.
> The only advantage I see here is that if someone would like to add GSO for some new protocol,
> he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> Though he still would need to update TX_OFFLOAD_* capabilities and probably packet_type definitions.
>     
> So from my perspective first variant (use HW TSO API) is more plausible.
> Wonder what is your and Mark opinions here?

With the first choice, you mean:
the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a specific GSO
segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for each input packet.
Applications should parse the packet type, and set exactly the correct DEV_TX_OFFLOAD_*_TSO
flag in gso_types, and ol_flags, according to the packet type. That is, the value of gso_types
is set on a per-packet basis. gso_ctx->gso_types and mbuf->ol_flags are used together
because DEV_TX_OFFLOAD_*_TSO only gives the tunnelling type and the inner L4 type, and
we need ol_flags to know the L3 type. With this design, HW segmentation and SW segmentation
are indeed consistent.

If I understand it correctly, applications need to set 'ol_flags = PKT_TX_IPV4' and
'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for an "ether+ipv4+udp+vxlan+ether+ipv4+
tcp+payload" packet. But PKT_TX_IPV4 only presents the inner L3 type of a tunneled packet.
What about the outer L3 type? Do we always assume the inner and the outer L3 types are the same?

Jiayu
> Konstantin
  
Mark Kavanagh Sept. 14, 2017, 8:35 a.m. UTC | #12
>From: Hu, Jiayu
>Sent: Thursday, September 14, 2017 2:00 AM
>To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
><mark.b.kavanagh@intel.com>
>Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>Hi Konstantin,
>
>> -----Original Message-----
>> From: Ananyev, Konstantin
>> Sent: Wednesday, September 13, 2017 11:13 PM
>> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
>> <jiayu.hu@intel.com>
>> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>>
>> Hi Mark,
>>
>> > -----Original Message-----
>> > From: Kavanagh, Mark B
>> > Sent: Wednesday, September 13, 2017 3:52 PM
>> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
>> <jiayu.hu@intel.com>
>> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >
>> > >From: Ananyev, Konstantin
>> > >Sent: Wednesday, September 13, 2017 10:38 AM
>> > >To: Hu, Jiayu <jiayu.hu@intel.com>
>> > >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
>> Tan, Jianfeng
>> > ><jianfeng.tan@intel.com>
>> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> > >
>> > >
>> > >
>> > >> > > +
>> > >> > > +int
>> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
>> > >> > > +		uint16_t gso_size,
>> > >> > > +		uint8_t ipid_delta,
>> > >> > > +		struct rte_mempool *direct_pool,
>> > >> > > +		struct rte_mempool *indirect_pool,
>> > >> > > +		struct rte_mbuf **pkts_out,
>> > >> > > +		uint16_t nb_pkts_out)
>> > >> > > +{
>> > >> > > +	struct ipv4_hdr *ipv4_hdr;
>> > >> > > +	uint16_t tcp_dl;
>> > >> > > +	uint16_t pyld_unit_size;
>> > >> > > +	uint16_t hdr_offset;
>> > >> > > +	int ret = 1;
>> > >> > > +
>> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
>> +
>> > >> > > +			pkt->l2_len);
>> > >> > > +	/* Don't process the fragmented packet */
>> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
>> > >> > > +
>> 	IPV4_HDR_DF_MASK)) == 0)) {
>> > >> >
>> > >> >
>> > >> > It is not a check for fragmented packet - it is a check that
>> fragmentation
>> > >is allowed for that packet.
>> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
>> > >
>> > >DF bit doesn't indicate is packet fragmented or not.
>> > >It forbids to fragment packet any further.
>> > >To check is packet already fragmented or not, you have to check MF bit
>> and
>> > >frag_offset.
>> > >Both have to be zero for un-fragmented packets.
>> > >
>> > >>
>> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit.
>> It's a
>> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
>> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
>> > >rte_cpu_to_be_16(
>> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
>> > >>
>> > >> >
>> > >> > > +		pkts_out[0] = pkt;
>> > >> > > +		return ret;
>> > >> > > +	}
>> > >> > > +
>> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
>> >l3_len -
>> > >> > > +		pkt->l4_len;
>> > >> >
>> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
>> > >>
>> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len
>here.
>> > >>
>> > >> >
>> > >> > > +	/* Don't process the packet without data */
>> > >> > > +	if (unlikely(tcp_dl == 0)) {
>> > >> > > +		pkts_out[0] = pkt;
>> > >> > > +		return ret;
>> > >> > > +	}
>> > >> > > +
>> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
>> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
>> > >> >
>> > >> > Hmm, why do we need to count CRC_LEN here?
>> > >>
>> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
>> > >> included in gso_size.
>> > >
>> > >Why?
>> > >What is the point to account crc len into this computation?
>> > >Why not just assume that gso_size is already a max_frame_size - crc_len
>> > >As I remember, when we RX packet crc bytes will be already stripped,
>> > >when user populates the packet, he doesn't care about crc bytes too.
>> >
>> > Hi Konstantin,
>> >
>> > When packet is tx'd, the 4B for CRC are added back into the packet; if the
>> payload is already at max capacity, then the actual segment size
>> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
>> > To prevent that from happening, we account for the CRC len in this
>> calculation.
>>
>>
>> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether frame
>> size without crc bytes */
>> ?

Hey Konstantin,

If the user sets the gso_size to 1514, the resultant output segments' size should be 1514, and not 1518. Consequently, the payload capacity of each segment would be reduced accordingly.
The user only cares about the output segment size (i.e. gso_ctx.gso_size); we need to ensure that the size of the segments that are produced is consistent with that. As a result, we need to ensure that any packet overhead is accounted for in the segment size, before we can determine how much space remains for data.

Hope this makes sense.

Thanks,
Mark
 
>
>Exactly, applications can set 1514 to gso_segsz instead of 1518, if the lower
>layer
>will add CRC to the packet.
>
>Jiayu
>
>> Konstantin
>>
>> >
>> > If I've missed anything, please do let me know!
>> >
>> > Thanks,
>> > Mark
>> >
>> > >
>> > >Konstantin
  
Ananyev, Konstantin Sept. 14, 2017, 8:39 a.m. UTC | #13
> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Thursday, September 14, 2017 9:35 AM
> To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> >From: Hu, Jiayu
> >Sent: Thursday, September 14, 2017 2:00 AM
> >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
> ><mark.b.kavanagh@intel.com>
> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >Hi Konstantin,
> >
> >> -----Original Message-----
> >> From: Ananyev, Konstantin
> >> Sent: Wednesday, September 13, 2017 11:13 PM
> >> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
> >> <jiayu.hu@intel.com>
> >> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >>
> >> Hi Mark,
> >>
> >> > -----Original Message-----
> >> > From: Kavanagh, Mark B
> >> > Sent: Wednesday, September 13, 2017 3:52 PM
> >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
> >> <jiayu.hu@intel.com>
> >> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >
> >> > >From: Ananyev, Konstantin
> >> > >Sent: Wednesday, September 13, 2017 10:38 AM
> >> > >To: Hu, Jiayu <jiayu.hu@intel.com>
> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> >> Tan, Jianfeng
> >> > ><jianfeng.tan@intel.com>
> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> > >
> >> > >
> >> > >
> >> > >> > > +
> >> > >> > > +int
> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> >> > >> > > +		uint16_t gso_size,
> >> > >> > > +		uint8_t ipid_delta,
> >> > >> > > +		struct rte_mempool *direct_pool,
> >> > >> > > +		struct rte_mempool *indirect_pool,
> >> > >> > > +		struct rte_mbuf **pkts_out,
> >> > >> > > +		uint16_t nb_pkts_out)
> >> > >> > > +{
> >> > >> > > +	struct ipv4_hdr *ipv4_hdr;
> >> > >> > > +	uint16_t tcp_dl;
> >> > >> > > +	uint16_t pyld_unit_size;
> >> > >> > > +	uint16_t hdr_offset;
> >> > >> > > +	int ret = 1;
> >> > >> > > +
> >> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
> >> +
> >> > >> > > +			pkt->l2_len);
> >> > >> > > +	/* Don't process the fragmented packet */
> >> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> >> > >> > > +
> >> 	IPV4_HDR_DF_MASK)) == 0)) {
> >> > >> >
> >> > >> >
> >> > >> > It is not a check for fragmented packet - it is a check that
> >> fragmentation
> >> > >is allowed for that packet.
> >> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
> >> > >
> >> > >DF bit doesn't indicate is packet fragmented or not.
> >> > >It forbids to fragment packet any further.
> >> > >To check is packet already fragmented or not, you have to check MF bit
> >> and
> >> > >frag_offset.
> >> > >Both have to be zero for un-fragmented packets.
> >> > >
> >> > >>
> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit.
> >> It's a
> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
> >> > >rte_cpu_to_be_16(
> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
> >> > >>
> >> > >> >
> >> > >> > > +		pkts_out[0] = pkt;
> >> > >> > > +		return ret;
> >> > >> > > +	}
> >> > >> > > +
> >> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
> >> >l3_len -
> >> > >> > > +		pkt->l4_len;
> >> > >> >
> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> >> > >>
> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len
> >here.
> >> > >>
> >> > >> >
> >> > >> > > +	/* Don't process the packet without data */
> >> > >> > > +	if (unlikely(tcp_dl == 0)) {
> >> > >> > > +		pkts_out[0] = pkt;
> >> > >> > > +		return ret;
> >> > >> > > +	}
> >> > >> > > +
> >> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> >> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> >> > >> >
> >> > >> > Hmm, why do we need to count CRC_LEN here?
> >> > >>
> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> >> > >> included in gso_size.
> >> > >
> >> > >Why?
> >> > >What is the point to account crc len into this computation?
> >> > >Why not just assume that gso_size is already a max_frame_size - crc_len
> >> > >As I remember, when we RX packet crc bytes will be already stripped,
> >> > >when user populates the packet, he doesn't care about crc bytes too.
> >> >
> >> > Hi Konstantin,
> >> >
> >> > When packet is tx'd, the 4B for CRC are added back into the packet; if the
> >> payload is already at max capacity, then the actual segment size
> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
> >> > To prevent that from happening, we account for the CRC len in this
> >> calculation.
> >>
> >>
> >> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether frame
> >> size without crc bytes */
> >> ?
> 
> Hey Konstantin,
> 
> If the user sets the gso_size to 1514, the resultant output segments' size should be 1514, and not 1518.

Yes, and then the NIC HW will add the CRC bytes for you.
You are not filling in the CRC bytes yourself, and when you provide the HW with the size to send, it is the payload size
(CRC bytes are not accounted for).
Konstantin
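
To make the arithmetic in this exchange concrete, here is a small sketch of the two ways of deriving the per-segment payload size. The function names and the SKETCH_ETHER_CRC_LEN constant are hypothetical; only the two formulas come from the thread. The example figures assume an untagged Ethernet header (14B) plus IPv4 and TCP headers without options (20B each), i.e. 54B of headers.

```c
#include <stdint.h>

#define SKETCH_ETHER_CRC_LEN 4u /* Ethernet CRC (FCS) length in bytes */

/* Payload bytes per output segment when gso_size already excludes the CRC.
 * Assumes gso_size > hdr_len. */
static uint16_t
sketch_pyld_unit_size(uint16_t gso_size, uint16_t hdr_len)
{
	return (uint16_t)(gso_size - hdr_len);
}

/* Payload bytes per output segment as computed in the v3 patch, where the
 * CRC length is subtracted as well. Assumes gso_size > hdr_len + CRC. */
static uint16_t
sketch_pyld_unit_size_v3(uint16_t gso_size, uint16_t hdr_len)
{
	return (uint16_t)(gso_size - hdr_len - SKETCH_ETHER_CRC_LEN);
}

/* Number of output segments needed for payload_len bytes (ceiling
 * division). Assumes pyld_unit > 0. */
static uint32_t
sketch_nb_segs(uint32_t payload_len, uint16_t pyld_unit)
{
	return (payload_len + pyld_unit - 1) / pyld_unit;
}
```

With gso_size = 1514 and 54B of headers, the first form leaves 1460B of payload per segment (1518B on the wire once the CRC is appended), while the v3 form leaves 1456B. For a 14600B payload that is 10 segments versus 11.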

> Consequently, the payload capacity
> of each segment would be reduced accordingly.
> The user only cares about the output segment size (i.e. gso_ctx.gso_size); we need to ensure that the size of the segments that are
> produced is consistent with that. As a result, we need to ensure that any packet overhead is accounted for in the segment size, before we
> can determine how much space remains for data.
> 
> Hope this makes sense.
> 
> Thanks,
> Mark
> 
> >
> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if the lower
> >layer
> >will add CRC to the packet.
> >
> >Jiayu
> >
> >> Konstantin
> >>
> >> >
> >> > If I've missed anything, please do let me know!
> >> >
> >> > Thanks,
> >> > Mark
> >> >
> >> > >
> >> > >Konstantin
  
Ananyev, Konstantin Sept. 14, 2017, 8:47 a.m. UTC | #14
Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Thursday, September 14, 2017 7:07 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Konstantin,
> 
> On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> >
> > Hi Jiayu,
> >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Ananyev, Konstantin
> > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > > >
> > > > > > result, when all of its GSOed segments are freed, the packet is freed
> > > > > > automatically.
> > > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > > > > index dda50ee..95f6ea6 100644
> > > > > > --- a/lib/librte_gso/rte_gso.c
> > > > > > +++ b/lib/librte_gso/rte_gso.c
> > > > > > @@ -33,18 +33,53 @@
> > > > > >
> > > > > >  #include <errno.h>
> > > > > >
> > > > > > +#include <rte_log.h>
> > > > > > +
> > > > > >  #include "rte_gso.h"
> > > > > > +#include "gso_common.h"
> > > > > > +#include "gso_tcp4.h"
> > > > > >
> > > > > >  int
> > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > > > > +		struct rte_gso_ctx gso_ctx,
> > > > > >  		struct rte_mbuf **pkts_out,
> > > > > >  		uint16_t nb_pkts_out)
> > > > > >  {
> > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > > > > +	struct rte_mbuf *pkt_seg;
> > > > > > +	uint16_t gso_size;
> > > > > > +	uint8_t ipid_delta;
> > > > > > +	int ret = 1;
> > > > > > +
> > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > -	pkts_out[0] = pkt;
> > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > > > > +			pkt->packet_type) {
> > > > > > +		pkts_out[0] = pkt;
> > > > > > +		return ret;
> > > > > > +	}
> > > > > > +
> > > > > > +	direct_pool = gso_ctx.direct_pool;
> > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > > > > +	gso_size = gso_ctx.gso_size;
> > > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > > > > +
> > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > > > >
> > > > > Probably we need here:
> > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > >
> > > > Sorry, actually it probably should be:
> > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4 &&
> > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > >
> > > I don't quite understand why the GSO library should be aware if the TSO
> > > flag is set or not. Applications can query device TSO capability before
> > > they call the GSO library. Do I misundertsand anything?
> > >
> > > Additionally, we don't need to check if the packet is a TCP/IPv4 packet here?
> >
> > Well, right now  PMD we doesn't rely on ptype to figure out what type of packet and
> > what TX offload have to be performed.
> > Instead it looks at TX part of ol_flags, and
> > My thought was that as what we doing is actually TSO in SW, it would be good
> > to use the same API here too.
> > Also with that approach, by setting ol_flags properly user can use the same gso_ctx and still
> > specify what segmentation to perform on a per-packet basis.
> >
> > Alternative way is to rely on ptype to distinguish should segmentation be performed on that package or not.
> > The only advantage I see here is that if someone would like to add GSO for some new protocol,
> > he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> > Though he still would need to update TX_OFFLOAD_* capabilities and probably packet_type definitions.
> >
> > So from my perspective first variant (use HW TSO API) is more plausible.
> > Wonder what is your and Mark opinions here?
> 
> In the first choice, you mean:
> the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a specific GSO
> segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for each input packet.
> Applications should parse the packet type, and set an exactly correct DEV_TX_OFFLOAD_*_TSO
> flag to gso_types and ol_flags according to the packet type. That is, the value of gso_types
> is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags at the same time
> is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and the inner L4 type, and
> we need to know L3 type by ol_flags. With this design, HW segmentation and SW segmentation
> are indeed consistent.
> 
> If I understand it correctly, applications need to set 'ol_flags = PKT_TX_IPV4' and
> 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a "ether+ipv4+udp+vxlan+ether+ipv4+
> tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type for tunneled packet.
> How about the outer L3 type? Always assume the inner and the outer L3 type are the same?

I think that for that case you'll have to set in ol_flags:

PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN | PKT_TX_TCP_SEG

Konstantin
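
As a rough sketch of how a library could tell the tunneled case apart from plain TCP/IPv4 using the flag combination above: the SKETCH_* bit values are stand-ins rather than the real PKT_TX_* definitions, and sketch_classify() and its enum are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Illustrative stand-ins only: the real PKT_TX_* values are defined by
 * DPDK, not here. */
#define SKETCH_PKT_TX_TCP_SEG (1ULL << 0)
#define SKETCH_PKT_TX_IPV4 (1ULL << 1)
#define SKETCH_PKT_TX_OUTER_IPV4 (1ULL << 2)
#define SKETCH_PKT_TX_TUNNEL_VXLAN (1ULL << 3)

enum sketch_gso_kind {
	SKETCH_GSO_NONE,       /* pass the packet through unsegmented */
	SKETCH_GSO_TCP4,       /* plain TCP/IPv4 segmentation */
	SKETCH_GSO_VXLAN_TCP4  /* VxLAN tunnel, IPv4 outer, TCP/IPv4 inner */
};

/* Hypothetical classifier: pick a segmentation kind from the TX flags. */
static enum sketch_gso_kind
sketch_classify(uint64_t ol_flags)
{
	const uint64_t tcp4 = SKETCH_PKT_TX_TCP_SEG | SKETCH_PKT_TX_IPV4;
	const uint64_t vxlan_tcp4 = tcp4 | SKETCH_PKT_TX_OUTER_IPV4 |
		SKETCH_PKT_TX_TUNNEL_VXLAN;

	/* Test the most specific combination first. */
	if ((ol_flags & vxlan_tcp4) == vxlan_tcp4)
		return SKETCH_GSO_VXLAN_TCP4;
	if ((ol_flags & tcp4) == tcp4)
		return SKETCH_GSO_TCP4;
	return SKETCH_GSO_NONE;
}
```

Because the outer L3 type has its own flag, the inner and outer types do not have to be assumed equal.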

> 
> Jiayu
> > Konstantin
  
Mark Kavanagh Sept. 14, 2017, 8:51 a.m. UTC | #15
>From: Hu, Jiayu
>Sent: Thursday, September 14, 2017 7:07 AM
>To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
><jianfeng.tan@intel.com>
>Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>Hi Konstantin,
>
>On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
>>
>> Hi Jiayu,
>>
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Ananyev, Konstantin
>> > > > Sent: Tuesday, September 12, 2017 12:18 PM
>> > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
>> > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
><jianfeng.tan@intel.com>
>> > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> > > >
>> > > > > result, when all of its GSOed segments are freed, the packet is
>freed
>> > > > > automatically.
>> > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
>> > > > > index dda50ee..95f6ea6 100644
>> > > > > --- a/lib/librte_gso/rte_gso.c
>> > > > > +++ b/lib/librte_gso/rte_gso.c
>> > > > > @@ -33,18 +33,53 @@
>> > > > >
>> > > > >  #include <errno.h>
>> > > > >
>> > > > > +#include <rte_log.h>
>> > > > > +
>> > > > >  #include "rte_gso.h"
>> > > > > +#include "gso_common.h"
>> > > > > +#include "gso_tcp4.h"
>> > > > >
>> > > > >  int
>> > > > >  rte_gso_segment(struct rte_mbuf *pkt,
>> > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
>> > > > > +		struct rte_gso_ctx gso_ctx,
>> > > > >  		struct rte_mbuf **pkts_out,
>> > > > >  		uint16_t nb_pkts_out)
>> > > > >  {
>> > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
>> > > > > +	struct rte_mbuf *pkt_seg;
>> > > > > +	uint16_t gso_size;
>> > > > > +	uint8_t ipid_delta;
>> > > > > +	int ret = 1;
>> > > > > +
>> > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
>> > > > >  		return -EINVAL;
>> > > > >
>> > > > > -	pkts_out[0] = pkt;
>> > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
>> > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
>> > > > > +			pkt->packet_type) {
>> > > > > +		pkts_out[0] = pkt;
>> > > > > +		return ret;
>> > > > > +	}
>> > > > > +
>> > > > > +	direct_pool = gso_ctx.direct_pool;
>> > > > > +	indirect_pool = gso_ctx.indirect_pool;
>> > > > > +	gso_size = gso_ctx.gso_size;
>> > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
>> > > > > +
>> > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
>> > > >
>> > > > Probably we need here:
>> > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
>DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
>> > >
>> > > Sorry, actually it probably should be:
>> > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4 &&
>> > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
>> >
>> > I don't quite understand why the GSO library should be aware if the TSO
>> > flag is set or not. Applications can query device TSO capability before
>> > they call the GSO library. Do I misunderstand anything?
>> >
>> > Additionally, don't we need to check if the packet is a TCP/IPv4 packet here?
>>
>> Well, right now the PMD doesn't rely on ptype to figure out the type of packet and
>> which TX offloads have to be performed.
>> Instead it looks at the TX part of ol_flags.
>> My thought was that, as what we are doing is actually TSO in SW, it would be good
>> to use the same API here too.
>> Also, with that approach, by setting ol_flags properly the user can use the same gso_ctx and still
>> specify what segmentation to perform on a per-packet basis.
>>
>> The alternative is to rely on ptype to decide whether segmentation should be performed on that packet or not.
>> The only advantage I see here is that if someone would like to add GSO for some new protocol,
>> he wouldn't need to introduce a new TX flag value for mbuf.ol_flags.
>> Though he would still need to update the TX_OFFLOAD_* capabilities and probably the packet_type definitions.
>>
>> So from my perspective the first variant (use the HW TSO API) is more plausible.
>> I wonder what your and Mark's opinions are here?
>
>In the first choice, you mean:
>the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a specific GSO
>segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for each input packet.
>Applications should parse the packet type, and set exactly the right DEV_TX_OFFLOAD_*_TSO
>flag in gso_types, and ol_flags, according to the packet type. That is, the value of gso_types
>is set on a per-packet basis. gso_ctx->gso_types and mbuf->ol_flags are used at the same time
>because DEV_TX_OFFLOAD_*_TSO only tells the tunnelling type and the inner L4 type, and
>we need to know the L3 type from ol_flags. With this design, HW segmentation and SW segmentation
>are indeed consistent.
>
>If I understand it correctly, applications need to set 'ol_flags = PKT_TX_IPV4' and
>'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a "ether+ipv4+udp+vxlan+ether+ipv4+tcp+payload"
>packet. But PKT_TX_IPV4 just presents the inner L3 type for a tunneled packet.
>How about the outer L3 type? Always assume the inner and the outer L3 type are the same?

Hi Jiayu, 

If I'm not mistaken, I think what Konstantin is suggesting is as follows: 

- The DEV_TX_OFFLOAD_*_TSO flags are currently used to describe a NIC's TSO capabilities; the GSO capabilities may also be described using the same macros, to provide a consistent view of segmentation capabilities across the HW and SW implementations.

- As part of segmentation, it's still a case of checking the packet type, but then setting the appropriate ol_flags in the mbuf, which the GSO library can use to segment the packet.
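
The second point could look roughly like this on the application side. This is only a sketch under stated assumptions: the DEMO_* names and bit values are stand-ins invented for illustration (the real PKT_TX_* flags are defined in rte_mbuf.h and have different values), and the packet-kind enum stands for whatever result the application's own header parsing produces.

```c
#include <stdint.h>

/* Stand-in flag values for illustration only; not the real PKT_TX_* values. */
#define DEMO_TX_TCP_SEG (1ULL << 0)
#define DEMO_TX_IPV4    (1ULL << 1)

/* Hypothetical result of the application's own packet-type check. */
enum demo_pkt_kind {
	DEMO_KIND_OTHER,
	DEMO_KIND_TCP4,
};

/*
 * Application side: translate the parsed packet type into the TX offload
 * flags that request segmentation. Returns 0 when the packet should be
 * left alone.
 */
static uint64_t
demo_ol_flags_for(enum demo_pkt_kind kind)
{
	switch (kind) {
	case DEMO_KIND_TCP4:
		return DEMO_TX_IPV4 | DEMO_TX_TCP_SEG;
	default:
		return 0;
	}
}
```

The application would OR the returned value into the mbuf's ol_flags before handing the packet to the GSO library, in the same way it would for HW TSO.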

Thanks,
Mark

>
>Jiayu
>> Konstantin
  
Mark Kavanagh Sept. 14, 2017, 9 a.m. UTC | #16
>From: Ananyev, Konstantin
>Sent: Thursday, September 14, 2017 9:40 AM
>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
><jiayu.hu@intel.com>
>Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>
>
>> -----Original Message-----
>> From: Kavanagh, Mark B
>> Sent: Thursday, September 14, 2017 9:35 AM
>> To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
><konstantin.ananyev@intel.com>
>> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>>
>> >From: Hu, Jiayu
>> >Sent: Thursday, September 14, 2017 2:00 AM
>> >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
>> ><mark.b.kavanagh@intel.com>
>> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >
>> >Hi Konstantin,
>> >
>> >> -----Original Message-----
>> >> From: Ananyev, Konstantin
>> >> Sent: Wednesday, September 13, 2017 11:13 PM
>> >> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
>> >> <jiayu.hu@intel.com>
>> >> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >>
>> >> Hi Mark,
>> >>
>> >> > -----Original Message-----
>> >> > From: Kavanagh, Mark B
>> >> > Sent: Wednesday, September 13, 2017 3:52 PM
>> >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
>> >> <jiayu.hu@intel.com>
>> >> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >
>> >> > >From: Ananyev, Konstantin
>> >> > >Sent: Wednesday, September 13, 2017 10:38 AM
>> >> > >To: Hu, Jiayu <jiayu.hu@intel.com>
>> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
>> >> Tan, Jianfeng
>> >> > ><jianfeng.tan@intel.com>
>> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> > >
>> >> > >
>> >> > >
>> >> > >> > > +
>> >> > >> > > +int
>> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
>> >> > >> > > +		uint16_t gso_size,
>> >> > >> > > +		uint8_t ipid_delta,
>> >> > >> > > +		struct rte_mempool *direct_pool,
>> >> > >> > > +		struct rte_mempool *indirect_pool,
>> >> > >> > > +		struct rte_mbuf **pkts_out,
>> >> > >> > > +		uint16_t nb_pkts_out)
>> >> > >> > > +{
>> >> > >> > > +	struct ipv4_hdr *ipv4_hdr;
>> >> > >> > > +	uint16_t tcp_dl;
>> >> > >> > > +	uint16_t pyld_unit_size;
>> >> > >> > > +	uint16_t hdr_offset;
>> >> > >> > > +	int ret = 1;
>> >> > >> > > +
>> >> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
>> >> +
>> >> > >> > > +			pkt->l2_len);
>> >> > >> > > +	/* Don't process the fragmented packet */
>> >> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
>> >> > >> > > +
>> >> 	IPV4_HDR_DF_MASK)) == 0)) {
>> >> > >> >
>> >> > >> >
>> >> > >> > It is not a check for fragmented packet - it is a check that
>> >> fragmentation
>> >> > >is allowed for that packet.
>> >> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
>> >> > >
>> >> > >DF bit doesn't indicate is packet fragmented or not.
>> >> > >It forbids to fragment packet any further.
>> >> > >To check is packet already fragmented or not, you have to check MF bit
>> >> and
>> >> > >frag_offset.
>> >> > >Both have to be zero for un-fragmented packets.
>> >> > >
>> >> > >>
>> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF
>bit.
>> >> It's a
>> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian
>order.
>> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
>> >> > >rte_cpu_to_be_16(
>> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is
>fragmented.
>> >> > >>
>> >> > >> >
>> >> > >> > > +		pkts_out[0] = pkt;
>> >> > >> > > +		return ret;
>> >> > >> > > +	}
>> >> > >> > > +
>> >> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
>> >> >l3_len -
>> >> > >> > > +		pkt->l4_len;
>> >> > >> >
>> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
>> >> > >>
>> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len
>> >here.
>> >> > >>
>> >> > >> >
>> >> > >> > > +	/* Don't process the packet without data */
>> >> > >> > > +	if (unlikely(tcp_dl == 0)) {
>> >> > >> > > +		pkts_out[0] = pkt;
>> >> > >> > > +		return ret;
>> >> > >> > > +	}
>> >> > >> > > +
>> >> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
>> >> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
>> >> > >> >
>> >> > >> > Hmm, why do we need to count CRC_LEN here?
>> >> > >>
>> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
>> >> > >> included in gso_size.
>> >> > >
>> >> > >Why?
>> >> > >What is the point to account crc len into this computation?
>> >> > >Why not just assume that gso_size is already a max_frame_size -
>crc_len
>> >> > >As I remember, when we RX packet crc bytes will be already stripped,
>> >> > >when user populates the packet, he doesn't care about crc bytes too.
>> >> >
>> >> > Hi Konstantin,
>> >> >
>> >> > When packet is tx'd, the 4B for CRC are added back into the packet; if
>the
>> >> payload is already at max capacity, then the actual segment size
>> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
>> >> > To prevent that from happening, we account for the CRC len in this
>> >> calculation.
>> >>
>> >>
>> >> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether frame
>> >> size without crc bytes */
>> >> ?
>>
>> Hey Konstantin,
>>
>> If the user sets the gso_size to 1514, the resultant output segments' size
>should be 1514, and not 1518.

Just to clarify - I meant here that the final output segment, including CRC len, should be 1514. I think this is where we're crossing wires ;)

>
>Yes and then NIC HW will add CRC bytes for you.
>You are not filling CRC bytes in HW, and when providing to the HW size to send
>- it is a payload size
>(CRC bytes are not accounted).
>Konstantin

Yes, exactly - in that case though, the gso_size specified by the user is not the actual final output segment size, but (segment size - 4B), right?

We can set that expectation in documentation, but from an application's/user's perspective, do you think that this might be confusing/misleading?

Thanks again,
Mark  

>
> Consequently, the payload capacity
>> of each segment would be reduced accordingly.
>> The user only cares about the output segment size (i.e. gso_ctx.gso_size);
>we need to ensure that the size of the segments that are
>> produced is consistent with that. As a result, we need to ensure that any
>packet overhead is accounted for in the segment size, before we
>> can determine how much space remains for data.
>>
>> Hope this makes sense.
>>
>> Thanks,
>> Mark
>>
>> >
>> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if the
>lower
>> >layer
>> >will add CRC to the packet.
>> >
>> >Jiayu
>> >
>> >> Konstantin
>> >>
>> >> >
>> >> > If I've missed anything, please do let me know!
>> >> >
>> >> > Thanks,
>> >> > Mark
>> >> >
>> >> > >
>> >> > >Konstantin
  
Ananyev, Konstantin Sept. 14, 2017, 9:10 a.m. UTC | #17
> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Thursday, September 14, 2017 10:01 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> >From: Ananyev, Konstantin
> >Sent: Thursday, September 14, 2017 9:40 AM
> >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
> ><jiayu.hu@intel.com>
> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >
> >
> >> -----Original Message-----
> >> From: Kavanagh, Mark B
> >> Sent: Thursday, September 14, 2017 9:35 AM
> >> To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
> ><konstantin.ananyev@intel.com>
> >> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >>
> >> >From: Hu, Jiayu
> >> >Sent: Thursday, September 14, 2017 2:00 AM
> >> >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
> >> ><mark.b.kavanagh@intel.com>
> >> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >
> >> >Hi Konstantin,
> >> >
> >> >> -----Original Message-----
> >> >> From: Ananyev, Konstantin
> >> >> Sent: Wednesday, September 13, 2017 11:13 PM
> >> >> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
> >> >> <jiayu.hu@intel.com>
> >> >> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >>
> >> >> Hi Mark,
> >> >>
> >> >> > -----Original Message-----
> >> >> > From: Kavanagh, Mark B
> >> >> > Sent: Wednesday, September 13, 2017 3:52 PM
> >> >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
> >> >> <jiayu.hu@intel.com>
> >> >> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >> >
> >> >> > >From: Ananyev, Konstantin
> >> >> > >Sent: Wednesday, September 13, 2017 10:38 AM
> >> >> > >To: Hu, Jiayu <jiayu.hu@intel.com>
> >> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> >> >> Tan, Jianfeng
> >> >> > ><jianfeng.tan@intel.com>
> >> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >> > > +
> >> >> > >> > > +int
> >> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> >> >> > >> > > +		uint16_t gso_size,
> >> >> > >> > > +		uint8_t ipid_delta,
> >> >> > >> > > +		struct rte_mempool *direct_pool,
> >> >> > >> > > +		struct rte_mempool *indirect_pool,
> >> >> > >> > > +		struct rte_mbuf **pkts_out,
> >> >> > >> > > +		uint16_t nb_pkts_out)
> >> >> > >> > > +{
> >> >> > >> > > +	struct ipv4_hdr *ipv4_hdr;
> >> >> > >> > > +	uint16_t tcp_dl;
> >> >> > >> > > +	uint16_t pyld_unit_size;
> >> >> > >> > > +	uint16_t hdr_offset;
> >> >> > >> > > +	int ret = 1;
> >> >> > >> > > +
> >> >> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
> >> >> +
> >> >> > >> > > +			pkt->l2_len);
> >> >> > >> > > +	/* Don't process the fragmented packet */
> >> >> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> >> >> > >> > > +
> >> >> 	IPV4_HDR_DF_MASK)) == 0)) {
> >> >> > >> >
> >> >> > >> >
> >> >> > >> > It is not a check for fragmented packet - it is a check that
> >> >> fragmentation
> >> >> > >is allowed for that packet.
> >> >> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
> >> >> > >
> >> >> > >DF bit doesn't indicate is packet fragmented or not.
> >> >> > >It forbids to fragment packet any further.
> >> >> > >To check is packet already fragmented or not, you have to check MF bit
> >> >> and
> >> >> > >frag_offset.
> >> >> > >Both have to be zero for un-fragmented packets.
> >> >> > >
> >> >> > >>
> >> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF
> >bit.
> >> >> It's a
> >> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian
> >order.
> >> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
> >> >> > >rte_cpu_to_be_16(
> >> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is
> >fragmented.
> >> >> > >>
> >> >> > >> >
> >> >> > >> > > +		pkts_out[0] = pkt;
> >> >> > >> > > +		return ret;
> >> >> > >> > > +	}
> >> >> > >> > > +
> >> >> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
> >> >> >l3_len -
> >> >> > >> > > +		pkt->l4_len;
> >> >> > >> >
> >> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> >> >> > >>
> >> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len
> >> >here.
> >> >> > >>
> >> >> > >> >
> >> >> > >> > > +	/* Don't process the packet without data */
> >> >> > >> > > +	if (unlikely(tcp_dl == 0)) {
> >> >> > >> > > +		pkts_out[0] = pkt;
> >> >> > >> > > +		return ret;
> >> >> > >> > > +	}
> >> >> > >> > > +
> >> >> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> >> >> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> >> >> > >> >
> >> >> > >> > Hmm, why do we need to count CRC_LEN here?
> >> >> > >>
> >> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> >> >> > >> included in gso_size.
> >> >> > >
> >> >> > >Why?
> >> >> > >What is the point to account crc len into this computation?
> >> >> > >Why not just assume that gso_size is already a max_frame_size -
> >crc_len
> >> >> > >As I remember, when we RX packet crc bytes will be already stripped,
> >> >> > >when user populates the packet, he doesn't care about crc bytes too.
> >> >> >
> >> >> > Hi Konstantin,
> >> >> >
> >> >> > When packet is tx'd, the 4B for CRC are added back into the packet; if
> >the
> >> >> payload is already at max capacity, then the actual segment size
> >> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
> >> >> > To prevent that from happening, we account for the CRC len in this
> >> >> calculation.
> >> >>
> >> >>
> >> >> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether frame
> >> >> size without crc bytes */
> >> >> ?
> >>
> >> Hey Konstantin,
> >>
> >> If the user sets the gso_size to 1514, the resultant output segments' size
> >should be 1514, and not 1518.
> 
> Just to clarify - I meant here that the final output segment, including CRC len, should be 1514. I think this is where we're crossing wires ;)
> 
> >
> >Yes and then NIC HW will add CRC bytes for you.
> >You are not filling CRC bytes in HW, and when providing to the HW size to send
> >- it is a payload size
> >(CRC bytes are not accounted).
> >Konstantin
> 
> Yes, exactly - in that case though, the gso_size specified by the user is not the actual final output segment size, but (segment size - 4B),
> right?

CRC bytes will be added by HW; it is totally transparent to the user.

> 
> We can set that expectation in documentation, but from an application's/user's perspective, do you think that this might be
> confusing/misleading?

I think it would be much more confusing to make the user account for CRC bytes.
For example, when you form a packet in DPDK and send it out via rte_eth_tx_burst(),
you specify only your payload size, not the payload size plus the CRC bytes that HW will add for you.
Konstantin
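
To make the arithmetic concrete, here is a small sketch of the sizes involved when gso_size excludes the CRC. The function names are made up for illustration and are not part of the patch; the 54-byte header length in the usage note assumes an untagged Ethernet header (14B), an IPv4 header without options (20B) and a TCP header without options (20B).

```c
#include <stdint.h>

#define DEMO_ETHER_CRC_LEN 4 /* Ethernet CRC (FCS) length in bytes */

/*
 * Payload bytes carried by each output segment when gso_size does not
 * include the CRC. Returns 0 if the headers alone do not fit.
 */
static uint16_t
demo_pyld_unit_size(uint16_t gso_size, uint16_t hdr_len)
{
	if (gso_size <= hdr_len)
		return 0;
	return (uint16_t)(gso_size - hdr_len);
}

/* Number of output segments needed for payload_len bytes (rounded up). */
static uint32_t
demo_nb_segs(uint32_t payload_len, uint16_t pyld_unit_size)
{
	if (pyld_unit_size == 0)
		return 0;
	return (payload_len + pyld_unit_size - 1) / pyld_unit_size;
}

/* Frame length on the wire once the NIC has appended the CRC. */
static uint32_t
demo_wire_len(uint16_t gso_size)
{
	return (uint32_t)gso_size + DEMO_ETHER_CRC_LEN;
}
```

With gso_size = 1514 and 54 bytes of headers, each segment carries 1460 bytes of payload, a 4000-byte payload is split into 3 segments, and a full segment occupies 1518 bytes on the wire after the HW adds the CRC.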

> 
> Thanks again,
> Mark
> 
> >
> > Consequently, the payload capacity
> >> of each segment would be reduced accordingly.
> >> The user only cares about the output segment size (i.e. gso_ctx.gso_size);
> >we need to ensure that the size of the segments that are
> >> produced is consistent with that. As a result, we need to ensure that any
> >packet overhead is accounted for in the segment size, before we
> >> can determine how much space remains for data.
> >>
> >> Hope this makes sense.
> >>
> >> Thanks,
> >> Mark
> >>
> >> >
> >> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if the
> >lower
> >> >layer
> >> >will add CRC to the packet.
> >> >
> >> >Jiayu
> >> >
> >> >> Konstantin
> >> >>
> >> >> >
> >> >> > If I've missed anything, please do let me know!
> >> >> >
> >> >> > Thanks,
> >> >> > Mark
> >> >> >
> >> >> > >
> >> >> > >Konstantin
  
Hu, Jiayu Sept. 14, 2017, 9:29 a.m. UTC | #18
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, September 14, 2017 4:47 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Thursday, September 14, 2017 7:07 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>
> > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > Hi Konstantin,
> >
> > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> > >
> > > Hi Jiayu,
> > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ananyev, Konstantin
> > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > > > >
> > > > > > > result, when all of its GSOed segments are freed, the packet is
> freed
> > > > > > > automatically.
> > > > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > > > > > index dda50ee..95f6ea6 100644
> > > > > > > --- a/lib/librte_gso/rte_gso.c
> > > > > > > +++ b/lib/librte_gso/rte_gso.c
> > > > > > > @@ -33,18 +33,53 @@
> > > > > > >
> > > > > > >  #include <errno.h>
> > > > > > >
> > > > > > > +#include <rte_log.h>
> > > > > > > +
> > > > > > >  #include "rte_gso.h"
> > > > > > > +#include "gso_common.h"
> > > > > > > +#include "gso_tcp4.h"
> > > > > > >
> > > > > > >  int
> > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > > > > > +		struct rte_gso_ctx gso_ctx,
> > > > > > >  		struct rte_mbuf **pkts_out,
> > > > > > >  		uint16_t nb_pkts_out)
> > > > > > >  {
> > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > > > > > +	struct rte_mbuf *pkt_seg;
> > > > > > > +	uint16_t gso_size;
> > > > > > > +	uint8_t ipid_delta;
> > > > > > > +	int ret = 1;
> > > > > > > +
> > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > > > > > >  		return -EINVAL;
> > > > > > >
> > > > > > > -	pkts_out[0] = pkt;
> > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > > > > > +			pkt->packet_type) {
> > > > > > > +		pkts_out[0] = pkt;
> > > > > > > +		return ret;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	direct_pool = gso_ctx.direct_pool;
> > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > > > > > +	gso_size = gso_ctx.gso_size;
> > > > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > > > > > +
> > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > > > > >
> > > > > > Probably we need here:
> > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
> DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > > >
> > > > > Sorry, actually it probably should be:
> > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4
> &&
> > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > >
> > > > I don't quite understand why the GSO library should be aware if the TSO
> > > > flag is set or not. Applications can query device TSO capability before
> > > > they call the GSO library. Do I misunderstand anything?
> > > >
> > > > Additionally, we don't need to check if the packet is a TCP/IPv4 packet
> here?
> > >
> > > Well, right now  PMD we doesn't rely on ptype to figure out what type of
> packet and
> > > what TX offload have to be performed.
> > > Instead it looks at TX part of ol_flags, and
> > > My thought was that as what we doing is actually TSO in SW, it would be
> good
> > > to use the same API here too.
> > > Also with that approach, by setting ol_flags properly user can use the
> same gso_ctx and still
> > > specify what segmentation to perform on a per-packet basis.
> > >
> > > Alternative way is to rely on ptype to distinguish should segmentation be
> performed on that package or not.
> > > The only advantage I see here is that if someone would like to add GSO
> for some new protocol,
> > > he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> > > Though he still would need to update TX_OFFLOAD_* capabilities and
> probably packet_type definitions.
> > >
> > > So from my perspective first variant (use HW TSO API) is more plausible.
> > > Wonder what is your and Mark opinions here?
> >
> > In the first choice, you mean:
> > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
> specific GSO
> > segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for
> each input packet.
> > Applications should parse the packet type, and set an exactly correct
> DEV_TX_OFFLOAD_*_TSO
> > flag to gso_types and ol_flags according to the packet type. That is, the
> value of gso_types
> > is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags at
> the same time
> > is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and the
> inner L4 type, and
> > we need to know L3 type by ol_flags. With this design, HW segmentation
> and SW segmentation
> > are indeed consistent.
> >
> > If I understand it correctly, applications need to set 'ol_flags =
> PKT_TX_IPV4' and
> > 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
> "ether+ipv4+udp+vxlan+ether+ipv4+
> > tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type for
> tunneled packet.
> > How about the outer L3 type? Always assume the inner and the outer L3
> type are the same?
> 
> I think that for that case you'll have to set in ol_flags:
> 
> PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
> PKT_TX_TCP_SEG

OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
GSO library doesn't need gso_types anymore.

The first choice makes HW and SW segmentation totally the same.
Applications just need to parse the packet and set proper ol_flags, and
the GSO library uses ol_flags to decide which segmentation function to use.
I think it's better than the second choice, which depends on ptype to
choose the segmentation function.
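
On the library side, the selection could be sketched as below. This is an illustration only, not the code that will go into the next version: the DEMO_* flags are stand-ins with made-up values (the real PKT_TX_* flags are defined in rte_mbuf.h), and the enum merely names which segmentation routine would be called.

```c
#include <stdint.h>

/* Stand-in flag values for illustration only; not the real PKT_TX_* values. */
#define DEMO_TX_TCP_SEG      (1ULL << 0)
#define DEMO_TX_IPV4         (1ULL << 1)
#define DEMO_TX_OUTER_IPV4   (1ULL << 2)
#define DEMO_TX_TUNNEL_VXLAN (1ULL << 3)

/* Which segmentation routine the library would pick. */
enum demo_gso_path {
	DEMO_GSO_NONE,       /* pass the packet through unchanged */
	DEMO_GSO_TCP4,       /* plain TCP/IPv4 segmentation */
	DEMO_GSO_VXLAN_TCP4, /* VXLAN-tunneled TCP/IPv4 segmentation */
};

static enum demo_gso_path
demo_select_path(uint64_t ol_flags)
{
	const uint64_t tcp4 = DEMO_TX_TCP_SEG | DEMO_TX_IPV4;
	const uint64_t vxlan_tcp4 = tcp4 | DEMO_TX_OUTER_IPV4 |
		DEMO_TX_TUNNEL_VXLAN;

	/*
	 * The parentheses around the mask operation matter: in C, ==
	 * binds tighter than &. The tunnel case is tested first because
	 * its flag set is a superset of the plain TCP/IPv4 one.
	 */
	if ((ol_flags & vxlan_tcp4) == vxlan_tcp4)
		return DEMO_GSO_VXLAN_TCP4;
	if ((ol_flags & tcp4) == tcp4)
		return DEMO_GSO_TCP4;
	return DEMO_GSO_NONE;
}
```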

Jiayu
> 
> Konstantin
> 
> >
> > Jiayu
> > > Konstantin
  
Ananyev, Konstantin Sept. 14, 2017, 9:35 a.m. UTC | #19
> -----Original Message-----
> From: Hu, Jiayu
> Sent: Thursday, September 14, 2017 10:29 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, September 14, 2017 4:47 PM
> > To: Hu, Jiayu <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> > Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > Hi Jiayu,
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Thursday, September 14, 2017 7:07 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> > Jianfeng <jianfeng.tan@intel.com>
> > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > > Hi Konstantin,
> > >
> > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> > > >
> > > > Hi Jiayu,
> > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Ananyev, Konstantin
> > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>
> > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > > > > >
> > > > > > > > result, when all of its GSOed segments are freed, the packet is
> > freed
> > > > > > > > automatically.
> > > > > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > > > > > > index dda50ee..95f6ea6 100644
> > > > > > > > --- a/lib/librte_gso/rte_gso.c
> > > > > > > > +++ b/lib/librte_gso/rte_gso.c
> > > > > > > > @@ -33,18 +33,53 @@
> > > > > > > >
> > > > > > > >  #include <errno.h>
> > > > > > > >
> > > > > > > > +#include <rte_log.h>
> > > > > > > > +
> > > > > > > >  #include "rte_gso.h"
> > > > > > > > +#include "gso_common.h"
> > > > > > > > +#include "gso_tcp4.h"
> > > > > > > >
> > > > > > > >  int
> > > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > > > > > > +		struct rte_gso_ctx gso_ctx,
> > > > > > > >  		struct rte_mbuf **pkts_out,
> > > > > > > >  		uint16_t nb_pkts_out)
> > > > > > > >  {
> > > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > > > > > > +	struct rte_mbuf *pkt_seg;
> > > > > > > > +	uint16_t gso_size;
> > > > > > > > +	uint8_t ipid_delta;
> > > > > > > > +	int ret = 1;
> > > > > > > > +
> > > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > > > > > > >  		return -EINVAL;
> > > > > > > >
> > > > > > > > -	pkts_out[0] = pkt;
> > > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > > > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > > > > > > +			pkt->packet_type) {
> > > > > > > > +		pkts_out[0] = pkt;
> > > > > > > > +		return ret;
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	direct_pool = gso_ctx.direct_pool;
> > > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > > > > > > +	gso_size = gso_ctx.gso_size;
> > > > > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > > > > > > +
> > > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > > > > > >
> > > > > > > Probably we need here:
> > > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
> > DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > > > >
> > > > > > Sorry, actually it probably should be:
> > > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4
> > &&
> > > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > > >
> > > > > I don't quite understand why the GSO library should be aware if the TSO
> > > > > flag is set or not. Applications can query device TSO capability before
> > > > > they call the GSO library. Do I misunderstand anything?
> > > > >
> > > > > Additionally, we don't need to check if the packet is a TCP/IPv4 packet
> > here?
> > > >
> > > > Well, right now  PMD we doesn't rely on ptype to figure out what type of
> > packet and
> > > > what TX offload have to be performed.
> > > > Instead it looks at TX part of ol_flags, and
> > > > My thought was that as what we doing is actually TSO in SW, it would be
> > good
> > > > to use the same API here too.
> > > > Also with that approach, by setting ol_flags properly user can use the
> > same gso_ctx and still
> > > > specify what segmentation to perform on a per-packet basis.
> > > >
> > > > Alternative way is to rely on ptype to distinguish should segmentation be
> > performed on that package or not.
> > > > The only advantage I see here is that if someone would like to add GSO
> > for some new protocol,
> > > > he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> > > > Though he still would need to update TX_OFFLOAD_* capabilities and
> > probably packet_type definitions.
> > > >
> > > > So from my perspective first variant (use HW TSO API) is more plausible.
> > > > Wonder what is your and Mark opinions here?
> > >
> > > In the first choice, you mean:
> > > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
> > specific GSO
> > > segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for
> > each input packet.
> > > Applications should parse the packet type, and set an exactly correct
> > DEV_TX_OFFLOAD_*_TSO
> > > flag to gso_types and ol_flags according to the packet type. That is, the
> > value of gso_types
> > > is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags at
> > the same time
> > > is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and the
> > inner L4 type, and
> > > we need to know L3 type by ol_flags. With this design, HW segmentation
> > and SW segmentation
> > > are indeed consistent.
> > >
> > > If I understand it correctly, applications need to set 'ol_flags =
> > PKT_TX_IPV4' and
> > > 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
> > "ether+ipv4+udp+vxlan+ether+ipv4+
> > > tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type for
> > tunneled packet.
> > > How about the outer L3 type? Always assume the inner and the outer L3
> > type are the same?
> >
> > I think that for that case you'll have to set in ol_flags:
> >
> > PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
> > PKT_TX_TCP_SEG
> 
> OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
> GSO library doesn't need gso_types anymore.

You still might need gso_ctx.gso_types to let the user limit which types of segmentation
that particular gso_ctx supports.
An alternative would be to assume that each gso_ctx supports all
currently implemented segmentations.
This is possible too, but probably not very convenient for the user.
Konstantin
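
A rough sketch of that first option, where gso_types acts as a per-context allow-list. The struct and the DEMO_* values below are simplified stand-ins invented for illustration; they are not the real rte_gso_ctx layout or the real DEV_TX_OFFLOAD_* values.

```c
#include <stdint.h>

/* Stand-in capability bits for illustration only. */
#define DEMO_OFFLOAD_TCP_TSO       (1u << 0)
#define DEMO_OFFLOAD_VXLAN_TNL_TSO (1u << 1)

/* Simplified, hypothetical stand-in for the GSO context. */
struct demo_gso_ctx {
	uint32_t gso_types; /* segmentation types the user enabled */
	uint16_t gso_size;  /* max output segment length, CRC excluded */
};

/*
 * Returns 1 if this context allows the requested segmentation type,
 * 0 otherwise. A caller would pass the packet through unsegmented
 * when this returns 0.
 */
static int
demo_ctx_allows(const struct demo_gso_ctx *ctx, uint32_t wanted_type)
{
	return (ctx->gso_types & wanted_type) != 0;
}
```

With this shape, one context can enable only plain TCP segmentation while another also enables the tunneled variant, and ol_flags still selects the segmentation per packet.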

> 
> The first choice makes HW and SW segmentation totally the same.
> Applications just need to parse the packet and set proper ol_flags, and
> the GSO library uses ol_flags to decide which segmentation function to use.
> I think it's better than the second choice, which depends on ptype to
> choose the segmentation function.
> 
> Jiayu
> >
> > Konstantin
> >
> > >
> > > Jiayu
> > > > Konstantin
  
Mark Kavanagh Sept. 14, 2017, 9:35 a.m. UTC | #20
>From: Ananyev, Konstantin
>Sent: Thursday, September 14, 2017 10:11 AM
>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
><jiayu.hu@intel.com>
>Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>
>
>> -----Original Message-----
>> From: Kavanagh, Mark B
>> Sent: Thursday, September 14, 2017 10:01 AM
>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
><jiayu.hu@intel.com>
>> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>>
>> >From: Ananyev, Konstantin
>> >Sent: Thursday, September 14, 2017 9:40 AM
>> >To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
>> ><jiayu.hu@intel.com>
>> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >
>> >
>> >
>> >> -----Original Message-----
>> >> From: Kavanagh, Mark B
>> >> Sent: Thursday, September 14, 2017 9:35 AM
>> >> To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
>> ><konstantin.ananyev@intel.com>
>> >> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >>
>> >> >From: Hu, Jiayu
>> >> >Sent: Thursday, September 14, 2017 2:00 AM
>> >> >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
>> >> ><mark.b.kavanagh@intel.com>
>> >> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >
>> >> >Hi Konstantin,
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Ananyev, Konstantin
>> >> >> Sent: Wednesday, September 13, 2017 11:13 PM
>> >> >> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
>> >> >> <jiayu.hu@intel.com>
>> >> >> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >>
>> >> >> Hi Mark,
>> >> >>
>> >> >> > -----Original Message-----
>> >> >> > From: Kavanagh, Mark B
>> >> >> > Sent: Wednesday, September 13, 2017 3:52 PM
>> >> >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
>> >> >> <jiayu.hu@intel.com>
>> >> >> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>> >> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >> >
>> >> >> > >From: Ananyev, Konstantin
>> >> >> > >Sent: Wednesday, September 13, 2017 10:38 AM
>> >> >> > >To: Hu, Jiayu <jiayu.hu@intel.com>
>> >> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
>> >> >> Tan, Jianfeng
>> >> >> > ><jianfeng.tan@intel.com>
>> >> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >> > > +
>> >> >> > >> > > +int
>> >> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
>> >> >> > >> > > +		uint16_t gso_size,
>> >> >> > >> > > +		uint8_t ipid_delta,
>> >> >> > >> > > +		struct rte_mempool *direct_pool,
>> >> >> > >> > > +		struct rte_mempool *indirect_pool,
>> >> >> > >> > > +		struct rte_mbuf **pkts_out,
>> >> >> > >> > > +		uint16_t nb_pkts_out)
>> >> >> > >> > > +{
>> >> >> > >> > > +	struct ipv4_hdr *ipv4_hdr;
>> >> >> > >> > > +	uint16_t tcp_dl;
>> >> >> > >> > > +	uint16_t pyld_unit_size;
>> >> >> > >> > > +	uint16_t hdr_offset;
>> >> >> > >> > > +	int ret = 1;
>> >> >> > >> > > +
>> >> >> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
>char *)
>> >> >> +
>> >> >> > >> > > +			pkt->l2_len);
>> >> >> > >> > > +	/* Don't process the fragmented packet */
>> >> >> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset &
>rte_cpu_to_be_16(
>> >> >> > >> > > +
>> >> >> 	IPV4_HDR_DF_MASK)) == 0)) {
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> > It is not a check for fragmented packet - it is a check that
>> >> >> fragmentation
>> >> >> > >is allowed for that packet.
>> >> >> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
>> >> >> > >
>> >> >> > >DF bit doesn't indicate is packet fragmented or not.
>> >> >> > >It forbids to fragment packet any further.
>> >> >> > >To check is packet already fragmented or not, you have to check MF
>bit
>> >> >> and
>> >> >> > >frag_offset.
>> >> >> > >Both have to be zero for un-fragmented packets.
>> >> >> > >
>> >> >> > >>
>> >> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF
>> >bit.
>> >> >> It's a
>> >> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian
>> >order.
>> >> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
>> >> >> > >rte_cpu_to_be_16(
>> >> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is
>> >fragmented.
>> >> >> > >>
>> >> >> > >> >
>> >> >> > >> > > +		pkts_out[0] = pkt;
>> >> >> > >> > > +		return ret;
>> >> >> > >> > > +	}
>> >> >> > >> > > +
>> >> >> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
>> >> >> >l3_len -
>> >> >> > >> > > +		pkt->l4_len;
>> >> >> > >> >
>> >> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len -
>pkt_l4_len?
>> >> >> > >>
>> >> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len -
>pkt_l4_len
>> >> >here.
>> >> >> > >>
>> >> >> > >> >
>> >> >> > >> > > +	/* Don't process the packet without data */
>> >> >> > >> > > +	if (unlikely(tcp_dl == 0)) {
>> >> >> > >> > > +		pkts_out[0] = pkt;
>> >> >> > >> > > +		return ret;
>> >> >> > >> > > +	}
>> >> >> > >> > > +
>> >> >> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
>> >> >> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
>> >> >> > >> >
>> >> >> > >> > Hmm, why do we need to count CRC_LEN here?
>> >> >> > >>
>> >> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
>> >> >> > >> included in gso_size.
>> >> >> > >
>> >> >> > >Why?
>> >> >> > >What is the point to account crc len into this computation?
>> >> >> > >Why not just assume that gso_size is already a max_frame_size -
>> >crc_len
>> >> >> > >As I remember, when we RX packet crc bytes will be already
>stripped,
>> >> >> > >when user populates the packet, he doesn't care about crc bytes
>too.
>> >> >> >
>> >> >> > Hi Konstantin,
>> >> >> >
>> >> >> > When packet is tx'd, the 4B for CRC are added back into the packet;
>if
>> >the
>> >> >> payload is already at max capacity, then the actual segment size
>> >> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
>> >> >> > To prevent that from happening, we account for the CRC len in this
>> >> >> calculation.
>> >> >>
>> >> >>
>> >> >> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether
>frame
>> >> >> size without crc bytes */
>> >> >> ?
>> >>
>> >> Hey Konstantin,
>> >>
>> >> If the user sets the gso_size to 1514, the resultant output segments'
>size
>> >should be 1514, and not 1518.
>>
>> Just to clarify - I meant here that the final output segment, including CRC
>len, should be 1514. I think this is where we're crossing wires ;)
>>
>> >
>> >Yes and then NIC HW will add CRC bytes for you.
>> >You are not filling CRC bytes in HW, and when providing to the HW size to
>send
>> >- it is a payload size
>> >(CRC bytes are not accounted).
>> >Konstantin
>>
>> Yes, exactly - in that case though, the gso_size specified by the user is
>not the actual final output segment size, but (segment size - 4B),
>> right?
>
>CRC bytes will be add by HW, it is totally transparent for user.

Yes - I completely agree/understand.

>
>>
>> We can set that expectation in documentation, but from an
>application's/user's perspective, do you think that this might be
>> confusing/misleading?
>
>I think it would be much more confusing to make user account for CRC bytes.
>Let say when in DPDK you form a packet and send it out via rte_eth_tx_burst()
>you specify only your payload size, not payload size plus crc bytes that HW
>will add for you.
>Konstantin

I guess I've just been looking at it from a different perspective (i.e. the user wants to decide the final total packet size); using the example of rte_eth_tx_burst above, I see where you're coming from though.
Thanks for clarifying,
Mark
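
For reference, the arithmetic agreed on here (gso_size excludes the CRC, so ETHER_CRC_LEN is not subtracted) can be sketched as below. The function names are hypothetical and the guard against an undersized gso_size is an assumption of this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Payload bytes carried by each output segment. gso_size is the frame
 * length without CRC, so only the L2/L3/L4 headers are subtracted.
 * Returns 0 if gso_size cannot hold the headers plus at least one byte.
 */
static uint16_t
sketch_pyld_unit_size(uint16_t gso_size, uint16_t l2_len,
		uint16_t l3_len, uint16_t l4_len)
{
	uint32_t hdr_offset = (uint32_t)l2_len + l3_len + l4_len;

	if (gso_size <= hdr_offset)
		return 0;
	return (uint16_t)(gso_size - hdr_offset);
}

/* Number of output segments needed for payload_len bytes (ceiling division). */
static uint32_t
sketch_nb_segments(uint32_t payload_len, uint16_t pyld_unit_size)
{
	if (pyld_unit_size == 0)
		return 0;
	return (payload_len + pyld_unit_size - 1) / pyld_unit_size;
}
```

For example, gso_size = 1514 with 14B Ethernet, 20B IPv4 and 20B TCP headers leaves 1460B of payload per segment; the NIC then appends the 4B CRC, giving 1518B on the wire.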

>
>>
>> Thanks again,
>> Mark
>>
>> >
>> > Consequently, the payload capacity
>> >> of each segment would be reduced accordingly.
>> >> The user only cares about the output segment size (i.e.
>gso_ctx.gso_size);
>> >we need to ensure that the size of the segments that are
>> >> produced is consistent with that. As a result, we need to ensure that any
>> >packet overhead is accounted for in the segment size, before we
>> >> can determine how much space remains for data.
>> >>
>> >> Hope this makes sense.
>> >>
>> >> Thanks,
>> >> Mark
>> >>
>> >> >
>> >> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if the
>> >lower
>> >> >layer
>> >> >will add CRC to the packet.
>> >> >
>> >> >Jiayu
>> >> >
>> >> >> Konstantin
>> >> >>
>> >> >> >
>> >> >> > If I've missed anything, please do let me know!
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Mark
>> >> >> >
>> >> >> > >
>> >> >> > >Konstantin
  
Hu, Jiayu Sept. 14, 2017, 9:45 a.m. UTC | #21
Hi Mark,

> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Thursday, September 14, 2017 4:52 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> >From: Hu, Jiayu
> >Sent: Thursday, September 14, 2017 7:07 AM
> >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> >Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng
> ><jianfeng.tan@intel.com>
> >Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >Hi Konstantin,
> >
> >On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> >>
> >> Hi Jiayu,
> >>
> >> > >
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Ananyev, Konstantin
> >> > > > Sent: Tuesday, September 12, 2017 12:18 PM
> >> > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> >> > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianfeng
> ><jianfeng.tan@intel.com>
> >> > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> > > >
> >> > > > > result, when all of its GSOed segments are freed, the packet is
> >freed
> >> > > > > automatically.
> >> > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> >> > > > > index dda50ee..95f6ea6 100644
> >> > > > > --- a/lib/librte_gso/rte_gso.c
> >> > > > > +++ b/lib/librte_gso/rte_gso.c
> >> > > > > @@ -33,18 +33,53 @@
> >> > > > >
> >> > > > >  #include <errno.h>
> >> > > > >
> >> > > > > +#include <rte_log.h>
> >> > > > > +
> >> > > > >  #include "rte_gso.h"
> >> > > > > +#include "gso_common.h"
> >> > > > > +#include "gso_tcp4.h"
> >> > > > >
> >> > > > >  int
> >> > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> >> > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> >> > > > > +		struct rte_gso_ctx gso_ctx,
> >> > > > >  		struct rte_mbuf **pkts_out,
> >> > > > >  		uint16_t nb_pkts_out)
> >> > > > >  {
> >> > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> >> > > > > +	struct rte_mbuf *pkt_seg;
> >> > > > > +	uint16_t gso_size;
> >> > > > > +	uint8_t ipid_delta;
> >> > > > > +	int ret = 1;
> >> > > > > +
> >> > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> >> > > > >  		return -EINVAL;
> >> > > > >
> >> > > > > -	pkts_out[0] = pkt;
> >> > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> >> > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> >> > > > > +			pkt->packet_type) {
> >> > > > > +		pkts_out[0] = pkt;
> >> > > > > +		return ret;
> >> > > > > +	}
> >> > > > > +
> >> > > > > +	direct_pool = gso_ctx.direct_pool;
> >> > > > > +	indirect_pool = gso_ctx.indirect_pool;
> >> > > > > +	gso_size = gso_ctx.gso_size;
> >> > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> >> > > > > +
> >> > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> >> > > >
> >> > > > Probably we need here:
> >> > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
> >DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> >> > >
> >> > > Sorry, actually it probably should be:
> >> > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4
> &&
> >> > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> >> >
> >> > I don't quite understand why the GSO library should be aware if the TSO
> >> > flag is set or not. Applications can query device TSO capability before
> >> > they call the GSO library. Do I misundertsand anything?
> >> >
> >> > Additionally, we don't need to check if the packet is a TCP/IPv4 packet
> >here?
> >>
> >> Well, right now  PMD we doesn't rely on ptype to figure out what type of
> >packet and
> >> what TX offload have to be performed.
> >> Instead it looks at TX part of ol_flags, and
> >> My thought was that as what we doing is actually TSO in SW, it would be
> good
> >> to use the same API here too.
> >> Also with that approach, by setting ol_flags properly user can use the
> same
> >gso_ctx and still
> >> specify what segmentation to perform on a per-packet basis.
> >>
> >> Alternative way is to rely on ptype to distinguish should segmentation be
> >performed on that package or not.
> >> The only advantage I see here is that if someone would like to add GSO
> for
> >some new protocol,
> >> he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> >> Though he still would need to update TX_OFFLOAD_* capabilities and
> probably
> >packet_type definitions.
> >>
> >> So from my perspective first variant (use HW TSO API) is more plausible.
> >> Wonder what is your and Mark opinions here?
> >
> >In the first choice, you mean:
> >the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
> specific
> >GSO
> >segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for
> each
> >input packet.
> >Applications should parse the packet type, and set an exactly correct
> >DEV_TX_OFFLOAD_*_TSO
> >flag to gso_types and ol_flags according to the packet type. That is, the
> >value of gso_types
> >is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags at
> the
> >same time
> >is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and the
> inner
> >L4 type, and
> >we need to know L3 type by ol_flags. With this design, HW segmentation
> and SW
> >segmentation
> >are indeed consistent.
> >
> >If I understand it correctly, applications need to set 'ol_flags =
> >PKT_TX_IPV4' and
> >'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
> >"ether+ipv4+udp+vxlan+ether+ipv4+
> >tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type for
> >tunneled packet.
> >How about the outer L3 type? Always assume the inner and the outer L3
> type are
> >the same?
> 
> Hi Jiayu,
> 
> If I'm not mistaken, I think what Konstantin is suggesting is as follows:
> 
> - The DEV_TX_OFFLOAD_*_TSO flags are currently used to describe a NIC's
> TSO capabilities; the GSO capabilities may also be described using the same
> macros, to provide a consistent view of segmentation capabilities across the
> HW and SW implementations.

Yes, the DEV_TX_OFFLOAD_*_TSO flags stored in gso_types are used by applications
to tell the GSO library what GSO types are required. The GSO library uses ol_flags
to decide which segmentation function to use.

Thanks,
Jiayu
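
A rough sketch of the ol_flags-based selection summarised above, assuming the flag combination Konstantin listed for the VxLAN case. The SKETCH_* bits and the enum are hypothetical placeholders; the real PKT_TX_* values differ. Note that the mask has to be parenthesised, since in C `==` binds tighter than `&`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for PKT_TX_*; real values differ. */
#define SKETCH_TX_TCP_SEG      (UINT64_C(1) << 0)
#define SKETCH_TX_IPV4         (UINT64_C(1) << 1)
#define SKETCH_TX_OUTER_IPV4   (UINT64_C(1) << 2)
#define SKETCH_TX_TUNNEL_VXLAN (UINT64_C(1) << 3)

enum sketch_seg_kind {
	SKETCH_SEG_NONE,       /* pass the packet through unsegmented */
	SKETCH_SEG_TCP4,       /* plain TCP/IPv4 */
	SKETCH_SEG_VXLAN_TCP4  /* VxLAN-tunneled TCP/IPv4 */
};

/* Pick a segmentation routine from the TX offload flags alone. */
static enum sketch_seg_kind
sketch_classify(uint64_t ol_flags)
{
	const uint64_t tcp4 = SKETCH_TX_TCP_SEG | SKETCH_TX_IPV4;
	const uint64_t vxlan_tcp4 = tcp4 | SKETCH_TX_OUTER_IPV4 |
		SKETCH_TX_TUNNEL_VXLAN;

	/* Test the more specific (tunneled) combination first. */
	if ((ol_flags & vxlan_tcp4) == vxlan_tcp4)
		return SKETCH_SEG_VXLAN_TCP4;
	if ((ol_flags & tcp4) == tcp4)
		return SKETCH_SEG_TCP4;
	return SKETCH_SEG_NONE;
}
```

The result would then still be checked against gso_ctx.gso_types before any segmentation function is called.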
> 
> - As part of segmentation, it's still a case of checking the packet type, but
> then setting the appropriate ol_flags in the mbuf, which the GSO library can
> use to segment the packet.
> 
> Thanks,
> Mark
> 
> >
> >Jiayu
> >> Konstantin
  
Hu, Jiayu Sept. 14, 2017, 10:01 a.m. UTC | #22
Hi Konstantin and Mark,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, September 14, 2017 5:36 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> 
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Thursday, September 14, 2017 10:29 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, September 14, 2017 4:47 PM
> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> Tan,
> > > Jianfeng <jianfeng.tan@intel.com>
> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > > Hi Jiayu,
> > >
> > > > -----Original Message-----
> > > > From: Hu, Jiayu
> > > > Sent: Thursday, September 14, 2017 7:07 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> Tan,
> > > Jianfeng <jianfeng.tan@intel.com>
> > > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > > Hi Jiayu,
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ananyev, Konstantin
> > > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> Jianfeng
> > > <jianfeng.tan@intel.com>
> > > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > > > > > >
> > > > > > > > > result, when all of its GSOed segments are freed, the packet is
> > > freed
> > > > > > > > > automatically.
> > > > > > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
> > > > > > > > > index dda50ee..95f6ea6 100644
> > > > > > > > > --- a/lib/librte_gso/rte_gso.c
> > > > > > > > > +++ b/lib/librte_gso/rte_gso.c
> > > > > > > > > @@ -33,18 +33,53 @@
> > > > > > > > >
> > > > > > > > >  #include <errno.h>
> > > > > > > > >
> > > > > > > > > +#include <rte_log.h>
> > > > > > > > > +
> > > > > > > > >  #include "rte_gso.h"
> > > > > > > > > +#include "gso_common.h"
> > > > > > > > > +#include "gso_tcp4.h"
> > > > > > > > >
> > > > > > > > >  int
> > > > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > > > > > > > +		struct rte_gso_ctx gso_ctx,
> > > > > > > > >  		struct rte_mbuf **pkts_out,
> > > > > > > > >  		uint16_t nb_pkts_out)
> > > > > > > > >  {
> > > > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > > > > > > > +	struct rte_mbuf *pkt_seg;
> > > > > > > > > +	uint16_t gso_size;
> > > > > > > > > +	uint8_t ipid_delta;
> > > > > > > > > +	int ret = 1;
> > > > > > > > > +
> > > > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> > > > > > > > >  		return -EINVAL;
> > > > > > > > >
> > > > > > > > > -	pkts_out[0] = pkt;
> > > > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > > > > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> > > > > > > > > +			pkt->packet_type) {
> > > > > > > > > +		pkts_out[0] = pkt;
> > > > > > > > > +		return ret;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	direct_pool = gso_ctx.direct_pool;
> > > > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > > > > > > > +	gso_size = gso_ctx.gso_size;
> > > > > > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> > > > > > > > > +
> > > > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > > > > > > >
> > > > > > > > Probably we need here:
> > > > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
> > > DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > > > > >
> > > > > > > Sorry, actually it probably should be:
> > > > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) ==
> PKT_TX_IPV4
> > > &&
> > > > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > > > >
> > > > > > I don't quite understand why the GSO library should be aware if the
> TSO
> > > > > > flag is set or not. Applications can query device TSO capability
> before
> > > > > > they call the GSO library. Do I misundertsand anything?
> > > > > >
> > > > > > Additionally, we don't need to check if the packet is a TCP/IPv4
> packet
> > > here?
> > > > >
> > > > > Well, right now  PMD we doesn't rely on ptype to figure out what type
> of
> > > packet and
> > > > > what TX offload have to be performed.
> > > > > Instead it looks at TX part of ol_flags, and
> > > > > My thought was that as what we doing is actually TSO in SW, it would
> be
> > > good
> > > > > to use the same API here too.
> > > > > Also with that approach, by setting ol_flags properly user can use the
> > > same gso_ctx and still
> > > > > specify what segmentation to perform on a per-packet basis.
> > > > >
> > > > > Alternative way is to rely on ptype to distinguish should segmentation
> be
> > > performed on that package or not.
> > > > > The only advantage I see here is that if someone would like to add
> GSO
> > > for some new protocol,
> > > > > he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> > > > > Though he still would need to update TX_OFFLOAD_* capabilities and
> > > probably packet_type definitions.
> > > > >
> > > > > So from my perspective first variant (use HW TSO API) is more
> plausible.
> > > > > Wonder what is your and Mark opinions here?
> > > >
> > > > In the first choice, you mean:
> > > > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
> > > specific GSO
> > > > segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for
> > > each input packet.
> > > > Applications should parse the packet type, and set an exactly correct
> > > DEV_TX_OFFLOAD_*_TSO
> > > > flag to gso_types and ol_flags according to the packet type. That is, the
> > > value of gso_types
> > > > is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags
> at
> > > the same time
> > > > is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and
> the
> > > inner L4 type, and
> > > > we need to know L3 type by ol_flags. With this design, HW
> segmentation
> > > and SW segmentation
> > > > are indeed consistent.
> > > >
> > > > If I understand it correctly, applications need to set 'ol_flags =
> > > PKT_TX_IPV4' and
> > > > 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
> > > "ether+ipv4+udp+vxlan+ether+ipv4+
> > > > tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type for
> > > tunneled packet.
> > > > How about the outer L3 type? Always assume the inner and the outer L3
> > > type are the same?
> > >
> > > It think that for that case you'll have to set in ol_flags:
> > >
> > > PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
> > > PKT_TX_TCP_SEG
> >
> > OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
> > GSO library doesn't need gso_types anymore.
> 
> You still might need gso_ctx.gso_types to let user limit what types of
> segmentation
> that particular gso_ctx supports.
> An alternative would be to assume that each gso_ctx supports all
> currently implemented segmentations.
> This is possible too, but probably not very convenient to the user.

Hmm, makes sense.

One thing to confirm: the value of gso_types should be DEV_TX_OFFLOAD_*_TSO,
or new macros?

Jiayu
> Konstantin
> 
> >
> > The first choice makes HW and SW segmentation are totally the same.
> > Applications just need to parse the packet and set proper ol_flags, and
> > the GSO library uses ol_flags to decide which segmentation function to use.
> > I think it's better than the second choice which depending on ptype to
> > choose segmentation function.
> >
> > Jiayu
> > >
> > > Konstantin
> > >
> > > >
> > > > Jiayu
> > > > > Konstantin
  
Mark Kavanagh Sept. 14, 2017, 3:42 p.m. UTC | #23
>From: Hu, Jiayu
>Sent: Thursday, September 14, 2017 11:01 AM
>To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
><mark.b.kavanagh@intel.com>
>Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
>Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>Hi Konstantin and Mark,
>
>> -----Original Message-----
>> From: Ananyev, Konstantin
>> Sent: Thursday, September 14, 2017 5:36 PM
>> To: Hu, Jiayu <jiayu.hu@intel.com>
>> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
>> Jianfeng <jianfeng.tan@intel.com>
>> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>>
>>
>>
>> > -----Original Message-----
>> > From: Hu, Jiayu
>> > Sent: Thursday, September 14, 2017 10:29 AM
>> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
>> Jianfeng <jianfeng.tan@intel.com>
>> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >
>> > Hi Konstantin,
>> >
>> > > -----Original Message-----
>> > > From: Ananyev, Konstantin
>> > > Sent: Thursday, September 14, 2017 4:47 PM
>> > > To: Hu, Jiayu <jiayu.hu@intel.com>
>> > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
>> Tan,
>> > > Jianfeng <jianfeng.tan@intel.com>
>> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> > >
>> > > Hi Jiayu,
>> > >
>> > > > -----Original Message-----
>> > > > From: Hu, Jiayu
>> > > > Sent: Thursday, September 14, 2017 7:07 AM
>> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>> > > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
>> Tan,
>> > > Jianfeng <jianfeng.tan@intel.com>
>> > > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> > > >
>> > > > Hi Konstantin,
>> > > >
>> > > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
>> > > > >
>> > > > > Hi Jiayu,
>> > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > > -----Original Message-----
>> > > > > > > > From: Ananyev, Konstantin
>> > > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
>> > > > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
>> > > > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
>> Jianfeng
>> > > <jianfeng.tan@intel.com>
>> > > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> > > > > > > >
>> > > > > > > > > result, when all of its GSOed segments are freed, the packet
>is
>> > > freed
>> > > > > > > > > automatically.
>> > > > > > > > > diff --git a/lib/librte_gso/rte_gso.c
>b/lib/librte_gso/rte_gso.c
>> > > > > > > > > index dda50ee..95f6ea6 100644
>> > > > > > > > > --- a/lib/librte_gso/rte_gso.c
>> > > > > > > > > +++ b/lib/librte_gso/rte_gso.c
>> > > > > > > > > @@ -33,18 +33,53 @@
>> > > > > > > > >
>> > > > > > > > >  #include <errno.h>
>> > > > > > > > >
>> > > > > > > > > +#include <rte_log.h>
>> > > > > > > > > +
>> > > > > > > > >  #include "rte_gso.h"
>> > > > > > > > > +#include "gso_common.h"
>> > > > > > > > > +#include "gso_tcp4.h"
>> > > > > > > > >
>> > > > > > > > >  int
>> > > > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
>> > > > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
>> > > > > > > > > +		struct rte_gso_ctx gso_ctx,
>> > > > > > > > >  		struct rte_mbuf **pkts_out,
>> > > > > > > > >  		uint16_t nb_pkts_out)
>> > > > > > > > >  {
>> > > > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
>> > > > > > > > > +	struct rte_mbuf *pkt_seg;
>> > > > > > > > > +	uint16_t gso_size;
>> > > > > > > > > +	uint8_t ipid_delta;
>> > > > > > > > > +	int ret = 1;
>> > > > > > > > > +
>> > > > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
>> > > > > > > > >  		return -EINVAL;
>> > > > > > > > >
>> > > > > > > > > -	pkts_out[0] = pkt;
>> > > > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
>> > > > > > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
>> > > > > > > > > +			pkt->packet_type) {
>> > > > > > > > > +		pkts_out[0] = pkt;
>> > > > > > > > > +		return ret;
>> > > > > > > > > +	}
>> > > > > > > > > +
>> > > > > > > > > +	direct_pool = gso_ctx.direct_pool;
>> > > > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
>> > > > > > > > > +	gso_size = gso_ctx.gso_size;
>> > > > > > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
>> > > > > > > > > +
>> > > > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
>> > > > > > > >
>> > > > > > > > Probably we need here:
>> > > > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
>> > > DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
>> > > > > > >
>> > > > > > > Sorry, actually it probably should be:
>> > > > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) ==
>> PKT_TX_IPV4
>> > > &&
>> > > > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
>> > > > > >
>> > > > > > I don't quite understand why the GSO library should be aware if
>the
>> TSO
>> > > > > > flag is set or not. Applications can query device TSO capability
>> before
>> > > > > > they call the GSO library. Do I misundertsand anything?
>> > > > > >
>> > > > > > Additionally, we don't need to check if the packet is a TCP/IPv4
>> packet
>> > > here?
>> > > > >
>> > > > > Well, right now  PMD we doesn't rely on ptype to figure out what
>type
>> of
>> > > packet and
>> > > > > what TX offload have to be performed.
>> > > > > Instead it looks at TX part of ol_flags, and
>> > > > > My thought was that as what we doing is actually TSO in SW, it would
>> be
>> > > good
>> > > > > to use the same API here too.
>> > > > > Also with that approach, by setting ol_flags properly user can use
>the
>> > > same gso_ctx and still
>> > > > > specify what segmentation to perform on a per-packet basis.
>> > > > >
>> > > > > Alternative way is to rely on ptype to distinguish should
>segmentation
>> be
>> > > performed on that package or not.
>> > > > > The only advantage I see here is that if someone would like to add
>> GSO
>> > > for some new protocol,
>> > > > > he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
>> > > > > Though he still would need to update TX_OFFLOAD_* capabilities and
>> > > probably packet_type definitions.
>> > > > >
>> > > > > So from my perspective first variant (use HW TSO API) is more
>> plausible.
>> > > > > Wonder what is your and Mark opinions here?
>> > > >
>> > > > In the first choice, you mean:
>> > > > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
>> > > specific GSO
>> > > > segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for
>> > > each input packet.
>> > > > Applications should parse the packet type, and set an exactly correct
>> > > DEV_TX_OFFLOAD_*_TSO
>> > > > flag to gso_types and ol_flags according to the packet type. That is,
>the
>> > > value of gso_types
>> > > > is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags
>> at
>> > > the same time
>> > > > is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and
>> the
>> > > inner L4 type, and
>> > > > we need to know L3 type by ol_flags. With this design, HW
>> segmentation
>> > > and SW segmentation
>> > > > are indeed consistent.
>> > > >
>> > > > If I understand it correctly, applications need to set 'ol_flags =
>> > > PKT_TX_IPV4' and
>> > > > 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
>> > > "ether+ipv4+udp+vxlan+ether+ipv4+
>> > > > tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type
>for
>> > > tunneled packet.
>> > > > How about the outer L3 type? Always assume the inner and the outer L3
>> > > type are the same?
>> > >
>> > > It think that for that case you'll have to set in ol_flags:
>> > >
>> > > PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
>> > > PKT_TX_TCP_SEG
>> >
>> > OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
>> > GSO library doesn't need gso_types anymore.
>>
>> You still might need gso_ctx.gso_types to let user limit what types of
>> segmentation
>> that particular gso_ctx supports.
>> An alternative would be to assume that each gso_ctx supports all
>> currently implemented segmentations.
>> This is possible too, but probably not very convenient to the user.
>
>Hmm, make sense.
>
>One thing to confirm: the value of gso_types should be DEV_TX_OFFLOAD_*_TSO,
>or new macros?

Hi Jiayu, Konstantin,

I think that the existing macros are fine, as they provide a consistent view of segmentation capabilities to the application/user.

I was initially concerned that they might be too coarse-grained (i.e. only IPv4 is currently supported, and not IPv6), but as per Konstantin's previous example, the DEV_TX_OFFLOAD_*_TSO macros can be used in concert with the packet type to determine whether a packet should be fragmented or not.

Thanks,
Mark

>
>Jiayu
>> Konstantin
>>
>> >
>> > The first choice makes HW and SW segmentation are totally the same.
>> > Applications just need to parse the packet and set proper ol_flags, and
>> > the GSO library uses ol_flags to decide which segmentation function to
>use.
>> > I think it's better than the second choice which depending on ptype to
>> > choose segmentation function.
>> >
>> > Jiayu
>> > >
>> > > Konstantin
>> > >
>> > > >
>> > > > Jiayu
>> > > > > Konstantin
  
Ananyev, Konstantin Sept. 14, 2017, 6:38 p.m. UTC | #24
> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Thursday, September 14, 2017 4:42 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> >From: Hu, Jiayu
> >Sent: Thursday, September 14, 2017 11:01 AM
> >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B
> ><mark.b.kavanagh@intel.com>
> >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >Hi Konstantin and Mark,
> >
> >> -----Original Message-----
> >> From: Ananyev, Konstantin
> >> Sent: Thursday, September 14, 2017 5:36 PM
> >> To: Hu, Jiayu <jiayu.hu@intel.com>
> >> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> >> Jianfeng <jianfeng.tan@intel.com>
> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: Hu, Jiayu
> >> > Sent: Thursday, September 14, 2017 10:29 AM
> >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> >> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> >> Jianfeng <jianfeng.tan@intel.com>
> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >
> >> > Hi Konstantin,
> >> >
> >> > > -----Original Message-----
> >> > > From: Ananyev, Konstantin
> >> > > Sent: Thursday, September 14, 2017 4:47 PM
> >> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> >> > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> >> Tan,
> >> > > Jianfeng <jianfeng.tan@intel.com>
> >> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> > >
> >> > > Hi Jiayu,
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Hu, Jiayu
> >> > > > Sent: Thursday, September 14, 2017 7:07 AM
> >> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> >> > > > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> >> Tan,
> >> > > Jianfeng <jianfeng.tan@intel.com>
> >> > > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> > > >
> >> > > > Hi Konstantin,
> >> > > >
> >> > > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote:
> >> > > > >
> >> > > > > Hi Jiayu,
> >> > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > > -----Original Message-----
> >> > > > > > > > From: Ananyev, Konstantin
> >> > > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> >> > > > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> >> > > > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> >> Jianfeng
> >> > > <jianfeng.tan@intel.com>
> >> > > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> > > > > > > >
> >> > > > > > > > > result, when all of its GSOed segments are freed, the packet
> >is
> >> > > freed
> >> > > > > > > > > automatically.
> >> > > > > > > > > diff --git a/lib/librte_gso/rte_gso.c
> >b/lib/librte_gso/rte_gso.c
> >> > > > > > > > > index dda50ee..95f6ea6 100644
> >> > > > > > > > > --- a/lib/librte_gso/rte_gso.c
> >> > > > > > > > > +++ b/lib/librte_gso/rte_gso.c
> >> > > > > > > > > @@ -33,18 +33,53 @@
> >> > > > > > > > >
> >> > > > > > > > >  #include <errno.h>
> >> > > > > > > > >
> >> > > > > > > > > +#include <rte_log.h>
> >> > > > > > > > > +
> >> > > > > > > > >  #include "rte_gso.h"
> >> > > > > > > > > +#include "gso_common.h"
> >> > > > > > > > > +#include "gso_tcp4.h"
> >> > > > > > > > >
> >> > > > > > > > >  int
> >> > > > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> >> > > > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> >> > > > > > > > > +		struct rte_gso_ctx gso_ctx,
> >> > > > > > > > >  		struct rte_mbuf **pkts_out,
> >> > > > > > > > >  		uint16_t nb_pkts_out)
> >> > > > > > > > >  {
> >> > > > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> >> > > > > > > > > +	struct rte_mbuf *pkt_seg;
> >> > > > > > > > > +	uint16_t gso_size;
> >> > > > > > > > > +	uint8_t ipid_delta;
> >> > > > > > > > > +	int ret = 1;
> >> > > > > > > > > +
> >> > > > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
> >> > > > > > > > >  		return -EINVAL;
> >> > > > > > > > >
> >> > > > > > > > > -	pkts_out[0] = pkt;
> >> > > > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> >> > > > > > > > > +			(pkt->packet_type & gso_ctx.gso_types) !=
> >> > > > > > > > > +			pkt->packet_type) {
> >> > > > > > > > > +		pkts_out[0] = pkt;
> >> > > > > > > > > +		return ret;
> >> > > > > > > > > +	}
> >> > > > > > > > > +
> >> > > > > > > > > +	direct_pool = gso_ctx.direct_pool;
> >> > > > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> >> > > > > > > > > +	gso_size = gso_ctx.gso_size;
> >> > > > > > > > > +	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
> >> > > > > > > > > +
> >> > > > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> >> > > > > > > >
> >> > > > > > > > Probably we need here:
> >> > > > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types &
> >> > > DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> >> > > > > > >
> >> > > > > > > Sorry, actually it probably should be:
> >> > > > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) ==
> >> PKT_TX_IPV4
> >> > > &&
> >> > > > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> >> > > > > >
> >> > > > > > I don't quite understand why the GSO library should be aware if
> >the
> >> TSO
> >> > > > > > flag is set or not. Applications can query device TSO capability
> >> before
> >> > > > > > they call the GSO library. Do I misundertsand anything?
> >> > > > > >
> >> > > > > > Additionally, we don't need to check if the packet is a TCP/IPv4
> >> packet
> >> > > here?
> >> > > > >
> >> > > > > Well, right now  PMD we doesn't rely on ptype to figure out what
> >type
> >> of
> >> > > packet and
> >> > > > > what TX offload have to be performed.
> >> > > > > Instead it looks at TX part of ol_flags, and
> >> > > > > My thought was that as what we doing is actually TSO in SW, it would
> >> be
> >> > > good
> >> > > > > to use the same API here too.
> >> > > > > Also with that approach, by setting ol_flags properly user can use
> >the
> >> > > same gso_ctx and still
> >> > > > > specify what segmentation to perform on a per-packet basis.
> >> > > > >
> >> > > > > Alternative way is to rely on ptype to distinguish should
> >segmentation
> >> be
> >> > > performed on that package or not.
> >> > > > > The only advantage I see here is that if someone would like to add
> >> GSO
> >> > > for some new protocol,
> >> > > > > he wouldn't need to introduce new TX flag value for mbuf.ol_flags.
> >> > > > > Though he still would need to update TX_OFFLOAD_* capabilities and
> >> > > probably packet_type definitions.
> >> > > > >
> >> > > > > So from my perspective first variant (use HW TSO API) is more
> >> plausible.
> >> > > > > Wonder what is your and Mark opinions here?
> >> > > >
> >> > > > In the first choice, you mean:
> >> > > > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
> >> > > specific GSO
> >> > > > segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx()) for
> >> > > each input packet.
> >> > > > Applications should parse the packet type, and set an exactly correct
> >> > > DEV_TX_OFFLOAD_*_TSO
> >> > > > flag to gso_types and ol_flags according to the packet type. That is,
> >the
> >> > > value of gso_types
> >> > > > is on a per-packet basis. Using gso_ctx->gso_types and mbuf->ol_flags
> >> at
> >> > > the same time
> >> > > > is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type and
> >> the
> >> > > inner L4 type, and
> >> > > > we need to know L3 type by ol_flags. With this design, HW
> >> segmentation
> >> > > and SW segmentation
> >> > > > are indeed consistent.
> >> > > >
> >> > > > If I understand it correctly, applications need to set 'ol_flags =
> >> > > PKT_TX_IPV4' and
> >> > > > 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
> >> > > "ether+ipv4+udp+vxlan+ether+ipv4+
> >> > > > tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3 type
> >for
> >> > > tunneled packet.
> >> > > > How about the outer L3 type? Always assume the inner and the outer L3
> >> > > type are the same?
> >> > >
> >> > > It think that for that case you'll have to set in ol_flags:
> >> > >
> >> > > PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
> >> > > PKT_TX_TCP_SEG
> >> >
> >> > OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
> >> > GSO library doesn't need gso_types anymore.
> >>
> >> You still might need gso_ctx.gso_types to let user limit what types of
> >> segmentation
> >> that particular gso_ctx supports.
> >> An alternative would be to assume that each gso_ctx supports all
> >> currently implemented segmentations.
> >> This is possible too, but probably not very convenient to the user.
> >
> >Hmm, make sense.
> >
> >One thing to confirm: the value of gso_types should be DEV_TX_OFFLOAD_*_TSO,
> >or new macros?
> 
> Hi Jiayu, Konstantin,
> 
> I think that the existing macros are fine, as they provide a consistent view of segmentation capabilities to the application/user.

+1
I also think it is better to re-use DEV_TX_OFFLOAD_*_TSO.
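
For illustration only (this is not code from the patch), the ol_flags-based
dispatch discussed in this thread, with gso_types acting as a per-context
filter, could look roughly like the sketch below. The sketch_* structs and
function are hypothetical, and the macro values are local stand-ins so that
the snippet builds without the DPDK headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-ins so the sketch compiles without DPDK headers.
 * The names mirror the ones used in this thread; the values are
 * placeholders for illustration only.
 */
#define PKT_TX_TCP_SEG         (1ULL << 50)
#define PKT_TX_IPV4            (1ULL << 55)
#define DEV_TX_OFFLOAD_TCP_TSO 0x00000020u

/* Hypothetical, cut-down versions of rte_mbuf and rte_gso_ctx. */
struct sketch_mbuf {
	uint64_t ol_flags;   /* TX offload requests set by the application */
	uint32_t pkt_len;    /* total packet length in bytes */
};

struct sketch_gso_ctx {
	uint32_t gso_types;  /* segmentation types this context allows */
	uint16_t gso_size;   /* maximum length of an output segment */
};

/* Return 1 if the packet should go through TCP/IPv4 GSO, 0 otherwise. */
static int
sketch_needs_tcp4_gso(const struct sketch_mbuf *pkt,
		const struct sketch_gso_ctx *ctx)
{
	const uint64_t want = PKT_TX_TCP_SEG | PKT_TX_IPV4;

	assert(pkt != NULL && ctx != NULL);

	/* The packet already fits into one segment: nothing to do. */
	if (ctx->gso_size >= pkt->pkt_len)
		return 0;

	/*
	 * The application must have requested TCP segmentation of an
	 * IPv4 packet. Note the parentheses: '==' and '!=' bind more
	 * tightly than '&' in C.
	 */
	if ((pkt->ol_flags & want) != want)
		return 0;

	/* This context must also allow that segmentation type. */
	return (ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0;
}
```

With this shape, one gso_ctx can serve many flows: the application picks
the segmentation per packet via ol_flags, and gso_types only limits what
the context is willing to do.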

> 
> I was initially concerned that they might be too coarse-grained (i.e. only IPv4 is currently supported, and not IPv6), but as per Konstantin's
> previous example, the DEV_TX_OFFLOAD_*_TSO macros can be used in concert with the packet type to determine whether a packet should
> be fragmented or not.
> 
> Thanks,
> Mark
> 
> >
> >Jiayu
> >> Konstantin
> >>
> >> >
> >> > The first choice makes HW and SW segmentation are totally the same.
> >> > Applications just need to parse the packet and set proper ol_flags, and
> >> > the GSO library uses ol_flags to decide which segmentation function to
> >use.
> >> > I think it's better than the second choice which depending on ptype to
> >> > choose segmentation function.
> >> >
> >> > Jiayu
> >> > >
> >> > > Konstantin
> >> > >
> >> > > >
> >> > > > Jiayu
> >> > > > > Konstantin
  
Hu, Jiayu Sept. 15, 2017, 7:54 a.m. UTC | #25
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, September 15, 2017 2:39 AM
> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
> <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> 
> 
> > -----Original Message-----
> > From: Kavanagh, Mark B
> > Sent: Thursday, September 14, 2017 4:42 PM
> > To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > >From: Hu, Jiayu
> > >Sent: Thursday, September 14, 2017 11:01 AM
> > >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh,
> Mark B
> > ><mark.b.kavanagh@intel.com>
> > >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > >Hi Konstantin and Mark,
> > >
> > >> -----Original Message-----
> > >> From: Ananyev, Konstantin
> > >> Sent: Thursday, September 14, 2017 5:36 PM
> > >> To: Hu, Jiayu <jiayu.hu@intel.com>
> > >> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> Tan,
> > >> Jianfeng <jianfeng.tan@intel.com>
> > >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >>
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: Hu, Jiayu
> > >> > Sent: Thursday, September 14, 2017 10:29 AM
> > >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > >> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> Tan,
> > >> Jianfeng <jianfeng.tan@intel.com>
> > >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >> >
> > >> > Hi Konstantin,
> > >> >
> > >> > > -----Original Message-----
> > >> > > From: Ananyev, Konstantin
> > >> > > Sent: Thursday, September 14, 2017 4:47 PM
> > >> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > >> > > Cc: dev@dpdk.org; Kavanagh, Mark B
> <mark.b.kavanagh@intel.com>;
> > >> Tan,
> > >> > > Jianfeng <jianfeng.tan@intel.com>
> > >> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >> > >
> > >> > > Hi Jiayu,
> > >> > >
> > >> > > > -----Original Message-----
> > >> > > > From: Hu, Jiayu
> > >> > > > Sent: Thursday, September 14, 2017 7:07 AM
> > >> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > >> > > > Cc: dev@dpdk.org; Kavanagh, Mark B
> <mark.b.kavanagh@intel.com>;
> > >> Tan,
> > >> > > Jianfeng <jianfeng.tan@intel.com>
> > >> > > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >> > > >
> > >> > > > Hi Konstantin,
> > >> > > >
> > >> > > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin
> wrote:
> > >> > > > >
> > >> > > > > Hi Jiayu,
> > >> > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > > -----Original Message-----
> > >> > > > > > > > From: Ananyev, Konstantin
> > >> > > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > >> > > > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > >> > > > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> > >> Jianfeng
> > >> > > <jianfeng.tan@intel.com>
> > >> > > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >> > > > > > > >
> > >> > > > > > > > > result, when all of its GSOed segments are freed, the
> packet
> > >is
> > >> > > freed
> > >> > > > > > > > > automatically.
> > >> > > > > > > > > diff --git a/lib/librte_gso/rte_gso.c
> > >b/lib/librte_gso/rte_gso.c
> > >> > > > > > > > > index dda50ee..95f6ea6 100644
> > >> > > > > > > > > --- a/lib/librte_gso/rte_gso.c
> > >> > > > > > > > > +++ b/lib/librte_gso/rte_gso.c
> > >> > > > > > > > > @@ -33,18 +33,53 @@
> > >> > > > > > > > >
> > >> > > > > > > > >  #include <errno.h>
> > >> > > > > > > > >
> > >> > > > > > > > > +#include <rte_log.h>
> > >> > > > > > > > > +
> > >> > > > > > > > >  #include "rte_gso.h"
> > >> > > > > > > > > +#include "gso_common.h"
> > >> > > > > > > > > +#include "gso_tcp4.h"
> > >> > > > > > > > >
> > >> > > > > > > > >  int
> > >> > > > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > >> > > > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > >> > > > > > > > > +		struct rte_gso_ctx gso_ctx,
> > >> > > > > > > > >  		struct rte_mbuf **pkts_out,
> > >> > > > > > > > >  		uint16_t nb_pkts_out)
> > >> > > > > > > > >  {
> > >> > > > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > >> > > > > > > > > +	struct rte_mbuf *pkt_seg;
> > >> > > > > > > > > +	uint16_t gso_size;
> > >> > > > > > > > > +	uint8_t ipid_delta;
> > >> > > > > > > > > +	int ret = 1;
> > >> > > > > > > > > +
> > >> > > > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out
> < 1)
> > >> > > > > > > > >  		return -EINVAL;
> > >> > > > > > > > >
> > >> > > > > > > > > -	pkts_out[0] = pkt;
> > >> > > > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > >> > > > > > > > > +			(pkt->packet_type &
> gso_ctx.gso_types) !=
> > >> > > > > > > > > +			pkt->packet_type) {
> > >> > > > > > > > > +		pkts_out[0] = pkt;
> > >> > > > > > > > > +		return ret;
> > >> > > > > > > > > +	}
> > >> > > > > > > > > +
> > >> > > > > > > > > +	direct_pool = gso_ctx.direct_pool;
> > >> > > > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > >> > > > > > > > > +	gso_size = gso_ctx.gso_size;
> > >> > > > > > > > > +	ipid_delta = gso_ctx.ipid_flag ==
> RTE_GSO_IPID_INCREASE;
> > >> > > > > > > > > +
> > >> > > > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > >> > > > > > > >
> > >> > > > > > > > Probably we need here:
> > >> > > > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types
> &
> > >> > > DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > >> > > > > > >
> > >> > > > > > > Sorry, actually it probably should be:
> > >> > > > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) ==
> > >> PKT_TX_IPV4
> > >> > > &&
> > >> > > > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0)
> {...
> > >> > > > > >
> > >> > > > > > I don't quite understand why the GSO library should be aware if
> > >the
> > >> TSO
> > >> > > > > > flag is set or not. Applications can query device TSO capability
> > >> before
> > >> > > > > > they call the GSO library. Do I misundertsand anything?
> > >> > > > > >
> > >> > > > > > Additionally, we don't need to check if the packet is a TCP/IPv4
> > >> packet
> > >> > > here?
> > >> > > > >
> > >> > > > > Well, right now  PMD we doesn't rely on ptype to figure out what
> > >type
> > >> of
> > >> > > packet and
> > >> > > > > what TX offload have to be performed.
> > >> > > > > Instead it looks at TX part of ol_flags, and
> > >> > > > > My thought was that as what we doing is actually TSO in SW, it
> would
> > >> be
> > >> > > good
> > >> > > > > to use the same API here too.
> > >> > > > > Also with that approach, by setting ol_flags properly user can use
> > >the
> > >> > > same gso_ctx and still
> > >> > > > > specify what segmentation to perform on a per-packet basis.
> > >> > > > >
> > >> > > > > Alternative way is to rely on ptype to distinguish should
> > >segmentation
> > >> be
> > >> > > performed on that package or not.
> > >> > > > > The only advantage I see here is that if someone would like to
> add
> > >> GSO
> > >> > > for some new protocol,
> > >> > > > > he wouldn't need to introduce new TX flag value for
> mbuf.ol_flags.
> > >> > > > > Though he still would need to update TX_OFFLOAD_* capabilities
> and
> > >> > > probably packet_type definitions.
> > >> > > > >
> > >> > > > > So from my perspective first variant (use HW TSO API) is more
> > >> plausible.
> > >> > > > > Wonder what is your and Mark opinions here?
> > >> > > >
> > >> > > > In the first choice, you mean:
> > >> > > > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call
> a
> > >> > > specific GSO
> > >> > > > segmentation function (e.g. gso_tcp4_segment(), gso_tunnel_xxx())
> for
> > >> > > each input packet.
> > >> > > > Applications should parse the packet type, and set an exactly
> correct
> > >> > > DEV_TX_OFFLOAD_*_TSO
> > >> > > > flag to gso_types and ol_flags according to the packet type. That is,
> > >the
> > >> > > value of gso_types
> > >> > > > is on a per-packet basis. Using gso_ctx->gso_types and mbuf-
> >ol_flags
> > >> at
> > >> > > the same time
> > >> > > > is because that DEV_TX_OFFLOAD_*_TSO only tells tunnelling type
> and
> > >> the
> > >> > > inner L4 type, and
> > >> > > > we need to know L3 type by ol_flags. With this design, HW
> > >> segmentation
> > >> > > and SW segmentation
> > >> > > > are indeed consistent.
> > >> > > >
> > >> > > > If I understand it correctly, applications need to set 'ol_flags =
> > >> > > PKT_TX_IPV4' and
> > >> > > > 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for a
> > >> > > "ether+ipv4+udp+vxlan+ether+ipv4+
> > >> > > > tcp+payload" packet. But PKT_TX_IPV4 just present the inner L3
> type
> > >for
> > >> > > tunneled packet.
> > >> > > > How about the outer L3 type? Always assume the inner and the
> outer L3
> > >> > > type are the same?
> > >> > >
> > >> > > It think that for that case you'll have to set in ol_flags:
> > >> > >
> > >> > > PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
> > >> > > PKT_TX_TCP_SEG
> > >> >
> > >> > OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
> > >> > GSO library doesn't need gso_types anymore.
> > >>
> > >> You still might need gso_ctx.gso_types to let user limit what types of
> > >> segmentation
> > >> that particular gso_ctx supports.
> > >> An alternative would be to assume that each gso_ctx supports all
> > >> currently implemented segmentations.
> > >> This is possible too, but probably not very convenient to the user.
> > >
> > >Hmm, make sense.
> > >
> > >One thing to confirm: the value of gso_types should be
> DEV_TX_OFFLOAD_*_TSO,
> > >or new macros?
> >
> > Hi Jiayu, Konstantin,
> >
> > I think that the existing macros are fine, as they provide a consistent view
> of segmentation capabilities to the application/user.
> 
> +1
> I also think it is better to re-use DEV_TX_OFFLOAD_*_TSO.

There might be an issue if we use PKT_TX_TCP_SEG to tell the GSO
library whether to segment a packet. Consider an application that only
wants to do GSO and doesn't want to use TSO. It sets
'mbuf->ol_flags = PKT_TX_TCP_SEG' and doesn't set mbuf->tso_segsz.
The GSO library then segments the packet, and all output GSO segments
have the same ol_flags as the input packet (in the current GSO library
design). The output GSO segments are then passed to
rte_eth_tx_prepare(). If the NIC is i40e, its TX prepare function,
i40e_prep_pkts(), checks that mbuf->tso_segsz is within the range
[I40E_MIN_TSO_MSS, I40E_MAX_TSO_MSS] when PKT_TX_TCP_SEG is set. So an
error occurs in this scenario, since tso_segsz is 0.

In fact, setting PKT_TX_TCP_SEG without wanting TSO may confuse the
PMD. One solution is for the GSO library to clear the PKT_TX_TCP_SEG
flag on all GSO segments once segmentation is finished. I'd like to
hear your and Mark's opinions.
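
To make the scenario concrete, here is a minimal sketch, not code from the
patch or from the i40e driver. sketch_tx_prepare_check() only models the
tso_segsz range check described above, and sketch_clear_tso_flag() models
the proposed fix. All sketch_* names, the SKETCH_*_TSO_MSS bounds and the
PKT_TX_TCP_SEG value are assumptions made so the snippet builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in; the bit value is a placeholder for illustration. */
#define PKT_TX_TCP_SEG      (1ULL << 50)

/* Hypothetical MSS bounds standing in for I40E_{MIN,MAX}_TSO_MSS. */
#define SKETCH_MIN_TSO_MSS  256
#define SKETCH_MAX_TSO_MSS  9674

/* Hypothetical, cut-down version of rte_mbuf. */
struct sketch_mbuf {
	uint64_t ol_flags;
	uint16_t tso_segsz;
};

/*
 * Models the TX-prepare check described above: when PKT_TX_TCP_SEG is
 * set, tso_segsz must lie within the MSS bounds.
 * Returns 0 if the packet is accepted, -1 if it is rejected.
 */
static int
sketch_tx_prepare_check(const struct sketch_mbuf *m)
{
	assert(m != NULL);

	if ((m->ol_flags & PKT_TX_TCP_SEG) == 0)
		return 0;
	if (m->tso_segsz < SKETCH_MIN_TSO_MSS ||
			m->tso_segsz > SKETCH_MAX_TSO_MSS)
		return -1;
	return 0;
}

/*
 * Proposed fix: once software segmentation is done, clear the TSO
 * request on every output segment so the PMD does not see it.
 */
static void
sketch_clear_tso_flag(struct sketch_mbuf **segs, uint16_t nb_segs)
{
	uint16_t i;

	assert(segs != NULL || nb_segs == 0);

	for (i = 0; i < nb_segs; i++)
		segs[i]->ol_flags &= ~PKT_TX_TCP_SEG;
}
```

In this model, a GSO output segment that still carries PKT_TX_TCP_SEG with
tso_segsz == 0 is rejected by the check, and is accepted once the flag has
been cleared.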
 
Thanks,
Jiayu
> 
> >
> > I was initially concerned that they might be too coarse-grained (i.e. only
> IPv4 is currently supported, and not IPv6), but as per Konstantin's
> > previous example, the DEV_TX_OFFLOAD_*_TSO macros can be used in
> concert with the packet type to determine whether a packet should
> > be fragmented or not.
> >
> > Thanks,
> > Mark
> >
> > >
> > >Jiayu
> > >> Konstantin
> > >>
> > >> >
> > >> > The first choice makes HW and SW segmentation are totally the same.
> > >> > Applications just need to parse the packet and set proper ol_flags, and
> > >> > the GSO library uses ol_flags to decide which segmentation function to
> > >use.
> > >> > I think it's better than the second choice which depending on ptype to
> > >> > choose segmentation function.
> > >> >
> > >> > Jiayu
> > >> > >
> > >> > > Konstantin
> > >> > >
> > >> > > >
> > >> > > > Jiayu
> > >> > > > > Konstantin
  
Ananyev, Konstantin Sept. 15, 2017, 8:15 a.m. UTC | #26
Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, September 15, 2017 8:55 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, September 15, 2017 2:39 AM
> > To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Hu, Jiayu
> > <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >
> >
> > > -----Original Message-----
> > > From: Kavanagh, Mark B
> > > Sent: Thursday, September 14, 2017 4:42 PM
> > > To: Hu, Jiayu <jiayu.hu@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > > >From: Hu, Jiayu
> > > >Sent: Thursday, September 14, 2017 11:01 AM
> > > >To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh,
> > Mark B
> > > ><mark.b.kavanagh@intel.com>
> > > >Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >
> > > >Hi Konstantin and Mark,
> > > >
> > > >> -----Original Message-----
> > > >> From: Ananyev, Konstantin
> > > >> Sent: Thursday, September 14, 2017 5:36 PM
> > > >> To: Hu, Jiayu <jiayu.hu@intel.com>
> > > >> Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> > Tan,
> > > >> Jianfeng <jianfeng.tan@intel.com>
> > > >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >>
> > > >>
> > > >>
> > > >> > -----Original Message-----
> > > >> > From: Hu, Jiayu
> > > >> > Sent: Thursday, September 14, 2017 10:29 AM
> > > >> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > >> > Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> > Tan,
> > > >> Jianfeng <jianfeng.tan@intel.com>
> > > >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >> >
> > > >> > Hi Konstantin,
> > > >> >
> > > >> > > -----Original Message-----
> > > >> > > From: Ananyev, Konstantin
> > > >> > > Sent: Thursday, September 14, 2017 4:47 PM
> > > >> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > > >> > > Cc: dev@dpdk.org; Kavanagh, Mark B
> > <mark.b.kavanagh@intel.com>;
> > > >> Tan,
> > > >> > > Jianfeng <jianfeng.tan@intel.com>
> > > >> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >> > >
> > > >> > > Hi Jiayu,
> > > >> > >
> > > >> > > > -----Original Message-----
> > > >> > > > From: Hu, Jiayu
> > > >> > > > Sent: Thursday, September 14, 2017 7:07 AM
> > > >> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > >> > > > Cc: dev@dpdk.org; Kavanagh, Mark B
> > <mark.b.kavanagh@intel.com>;
> > > >> Tan,
> > > >> > > Jianfeng <jianfeng.tan@intel.com>
> > > >> > > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >> > > >
> > > >> > > > Hi Konstantin,
> > > >> > > >
> > > >> > > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin
> > wrote:
> > > >> > > > >
> > > >> > > > > Hi Jiayu,
> > > >> > > > >
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > > -----Original Message-----
> > > >> > > > > > > > From: Ananyev, Konstantin
> > > >> > > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM
> > > >> > > > > > > > To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> > > >> > > > > > > > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan,
> > > >> Jianfeng
> > > >> > > <jianfeng.tan@intel.com>
> > > >> > > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > > >> > > > > > > >
> > > >> > > > > > > > > result, when all of its GSOed segments are freed, the
> > packet
> > > >is
> > > >> > > freed
> > > >> > > > > > > > > automatically.
> > > >> > > > > > > > > diff --git a/lib/librte_gso/rte_gso.c
> > > >b/lib/librte_gso/rte_gso.c
> > > >> > > > > > > > > index dda50ee..95f6ea6 100644
> > > >> > > > > > > > > --- a/lib/librte_gso/rte_gso.c
> > > >> > > > > > > > > +++ b/lib/librte_gso/rte_gso.c
> > > >> > > > > > > > > @@ -33,18 +33,53 @@
> > > >> > > > > > > > >
> > > >> > > > > > > > >  #include <errno.h>
> > > >> > > > > > > > >
> > > >> > > > > > > > > +#include <rte_log.h>
> > > >> > > > > > > > > +
> > > >> > > > > > > > >  #include "rte_gso.h"
> > > >> > > > > > > > > +#include "gso_common.h"
> > > >> > > > > > > > > +#include "gso_tcp4.h"
> > > >> > > > > > > > >
> > > >> > > > > > > > >  int
> > > >> > > > > > > > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > >> > > > > > > > > -		struct rte_gso_ctx gso_ctx __rte_unused,
> > > >> > > > > > > > > +		struct rte_gso_ctx gso_ctx,
> > > >> > > > > > > > >  		struct rte_mbuf **pkts_out,
> > > >> > > > > > > > >  		uint16_t nb_pkts_out)
> > > >> > > > > > > > >  {
> > > >> > > > > > > > > +	struct rte_mempool *direct_pool, *indirect_pool;
> > > >> > > > > > > > > +	struct rte_mbuf *pkt_seg;
> > > >> > > > > > > > > +	uint16_t gso_size;
> > > >> > > > > > > > > +	uint8_t ipid_delta;
> > > >> > > > > > > > > +	int ret = 1;
> > > >> > > > > > > > > +
> > > >> > > > > > > > >  	if (pkt == NULL || pkts_out == NULL || nb_pkts_out
> > < 1)
> > > >> > > > > > > > >  		return -EINVAL;
> > > >> > > > > > > > >
> > > >> > > > > > > > > -	pkts_out[0] = pkt;
> > > >> > > > > > > > > +	if (gso_ctx.gso_size >= pkt->pkt_len ||
> > > >> > > > > > > > > +			(pkt->packet_type &
> > gso_ctx.gso_types) !=
> > > >> > > > > > > > > +			pkt->packet_type) {
> > > >> > > > > > > > > +		pkts_out[0] = pkt;
> > > >> > > > > > > > > +		return ret;
> > > >> > > > > > > > > +	}
> > > >> > > > > > > > > +
> > > >> > > > > > > > > +	direct_pool = gso_ctx.direct_pool;
> > > >> > > > > > > > > +	indirect_pool = gso_ctx.indirect_pool;
> > > >> > > > > > > > > +	gso_size = gso_ctx.gso_size;
> > > >> > > > > > > > > +	ipid_delta = gso_ctx.ipid_flag ==
> > RTE_GSO_IPID_INCREASE;
> > > >> > > > > > > > > +
> > > >> > > > > > > > > +	if (is_ipv4_tcp(pkt->packet_type)) {
> > > >> > > > > > > >
> > > >> > > > > > > > Probably we need here:
> > > >> > > > > > > > If (is_ipv4_tcp(pkt->packet_type)  && (gso_ctx->gso_types
> > &
> > > >> > > DEV_TX_OFFLOAD_TCP_TSO) != 0) {...
> > > >> > > > > > >
> > > >> > > > > > > Sorry, actually it probably should be:
> > > >> > > > > > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) ==
> > > >> PKT_TX_IPV4
> > > >> > > &&
> > > >> > > > > > >       (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0)
> > {...
> > > >> > > > > >
> > > >> > > > > > I don't quite understand why the GSO library should be aware if
> > > >the
> > > >> TSO
> > > >> > > > > > flag is set or not. Applications can query device TSO capability
> > > >> before
> > > >> > > > > > they call the GSO library. Do I misundertsand anything?
> > > >> > > > > >
> > > >> > > > > > Additionally, we don't need to check if the packet is a TCP/IPv4
> > > >> packet
> > > >> > > here?
> > > >> > > > >
> > > >> > > > > Well, right now the PMD doesn't rely on ptype to figure out the type of
> > > >> > > > > packet and which TX offloads have to be performed.
> > > >> > > > > Instead it looks at the TX part of ol_flags.
> > > >> > > > > My thought was that, as what we are doing is actually TSO in SW, it would
> > > >> > > > > be good to use the same API here too.
> > > >> > > > > Also, with that approach, by setting ol_flags properly the user can use the
> > > >> > > > > same gso_ctx and still specify what segmentation to perform on a
> > > >> > > > > per-packet basis.
> > > >> > > > >
> > > >> > > > > The alternative is to rely on ptype to decide whether segmentation should
> > > >> > > > > be performed on that packet or not.
> > > >> > > > > The only advantage I see here is that if someone would like to add GSO
> > > >> > > > > for some new protocol, he wouldn't need to introduce a new TX flag value
> > > >> > > > > for mbuf.ol_flags.
> > > >> > > > > Though he would still need to update the TX_OFFLOAD_* capabilities and
> > > >> > > > > probably the packet_type definitions.
> > > >> > > > >
> > > >> > > > > So from my perspective the first variant (use the HW TSO API) is more
> > > >> > > > > plausible.
> > > >> > > > > I wonder what your and Mark's opinions are here?
> > > >> > > >
> > > >> > > > In the first choice, you mean:
> > > >> > > > the GSO library uses gso_ctx->gso_types and mbuf->ol_flags to call a
> > > >> > > > specific GSO segmentation function (e.g. gso_tcp4_segment(),
> > > >> > > > gso_tunnel_xxx()) for each input packet.
> > > >> > > > Applications should parse the packet type, and set exactly the correct
> > > >> > > > DEV_TX_OFFLOAD_*_TSO flag in gso_types, and ol_flags, according to the
> > > >> > > > packet type. That is, the value of gso_types is set on a per-packet basis.
> > > >> > > > gso_ctx->gso_types and mbuf->ol_flags are used together because
> > > >> > > > DEV_TX_OFFLOAD_*_TSO only tells the tunnelling type and the inner L4 type,
> > > >> > > > and we need ol_flags to know the L3 type. With this design, HW
> > > >> > > > segmentation and SW segmentation are indeed consistent.
> > > >> > > >
> > > >> > > > If I understand it correctly, applications need to set 'ol_flags =
> > > >> > > > PKT_TX_IPV4' and 'gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO' for an
> > > >> > > > "ether+ipv4+udp+vxlan+ether+ipv4+tcp+payload" packet. But PKT_TX_IPV4
> > > >> > > > just represents the inner L3 type for a tunneled packet.
> > > >> > > > How about the outer L3 type? Do we always assume the inner and the outer
> > > >> > > > L3 types are the same?
> > > >> > >
> > > >> > > I think that for that case you'll have to set in ol_flags:
> > > >> > >
> > > >> > > PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN |
> > > >> > > PKT_TX_TCP_SEG
> > > >> >
> > > >> > OK, so it means PKT_TX_TCP_SEG is also used for tunneled TSO. The
> > > >> > GSO library doesn't need gso_types anymore.
> > > >>
> > > >> You still might need gso_ctx.gso_types to let the user limit what types of
> > > >> segmentation that particular gso_ctx supports.
> > > >> An alternative would be to assume that each gso_ctx supports all
> > > >> currently implemented segmentations.
> > > >> This is possible too, but probably not very convenient for the user.
> > > >
> > > >Hmm, makes sense.
> > > >
> > > >One thing to confirm: should the value of gso_types be DEV_TX_OFFLOAD_*_TSO,
> > > >or new macros?
> > >
> > > Hi Jiayu, Konstantin,
> > >
> > > I think that the existing macros are fine, as they provide a consistent view
> > > of segmentation capabilities to the application/user.
> >
> > +1
> > I also think it is better to re-use DEV_TX_OFFLOAD_*_TSO.
> 
> There might be an issue if we use 'PKT_TX_TCP_SEG' to tell the GSO
> library whether to segment a packet. Consider the scenario in which
> an application only wants to do GSO and doesn't want to use TSO.
> The application sets 'mbuf->ol_flags=PKT_TX_TCP_SEG' and doesn't
> set mbuf->tso_segsz. The GSO library then segments the packet, and
> all output GSO segments have the same ol_flags as the input packet
> (in the current GSO library design). The output GSO segments are then
> passed to rte_eth_tx_prepare(). If the NIC is i40e, its TX prepare function,
> i40e_prep_pkts, checks whether mbuf->tso_segsz is in the range between
> I40E_MIN_TSO_MSS and I40E_MAX_TSO_MSS when PKT_TX_TCP_SEG is set. So an
> error happens in this scenario, since tso_segsz is 0.
>
> In fact, setting PKT_TX_TCP_SEG without wanting TSO may confuse the PMD.
> One solution is for the GSO library to remove the PKT_TX_TCP_SEG flag
> from all GSO segments once it has finished segmenting.

Yes, that was my thought too: after successful segmentation we probably
need to clean up the related ol_flags.
Konstantin
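
For illustration, a minimal sketch of the failure Jiayu describes and of the
clean-up step agreed on above. It does not use the DPDK headers: struct ex_mbuf,
the EX_* constants and both functions are hypothetical stand-ins, and the MSS
bounds are placeholders rather than values taken from the i40e driver.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in flag bit and MSS bounds, for illustration only. Real code would
 * use PKT_TX_TCP_SEG from rte_mbuf.h and the driver's own limits. */
#define EX_PKT_TX_TCP_SEG (1ULL << 50)
#define EX_MIN_TSO_MSS    256
#define EX_MAX_TSO_MSS    9674

/* Hypothetical, trimmed-down view of the two mbuf fields involved. */
struct ex_mbuf {
	uint64_t ol_flags;
	uint16_t tso_segsz;
};

/* Mimics the kind of check described for the TX prepare stage: when TSO is
 * requested, the MSS must lie within the supported range. */
static int
ex_tx_prepare_check(const struct ex_mbuf *m)
{
	if ((m->ol_flags & EX_PKT_TX_TCP_SEG) != 0 &&
			(m->tso_segsz < EX_MIN_TSO_MSS ||
			 m->tso_segsz > EX_MAX_TSO_MSS))
		return -EINVAL;
	return 0;
}

/* Clean-up applied to an output segment once software GSO has produced it. */
static void
ex_gso_clear_seg_flag(struct ex_mbuf *seg)
{
	seg->ol_flags &= ~EX_PKT_TX_TCP_SEG;
}
```

A segment that still carries the segmentation request with tso_segsz == 0 is
rejected by the check; after ex_gso_clear_seg_flag() the same segment passes.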

> I wonder what your and Mark's opinions are.
> 
> Thanks,
> Jiayu
> >
> > >
> > > I was initially concerned that they might be too coarse-grained (i.e. only
> > > IPv4 is currently supported, and not IPv6), but as per Konstantin's
> > > previous example, the DEV_TX_OFFLOAD_*_TSO macros can be used in
> > > concert with the packet type to determine whether a packet should
> > > be fragmented or not.
> > >
> > > Thanks,
> > > Mark
> > >
> > > >
> > > >Jiayu
> > > >> Konstantin
> > > >>
> > > >> >
> > > >> > The first choice makes HW and SW segmentation totally the same.
> > > >> > Applications just need to parse the packet and set proper ol_flags, and
> > > >> > the GSO library uses ol_flags to decide which segmentation function to use.
> > > >> > I think it's better than the second choice, which depends on ptype to
> > > >> > choose the segmentation function.
> > > >> >
> > > >> > Jiayu
> > > >> > >
> > > >> > > Konstantin
> > > >> > >
> > > >> > > >
> > > >> > > > Jiayu
> > > >> > > > > Konstantin
  
Ananyev, Konstantin Sept. 15, 2017, 8:17 a.m. UTC | #27
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, September 15, 2017 9:16 AM
> To: Hu, Jiayu <jiayu.hu@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Friday, September 15, 2017 8:55 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>
> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > Hi Konstantin,
> >
> > There might be an issue if we use 'PKT_TX_TCP_SEG' to tell the GSO
> > library whether to segment a packet. Consider the scenario in which
> > an application only wants to do GSO and doesn't want to use TSO.
> > The application sets 'mbuf->ol_flags=PKT_TX_TCP_SEG' and doesn't
> > set mbuf->tso_segsz. The GSO library then segments the packet, and
> > all output GSO segments have the same ol_flags as the input packet
> > (in the current GSO library design). The output GSO segments are then
> > passed to rte_eth_tx_prepare(). If the NIC is i40e, its TX prepare function,
> > i40e_prep_pkts, checks whether mbuf->tso_segsz is in the range between
> > I40E_MIN_TSO_MSS and I40E_MAX_TSO_MSS when PKT_TX_TCP_SEG is set. So an
> > error happens in this scenario, since tso_segsz is 0.
> >
> > In fact, setting PKT_TX_TCP_SEG without wanting TSO may confuse the PMD.
> > One solution is for the GSO library to remove the PKT_TX_TCP_SEG flag
> > from all GSO segments once it has finished segmenting.
> 
> Yes, that was my thought too: after successful segmentation we probably
> need to clean up the related ol_flags.

In fact, we just don't need to set these flags in our newly created segments.

> Konstantin
> 
> > I wonder what your and Mark's opinions are.
> >
> > Thanks,
> > Jiayu
  
Hu, Jiayu Sept. 15, 2017, 8:38 a.m. UTC | #28
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, September 15, 2017 4:17 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; Kavanagh, Mark B
> <mark.b.kavanagh@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, September 15, 2017 9:16 AM
> > To: Hu, Jiayu <jiayu.hu@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>
> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> > Hi Jiayu,
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Friday, September 15, 2017 8:55 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com>
> > > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> > >
> > > Hi Konstantin,
> > >
> > > There might be an issue if we use 'PKT_TX_TCP_SEG' to tell the GSO
> > > library whether to segment a packet. Consider the scenario in which
> > > an application only wants to do GSO and doesn't want to use TSO.
> > > The application sets 'mbuf->ol_flags=PKT_TX_TCP_SEG' and doesn't
> > > set mbuf->tso_segsz. The GSO library then segments the packet, and
> > > all output GSO segments have the same ol_flags as the input packet
> > > (in the current GSO library design). The output GSO segments are then
> > > passed to rte_eth_tx_prepare(). If the NIC is i40e, its TX prepare function,
> > > i40e_prep_pkts, checks whether mbuf->tso_segsz is in the range between
> > > I40E_MIN_TSO_MSS and I40E_MAX_TSO_MSS when PKT_TX_TCP_SEG is set. So an
> > > error happens in this scenario, since tso_segsz is 0.
> > >
> > > In fact, setting PKT_TX_TCP_SEG without wanting TSO may confuse the PMD.
> > > One solution is for the GSO library to remove the PKT_TX_TCP_SEG flag
> > > from all GSO segments once it has finished segmenting.
> >
> > Yes, that was my thought too: after successful segmentation we probably
> > need to cleanup related ol_flags.
> 
> In fact, we just don't need to set these flags in our newly created segments.

+1. PKT_TX_TCP_SEG is not needed, but other flags, such as PKT_TX_IPV4,
should be kept, since they may also be used by other HW offloads, such as
checksum offload.
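
A minimal sketch of that idea, assuming stand-in flag values: the SKETCH_*
macros, struct sketch_mbuf and sketch_seg_init_flags() are hypothetical and
do not reproduce the rte_mbuf.h definitions. Each new GSO segment inherits
the input packet's ol_flags with only the TCP segmentation bit cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in bit positions for illustration only. The real PKT_TX_* values
 * live in rte_mbuf.h and are NOT reproduced here.
 */
#define SKETCH_TX_TCP_SEG  (UINT64_C(1) << 0)
#define SKETCH_TX_IP_CKSUM (UINT64_C(1) << 1)
#define SKETCH_TX_IPV4     (UINT64_C(1) << 2)

/* Hypothetical reduced mbuf: only the field this sketch needs. */
struct sketch_mbuf {
	uint64_t ol_flags;
};

/* Copy the offload flags to a new GSO segment, minus the TSO request. */
static inline void
sketch_seg_init_flags(struct sketch_mbuf *seg, const struct sketch_mbuf *pkt)
{
	assert(seg != NULL && pkt != NULL);
	/* Keep csum and L3 type flags; drop only the segmentation bit. */
	seg->ol_flags = pkt->ol_flags & ~SKETCH_TX_TCP_SEG;
}
```

The input packet's own flags are left untouched, so only the segments handed
to the PMD lose the TSO request.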

Thanks,
Jiayu
> 
> > Konstantin
> >
> > > Wonder you and Mark's opinion.
> > >
> > > Thanks,
> > > Jiayu
> > > >
> > > > >
> > > > > > I was initially concerned that they might be too coarse-grained (i.e.
> > > > > > only IPv4 is currently supported, and not IPv6), but as per Konstantin's
> > > > > > previous example, the DEV_TX_OFFLOAD_*_TSO macros can be used in
> > > > > > concert with the packet type to determine whether a packet should
> > > > > > be fragmented or not.
> > > > >
> > > > > Thanks,
> > > > > Mark
> > > > >
> > > > > >
> > > > > >Jiayu
> > > > > >> Konstantin
> > > > > >>
> > > > > >> >
> > > > > > >> > The first choice makes HW and SW segmentation totally the same.
> > > > > > >> > Applications just need to parse the packet and set proper ol_flags,
> > > > > > >> > and the GSO library uses ol_flags to decide which segmentation
> > > > > > >> > function to use.
> > > > > > >> > I think it's better than the second choice, which depends on ptype
> > > > > > >> > to choose the segmentation function.
> > > > > >> >
> > > > > >> > Jiayu
> > > > > >> > >
> > > > > >> > > Konstantin
> > > > > >> > >
> > > > > >> > > >
> > > > > >> > > > Jiayu
> > > > > >> > > > > Konstantin
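
One possible reading of the ol_flags-based selection discussed in this
thread, sketched with stand-in values. All SKETCH_* names and
sketch_select_gso() are hypothetical, and this is not what the patch below
implements (it dispatches on packet_type). The segmentation request comes
from the packet's ol_flags, while gso_types limits which kinds of
segmentation a given context is willing to perform.

```c
#include <stdint.h>

/* Stand-in values; the real PKT_TX_* and DEV_TX_OFFLOAD_* macros differ. */
#define SKETCH_PKT_TX_TCP_SEG      (UINT64_C(1) << 0)
#define SKETCH_PKT_TX_TUNNEL_VXLAN (UINT64_C(1) << 1)

#define SKETCH_OFFLOAD_TCP_TSO       (UINT64_C(1) << 0)
#define SKETCH_OFFLOAD_VXLAN_TNL_TSO (UINT64_C(1) << 1)

enum sketch_gso_kind {
	SKETCH_GSO_NONE,       /* pass the packet through unsegmented */
	SKETCH_GSO_TCP4,       /* plain TCP/IPv4 segmentation */
	SKETCH_GSO_VXLAN_TCP4  /* VxLAN-tunneled TCP/IPv4 segmentation */
};

/* Pick a segmentation routine from the packet flags and context types. */
static enum sketch_gso_kind
sketch_select_gso(uint64_t ol_flags, uint64_t gso_types)
{
	/* No segmentation requested for this packet. */
	if (!(ol_flags & SKETCH_PKT_TX_TCP_SEG))
		return SKETCH_GSO_NONE;

	/* Tunneled packet: the context must allow tunnel segmentation. */
	if (ol_flags & SKETCH_PKT_TX_TUNNEL_VXLAN)
		return (gso_types & SKETCH_OFFLOAD_VXLAN_TNL_TSO) ?
			SKETCH_GSO_VXLAN_TCP4 : SKETCH_GSO_NONE;

	return (gso_types & SKETCH_OFFLOAD_TCP_TSO) ?
		SKETCH_GSO_TCP4 : SKETCH_GSO_NONE;
}
```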
  

Patch

diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index ec8dba7..2fa1199 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -87,6 +87,7 @@  extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
 #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
 #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
+#define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
 
 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1     24 /**< User-defined log type 1. */
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index aeaacbc..2be64d1 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -42,6 +42,8 @@  LIBABIVER := 1
 
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
new file mode 100644
index 0000000..7c32e03
--- /dev/null
+++ b/lib/librte_gso/gso_common.c
@@ -0,0 +1,202 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdbool.h>
+#include <errno.h>
+
+#include <rte_memcpy.h>
+#include <rte_mempool.h>
+#include <rte_ether.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "gso_common.h"
+
+static inline void
+hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
+		uint16_t pkt_hdr_offset)
+{
+	/* Copy MBUF metadata */
+	hdr_segment->nb_segs = 1;
+	hdr_segment->port = pkt->port;
+	hdr_segment->ol_flags = pkt->ol_flags;
+	hdr_segment->packet_type = pkt->packet_type;
+	hdr_segment->pkt_len = pkt_hdr_offset;
+	hdr_segment->data_len = pkt_hdr_offset;
+	hdr_segment->tx_offload = pkt->tx_offload;
+
+	/* Copy the packet header */
+	rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *),
+			rte_pktmbuf_mtod(pkt, char *),
+			pkt_hdr_offset);
+}
+
+static inline void
+free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts)
+{
+	uint16_t i;
+
+	for (i = 0; i < nb_pkts; i++)
+		rte_pktmbuf_free(pkts[i]);
+}
+
+int
+gso_do_segment(struct rte_mbuf *pkt,
+		uint16_t pkt_hdr_offset,
+		uint16_t pyld_unit_size,
+		struct rte_mempool *direct_pool,
+		struct rte_mempool *indirect_pool,
+		struct rte_mbuf **pkts_out,
+		uint16_t nb_pkts_out)
+{
+	struct rte_mbuf *pkt_in;
+	struct rte_mbuf *hdr_segment, *pyld_segment, *prev_segment;
+	uint16_t pkt_in_data_pos, segment_bytes_remaining;
+	uint16_t pyld_len, nb_segs;
+	bool more_in_pkt, more_out_segs;
+
+	pkt_in = pkt;
+	nb_segs = 0;
+	more_in_pkt = 1;
+	pkt_in_data_pos = pkt_hdr_offset;
+
+	while (more_in_pkt) {
+		if (unlikely(nb_segs >= nb_pkts_out)) {
+			free_gso_segment(pkts_out, nb_segs);
+			return -EINVAL;
+		}
+
+		/* Allocate a direct MBUF */
+		hdr_segment = rte_pktmbuf_alloc(direct_pool);
+		if (unlikely(hdr_segment == NULL)) {
+			free_gso_segment(pkts_out, nb_segs);
+			return -ENOMEM;
+		}
+		/* Fill the packet header */
+		hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset);
+
+		prev_segment = hdr_segment;
+		segment_bytes_remaining = pyld_unit_size;
+		more_out_segs = 1;
+
+		while (more_out_segs && more_in_pkt) {
+			/* Allocate an indirect MBUF */
+			pyld_segment = rte_pktmbuf_alloc(indirect_pool);
+			if (unlikely(pyld_segment == NULL)) {
+				rte_pktmbuf_free(hdr_segment);
+				free_gso_segment(pkts_out, nb_segs);
+				return -ENOMEM;
+			}
+			/* Attach to current MBUF segment of pkt */
+			rte_pktmbuf_attach(pyld_segment, pkt_in);
+
+			prev_segment->next = pyld_segment;
+			prev_segment = pyld_segment;
+
+			pyld_len = segment_bytes_remaining;
+			if (pyld_len + pkt_in_data_pos > pkt_in->data_len)
+				pyld_len = pkt_in->data_len - pkt_in_data_pos;
+
+			pyld_segment->data_off = pkt_in_data_pos +
+				pkt_in->data_off;
+			pyld_segment->data_len = pyld_len;
+
+			/* Update header segment */
+			hdr_segment->pkt_len += pyld_len;
+			hdr_segment->nb_segs++;
+
+			pkt_in_data_pos += pyld_len;
+			segment_bytes_remaining -= pyld_len;
+
+			/* Finish processing a MBUF segment of pkt */
+			if (pkt_in_data_pos == pkt_in->data_len) {
+				pkt_in = pkt_in->next;
+				pkt_in_data_pos = 0;
+				if (pkt_in == NULL)
+					more_in_pkt = 0;
+			}
+
+			/* Finish generating a GSO segment */
+			if (segment_bytes_remaining == 0)
+				more_out_segs = 0;
+		}
+		pkts_out[nb_segs++] = hdr_segment;
+	}
+	return nb_segs;
+}
+
+static inline void
+update_inner_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
+		struct rte_mbuf **segs, uint16_t nb_segs)
+{
+	struct tcp_hdr *tcp_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct rte_mbuf *seg;
+	uint32_t sent_seq;
+	uint16_t inner_l2_offset;
+	uint16_t id, i;
+
+	inner_l2_offset = pkt->outer_l2_len + pkt->outer_l3_len + pkt->l2_len;
+	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			inner_l2_offset);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+	id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	for (i = 0; i < nb_segs; i++) {
+		seg = segs[i];
+		/* Update the inner IPv4 header */
+		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(seg, char *) +
+				inner_l2_offset);
+		ipv4_hdr->total_length = rte_cpu_to_be_16(seg->pkt_len -
+				inner_l2_offset);
+		ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
+		id += ipid_delta;
+
+		/* Update the inner TCP header */
+		tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + seg->l3_len);
+		tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq);
+		if (likely(i < nb_segs - 1))
+			tcp_hdr->tcp_flags &= (~(TCP_HDR_PSH_MASK |
+						TCP_HDR_FIN_MASK));
+		sent_seq += (seg->pkt_len - seg->data_len);
+	}
+}
+
+void
+gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
+		struct rte_mbuf **segs, uint16_t nb_segs)
+{
+	if (is_ipv4_tcp(pkt->packet_type))
+		update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
+}
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
new file mode 100644
index 0000000..3c76520
--- /dev/null
+++ b/lib/librte_gso/gso_common.h
@@ -0,0 +1,113 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _GSO_COMMON_H_
+#define _GSO_COMMON_H_
+
+#include <stdint.h>
+#include <rte_mbuf.h>
+
+#define IPV4_HDR_DF_SHIFT 14
+#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
+
+#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
+#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
+
+#define ETHER_TCP_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L4_TCP)
+#define ETHER_VLAN_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | RTE_PTYPE_L4_TCP)
+static inline uint8_t is_ipv4_tcp(uint32_t ptype)
+{
+	switch (ptype & (~RTE_PTYPE_L3_MASK)) {
+	case ETHER_VLAN_TCP_PKT:
+	case ETHER_TCP_PKT:
+		return RTE_ETH_IS_IPV4_HDR(ptype);
+	default:
+		return 0;
+	}
+}
+
+/**
+ * Internal function which updates relevant packet headers, following
+ * segmentation. This is required to update, for example, the IPv4
+ * 'total_length' field, to reflect the reduced length of the now-
+ * segmented packet.
+ *
+ * @param pkt
+ *  The original packet.
+ * @param ipid_delta
+ *  The increasing unit of IP IDs.
+ * @param segs
+ *  Pointer array used for storing mbuf addresses for GSO segments.
+ * @param nb_segs
+ *  The number of GSO segments placed in segs.
+ */
+void gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
+		struct rte_mbuf **segs, uint16_t nb_segs);
+
+/**
+ * Internal function which divides the input packet into small segments.
+ * Each of the newly-created segments is organized as a two-segment MBUF,
+ * where the first segment is a standard mbuf, which stores a copy of
+ * packet header, and the second is an indirect mbuf which points to a
+ * section of data in the input packet.
+ *
+ * @param pkt
+ *  Packet to segment.
+ * @param pkt_hdr_offset
+ *  Packet header offset, measured in bytes.
+ * @param pyld_unit_size
+ *  The max payload length of a GSO segment.
+ * @param direct_pool
+ *  MBUF pool used for allocating direct buffers for output segments.
+ * @param indirect_pool
+ *  MBUF pool used for allocating indirect buffers for output segments.
+ * @param pkts_out
+ *  Pointer array used to keep the mbuf addresses of output segments. If
+ *  the memory space in pkts_out is insufficient, gso_do_segment() fails
+ *  and returns -EINVAL.
+ * @param nb_pkts_out
+ *  The max number of items that pkts_out can keep.
+ *
+ * @return
+ *  - The number of segments created in the event of success.
+ *  - Return -ENOMEM if the MBUF pools run out of memory.
+ *  - Return -EINVAL for invalid parameters.
+ */
+int gso_do_segment(struct rte_mbuf *pkt,
+		uint16_t pkt_hdr_offset,
+		uint16_t pyld_unit_size,
+		struct rte_mempool *direct_pool,
+		struct rte_mempool *indirect_pool,
+		struct rte_mbuf **pkts_out,
+		uint16_t nb_pkts_out);
+#endif
diff --git a/lib/librte_gso/gso_tcp4.c b/lib/librte_gso/gso_tcp4.c
new file mode 100644
index 0000000..8d4bfb2
--- /dev/null
+++ b/lib/librte_gso/gso_tcp4.c
@@ -0,0 +1,83 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#include <rte_ether.h>
+#include <rte_ip.h>
+
+#include "gso_common.h"
+#include "gso_tcp4.h"
+
+int
+gso_tcp4_segment(struct rte_mbuf *pkt,
+		uint16_t gso_size,
+		uint8_t ipid_delta,
+		struct rte_mempool *direct_pool,
+		struct rte_mempool *indirect_pool,
+		struct rte_mbuf **pkts_out,
+		uint16_t nb_pkts_out)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t tcp_dl;
+	uint16_t pyld_unit_size;
+	uint16_t hdr_offset;
+	int ret = 1;
+
+	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			pkt->l2_len);
+	/* Skip packets with DF clear; they are treated as fragmented */
+	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
+						IPV4_HDR_DF_MASK)) == 0)) {
+		pkts_out[0] = pkt;
+		return ret;
+	}
+
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
+		pkt->l4_len;
+	/* Don't process the packet without data */
+	if (unlikely(tcp_dl == 0)) {
+		pkts_out[0] = pkt;
+		return ret;
+	}
+
+	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
+	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
+
+	/* Segment the payload */
+	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
+			indirect_pool, pkts_out, nb_pkts_out);
+	if (ret > 1)
+		gso_update_pkt_headers(pkt, ipid_delta, pkts_out, ret);
+
+	return ret;
+}
diff --git a/lib/librte_gso/gso_tcp4.h b/lib/librte_gso/gso_tcp4.h
new file mode 100644
index 0000000..9c07984
--- /dev/null
+++ b/lib/librte_gso/gso_tcp4.h
@@ -0,0 +1,76 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _GSO_TCP4_H_
+#define _GSO_TCP4_H_
+
+#include <stdint.h>
+#include <rte_mbuf.h>
+
+/**
+ * Segment an IPv4/TCP packet. This function assumes the input packet has
+ * correct checksums and doesn't update checksums for GSO segments.
+ * Furthermore, it doesn't process IP fragmented packets.
+ *
+ * @param pkt
+ *  The packet mbuf to segment.
+ * @param gso_size
+ *  The max length of a GSO segment, measured in bytes.
+ * @param ipid_delta
+ *  The increasing unit of IP IDs.
+ * @param direct_pool
+ *  MBUF pool used for allocating direct buffers for output segments.
+ * @param indirect_pool
+ *  MBUF pool used for allocating indirect buffers for output segments.
+ * @param pkts_out
+ *  Pointer array used to store the MBUF addresses of output GSO
+ *  segments, when gso_tcp4_segment() succeeds. If the memory space in
+ *  pkts_out is insufficient, gso_tcp4_segment() fails and returns
+ *  -EINVAL.
+ * @param nb_pkts_out
+ *  The max number of items that 'pkts_out' can keep.
+ *
+ * @return
+ *   - The number of GSO segments filled in pkts_out on success.
+ *   - Return -ENOMEM if the MBUF pools run out of memory.
+ *   - Return -EINVAL for invalid parameters.
+ */
+int gso_tcp4_segment(struct rte_mbuf *pkt,
+		uint16_t gso_size,
+		uint8_t ipid_delta,
+		struct rte_mempool *direct_pool,
+		struct rte_mempool *indirect_pool,
+		struct rte_mbuf **pkts_out,
+		uint16_t nb_pkts_out);
+
+#endif
diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
index dda50ee..95f6ea6 100644
--- a/lib/librte_gso/rte_gso.c
+++ b/lib/librte_gso/rte_gso.c
@@ -33,18 +33,53 @@ 
 
 #include <errno.h>
 
+#include <rte_log.h>
+
 #include "rte_gso.h"
+#include "gso_common.h"
+#include "gso_tcp4.h"
 
 int
 rte_gso_segment(struct rte_mbuf *pkt,
-		struct rte_gso_ctx gso_ctx __rte_unused,
+		struct rte_gso_ctx gso_ctx,
 		struct rte_mbuf **pkts_out,
 		uint16_t nb_pkts_out)
 {
+	struct rte_mempool *direct_pool, *indirect_pool;
+	struct rte_mbuf *pkt_seg;
+	uint16_t gso_size;
+	uint8_t ipid_delta;
+	int ret = 1;
+
 	if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1)
 		return -EINVAL;
 
-	pkts_out[0] = pkt;
+	if (gso_ctx.gso_size >= pkt->pkt_len ||
+			(pkt->packet_type & gso_ctx.gso_types) !=
+			pkt->packet_type) {
+		pkts_out[0] = pkt;
+		return ret;
+	}
+
+	direct_pool = gso_ctx.direct_pool;
+	indirect_pool = gso_ctx.indirect_pool;
+	gso_size = gso_ctx.gso_size;
+	ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE;
+
+	if (is_ipv4_tcp(pkt->packet_type)) {
+		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
+				direct_pool, indirect_pool,
+				pkts_out, nb_pkts_out);
+	} else
+		RTE_LOG(WARNING, GSO, "Unsupported packet type\n");
+
+	if (ret > 1) {
+		pkt_seg = pkt;
+		while (pkt_seg) {
+			rte_mbuf_refcnt_update(pkt_seg, -1);
+			pkt_seg = pkt_seg->next;
+		}
+	}
 
-	return 1;
+	return ret;
 }
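
The arithmetic behind gso_tcp4_segment() and update_inner_tcp4_header() can
be sketched without any mbufs. The following is an illustration under
simplifying assumptions: a single contiguous payload, no VLAN tag, and no
refcount or flag handling. struct sketch_seg, sketch_plan_segments() and
SKETCH_ETHER_CRC_LEN are hypothetical names that are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETHER_CRC_LEN 4 /* Ethernet FCS length in bytes */

/* Hypothetical per-segment summary; not a DPDK structure. */
struct sketch_seg {
	uint16_t pyld_len;     /* payload bytes carried by this segment */
	uint16_t ip_total_len; /* IPv4 total_length: L3 + L4 hdrs + payload */
	uint16_t ip_id;        /* IPv4 packet_id */
	uint32_t tcp_seq;      /* TCP sequence number */
};

/*
 * Plan the segments for a contiguous payload of pyld_len bytes.
 * Returns the number of segments, or -1 if segs[] is too small.
 */
static int
sketch_plan_segments(uint32_t pyld_len, uint16_t gso_size,
		uint16_t l2_len, uint16_t l3_len, uint16_t l4_len,
		uint16_t ip_id, uint32_t tcp_seq, uint8_t ipid_delta,
		struct sketch_seg *segs, int max_segs)
{
	uint16_t hdr_len = l2_len + l3_len + l4_len;
	uint16_t unit;
	int n = 0;

	/* gso_size must leave room for at least one payload byte. */
	assert(gso_size > hdr_len + SKETCH_ETHER_CRC_LEN);
	/* Same formula as pyld_unit_size in gso_tcp4_segment(). */
	unit = gso_size - hdr_len - SKETCH_ETHER_CRC_LEN;

	while (pyld_len > 0) {
		uint16_t len = pyld_len < unit ? (uint16_t)pyld_len : unit;

		if (n >= max_segs)
			return -1;
		segs[n].pyld_len = len;
		segs[n].ip_total_len = l3_len + l4_len + len;
		segs[n].ip_id = ip_id;
		segs[n].tcp_seq = tcp_seq;

		/* Advance the per-segment header fields. */
		ip_id += ipid_delta;
		tcp_seq += len;
		pyld_len -= len;
		n++;
	}
	return n;
}
```

With gso_size 1518 and 14/20/20-byte L2/L3/L4 headers, the payload unit is
1460 bytes, so 3000 bytes of payload yields three segments carrying 1460,
1460 and 80 bytes.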