[2/5] net/tap: do not touch Tx offload flags

Message ID 20210401095243.18211-3-david.marchand@redhat.com (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Headers
Series Offload flags fixes |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

David Marchand April 1, 2021, 9:52 a.m. UTC
  Tx offload flags are of the application responsibility.
Leave the mbuf alone and check for TSO where needed.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/net/tap/rte_eth_tap.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
  

Comments

Flavio Leitner April 7, 2021, 8:15 p.m. UTC | #1
On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> Tx offload flags are of the application responsibility.
> Leave the mbuf alone and check for TSO where needed.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

The patch looks good, but maybe a better approach would be
to change the documentation to require the TCP_CKSUM flag
when TCP_SEG is used, otherwise this flag adjusting needs
to be replicated every time TCP_SEG is used.

The above could break existing applications, so perhaps doing
something like below would be better and backwards compatible?
Then we can remove those places tweaking the flags completely.

diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index c17dc95c5..6a0c2cdd9 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -298,7 +298,7 @@ extern "C" {
  *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
  *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
  */
-#define PKT_TX_TCP_SEG       (1ULL << 50)
+#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
 
 /** TX IEEE1588 packet to timestamp. */
 #define PKT_TX_IEEE1588_TMST (1ULL << 51)

Thanks,
fbl
  
Olivier Matz April 8, 2021, 7:41 a.m. UTC | #2
On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > Tx offload flags are of the application responsibility.
> > Leave the mbuf alone and check for TSO where needed.
> > 
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> 
> The patch looks good, but maybe a better approach would be
> to change the documentation to require the TCP_CKSUM flag
> when TCP_SEG is used, otherwise this flag adjusting needs
> to be replicated every time TCP_SEG is used.
> 
> The above could break existing applications, so perhaps doing
> something like below would be better and backwards compatible?
> Then we can remove those places tweaking the flags completely.

As a first step, I suggest to document that:
- applications must set TCP_CKSUM when setting TCP_SEG
- pmds must suppose that TCP_CKSUM is set when TCP_SEG is set

This is clearer that what we have today, and I think it does not break
anything. This will guide apps in the correct direction, facilitating
an eventual future PMD change.

> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index c17dc95c5..6a0c2cdd9 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -298,7 +298,7 @@ extern "C" {
>   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
>   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
>   */
> -#define PKT_TX_TCP_SEG       (1ULL << 50)
> +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
>  
>  /** TX IEEE1588 packet to timestamp. */
>  #define PKT_TX_IEEE1588_TMST (1ULL << 51)

I'm afraid some applications or drivers use extended bit manipulations
to do the conversion from/to another domain (like hardware descriptors
or application-specific flags). They may expect this constant to be a
uniq flag.
  
Olivier Matz April 8, 2021, 7:53 a.m. UTC | #3
On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> Tx offload flags are of the application responsibility.
> Leave the mbuf alone and check for TSO where needed.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Maybe the problem being solved should be better described in the commit
log. Is it a problem (other than cosmetic) to touch a mbuf in the Tx
function of a driver, where we could expect that the mbuf is owned by
the driver?

The only problem I can think about is in case we transmit a direct mbuf
whose refcnt is increased, but I wonder how much this is really
supported: for instance, several drivers add vlans using
rte_vlan_insert() in their Tx path.


> ---
>  drivers/net/tap/rte_eth_tap.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index c36d4bf76e..285fe395c5 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -562,6 +562,7 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
>  		uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum)
>  {
>  	void *l3_hdr = packet + l2_len;
> +	uint64_t csum_l4;
>  
>  	if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) {
>  		struct rte_ipv4_hdr *iph = l3_hdr;
> @@ -571,13 +572,17 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
>  		cksum = rte_raw_cksum(iph, l3_len);
>  		iph->hdr_checksum = (cksum == 0xffff) ? cksum : ~cksum;
>  	}
> -	if (ol_flags & PKT_TX_L4_MASK) {
> +
> +	csum_l4 = ol_flags & PKT_TX_L4_MASK;
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		csum_l4 |= PKT_TX_TCP_CKSUM;
> +	if (csum_l4) {
>  		void *l4_hdr;
>  
>  		l4_hdr = packet + l2_len + l3_len;
> -		if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM)
> +		if (csum_l4 == PKT_TX_UDP_CKSUM)
>  			*l4_cksum = &((struct rte_udp_hdr *)l4_hdr)->dgram_cksum;
> -		else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM)
> +		else if (csum_l4 == PKT_TX_TCP_CKSUM)
>  			*l4_cksum = &((struct rte_tcp_hdr *)l4_hdr)->cksum;
>  		else
>  			return;
> @@ -648,7 +653,8 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
>  		if (txq->csum &&
>  		    ((mbuf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4) ||
>  		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM ||
> -		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM))) {
> +		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) ||
> +		     (mbuf->ol_flags & PKT_TX_TCP_SEG))) {
>  			is_cksum = 1;
>  
>  			/* Support only packets with at least layer 4
> @@ -742,9 +748,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  		if (tso) {
>  			struct rte_gso_ctx *gso_ctx = &txq->gso_ctx;
>  
> -			/* TCP segmentation implies TCP checksum offload */
> -			mbuf_in->ol_flags |= PKT_TX_TCP_CKSUM;
> -
>  			/* gso size is calculated without RTE_ETHER_CRC_LEN */
>  			hdrs_len = mbuf_in->l2_len + mbuf_in->l3_len +
>  					mbuf_in->l4_len;
> -- 
> 2.23.0
>
  
Flavio Leitner April 8, 2021, 11:21 a.m. UTC | #4
On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > Tx offload flags are of the application responsibility.
> > > Leave the mbuf alone and check for TSO where needed.
> > > 
> > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > ---
> > 
> > The patch looks good, but maybe a better approach would be
> > to change the documentation to require the TCP_CKSUM flag
> > when TCP_SEG is used, otherwise this flag adjusting needs
> > to be replicated every time TCP_SEG is used.
> > 
> > The above could break existing applications, so perhaps doing
> > something like below would be better and backwards compatible?
> > Then we can remove those places tweaking the flags completely.
> 
> As a first step, I suggest to document that:
> - applications must set TCP_CKSUM when setting TCP_SEG

That's what I suggest above.

> - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set

But that keeps the problem of implying the TCP_CKSUM flag in
various places.

> This is clearer that what we have today, and I think it does not break
> anything. This will guide apps in the correct direction, facilitating
> an eventual future PMD change.
> 
> > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > index c17dc95c5..6a0c2cdd9 100644
> > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > @@ -298,7 +298,7 @@ extern "C" {
> >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> >   */
> > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> >  
> >  /** TX IEEE1588 packet to timestamp. */
> >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> 
> I'm afraid some applications or drivers use extended bit manipulations
> to do the conversion from/to another domain (like hardware descriptors
> or application-specific flags). They may expect this constant to be a
> uniq flag.

Interesting, do you have an example? Because each flag still has an
separate meaning.
  
Olivier Matz April 8, 2021, 12:05 p.m. UTC | #5
On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > Tx offload flags are of the application responsibility.
> > > > Leave the mbuf alone and check for TSO where needed.
> > > > 
> > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > ---
> > > 
> > > The patch looks good, but maybe a better approach would be
> > > to change the documentation to require the TCP_CKSUM flag
> > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > to be replicated every time TCP_SEG is used.
> > > 
> > > The above could break existing applications, so perhaps doing
> > > something like below would be better and backwards compatible?
> > > Then we can remove those places tweaking the flags completely.
> > 
> > As a first step, I suggest to document that:
> > - applications must set TCP_CKSUM when setting TCP_SEG
> 
> That's what I suggest above.
> 
> > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> 
> But that keeps the problem of implying the TCP_CKSUM flag in
> various places.

Yes. What I propose is just a first step: better document what is the
current expected behavior, before doing something else.

> > This is clearer that what we have today, and I think it does not break
> > anything. This will guide apps in the correct direction, facilitating
> > an eventual future PMD change.
> > 
> > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > index c17dc95c5..6a0c2cdd9 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -298,7 +298,7 @@ extern "C" {
> > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > >   */
> > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> > >  
> > >  /** TX IEEE1588 packet to timestamp. */
> > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > 
> > I'm afraid some applications or drivers use extended bit manipulations
> > to do the conversion from/to another domain (like hardware descriptors
> > or application-specific flags). They may expect this constant to be a
> > uniq flag.
> 
> Interesting, do you have an example? Because each flag still has an
> separate meaning.

Honnestly no, I don't have any good example, just a (maybe unfounded) doubt.

I have in mind operations that are done with tables or vector
instructions inside the drivers, but this is mainly done for Rx, not Tx.
You can look at Tx functions like mlx5_set_cksum_table() or
nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
about.
  
Ananyev, Konstantin April 8, 2021, 12:16 p.m. UTC | #6
> 
> On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > Tx offload flags are of the application responsibility.
> > > > Leave the mbuf alone and check for TSO where needed.
> > > >
> > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > ---
> > >
> > > The patch looks good, but maybe a better approach would be
> > > to change the documentation to require the TCP_CKSUM flag
> > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > to be replicated every time TCP_SEG is used.
> > >
> > > The above could break existing applications, so perhaps doing
> > > something like below would be better and backwards compatible?
> > > Then we can remove those places tweaking the flags completely.
> >
> > As a first step, I suggest to document that:
> > - applications must set TCP_CKSUM when setting TCP_SEG
> 
> That's what I suggest above.
> 
> > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> 
> But that keeps the problem of implying the TCP_CKSUM flag in
> various places.
> 
> > This is clearer that what we have today, and I think it does not break
> > anything. This will guide apps in the correct direction, facilitating
> > an eventual future PMD change.
> >
> > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > index c17dc95c5..6a0c2cdd9 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -298,7 +298,7 @@ extern "C" {
> > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > >   */
> > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM

I think that would be an ABI breakage.

> > >
> > >  /** TX IEEE1588 packet to timestamp. */
> > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> >
> > I'm afraid some applications or drivers use extended bit manipulations
> > to do the conversion from/to another domain (like hardware descriptors
> > or application-specific flags). They may expect this constant to be a
> > uniq flag.
> 
> Interesting, do you have an example? Because each flag still has an
> separate meaning.
  
Flavio Leitner April 8, 2021, 12:58 p.m. UTC | #7
On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > Tx offload flags are of the application responsibility.
> > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > 
> > > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > > ---
> > > > 
> > > > The patch looks good, but maybe a better approach would be
> > > > to change the documentation to require the TCP_CKSUM flag
> > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > to be replicated every time TCP_SEG is used.
> > > > 
> > > > The above could break existing applications, so perhaps doing
> > > > something like below would be better and backwards compatible?
> > > > Then we can remove those places tweaking the flags completely.
> > > 
> > > As a first step, I suggest to document that:
> > > - applications must set TCP_CKSUM when setting TCP_SEG
> > 
> > That's what I suggest above.
> > 
> > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > 
> > But that keeps the problem of implying the TCP_CKSUM flag in
> > various places.
> 
> Yes. What I propose is just a first step: better document what is the
> current expected behavior, before doing something else.
> 
> > > This is clearer that what we have today, and I think it does not break
> > > anything. This will guide apps in the correct direction, facilitating
> > > an eventual future PMD change.
> > > 
> > > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > > index c17dc95c5..6a0c2cdd9 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > @@ -298,7 +298,7 @@ extern "C" {
> > > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > > >   */
> > > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> > > >  
> > > >  /** TX IEEE1588 packet to timestamp. */
> > > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > > 
> > > I'm afraid some applications or drivers use extended bit manipulations
> > > to do the conversion from/to another domain (like hardware descriptors
> > > or application-specific flags). They may expect this constant to be a
> > > uniq flag.
> > 
> > Interesting, do you have an example? Because each flag still has an
> > separate meaning.
> 
> Honnestly no, I don't have any good example, just a (maybe unfounded) doubt.
> 
> I have in mind operations that are done with tables or vector
> instructions inside the drivers, but this is mainly done for Rx, not Tx.
> You can look at Tx functions like mlx5_set_cksum_table() or
> nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
> enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
> about.

I see your point. Going back to improving the documentation as a
first step, what would be the next steps? Are we going to wait few
releases and then remove the flag tweaking code assuming that PMDs
and apps are ok?

Thanks,
  
Olivier Matz April 9, 2021, 1:30 p.m. UTC | #8
On Thu, Apr 08, 2021 at 09:58:35AM -0300, Flavio Leitner wrote:
> On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> > On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > > Tx offload flags are of the application responsibility.
> > > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > > 
> > > > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > > > ---
> > > > > 
> > > > > The patch looks good, but maybe a better approach would be
> > > > > to change the documentation to require the TCP_CKSUM flag
> > > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > > to be replicated every time TCP_SEG is used.
> > > > > 
> > > > > The above could break existing applications, so perhaps doing
> > > > > something like below would be better and backwards compatible?
> > > > > Then we can remove those places tweaking the flags completely.
> > > > 
> > > > As a first step, I suggest to document that:
> > > > - applications must set TCP_CKSUM when setting TCP_SEG
> > > 
> > > That's what I suggest above.
> > > 
> > > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > > 
> > > But that keeps the problem of implying the TCP_CKSUM flag in
> > > various places.
> > 
> > Yes. What I propose is just a first step: better document what is the
> > current expected behavior, before doing something else.
> > 
> > > > This is clearer that what we have today, and I think it does not break
> > > > anything. This will guide apps in the correct direction, facilitating
> > > > an eventual future PMD change.
> > > > 
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > index c17dc95c5..6a0c2cdd9 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > @@ -298,7 +298,7 @@ extern "C" {
> > > > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > > > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > > > >   */
> > > > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> > > > >  
> > > > >  /** TX IEEE1588 packet to timestamp. */
> > > > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > > > 
> > > > I'm afraid some applications or drivers use extended bit manipulations
> > > > to do the conversion from/to another domain (like hardware descriptors
> > > > or application-specific flags). They may expect this constant to be a
> > > > uniq flag.
> > > 
> > > Interesting, do you have an example? Because each flag still has an
> > > separate meaning.
> > 
> > Honnestly no, I don't have any good example, just a (maybe unfounded) doubt.
> > 
> > I have in mind operations that are done with tables or vector
> > instructions inside the drivers, but this is mainly done for Rx, not Tx.
> > You can look at Tx functions like mlx5_set_cksum_table() or
> > nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
> > enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
> > about.
> 
> I see your point. Going back to improving the documentation as a
> first step, what would be the next steps? Are we going to wait few
> releases and then remove the flag tweaking code assuming that PMDs
> and apps are ok?

After this documentation step, in few releases, we could relax the
constraint on PMD: applications will be expected to set TCP_CKSUM when
TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
TCP_SEG is set. The documentation will be updated again.

This plan can be described in the deprecation notice, and later in the
release note.

How does it sound?
  
Flavio Leitner April 9, 2021, 4:55 p.m. UTC | #9
On Fri, Apr 09, 2021 at 03:30:18PM +0200, Olivier Matz wrote:
> On Thu, Apr 08, 2021 at 09:58:35AM -0300, Flavio Leitner wrote:
> > On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> > > On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > > > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > > > Tx offload flags are of the application responsibility.
> > > > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > > > 
> > > > > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > > > > ---
> > > > > > 
> > > > > > The patch looks good, but maybe a better approach would be
> > > > > > to change the documentation to require the TCP_CKSUM flag
> > > > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > > > to be replicated every time TCP_SEG is used.
> > > > > > 
> > > > > > The above could break existing applications, so perhaps doing
> > > > > > something like below would be better and backwards compatible?
> > > > > > Then we can remove those places tweaking the flags completely.
> > > > > 
> > > > > As a first step, I suggest to document that:
> > > > > - applications must set TCP_CKSUM when setting TCP_SEG
> > > > 
> > > > That's what I suggest above.
> > > > 
> > > > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > > > 
> > > > But that keeps the problem of implying the TCP_CKSUM flag in
> > > > various places.
> > > 
> > > Yes. What I propose is just a first step: better document what is the
> > > current expected behavior, before doing something else.
> > > 
> > > > > This is clearer that what we have today, and I think it does not break
> > > > > anything. This will guide apps in the correct direction, facilitating
> > > > > an eventual future PMD change.
[...]
> > I see your point. Going back to improving the documentation as a
> > first step, what would be the next steps? Are we going to wait few
> > releases and then remove the flag tweaking code assuming that PMDs
> > and apps are ok?
> 
> After this documentation step, in few releases, we could relax the
> constraint on PMD: applications will be expected to set TCP_CKSUM when
> TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
> TCP_SEG is set. The documentation will be updated again.
> 
> This plan can be described in the deprecation notice, and later in the
> release note.
> 
> How does it sound?

Works for me.
Thanks,
  
David Marchand April 28, 2021, 12:12 p.m. UTC | #10
On Thu, Apr 8, 2021 at 9:53 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > Tx offload flags are of the application responsibility.
> > Leave the mbuf alone and check for TSO where needed.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>

Self nack on this patch.

>
> Maybe the problem being solved should be better described in the commit
> log. Is it a problem (other than cosmetic) to touch a mbuf in the Tx
> function of a driver, where we could expect that the mbuf is owned by
> the driver?
>
> The only problem I can think about is in case we transmit a direct mbuf
> whose refcnt is increased, but I wonder how much this is really
> supported: for instance, several drivers add vlans using
> rte_vlan_insert() in their Tx path.

This was my initial thought, as I suspected issues with applications
which keep a refcount on the mbuf.
But after more discussions offlist, it is hard to find a usecase for this.

The gso library already touches the ol_flags from the input mbuf and
this is documented in its API.

Plus, my patch also has an issue.
This library gives back an array of direct/indirect mbufs pointing at
the data from the original mbuf.
Those mbufs are populated with the original mbuf ol_flags but nothing
has been done on the checksum in this data.
The net/tap driver relies on this feature: mbuf_in is marked with
PKT_TX_TCP_CKSUM so that the generated mbufs given back by the rte_gso
library has this offload flag set and PKT_TX_TCP_SEG is not present
anymore.
Later in the net/tap tx handler, PKT_TX_TCP_CKSUM presence triggers
tcp checksum computation.
So this patch breaks the tso support in net/tap.


Just for the record.
As far as rte_vlan_insert() is concerned, it ensures that the mbuf is
not shared.
https://git.dpdk.org/dpdk/tree/lib/net/rte_ether.h#n352
  
David Marchand April 28, 2021, 12:17 p.m. UTC | #11
On Fri, Apr 9, 2021 at 3:30 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > I see your point. Going back to improving the documentation as a
> > first step, what would be the next steps? Are we going to wait few
> > releases and then remove the flag tweaking code assuming that PMDs
> > and apps are ok?
>
> After this documentation step, in few releases, we could relax the
> constraint on PMD: applications will be expected to set TCP_CKSUM when
> TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
> TCP_SEG is set. The documentation will be updated again.
>
> This plan can be described in the deprecation notice, and later in the
> release note.

Looking at drivers, some of them already trigger tcp checksumming with
the presence of PKT_TX_TCP_SEG only.
See for example:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_rxtx.c#n391

The hw needs to fill tcp checksum when generating segments.
So I suppose this was the original meaning of the "implies" comment.
https://git.dpdk.org/dpdk/tree/lib/mbuf/rte_mbuf_core.h#n292

We could reword the comment, but I don't think there is anything to
change in the API.
  

Patch

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index c36d4bf76e..285fe395c5 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -562,6 +562,7 @@  tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
 		uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum)
 {
 	void *l3_hdr = packet + l2_len;
+	uint64_t csum_l4;
 
 	if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) {
 		struct rte_ipv4_hdr *iph = l3_hdr;
@@ -571,13 +572,17 @@  tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
 		cksum = rte_raw_cksum(iph, l3_len);
 		iph->hdr_checksum = (cksum == 0xffff) ? cksum : ~cksum;
 	}
-	if (ol_flags & PKT_TX_L4_MASK) {
+
+	csum_l4 = ol_flags & PKT_TX_L4_MASK;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		csum_l4 |= PKT_TX_TCP_CKSUM;
+	if (csum_l4) {
 		void *l4_hdr;
 
 		l4_hdr = packet + l2_len + l3_len;
-		if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM)
+		if (csum_l4 == PKT_TX_UDP_CKSUM)
 			*l4_cksum = &((struct rte_udp_hdr *)l4_hdr)->dgram_cksum;
-		else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM)
+		else if (csum_l4 == PKT_TX_TCP_CKSUM)
 			*l4_cksum = &((struct rte_tcp_hdr *)l4_hdr)->cksum;
 		else
 			return;
@@ -648,7 +653,8 @@  tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
 		if (txq->csum &&
 		    ((mbuf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4) ||
 		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM ||
-		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM))) {
+		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) ||
+		     (mbuf->ol_flags & PKT_TX_TCP_SEG))) {
 			is_cksum = 1;
 
 			/* Support only packets with at least layer 4
@@ -742,9 +748,6 @@  pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		if (tso) {
 			struct rte_gso_ctx *gso_ctx = &txq->gso_ctx;
 
-			/* TCP segmentation implies TCP checksum offload */
-			mbuf_in->ol_flags |= PKT_TX_TCP_CKSUM;
-
 			/* gso size is calculated without RTE_ETHER_CRC_LEN */
 			hdrs_len = mbuf_in->l2_len + mbuf_in->l3_len +
 					mbuf_in->l4_len;