mbuf: fix reset on mbuf free
diff mbox series

Message ID 20201104170007.8026-1-olivier.matz@6wind.com
State Superseded
Delegated to: David Marchand
Headers show
Series
  • mbuf: fix reset on mbuf free
Related show

Checks

Context Check Description
ci/iol-mellanox-Performance fail Performance Testing issues
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional fail Functional Testing issues
ci/iol-testing success Testing PASS
ci/iol-broadcom-Functional fail Functional Testing issues
ci/travis-robot success Travis build: passed
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Olivier Matz Nov. 4, 2020, 5 p.m. UTC
m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Freeing this mbuf chain will not restore nb_seg=1 in the second
segment. This is expected that mbufs stored in pool have their
nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.c |  6 ++----
 lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
 2 files changed, 6 insertions(+), 12 deletions(-)

Comments

Ananyev, Konstantin Nov. 5, 2020, 12:15 a.m. UTC | #1
Hi Olivier,
 
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
> 
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
> 
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
> 
> Freeing this mbuf chain will not restore nb_seg=1 in the second
> segment. 

Hmm, not sure why is that?
You are talking about freeing m1, right?
rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
	...
	if (m->next != NULL) {
                        m->next = NULL;
                        m->nb_segs = 1;
                }

m1->next != NULL, so it will enter the if() block,
and will reset both next and nb_segs.
What I am missing here? 
Thinking in more generic way, that change:
 -		if (m->next != NULL) {
 -			m->next = NULL;
 -			m->nb_segs = 1;
 -		}
 +		m->next = NULL;
 +		m->nb_segs = 1;

Assumes that it is ok to have an mbuf with
nb_seg > 1 and next == NULL.
Which seems wrong to me.
Konstantin


>This is expected that mbufs stored in pool have their
> nb_seg field set to 1.
> 
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
>  2 files changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 8a456e5e64..e632071c23 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> 
>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>  	m->ol_flags = EXT_ATTACHED_MBUF;
> -	if (m->next != NULL) {
> -		m->next = NULL;
> -		m->nb_segs = 1;
> -	}
> +	m->next = NULL;
> +	m->nb_segs = 1;
>  	rte_mbuf_raw_free(m);
>  }
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index a1414ed7cd..ef5800c8ef 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> -			m->next = NULL;
> -			m->nb_segs = 1;
> -		}
> +		m->next = NULL;
> +		m->nb_segs = 1;
> 
>  		return m;
> 
> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> -			m->next = NULL;
> -			m->nb_segs = 1;
> -		}
> +		m->next = NULL;
> +		m->nb_segs = 1;
>  		rte_mbuf_refcnt_set(m, 1);
> 
>  		return m;
> --
> 2.25.1
Olivier Matz Nov. 5, 2020, 7:46 a.m. UTC | #2
On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Olivier,
>  
> > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > because it can happen that m->nb_seg is != 1. For instance in this
> > case:
> > 
> >   m1 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m1, 500);
> >   m2 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m2, 500);
> >   rte_pktmbuf_chain(m1, m2);
> >   m0 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m0, 500);
> >   rte_pktmbuf_chain(m0, m1);
> > 
> > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > segment (this is not required), after this code the mbuf chain
> > have 3 segments:
> >   - m0: next=m1, nb_seg=3
> >   - m1: next=m2, nb_seg=2
> >   - m2: next=NULL, nb_seg=1
> > 
> > Freeing this mbuf chain will not restore nb_seg=1 in the second
> > segment. 
> 
> Hmm, not sure why is that?
> You are talking about freeing m1, right?
> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> {
> 	...
> 	if (m->next != NULL) {
>                         m->next = NULL;
>                         m->nb_segs = 1;
>                 }
> 
> m1->next != NULL, so it will enter the if() block,
> and will reset both next and nb_segs.
> What I am missing here? 
> Thinking in more generic way, that change:
>  -		if (m->next != NULL) {
>  -			m->next = NULL;
>  -			m->nb_segs = 1;
>  -		}
>  +		m->next = NULL;
>  +		m->nb_segs = 1;

Ah, sorry. I oversimplified the example and now it does not
show the issue...

The full example also adds a split() to break the mbuf chain
between m1 and m2. The kind of thing that would be done for
software TCP segmentation.

After this operation, we have 2 mbuf chain:
 - m0 with 2 segments, the last one has next=NULL but nb_seg=2
 - new_m with 1 segment

Freeing m0 will not restore nb_seg=1 in the second segment.

> Assumes that it is ok to have an mbuf with
> nb_seg > 1 and next == NULL.
> Which seems wrong to me.

I don't think it is wrong: nb_seg is just ignored when not in the first
segment, and there is nothing saying it should be set to 1. Typically,
rte_pktmbuf_chain() does not change it, and I guess it's the same for
many similar functions in applications.

Olivier

> 
> 
> >This is expected that mbufs stored in pool have their
> > nb_seg field set to 1.
> > 
> > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> >  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> >  2 files changed, 6 insertions(+), 12 deletions(-)
> > 
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 8a456e5e64..e632071c23 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > 
> >  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> >  	m->ol_flags = EXT_ATTACHED_MBUF;
> > -	if (m->next != NULL) {
> > -		m->next = NULL;
> > -		m->nb_segs = 1;
> > -	}
> > +	m->next = NULL;
> > +	m->nb_segs = 1;
> >  	rte_mbuf_raw_free(m);
> >  }
> > 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index a1414ed7cd..ef5800c8ef 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >  				return NULL;
> >  		}
> > 
> > -		if (m->next != NULL) {
> > -			m->next = NULL;
> > -			m->nb_segs = 1;
> > -		}
> > +		m->next = NULL;
> > +		m->nb_segs = 1;
> > 
> >  		return m;
> > 
> > @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >  				return NULL;
> >  		}
> > 
> > -		if (m->next != NULL) {
> > -			m->next = NULL;
> > -			m->nb_segs = 1;
> > -		}
> > +		m->next = NULL;
> > +		m->nb_segs = 1;
> >  		rte_mbuf_refcnt_set(m, 1);
> > 
> >  		return m;
> > --
> > 2.25.1
>
Andrew Rybchenko Nov. 5, 2020, 8:26 a.m. UTC | #3
On 11/5/20 10:46 AM, Olivier Matz wrote:
> On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
>>
>> Hi Olivier,
>>  
>>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
>>> because it can happen that m->nb_seg is != 1. For instance in this
>>> case:
>>>
>>>   m1 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m1, 500);
>>>   m2 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m2, 500);
>>>   rte_pktmbuf_chain(m1, m2);
>>>   m0 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m0, 500);
>>>   rte_pktmbuf_chain(m0, m1);
>>>
>>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
>>> segment (this is not required), after this code the mbuf chain
>>> have 3 segments:
>>>   - m0: next=m1, nb_seg=3
>>>   - m1: next=m2, nb_seg=2
>>>   - m2: next=NULL, nb_seg=1
>>>
>>> Freeing this mbuf chain will not restore nb_seg=1 in the second
>>> segment. 
>>
>> Hmm, not sure why is that?
>> You are talking about freeing m1, right?
>> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>> {
>> 	...
>> 	if (m->next != NULL) {
>>                         m->next = NULL;
>>                         m->nb_segs = 1;
>>                 }
>>
>> m1->next != NULL, so it will enter the if() block,
>> and will reset both next and nb_segs.
>> What I am missing here? 
>> Thinking in more generic way, that change:
>>  -		if (m->next != NULL) {
>>  -			m->next = NULL;
>>  -			m->nb_segs = 1;
>>  -		}
>>  +		m->next = NULL;
>>  +		m->nb_segs = 1;
> 
> Ah, sorry. I oversimplified the example and now it does not
> show the issue...
> 
> The full example also adds a split() to break the mbuf chain
> between m1 and m2. The kind of thing that would be done for
> software TCP segmentation.
> 

If so, may be the right solution is to care about nb_segs
when next is set to NULL on split? Any place when next is set
to NULL. Just to keep the optimization in a more generic place.

> After this operation, we have 2 mbuf chain:
>  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
>  - new_m with 1 segment
> 
> Freeing m0 will not restore nb_seg=1 in the second segment.
> 
>> Assumes that it is ok to have an mbuf with
>> nb_seg > 1 and next == NULL.
>> Which seems wrong to me.
> 
> I don't think it is wrong: nb_seg is just ignored when not in the first
> segment, and there is nothing saying it should be set to 1. Typically,
> rte_pktmbuf_chain() does not change it, and I guess it's the same for
> many similar functions in applications.
> 
> Olivier
> 
>>
>>
>>> This is expected that mbufs stored in pool have their
>>> nb_seg field set to 1.
>>>
>>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
>>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
>>>  2 files changed, 6 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 8a456e5e64..e632071c23 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
>>>
>>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>>>  	m->ol_flags = EXT_ATTACHED_MBUF;
>>> -	if (m->next != NULL) {
>>> -		m->next = NULL;
>>> -		m->nb_segs = 1;
>>> -	}
>>> +	m->next = NULL;
>>> +	m->nb_segs = 1;
>>>  	rte_mbuf_raw_free(m);
>>>  }
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>> index a1414ed7cd..ef5800c8ef 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>
>>>  		return m;
>>>
>>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>  		rte_mbuf_refcnt_set(m, 1);
>>>
>>>  		return m;
>>> --
>>> 2.25.1
>>
Morten Brørup Nov. 5, 2020, 8:33 a.m. UTC | #4
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Thursday, November 5, 2020 8:46 AM
> 
> On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Olivier,
> >
> > > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > because it can happen that m->nb_seg is != 1. For instance in this
> > > case:
> > >
> > >   m1 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m1, 500);
> > >   m2 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m2, 500);
> > >   rte_pktmbuf_chain(m1, m2);
> > >   m0 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m0, 500);
> > >   rte_pktmbuf_chain(m0, m1);
> > >
> > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > segment (this is not required), after this code the mbuf chain
> > > have 3 segments:
> > >   - m0: next=m1, nb_seg=3
> > >   - m1: next=m2, nb_seg=2
> > >   - m2: next=NULL, nb_seg=1
> > >
> > > Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > segment.
> >
> > Hmm, not sure why is that?
> > You are talking about freeing m1, right?
> > rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > {
> > 	...
> > 	if (m->next != NULL) {
> >                         m->next = NULL;
> >                         m->nb_segs = 1;
> >                 }
> >
> > m1->next != NULL, so it will enter the if() block,
> > and will reset both next and nb_segs.
> > What I am missing here?
> > Thinking in more generic way, that change:
> >  -		if (m->next != NULL) {
> >  -			m->next = NULL;
> >  -			m->nb_segs = 1;
> >  -		}
> >  +		m->next = NULL;
> >  +		m->nb_segs = 1;
> 
> Ah, sorry. I oversimplified the example and now it does not
> show the issue...
> 
> The full example also adds a split() to break the mbuf chain
> between m1 and m2. The kind of thing that would be done for
> software TCP segmentation.
> 
> After this operation, we have 2 mbuf chain:
>  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
>  - new_m with 1 segment
> 
> Freeing m0 will not restore nb_seg=1 in the second segment.
> 
> > Assumes that it is ok to have an mbuf with
> > nb_seg > 1 and next == NULL.
> > Which seems wrong to me.
> 
> I don't think it is wrong: nb_seg is just ignored when not in the first
> segment, and there is nothing saying it should be set to 1. Typically,
> rte_pktmbuf_chain() does not change it, and I guess it's the same for
> many similar functions in applications.
> 
> Olivier

Acked-by: Morten Brørup <mb@smartsharesystems.com>

And while you are at it, please consider extending the description of the two mbuf fields with their invariants:
1. nb_segs is only valid for the first segment of a multi-segment packet.
2. next is NULL for non-segmented packets.

> 
> >
> >
> > >This is expected that mbufs stored in pool have their
> > > nb_seg field set to 1.
> > >
> > > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > ---
> > >  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > >  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > >  2 files changed, 6 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.c
> b/lib/librte_mbuf/rte_mbuf.c
> > > index 8a456e5e64..e632071c23 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr,
> void *opaque)
> > >
> > >  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > >  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > -	if (m->next != NULL) {
> > > -		m->next = NULL;
> > > -		m->nb_segs = 1;
> > > -	}
> > > +	m->next = NULL;
> > > +	m->nb_segs = 1;
> > >  	rte_mbuf_raw_free(m);
> > >  }
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> b/lib/librte_mbuf/rte_mbuf.h
> > > index a1414ed7cd..ef5800c8ef 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >  				return NULL;
> > >  		}
> > >
> > > -		if (m->next != NULL) {
> > > -			m->next = NULL;
> > > -			m->nb_segs = 1;
> > > -		}
> > > +		m->next = NULL;
> > > +		m->nb_segs = 1;
> > >
> > >  		return m;
> > >
> > > @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >  				return NULL;
> > >  		}
> > >
> > > -		if (m->next != NULL) {
> > > -			m->next = NULL;
> > > -			m->nb_segs = 1;
> > > -		}
> > > +		m->next = NULL;
> > > +		m->nb_segs = 1;
> > >  		rte_mbuf_refcnt_set(m, 1);
> > >
> > >  		return m;
> > > --
> > > 2.25.1
> >
Olivier Matz Nov. 5, 2020, 9:03 a.m. UTC | #5
On Thu, Nov 05, 2020 at 09:33:58AM +0100, Morten Brørup wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, November 5, 2020 8:46 AM
> > 
> > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > >
> > > Hi Olivier,
> > >
> > > > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > because it can happen that m->nb_seg is != 1. For instance in this
> > > > case:
> > > >
> > > >   m1 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m1, 500);
> > > >   m2 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m2, 500);
> > > >   rte_pktmbuf_chain(m1, m2);
> > > >   m0 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m0, 500);
> > > >   rte_pktmbuf_chain(m0, m1);
> > > >
> > > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > segment (this is not required), after this code the mbuf chain
> > > > have 3 segments:
> > > >   - m0: next=m1, nb_seg=3
> > > >   - m1: next=m2, nb_seg=2
> > > >   - m2: next=NULL, nb_seg=1
> > > >
> > > > Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > segment.
> > >
> > > Hmm, not sure why is that?
> > > You are talking about freeing m1, right?
> > > rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > {
> > > 	...
> > > 	if (m->next != NULL) {
> > >                         m->next = NULL;
> > >                         m->nb_segs = 1;
> > >                 }
> > >
> > > m1->next != NULL, so it will enter the if() block,
> > > and will reset both next and nb_segs.
> > > What I am missing here?
> > > Thinking in more generic way, that change:
> > >  -		if (m->next != NULL) {
> > >  -			m->next = NULL;
> > >  -			m->nb_segs = 1;
> > >  -		}
> > >  +		m->next = NULL;
> > >  +		m->nb_segs = 1;
> > 
> > Ah, sorry. I oversimplified the example and now it does not
> > show the issue...
> > 
> > The full example also adds a split() to break the mbuf chain
> > between m1 and m2. The kind of thing that would be done for
> > software TCP segmentation.
> > 
> > After this operation, we have 2 mbuf chain:
> >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> >  - new_m with 1 segment
> > 
> > Freeing m0 will not restore nb_seg=1 in the second segment.
> > 
> > > Assumes that it is ok to have an mbuf with
> > > nb_seg > 1 and next == NULL.
> > > Which seems wrong to me.
> > 
> > I don't think it is wrong: nb_seg is just ignored when not in the first
> > segment, and there is nothing saying it should be set to 1. Typically,
> > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > many similar functions in applications.
> > 
> > Olivier
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> And while you are at it, please consider extending the description of the two mbuf fields with their invariants:
> 1. nb_segs is only valid for the first segment of a multi-segment packet.
> 2. next is NULL for non-segmented packets.

Good point, will add it in v2.

> 
> > 
> > >
> > >
> > > >This is expected that mbufs stored in pool have their
> > > > nb_seg field set to 1.
> > > >
> > > > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > >  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > >  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > >  2 files changed, 6 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.c
> > b/lib/librte_mbuf/rte_mbuf.c
> > > > index 8a456e5e64..e632071c23 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr,
> > void *opaque)
> > > >
> > > >  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > >  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > -	if (m->next != NULL) {
> > > > -		m->next = NULL;
> > > > -		m->nb_segs = 1;
> > > > -	}
> > > > +	m->next = NULL;
> > > > +	m->nb_segs = 1;
> > > >  	rte_mbuf_raw_free(m);
> > > >  }
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > b/lib/librte_mbuf/rte_mbuf.h
> > > > index a1414ed7cd..ef5800c8ef 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >  				return NULL;
> > > >  		}
> > > >
> > > > -		if (m->next != NULL) {
> > > > -			m->next = NULL;
> > > > -			m->nb_segs = 1;
> > > > -		}
> > > > +		m->next = NULL;
> > > > +		m->nb_segs = 1;
> > > >
> > > >  		return m;
> > > >
> > > > @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >  				return NULL;
> > > >  		}
> > > >
> > > > -		if (m->next != NULL) {
> > > > -			m->next = NULL;
> > > > -			m->nb_segs = 1;
> > > > -		}
> > > > +		m->next = NULL;
> > > > +		m->nb_segs = 1;
> > > >  		rte_mbuf_refcnt_set(m, 1);
> > > >
> > > >  		return m;
> > > > --
> > > > 2.25.1
> > >
>
Andrew Rybchenko Nov. 5, 2020, 9:09 a.m. UTC | #6
Just resend with lost Cc restored.

On 11/5/20 10:46 AM, Olivier Matz wrote:
> On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
>>
>> Hi Olivier,
>>  
>>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
>>> because it can happen that m->nb_seg is != 1. For instance in this
>>> case:
>>>
>>>   m1 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m1, 500);
>>>   m2 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m2, 500);
>>>   rte_pktmbuf_chain(m1, m2);
>>>   m0 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m0, 500);
>>>   rte_pktmbuf_chain(m0, m1);
>>>
>>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
>>> segment (this is not required), after this code the mbuf chain
>>> have 3 segments:
>>>   - m0: next=m1, nb_seg=3
>>>   - m1: next=m2, nb_seg=2
>>>   - m2: next=NULL, nb_seg=1
>>>
>>> Freeing this mbuf chain will not restore nb_seg=1 in the second
>>> segment. 
>>
>> Hmm, not sure why is that?
>> You are talking about freeing m1, right?
>> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>> {
>> 	...
>> 	if (m->next != NULL) {
>>                         m->next = NULL;
>>                         m->nb_segs = 1;
>>                 }
>>
>> m1->next != NULL, so it will enter the if() block,
>> and will reset both next and nb_segs.
>> What I am missing here? 
>> Thinking in more generic way, that change:
>>  -		if (m->next != NULL) {
>>  -			m->next = NULL;
>>  -			m->nb_segs = 1;
>>  -		}
>>  +		m->next = NULL;
>>  +		m->nb_segs = 1;
> 
> Ah, sorry. I oversimplified the example and now it does not
> show the issue...
> 
> The full example also adds a split() to break the mbuf chain
> between m1 and m2. The kind of thing that would be done for
> software TCP segmentation.
> 

If so, may be the right solution is to care about nb_segs
when next is set to NULL on split? Any place when next is set
to NULL. Just to keep the optimization in a more generic place.

> After this operation, we have 2 mbuf chain:
>  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
>  - new_m with 1 segment
> 
> Freeing m0 will not restore nb_seg=1 in the second segment.
> 
>> Assumes that it is ok to have an mbuf with
>> nb_seg > 1 and next == NULL.
>> Which seems wrong to me.
> 
> I don't think it is wrong: nb_seg is just ignored when not in the first
> segment, and there is nothing saying it should be set to 1. Typically,
> rte_pktmbuf_chain() does not change it, and I guess it's the same for
> many similar functions in applications.
> 
> Olivier
> 
>>
>>
>>> This is expected that mbufs stored in pool have their
>>> nb_seg field set to 1.
>>>
>>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
>>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
>>>  2 files changed, 6 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 8a456e5e64..e632071c23 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
>>>
>>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>>>  	m->ol_flags = EXT_ATTACHED_MBUF;
>>> -	if (m->next != NULL) {
>>> -		m->next = NULL;
>>> -		m->nb_segs = 1;
>>> -	}
>>> +	m->next = NULL;
>>> +	m->nb_segs = 1;
>>>  	rte_mbuf_raw_free(m);
>>>  }
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>> index a1414ed7cd..ef5800c8ef 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>
>>>  		return m;
>>>
>>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>  		rte_mbuf_refcnt_set(m, 1);
>>>
>>>  		return m;
>>> --
>>> 2.25.1
>>
Olivier Matz Nov. 5, 2020, 9:10 a.m. UTC | #7
On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> On 11/5/20 10:46 AM, Olivier Matz wrote:
> > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> >>
> >> Hi Olivier,
> >>  
> >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> >>> because it can happen that m->nb_seg is != 1. For instance in this
> >>> case:
> >>>
> >>>   m1 = rte_pktmbuf_alloc(mp);
> >>>   rte_pktmbuf_append(m1, 500);
> >>>   m2 = rte_pktmbuf_alloc(mp);
> >>>   rte_pktmbuf_append(m2, 500);
> >>>   rte_pktmbuf_chain(m1, m2);
> >>>   m0 = rte_pktmbuf_alloc(mp);
> >>>   rte_pktmbuf_append(m0, 500);
> >>>   rte_pktmbuf_chain(m0, m1);
> >>>
> >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> >>> segment (this is not required), after this code the mbuf chain
> >>> have 3 segments:
> >>>   - m0: next=m1, nb_seg=3
> >>>   - m1: next=m2, nb_seg=2
> >>>   - m2: next=NULL, nb_seg=1
> >>>
> >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> >>> segment. 
> >>
> >> Hmm, not sure why is that?
> >> You are talking about freeing m1, right?
> >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >> {
> >> 	...
> >> 	if (m->next != NULL) {
> >>                         m->next = NULL;
> >>                         m->nb_segs = 1;
> >>                 }
> >>
> >> m1->next != NULL, so it will enter the if() block,
> >> and will reset both next and nb_segs.
> >> What I am missing here? 
> >> Thinking in more generic way, that change:
> >>  -		if (m->next != NULL) {
> >>  -			m->next = NULL;
> >>  -			m->nb_segs = 1;
> >>  -		}
> >>  +		m->next = NULL;
> >>  +		m->nb_segs = 1;
> > 
> > Ah, sorry. I oversimplified the example and now it does not
> > show the issue...
> > 
> > The full example also adds a split() to break the mbuf chain
> > between m1 and m2. The kind of thing that would be done for
> > software TCP segmentation.
> > 
> 
> If so, may be the right solution is to care about nb_segs
> when next is set to NULL on split? Any place when next is set
> to NULL. Just to keep the optimization in a more generic place.

The problem with that approach is that there are already several
existing split() or trim() implementations in different DPDK-based
applications. For instance, we have some in 6WINDGate. If we force
applications to set nb_seg to 1 when resetting next, it has to be
documented because it is not straightforward. I think the approach from
this patch is safer.

By the way, for 21.11, if we are able to do some optimizations and have
both pool (index?) and next in the first cache line, we may reconsider
the fact that next and nb_segs are already set for new allocated mbufs,
because it is not straightforward either.


> > After this operation, we have 2 mbuf chain:
> >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> >  - new_m with 1 segment
> > 
> > Freeing m0 will not restore nb_seg=1 in the second segment.
> > 
> >> Assumes that it is ok to have an mbuf with
> >> nb_seg > 1 and next == NULL.
> >> Which seems wrong to me.
> > 
> > I don't think it is wrong: nb_seg is just ignored when not in the first
> > segment, and there is nothing saying it should be set to 1. Typically,
> > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > many similar functions in applications.
> > 
> > Olivier
> > 
> >>
> >>
> >>> This is expected that mbufs stored in pool have their
> >>> nb_seg field set to 1.
> >>>
> >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> >>> ---
> >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> >>> index 8a456e5e64..e632071c23 100644
> >>> --- a/lib/librte_mbuf/rte_mbuf.c
> >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> >>>
> >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> >>> -	if (m->next != NULL) {
> >>> -		m->next = NULL;
> >>> -		m->nb_segs = 1;
> >>> -	}
> >>> +	m->next = NULL;
> >>> +	m->nb_segs = 1;
> >>>  	rte_mbuf_raw_free(m);
> >>>  }
> >>>
> >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >>> index a1414ed7cd..ef5800c8ef 100644
> >>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >>>  				return NULL;
> >>>  		}
> >>>
> >>> -		if (m->next != NULL) {
> >>> -			m->next = NULL;
> >>> -			m->nb_segs = 1;
> >>> -		}
> >>> +		m->next = NULL;
> >>> +		m->nb_segs = 1;
> >>>
> >>>  		return m;
> >>>
> >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >>>  				return NULL;
> >>>  		}
> >>>
> >>> -		if (m->next != NULL) {
> >>> -			m->next = NULL;
> >>> -			m->nb_segs = 1;
> >>> -		}
> >>> +		m->next = NULL;
> >>> +		m->nb_segs = 1;
> >>>  		rte_mbuf_refcnt_set(m, 1);
> >>>
> >>>  		return m;
> >>> --
> >>> 2.25.1
> >>
>
Ananyev, Konstantin Nov. 5, 2020, 11:34 a.m. UTC | #8
> On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > >>
> > >> Hi Olivier,
> > >>
> > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > >>> case:
> > >>>
> > >>>   m1 = rte_pktmbuf_alloc(mp);
> > >>>   rte_pktmbuf_append(m1, 500);
> > >>>   m2 = rte_pktmbuf_alloc(mp);
> > >>>   rte_pktmbuf_append(m2, 500);
> > >>>   rte_pktmbuf_chain(m1, m2);
> > >>>   m0 = rte_pktmbuf_alloc(mp);
> > >>>   rte_pktmbuf_append(m0, 500);
> > >>>   rte_pktmbuf_chain(m0, m1);
> > >>>
> > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > >>> segment (this is not required), after this code the mbuf chain
> > >>> have 3 segments:
> > >>>   - m0: next=m1, nb_seg=3
> > >>>   - m1: next=m2, nb_seg=2
> > >>>   - m2: next=NULL, nb_seg=1
> > >>>
> > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > >>> segment.
> > >>
> > >> Hmm, not sure why is that?
> > >> You are talking about freeing m1, right?
> > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >> {
> > >> 	...
> > >> 	if (m->next != NULL) {
> > >>                         m->next = NULL;
> > >>                         m->nb_segs = 1;
> > >>                 }
> > >>
> > >> m1->next != NULL, so it will enter the if() block,
> > >> and will reset both next and nb_segs.
> > >> What I am missing here?
> > >> Thinking in more generic way, that change:
> > >>  -		if (m->next != NULL) {
> > >>  -			m->next = NULL;
> > >>  -			m->nb_segs = 1;
> > >>  -		}
> > >>  +		m->next = NULL;
> > >>  +		m->nb_segs = 1;
> > >
> > > Ah, sorry. I oversimplified the example and now it does not
> > > show the issue...
> > >
> > > The full example also adds a split() to break the mbuf chain
> > > between m1 and m2. The kind of thing that would be done for
> > > software TCP segmentation.
> > >
> >
> > If so, may be the right solution is to care about nb_segs
> > when next is set to NULL on split? Any place when next is set
> > to NULL. Just to keep the optimization in a more generic place.


> The problem with that approach is that there are already several
> existing split() or trim() implementations in different DPDK-based
> applications. For instance, we have some in 6WINDGate. If we force
> applications to set nb_seg to 1 when resetting next, it has to be
> documented because it is not straightforward. 

I think it is better to go that way.
From my perspective it seems natural to reset nb_seg at same time
we reset next, otherwise inconsistency will occur. 

> I think the approach from
> this patch is safer.

It might be easier from perspective that changes in less places are required,
Though I think that this patch will introduce some performance drop.
As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
 
> By the way, for 21.11, if we are able to do some optimizations and have
> both pool (index?) and next in the first cache line, we may reconsider
> the fact that next and nb_segs are already set for new allocated mbufs,
> because it is not straightforward either.

My suggestion - let's put future optimization discussion aside for now,
and concentrate on that particular patch. 

> 
> > > After this operation, we have 2 mbuf chain:
> > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > >  - new_m with 1 segment
> > >
> > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > >
> > >> Assumes that it is ok to have an mbuf with
> > >> nb_seg > 1 and next == NULL.
> > >> Which seems wrong to me.
> > >
> > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > segment, and there is nothing saying it should be set to 1. Typically,
> > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > many similar functions in applications.
> > >
> > > Olivier
> > >
> > >>
> > >>
> > >>> This is expected that mbufs stored in pool have their
> > >>> nb_seg field set to 1.
> > >>>
> > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > >>> Cc: stable@dpdk.org
> > >>>
> > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > >>> ---
> > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > >>>
> > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > >>> index 8a456e5e64..e632071c23 100644
> > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > >>>
> > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > >>> -	if (m->next != NULL) {
> > >>> -		m->next = NULL;
> > >>> -		m->nb_segs = 1;
> > >>> -	}
> > >>> +	m->next = NULL;
> > >>> +	m->nb_segs = 1;
> > >>>  	rte_mbuf_raw_free(m);
> > >>>  }
> > >>>
> > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > >>> index a1414ed7cd..ef5800c8ef 100644
> > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >>>  				return NULL;
> > >>>  		}
> > >>>
> > >>> -		if (m->next != NULL) {
> > >>> -			m->next = NULL;
> > >>> -			m->nb_segs = 1;
> > >>> -		}
> > >>> +		m->next = NULL;
> > >>> +		m->nb_segs = 1;
> > >>>
> > >>>  		return m;
> > >>>
> > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >>>  				return NULL;
> > >>>  		}
> > >>>
> > >>> -		if (m->next != NULL) {
> > >>> -			m->next = NULL;
> > >>> -			m->nb_segs = 1;
> > >>> -		}
> > >>> +		m->next = NULL;
> > >>> +		m->nb_segs = 1;
> > >>>  		rte_mbuf_refcnt_set(m, 1);
> > >>>
> > >>>  		return m;
> > >>> --
> > >>> 2.25.1
> > >>
> >
Olivier Matz Nov. 5, 2020, 12:31 p.m. UTC | #9
On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > >>
> > > >> Hi Olivier,
> > > >>
> > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > >>> case:
> > > >>>
> > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > >>>   rte_pktmbuf_append(m1, 500);
> > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > >>>   rte_pktmbuf_append(m2, 500);
> > > >>>   rte_pktmbuf_chain(m1, m2);
> > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > >>>   rte_pktmbuf_append(m0, 500);
> > > >>>   rte_pktmbuf_chain(m0, m1);
> > > >>>
> > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > >>> segment (this is not required), after this code the mbuf chain
> > > >>> have 3 segments:
> > > >>>   - m0: next=m1, nb_seg=3
> > > >>>   - m1: next=m2, nb_seg=2
> > > >>>   - m2: next=NULL, nb_seg=1
> > > >>>
> > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > >>> segment.
> > > >>
> > > >> Hmm, not sure why is that?
> > > >> You are talking about freeing m1, right?
> > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >> {
> > > >> 	...
> > > >> 	if (m->next != NULL) {
> > > >>                         m->next = NULL;
> > > >>                         m->nb_segs = 1;
> > > >>                 }
> > > >>
> > > >> m1->next != NULL, so it will enter the if() block,
> > > >> and will reset both next and nb_segs.
> > > >> What I am missing here?
> > > >> Thinking in more generic way, that change:
> > > >>  -		if (m->next != NULL) {
> > > >>  -			m->next = NULL;
> > > >>  -			m->nb_segs = 1;
> > > >>  -		}
> > > >>  +		m->next = NULL;
> > > >>  +		m->nb_segs = 1;
> > > >
> > > > Ah, sorry. I oversimplified the example and now it does not
> > > > show the issue...
> > > >
> > > > The full example also adds a split() to break the mbuf chain
> > > > between m1 and m2. The kind of thing that would be done for
> > > > software TCP segmentation.
> > > >
> > >
> > > If so, may be the right solution is to care about nb_segs
> > > when next is set to NULL on split? Any place when next is set
> > > to NULL. Just to keep the optimization in a more generic place.
> 
> 
> > The problem with that approach is that there are already several
> > existing split() or trim() implementations in different DPDK-based
> > applications. For instance, we have some in 6WINDGate. If we force
> > applications to set nb_seg to 1 when resetting next, it has to be
> > documented because it is not straightforward. 
> 
> I think it is better to go that way.
> From my perspective it seems natural to reset nb_seg at same time
> we reset next, otherwise inconsistency will occur. 

While it is not explicitly stated for nb_segs, to me it was clear that
nb_segs is only valid in the first segment, like for many fields (port,
ol_flags, vlan, rss, ...).

If we say that nb_segs has to be valid in any segments, it means that
chain() or split() will have to update it in all segments, which is not
efficient.

Saying that nb_segs has to be valid for the first and last segment seems
really odd to me. What is the logic behind that except keeping this test
as is?

In any case, it has to be better documented.


Olivier


> > I think the approach from
> > this patch is safer.
> 
> It might be easier from perspective that changes in less places are required,
> Though I think that this patch will introduce some performance drop.
> As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
>  
> > By the way, for 21.11, if we are able to do some optimizations and have
> > both pool (index?) and next in the first cache line, we may reconsider
> > the fact that next and nb_segs are already set for new allocated mbufs,
> > because it is not straightforward either.
> 
> My suggestion - let's put future optimization discussion aside for now,
> and concentrate on that particular patch. 
> 
> > 
> > > > After this operation, we have 2 mbuf chain:
> > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > >  - new_m with 1 segment
> > > >
> > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > >
> > > >> Assumes that it is ok to have an mbuf with
> > > >> nb_seg > 1 and next == NULL.
> > > >> Which seems wrong to me.
> > > >
> > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > many similar functions in applications.
> > > >
> > > > Olivier
> > > >
> > > >>
> > > >>
> > > >>> This is expected that mbufs stored in pool have their
> > > >>> nb_seg field set to 1.
> > > >>>
> > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > >>> Cc: stable@dpdk.org
> > > >>>
> > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > >>> ---
> > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > >>>
> > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > >>> index 8a456e5e64..e632071c23 100644
> > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > >>>
> > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > >>> -	if (m->next != NULL) {
> > > >>> -		m->next = NULL;
> > > >>> -		m->nb_segs = 1;
> > > >>> -	}
> > > >>> +	m->next = NULL;
> > > >>> +	m->nb_segs = 1;
> > > >>>  	rte_mbuf_raw_free(m);
> > > >>>  }
> > > >>>
> > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >>>  				return NULL;
> > > >>>  		}
> > > >>>
> > > >>> -		if (m->next != NULL) {
> > > >>> -			m->next = NULL;
> > > >>> -			m->nb_segs = 1;
> > > >>> -		}
> > > >>> +		m->next = NULL;
> > > >>> +		m->nb_segs = 1;
> > > >>>
> > > >>>  		return m;
> > > >>>
> > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >>>  				return NULL;
> > > >>>  		}
> > > >>>
> > > >>> -		if (m->next != NULL) {
> > > >>> -			m->next = NULL;
> > > >>> -			m->nb_segs = 1;
> > > >>> -		}
> > > >>> +		m->next = NULL;
> > > >>> +		m->nb_segs = 1;
> > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > >>>
> > > >>>  		return m;
> > > >>> --
> > > >>> 2.25.1
> > > >>
> > >
Ananyev, Konstantin Nov. 5, 2020, 1:14 p.m. UTC | #10
> 
> On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > > >>
> > > > >> Hi Olivier,
> > > > >>
> > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > > >>> case:
> > > > >>>
> > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > >>>
> > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > >>> segment (this is not required), after this code the mbuf chain
> > > > >>> have 3 segments:
> > > > >>>   - m0: next=m1, nb_seg=3
> > > > >>>   - m1: next=m2, nb_seg=2
> > > > >>>   - m2: next=NULL, nb_seg=1
> > > > >>>
> > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > >>> segment.
> > > > >>
> > > > >> Hmm, not sure why is that?
> > > > >> You are talking about freeing m1, right?
> > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >> {
> > > > >> 	...
> > > > >> 	if (m->next != NULL) {
> > > > >>                         m->next = NULL;
> > > > >>                         m->nb_segs = 1;
> > > > >>                 }
> > > > >>
> > > > >> m1->next != NULL, so it will enter the if() block,
> > > > >> and will reset both next and nb_segs.
> > > > >> What I am missing here?
> > > > >> Thinking in more generic way, that change:
> > > > >>  -		if (m->next != NULL) {
> > > > >>  -			m->next = NULL;
> > > > >>  -			m->nb_segs = 1;
> > > > >>  -		}
> > > > >>  +		m->next = NULL;
> > > > >>  +		m->nb_segs = 1;
> > > > >
> > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > show the issue...
> > > > >
> > > > > The full example also adds a split() to break the mbuf chain
> > > > > between m1 and m2. The kind of thing that would be done for
> > > > > software TCP segmentation.
> > > > >
> > > >
> > > > If so, may be the right solution is to care about nb_segs
> > > > when next is set to NULL on split? Any place when next is set
> > > > to NULL. Just to keep the optimization in a more generic place.
> >
> >
> > > The problem with that approach is that there are already several
> > > existing split() or trim() implementations in different DPDK-based
> > > applications. For instance, we have some in 6WINDGate. If we force
> > > applications to set nb_seg to 1 when resetting next, it has to be
> > > documented because it is not straightforward.
> >
> > I think it is better to go that way.
> > From my perspective it seems natural to reset nb_seg at same time
> > we reset next, otherwise inconsistency will occur.
> 
> While it is not explicitly stated for nb_segs, to me it was clear that
> nb_segs is only valid in the first segment, like for many fields (port,
> ol_flags, vlan, rss, ...).
> 
> If we say that nb_segs has to be valid in any segments, it means that
> chain() or split() will have to update it in all segments, which is not
> efficient.

Why in all?
We can state that nb_segs on non-first segment should always equal 1.
As I understand in that case, both split() and chain() have to update nb_segs
only for head mbufs, rest ones will remain untouched.  

> 
> Saying that nb_segs has to be valid for the first and last segment seems
> really odd to me. What is the logic behind that except keeping this test
> as is?
> 
> In any case, it has to be better documented.
> 
> 
> Olivier
> 
> 
> > > I think the approach from
> > > this patch is safer.
> >
> > It might be easier from perspective that changes in less places are required,
> > Though I think that this patch will introduce some performance drop.
> > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
> >
> > > By the way, for 21.11, if we are able to do some optimizations and have
> > > both pool (index?) and next in the first cache line, we may reconsider
> > > the fact that next and nb_segs are already set for new allocated mbufs,
> > > because it is not straightforward either.
> >
> > My suggestion - let's put future optimization discussion aside for now,
> > and concentrate on that particular patch.
> >
> > >
> > > > > After this operation, we have 2 mbuf chain:
> > > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > > >  - new_m with 1 segment
> > > > >
> > > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > > >
> > > > >> Assumes that it is ok to have an mbuf with
> > > > >> nb_seg > 1 and next == NULL.
> > > > >> Which seems wrong to me.
> > > > >
> > > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > > many similar functions in applications.
> > > > >
> > > > > Olivier
> > > > >
> > > > >>
> > > > >>
> > > > >>> This is expected that mbufs stored in pool have their
> > > > >>> nb_seg field set to 1.
> > > > >>>
> > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > >>> Cc: stable@dpdk.org
> > > > >>>
> > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > >>> ---
> > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > >>>
> > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > >>> index 8a456e5e64..e632071c23 100644
> > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > >>>
> > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > >>> -	if (m->next != NULL) {
> > > > >>> -		m->next = NULL;
> > > > >>> -		m->nb_segs = 1;
> > > > >>> -	}
> > > > >>> +	m->next = NULL;
> > > > >>> +	m->nb_segs = 1;
> > > > >>>  	rte_mbuf_raw_free(m);
> > > > >>>  }
> > > > >>>
> > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >>>  				return NULL;
> > > > >>>  		}
> > > > >>>
> > > > >>> -		if (m->next != NULL) {
> > > > >>> -			m->next = NULL;
> > > > >>> -			m->nb_segs = 1;
> > > > >>> -		}
> > > > >>> +		m->next = NULL;
> > > > >>> +		m->nb_segs = 1;
> > > > >>>
> > > > >>>  		return m;
> > > > >>>
> > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >>>  				return NULL;
> > > > >>>  		}
> > > > >>>
> > > > >>> -		if (m->next != NULL) {
> > > > >>> -			m->next = NULL;
> > > > >>> -			m->nb_segs = 1;
> > > > >>> -		}
> > > > >>> +		m->next = NULL;
> > > > >>> +		m->nb_segs = 1;
> > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > >>>
> > > > >>>  		return m;
> > > > >>> --
> > > > >>> 2.25.1
> > > > >>
> > > >
Olivier Matz Nov. 5, 2020, 1:24 p.m. UTC | #11
On Thu, Nov 05, 2020 at 01:14:07PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > 
> > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > > > >>
> > > > > >> Hi Olivier,
> > > > > >>
> > > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > > > >>> case:
> > > > > >>>
> > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > >>>
> > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > > >>> segment (this is not required), after this code the mbuf chain
> > > > > >>> have 3 segments:
> > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > >>>
> > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > > >>> segment.
> > > > > >>
> > > > > >> Hmm, not sure why is that?
> > > > > >> You are talking about freeing m1, right?
> > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >> {
> > > > > >> 	...
> > > > > >> 	if (m->next != NULL) {
> > > > > >>                         m->next = NULL;
> > > > > >>                         m->nb_segs = 1;
> > > > > >>                 }
> > > > > >>
> > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > >> and will reset both next and nb_segs.
> > > > > >> What I am missing here?
> > > > > >> Thinking in more generic way, that change:
> > > > > >>  -		if (m->next != NULL) {
> > > > > >>  -			m->next = NULL;
> > > > > >>  -			m->nb_segs = 1;
> > > > > >>  -		}
> > > > > >>  +		m->next = NULL;
> > > > > >>  +		m->nb_segs = 1;
> > > > > >
> > > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > > show the issue...
> > > > > >
> > > > > > The full example also adds a split() to break the mbuf chain
> > > > > > between m1 and m2. The kind of thing that would be done for
> > > > > > software TCP segmentation.
> > > > > >
> > > > >
> > > > > If so, may be the right solution is to care about nb_segs
> > > > > when next is set to NULL on split? Any place when next is set
> > > > > to NULL. Just to keep the optimization in a more generic place.
> > >
> > >
> > > > The problem with that approach is that there are already several
> > > > existing split() or trim() implementations in different DPDK-based
> > > > applications. For instance, we have some in 6WINDGate. If we force
> > > > applications to set nb_seg to 1 when resetting next, it has to be
> > > > documented because it is not straightforward.
> > >
> > > I think it is better to go that way.
> > > From my perspective it seems natural to reset nb_seg at same time
> > > we reset next, otherwise inconsistency will occur.
> > 
> > While it is not explicitly stated for nb_segs, to me it was clear that
> > nb_segs is only valid in the first segment, like for many fields (port,
> > ol_flags, vlan, rss, ...).
> > 
> > If we say that nb_segs has to be valid in any segments, it means that
> > chain() or split() will have to update it in all segments, which is not
> > efficient.
> 
> Why in all?
> We can state that nb_segs on non-first segment should always equal 1.
> As I understand in that case, both split() and chain() have to update nb_segs
> only for head mbufs, rest ones will remain untouched.  

Well, anyway, I think it's strange to have a constraint on m->nb_segs for
non-first segment. We don't have that kind of constraints for other fields.


> 
> > 
> > Saying that nb_segs has to be valid for the first and last segment seems
> > really odd to me. What is the logic behind that except keeping this test
> > as is?
> > 
> > In any case, it has to be better documented.
> > 
> > 
> > Olivier
> > 
> > 
> > > > I think the approach from
> > > > this patch is safer.
> > >
> > > It might be easier from perspective that changes in less places are required,
> > > Though I think that this patch will introduce some performance drop.
> > > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
> > >
> > > > By the way, for 21.11, if we are able to do some optimizations and have
> > > > both pool (index?) and next in the first cache line, we may reconsider
> > > > the fact that next and nb_segs are already set for new allocated mbufs,
> > > > because it is not straightforward either.
> > >
> > > My suggestion - let's put future optimization discussion aside for now,
> > > and concentrate on that particular patch.
> > >
> > > >
> > > > > > After this operation, we have 2 mbuf chain:
> > > > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > > > >  - new_m with 1 segment
> > > > > >
> > > > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > > > >
> > > > > >> Assumes that it is ok to have an mbuf with
> > > > > >> nb_seg > 1 and next == NULL.
> > > > > >> Which seems wrong to me.
> > > > > >
> > > > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > > > many similar functions in applications.
> > > > > >
> > > > > > Olivier
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >>> This is expected that mbufs stored in pool have their
> > > > > >>> nb_seg field set to 1.
> > > > > >>>
> > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > > >>> Cc: stable@dpdk.org
> > > > > >>>
> > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > >>> ---
> > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > >>>
> > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > >>>
> > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > >>> -	if (m->next != NULL) {
> > > > > >>> -		m->next = NULL;
> > > > > >>> -		m->nb_segs = 1;
> > > > > >>> -	}
> > > > > >>> +	m->next = NULL;
> > > > > >>> +	m->nb_segs = 1;
> > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > >>>  }
> > > > > >>>
> > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >>>  				return NULL;
> > > > > >>>  		}
> > > > > >>>
> > > > > >>> -		if (m->next != NULL) {
> > > > > >>> -			m->next = NULL;
> > > > > >>> -			m->nb_segs = 1;
> > > > > >>> -		}
> > > > > >>> +		m->next = NULL;
> > > > > >>> +		m->nb_segs = 1;
> > > > > >>>
> > > > > >>>  		return m;
> > > > > >>>
> > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >>>  				return NULL;
> > > > > >>>  		}
> > > > > >>>
> > > > > >>> -		if (m->next != NULL) {
> > > > > >>> -			m->next = NULL;
> > > > > >>> -			m->nb_segs = 1;
> > > > > >>> -		}
> > > > > >>> +		m->next = NULL;
> > > > > >>> +		m->nb_segs = 1;
> > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > >>>
> > > > > >>>  		return m;
> > > > > >>> --
> > > > > >>> 2.25.1
> > > > > >>
> > > > >
Ananyev, Konstantin Nov. 5, 2020, 1:55 p.m. UTC | #12
> >
> > >
> > > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > > > > >>
> > > > > > >> Hi Olivier,
> > > > > > >>
> > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > > > > >>> case:
> > > > > > >>>
> > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > >>>
> > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > > > >>> segment (this is not required), after this code the mbuf chain
> > > > > > >>> have 3 segments:
> > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > >>>
> > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > > > >>> segment.
> > > > > > >>
> > > > > > >> Hmm, not sure why is that?
> > > > > > >> You are talking about freeing m1, right?
> > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > >> {
> > > > > > >> 	...
> > > > > > >> 	if (m->next != NULL) {
> > > > > > >>                         m->next = NULL;
> > > > > > >>                         m->nb_segs = 1;
> > > > > > >>                 }
> > > > > > >>
> > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > >> and will reset both next and nb_segs.
> > > > > > >> What I am missing here?
> > > > > > >> Thinking in more generic way, that change:
> > > > > > >>  -		if (m->next != NULL) {
> > > > > > >>  -			m->next = NULL;
> > > > > > >>  -			m->nb_segs = 1;
> > > > > > >>  -		}
> > > > > > >>  +		m->next = NULL;
> > > > > > >>  +		m->nb_segs = 1;
> > > > > > >
> > > > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > > > show the issue...
> > > > > > >
> > > > > > > The full example also adds a split() to break the mbuf chain
> > > > > > > between m1 and m2. The kind of thing that would be done for
> > > > > > > software TCP segmentation.
> > > > > > >
> > > > > >
> > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > when next is set to NULL on split? Any place when next is set
> > > > > > to NULL. Just to keep the optimization in a more generic place.
> > > >
> > > >
> > > > > The problem with that approach is that there are already several
> > > > > existing split() or trim() implementations in different DPDK-based
> > > > > applications. For instance, we have some in 6WINDGate. If we force
> > > > > applications to set nb_seg to 1 when resetting next, it has to be
> > > > > documented because it is not straightforward.
> > > >
> > > > I think it is better to go that way.
> > > > From my perspective it seems natural to reset nb_seg at same time
> > > > we reset next, otherwise inconsistency will occur.
> > >
> > > While it is not explicitly stated for nb_segs, to me it was clear that
> > > nb_segs is only valid in the first segment, like for many fields (port,
> > > ol_flags, vlan, rss, ...).
> > >
> > > If we say that nb_segs has to be valid in any segments, it means that
> > > chain() or split() will have to update it in all segments, which is not
> > > efficient.
> >
> > Why in all?
> > We can state that nb_segs on non-first segment should always equal 1.
> > As I understand in that case, both split() and chain() have to update nb_segs
> > only for head mbufs, rest ones will remain untouched.
> 
> Well, anyway, I think it's strange to have a constraint on m->nb_segs for
> non-first segment. We don't have that kind of constraints for other fields.

True, we don't. But this is one of the fields we consider critical
for proper work of mbuf alloc/free mechanism. 

> 
> 
> >
> > >
> > > Saying that nb_segs has to be valid for the first and last segment seems
> > > really odd to me. What is the logic behind that except keeping this test
> > > as is?
> > >
> > > In any case, it has to be better documented.
> > >
> > >
> > > Olivier
> > >
> > >
> > > > > I think the approach from
> > > > > this patch is safer.
> > > >
> > > > It might be easier from perspective that changes in less places are required,
> > > > Though I think that this patch will introduce some performance drop.
> > > > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
> > > >
> > > > > By the way, for 21.11, if we are able to do some optimizations and have
> > > > > both pool (index?) and next in the first cache line, we may reconsider
> > > > > the fact that next and nb_segs are already set for new allocated mbufs,
> > > > > because it is not straightforward either.
> > > >
> > > > My suggestion - let's put future optimization discussion aside for now,
> > > > and concentrate on that particular patch.
> > > >
> > > > >
> > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > > > > >  - new_m with 1 segment
> > > > > > >
> > > > > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > > > > >
> > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > >> Which seems wrong to me.
> > > > > > >
> > > > > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > > > > many similar functions in applications.
> > > > > > >
> > > > > > > Olivier
> > > > > > >
> > > > > > >>
> > > > > > >>
> > > > > > >>> This is expected that mbufs stored in pool have their
> > > > > > >>> nb_seg field set to 1.
> > > > > > >>>
> > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > > > >>> Cc: stable@dpdk.org
> > > > > > >>>
> > > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > >>> ---
> > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > >>>
> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > >>>
> > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > >>> -	if (m->next != NULL) {
> > > > > > >>> -		m->next = NULL;
> > > > > > >>> -		m->nb_segs = 1;
> > > > > > >>> -	}
> > > > > > >>> +	m->next = NULL;
> > > > > > >>> +	m->nb_segs = 1;
> > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > >>>  }
> > > > > > >>>
> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > >>>  				return NULL;
> > > > > > >>>  		}
> > > > > > >>>
> > > > > > >>> -		if (m->next != NULL) {
> > > > > > >>> -			m->next = NULL;
> > > > > > >>> -			m->nb_segs = 1;
> > > > > > >>> -		}
> > > > > > >>> +		m->next = NULL;
> > > > > > >>> +		m->nb_segs = 1;
> > > > > > >>>
> > > > > > >>>  		return m;
> > > > > > >>>
> > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > >>>  				return NULL;
> > > > > > >>>  		}
> > > > > > >>>
> > > > > > >>> -		if (m->next != NULL) {
> > > > > > >>> -			m->next = NULL;
> > > > > > >>> -			m->nb_segs = 1;
> > > > > > >>> -		}
> > > > > > >>> +		m->next = NULL;
> > > > > > >>> +		m->nb_segs = 1;
> > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > >>>
> > > > > > >>>  		return m;
> > > > > > >>> --
> > > > > > >>> 2.25.1
> > > > > > >>
> > > > > >
Morten Brørup Nov. 5, 2020, 4:30 p.m. UTC | #13
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, November 5, 2020 2:56 PM
> 
> > >
> > > >
> > > > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin
> wrote:
> > > > >
> > > > >
> > > > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko
> wrote:
> > > > > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev,
> Konstantin wrote:
> > > > > > > >>
> > > > > > > >> Hi Olivier,
> > > > > > > >>
> > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value
> of m->next,
> > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> instance in this
> > > > > > > >>> case:
> > > > > > > >>>
> > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > >>>
> > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> initial m1
> > > > > > > >>> segment (this is not required), after this code the
> mbuf chain
> > > > > > > >>> have 3 segments:
> > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > >>>
> > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in
> the second
> > > > > > > >>> segment.
> > > > > > > >>
> > > > > > > >> Hmm, not sure why is that?
> > > > > > > >> You are talking about freeing m1, right?
> > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > >> {
> > > > > > > >> 	...
> > > > > > > >> 	if (m->next != NULL) {
> > > > > > > >>                         m->next = NULL;
> > > > > > > >>                         m->nb_segs = 1;
> > > > > > > >>                 }
> > > > > > > >>
> > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > >> and will reset both next and nb_segs.
> > > > > > > >> What I am missing here?
> > > > > > > >> Thinking in more generic way, that change:
> > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > >>  -			m->next = NULL;
> > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > >>  -		}
> > > > > > > >>  +		m->next = NULL;
> > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > >
> > > > > > > > Ah, sorry. I oversimplified the example and now it does
> not
> > > > > > > > show the issue...
> > > > > > > >
> > > > > > > > The full example also adds a split() to break the mbuf
> chain
> > > > > > > > between m1 and m2. The kind of thing that would be done
> for
> > > > > > > > software TCP segmentation.
> > > > > > > >
> > > > > > >
> > > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > > when next is set to NULL on split? Any place when next is
> set
> > > > > > > to NULL. Just to keep the optimization in a more generic
> place.
> > > > >
> > > > >
> > > > > > The problem with that approach is that there are already
> several
> > > > > > existing split() or trim() implementations in different DPDK-
> based
> > > > > > applications. For instance, we have some in 6WINDGate. If we
> force
> > > > > > applications to set nb_seg to 1 when resetting next, it has
> to be
> > > > > > documented because it is not straightforward.
> > > > >
> > > > > I think it is better to go that way.
> > > > > From my perspective it seems natural to reset nb_seg at same
> time
> > > > > we reset next, otherwise inconsistency will occur.
> > > >
> > > > While it is not explicitly stated for nb_segs, to me it was clear
> that
> > > > nb_segs is only valid in the first segment, like for many fields
> (port,
> > > > ol_flags, vlan, rss, ...).
> > > >
> > > > If we say that nb_segs has to be valid in any segments, it means
> that
> > > > chain() or split() will have to update it in all segments, which
> is not
> > > > efficient.
> > >
> > > Why in all?
> > > We can state that nb_segs on non-first segment should always equal
> 1.
> > > As I understand in that case, both split() and chain() have to
> update nb_segs
> > > only for head mbufs, rest ones will remain untouched.
> >
> > Well, anyway, I think it's strange to have a constraint on m->nb_segs
> for
> > non-first segment. We don't have that kind of constraints for other
> fields.
> 
> True, we don't. But this is one of the fields we consider critical
> for proper work of mbuf alloc/free mechanism.
> 

I am not sure that requiring m->nb_segs == 1 on non-first segments will provide any benefits.

E.g. the second segment of a three-segment chain will still have m->next != NULL, so it cannot be used as a gate to prevent accessing m->next.

I might have overlooked something, though.

> >
> >
> > >
> > > >
> > > > Saying that nb_segs has to be valid for the first and last
> segment seems
> > > > really odd to me. What is the logic behind that except keeping
> this test
> > > > as is?
> > > >
> > > > In any case, it has to be better documented.
> > > >
> > > >
> > > > Olivier
> > > >
> > > >
> > > > > > I think the approach from
> > > > > > this patch is safer.
> > > > >
> > > > > It might be easier from perspective that changes in less places
> are required,
> > > > > Though I think that this patch will introduce some performance
> drop.
> > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> lines unconditionally.
> > > > >
> > > > > > By the way, for 21.11, if we are able to do some
> optimizations and have
> > > > > > both pool (index?) and next in the first cache line, we may
> reconsider
> > > > > > the fact that next and nb_segs are already set for new
> allocated mbufs,
> > > > > > because it is not straightforward either.
> > > > >
> > > > > My suggestion - let's put future optimization discussion aside
> for now,
> > > > > and concentrate on that particular patch.
> > > > >
> > > > > >
> > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> nb_seg=2
> > > > > > > >  - new_m with 1 segment
> > > > > > > >
> > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> segment.
> > > > > > > >
> > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > >> Which seems wrong to me.
> > > > > > > >
> > > > > > > > I don't think it is wrong: nb_seg is just ignored when
> not in the first
> > > > > > > > segment, and there is nothing saying it should be set to
> 1. Typically,
> > > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's
> the same for
> > > > > > > > many similar functions in applications.
> > > > > > > >
> > > > > > > > Olivier
> > > > > > > >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> This is expected that mbufs stored in pool have their
> > > > > > > >>> nb_seg field set to 1.
> > > > > > > >>>
> > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in
> pool")
> > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > >>>
> > > > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > >>> ---
> > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > >>>
> > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > >>> @@ -129,10 +129,8 @@
> rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > >>>
> > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > >>> -		m->next = NULL;
> > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > >>> -	}
> > > > > > > >>> +	m->next = NULL;
> > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > >>>  }
> > > > > > > >>>
> > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct
> rte_mbuf *m)
> > > > > > > >>>  				return NULL;
> > > > > > > >>>  		}
> > > > > > > >>>
> > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > >>> -			m->next = NULL;
> > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > >>> -		}
> > > > > > > >>> +		m->next = NULL;
> > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > >>>
> > > > > > > >>>  		return m;
> > > > > > > >>>
> > > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct
> rte_mbuf *m)
> > > > > > > >>>  				return NULL;
> > > > > > > >>>  		}
> > > > > > > >>>
> > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > >>> -			m->next = NULL;
> > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > >>> -		}
> > > > > > > >>> +		m->next = NULL;
> > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > >>>
> > > > > > > >>>  		return m;
> > > > > > > >>> --
> > > > > > > >>> 2.25.1
> > > > > > > >>
> > > > > > >
Ananyev, Konstantin Nov. 5, 2020, 11:55 p.m. UTC | #14
> > > > > > > > >> Hi Olivier,
> > > > > > > > >>
> > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value
> > of m->next,
> > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > instance in this
> > > > > > > > >>> case:
> > > > > > > > >>>
> > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > >>>
> > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > initial m1
> > > > > > > > >>> segment (this is not required), after this code the
> > mbuf chain
> > > > > > > > >>> have 3 segments:
> > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > >>>
> > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in
> > the second
> > > > > > > > >>> segment.
> > > > > > > > >>
> > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > >> {
> > > > > > > > >> 	...
> > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > >>                         m->next = NULL;
> > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > >>                 }
> > > > > > > > >>
> > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > >> What I am missing here?
> > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > >>  -			m->next = NULL;
> > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > >>  -		}
> > > > > > > > >>  +		m->next = NULL;
> > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > >
> > > > > > > > > Ah, sorry. I oversimplified the example and now it does
> > not
> > > > > > > > > show the issue...
> > > > > > > > >
> > > > > > > > > The full example also adds a split() to break the mbuf
> > chain
> > > > > > > > > between m1 and m2. The kind of thing that would be done
> > for
> > > > > > > > > software TCP segmentation.
> > > > > > > > >
> > > > > > > >
> > > > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > > > when next is set to NULL on split? Any place when next is
> > set
> > > > > > > > to NULL. Just to keep the optimization in a more generic
> > place.
> > > > > >
> > > > > >
> > > > > > > The problem with that approach is that there are already
> > several
> > > > > > > existing split() or trim() implementations in different DPDK-
> > based
> > > > > > > applications. For instance, we have some in 6WINDGate. If we
> > force
> > > > > > > applications to set nb_seg to 1 when resetting next, it has
> > to be
> > > > > > > documented because it is not straightforward.
> > > > > >
> > > > > > I think it is better to go that way.
> > > > > > From my perspective it seems natural to reset nb_seg at same
> > time
> > > > > > we reset next, otherwise inconsistency will occur.
> > > > >
> > > > > While it is not explicitly stated for nb_segs, to me it was clear
> > that
> > > > > nb_segs is only valid in the first segment, like for many fields
> > (port,
> > > > > ol_flags, vlan, rss, ...).
> > > > >
> > > > > If we say that nb_segs has to be valid in any segments, it means
> > that
> > > > > chain() or split() will have to update it in all segments, which
> > is not
> > > > > efficient.
> > > >
> > > > Why in all?
> > > > We can state that nb_segs on non-first segment should always equal
> > 1.
> > > > As I understand in that case, both split() and chain() have to
> > update nb_segs
> > > > only for head mbufs, rest ones will remain untouched.
> > >
> > > Well, anyway, I think it's strange to have a constraint on m->nb_segs
> > for
> > > non-first segment. We don't have that kind of constraints for other
> > fields.
> >
> > True, we don't. But this is one of the fields we consider critical
> > for proper work of mbuf alloc/free mechanism.
> >
> 
> I am not sure that requiring m->nb_segs == 1 on non-first segments will provide any benefits.

It would make this patch unneeded.
So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-free.

> 
> E.g. the second segment of a three-segment chain will still have m->next != NULL, so it cannot be used as a gate to prevent accessing m-
> >next.
> 
> I might have overlooked something, though.
> 
> > >
> > >
> > > >
> > > > >
> > > > > Saying that nb_segs has to be valid for the first and last
> > segment seems
> > > > > really odd to me. What is the logic behind that except keeping
> > this test
> > > > > as is?
> > > > >
> > > > > In any case, it has to be better documented.
> > > > >
> > > > >
> > > > > Olivier
> > > > >
> > > > >
> > > > > > > I think the approach from
> > > > > > > this patch is safer.
> > > > > >
> > > > > > It might be easier from perspective that changes in less places
> > are required,
> > > > > > Though I think that this patch will introduce some performance
> > drop.
> > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > lines unconditionally.
> > > > > >
> > > > > > > By the way, for 21.11, if we are able to do some
> > optimizations and have
> > > > > > > both pool (index?) and next in the first cache line, we may
> > reconsider
> > > > > > > the fact that next and nb_segs are already set for new
> > allocated mbufs,
> > > > > > > because it is not straightforward either.
> > > > > >
> > > > > > My suggestion - let's put future optimization discussion aside
> > for now,
> > > > > > and concentrate on that particular patch.
> > > > > >
> > > > > > >
> > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > nb_seg=2
> > > > > > > > >  - new_m with 1 segment
> > > > > > > > >
> > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > segment.
> > > > > > > > >
> > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > >> Which seems wrong to me.
> > > > > > > > >
> > > > > > > > > I don't think it is wrong: nb_seg is just ignored when
> > not in the first
> > > > > > > > > segment, and there is nothing saying it should be set to
> > 1. Typically,
> > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's
> > the same for
> > > > > > > > > many similar functions in applications.
> > > > > > > > >
> > > > > > > > > Olivier
> > > > > > > > >
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> This is expected that mbufs stored in pool have their
> > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > >>>
> > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in
> > pool")
> > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > >>>
> > > > > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > > >>> ---
> > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > >>>
> > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > >>> @@ -129,10 +129,8 @@
> > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > >>>
> > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > >>> -		m->next = NULL;
> > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > >>> -	}
> > > > > > > > >>> +	m->next = NULL;
> > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > >>>  }
> > > > > > > > >>>
> > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct
> > rte_mbuf *m)
> > > > > > > > >>>  				return NULL;
> > > > > > > > >>>  		}
> > > > > > > > >>>
> > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > >>> -			m->next = NULL;
> > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > >>> -		}
> > > > > > > > >>> +		m->next = NULL;
> > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > >>>
> > > > > > > > >>>  		return m;
> > > > > > > > >>>
> > > > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct
> > rte_mbuf *m)
> > > > > > > > >>>  				return NULL;
> > > > > > > > >>>  		}
> > > > > > > > >>>
> > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > >>> -			m->next = NULL;
> > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > >>> -		}
> > > > > > > > >>> +		m->next = NULL;
> > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > >>>
> > > > > > > > >>>  		return m;
> > > > > > > > >>> --
> > > > > > > > >>> 2.25.1
> > > > > > > > >>
> > > > > > > >
Morten Brørup Nov. 6, 2020, 7:52 a.m. UTC | #15
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Friday, November 6, 2020 12:55 AM
> 
> > > > > > > > > >> Hi Olivier,
> > > > > > > > > >>
> > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the
> value
> > > of m->next,
> > > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > > instance in this
> > > > > > > > > >>> case:
> > > > > > > > > >>>
> > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > >>>
> > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > > initial m1
> > > > > > > > > >>> segment (this is not required), after this code the
> > > mbuf chain
> > > > > > > > > >>> have 3 segments:
> > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > >>>
> > > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1
> in
> > > the second
> > > > > > > > > >>> segment.
> > > > > > > > > >>
> > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > >> {
> > > > > > > > > >> 	...
> > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > >>                 }
> > > > > > > > > >>
> > > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > >> What I am missing here?
> > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > >>  -		}
> > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > >
> > > > > > > > > > Ah, sorry. I oversimplified the example and now it
> does
> > > not
> > > > > > > > > > show the issue...
> > > > > > > > > >
> > > > > > > > > > The full example also adds a split() to break the
> mbuf
> > > chain
> > > > > > > > > > between m1 and m2. The kind of thing that would be
> done
> > > for
> > > > > > > > > > software TCP segmentation.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > If so, may be the right solution is to care about
> nb_segs
> > > > > > > > > when next is set to NULL on split? Any place when next
> is
> > > set
> > > > > > > > > to NULL. Just to keep the optimization in a more
> generic
> > > place.
> > > > > > >
> > > > > > >
> > > > > > > > The problem with that approach is that there are already
> > > several
> > > > > > > > existing split() or trim() implementations in different
> DPDK-
> > > based
> > > > > > > > applications. For instance, we have some in 6WINDGate. If
> we
> > > force
> > > > > > > > applications to set nb_seg to 1 when resetting next, it
> has
> > > to be
> > > > > > > > documented because it is not straightforward.
> > > > > > >
> > > > > > > I think it is better to go that way.
> > > > > > > From my perspective it seems natural to reset nb_seg at
> same
> > > time
> > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > >
> > > > > > While it is not explicitly stated for nb_segs, to me it was
> clear
> > > that
> > > > > > nb_segs is only valid in the first segment, like for many
> fields
> > > (port,
> > > > > > ol_flags, vlan, rss, ...).
> > > > > >
> > > > > > If we say that nb_segs has to be valid in any segments, it
> means
> > > that
> > > > > > chain() or split() will have to update it in all segments,
> which
> > > is not
> > > > > > efficient.
> > > > >
> > > > > Why in all?
> > > > > We can state that nb_segs on non-first segment should always
> equal
> > > 1.
> > > > > As I understand in that case, both split() and chain() have to
> > > update nb_segs
> > > > > only for head mbufs, rest ones will remain untouched.
> > > >
> > > > Well, anyway, I think it's strange to have a constraint on m-
> >nb_segs
> > > for
> > > > non-first segment. We don't have that kind of constraints for
> other
> > > fields.
> > >
> > > True, we don't. But this is one of the fields we consider critical
> > > for proper work of mbuf alloc/free mechanism.
> > >
> >
> > I am not sure that requiring m->nb_segs == 1 on non-first segments
> will provide any benefits.
> 
> It would make this patch unneeded.
> So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-
> free.

I see. Then I agree with Konstantin that alternative solutions should be considered.

The benefit regarding free()'ing non-segmented mbufs - which is a very common operation - certainly outweighs the cost of requiring split()/chain() operations to set the new head mbuf's nb_segs = 1.

Nonetheless, the bug needs to be fixed somehow.

If we can't come up with a better solution that doesn't break the ABI, we are forced to accept the patch.

Unless the techboard accepts to break the ABI in order to avoid the performance cost of this patch.

> 
> >
> > E.g. the second segment of a three-segment chain will still have m-
> >next != NULL, so it cannot be used as a gate to prevent accessing m-
> > >next.
> >
> > I might have overlooked something, though.
> >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Saying that nb_segs has to be valid for the first and last
> > > segment seems
> > > > > > really odd to me. What is the logic behind that except
> keeping
> > > this test
> > > > > > as is?
> > > > > >
> > > > > > In any case, it has to be better documented.
> > > > > >
> > > > > >
> > > > > > Olivier
> > > > > >
> > > > > >
> > > > > > > > I think the approach from
> > > > > > > > this patch is safer.
> > > > > > >
> > > > > > > It might be easier from perspective that changes in less
> places
> > > are required,
> > > > > > > Though I think that this patch will introduce some
> performance
> > > drop.
> > > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > > lines unconditionally.
> > > > > > >
> > > > > > > > By the way, for 21.11, if we are able to do some
> > > optimizations and have
> > > > > > > > both pool (index?) and next in the first cache line, we
> may
> > > reconsider
> > > > > > > > the fact that next and nb_segs are already set for new
> > > allocated mbufs,
> > > > > > > > because it is not straightforward either.
> > > > > > >
> > > > > > > My suggestion - let's put future optimization discussion
> aside
> > > for now,
> > > > > > > and concentrate on that particular patch.
> > > > > > >
> > > > > > > >
> > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > > nb_seg=2
> > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > >
> > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > > segment.
> > > > > > > > > >
> > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > >
> > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> when
> > > not in the first
> > > > > > > > > > segment, and there is nothing saying it should be set
> to
> > > 1. Typically,
> > > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess
> it's
> > > the same for
> > > > > > > > > > many similar functions in applications.
> > > > > > > > > >
> > > > > > > > > > Olivier
> > > > > > > > > >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>> This is expected that mbufs stored in pool have
> their
> > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > >>>
> > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while
> in
> > > pool")
> > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > >>>
> > > > > > > > > >>> Signed-off-by: Olivier Matz
> <olivier.matz@6wind.com>
> > > > > > > > > >>> ---
> > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > >>>
> > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > >>> -	}
> > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > >>>  }
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> rte_pktmbuf_prefree_seg(struct
> > > rte_mbuf *m)
> > > > > > > > > >>>  				return NULL;
> > > > > > > > > >>>  		}
> > > > > > > > > >>>
> > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > >>> -		}
> > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > >>>
> > > > > > > > > >>>  		return m;
> > > > > > > > > >>>
> > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> rte_pktmbuf_prefree_seg(struct
> > > rte_mbuf *m)
> > > > > > > > > >>>  				return NULL;
> > > > > > > > > >>>  		}
> > > > > > > > > >>>
> > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > >>> -		}
> > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > >>>
> > > > > > > > > >>>  		return m;
> > > > > > > > > >>> --
> > > > > > > > > >>> 2.25.1
> > > > > > > > > >>
> > > > > > > > >
Olivier Matz Nov. 6, 2020, 8:20 a.m. UTC | #16
On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > Sent: Friday, November 6, 2020 12:55 AM
> > 
> > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > >>
> > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the
> > value
> > > > of m->next,
> > > > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > > > instance in this
> > > > > > > > > > >>> case:
> > > > > > > > > > >>>
> > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > >>>
> > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > > > initial m1
> > > > > > > > > > >>> segment (this is not required), after this code the
> > > > mbuf chain
> > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > >>>
> > > > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1
> > in
> > > > the second
> > > > > > > > > > >>> segment.
> > > > > > > > > > >>
> > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > >> {
> > > > > > > > > > >> 	...
> > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > >>                 }
> > > > > > > > > > >>
> > > > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > >> What I am missing here?
> > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > >>  -		}
> > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > >
> > > > > > > > > > > Ah, sorry. I oversimplified the example and now it
> > does
> > > > not
> > > > > > > > > > > show the issue...
> > > > > > > > > > >
> > > > > > > > > > > The full example also adds a split() to break the
> > mbuf
> > > > chain
> > > > > > > > > > > between m1 and m2. The kind of thing that would be
> > done
> > > > for
> > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If so, may be the right solution is to care about
> > nb_segs
> > > > > > > > > > when next is set to NULL on split? Any place when next
> > is
> > > > set
> > > > > > > > > > to NULL. Just to keep the optimization in a more
> > generic
> > > > place.
> > > > > > > >
> > > > > > > >
> > > > > > > > > The problem with that approach is that there are already
> > > > several
> > > > > > > > > existing split() or trim() implementations in different
> > DPDK-
> > > > based
> > > > > > > > > applications. For instance, we have some in 6WINDGate. If
> > we
> > > > force
> > > > > > > > > applications to set nb_seg to 1 when resetting next, it
> > has
> > > > to be
> > > > > > > > > documented because it is not straightforward.
> > > > > > > >
> > > > > > > > I think it is better to go that way.
> > > > > > > > From my perspective it seems natural to reset nb_seg at
> > same
> > > > time
> > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > >
> > > > > > > While it is not explicitly stated for nb_segs, to me it was
> > clear
> > > > that
> > > > > > > nb_segs is only valid in the first segment, like for many
> > fields
> > > > (port,
> > > > > > > ol_flags, vlan, rss, ...).
> > > > > > >
> > > > > > > If we say that nb_segs has to be valid in any segments, it
> > means
> > > > that
> > > > > > > chain() or split() will have to update it in all segments,
> > which
> > > > is not
> > > > > > > efficient.
> > > > > >
> > > > > > Why in all?
> > > > > > We can state that nb_segs on non-first segment should always
> > equal
> > > > 1.
> > > > > > As I understand in that case, both split() and chain() have to
> > > > update nb_segs
> > > > > > only for head mbufs, rest ones will remain untouched.
> > > > >
> > > > > Well, anyway, I think it's strange to have a constraint on m-
> > >nb_segs
> > > > for
> > > > > non-first segment. We don't have that kind of constraints for
> > other
> > > > fields.
> > > >
> > > > True, we don't. But this is one of the fields we consider critical
> > > > for proper work of mbuf alloc/free mechanism.
> > > >
> > >
> > > I am not sure that requiring m->nb_segs == 1 on non-first segments
> > will provide any benefits.
> > 
> > It would make this patch unneeded.
> > So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-
> > free.
> 
> I see. Then I agree with Konstantin that alternative solutions should be considered.
> 
> The benefit regarding free()'ing non-segmented mbufs - which is a very common operation - certainly outweighs the cost of requiring split()/chain() operations to set the new head mbuf's nb_segs = 1.
> 
> Nonetheless, the bug needs to be fixed somehow.
> 
> If we can't come up with a better solution that doesn't break the ABI, we are forced to accept the patch.
> 
> Unless the techboard accepts to break the ABI in order to avoid the performance cost of this patch.

Did someone notice a performance drop with this patch?
On my side, I don't see any regression on a L3 use case.

Let's sumarize: splitting a mbuf chain and freeing it causes subsequent mbuf
allocation to return a mbuf which is not correctly initialized. There are 2
options to fix it:

1/ change the mbuf free function (this patch)

   - m->nb_segs would behave like many other field: valid in the first
     segment, ignored in other segments
   - may impact performance (suspected)

2/ change all places where a mbuf chain is split, or trimmed

   - m->nb_segs would have a specific behavior: count the number of
     segments in the first mbuf, should be 1 in the last segment,
     ignored in other ones.
   - no code change in mbuf library, so no performance impact
   - need to patch all places where we do a mbuf split or trim. From afar,
     I see at least mbuf_cut_seg_ofs() in DPDK. Some external applications
     may have to be patched (for instance, I already found 3 places in
     6WIND code base without a deep search).

In my opinion, 1/ is better, except we notice a significant performance,
because the (implicit) behavior is unchanged.

Whatever the solution, some documentation has to be added.

Olivier


> 
> > 
> > >
> > > E.g. the second segment of a three-segment chain will still have m-
> > >next != NULL, so it cannot be used as a gate to prevent accessing m-
> > > >next.
> > >
> > > I might have overlooked something, though.
> > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Saying that nb_segs has to be valid for the first and last
> > > > segment seems
> > > > > > > really odd to me. What is the logic behind that except
> > keeping
> > > > this test
> > > > > > > as is?
> > > > > > >
> > > > > > > In any case, it has to be better documented.
> > > > > > >
> > > > > > >
> > > > > > > Olivier
> > > > > > >
> > > > > > >
> > > > > > > > > I think the approach from
> > > > > > > > > this patch is safer.
> > > > > > > >
> > > > > > > > It might be easier from perspective that changes in less
> > places
> > > > are required,
> > > > > > > > Though I think that this patch will introduce some
> > performance
> > > > drop.
> > > > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > > > lines unconditionally.
> > > > > > > >
> > > > > > > > > By the way, for 21.11, if we are able to do some
> > > > optimizations and have
> > > > > > > > > both pool (index?) and next in the first cache line, we
> > may
> > > > reconsider
> > > > > > > > > the fact that next and nb_segs are already set for new
> > > > allocated mbufs,
> > > > > > > > > because it is not straightforward either.
> > > > > > > >
> > > > > > > > My suggestion - let's put future optimization discussion
> > aside
> > > > for now,
> > > > > > > > and concentrate on that particular patch.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > > > nb_seg=2
> > > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > > >
> > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > > > segment.
> > > > > > > > > > >
> > > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > > >
> > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> > when
> > > > not in the first
> > > > > > > > > > > segment, and there is nothing saying it should be set
> > to
> > > > 1. Typically,
> > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess
> > it's
> > > > the same for
> > > > > > > > > > > many similar functions in applications.
> > > > > > > > > > >
> > > > > > > > > > > Olivier
> > > > > > > > > > >
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>> This is expected that mbufs stored in pool have
> > their
> > > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while
> > in
> > > > pool")
> > > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > > >>>
> > > > > > > > > > >>> Signed-off-by: Olivier Matz
> > <olivier.matz@6wind.com>
> > > > > > > > > > >>> ---
> > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > > > >>>
> > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > > >>>
> > > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > > >>> -	}
> > > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > > >>>  }
> > > > > > > > > > >>>
> > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> > rte_pktmbuf_prefree_seg(struct
> > > > rte_mbuf *m)
> > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > >>>  		}
> > > > > > > > > > >>>
> > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > >>> -		}
> > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > >>>
> > > > > > > > > > >>>  		return m;
> > > > > > > > > > >>>
> > > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> > rte_pktmbuf_prefree_seg(struct
> > > > rte_mbuf *m)
> > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > >>>  		}
> > > > > > > > > > >>>
> > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > >>> -		}
> > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > > >>>
> > > > > > > > > > >>>  		return m;
> > > > > > > > > > >>> --
> > > > > > > > > > >>> 2.25.1
> > > > > > > > > > >>
> > > > > > > > > >
>
Morten Brørup Nov. 6, 2020, 8:50 a.m. UTC | #17
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 6, 2020 9:21 AM
> 
> On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > Sent: Friday, November 6, 2020 12:55 AM
> > >
> > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > >>
> > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever
> the
> > > value
> > > > > of m->next,
> > > > > > > > > > > >>> because it can happen that m->nb_seg is != 1.
> For
> > > > > instance in this
> > > > > > > > > > > >>> case:
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in
> the
> > > > > initial m1
> > > > > > > > > > > >>> segment (this is not required), after this code
> the
> > > > > mbuf chain
> > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> nb_seg=1
> > > in
> > > > > the second
> > > > > > > > > > > >>> segment.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > >> {
> > > > > > > > > > > >> 	...
> > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > >>                 }
> > > > > > > > > > > >>
> > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> block,
> > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > >>  -		}
> > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > >
> > > > > > > > > > > > Ah, sorry. I oversimplified the example and now
> it
> > > does
> > > > > not
> > > > > > > > > > > > show the issue...
> > > > > > > > > > > >
> > > > > > > > > > > > The full example also adds a split() to break the
> > > mbuf
> > > > > chain
> > > > > > > > > > > > between m1 and m2. The kind of thing that would
> be
> > > done
> > > > > for
> > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > If so, may be the right solution is to care about
> > > nb_segs
> > > > > > > > > > > when next is set to NULL on split? Any place when
> next
> > > is
> > > > > set
> > > > > > > > > > > to NULL. Just to keep the optimization in a more
> > > generic
> > > > > place.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > The problem with that approach is that there are
> already
> > > > > several
> > > > > > > > > > existing split() or trim() implementations in
> different
> > > DPDK-
> > > > > based
> > > > > > > > > > applications. For instance, we have some in
> 6WINDGate. If
> > > we
> > > > > force
> > > > > > > > > > applications to set nb_seg to 1 when resetting next,
> it
> > > has
> > > > > to be
> > > > > > > > > > documented because it is not straightforward.
> > > > > > > > >
> > > > > > > > > I think it is better to go that way.
> > > > > > > > > From my perspective it seems natural to reset nb_seg at
> > > same
> > > > > time
> > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > >
> > > > > > > > While it is not explicitly stated for nb_segs, to me it
> was
> > > clear
> > > > > that
> > > > > > > > nb_segs is only valid in the first segment, like for many
> > > fields
> > > > > (port,
> > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > >
> > > > > > > > If we say that nb_segs has to be valid in any segments,
> it
> > > means
> > > > > that
> > > > > > > > chain() or split() will have to update it in all
> segments,
> > > which
> > > > > is not
> > > > > > > > efficient.
> > > > > > >
> > > > > > > Why in all?
> > > > > > > We can state that nb_segs on non-first segment should
> always
> > > equal
> > > > > 1.
> > > > > > > As I understand in that case, both split() and chain() have
> to
> > > > > update nb_segs
> > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > >
> > > > > > Well, anyway, I think it's strange to have a constraint on m-
> > > >nb_segs
> > > > > for
> > > > > > non-first segment. We don't have that kind of constraints for
> > > other
> > > > > fields.
> > > > >
> > > > > True, we don't. But this is one of the fields we consider
> critical
> > > > > for proper work of mbuf alloc/free mechanism.
> > > > >
> > > >
> > > > I am not sure that requiring m->nb_segs == 1 on non-first
> segments
> > > will provide any benefits.
> > >
> > > It would make this patch unneeded.
> > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> write-
> > > free.
> >
> > I see. Then I agree with Konstantin that alternative solutions should
> be considered.
> >
> > The benefit regarding free()'ing non-segmented mbufs - which is a
> very common operation - certainly outweighs the cost of requiring
> split()/chain() operations to set the new head mbuf's nb_segs = 1.
> >
> > Nonetheless, the bug needs to be fixed somehow.
> >
> > If we can't come up with a better solution that doesn't break the
> ABI, we are forced to accept the patch.
> >
> > Unless the techboard accepts to break the ABI in order to avoid the
> performance cost of this patch.
> 
> Did someone notice a performance drop with this patch?
> On my side, I don't see any regression on a L3 use case.

I am afraid that the DPDK performance regression tests are based on TX immediately following RX, so cache misses in TX may go by unnoticed because RX warmed up the cache for TX already. And similarly for RX reusing mbufs that have been warmed up by the preceding free() at TX.

Please consider testing the performance difference with the mbuf being completely cold at TX, and going completely cold again before being reused for RX.

> 
> Let's sumarize: splitting a mbuf chain and freeing it causes subsequent
> mbuf
> allocation to return a mbuf which is not correctly initialized. There
> are 2
> options to fix it:
> 
> 1/ change the mbuf free function (this patch)
> 
>    - m->nb_segs would behave like many other field: valid in the first
>      segment, ignored in other segments
>    - may impact performance (suspected)
> 
> 2/ change all places where a mbuf chain is split, or trimmed
> 
>    - m->nb_segs would have a specific behavior: count the number of
>      segments in the first mbuf, should be 1 in the last segment,
>      ignored in other ones.
>    - no code change in mbuf library, so no performance impact
>    - need to patch all places where we do a mbuf split or trim. From
> afar,
>      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> applications
>      may have to be patched (for instance, I already found 3 places in
>      6WIND code base without a deep search).
> 
> In my opinion, 1/ is better, except we notice a significant
> performance,
> because the (implicit) behavior is unchanged.
> 
> Whatever the solution, some documentation has to be added.
> 
> Olivier
> 

Unfortunately, I don't think that anything but the first option will go into 20.11 and stable releases of older versions, so I stand by my acknowledgment of the patch.

Which reminds me: Consider reporting the bug in Bugzilla.

> 
> >
> > >
> > > >
> > > > E.g. the second segment of a three-segment chain will still have
> m-
> > > >next != NULL, so it cannot be used as a gate to prevent accessing
> m-
> > > > >next.
> > > >
> > > > I might have overlooked something, though.
> > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Saying that nb_segs has to be valid for the first and
> last
> > > > > segment seems
> > > > > > > > really odd to me. What is the logic behind that except
> > > keeping
> > > > > this test
> > > > > > > > as is?
> > > > > > > >
> > > > > > > > In any case, it has to be better documented.
> > > > > > > >
> > > > > > > >
> > > > > > > > Olivier
> > > > > > > >
> > > > > > > >
> > > > > > > > > > I think the approach from
> > > > > > > > > > this patch is safer.
> > > > > > > > >
> > > > > > > > > It might be easier from perspective that changes in
> less
> > > places
> > > > > are required,
> > > > > > > > > Though I think that this patch will introduce some
> > > performance
> > > > > drop.
> > > > > > > > > As now each mbuf_prefree_seg() will cause update of 2
> cache
> > > > > lines unconditionally.
> > > > > > > > >
> > > > > > > > > > By the way, for 21.11, if we are able to do some
> > > > > optimizations and have
> > > > > > > > > > both pool (index?) and next in the first cache line,
> we
> > > may
> > > > > reconsider
> > > > > > > > > > the fact that next and nb_segs are already set for
> new
> > > > > allocated mbufs,
> > > > > > > > > > because it is not straightforward either.
> > > > > > > > >
> > > > > > > > > My suggestion - let's put future optimization
> discussion
> > > aside
> > > > > for now,
> > > > > > > > > and concentrate on that particular patch.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL
> but
> > > > > nb_seg=2
> > > > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > > > >
> > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the
> second
> > > > > segment.
> > > > > > > > > > > >
> > > > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> > > when
> > > > > not in the first
> > > > > > > > > > > > segment, and there is nothing saying it should be
> set
> > > to
> > > > > 1. Typically,
> > > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I
> guess
> > > it's
> > > > > the same for
> > > > > > > > > > > > many similar functions in applications.
> > > > > > > > > > > >
> > > > > > > > > > > > Olivier
> > > > > > > > > > > >
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >>> This is expected that mbufs stored in pool have
> > > their
> > > > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields
> while
> > > in
> > > > > pool")
> > > > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Signed-off-by: Olivier Matz
> > > <olivier.matz@6wind.com>
> > > > > > > > > > > >>> ---
> > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > > > >>>  2 files changed, 6 insertions(+), 12
> deletions(-)
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > > > >>> -	}
> > > > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > > > >>>  }
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> > > rte_pktmbuf_prefree_seg(struct
> > > > > rte_mbuf *m)
> > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > >>>  		}
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > >>> -		}
> > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> > > rte_pktmbuf_prefree_seg(struct
> > > > > rte_mbuf *m)
> > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > >>>  		}
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > >>> -		}
> > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > >>> --
> > > > > > > > > > > >>> 2.25.1
> > > > > > > > > > > >>
> > > > > > > > > > >
> >
Olivier Matz Nov. 6, 2020, 10:04 a.m. UTC | #18
On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Friday, November 6, 2020 9:21 AM
> > 
> > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > > Sent: Friday, November 6, 2020 12:55 AM
> > > >
> > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever
> > the
> > > > value
> > > > > > of m->next,
> > > > > > > > > > > > >>> because it can happen that m->nb_seg is != 1.
> > For
> > > > > > instance in this
> > > > > > > > > > > > >>> case:
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in
> > the
> > > > > > initial m1
> > > > > > > > > > > > >>> segment (this is not required), after this code
> > the
> > > > > > mbuf chain
> > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> > nb_seg=1
> > > > in
> > > > > > the second
> > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > >> {
> > > > > > > > > > > > >> 	...
> > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > >>                 }
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> > block,
> > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ah, sorry. I oversimplified the example and now
> > it
> > > > does
> > > > > > not
> > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > >
> > > > > > > > > > > > > The full example also adds a split() to break the
> > > > mbuf
> > > > > > chain
> > > > > > > > > > > > > between m1 and m2. The kind of thing that would
> > be
> > > > done
> > > > > > for
> > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If so, may be the right solution is to care about
> > > > nb_segs
> > > > > > > > > > > > when next is set to NULL on split? Any place when
> > next
> > > > is
> > > > > > set
> > > > > > > > > > > > to NULL. Just to keep the optimization in a more
> > > > generic
> > > > > > place.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > The problem with that approach is that there are
> > already
> > > > > > several
> > > > > > > > > > > existing split() or trim() implementations in
> > different
> > > > DPDK-
> > > > > > based
> > > > > > > > > > > applications. For instance, we have some in
> > 6WINDGate. If
> > > > we
> > > > > > force
> > > > > > > > > > > applications to set nb_seg to 1 when resetting next,
> > it
> > > > has
> > > > > > to be
> > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > >
> > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > From my perspective it seems natural to reset nb_seg at
> > > > same
> > > > > > time
> > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > >
> > > > > > > > > While it is not explicitly stated for nb_segs, to me it
> > was
> > > > clear
> > > > > > that
> > > > > > > > > nb_segs is only valid in the first segment, like for many
> > > > fields
> > > > > > (port,
> > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > >
> > > > > > > > > If we say that nb_segs has to be valid in any segments,
> > it
> > > > means
> > > > > > that
> > > > > > > > > chain() or split() will have to update it in all
> > segments,
> > > > which
> > > > > > is not
> > > > > > > > > efficient.
> > > > > > > >
> > > > > > > > Why in all?
> > > > > > > > We can state that nb_segs on non-first segment should
> > always
> > > > equal
> > > > > > 1.
> > > > > > > > As I understand in that case, both split() and chain() have
> > to
> > > > > > update nb_segs
> > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > >
> > > > > > > Well, anyway, I think it's strange to have a constraint on m-
> > > > >nb_segs
> > > > > > for
> > > > > > > non-first segment. We don't have that kind of constraints for
> > > > other
> > > > > > fields.
> > > > > >
> > > > > > True, we don't. But this is one of the fields we consider
> > critical
> > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > >
> > > > >
> > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > segments
> > > > will provide any benefits.
> > > >
> > > > It would make this patch unneeded.
> > > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> > write-
> > > > free.
> > >
> > > I see. Then I agree with Konstantin that alternative solutions should
> > be considered.
> > >
> > > The benefit regarding free()'ing non-segmented mbufs - which is a
> > very common operation - certainly outweighs the cost of requiring
> > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > >
> > > Nonetheless, the bug needs to be fixed somehow.
> > >
> > > If we can't come up with a better solution that doesn't break the
> > ABI, we are forced to accept the patch.
> > >
> > > Unless the techboard accepts to break the ABI in order to avoid the
> > performance cost of this patch.
> > 
> > Did someone notice a performance drop with this patch?
> > On my side, I don't see any regression on a L3 use case.
> 
> I am afraid that the DPDK performance regression tests are based on TX immediately following RX, so cache misses in TX may go by unnoticed because RX warmed up the cache for TX already. And similarly for RX reusing mbufs that have been warmed up by the preceding free() at TX.
> 
> Please consider testing the performance difference with the mbuf being completely cold at TX, and going completely cold again before being reused for RX.
> 
> > 
> > Let's sumarize: splitting a mbuf chain and freeing it causes subsequent
> > mbuf
> > allocation to return a mbuf which is not correctly initialized. There
> > are 2
> > options to fix it:
> > 
> > 1/ change the mbuf free function (this patch)
> > 
> >    - m->nb_segs would behave like many other field: valid in the first
> >      segment, ignored in other segments
> >    - may impact performance (suspected)
> > 
> > 2/ change all places where a mbuf chain is split, or trimmed
> > 
> >    - m->nb_segs would have a specific behavior: count the number of
> >      segments in the first mbuf, should be 1 in the last segment,
> >      ignored in other ones.
> >    - no code change in mbuf library, so no performance impact
> >    - need to patch all places where we do a mbuf split or trim. From
> > afar,
> >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > applications
> >      may have to be patched (for instance, I already found 3 places in
> >      6WIND code base without a deep search).
> > 
> > In my opinion, 1/ is better, except we notice a significant
> > performance,
> > because the (implicit) behavior is unchanged.
> > 
> > Whatever the solution, some documentation has to be added.
> > 
> > Olivier
> > 
> 
> Unfortunately, I don't think that anything but the first option will go into 20.11 and stable releases of older versions, so I stand by my acknowledgment of the patch.

If we are affraid about 20.11 performance (it is legitimate, few days
before the release), we can target 21.02. After all, everybody lives
with this bug since 2017, so there is no urgency. If accepted and well
tested, it can be backported in stable branches.


> Which reminds me: Consider reporting the bug in Bugzilla.
> 
> > 
> > >
> > > >
> > > > >
> > > > > E.g. the second segment of a three-segment chain will still have
> > m-
> > > > >next != NULL, so it cannot be used as a gate to prevent accessing
> > m-
> > > > > >next.
> > > > >
> > > > > I might have overlooked something, though.
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Saying that nb_segs has to be valid for the first and
> > last
> > > > > > segment seems
> > > > > > > > > really odd to me. What is the logic behind that except
> > > > keeping
> > > > > > this test
> > > > > > > > > as is?
> > > > > > > > >
> > > > > > > > > In any case, it has to be better documented.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Olivier
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > I think the approach from
> > > > > > > > > > > this patch is safer.
> > > > > > > > > >
> > > > > > > > > > It might be easier from perspective that changes in
> > less
> > > > places
> > > > > > are required,
> > > > > > > > > > Though I think that this patch will introduce some
> > > > performance
> > > > > > drop.
> > > > > > > > > > As now each mbuf_prefree_seg() will cause update of 2
> > cache
> > > > > > lines unconditionally.
> > > > > > > > > >
> > > > > > > > > > > By the way, for 21.11, if we are able to do some
> > > > > > optimizations and have
> > > > > > > > > > > both pool (index?) and next in the first cache line,
> > we
> > > > may
> > > > > > reconsider
> > > > > > > > > > > the fact that next and nb_segs are already set for
> > new
> > > > > > allocated mbufs,
> > > > > > > > > > > because it is not straightforward either.
> > > > > > > > > >
> > > > > > > > > > My suggestion - let's put future optimization
> > discussion
> > > > aside
> > > > > > for now,
> > > > > > > > > > and concentrate on that particular patch.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL
> > but
> > > > > > nb_seg=2
> > > > > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > > > > >
> > > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the
> > second
> > > > > > segment.
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> > > > when
> > > > > > not in the first
> > > > > > > > > > > > > segment, and there is nothing saying it should be
> > set
> > > > to
> > > > > > 1. Typically,
> > > > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I
> > guess
> > > > it's
> > > > > > the same for
> > > > > > > > > > > > > many similar functions in applications.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Olivier
> > > > > > > > > > > > >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>> This is expected that mbufs stored in pool have
> > > > their
> > > > > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields
> > while
> > > > in
> > > > > > pool")
> > > > > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Signed-off-by: Olivier Matz
> > > > <olivier.matz@6wind.com>
> > > > > > > > > > > > >>> ---
> > > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > > > > >>>  2 files changed, 6 insertions(+), 12
> > deletions(-)
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > > > > >>> -	}
> > > > > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > > > > >>>  }
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> > > > rte_pktmbuf_prefree_seg(struct
> > > > > > rte_mbuf *m)
> > > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > > >>>  		}
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > > >>> -		}
> > > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> > > > rte_pktmbuf_prefree_seg(struct
> > > > > > rte_mbuf *m)
> > > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > > >>>  		}
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > > >>> -		}
> > > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > > >>> --
> > > > > > > > > > > > >>> 2.25.1
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > >
>
Morten Brørup Nov. 6, 2020, 10:07 a.m. UTC | #19
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 6, 2020 11:05 AM
> 
> On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote:
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Friday, November 6, 2020 9:21 AM
> > >
> > > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > > > Sent: Friday, November 6, 2020 12:55 AM
> > > > >
> > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> whatever
> > > the
> > > > > value
> > > > > > > of m->next,
> > > > > > > > > > > > > >>> because it can happen that m->nb_seg is !=
> 1.
> > > For
> > > > > > > instance in this
> > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> nb_seg in
> > > the
> > > > > > > initial m1
> > > > > > > > > > > > > >>> segment (this is not required), after this
> code
> > > the
> > > > > > > mbuf chain
> > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> > > nb_seg=1
> > > > > in
> > > > > > > the second
> > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > > >> {
> > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> > > block,
> > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and
> now
> > > it
> > > > > does
> > > > > > > not
> > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The full example also adds a split() to break
> the
> > > > > mbuf
> > > > > > > chain
> > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> would
> > > be
> > > > > done
> > > > > > > for
> > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > If so, may be the right solution is to care
> about
> > > > > nb_segs
> > > > > > > > > > > > > when next is set to NULL on split? Any place
> when
> > > next
> > > > > is
> > > > > > > set
> > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> more
> > > > > generic
> > > > > > > place.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > The problem with that approach is that there are
> > > already
> > > > > > > several
> > > > > > > > > > > > existing split() or trim() implementations in
> > > different
> > > > > DPDK-
> > > > > > > based
> > > > > > > > > > > > applications. For instance, we have some in
> > > 6WINDGate. If
> > > > > we
> > > > > > > force
> > > > > > > > > > > > applications to set nb_seg to 1 when resetting
> next,
> > > it
> > > > > has
> > > > > > > to be
> > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > >
> > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > From my perspective it seems natural to reset
> nb_seg at
> > > > > same
> > > > > > > time
> > > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > > >
> > > > > > > > > > While it is not explicitly stated for nb_segs, to me
> it
> > > was
> > > > > clear
> > > > > > > that
> > > > > > > > > > nb_segs is only valid in the first segment, like for
> many
> > > > > fields
> > > > > > > (port,
> > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > >
> > > > > > > > > > If we say that nb_segs has to be valid in any
> segments,
> > > it
> > > > > means
> > > > > > > that
> > > > > > > > > > chain() or split() will have to update it in all
> > > segments,
> > > > > which
> > > > > > > is not
> > > > > > > > > > efficient.
> > > > > > > > >
> > > > > > > > > Why in all?
> > > > > > > > > We can state that nb_segs on non-first segment should
> > > always
> > > > > equal
> > > > > > > 1.
> > > > > > > > > As I understand in that case, both split() and chain()
> have
> > > to
> > > > > > > update nb_segs
> > > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > > >
> > > > > > > > Well, anyway, I think it's strange to have a constraint
> on m-
> > > > > >nb_segs
> > > > > > > for
> > > > > > > > non-first segment. We don't have that kind of constraints
> for
> > > > > other
> > > > > > > fields.
> > > > > > >
> > > > > > > True, we don't. But this is one of the fields we consider
> > > critical
> > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > >
> > > > > >
> > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > segments
> > > > > will provide any benefits.
> > > > >
> > > > > It would make this patch unneeded.
> > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> > > write-
> > > > > free.
> > > >
> > > > I see. Then I agree with Konstantin that alternative solutions
> should
> > > be considered.
> > > >
> > > > The benefit regarding free()'ing non-segmented mbufs - which is a
> > > very common operation - certainly outweighs the cost of requiring
> > > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > > >
> > > > Nonetheless, the bug needs to be fixed somehow.
> > > >
> > > > If we can't come up with a better solution that doesn't break the
> > > ABI, we are forced to accept the patch.
> > > >
> > > > Unless the techboard accepts to break the ABI in order to avoid
> the
> > > performance cost of this patch.
> > >
> > > Did someone notice a performance drop with this patch?
> > > On my side, I don't see any regression on a L3 use case.
> >
> > I am afraid that the DPDK performance regression tests are based on
> TX immediately following RX, so cache misses in TX may go by unnoticed
> because RX warmed up the cache for TX already. And similarly for RX
> reusing mbufs that have been warmed up by the preceding free() at TX.
> >
> > Please consider testing the performance difference with the mbuf
> being completely cold at TX, and going completely cold again before
> being reused for RX.
> >
> > >
> > > Let's sumarize: splitting a mbuf chain and freeing it causes
> subsequent
> > > mbuf
> > > allocation to return a mbuf which is not correctly initialized.
> There
> > > are 2
> > > options to fix it:
> > >
> > > 1/ change the mbuf free function (this patch)
> > >
> > >    - m->nb_segs would behave like many other field: valid in the
> first
> > >      segment, ignored in other segments
> > >    - may impact performance (suspected)
> > >
> > > 2/ change all places where a mbuf chain is split, or trimmed
> > >
> > >    - m->nb_segs would have a specific behavior: count the number of
> > >      segments in the first mbuf, should be 1 in the last segment,
> > >      ignored in other ones.
> > >    - no code change in mbuf library, so no performance impact
> > >    - need to patch all places where we do a mbuf split or trim.
> From
> > > afar,
> > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > applications
> > >      may have to be patched (for instance, I already found 3 places
> in
> > >      6WIND code base without a deep search).
> > >
> > > In my opinion, 1/ is better, except we notice a significant
> > > performance,
> > > because the (implicit) behavior is unchanged.
> > >
> > > Whatever the solution, some documentation has to be added.
> > >
> > > Olivier
> > >
> >
> > Unfortunately, I don't think that anything but the first option will
> go into 20.11 and stable releases of older versions, so I stand by my
> acknowledgment of the patch.
> 
> If we are affraid about 20.11 performance (it is legitimate, few days
> before the release), we can target 21.02. After all, everybody lives
> with this bug since 2017, so there is no urgency. If accepted and well
> tested, it can be backported in stable branches.

+1

Good thinking, Olivier!
Ananyev, Konstantin Nov. 6, 2020, 11:53 a.m. UTC | #20
> > > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> > whatever
> > > > the
> > > > > > value
> > > > > > > > of m->next,
> > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is !=
> > 1.
> > > > For
> > > > > > > > instance in this
> > > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> > nb_seg in
> > > > the
> > > > > > > > initial m1
> > > > > > > > > > > > > > >>> segment (this is not required), after this
> > code
> > > > the
> > > > > > > > mbuf chain
> > > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> > > > nb_seg=1
> > > > > > in
> > > > > > > > the second
> > > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > > > >> {
> > > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> > > > block,
> > > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and
> > now
> > > > it
> > > > > > does
> > > > > > > > not
> > > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The full example also adds a split() to break
> > the
> > > > > > mbuf
> > > > > > > > chain
> > > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> > would
> > > > be
> > > > > > done
> > > > > > > > for
> > > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If so, may be the right solution is to care
> > about
> > > > > > nb_segs
> > > > > > > > > > > > > > when next is set to NULL on split? Any place
> > when
> > > > next
> > > > > > is
> > > > > > > > set
> > > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> > more
> > > > > > generic
> > > > > > > > place.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > The problem with that approach is that there are
> > > > already
> > > > > > > > several
> > > > > > > > > > > > > existing split() or trim() implementations in
> > > > different
> > > > > > DPDK-
> > > > > > > > based
> > > > > > > > > > > > > applications. For instance, we have some in
> > > > 6WINDGate. If
> > > > > > we
> > > > > > > > force
> > > > > > > > > > > > > applications to set nb_seg to 1 when resetting
> > next,
> > > > it
> > > > > > has
> > > > > > > > to be
> > > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > > >
> > > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > > From my perspective it seems natural to reset
> > nb_seg at
> > > > > > same
> > > > > > > > time
> > > > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > > > >
> > > > > > > > > > > While it is not explicitly stated for nb_segs, to me
> > it
> > > > was
> > > > > > clear
> > > > > > > > that
> > > > > > > > > > > nb_segs is only valid in the first segment, like for
> > many
> > > > > > fields
> > > > > > > > (port,
> > > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > > >
> > > > > > > > > > > If we say that nb_segs has to be valid in any
> > segments,
> > > > it
> > > > > > means
> > > > > > > > that
> > > > > > > > > > > chain() or split() will have to update it in all
> > > > segments,
> > > > > > which
> > > > > > > > is not
> > > > > > > > > > > efficient.
> > > > > > > > > >
> > > > > > > > > > Why in all?
> > > > > > > > > > We can state that nb_segs on non-first segment should
> > > > always
> > > > > > equal
> > > > > > > > 1.
> > > > > > > > > > As I understand in that case, both split() and chain()
> > have
> > > > to
> > > > > > > > update nb_segs
> > > > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > > > >
> > > > > > > > > Well, anyway, I think it's strange to have a constraint
> > on m-
> > > > > > >nb_segs
> > > > > > > > for
> > > > > > > > > non-first segment. We don't have that kind of constraints
> > for
> > > > > > other
> > > > > > > > fields.
> > > > > > > >
> > > > > > > > True, we don't. But this is one of the fields we consider
> > > > critical
> > > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > > >
> > > > > > >
> > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > > segments
> > > > > > will provide any benefits.
> > > > > >
> > > > > > It would make this patch unneeded.
> > > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> > > > write-
> > > > > > free.
> > > > >
> > > > > I see. Then I agree with Konstantin that alternative solutions
> > should
> > > > be considered.
> > > > >
> > > > > The benefit regarding free()'ing non-segmented mbufs - which is a
> > > > very common operation - certainly outweighs the cost of requiring
> > > > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > > > >
> > > > > Nonetheless, the bug needs to be fixed somehow.
> > > > >
> > > > > If we can't come up with a better solution that doesn't break the
> > > > ABI, we are forced to accept the patch.
> > > > >
> > > > > Unless the techboard accepts to break the ABI in order to avoid
> > the
> > > > performance cost of this patch.
> > > >
> > > > Did someone notice a performance drop with this patch?
> > > > On my side, I don't see any regression on a L3 use case.
> > >
> > > I am afraid that the DPDK performance regression tests are based on
> > TX immediately following RX, so cache misses in TX may go by unnoticed
> > because RX warmed up the cache for TX already. And similarly for RX
> > reusing mbufs that have been warmed up by the preceding free() at TX.
> > >
> > > Please consider testing the performance difference with the mbuf
> > being completely cold at TX, and going completely cold again before
> > being reused for RX.
> > >
> > > >
> > > > Let's sumarize: splitting a mbuf chain and freeing it causes
> > subsequent
> > > > mbuf
> > > > allocation to return a mbuf which is not correctly initialized.
> > There
> > > > are 2
> > > > options to fix it:
> > > >
> > > > 1/ change the mbuf free function (this patch)
> > > >
> > > >    - m->nb_segs would behave like many other field: valid in the
> > first
> > > >      segment, ignored in other segments
> > > >    - may impact performance (suspected)
> > > >
> > > > 2/ change all places where a mbuf chain is split, or trimmed
> > > >
> > > >    - m->nb_segs would have a specific behavior: count the number of
> > > >      segments in the first mbuf, should be 1 in the last segment,
> > > >      ignored in other ones.
> > > >    - no code change in mbuf library, so no performance impact
> > > >    - need to patch all places where we do a mbuf split or trim.
> > From
> > > > afar,
> > > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > > applications
> > > >      may have to be patched (for instance, I already found 3 places
> > in
> > > >      6WIND code base without a deep search).
> > > >
> > > > In my opinion, 1/ is better, except we notice a significant
> > > > performance,
> > > > because the (implicit) behavior is unchanged.
> > > >
> > > > Whatever the solution, some documentation has to be added.
> > > >
> > > > Olivier
> > > >
> > >
> > > Unfortunately, I don't think that anything but the first option will
> > go into 20.11 and stable releases of older versions, so I stand by my
> > acknowledgment of the patch.
> >
> > If we are affraid about 20.11 performance (it is legitimate, few days
> > before the release), we can target 21.02. After all, everybody lives
> > with this bug since 2017, so there is no urgency. If accepted and well
> > tested, it can be backported in stable branches.
> 
> +1
> 
> Good thinking, Olivier!

Looking at the changes once again, it probably can be reworked a bit:

-	if (m->next != NULL) {
-		m->next = NULL;
-		m->nb_segs = 1;
-	}

+	if (m->next != NULL)
+		m->next = NULL;
+	if (m->nb_segs != 1)
+		m->nb_segs = 1;

That way we add one more condition checking, but I suppose it
shouldn't be that perf critical.
That way for direct,non-segmented mbuf it still should be write-free.
Except cases as you described above: chain(), then split(). 

Of-course we still need to do perf testing for that approach too.
So if your preference it to postpone it till 21.02 - that's ok for me.
Konstantin
Morten Brørup Nov. 6, 2020, 12:23 p.m. UTC | #21
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Friday, November 6, 2020 12:54 PM
> 
> > > > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> > > whatever
> > > > > the
> > > > > > > value
> > > > > > > > > of m->next,
> > > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is
> !=
> > > 1.
> > > > > For
> > > > > > > > > instance in this
> > > > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> > > nb_seg in
> > > > > the
> > > > > > > > > initial m1
> > > > > > > > > > > > > > > >>> segment (this is not required), after
> this
> > > code
> > > > > the
> > > > > > > > > mbuf chain
> > > > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> Freeing this mbuf chain will not
> restore
> > > > > nb_seg=1
> > > > > > > in
> > > > > > > > > the second
> > > > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf
> *m)
> > > > > > > > > > > > > > > >> {
> > > > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the
> if()
> > > > > block,
> > > > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > > > >> Thinking in more generic way, that
> change:
> > > > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example
> and
> > > now
> > > > > it
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The full example also adds a split() to
> break
> > > the
> > > > > > > mbuf
> > > > > > > > > chain
> > > > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> > > would
> > > > > be
> > > > > > > done
> > > > > > > > > for
> > > > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If so, may be the right solution is to care
> > > about
> > > > > > > nb_segs
> > > > > > > > > > > > > > > when next is set to NULL on split? Any
> place
> > > when
> > > > > next
> > > > > > > is
> > > > > > > > > set
> > > > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> > > more
> > > > > > > generic
> > > > > > > > > place.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The problem with that approach is that there
> are
> > > > > already
> > > > > > > > > several
> > > > > > > > > > > > > > existing split() or trim() implementations in
> > > > > different
> > > > > > > DPDK-
> > > > > > > > > based
> > > > > > > > > > > > > > applications. For instance, we have some in
> > > > > 6WINDGate. If
> > > > > > > we
> > > > > > > > > force
> > > > > > > > > > > > > > applications to set nb_seg to 1 when
> resetting
> > > next,
> > > > > it
> > > > > > > has
> > > > > > > > > to be
> > > > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > > > From my perspective it seems natural to reset
> > > nb_seg at
> > > > > > > same
> > > > > > > > > time
> > > > > > > > > > > > > we reset next, otherwise inconsistency will
> occur.
> > > > > > > > > > > >
> > > > > > > > > > > > While it is not explicitly stated for nb_segs, to
> me
> > > it
> > > > > was
> > > > > > > clear
> > > > > > > > > that
> > > > > > > > > > > > nb_segs is only valid in the first segment, like
> for
> > > many
> > > > > > > fields
> > > > > > > > > (port,
> > > > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > > > >
> > > > > > > > > > > > If we say that nb_segs has to be valid in any
> > > segments,
> > > > > it
> > > > > > > means
> > > > > > > > > that
> > > > > > > > > > > > chain() or split() will have to update it in all
> > > > > segments,
> > > > > > > which
> > > > > > > > > is not
> > > > > > > > > > > > efficient.
> > > > > > > > > > >
> > > > > > > > > > > Why in all?
> > > > > > > > > > > We can state that nb_segs on non-first segment
> should
> > > > > always
> > > > > > > equal
> > > > > > > > > 1.
> > > > > > > > > > > As I understand in that case, both split() and
> chain()
> > > have
> > > > > to
> > > > > > > > > update nb_segs
> > > > > > > > > > > only for head mbufs, rest ones will remain
> untouched.
> > > > > > > > > >
> > > > > > > > > > Well, anyway, I think it's strange to have a
> constraint
> > > on m-
> > > > > > > >nb_segs
> > > > > > > > > for
> > > > > > > > > > non-first segment. We don't have that kind of
> constraints
> > > for
> > > > > > > other
> > > > > > > > > fields.
> > > > > > > > >
> > > > > > > > > True, we don't. But this is one of the fields we
> consider
> > > > > critical
> > > > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > > > segments
> > > > > > > will provide any benefits.
> > > > > > >
> > > > > > > It would make this patch unneeded.
> > > > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will
> remain
> > > > > write-
> > > > > > > free.
> > > > > >
> > > > > > I see. Then I agree with Konstantin that alternative
> solutions
> > > should
> > > > > be considered.
> > > > > >
> > > > > > The benefit regarding free()'ing non-segmented mbufs - which
> is a
> > > > > very common operation - certainly outweighs the cost of
> requiring
> > > > > split()/chain() operations to set the new head mbuf's nb_segs =
> 1.
> > > > > >
> > > > > > Nonetheless, the bug needs to be fixed somehow.
> > > > > >
> > > > > > If we can't come up with a better solution that doesn't break
> the
> > > > > ABI, we are forced to accept the patch.
> > > > > >
> > > > > > Unless the techboard accepts to break the ABI in order to
> avoid
> > > the
> > > > > performance cost of this patch.
> > > > >
> > > > > Did someone notice a performance drop with this patch?
> > > > > On my side, I don't see any regression on a L3 use case.
> > > >
> > > > I am afraid that the DPDK performance regression tests are based
> on
> > > TX immediately following RX, so cache misses in TX may go by
> unnoticed
> > > because RX warmed up the cache for TX already. And similarly for RX
> > > reusing mbufs that have been warmed up by the preceding free() at
> TX.
> > > >
> > > > Please consider testing the performance difference with the mbuf
> > > being completely cold at TX, and going completely cold again before
> > > being reused for RX.
> > > >
> > > > >
> > > > > Let's sumarize: splitting a mbuf chain and freeing it causes
> > > subsequent
> > > > > mbuf
> > > > > allocation to return a mbuf which is not correctly initialized.
> > > There
> > > > > are 2
> > > > > options to fix it:
> > > > >
> > > > > 1/ change the mbuf free function (this patch)
> > > > >
> > > > >    - m->nb_segs would behave like many other field: valid in
> the
> > > first
> > > > >      segment, ignored in other segments
> > > > >    - may impact performance (suspected)
> > > > >
> > > > > 2/ change all places where a mbuf chain is split, or trimmed
> > > > >
> > > > >    - m->nb_segs would have a specific behavior: count the
> number of
> > > > >      segments in the first mbuf, should be 1 in the last
> segment,
> > > > >      ignored in other ones.
> > > > >    - no code change in mbuf library, so no performance impact
> > > > >    - need to patch all places where we do a mbuf split or trim.
> > > From
> > > > > afar,
> > > > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > > > applications
> > > > >      may have to be patched (for instance, I already found 3
> places
> > > in
> > > > >      6WIND code base without a deep search).
> > > > >
> > > > > In my opinion, 1/ is better, except we notice a significant
> > > > > performance,
> > > > > because the (implicit) behavior is unchanged.
> > > > >
> > > > > Whatever the solution, some documentation has to be added.
> > > > >
> > > > > Olivier
> > > > >
> > > >
> > > > Unfortunately, I don't think that anything but the first option
> will
> > > go into 20.11 and stable releases of older versions, so I stand by
> my
> > > acknowledgment of the patch.
> > >
> > > If we are affraid about 20.11 performance (it is legitimate, few
> days
> > > before the release), we can target 21.02. After all, everybody
> lives
> > > with this bug since 2017, so there is no urgency. If accepted and
> well
> > > tested, it can be backported in stable branches.
> >
> > +1
> >
> > Good thinking, Olivier!
> 
> Looking at the changes once again, it probably can be reworked a bit:
> 
> -	if (m->next != NULL) {
> -		m->next = NULL;
> -		m->nb_segs = 1;
> -	}
> 
> +	if (m->next != NULL)
> +		m->next = NULL;
> +	if (m->nb_segs != 1)
> +		m->nb_segs = 1;
> 
> That way we add one more condition checking, but I suppose it
> shouldn't be that perf critical.
> That way for direct,non-segmented mbuf it still should be write-free.
> Except cases as you described above: chain(), then split().
> 
> Of-course we still need to do perf testing for that approach too.
> So if your preference it to postpone it till 21.02 - that's ok for me.
> Konstantin

With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.
Ali Alnubani Nov. 8, 2020, 7:25 a.m. UTC | #22
Hi Olivier,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Olivier Matz
> Sent: Wednesday, November 4, 2020 7:00 PM
> To: dev@dpdk.org
> Cc: konstantin.ananyev@intel.com; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
> 

I can reproduce the failure in "ci/iol-mellanox-Performance" locally also with this patch.
I see a 2 mpps degradation with single core on ConnectX-5. Packet size is 64B. Testpmd command:
"""
dpdk-testpmd -n 4  -w 0000:82:00.1  -w 0000:82:00.0 -c 0x9000  -- --burst=64 --mbcache=512 -i  --nb-cores=1  --txd=256 --rxd=256
"""

Regards,
Ali
Andrew Rybchenko Nov. 8, 2020, 2:16 p.m. UTC | #23
On 11/6/20 3:23 PM, Morten Brørup wrote:
>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>> Sent: Friday, November 6, 2020 12:54 PM
>>
>>>>>>>>>>>>>>>>>> Hi Olivier,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
>>>> whatever
>>>>>> the
>>>>>>>> value
>>>>>>>>>> of m->next,
>>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
>> !=
>>>> 1.
>>>>>> For
>>>>>>>>>> instance in this
>>>>>>>>>>>>>>>>>>> case:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
>>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
>>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
>>>> nb_seg in
>>>>>> the
>>>>>>>>>> initial m1
>>>>>>>>>>>>>>>>>>> segment (this is not required), after
>> this
>>>> code
>>>>>> the
>>>>>>>>>> mbuf chain
>>>>>>>>>>>>>>>>>>> have 3 segments:
>>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
>>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
>>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
>> restore
>>>>>> nb_seg=1
>>>>>>>> in
>>>>>>>>>> the second
>>>>>>>>>>>>>>>>>>> segment.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
>>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
>>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
>> *m)
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> 	...
>>>>>>>>>>>>>>>>>> 	if (m->next != NULL) {
>>>>>>>>>>>>>>>>>>                          m->next = NULL;
>>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
>>>>>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
>> if()
>>>>>> block,
>>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
>>>>>>>>>>>>>>>>>> What I am missing here?
>>>>>>>>>>>>>>>>>> Thinking in more generic way, that
>> change:
>>>>>>>>>>>>>>>>>>   -		if (m->next != NULL) {
>>>>>>>>>>>>>>>>>>   -			m->next = NULL;
>>>>>>>>>>>>>>>>>>   -			m->nb_segs = 1;
>>>>>>>>>>>>>>>>>>   -		}
>>>>>>>>>>>>>>>>>>   +		m->next = NULL;
>>>>>>>>>>>>>>>>>>   +		m->nb_segs = 1;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
>> and
>>>> now
>>>>>> it
>>>>>>>> does
>>>>>>>>>> not
>>>>>>>>>>>>>>>>> show the issue...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The full example also adds a split() to
>> break
>>>> the
>>>>>>>> mbuf
>>>>>>>>>> chain
>>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
>>>> would
>>>>>> be
>>>>>>>> done
>>>>>>>>>> for
>>>>>>>>>>>>>>>>> software TCP segmentation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If so, may be the right solution is to care
>>>> about
>>>>>>>> nb_segs
>>>>>>>>>>>>>>>> when next is set to NULL on split? Any
>> place
>>>> when
>>>>>> next
>>>>>>>> is
>>>>>>>>>> set
>>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
>>>> more
>>>>>>>> generic
>>>>>>>>>> place.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem with that approach is that there
>> are
>>>>>> already
>>>>>>>>>> several
>>>>>>>>>>>>>>> existing split() or trim() implementations in
>>>>>> different
>>>>>>>> DPDK-
>>>>>>>>>> based
>>>>>>>>>>>>>>> applications. For instance, we have some in
>>>>>> 6WINDGate. If
>>>>>>>> we
>>>>>>>>>> force
>>>>>>>>>>>>>>> applications to set nb_seg to 1 when
>> resetting
>>>> next,
>>>>>> it
>>>>>>>> has
>>>>>>>>>> to be
>>>>>>>>>>>>>>> documented because it is not straightforward.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think it is better to go that way.
>>>>>>>>>>>>>>  From my perspective it seems natural to reset
>>>> nb_seg at
>>>>>>>> same
>>>>>>>>>> time
>>>>>>>>>>>>>> we reset next, otherwise inconsistency will
>> occur.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
>> me
>>>> it
>>>>>> was
>>>>>>>> clear
>>>>>>>>>> that
>>>>>>>>>>>>> nb_segs is only valid in the first segment, like
>> for
>>>> many
>>>>>>>> fields
>>>>>>>>>> (port,
>>>>>>>>>>>>> ol_flags, vlan, rss, ...).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we say that nb_segs has to be valid in any
>>>> segments,
>>>>>> it
>>>>>>>> means
>>>>>>>>>> that
>>>>>>>>>>>>> chain() or split() will have to update it in all
>>>>>> segments,
>>>>>>>> which
>>>>>>>>>> is not
>>>>>>>>>>>>> efficient.
>>>>>>>>>>>>
>>>>>>>>>>>> Why in all?
>>>>>>>>>>>> We can state that nb_segs on non-first segment
>> should
>>>>>> always
>>>>>>>> equal
>>>>>>>>>> 1.
>>>>>>>>>>>> As I understand in that case, both split() and
>> chain()
>>>> have
>>>>>> to
>>>>>>>>>> update nb_segs
>>>>>>>>>>>> only for head mbufs, rest ones will remain
>> untouched.
>>>>>>>>>>>
>>>>>>>>>>> Well, anyway, I think it's strange to have a
>> constraint
>>>> on m-
>>>>>>>>> nb_segs
>>>>>>>>>> for
>>>>>>>>>>> non-first segment. We don't have that kind of
>> constraints
>>>> for
>>>>>>>> other
>>>>>>>>>> fields.
>>>>>>>>>>
>>>>>>>>>> True, we don't. But this is one of the fields we
>> consider
>>>>>> critical
>>>>>>>>>> for proper work of mbuf alloc/free mechanism.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
>>>>>> segments
>>>>>>>> will provide any benefits.
>>>>>>>>
>>>>>>>> It would make this patch unneeded.
>>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
>> remain
>>>>>> write-
>>>>>>>> free.
>>>>>>>
>>>>>>> I see. Then I agree with Konstantin that alternative
>> solutions
>>>> should
>>>>>> be considered.
>>>>>>>
>>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
>> is a
>>>>>> very common operation - certainly outweighs the cost of
>> requiring
>>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
>> 1.
>>>>>>>
>>>>>>> Nonetheless, the bug needs to be fixed somehow.
>>>>>>>
>>>>>>> If we can't come up with a better solution that doesn't break
>> the
>>>>>> ABI, we are forced to accept the patch.
>>>>>>>
>>>>>>> Unless the techboard accepts to break the ABI in order to
>> avoid
>>>> the
>>>>>> performance cost of this patch.
>>>>>>
>>>>>> Did someone notice a performance drop with this patch?
>>>>>> On my side, I don't see any regression on a L3 use case.
>>>>>
>>>>> I am afraid that the DPDK performance regression tests are based
>> on
>>>> TX immediately following RX, so cache misses in TX may go by
>> unnoticed
>>>> because RX warmed up the cache for TX already. And similarly for RX
>>>> reusing mbufs that have been warmed up by the preceding free() at
>> TX.
>>>>>
>>>>> Please consider testing the performance difference with the mbuf
>>>> being completely cold at TX, and going completely cold again before
>>>> being reused for RX.
>>>>>
>>>>>>
>>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
>>>> subsequent
>>>>>> mbuf
>>>>>> allocation to return a mbuf which is not correctly initialized.
>>>> There
>>>>>> are 2
>>>>>> options to fix it:
>>>>>>
>>>>>> 1/ change the mbuf free function (this patch)
>>>>>>
>>>>>>     - m->nb_segs would behave like many other field: valid in
>> the
>>>> first
>>>>>>       segment, ignored in other segments
>>>>>>     - may impact performance (suspected)
>>>>>>
>>>>>> 2/ change all places where a mbuf chain is split, or trimmed
>>>>>>
>>>>>>     - m->nb_segs would have a specific behavior: count the
>> number of
>>>>>>       segments in the first mbuf, should be 1 in the last
>> segment,
>>>>>>       ignored in other ones.
>>>>>>     - no code change in mbuf library, so no performance impact
>>>>>>     - need to patch all places where we do a mbuf split or trim.
>>>> From
>>>>>> afar,
>>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
>>>>>> applications
>>>>>>       may have to be patched (for instance, I already found 3
>> places
>>>> in
>>>>>>       6WIND code base without a deep search).
>>>>>>
>>>>>> In my opinion, 1/ is better, except we notice a significant
>>>>>> performance,
>>>>>> because the (implicit) behavior is unchanged.
>>>>>>
>>>>>> Whatever the solution, some documentation has to be added.
>>>>>>
>>>>>> Olivier
>>>>>>
>>>>>
>>>>> Unfortunately, I don't think that anything but the first option
>> will
>>>> go into 20.11 and stable releases of older versions, so I stand by
>> my
>>>> acknowledgment of the patch.
>>>>
>>>> If we are affraid about 20.11 performance (it is legitimate, few
>> days
>>>> before the release), we can target 21.02. After all, everybody
>> lives
>>>> with this bug since 2017, so there is no urgency. If accepted and
>> well
>>>> tested, it can be backported in stable branches.
>>>
>>> +1
>>>
>>> Good thinking, Olivier!
>>
>> Looking at the changes once again, it probably can be reworked a bit:
>>
>> -	if (m->next != NULL) {
>> -		m->next = NULL;
>> -		m->nb_segs = 1;
>> -	}
>>
>> +	if (m->next != NULL)
>> +		m->next = NULL;
>> +	if (m->nb_segs != 1)
>> +		m->nb_segs = 1;
>>
>> That way we add one more condition checking, but I suppose it
>> shouldn't be that perf critical.
>> That way for direct,non-segmented mbuf it still should be write-free.
>> Except cases as you described above: chain(), then split().
>>
>> Of-course we still need to do perf testing for that approach too.
>> So if your preference it to postpone it till 21.02 - that's ok for me.
>> Konstantin
> 
> With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.

+1
Ananyev, Konstantin Nov. 8, 2020, 2:19 p.m. UTC | #24
> >>
> >>>>>>>>>>>>>>>>>> Hi Olivier,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
> >>>> whatever
> >>>>>> the
> >>>>>>>> value
> >>>>>>>>>> of m->next,
> >>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
> >> !=
> >>>> 1.
> >>>>>> For
> >>>>>>>>>> instance in this
> >>>>>>>>>>>>>>>>>>> case:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
> >>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
> >>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
> >>>> nb_seg in
> >>>>>> the
> >>>>>>>>>> initial m1
> >>>>>>>>>>>>>>>>>>> segment (this is not required), after
> >> this
> >>>> code
> >>>>>> the
> >>>>>>>>>> mbuf chain
> >>>>>>>>>>>>>>>>>>> have 3 segments:
> >>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
> >>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
> >>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
> >> restore
> >>>>>> nb_seg=1
> >>>>>>>> in
> >>>>>>>>>> the second
> >>>>>>>>>>>>>>>>>>> segment.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
> >>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
> >>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
> >> *m)
> >>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>> 	...
> >>>>>>>>>>>>>>>>>> 	if (m->next != NULL) {
> >>>>>>>>>>>>>>>>>>                          m->next = NULL;
> >>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>>                  }
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
> >> if()
> >>>>>> block,
> >>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
> >>>>>>>>>>>>>>>>>> What I am missing here?
> >>>>>>>>>>>>>>>>>> Thinking in more generic way, that
> >> change:
> >>>>>>>>>>>>>>>>>>   -		if (m->next != NULL) {
> >>>>>>>>>>>>>>>>>>   -			m->next = NULL;
> >>>>>>>>>>>>>>>>>>   -			m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>>   -		}
> >>>>>>>>>>>>>>>>>>   +		m->next = NULL;
> >>>>>>>>>>>>>>>>>>   +		m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
> >> and
> >>>> now
> >>>>>> it
> >>>>>>>> does
> >>>>>>>>>> not
> >>>>>>>>>>>>>>>>> show the issue...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The full example also adds a split() to
> >> break
> >>>> the
> >>>>>>>> mbuf
> >>>>>>>>>> chain
> >>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
> >>>> would
> >>>>>> be
> >>>>>>>> done
> >>>>>>>>>> for
> >>>>>>>>>>>>>>>>> software TCP segmentation.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If so, may be the right solution is to care
> >>>> about
> >>>>>>>> nb_segs
> >>>>>>>>>>>>>>>> when next is set to NULL on split? Any
> >> place
> >>>> when
> >>>>>> next
> >>>>>>>> is
> >>>>>>>>>> set
> >>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
> >>>> more
> >>>>>>>> generic
> >>>>>>>>>> place.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The problem with that approach is that there
> >> are
> >>>>>> already
> >>>>>>>>>> several
> >>>>>>>>>>>>>>> existing split() or trim() implementations in
> >>>>>> different
> >>>>>>>> DPDK-
> >>>>>>>>>> based
> >>>>>>>>>>>>>>> applications. For instance, we have some in
> >>>>>> 6WINDGate. If
> >>>>>>>> we
> >>>>>>>>>> force
> >>>>>>>>>>>>>>> applications to set nb_seg to 1 when
> >> resetting
> >>>> next,
> >>>>>> it
> >>>>>>>> has
> >>>>>>>>>> to be
> >>>>>>>>>>>>>>> documented because it is not straightforward.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think it is better to go that way.
> >>>>>>>>>>>>>>  From my perspective it seems natural to reset
> >>>> nb_seg at
> >>>>>>>> same
> >>>>>>>>>> time
> >>>>>>>>>>>>>> we reset next, otherwise inconsistency will
> >> occur.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
> >> me
> >>>> it
> >>>>>> was
> >>>>>>>> clear
> >>>>>>>>>> that
> >>>>>>>>>>>>> nb_segs is only valid in the first segment, like
> >> for
> >>>> many
> >>>>>>>> fields
> >>>>>>>>>> (port,
> >>>>>>>>>>>>> ol_flags, vlan, rss, ...).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we say that nb_segs has to be valid in any
> >>>> segments,
> >>>>>> it
> >>>>>>>> means
> >>>>>>>>>> that
> >>>>>>>>>>>>> chain() or split() will have to update it in all
> >>>>>> segments,
> >>>>>>>> which
> >>>>>>>>>> is not
> >>>>>>>>>>>>> efficient.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why in all?
> >>>>>>>>>>>> We can state that nb_segs on non-first segment
> >> should
> >>>>>> always
> >>>>>>>> equal
> >>>>>>>>>> 1.
> >>>>>>>>>>>> As I understand in that case, both split() and
> >> chain()
> >>>> have
> >>>>>> to
> >>>>>>>>>> update nb_segs
> >>>>>>>>>>>> only for head mbufs, rest ones will remain
> >> untouched.
> >>>>>>>>>>>
> >>>>>>>>>>> Well, anyway, I think it's strange to have a
> >> constraint
> >>>> on m-
> >>>>>>>>> nb_segs
> >>>>>>>>>> for
> >>>>>>>>>>> non-first segment. We don't have that kind of
> >> constraints
> >>>> for
> >>>>>>>> other
> >>>>>>>>>> fields.
> >>>>>>>>>>
> >>>>>>>>>> True, we don't. But this is one of the fields we
> >> consider
> >>>>>> critical
> >>>>>>>>>> for proper work of mbuf alloc/free mechanism.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
> >>>>>> segments
> >>>>>>>> will provide any benefits.
> >>>>>>>>
> >>>>>>>> It would make this patch unneeded.
> >>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
> >> remain
> >>>>>> write-
> >>>>>>>> free.
> >>>>>>>
> >>>>>>> I see. Then I agree with Konstantin that alternative
> >> solutions
> >>>> should
> >>>>>> be considered.
> >>>>>>>
> >>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
> >> is a
> >>>>>> very common operation - certainly outweighs the cost of
> >> requiring
> >>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
> >> 1.
> >>>>>>>
> >>>>>>> Nonetheless, the bug needs to be fixed somehow.
> >>>>>>>
> >>>>>>> If we can't come up with a better solution that doesn't break
> >> the
> >>>>>> ABI, we are forced to accept the patch.
> >>>>>>>
> >>>>>>> Unless the techboard accepts to break the ABI in order to
> >> avoid
> >>>> the
> >>>>>> performance cost of this patch.
> >>>>>>
> >>>>>> Did someone notice a performance drop with this patch?
> >>>>>> On my side, I don't see any regression on a L3 use case.
> >>>>>
> >>>>> I am afraid that the DPDK performance regression tests are based
> >> on
> >>>> TX immediately following RX, so cache misses in TX may go by
> >> unnoticed
> >>>> because RX warmed up the cache for TX already. And similarly for RX
> >>>> reusing mbufs that have been warmed up by the preceding free() at
> >> TX.
> >>>>>
> >>>>> Please consider testing the performance difference with the mbuf
> >>>> being completely cold at TX, and going completely cold again before
> >>>> being reused for RX.
> >>>>>
> >>>>>>
> >>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
> >>>> subsequent
> >>>>>> mbuf
> >>>>>> allocation to return a mbuf which is not correctly initialized.
> >>>> There
> >>>>>> are 2
> >>>>>> options to fix it:
> >>>>>>
> >>>>>> 1/ change the mbuf free function (this patch)
> >>>>>>
> >>>>>>     - m->nb_segs would behave like many other field: valid in
> >> the
> >>>> first
> >>>>>>       segment, ignored in other segments
> >>>>>>     - may impact performance (suspected)
> >>>>>>
> >>>>>> 2/ change all places where a mbuf chain is split, or trimmed
> >>>>>>
> >>>>>>     - m->nb_segs would have a specific behavior: count the
> >> number of
> >>>>>>       segments in the first mbuf, should be 1 in the last
> >> segment,
> >>>>>>       ignored in other ones.
> >>>>>>     - no code change in mbuf library, so no performance impact
> >>>>>>     - need to patch all places where we do a mbuf split or trim.
> >>>> From
> >>>>>> afar,
> >>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> >>>>>> applications
> >>>>>>       may have to be patched (for instance, I already found 3
> >> places
> >>>> in
> >>>>>>       6WIND code base without a deep search).
> >>>>>>
> >>>>>> In my opinion, 1/ is better, except we notice a significant
> >>>>>> performance,
> >>>>>> because the (implicit) behavior is unchanged.
> >>>>>>
> >>>>>> Whatever the solution, some documentation has to be added.
> >>>>>>
> >>>>>> Olivier
> >>>>>>
> >>>>>
> >>>>> Unfortunately, I don't think that anything but the first option
> >> will
> >>>> go into 20.11 and stable releases of older versions, so I stand by
> >> my
> >>>> acknowledgment of the patch.
> >>>>
> >>>> If we are affraid about 20.11 performance (it is legitimate, few
> >> days
> >>>> before the release), we can target 21.02. After all, everybody
> >> lives
> >>>> with this bug since 2017, so there is no urgency. If accepted and
> >> well
> >>>> tested, it can be backported in stable branches.
> >>>
> >>> +1
> >>>
> >>> Good thinking, Olivier!
> >>
> >> Looking at the changes once again, it probably can be reworked a bit:
> >>
> >> -	if (m->next != NULL) {
> >> -		m->next = NULL;
> >> -		m->nb_segs = 1;
> >> -	}
> >>
> >> +	if (m->next != NULL)
> >> +		m->next = NULL;
> >> +	if (m->nb_segs != 1)
> >> +		m->nb_segs = 1;
> >>
> >> That way we add one more condition checking, but I suppose it
> >> shouldn't be that perf critical.
> >> That way for direct,non-segmented mbuf it still should be write-free.
> >> Except cases as you described above: chain(), then split().
> >>
> >> Of-course we still need to do perf testing for that approach too.
> >> So if your preference it to postpone it till 21.02 - that's ok for me.
> >> Konstantin
> >
> > With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the
> mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.
> 
> +1

I don't expect perf drop with that approach either.
But some perf testing still needs to be done, just in case 😊
Olivier Matz Nov. 10, 2020, 4:26 p.m. UTC | #25
On Sun, Nov 08, 2020 at 02:19:55PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > >>
> > >>>>>>>>>>>>>>>>>> Hi Olivier,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
> > >>>> whatever
> > >>>>>> the
> > >>>>>>>> value
> > >>>>>>>>>> of m->next,
> > >>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
> > >> !=
> > >>>> 1.
> > >>>>>> For
> > >>>>>>>>>> instance in this
> > >>>>>>>>>>>>>>>>>>> case:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
> > >>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
> > >>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
> > >>>> nb_seg in
> > >>>>>> the
> > >>>>>>>>>> initial m1
> > >>>>>>>>>>>>>>>>>>> segment (this is not required), after
> > >> this
> > >>>> code
> > >>>>>> the
> > >>>>>>>>>> mbuf chain
> > >>>>>>>>>>>>>>>>>>> have 3 segments:
> > >>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
> > >>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
> > >>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
> > >> restore
> > >>>>>> nb_seg=1
> > >>>>>>>> in
> > >>>>>>>>>> the second
> > >>>>>>>>>>>>>>>>>>> segment.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
> > >>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
> > >>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
> > >> *m)
> > >>>>>>>>>>>>>>>>>> {
> > >>>>>>>>>>>>>>>>>> 	...
> > >>>>>>>>>>>>>>>>>> 	if (m->next != NULL) {
> > >>>>>>>>>>>>>>>>>>                          m->next = NULL;
> > >>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>>                  }
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
> > >> if()
> > >>>>>> block,
> > >>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
> > >>>>>>>>>>>>>>>>>> What I am missing here?
> > >>>>>>>>>>>>>>>>>> Thinking in more generic way, that
> > >> change:
> > >>>>>>>>>>>>>>>>>>   -		if (m->next != NULL) {
> > >>>>>>>>>>>>>>>>>>   -			m->next = NULL;
> > >>>>>>>>>>>>>>>>>>   -			m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>>   -		}
> > >>>>>>>>>>>>>>>>>>   +		m->next = NULL;
> > >>>>>>>>>>>>>>>>>>   +		m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
> > >> and
> > >>>> now
> > >>>>>> it
> > >>>>>>>> does
> > >>>>>>>>>> not
> > >>>>>>>>>>>>>>>>> show the issue...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The full example also adds a split() to
> > >> break
> > >>>> the
> > >>>>>>>> mbuf
> > >>>>>>>>>> chain
> > >>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
> > >>>> would
> > >>>>>> be
> > >>>>>>>> done
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>>>> software TCP segmentation.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> If so, may be the right solution is to care
> > >>>> about
> > >>>>>>>> nb_segs
> > >>>>>>>>>>>>>>>> when next is set to NULL on split? Any
> > >> place
> > >>>> when
> > >>>>>> next
> > >>>>>>>> is
> > >>>>>>>>>> set
> > >>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
> > >>>> more
> > >>>>>>>> generic
> > >>>>>>>>>> place.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The problem with that approach is that there
> > >> are
> > >>>>>> already
> > >>>>>>>>>> several
> > >>>>>>>>>>>>>>> existing split() or trim() implementations in
> > >>>>>> different
> > >>>>>>>> DPDK-
> > >>>>>>>>>> based
> > >>>>>>>>>>>>>>> applications. For instance, we have some in
> > >>>>>> 6WINDGate. If
> > >>>>>>>> we
> > >>>>>>>>>> force
> > >>>>>>>>>>>>>>> applications to set nb_seg to 1 when
> > >> resetting
> > >>>> next,
> > >>>>>> it
> > >>>>>>>> has
> > >>>>>>>>>> to be
> > >>>>>>>>>>>>>>> documented because it is not straightforward.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think it is better to go that way.
> > >>>>>>>>>>>>>>  From my perspective it seems natural to reset
> > >>>> nb_seg at
> > >>>>>>>> same
> > >>>>>>>>>> time
> > >>>>>>>>>>>>>> we reset next, otherwise inconsistency will
> > >> occur.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
> > >> me
> > >>>> it
> > >>>>>> was
> > >>>>>>>> clear
> > >>>>>>>>>> that
> > >>>>>>>>>>>>> nb_segs is only valid in the first segment, like
> > >> for
> > >>>> many
> > >>>>>>>> fields
> > >>>>>>>>>> (port,
> > >>>>>>>>>>>>> ol_flags, vlan, rss, ...).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> If we say that nb_segs has to be valid in any
> > >>>> segments,
> > >>>>>> it
> > >>>>>>>> means
> > >>>>>>>>>> that
> > >>>>>>>>>>>>> chain() or split() will have to update it in all
> > >>>>>> segments,
> > >>>>>>>> which
> > >>>>>>>>>> is not
> > >>>>>>>>>>>>> efficient.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Why in all?
> > >>>>>>>>>>>> We can state that nb_segs on non-first segment
> > >> should
> > >>>>>> always
> > >>>>>>>> equal
> > >>>>>>>>>> 1.
> > >>>>>>>>>>>> As I understand in that case, both split() and
> > >> chain()
> > >>>> have
> > >>>>>> to
> > >>>>>>>>>> update nb_segs
> > >>>>>>>>>>>> only for head mbufs, rest ones will remain
> > >> untouched.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Well, anyway, I think it's strange to have a
> > >> constraint
> > >>>> on m-
> > >>>>>>>>> nb_segs
> > >>>>>>>>>> for
> > >>>>>>>>>>> non-first segment. We don't have that kind of
> > >> constraints
> > >>>> for
> > >>>>>>>> other
> > >>>>>>>>>> fields.
> > >>>>>>>>>>
> > >>>>>>>>>> True, we don't. But this is one of the fields we
> > >> consider
> > >>>>>> critical
> > >>>>>>>>>> for proper work of mbuf alloc/free mechanism.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
> > >>>>>> segments
> > >>>>>>>> will provide any benefits.
> > >>>>>>>>
> > >>>>>>>> It would make this patch unneeded.
> > >>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
> > >> remain
> > >>>>>> write-
> > >>>>>>>> free.
> > >>>>>>>
> > >>>>>>> I see. Then I agree with Konstantin that alternative
> > >> solutions
> > >>>> should
> > >>>>>> be considered.
> > >>>>>>>
> > >>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
> > >> is a
> > >>>>>> very common operation - certainly outweighs the cost of
> > >> requiring
> > >>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
> > >> 1.
> > >>>>>>>
> > >>>>>>> Nonetheless, the bug needs to be fixed somehow.
> > >>>>>>>
> > >>>>>>> If we can't come up with a better solution that doesn't break
> > >> the
> > >>>>>> ABI, we are forced to accept the patch.
> > >>>>>>>
> > >>>>>>> Unless the techboard accepts to break the ABI in order to
> > >> avoid
> > >>>> the
> > >>>>>> performance cost of this patch.
> > >>>>>>
> > >>>>>> Did someone notice a performance drop with this patch?
> > >>>>>> On my side, I don't see any regression on a L3 use case.
> > >>>>>
> > >>>>> I am afraid that the DPDK performance regression tests are based
> > >> on
> > >>>> TX immediately following RX, so cache misses in TX may go by
> > >> unnoticed
> > >>>> because RX warmed up the cache for TX already. And similarly for RX
> > >>>> reusing mbufs that have been warmed up by the preceding free() at
> > >> TX.
> > >>>>>
> > >>>>> Please consider testing the performance difference with the mbuf
> > >>>> being completely cold at TX, and going completely cold again before
> > >>>> being reused for RX.
> > >>>>>
> > >>>>>>
> > >>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
> > >>>> subsequent
> > >>>>>> mbuf
> > >>>>>> allocation to return a mbuf which is not correctly initialized.
> > >>>> There
> > >>>>>> are 2
> > >>>>>> options to fix it:
> > >>>>>>
> > >>>>>> 1/ change the mbuf free function (this patch)
> > >>>>>>
> > >>>>>>     - m->nb_segs would behave like many other field: valid in
> > >> the
> > >>>> first
> > >>>>>>       segment, ignored in other segments
> > >>>>>>     - may impact performance (suspected)
> > >>>>>>
> > >>>>>> 2/ change all places where a mbuf chain is split, or trimmed
> > >>>>>>
> > >>>>>>     - m->nb_segs would have a specific behavior: count the
> > >> number of
> > >>>>>>       segments in the first mbuf, should be 1 in the last
> > >> segment,
> > >>>>>>       ignored in other ones.
> > >>>>>>     - no code change in mbuf library, so no performance impact
> > >>>>>>     - need to patch all places where we do a mbuf split or trim.
> > >>>> From
> > >>>>>> afar,
> > >>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > >>>>>> applications
> > >>>>>>       may have to be patched (for instance, I already found 3
> > >> places
> > >>>> in
> > >>>>>>       6WIND code base without a deep search).
> > >>>>>>
> > >>>>>> In my opinion, 1/ is better, except we notice a significant
> > >>>>>> performance,
> > >>>>>> because the (implicit) behavior is unchanged.
> > >>>>>>
> > >>>>>> Whatever the solution, some documentation has to be added.
> > >>>>>>
> > >>>>>> Olivier
> > >>>>>>
> > >>>>>
> > >>>>> Unfortunately, I don't think that anything but the first option
> > >> will
> > >>>> go into 20.11 and stable releases of older versions, so I stand by
> > >> my
> > >>>> acknowledgment of the patch.
> > >>>>
> > >>>> If we are affraid about 20.11 performance (it is legitimate, few
> > >> days
> > >>>> before the release), we can target 21.02. After all, everybody
> > >> lives
> > >>>> with this bug since 2017, so there is no urgency. If accepted and
> > >> well
> > >>>> tested, it can be backported in stable branches.
> > >>>
> > >>> +1
> > >>>
> > >>> Good thinking, Olivier!
> > >>
> > >> Looking at the changes once again, it probably can be reworked a bit:
> > >>
> > >> -	if (m->next != NULL) {
> > >> -		m->next = NULL;
> > >> -		m->nb_segs = 1;
> > >> -	}
> > >>
> > >> +	if (m->next != NULL)
> > >> +		m->next = NULL;
> > >> +	if (m->nb_segs != 1)
> > >> +		m->nb_segs = 1;
> > >>
> > >> That way we add one more condition checking, but I suppose it
> > >> shouldn't be that perf critical.
> > >> That way for direct,non-segmented mbuf it still should be write-free.
> > >> Except cases as you described above: chain(), then split().
> > >>
> > >> Of-course we still need to do perf testing for that approach too.
> > >> So if your preference it to postpone it till 21.02 - that's ok for me.
> > >> Konstantin
> > >
> > > With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the
> > mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.
> > 
> > +1
> 
> I don't expect perf drop with that approach either.
> But some perf testing still needs to be done, just in case 😊 

I also agree with your suggestion Konstantin.
Let's postpone it right after 20.11 so we have more time to test.
I'll send a v2.

Olivier

Patch
diff mbox series

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 8a456e5e64..e632071c23 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -129,10 +129,8 @@  rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
-		m->next = NULL;
-		m->nb_segs = 1;
-	}
+	m->next = NULL;
+	m->nb_segs = 1;
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a1414ed7cd..ef5800c8ef 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1329,10 +1329,8 @@  rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
-			m->next = NULL;
-			m->nb_segs = 1;
-		}
+		m->next = NULL;
+		m->nb_segs = 1;
 
 		return m;
 
@@ -1346,10 +1344,8 @@  rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
-			m->next = NULL;
-			m->nb_segs = 1;
-		}
+		m->next = NULL;
+		m->nb_segs = 1;
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;