[v4] mbuf: fix reset on mbuf free
diff mbox series

Message ID 20210113132734.1636-1-olivier.matz@6wind.com
State New
Delegated to: David Marchand
Headers show
Series
  • [v4] mbuf: fix reset on mbuf free
Related show

Checks

Context Check Description
ci/iol-testing success Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/checkpatch success coding style OK

Commit Message

Olivier Matz Jan. 13, 2021, 1:27 p.m. UTC
m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Then split this chain between m1 and m2, it would result in 2 packets:
  - first packet
    - m0: next=m1, nb_seg=2
    - m1: next=NULL, nb_seg=2
  - second packet
    - m2: next=NULL, nb_seg=1

Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

v4
* add a unit test (suggested by David)

v3
* fix commit log again (thanks Morten for spotting it)

v2
* avoid write access if uneeded (suggested by Konstantin)
* enhance comments in mbuf header file (suggested by Morten)
* fix commit log

 app/test/test_mbuf.c            | 69 +++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.c      |  4 +-
 lib/librte_mbuf/rte_mbuf.h      |  8 ++--
 lib/librte_mbuf/rte_mbuf_core.h | 13 ++++++-
 4 files changed, 86 insertions(+), 8 deletions(-)

Comments

David Marchand Jan. 15, 2021, 1:59 p.m. UTC | #1
On Wed, Jan 13, 2021 at 2:28 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
>
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
>
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
>
> Then split this chain between m1 and m2, it would result in 2 packets:
>   - first packet
>     - m0: next=m1, nb_seg=2
>     - m1: next=NULL, nb_seg=2
>   - second packet
>     - m2: next=NULL, nb_seg=1
>
> Freeing the first packet will not restore nb_seg=1 in the second
> segment. This is an issue because it is expected that mbufs stored
> in pool have their nb_seg field set to 1.
>
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>
> v4
> * add a unit test (suggested by David)

Olivier,

Thanks, for adding it.


Ali,

You reported some performance regression, did you confirm it?
If I get no reply by monday, I'll proceed with this patch.


Thanks.
Ali Alnubani Jan. 15, 2021, 6:39 p.m. UTC | #2
Hi,
Adding Ferruh and Zhaoyan,

> Ali,
> 
> You reported some performance regression, did you confirm it?
> If I get no reply by monday, I'll proceed with this patch.

Sure I'll confirm by Monday.

Doesn't the regression also reproduce on the Lab's Intel servers?
Even though the check iol-intel-Performance isn't failing, I can see that the throughput differences from expected for this patch are less than those of another patch that was tested only 20 minutes earlier. Both patches were applied to the same tree:

https://mails.dpdk.org/archives/test-report/2021-January/173927.html
> | 64         | 512     | 1.571                               |

https://mails.dpdk.org/archives/test-report/2021-January/173919.html
> | 64         | 512     | 2.698                               |

Assuming that pw86457 doesn't have an effect on this test, it looks to me that this patch caused a regression in Intel hardware as well.

Can someone update the baseline's expected values for the Intel NICs and rerun the test on this patch?

Thanks,
Ali

Patch
diff mbox series

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index a40f7d4883..ad2cbab600 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -2677,6 +2677,70 @@  test_mbuf_dyn(struct rte_mempool *pktmbuf_pool)
 	return -1;
 }
 
+/* check that m->nb_segs and m->next are reset on mbuf free */
+static int
+test_nb_segs_and_next_reset(void)
+{
+	struct rte_mbuf *m0 = NULL, *m1 = NULL, *m2 = NULL;
+	struct rte_mempool *pool = NULL;
+
+	pool = rte_pktmbuf_pool_create("test_mbuf_reset",
+			3, 0, 0, MBUF_DATA_SIZE, SOCKET_ID_ANY);
+	if (pool == NULL)
+		GOTO_FAIL("Failed to create mbuf pool");
+
+	/* alloc mbufs */
+	m0 = rte_pktmbuf_alloc(pool);
+	m1 = rte_pktmbuf_alloc(pool);
+	m2 = rte_pktmbuf_alloc(pool);
+	if (m0 == NULL || m1 == NULL || m2 == NULL)
+		GOTO_FAIL("Failed to allocate mbuf");
+
+	/* append data in all of them */
+	if (rte_pktmbuf_append(m0, 500) == NULL ||
+			rte_pktmbuf_append(m1, 500) == NULL ||
+			rte_pktmbuf_append(m2, 500) == NULL)
+		GOTO_FAIL("Failed to append data in mbuf");
+
+	/* chain them in one mbuf m0 */
+	rte_pktmbuf_chain(m1, m2);
+	rte_pktmbuf_chain(m0, m1);
+	if (m0->nb_segs != 3 || m0->next != m1 || m1->next != m2 ||
+			m2->next != NULL) {
+		m1 = m2 = NULL;
+		GOTO_FAIL("Failed to chain mbufs");
+	}
+
+	/* split m0 chain in two, between m1 and m2 */
+	m0->nb_segs = 2;
+	m1->next = NULL;
+	m2->nb_segs = 1;
+
+	/* free the 2 mbuf chains m0 and m2  */
+	rte_pktmbuf_free(m0);
+	rte_pktmbuf_free(m2);
+
+	/* realloc the 3 mbufs */
+	m0 = rte_mbuf_raw_alloc(pool);
+	m1 = rte_mbuf_raw_alloc(pool);
+	m2 = rte_mbuf_raw_alloc(pool);
+	if (m0 == NULL || m1 == NULL || m2 == NULL)
+		GOTO_FAIL("Failed to reallocate mbuf");
+
+	/* ensure that m->next and m->nb_segs are reset allocated mbufs */
+	if (m0->nb_segs != 1 || m0->next != NULL ||
+			m1->nb_segs != 1 || m1->next != NULL ||
+			m2->nb_segs != 1 || m2->next != NULL)
+		GOTO_FAIL("nb_segs or next was not reset properly");
+
+	return 0;
+
+fail:
+	if (pool != NULL)
+		rte_mempool_free(pool);
+	return -1;
+}
+
 static int
 test_mbuf(void)
 {
@@ -2867,6 +2931,11 @@  test_mbuf(void)
 		goto err;
 	}
 
+	/* test reset of m->nb_segs and m->next on mbuf free */
+	if (test_nb_segs_and_next_reset() < 0) {
+		printf("test_nb_segs_and_next_reset() failed\n");
+		goto err;
+	}
 
 	ret = 0;
 err:
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 7d09ee2939..5f77840557 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -129,10 +129,10 @@  rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
+	if (m->next != NULL)
 		m->next = NULL;
+	if (m->nb_segs != 1)
 		m->nb_segs = 1;
-	}
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c4c9ebfaa0..8c1097ed76 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1340,10 +1340,10 @@  rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 
 		return m;
 
@@ -1357,10 +1357,10 @@  rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 567551deab..78a1fcc8ff 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -495,7 +495,12 @@  struct rte_mbuf {
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
 	uint16_t refcnt;
-	uint16_t nb_segs;         /**< Number of segments. */
+
+	/**
+	 * Number of segments. Only valid for the first segment of an mbuf
+	 * chain.
+	 */
+	uint16_t nb_segs;
 
 	/** Input port (16 bits to support more than 256 virtual ports).
 	 * The event eth Tx adapter uses this field to specify the output port.
@@ -591,7 +596,11 @@  struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+	/**
+	 * Next segment of scattered packet. Must be NULL in the last segment or
+	 * in case of non-segmented packet.
+	 */
+	struct rte_mbuf *next;
 
 	/* fields to support TX offloads */
 	RTE_STD_C11