[v2] mbuf: replace GCC marker extension with C11 anonymous unions

Message ID 1707806741-29694-2-git-send-email-roretzla@linux.microsoft.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [v2] mbuf: replace GCC marker extension with C11 anonymous unions

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/github-robot: build fail github build: failed
ci/intel-Functional success Functional PASS
ci/iol-abi-testing warning Testing issues
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Tyler Retzlaff Feb. 13, 2024, 6:45 a.m. UTC
  Replace the use of RTE_MARKER<x> with C11 anonymous unions to improve
code portability between toolchains.

Update use of the rte_mbuf rearm_data field in net/ionic, net/sfc,
net/ixgbe and net/virtio, which were accessing the field as a zero-length
array.
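
As a minimal sketch of the transformation on a cut-down struct (hypothetical
names, not the real rte_mbuf layout; see the diff below for the actual
change): the zero-length-array marker becomes a named member of a C11
anonymous union, and drivers address rearm_data directly:

/* before: GCC/clang zero-length array marker extension, rejected by MSVC */
struct hdr_old {
	RTE_MARKER64 rearm_data; /* zero-size label over the next 8 bytes */
	uint16_t data_off;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;
};

/* after: standard C11 anonymous union/struct, identical layout */
struct hdr_new {
	union {
		uint64_t rearm_data; /* covers the four 16-bit fields below */
		struct {
			uint16_t data_off;
			uint16_t refcnt;
			uint16_t nb_segs;
			uint16_t port;
		};
	};
};

/* driver access changes from m->rearm_data[0] = v; to m->rearm_data = v; */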

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/net/ionic/ionic_lif.c               |   8 +-
 drivers/net/ionic/ionic_rxtx_sg.c           |   4 +-
 drivers/net/ionic/ionic_rxtx_simple.c       |   2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c      |   8 +-
 drivers/net/sfc/sfc_ef100_rx.c              |   8 +-
 drivers/net/sfc/sfc_ef10_rx.c               |  12 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h |   8 +-
 lib/mbuf/rte_mbuf_core.h                    | 276 ++++++++++++++++------------
 8 files changed, 179 insertions(+), 147 deletions(-)
  

Comments

Morten Brørup Feb. 13, 2024, 4:58 p.m. UTC | #1
> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Tuesday, 13 February 2024 07.46
> 
> Replace the use of RTE_MARKER<x> with C11 anonymous unions to improve
> code portability between toolchains.

How about combining the cacheline 0 marker and padding, like this:

 struct rte_mbuf {
-	RTE_MARKER cacheline0;
+	union {
+		char cacheline0[RTE_CACHE_LINE_MIN_SIZE];
 
+		struct {
-	void *buf_addr;           /**< Virtual address of segment buffer. */
+			void *buf_addr; /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF


You could do the same with the cacheline1 marker:

	/* second cache line - fields only used in slow path or on TX */
-	RTE_MARKER cacheline1 __rte_cache_min_aligned;
+	union {
+		char cacheline1[RTE_CACHE_LINE_MIN_SIZE];
 
+		struct {
 #if RTE_IOVA_IN_MBUF
-	/**
-	 * Next segment of scattered packet. Must be NULL in the last
-	 * segment or in case of non-segmented packet.
-	 */
-	struct rte_mbuf *next;
+			/**
+			 * Next segment of scattered packet. Must be NULL in the last
+			 * segment or in case of non-segmented packet.
+			 */
+			struct rte_mbuf *next;
 #else


It also avoids the weird union between cacheline0 and buf_addr at the beginning of the structure, and between cacheline1 and next/dynfield2 at the beginning of the second cache line.

And then you can omit the pad_cacheline0 array at the end of the first part of the structure.
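
Sketched end to end on an abridged struct (field list shortened; only to
show the shape), the first cache line would look like:

struct rte_mbuf {
	union {
		char cacheline0[RTE_CACHE_LINE_MIN_SIZE]; /* marker + padding in one */
		struct {
			void *buf_addr;
			/* ... remaining cache line 0 fields ... */
		};
	};
	/* second cache line begins at offset RTE_CACHE_LINE_MIN_SIZE */
};

The char array pins the union to RTE_CACHE_LINE_MIN_SIZE bytes (provided the
field struct never outgrows it, which a static_assert can verify), so no
trailing pad member is required.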


BTW: char is a weaker type than uint8_t - i.e. it is easier to cast to another type.
It might be a personal preference, but I would use char instead of uint8_t for the padding array.
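
The aliasing rules also favour char: character types may alias any object,
while uint8_t is only covered because implementations define it as unsigned
char. A small self-contained illustration (not part of the patch):

#include <stddef.h>

/* char may alias any object type (C11 6.5p7), so byte-wise inspection of
 * an arbitrary object through a char pointer is always well-defined. */
static unsigned
byte_sum(const void *obj, size_t len)
{
	const char *p = obj;
	unsigned sum = 0;
	for (size_t i = 0; i < len; i++)
		sum += (unsigned char)p[i];
	return sum;
}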
  
Tyler Retzlaff Feb. 13, 2024, 6:48 p.m. UTC | #2
On Tue, Feb 13, 2024 at 05:58:21PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Tuesday, 13 February 2024 07.46
> > 
> > Replace the use of RTE_MARKER<x> with C11 anonymous unions to improve
> > code portability between toolchains.
> 
> How about combining the cacheline 0 marker and padding, like this:

this seems like a good suggestion, i will evaluate it. at least it gets
rid of all the extra nesting if there are no unforeseen problems.

> 
>  struct rte_mbuf {
> -	RTE_MARKER cacheline0;
> +	union {
> +		char cacheline0[RTE_CACHE_LINE_MIN_SIZE];
>  
> +		struct {
> -	void *buf_addr;           /**< Virtual address of segment buffer. */
> +			void *buf_addr; /**< Virtual address of segment buffer. */
>  #if RTE_IOVA_IN_MBUF
> 
> 
> You could do the same with the cacheline1 marker:

yeah, i wondered if i should. i'll do it since it does seem more
consistent to just pad out both cachelines explicitly instead of just
doing all but the last.

we probably don't need to align the struct rte_mbuf type if we do that,
since it will cause it to be naturally aligned to RTE_CACHE_LINE_MIN_SIZE.

> 
> 	/* second cache line - fields only used in slow path or on TX */
> -	RTE_MARKER cacheline1 __rte_cache_min_aligned;
> +	union {
> +		char cacheline1[RTE_CACHE_LINE_MIN_SIZE];
>  
> +		struct {
>  #if RTE_IOVA_IN_MBUF
> -	/**
> -	 * Next segment of scattered packet. Must be NULL in the last
> -	 * segment or in case of non-segmented packet.
> -	 */
> -	struct rte_mbuf *next;
> +			/**
> +			 * Next segment of scattered packet. Must be NULL in the last
> +			 * segment or in case of non-segmented packet.
> +			 */
> +			struct rte_mbuf *next;
>  #else
> 
> 
> It also avoids the weird union between cacheline0 and buf_addr at the beginning of the structure, and between cacheline1 and next/dynfield2 at the beginning of the second cache line.
> 
> And then you can omit the pad_cacheline0 array at the end of the first part of the structure.
> 
> 
> BTW: char is a weaker type than uint8_t - i.e. it is easier to cast to another type.
> It might be a personal preference, but I would use char instead of uint8_t for the padding array.

noted, i'll make the change.

thanks!
  
Morten Brørup Feb. 13, 2024, 7:27 p.m. UTC | #3
> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Tuesday, 13 February 2024 19.48
> 
> On Tue, Feb 13, 2024 at 05:58:21PM +0100, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Tuesday, 13 February 2024 07.46
> > >
> > > Replace the use of RTE_MARKER<x> with C11 anonymous unions to
> improve
> > > code portability between toolchains.
> >
> > How about combining the cacheline 0 marker and padding, like this:
> 
> this seems like a good suggestion, i will evaluate it. at least it gets
> rid of all the extra nesting if there are no unforeseen problems.
> 
> >
> >  struct rte_mbuf {
> > -	RTE_MARKER cacheline0;
> > +	union {
> > +		char cacheline0[RTE_CACHE_LINE_MIN_SIZE];
> >
> > +		struct {
> > -	void *buf_addr;           /**< Virtual address of segment buffer.
> */
> > +			void *buf_addr; /**< Virtual address of segment
> buffer. */
> >  #if RTE_IOVA_IN_MBUF
> >
> >
> > You could do the same with the cacheline1 marker:
> 
> yeah, i wondered if i should. i'll do it since it does seem more
> consistent to just pad out both cachelines explicitly instead of just
> doing all but the last.
> 
> we probably don't need to align the struct rte_mbuf type if we do that,
> since it will cause it to be naturally aligned to RTE_CACHE_LINE_MIN_SIZE.

We still need to align struct rte_mbuf to the cache line size.
RTE_CACHE_LINE_MIN_SIZE is 64, matching the cache line size on Intel arch,
but the cache line size is 128 bytes on the POWER architecture and Apple M2.
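
Roughly, the two macros then serve different purposes (sketch only, abridged
field lists): RTE_CACHE_LINE_MIN_SIZE fixes the layout identically on every
platform, while the alignment attribute must still use the target's real
line size:

struct rte_mbuf {
	union {
		char cacheline0[RTE_CACHE_LINE_MIN_SIZE]; /* 64 on all targets */
		struct {
			void *buf_addr;
			/* ... fast-path fields ... */
		};
	};
	union {
		char cacheline1[RTE_CACHE_LINE_MIN_SIZE];
		struct {
			struct rte_mbuf *next;
			/* ... slow-path / TX fields ... */
		};
	};
} __rte_cache_aligned; /* expands to RTE_CACHE_LINE_SIZE: 64 or 128 */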
  
Tyler Retzlaff Feb. 13, 2024, 8 p.m. UTC | #4
On Tue, Feb 13, 2024 at 08:27:52PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Tuesday, 13 February 2024 19.48
> > 
> > On Tue, Feb 13, 2024 at 05:58:21PM +0100, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Tuesday, 13 February 2024 07.46
> > > >
> > > > Replace the use of RTE_MARKER<x> with C11 anonymous unions to
> > improve
> > > > code portability between toolchains.
> > >
> > > How about combining the cacheline 0 marker and padding, like this:
> > 
> > this seems like a good suggestion, i will evaluate it. at least it gets
> > rid of all the extra nesting if there are no unforeseen problems.
> > 
> > >
> > >  struct rte_mbuf {
> > > -	RTE_MARKER cacheline0;
> > > +	union {
> > > +		char cacheline0[RTE_CACHE_LINE_MIN_SIZE];
> > >
> > > +		struct {
> > > -	void *buf_addr;           /**< Virtual address of segment buffer.
> > */
> > > +			void *buf_addr; /**< Virtual address of segment
> > buffer. */
> > >  #if RTE_IOVA_IN_MBUF
> > >
> > >
> > > You could do the same with the cacheline1 marker:
> > 
> > yeah, i wondered if i should. i'll do it since it does seem more
> > consistent to just pad out both cachelines explicitly instead of just
> > doing all but the last.
> > 
> > we probably don't need to align the struct rte_mbuf type if we do that,
> > since it will cause it to be naturally aligned to RTE_CACHE_LINE_MIN_SIZE.
> 
> We still need to align struct rte_mbuf to the cache line size.
> RTE_CACHE_LINE_MIN_SIZE is 64, matching the cache line size on Intel arch,
> but the cache line size is 128 bytes on the POWER architecture and Apple M2.

RTE_CACHE_LINE_SIZE vs RTE_CACHE_LINE_MIN_SIZE, i forgot about that.
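
A compile-time guard pinned to the real line size would catch that mix-up
(a sketch; the asserts added by this patch check RTE_CACHE_LINE_MIN_SIZE
only):

#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignof */

static_assert(alignof(struct rte_mbuf) == RTE_CACHE_LINE_SIZE,
	"rte_mbuf must align to the platform cache line, not just 64 bytes");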
  

Patch

diff --git a/drivers/net/ionic/ionic_lif.c b/drivers/net/ionic/ionic_lif.c
index 25b490d..fd99f39 100644
--- a/drivers/net/ionic/ionic_lif.c
+++ b/drivers/net/ionic/ionic_lif.c
@@ -725,8 +725,8 @@ 
 
 	rte_compiler_barrier();
 
-	RTE_BUILD_BUG_ON(sizeof(rxm.rearm_data[0]) != sizeof(uint64_t));
-	return rxm.rearm_data[0];
+	RTE_BUILD_BUG_ON(sizeof(rxm.rearm_data) != sizeof(uint64_t));
+	return rxm.rearm_data;
 }
 
 static uint64_t
@@ -743,8 +743,8 @@ 
 
 	rte_compiler_barrier();
 
-	RTE_BUILD_BUG_ON(sizeof(rxm.rearm_data[0]) != sizeof(uint64_t));
-	return rxm.rearm_data[0];
+	RTE_BUILD_BUG_ON(sizeof(rxm.rearm_data) != sizeof(uint64_t));
+	return rxm.rearm_data;
 }
 
 int
diff --git a/drivers/net/ionic/ionic_rxtx_sg.c b/drivers/net/ionic/ionic_rxtx_sg.c
index ab8e56e..a569dd1 100644
--- a/drivers/net/ionic/ionic_rxtx_sg.c
+++ b/drivers/net/ionic/ionic_rxtx_sg.c
@@ -285,7 +285,7 @@ 
 	info[0] = NULL;
 
 	/* Set the mbuf metadata based on the cq entry */
-	rxm->rearm_data[0] = rxq->rearm_data;
+	rxm->rearm_data = rxq->rearm_data;
 	rxm->pkt_len = cq_desc_len;
 	rxm->data_len = RTE_MIN(rxq->hdr_seg_size, cq_desc_len);
 	left = cq_desc_len - rxm->data_len;
@@ -298,7 +298,7 @@ 
 		info[i] = NULL;
 
 		/* Set the chained mbuf metadata */
-		rxm_seg->rearm_data[0] = rxq->rearm_seg_data;
+		rxm_seg->rearm_data = rxq->rearm_seg_data;
 		rxm_seg->data_len = RTE_MIN(rxq->seg_size, left);
 		left -= rxm_seg->data_len;
 
diff --git a/drivers/net/ionic/ionic_rxtx_simple.c b/drivers/net/ionic/ionic_rxtx_simple.c
index 5f81856..1978610 100644
--- a/drivers/net/ionic/ionic_rxtx_simple.c
+++ b/drivers/net/ionic/ionic_rxtx_simple.c
@@ -256,7 +256,7 @@ 
 	info[0] = NULL;
 
 	/* Set the mbuf metadata based on the cq entry */
-	rxm->rearm_data[0] = rxq->rearm_data;
+	rxm->rearm_data = rxq->rearm_data;
 	rxm->pkt_len = cq_desc_len;
 	rxm->data_len = cq_desc_len;
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index f60808d..bc0525b 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -98,10 +98,10 @@ 
 desc_to_olflags_v_ipsec(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
 	__m128i sterr, rearm, tmp_e, tmp_p;
-	uint32_t *rearm0 = (uint32_t *)rx_pkts[0]->rearm_data + 2;
-	uint32_t *rearm1 = (uint32_t *)rx_pkts[1]->rearm_data + 2;
-	uint32_t *rearm2 = (uint32_t *)rx_pkts[2]->rearm_data + 2;
-	uint32_t *rearm3 = (uint32_t *)rx_pkts[3]->rearm_data + 2;
+	uint32_t *rearm0 = (uint32_t *)&rx_pkts[0]->rearm_data + 2;
+	uint32_t *rearm1 = (uint32_t *)&rx_pkts[1]->rearm_data + 2;
+	uint32_t *rearm2 = (uint32_t *)&rx_pkts[2]->rearm_data + 2;
+	uint32_t *rearm3 = (uint32_t *)&rx_pkts[3]->rearm_data + 2;
 	const __m128i ipsec_sterr_msk =
 			_mm_set1_epi32(IXGBE_RXDADV_IPSEC_STATUS_SECP |
 				       IXGBE_RXDADV_IPSEC_ERROR_AUTH_FAILED);
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index 2677003..23918d5 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -553,9 +553,9 @@  struct sfc_ef100_rxq {
 		pkt = sfc_ef100_rx_next_mbuf(rxq);
 		__rte_mbuf_raw_sanity_check(pkt);
 
-		RTE_BUILD_BUG_ON(sizeof(pkt->rearm_data[0]) !=
+		RTE_BUILD_BUG_ON(sizeof(pkt->rearm_data) !=
 				 sizeof(rxq->rearm_data));
-		pkt->rearm_data[0] = rxq->rearm_data;
+		pkt->rearm_data = rxq->rearm_data;
 
 		/* data_off already moved past Rx prefix */
 		rx_prefix = (const efx_xword_t *)sfc_ef100_rx_pkt_prefix(pkt);
@@ -759,8 +759,8 @@  struct sfc_ef100_rxq {
 
 	/* rearm_data covers structure members filled in above */
 	rte_compiler_barrier();
-	RTE_BUILD_BUG_ON(sizeof(m.rearm_data[0]) != sizeof(uint64_t));
-	return m.rearm_data[0];
+	RTE_BUILD_BUG_ON(sizeof(m.rearm_data) != sizeof(uint64_t));
+	return m.rearm_data;
 }
 
 static sfc_dp_rx_qcreate_t sfc_ef100_rx_qcreate;
diff --git a/drivers/net/sfc/sfc_ef10_rx.c b/drivers/net/sfc/sfc_ef10_rx.c
index 30a320d..60bc098 100644
--- a/drivers/net/sfc/sfc_ef10_rx.c
+++ b/drivers/net/sfc/sfc_ef10_rx.c
@@ -322,8 +322,8 @@  struct sfc_ef10_rxq {
 
 	m = rxd->mbuf;
 
-	RTE_BUILD_BUG_ON(sizeof(m->rearm_data[0]) != sizeof(rxq->rearm_data));
-	m->rearm_data[0] = rxq->rearm_data;
+	RTE_BUILD_BUG_ON(sizeof(m->rearm_data) != sizeof(rxq->rearm_data));
+	m->rearm_data = rxq->rearm_data;
 
 	/* Classify packet based on Rx event */
 	/* Mask RSS hash offload flag if RSS is not enabled */
@@ -377,9 +377,9 @@  struct sfc_ef10_rxq {
 			rxq->completed = pending;
 		}
 
-		RTE_BUILD_BUG_ON(sizeof(m->rearm_data[0]) !=
+		RTE_BUILD_BUG_ON(sizeof(m->rearm_data) !=
 				 sizeof(rxq->rearm_data));
-		m->rearm_data[0] = rxq->rearm_data;
+		m->rearm_data = rxq->rearm_data;
 
 		/* Event-dependent information is the same */
 		m->ol_flags = m0->ol_flags;
@@ -633,8 +633,8 @@  struct sfc_ef10_rxq {
 
 	/* rearm_data covers structure members filled in above */
 	rte_compiler_barrier();
-	RTE_BUILD_BUG_ON(sizeof(m.rearm_data[0]) != sizeof(uint64_t));
-	return m.rearm_data[0];
+	RTE_BUILD_BUG_ON(sizeof(m.rearm_data) != sizeof(uint64_t));
+	return m.rearm_data;
 }
 
 static sfc_dp_rx_qcreate_t sfc_ef10_rx_qcreate;
diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.h b/drivers/net/virtio/virtio_rxtx_packed_avx.h
index 584ac72..a9ce53f 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_avx.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_avx.h
@@ -36,10 +36,10 @@ 
 	/* Load four mbufs rearm data */
 	RTE_BUILD_BUG_ON(REFCNT_BITS_OFFSET >= 64);
 	RTE_BUILD_BUG_ON(SEG_NUM_BITS_OFFSET >= 64);
-	__m256i mbufs = _mm256_set_epi64x(*tx_pkts[3]->rearm_data,
-					  *tx_pkts[2]->rearm_data,
-					  *tx_pkts[1]->rearm_data,
-					  *tx_pkts[0]->rearm_data);
+	__m256i mbufs = _mm256_set_epi64x(tx_pkts[3]->rearm_data,
+					  tx_pkts[2]->rearm_data,
+					  tx_pkts[1]->rearm_data,
+					  tx_pkts[0]->rearm_data);
 
 	/* refcnt=1 and nb_segs=1 */
 	__m256i mbuf_ref = _mm256_set1_epi64x(DEFAULT_REARM_DATA);
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 5688683..3867c19 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -464,152 +464,179 @@  enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct rte_mbuf {
-	RTE_MARKER cacheline0;
-
-	void *buf_addr;           /**< Virtual address of segment buffer. */
+	union {
+		struct {
+			union {
+				void *cacheline0;
+				void *buf_addr; /**< Virtual address of segment buffer. */
+			};
 #if RTE_IOVA_IN_MBUF
-	/**
-	 * Physical address of segment buffer.
-	 * This field is undefined if the build is configured to use only
-	 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
-	 * Force alignment to 8-bytes, so as to ensure we have the exact
-	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
-	 * working on vector drivers easier.
-	 */
-	rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
+			/**
+			 * Physical address of segment buffer.
+			 * This field is undefined if the build is configured to use only
+			 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
+			 * Force alignment to 8-bytes, so as to ensure we have the exact
+			 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+			 * working on vector drivers easier.
+			 */
+			rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
 #else
-	/**
-	 * Next segment of scattered packet.
-	 * This field is valid when physical address field is undefined.
-	 * Otherwise next pointer in the second cache line will be used.
-	 */
-	struct rte_mbuf *next;
+			/**
+			 * Next segment of scattered packet.
+			 * This field is valid when physical address field is undefined.
+			 * Otherwise next pointer in the second cache line will be used.
+			 */
+			struct rte_mbuf *next;
 #endif
 
-	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
-
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
-
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
-
-	uint64_t ol_flags;        /**< Offload features. */
+			/* next 8 bytes are initialised on RX descriptor rearm */
+			union {
+				uint64_t rearm_data;
+				struct {
+					uint16_t data_off;
+
+					/**
+					 * Reference counter. Its size should at least equal to
+					 * the size of port field (16 bits), to support zero-copy
+					 * broadcast.
+					 * It should only be accessed using the following
+					 * functions: rte_mbuf_refcnt_update(),
+					 * rte_mbuf_refcnt_read(), and rte_mbuf_refcnt_set(). The
+					 * functionality of these functions (atomic, or non-atomic)
+					 * is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+					 */
+					RTE_ATOMIC(uint16_t) refcnt;
+
+					/**
+					 * Number of segments. Only valid for the first segment of
+					 * an mbuf chain.
+					 */
+					uint16_t nb_segs;
+
+					/** Input port (16 bits to support more than 256 virtual
+					 * ports). The event eth Tx adapter uses this field to
+					 * specify the output port.
+					 */
+					uint16_t port;
+				};
+			};
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
+			uint64_t ol_flags;        /**< Offload features. */
 
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
-	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
-		__extension__
-		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/* remaining bytes are set on RX when pulling packet from descriptor */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
+				void *rx_descriptor_fields1;
+
+				/*
+				 * The packet type, which is the combination of outer/inner L2, L3,
+				 * L4 and tunnel types. The packet_type is about data really
+				 * present in the mbuf. Example: if vlan stripping is enabled, a
+				 * received vlan packet would have RTE_PTYPE_L2_ETHER and not
+				 * RTE_PTYPE_L2_VLAN because the vlan is stripped from the data.
 				 */
-				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					union {
+						/**< L2/L3/L4 and tunnel information. */
+						uint32_t packet_type;
+						__extension__
+						struct {
+							/**< (Outer) L2 type. */
+							uint8_t l2_type:4;
+							/**< (Outer) L3 type. */
+							uint8_t l3_type:4;
+							/**< (Outer) L4 type. */
+							uint8_t l4_type:4;
+							/**< Tunnel type. */
+							uint8_t tun_type:4;
+							union {
+								uint8_t inner_esp_next_proto;
+								/**< ESP next protocol type, valid
+								 * if RTE_PTYPE_TUNNEL_ESP tunnel
+								 * type is set on both Tx and Rx.
+								 */
+								__extension__
+								struct {
+									uint8_t inner_l2_type:4;
+									/**< Inner L2 type. */
+									uint8_t inner_l3_type:4;
+									/**< Inner L3 type. */
+								};
+							};
+							/**< Inner L4 type. */
+							uint8_t inner_l4_type:4;
+						};
+					};
+					/**< Total pkt len: sum of all segments. */
+					uint32_t pkt_len;
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							uint32_t lo;
+							/**< Second 4 flexible bytes */
+						};
+						uint32_t hi;
+						/**< First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					struct rte_mbuf_sched sched;
+					/**< Hierarchical scheduler : 8 bytes */
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						uint16_t txq;
+						/**< The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					uint32_t usr;
+					/**< User defined tags. See rte_distributor_process() */
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
 
-	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+			struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+
+		};
+		uint8_t pad_cacheline0[RTE_CACHE_LINE_MIN_SIZE];
+	}; /* cacheline0 */
 
 	/* second cache line - fields only used in slow path or on TX */
-	RTE_MARKER cacheline1 __rte_cache_min_aligned;
+	union {
+		void *cacheline1;
 
 #if RTE_IOVA_IN_MBUF
-	/**
-	 * Next segment of scattered packet. Must be NULL in the last
-	 * segment or in case of non-segmented packet.
-	 */
-	struct rte_mbuf *next;
+		/**
+		 * Next segment of scattered packet. Must be NULL in the last
+		 * segment or in case of non-segmented packet.
+		 */
+		struct rte_mbuf *next;
 #else
-	/**
-	 * Reserved for dynamic fields
-	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
-	 */
-	uint64_t dynfield2;
+		/**
+		 * Reserved for dynamic fields
+		 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
+		 */
+		uint64_t dynfield2;
 #endif
+	};
 
 	/* fields to support TX offloads */
 	union {
@@ -664,6 +691,11 @@  struct rte_mbuf {
 	uint32_t dynfield1[9]; /**< Reserved for dynamic fields. */
 } __rte_cache_aligned;
 
+static_assert(offsetof(struct rte_mbuf, cacheline1) == RTE_CACHE_LINE_MIN_SIZE,
+	"offsetof cacheline1");
+static_assert(sizeof(struct rte_mbuf) == RTE_CACHE_LINE_MIN_SIZE * 2,
+	"sizeof struct rte_mbuf");
+
 /**
  * Function typedef of callback to free externally attached buffer.
  */