[v4,1/3] net/iavf: fix segment fault in AVX512

Message ID 1617937317-130223-2-git-send-email-wenzhuo.lu@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Qi Zhang
Headers
Series fix segment fault in avx512 code |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Wenzhuo Lu April 9, 2021, 3:01 a.m. UTC
  Fix segment fault when failing to get the memory from the pool.

Fixes: 31737f2b66fb ("net/iavf: enable AVX512 for legacy Rx")
Cc: stable@dpdk.org

Reported-by: David Coyle <David.Coyle@intel.com>
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c   | 120 +------------------
 drivers/net/iavf/iavf_rxtx_vec_avx512.c |   5 +-
 drivers/net/iavf/iavf_rxtx_vec_common.h | 203 ++++++++++++++++++++++++++++++++
 3 files changed, 209 insertions(+), 119 deletions(-)
  

Comments

Ferruh Yigit April 13, 2021, 12:37 p.m. UTC | #1
On 4/9/2021 4:01 AM, Wenzhuo Lu wrote:
> Fix segment fault when failing to get the memory from the pool.
> 

Can be good to clarify there is no change in the moved code, it is not possible 
to recognize this from patch without using a compare app.

> Fixes: 31737f2b66fb ("net/iavf: enable AVX512 for legacy Rx")
> Cc: stable@dpdk.org
> 
> Reported-by: David Coyle <David.Coyle@intel.com>
> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> ---
>   drivers/net/iavf/iavf_rxtx_vec_avx2.c   | 120 +------------------
>   drivers/net/iavf/iavf_rxtx_vec_avx512.c |   5 +-
>   drivers/net/iavf/iavf_rxtx_vec_common.h | 203 ++++++++++++++++++++++++++++++++

The common vector code seems moved to 'iavf_rxtx_vec_common.h' but that header 
is included by 'iavf_rxtx_vec_sse.c' too, and the moved function has AVX2 code 
in it.
Won't this fail to build if the AVX2 is not enabled?

Bruce, is there an easy way to test this, meson detects the AVX2 support even I 
provide c_args march that doesn't have AVX2 support.

<...>

> index 46a1873..57b4381 100644
> --- a/drivers/net/iavf/iavf_rxtx_vec_common.h
> +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
> @@ -11,6 +11,10 @@
>   #include "iavf.h"
>   #include "iavf_rxtx.h"
>   
> +#ifndef __INTEL_COMPILER
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#endif
> +

Is this pragma needed or carried here to be sure?

>   static inline uint16_t
>   reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
>   		   uint16_t nb_bufs, uint8_t *split_flags)
> @@ -276,4 +280,203 @@
>   	return 0;
>   }
>   
> +#ifdef RTE_ARCH_X86
> +static __rte_always_inline void
> +iavf_rxq_rearm_cmn(struct iavf_rx_queue *rxq, __rte_unused bool avx512)

What do you think expand 'cmn' to full 'common', it is clear from this patch 
what it stands for but later if you just look this function it is not that clear 
if it is an abbreviation for something else or common.
  
Wenzhuo Lu April 14, 2021, 1:18 a.m. UTC | #2
Hi Ferruh,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Tuesday, April 13, 2021 8:37 PM
> To: Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Richardson,
> Bruce <bruce.richardson@intel.com>
> Cc: stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 1/3] net/iavf: fix segment fault in AVX512
> 
> On 4/9/2021 4:01 AM, Wenzhuo Lu wrote:
> > Fix segment fault when failing to get the memory from the pool.
> >
> 
> Can be good to clarify there is no change in the moved code, it is not possible
> to recognize this from patch without using a compare app.
Sure. Will add more description here.

> 
> > Fixes: 31737f2b66fb ("net/iavf: enable AVX512 for legacy Rx")
> > Cc: stable@dpdk.org
> >
> > Reported-by: David Coyle <David.Coyle@intel.com>
> > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> > ---
> >   drivers/net/iavf/iavf_rxtx_vec_avx2.c   | 120 +------------------
> >   drivers/net/iavf/iavf_rxtx_vec_avx512.c |   5 +-
> >   drivers/net/iavf/iavf_rxtx_vec_common.h | 203
> > ++++++++++++++++++++++++++++++++
> 
> The common vector code seems moved to 'iavf_rxtx_vec_common.h' but
> that header is included by 'iavf_rxtx_vec_sse.c' too, and the moved function
> has AVX2 code in it.
> Won't this fail to build if the AVX2 is not enabled?
Agree. There may be a compile error. I'll change the ' RTE_ARCH_X86 ' to ' CC_AVX2_SUPPORT ' to make the code only for avx2 and avx512.

> 
> Bruce, is there an easy way to test this, meson detects the AVX2 support
> even I provide c_args march that doesn't have AVX2 support.
> 
> <...>
> 
> > index 46a1873..57b4381 100644
> > --- a/drivers/net/iavf/iavf_rxtx_vec_common.h
> > +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
> > @@ -11,6 +11,10 @@
> >   #include "iavf.h"
> >   #include "iavf_rxtx.h"
> >
> > +#ifndef __INTEL_COMPILER
> > +#pragma GCC diagnostic ignored "-Wcast-qual"
> > +#endif
> > +
> 
> Is this pragma needed or carried here to be sure?
It's necessary for a compile warning. Just leveraged from the existing code.

> 
> >   static inline uint16_t
> >   reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
> >   		   uint16_t nb_bufs, uint8_t *split_flags) @@ -276,4 +280,203
> @@
> >   	return 0;
> >   }
> >
> > +#ifdef RTE_ARCH_X86
> > +static __rte_always_inline void
> > +iavf_rxq_rearm_cmn(struct iavf_rx_queue *rxq, __rte_unused bool
> > +avx512)
> 
> What do you think expand 'cmn' to full 'common', it is clear from this patch
> what it stands for but later if you just look this function it is not that clear if it
> is an abbreviation for something else or common.
Sure. Will change it to 'common'.
  

Patch

diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index cdb5139..2c2b139 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -10,126 +10,10 @@ 
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
 
-static inline void
+static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
-	int i;
-	uint16_t rx_id;
-	volatile union iavf_rx_desc *rxdp;
-	struct rte_mbuf **rxp = &rxq->sw_ring[rxq->rxrearm_start];
-
-	rxdp = rxq->rx_ring + rxq->rxrearm_start;
-
-	/* Pull 'n' more MBUFs into the software ring */
-	if (rte_mempool_get_bulk(rxq->mp,
-				 (void *)rxp,
-				 IAVF_RXQ_REARM_THRESH) < 0) {
-		if (rxq->rxrearm_nb + IAVF_RXQ_REARM_THRESH >=
-		    rxq->nb_rx_desc) {
-			__m128i dma_addr0;
-
-			dma_addr0 = _mm_setzero_si128();
-			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
-				rxp[i] = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
-						dma_addr0);
-			}
-		}
-		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
-			IAVF_RXQ_REARM_THRESH;
-		return;
-	}
-
-#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
-	struct rte_mbuf *mb0, *mb1;
-	__m128i dma_addr0, dma_addr1;
-	__m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
-			RTE_PKTMBUF_HEADROOM);
-	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
-	for (i = 0; i < IAVF_RXQ_REARM_THRESH; i += 2, rxp += 2) {
-		__m128i vaddr0, vaddr1;
-
-		mb0 = rxp[0];
-		mb1 = rxp[1];
-
-		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
-		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
-				offsetof(struct rte_mbuf, buf_addr) + 8);
-		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
-		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
-
-		/* convert pa to dma_addr hdr/data */
-		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
-		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
-
-		/* add headroom to pa values */
-		dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
-		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
-
-		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
-	}
-#else
-	struct rte_mbuf *mb0, *mb1, *mb2, *mb3;
-	__m256i dma_addr0_1, dma_addr2_3;
-	__m256i hdr_room = _mm256_set1_epi64x(RTE_PKTMBUF_HEADROOM);
-	/* Initialize the mbufs in vector, process 4 mbufs in one loop */
-	for (i = 0; i < IAVF_RXQ_REARM_THRESH;
-			i += 4, rxp += 4, rxdp += 4) {
-		__m128i vaddr0, vaddr1, vaddr2, vaddr3;
-		__m256i vaddr0_1, vaddr2_3;
-
-		mb0 = rxp[0];
-		mb1 = rxp[1];
-		mb2 = rxp[2];
-		mb3 = rxp[3];
-
-		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
-		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
-				offsetof(struct rte_mbuf, buf_addr) + 8);
-		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
-		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
-		vaddr2 = _mm_loadu_si128((__m128i *)&mb2->buf_addr);
-		vaddr3 = _mm_loadu_si128((__m128i *)&mb3->buf_addr);
-
-		/**
-		 * merge 0 & 1, by casting 0 to 256-bit and inserting 1
-		 * into the high lanes. Similarly for 2 & 3
-		 */
-		vaddr0_1 =
-			_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0),
-						vaddr1, 1);
-		vaddr2_3 =
-			_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2),
-						vaddr3, 1);
-
-		/* convert pa to dma_addr hdr/data */
-		dma_addr0_1 = _mm256_unpackhi_epi64(vaddr0_1, vaddr0_1);
-		dma_addr2_3 = _mm256_unpackhi_epi64(vaddr2_3, vaddr2_3);
-
-		/* add headroom to pa values */
-		dma_addr0_1 = _mm256_add_epi64(dma_addr0_1, hdr_room);
-		dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);
-
-		/* flush desc with pa dma_addr */
-		_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
-		_mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
-	}
-
-#endif
-
-	rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH;
-	if (rxq->rxrearm_start >= rxq->nb_rx_desc)
-		rxq->rxrearm_start = 0;
-
-	rxq->rxrearm_nb -= IAVF_RXQ_REARM_THRESH;
-
-	rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
-			     (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
-
-	/* Update the tail pointer on the NIC */
-	IAVF_PCI_REG_WRITE(rxq->qrx_tail, rx_id);
+	return iavf_rxq_rearm_cmn(rxq, false);
 }
 
 #define PKTLEN_SHIFT     10
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index 67184ae..2927a7c 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -13,7 +13,7 @@ 
 #define IAVF_DESCS_PER_LOOP_AVX 8
 #define PKTLEN_SHIFT 10
 
-static inline void
+static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
 	int i;
@@ -25,6 +25,9 @@ 
 
 	rxdp = rxq->rx_ring + rxq->rxrearm_start;
 
+	if (unlikely(!cache))
+		return iavf_rxq_rearm_cmn(rxq, true);
+
 	/* We need to pull 'n' more MBUFs into the software ring from mempool
 	 * We inline the mempool function here, so we can vectorize the copy
 	 * from the cache into the shadow ring.
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h
index 46a1873..57b4381 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
@@ -11,6 +11,10 @@ 
 #include "iavf.h"
 #include "iavf_rxtx.h"
 
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
 static inline uint16_t
 reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
@@ -276,4 +280,203 @@ 
 	return 0;
 }
 
+#ifdef RTE_ARCH_X86
+static __rte_always_inline void
+iavf_rxq_rearm_cmn(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union iavf_rx_desc *rxdp;
+	struct rte_mbuf **rxp = &rxq->sw_ring[rxq->rxrearm_start];
+
+	rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)rxp,
+				 IAVF_RXQ_REARM_THRESH) < 0) {
+		if (rxq->rxrearm_nb + IAVF_RXQ_REARM_THRESH >=
+		    rxq->nb_rx_desc) {
+			__m128i dma_addr0;
+
+			dma_addr0 = _mm_setzero_si128();
+			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
+				rxp[i] = &rxq->fake_mbuf;
+				_mm_store_si128((__m128i *)&rxdp[i].read,
+						dma_addr0);
+			}
+		}
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			IAVF_RXQ_REARM_THRESH;
+		return;
+	}
+
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
+	struct rte_mbuf *mb0, *mb1;
+	__m128i dma_addr0, dma_addr1;
+	__m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
+			RTE_PKTMBUF_HEADROOM);
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < IAVF_RXQ_REARM_THRESH; i += 2, rxp += 2) {
+		__m128i vaddr0, vaddr1;
+
+		mb0 = rxp[0];
+		mb1 = rxp[1];
+
+		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
+		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
+				offsetof(struct rte_mbuf, buf_addr) + 8);
+		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
+		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+	}
+#else
+#ifdef CC_AVX512_SUPPORT
+	if (avx512) {
+		struct rte_mbuf *mb0, *mb1, *mb2, *mb3;
+		struct rte_mbuf *mb4, *mb5, *mb6, *mb7;
+		__m512i dma_addr0_3, dma_addr4_7;
+		__m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
+		/* Initialize the mbufs in vector, process 8 mbufs in one loop */
+		for (i = 0; i < IAVF_RXQ_REARM_THRESH;
+				i += 8, rxp += 8, rxdp += 8) {
+			__m128i vaddr0, vaddr1, vaddr2, vaddr3;
+			__m128i vaddr4, vaddr5, vaddr6, vaddr7;
+			__m256i vaddr0_1, vaddr2_3;
+			__m256i vaddr4_5, vaddr6_7;
+			__m512i vaddr0_3, vaddr4_7;
+
+			mb0 = rxp[0];
+			mb1 = rxp[1];
+			mb2 = rxp[2];
+			mb3 = rxp[3];
+			mb4 = rxp[4];
+			mb5 = rxp[5];
+			mb6 = rxp[6];
+			mb7 = rxp[7];
+
+			/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
+			RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
+					offsetof(struct rte_mbuf, buf_addr) + 8);
+			vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+			vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+			vaddr2 = _mm_loadu_si128((__m128i *)&mb2->buf_addr);
+			vaddr3 = _mm_loadu_si128((__m128i *)&mb3->buf_addr);
+			vaddr4 = _mm_loadu_si128((__m128i *)&mb4->buf_addr);
+			vaddr5 = _mm_loadu_si128((__m128i *)&mb5->buf_addr);
+			vaddr6 = _mm_loadu_si128((__m128i *)&mb6->buf_addr);
+			vaddr7 = _mm_loadu_si128((__m128i *)&mb7->buf_addr);
+
+			/**
+			 * merge 0 & 1, by casting 0 to 256-bit and inserting 1
+			 * into the high lanes. Similarly for 2 & 3, and so on.
+			 */
+			vaddr0_1 =
+				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0),
+							vaddr1, 1);
+			vaddr2_3 =
+				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2),
+							vaddr3, 1);
+			vaddr4_5 =
+				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr4),
+							vaddr5, 1);
+			vaddr6_7 =
+				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr6),
+							vaddr7, 1);
+			vaddr0_3 =
+				_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
+							vaddr2_3, 1);
+			vaddr4_7 =
+				_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
+							vaddr6_7, 1);
+
+			/* convert pa to dma_addr hdr/data */
+			dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3, vaddr0_3);
+			dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7, vaddr4_7);
+
+			/* add headroom to pa values */
+			dma_addr0_3 = _mm512_add_epi64(dma_addr0_3, hdr_room);
+			dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
+
+			/* flush desc with pa dma_addr */
+			_mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3);
+			_mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7);
+		}
+	} else
+#endif
+	{
+		struct rte_mbuf *mb0, *mb1, *mb2, *mb3;
+		__m256i dma_addr0_1, dma_addr2_3;
+		__m256i hdr_room = _mm256_set1_epi64x(RTE_PKTMBUF_HEADROOM);
+		/* Initialize the mbufs in vector, process 4 mbufs in one loop */
+		for (i = 0; i < IAVF_RXQ_REARM_THRESH;
+				i += 4, rxp += 4, rxdp += 4) {
+			__m128i vaddr0, vaddr1, vaddr2, vaddr3;
+			__m256i vaddr0_1, vaddr2_3;
+
+			mb0 = rxp[0];
+			mb1 = rxp[1];
+			mb2 = rxp[2];
+			mb3 = rxp[3];
+
+			/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
+			RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
+					offsetof(struct rte_mbuf, buf_addr) + 8);
+			vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+			vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+			vaddr2 = _mm_loadu_si128((__m128i *)&mb2->buf_addr);
+			vaddr3 = _mm_loadu_si128((__m128i *)&mb3->buf_addr);
+
+			/**
+			 * merge 0 & 1, by casting 0 to 256-bit and inserting 1
+			 * into the high lanes. Similarly for 2 & 3
+			 */
+			vaddr0_1 =
+				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0),
+							vaddr1, 1);
+			vaddr2_3 =
+				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2),
+							vaddr3, 1);
+
+			/* convert pa to dma_addr hdr/data */
+			dma_addr0_1 = _mm256_unpackhi_epi64(vaddr0_1, vaddr0_1);
+			dma_addr2_3 = _mm256_unpackhi_epi64(vaddr2_3, vaddr2_3);
+
+			/* add headroom to pa values */
+			dma_addr0_1 = _mm256_add_epi64(dma_addr0_1, hdr_room);
+			dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);
+
+			/* flush desc with pa dma_addr */
+			_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
+			_mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
+		}
+	}
+
+#endif
+
+	rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH;
+	if (rxq->rxrearm_start >= rxq->nb_rx_desc)
+		rxq->rxrearm_start = 0;
+
+	rxq->rxrearm_nb -= IAVF_RXQ_REARM_THRESH;
+
+	rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
+			     (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
+
+	/* Update the tail pointer on the NIC */
+	IAVF_PCI_REG_WRITE(rxq->qrx_tail, rx_id);
+}
+#endif
+
 #endif