From patchwork Fri Jul 23 03:10:46 2021
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 96222
X-Patchwork-Delegate: qi.z.zhang@intel.com
From: Feifei Wang
To: Qi Zhang, Xiao Wang, David Christensen, Beilei Xing, Ruifeng Wang, Bruce Richardson, Konstantin Ananyev, Jingjing Wu, Qiming Yang, Haiyue Wang, Cunming Liang, "Chen Jing D(Mark)", Chao Zhu, Gowrishankar Muthukrishnan, Jerin Jacob, Jianbo Liu, Zhe Tao, Leyi Rong, Wenzhuo Lu
Cc: dev@dpdk.org, nd@arm.com, Feifei Wang, stable@dpdk.org
Date: Fri, 23 Jul 2021 11:10:46 +0800
Message-Id: <20210723031049.2201665-2-feifei.wang2@arm.com>
In-Reply-To: <20210723031049.2201665-1-feifei.wang2@arm.com>
References: <20210723031049.2201665-1-feifei.wang2@arm.com>
Subject: [dpdk-dev] [PATCH v1 1/4] drivers/net: remove redundant phrases

In the Rx vector path comment that covers extracting and recording the
EOP bit, the word "count" is duplicated; the comment should read "as the
count of dd bits doesn't care". Remove the redundant "count".
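For reference, the pattern that comment documents (extract the per-descriptor
EOP bits from the status words, then shuffle them into descriptor order while
compressing the 32-bit lanes to bytes) can be sketched standalone as below.
The bit layout and mask values are illustrative assumptions, not the driver
code itself; build with SSSE3 enabled (e.g. -mssse3):

#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>   /* SSE2 */
#include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 */

int main(void)
{
	/* Assume bit 0 is DD and bit 1 is EOP in each 32-bit status word
	 * (illustrative layout only); here descs 0 and 2 carry EOP. */
	__m128i staterr   = _mm_set_epi32(0x1, 0x3, 0x1, 0x3);
	__m128i eop_check = _mm_set1_epi32(0x2);

	/* andnot "flips 1-0": lanes whose EOP bit is set become 0 here,
	 * lanes without EOP keep the mask bit. */
	__m128i eop_bits = _mm_andnot_si128(staterr, eop_check);

	/* The staterr values are not in order, as the count of dd bits
	 * doesn't care; for EOP tracking order does matter, so shuffle:
	 * byte 0 of every 32-bit lane is packed into four adjacent bytes. */
	const __m128i eop_shuf_mask = _mm_set_epi8(
		-1, -1, -1, -1, -1, -1, -1, -1,
		-1, -1, -1, -1, 12, 8, 4, 0);
	eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);

	printf("packed per-descriptor EOP flags: 0x%08x\n",
	       (unsigned)_mm_cvtsi128_si32(eop_bits));
	return 0;
}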
fm10k: Fixes: 7092be8437bd ("fm10k: add vector Rx") Cc: jing.d.chen@intel.com i40e-altive: Fixes: c3def6a8724c ("net/i40e: implement vector PMD for altivec") Cc: gowrishankar.m@linux.vnet.ibm.com i40e-neon: Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM") i40e-sse: Fixes: 9ed94e5bb04e ("i40e: add vector Rx") Cc: zhe.tao@intel.com iavf: Fixes: 319c421f3890 ("net/avf: enable SSE Rx Tx") Cc: jingjing.wu@intel.com Fixes: 1162f5a0ef31 ("net/iavf: support flexible Rx descriptor in SSE path") Cc: leyi.rong@intel.com ice: Fixes: c68a52b8b38c ("net/ice: support vector SSE in Rx") Cc: wenzhuo.lu@intel.com ixgbe: Fixes: cf4b4708a88a ("ixgbe: improve slow-path perf with vector scattered Rx") Cc: bruce.richardson@intel.com Cc: stable@dpdk.org Signed-off-by: Feifei Wang Reviewed-by: Ruifeng Wang --- drivers/net/fm10k/fm10k_rxtx_vec.c | 2 +- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 2 +- drivers/net/i40e/i40e_rxtx_vec_neon.c | 2 +- drivers/net/i40e/i40e_rxtx_vec_sse.c | 2 +- drivers/net/iavf/iavf_rxtx_vec_sse.c | 4 ++-- drivers/net/ice/ice_rxtx_vec_sse.c | 2 +- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 2 +- 7 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 39e3cdac1f..cae5322d48 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -544,7 +544,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* and with mask to extract bits, flipping 1-0 */ __m128i eop_bits = _mm_andnot_si128(staterr, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. This also * compresses the 32-bit values to 8-bit */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index 1ad74646d6..edaa462ac8 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -398,7 +398,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, (vector unsigned char)vec_nor(staterr, staterr), (vector unsigned char)eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. This also * compresses the 32-bit values to 8-bit */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index 1f5539bda8..32336fdb80 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -387,7 +387,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, eop_bits = vmvnq_u8(vreinterpretq_u8_u16(staterr)); eop_bits = vandq_u8(eop_bits, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. 
This also * compresses the 32-bit values to 8-bit */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index bfa5aff48d..03a0320353 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -557,7 +557,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* and with mask to extract bits, flipping 1-0 */ __m128i eop_bits = _mm_andnot_si128(staterr, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. This also * compresses the 32-bit values to 8-bit */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index bf87696fa4..b813d96ef4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -590,7 +590,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* and with mask to extract bits, flipping 1-0 */ __m128i eop_bits = _mm_andnot_si128(staterr, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. This also * compresses the 32-bit values to 8-bit */ @@ -884,7 +884,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* and with mask to extract bits, flipping 1-0 */ __m128i eop_bits = _mm_andnot_si128(staterr, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. This also * compresses the 32-bit values to 8-bit */ diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 673e44a243..5f7e13ee39 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -545,7 +545,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* and with mask to extract bits, flipping 1-0 */ __m128i eop_bits = _mm_andnot_si128(staterr, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. This also * compresses the 32-bit values to 8-bit */ diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index 7610fd93db..3a3ef51172 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -540,7 +540,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* and with mask to extract bits, flipping 1-0 */ __m128i eop_bits = _mm_andnot_si128(staterr, eop_check); /* the staterr values are not in order, as the count - * count of dd bits doesn't care. However, for end of + * of dd bits doesn't care. However, for end of * packet tracking, we do care, so shuffle. 
This also * compresses the 32-bit values to 8-bit */

From patchwork Fri Jul 23 03:10:47 2021
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 96223
X-Patchwork-Delegate: qi.z.zhang@intel.com
From: Feifei Wang
To: Qi Zhang, Xiao Wang, David Christensen, Beilei Xing, Ruifeng Wang, Bruce Richardson, Konstantin Ananyev, Jingjing Wu, Qiming Yang, Haiyue Wang, "Chen Jing D(Mark)", Cunming Liang, Chao Zhu, Gowrishankar Muthukrishnan, Jerin Jacob, Jianbo Liu, Zhe Tao, Leyi Rong, Wenzhuo Lu
Cc: dev@dpdk.org, nd@arm.com, Feifei Wang, stable@dpdk.org
Date: Fri, 23 Jul 2021 11:10:47 +0800
Message-Id: <20210723031049.2201665-3-feifei.wang2@arm.com>
In-Reply-To: <20210723031049.2201665-1-feifei.wang2@arm.com>
References: <20210723031049.2201665-1-feifei.wang2@arm.com>
Subject: [dpdk-dev] [PATCH v1 2/4] drivers/net: fix note error for Rx vector

In the loop that processes packets in the Rx vector path, some of the
code comments are wrong; fix these errors.
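For reference, a minimal standalone sketch of the load ordering the corrected
"A.1" comments describe: the descriptor statuses are read newest-first with
compiler barriers in between, so the snapshot can never show a completed
descriptor after an incomplete one. The descriptor layout and DD-bit position
are assumptions, and a plain macro stands in for DPDK's rte_compiler_barrier():

#include <stdint.h>
#include <stdio.h>

#define compiler_barrier() __asm__ volatile("" ::: "memory")

#define DD_BIT UINT64_C(1)              /* assumed DD bit position */

static int count_done(const volatile uint64_t *status_qw)
{
	uint64_t st[4];

	/* Read desc statuses backwards to avoid race condition */
	st[3] = status_qw[3];           /* A.1 load desc[3] */
	compiler_barrier();
	st[2] = status_qw[2];           /* A.1 load desc[2-0] */
	compiler_barrier();
	st[1] = status_qw[1];
	compiler_barrier();
	st[0] = status_qw[0];

	/* Count contiguous completed descriptors starting from desc[0]. */
	int n = 0;
	while (n < 4 && (st[n] & DD_BIT))
		n++;
	return n;
}

int main(void)
{
	volatile uint64_t ring[4] = { 1, 1, 0, 0 };   /* desc[0..1] done */
	printf("%d descriptors done\n", count_done(ring));
	return 0;
}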
fm10k: Fixes: 7092be8437bd ("fm10k: add vector Rx") Cc: jing.d.chen@intel.com i40e-altive: Fixes: c3def6a8724c ("net/i40e: implement vector PMD for altivec") Cc: gowrishankar.m@linux.vnet.ibm.com i40e-neon: Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM") i40e-sse: Fixes: 9ed94e5bb04e ("i40e: add vector Rx") Cc: zhe.tao@intel.com iavf: Fixes: 319c421f3890 ("net/avf: enable SSE Rx Tx") Cc: jingjing.wu@intel.com Fixes: 1162f5a0ef31 ("net/iavf: support flexible Rx descriptor in SSE path") Cc: leyi.rong@intel.com ice: Fixes: c68a52b8b38c ("net/ice: support vector SSE in Rx") Cc: wenzhuo.lu@intel.com ixgbe: Fixes: cf4b4708a88a ("ixgbe: improve slow-path perf with vector scattered Rx") Cc: bruce.richardson@intel.com Cc: stable@dpdk.org Suggested-by: Ruifeng Wang Signed-off-by: Feifei Wang Reviewed-by: Ruifeng Wang --- drivers/net/fm10k/fm10k_rxtx_vec.c | 4 ++-- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 8 ++++---- drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++---- drivers/net/i40e/i40e_rxtx_vec_sse.c | 4 ++-- drivers/net/iavf/iavf_rxtx_vec_sse.c | 8 ++++---- drivers/net/ice/ice_rxtx_vec_sse.c | 4 ++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 4 ++-- 7 files changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index cae5322d48..83af01dc2d 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -472,7 +472,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); rte_compiler_barrier(); @@ -484,9 +484,9 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]); #endif + /* A.1 load desc[2-0] */ descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index edaa462ac8..b99323992f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -281,22 +281,22 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, * in one XMM reg. 
*/ - /* B.1 load 1 mbuf point */ + /* B.1 load 2 mbuf point */ mbp1 = *(vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = *(vector unsigned long *)(rxdp + 3); rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ *(vector unsigned long *)&rx_pkts[pos] = mbp1; - /* B.1 load 1 mbuf point */ + /* B.1 load 2 mbuf point */ mbp2 = *(vector unsigned long *)&sw_ring[pos + 2]; + /* A.1 load desc[2-0] */ descs[2] = *(vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs[1] = *(vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(vector unsigned long *)(rxdp); diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index 32336fdb80..fb624a4882 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -280,20 +280,20 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; - /* B.1 load 1 mbuf point */ + /* B.1 load 2 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); - /* B.1 load 1 mbuf point */ + /* B.1 load 2 mbuf point */ mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); + /* A.1 load desc[2-0] */ descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - /* B.1 load 2 mbuf point */ descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[0] = vld1q_u64((uint64_t *)(rxdp)); diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index 03a0320353..b235502db5 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -462,7 +462,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* B.1 load 2 (64 bit) or 4 (32 bit) mbuf points */ mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); rte_compiler_barrier(); @@ -474,9 +474,9 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos+2]); #endif + /* A.1 load desc[2-0] */ descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index b813d96ef4..ee1e905525 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -494,7 +494,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* B.1 load 2 (64 bit) or 4 (32 bit) mbuf points */ mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); rte_compiler_barrier(); @@ -506,9 +506,9 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos + 2]); #endif + /* A.1 load desc[2-0] */ descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 
2)); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); @@ -755,7 +755,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* B.1 load 2 (64 bit) or 4 (32 bit) mbuf points */ mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); rte_compiler_barrier(); @@ -767,9 +767,9 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos + 2]); #endif + /* A.1 load desc[2-0] */ descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 5f7e13ee39..653bd28b41 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -416,7 +416,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* B.1 load 2 (64 bit) or 4 (32 bit) mbuf points */ mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); rte_compiler_barrier(); @@ -428,9 +428,9 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos + 2]); #endif + /* A.1 load desc[2-0] */ descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index 3a3ef51172..1dea95e73b 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -454,7 +454,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ - /* A.1 load 4 pkts desc */ + /* A.1 load desc[3] */ descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); rte_compiler_barrier(); @@ -466,9 +466,9 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos+2]); #endif + /* A.1 load desc[2-0] */ descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); - /* B.1 load 2 mbuf point */ descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); From patchwork Fri Jul 23 03:10:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feifei Wang X-Patchwork-Id: 96224 X-Patchwork-Delegate: qi.z.zhang@intel.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E0ED2A0C46; Fri, 23 Jul 2021 05:11:21 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 302F7410EE; Fri, 23 Jul 2021 05:11:17 +0200 (CEST) Received: from foss.arm.com 
(foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id 8A2E5410E7 for ; Fri, 23 Jul 2021 05:11:15 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0F1A511D4; Thu, 22 Jul 2021 20:11:15 -0700 (PDT) Received: from net-x86-dell-8268.shanghai.arm.com (net-x86-dell-8268.shanghai.arm.com [10.169.210.99]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id F0B2B3F694; Thu, 22 Jul 2021 20:11:12 -0700 (PDT) From: Feifei Wang To: Ruifeng Wang , Beilei Xing Cc: dev@dpdk.org, nd@arm.com, Feifei Wang , Joyce Kong Date: Fri, 23 Jul 2021 11:10:48 +0800 Message-Id: <20210723031049.2201665-4-feifei.wang2@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210723031049.2201665-1-feifei.wang2@arm.com> References: <20210723031049.2201665-1-feifei.wang2@arm.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v1 3/4] net/i40e: reorder Rx NEON code for better readability X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Rearrange the code in logical order for better readability and maintenance convenience in Rx NEON path. No performance change with this patch in arm platform. Suggested-by: Joyce Kong Signed-off-by: Feifei Wang Reviewed-by: Ruifeng Wang --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 99 ++++++++++++--------------- 1 file changed, 44 insertions(+), 55 deletions(-) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index fb624a4882..8f3188e910 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -280,24 +280,18 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; - /* B.1 load 2 mbuf point */ - mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); - /* Read desc statuses backwards to avoid race condition */ - /* A.1 load desc[3] */ + /* A.1 load desc[3-0] */ descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - - /* B.2 copy 2 mbuf point into rx_pkts */ - vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); - - /* B.1 load 2 mbuf point */ - mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); - - /* A.1 load desc[2-0] */ descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[0] = vld1q_u64((uint64_t *)(rxdp)); - /* B.2 copy 2 mbuf point into rx_pkts */ + /* B.1 load 4 mbuf point */ + mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); + mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); + + /* B.2 copy 4 mbuf point into rx_pkts */ + vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); if (split_packet) { @@ -307,28 +301,9 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, rte_mbuf_prefetch_part2(rx_pkts[pos + 3]); } - /* pkt 3,4 shift the pktlen field to be 16-bit aligned*/ - uint32x4_t len3 = vshlq_u32(vreinterpretq_u32_u64(descs[3]), - len_shl); - descs[3] = vreinterpretq_u64_u16(vsetq_lane_u16 - (vgetq_lane_u16(vreinterpretq_u16_u32(len3), 7), - vreinterpretq_u16_u64(descs[3]), - 7)); - uint32x4_t len2 = vshlq_u32(vreinterpretq_u32_u64(descs[2]), - len_shl); - descs[2] = vreinterpretq_u64_u16(vsetq_lane_u16 - (vgetq_lane_u16(vreinterpretq_u16_u32(len2), 7), - vreinterpretq_u16_u64(descs[2]), - 7)); - - /* D.1 pkt 3,4 convert format from desc to pktmbuf */ - pkt_mb4 = vqtbl1q_u8(vreinterpretq_u8_u64(descs[3]), 
shuf_msk); - pkt_mb3 = vqtbl1q_u8(vreinterpretq_u8_u64(descs[2]), shuf_msk); - /* C.1 4=>2 filter staterr info only */ sterr_tmp2 = vzipq_u16(vreinterpretq_u16_u64(descs[1]), vreinterpretq_u16_u64(descs[3])); - /* C.1 4=>2 filter staterr info only */ sterr_tmp1 = vzipq_u16(vreinterpretq_u16_u64(descs[0]), vreinterpretq_u16_u64(descs[2])); @@ -338,13 +313,19 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, desc_to_olflags_v(rxq, descs, &rx_pkts[pos]); - /* D.2 pkt 3,4 set in_port/nb_seg and remove crc */ - tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb4), crc_adjust); - pkt_mb4 = vreinterpretq_u8_u16(tmp); - tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb3), crc_adjust); - pkt_mb3 = vreinterpretq_u8_u16(tmp); - - /* pkt 1,2 shift the pktlen field to be 16-bit aligned*/ + /* pkts shift the pktlen field to be 16-bit aligned*/ + uint32x4_t len3 = vshlq_u32(vreinterpretq_u32_u64(descs[3]), + len_shl); + descs[3] = vreinterpretq_u64_u16(vsetq_lane_u16 + (vgetq_lane_u16(vreinterpretq_u16_u32(len3), 7), + vreinterpretq_u16_u64(descs[3]), + 7)); + uint32x4_t len2 = vshlq_u32(vreinterpretq_u32_u64(descs[2]), + len_shl); + descs[2] = vreinterpretq_u64_u16(vsetq_lane_u16 + (vgetq_lane_u16(vreinterpretq_u16_u32(len2), 7), + vreinterpretq_u16_u64(descs[2]), + 7)); uint32x4_t len1 = vshlq_u32(vreinterpretq_u32_u64(descs[1]), len_shl); descs[1] = vreinterpretq_u64_u16(vsetq_lane_u16 @@ -358,22 +339,38 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, vreinterpretq_u16_u64(descs[0]), 7)); - /* D.1 pkt 1,2 convert format from desc to pktmbuf */ + /* D.1 pkts convert format from desc to pktmbuf */ + pkt_mb4 = vqtbl1q_u8(vreinterpretq_u8_u64(descs[3]), shuf_msk); + pkt_mb3 = vqtbl1q_u8(vreinterpretq_u8_u64(descs[2]), shuf_msk); pkt_mb2 = vqtbl1q_u8(vreinterpretq_u8_u64(descs[1]), shuf_msk); pkt_mb1 = vqtbl1q_u8(vreinterpretq_u8_u64(descs[0]), shuf_msk); - /* D.3 copy final 3,4 data to rx_pkts */ - vst1q_u8((void *)&rx_pkts[pos + 3]->rx_descriptor_fields1, - pkt_mb4); - vst1q_u8((void *)&rx_pkts[pos + 2]->rx_descriptor_fields1, - pkt_mb3); - - /* D.2 pkt 1,2 set in_port/nb_seg and remove crc */ + /* D.2 pkts set in_port/nb_seg and remove crc */ + tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb4), crc_adjust); + pkt_mb4 = vreinterpretq_u8_u16(tmp); + tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb3), crc_adjust); + pkt_mb3 = vreinterpretq_u8_u16(tmp); tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb2), crc_adjust); pkt_mb2 = vreinterpretq_u8_u16(tmp); tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb1), crc_adjust); pkt_mb1 = vreinterpretq_u8_u16(tmp); + /* D.3 copy final data to rx_pkts */ + vst1q_u8((void *)&rx_pkts[pos + 3]->rx_descriptor_fields1, + pkt_mb4); + vst1q_u8((void *)&rx_pkts[pos + 2]->rx_descriptor_fields1, + pkt_mb3); + vst1q_u8((void *)&rx_pkts[pos + 1]->rx_descriptor_fields1, + pkt_mb2); + vst1q_u8((void *)&rx_pkts[pos]->rx_descriptor_fields1, + pkt_mb1); + + desc_to_ptype_v(descs, &rx_pkts[pos], ptype_tbl); + + if (likely(pos + RTE_I40E_DESCS_PER_LOOP < nb_pkts)) { + rte_prefetch_non_temporal(rxdp + RTE_I40E_DESCS_PER_LOOP); + } + /* C* extract and record EOP bit */ if (split_packet) { uint8x16_t eop_shuf_mask = { @@ -411,14 +408,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, I40E_UINT16_BIT - 1)); stat = ~vgetq_lane_u64(vreinterpretq_u64_u16(staterr), 0); - rte_prefetch_non_temporal(rxdp + RTE_I40E_DESCS_PER_LOOP); - - /* D.3 copy final 1,2 data to rx_pkts */ - vst1q_u8((void *)&rx_pkts[pos + 1]->rx_descriptor_fields1, - pkt_mb2); - vst1q_u8((void 
*)&rx_pkts[pos]->rx_descriptor_fields1, - pkt_mb1); - desc_to_ptype_v(descs, &rx_pkts[pos], ptype_tbl); /* C.4 calc avaialbe number of desc */ if (unlikely(stat == 0)) { nb_pkts_recd += RTE_I40E_DESCS_PER_LOOP;

From patchwork Fri Jul 23 03:10:49 2021
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 96225
X-Patchwork-Delegate: qi.z.zhang@intel.com
From: Feifei Wang
To: Ruifeng Wang, Beilei Xing
Cc: dev@dpdk.org, nd@arm.com, Feifei Wang
Date: Fri, 23 Jul 2021 11:10:49 +0800
Message-Id: <20210723031049.2201665-5-feifei.wang2@arm.com>
In-Reply-To: <20210723031049.2201665-1-feifei.wang2@arm.com>
References: <20210723031049.2201665-1-feifei.wang2@arm.com>
Subject: [dpdk-dev] [PATCH v1 4/4] net/i40e: change code order to reduce L1 cache misses

On the N1 platform, the packet mbuf and descriptor loads are the hot
spots that limit the performance of the "desc_to_ptype_v" and
"desc_to_olflags_v" functions in the i40e Rx NEON path, because the
packet mbufs and descriptors have already been evicted from the L1 data
cache to the L2 cache by the time these functions use them. To reduce
L1d cache misses and improve performance, change the code order and move
the "desc_to_ptype_v" and "desc_to_olflags_v" calls forward, next to the
point where the packet mbufs and descriptors have just been loaded.

Test result: DPDK 21.08-rc1, GCC 9. On n1sdp, the patch improves
performance by 1.8%; on thunderx2, there is no performance change.
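For reference, a much simplified scalar sketch of the idea (hypothetical field
layouts and a scalar stand-in for desc_to_ptype_v/desc_to_olflags_v, not the
vector driver code): the descriptor words are consumed immediately after they
are loaded, while they are still resident in the L1 data cache, and the
unrelated per-burst work only runs afterwards.

#include <stdint.h>

struct pkt {
	uint32_t ptype;
	uint64_t ol_flags;
};

/* Scalar stand-in for desc_to_ptype_v()/desc_to_olflags_v(): it reads every
 * descriptor word, so it is cheap only while desc[] is still in L1. */
static void descs_to_fields(const uint64_t desc[4], struct pkt p[4])
{
	for (int i = 0; i < 4; i++) {
		p[i].ptype    = (uint32_t)((desc[i] >> 30) & 0xff); /* assumed field */
		p[i].ol_flags = desc[i] & 0x3;                      /* assumed field */
	}
}

void process_burst(const volatile uint64_t *ring, struct pkt *pkts,
		   void (*per_burst_work)(void))
{
	uint64_t desc[4];

	/* Load the four descriptor words (newest first, as in the driver). */
	for (int i = 3; i >= 0; i--)
		desc[i] = ring[i];

	/* Moved forward, as in this patch: use desc[] right away, before any
	 * other work can push it out of L1 into L2. */
	descs_to_fields(desc, pkts);

	/* Everything that does not touch desc[] runs afterwards. */
	per_burst_work();
}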
Signed-off-by: Feifei Wang Reviewed-by: Ruifeng Wang --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index 8f3188e910..b2683fda60 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -301,18 +301,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, rte_mbuf_prefetch_part2(rx_pkts[pos + 3]); } - /* C.1 4=>2 filter staterr info only */ - sterr_tmp2 = vzipq_u16(vreinterpretq_u16_u64(descs[1]), - vreinterpretq_u16_u64(descs[3])); - sterr_tmp1 = vzipq_u16(vreinterpretq_u16_u64(descs[0]), - vreinterpretq_u16_u64(descs[2])); - - /* C.2 get 4 pkts staterr value */ - staterr = vzipq_u16(sterr_tmp1.val[1], - sterr_tmp2.val[1]).val[0]; - - desc_to_olflags_v(rxq, descs, &rx_pkts[pos]); - /* pkts shift the pktlen field to be 16-bit aligned*/ uint32x4_t len3 = vshlq_u32(vreinterpretq_u32_u64(descs[3]), len_shl); @@ -367,10 +355,22 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, desc_to_ptype_v(descs, &rx_pkts[pos], ptype_tbl); + desc_to_olflags_v(rxq, descs, &rx_pkts[pos]); + if (likely(pos + RTE_I40E_DESCS_PER_LOOP < nb_pkts)) { rte_prefetch_non_temporal(rxdp + RTE_I40E_DESCS_PER_LOOP); } + /* C.1 4=>2 filter staterr info only */ + sterr_tmp2 = vzipq_u16(vreinterpretq_u16_u64(descs[1]), + vreinterpretq_u16_u64(descs[3])); + sterr_tmp1 = vzipq_u16(vreinterpretq_u16_u64(descs[0]), + vreinterpretq_u16_u64(descs[2])); + + /* C.2 get 4 pkts staterr value */ + staterr = vzipq_u16(sterr_tmp1.val[1], + sterr_tmp2.val[1]).val[0]; + /* C* extract and record EOP bit */ if (split_packet) { uint8x16_t eop_shuf_mask = {