From patchwork Wed Aug 28 08:24:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruifeng Wang X-Patchwork-Id: 58109 X-Patchwork-Delegate: qi.z.zhang@intel.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 666DF1C231; Wed, 28 Aug 2019 10:25:33 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id ACE9E1C231; Wed, 28 Aug 2019 10:25:31 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 47EBF360; Wed, 28 Aug 2019 01:25:31 -0700 (PDT) Received: from net-arm-c2400-02.shanghai.arm.com (net-arm-c2400-02.shanghai.arm.com [10.169.40.42]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4BCD53F59C; Wed, 28 Aug 2019 01:25:29 -0700 (PDT) From: Ruifeng Wang To: xiaolong.ye@intel.com, ferruh.yigit@intel.com, jerinj@marvell.com, gavin.hu@arm.com Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, nd@arm.com, Ruifeng Wang , stable@dpdk.org Date: Wed, 28 Aug 2019 16:24:53 +0800 Message-Id: <20190828082454.13484-2-ruifeng.wang@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190828082454.13484-1-ruifeng.wang@arm.com> References: <20190813100248.8000-1-ruifeng.wang@arm.com> <20190828082454.13484-1-ruifeng.wang@arm.com> Subject: [dpdk-dev] [PATCH v2 1/2] net/ixgbe: remove barrier in vPMD for aarch64 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The memory barrier was intended for descriptor data integrity (see comments in [1]). As later NEON loads were implemented and a whole entry is loaded in one-run and atomic, that makes the ordering of partial loading unnecessary. Remove it accordingly. Corrected couple of code comments. In terms of performance, observed slightly higher average throughput in tests with 82599ES NIC. [1] http://patches.dpdk.org/patch/18153/ Fixes: 989a84050542 ("net/ixgbe: fix received packets number for ARM NEON") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang Reviewed-by: Gavin Hu --- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index edb138354..86fb3afdb 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -214,13 +214,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, uint32_t var = 0; uint32_t stat; - /* B.1 load 1 mbuf point */ + /* B.1 load 2 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); - /* B.1 load 1 mbuf point */ + /* B.1 load 2 mbuf point */ mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ @@ -228,7 +228,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - rte_smp_rmb(); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); From patchwork Wed Aug 28 08:24:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruifeng Wang X-Patchwork-Id: 58110 X-Patchwork-Delegate: qi.z.zhang@intel.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C076B1C222; Wed, 28 Aug 2019 10:25:36 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id C04201C24F for ; Wed, 28 Aug 2019 10:25:35 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4CCE0337; Wed, 28 Aug 2019 01:25:35 -0700 (PDT) Received: from net-arm-c2400-02.shanghai.arm.com (net-arm-c2400-02.shanghai.arm.com [10.169.40.42]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 7E6DF3F59C; Wed, 28 Aug 2019 01:25:33 -0700 (PDT) From: Ruifeng Wang To: xiaolong.ye@intel.com, ferruh.yigit@intel.com, jerinj@marvell.com, gavin.hu@arm.com Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, nd@arm.com, Ruifeng Wang Date: Wed, 28 Aug 2019 16:24:54 +0800 Message-Id: <20190828082454.13484-3-ruifeng.wang@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190828082454.13484-1-ruifeng.wang@arm.com> References: <20190813100248.8000-1-ruifeng.wang@arm.com> <20190828082454.13484-1-ruifeng.wang@arm.com> Subject: [dpdk-dev] [PATCH v2 2/2] net/ixgbe: use neon intrinsics to count packet for aarch64 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" vPMD for aarch64 calculates the number of received packets using a loop. Change to use NEON intrinsics for calculation. This saves CPU cycles and has slightly better performance. Signed-off-by: Ruifeng Wang Reviewed-by: Gavin Hu --- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 27 +++++++++++++------------ 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 86fb3afdb..eeb825911 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -144,6 +144,7 @@ desc_to_olflags_v(uint8x16x2_t sterr_tmp1, uint8x16x2_t sterr_tmp2, #define IXGBE_VPMD_DESC_DD_MASK 0x01010101 #define IXGBE_VPMD_DESC_EOP_MASK 0x02020202 +#define IXGBE_UINT8_BIT (CHAR_BIT * sizeof(uint8_t)) static inline uint16_t _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, @@ -211,7 +212,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, uint64x2_t mbp1, mbp2; uint8x16_t staterr; uint16x8_t tmp; - uint32_t var = 0; uint32_t stat; /* B.1 load 2 mbuf point */ @@ -256,7 +256,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* C.2 get 4 pkts staterr value */ staterr = vzipq_u8(sterr_tmp1.val[1], sterr_tmp2.val[1]).val[0]; - stat = vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0); /* set ol_flags with vlan packet type */ desc_to_olflags_v(sterr_tmp1, sterr_tmp2, staterr, @@ -282,12 +281,20 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* C* extract and record EOP bit */ if (split_packet) { + stat = vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0); /* and with mask to extract bits, flipping 1-0 */ *(int *)split_packet = ~stat & IXGBE_VPMD_DESC_EOP_MASK; split_packet += RTE_IXGBE_DESCS_PER_LOOP; } + /* C.4 expand DD bit to saturate UINT8 */ + staterr = vshlq_n_u8(staterr, IXGBE_UINT8_BIT - 1); + staterr = vreinterpretq_u8_s8 + (vshrq_n_s8(vreinterpretq_s8_u8(staterr), + IXGBE_UINT8_BIT - 1)); + stat = ~vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0); + rte_prefetch_non_temporal(rxdp + RTE_IXGBE_DESCS_PER_LOOP); /* D.3 copy final 1,2 data to rx_pkts */ @@ -296,18 +303,12 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1, pkt_mb1); - stat &= IXGBE_VPMD_DESC_DD_MASK; - - /* C.4 calc avaialbe number of desc */ - if (likely(stat != IXGBE_VPMD_DESC_DD_MASK)) { - while (stat & 0x01) { - ++var; - stat = stat >> 8; - } - nb_pkts_recd += var; - break; - } else { + /* C.5 calc available number of desc */ + if (unlikely(stat == 0)) { nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP; + } else { + nb_pkts_recd += __builtin_ctz(stat) / IXGBE_UINT8_BIT; + break; } }