From patchwork Fri Sep 18 03:35:27 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leyi Rong
X-Patchwork-Id: 78060
X-Patchwork-Delegate: qi.z.zhang@intel.com
From: Leyi Rong
To: qi.z.zhang@intel.com, wenzhuo.lu@intel.com, burce.richardson@intel.com
Cc: dev@dpdk.org, Leyi Rong
Date: Fri, 18 Sep 2020 11:35:27 +0800
Message-Id: <20200918033528.110297-3-leyi.rong@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200918033528.110297-1-leyi.rong@intel.com>
References: <20200910065504.104217-1-leyi.rong@intel.com>
 <20200918033528.110297-1-leyi.rong@intel.com>
Subject: [dpdk-dev] [PATCH v2 2/3]
 net/ice: add RSS hash parsing in AVX512 path

Support RSS hash parsing in the AVX512 data path. As the default RXDID
is set to #22, the RSS hash field is located in the 2nd 16B of each
Flex Rx descriptor.

Signed-off-by: Leyi Rong
---
 drivers/net/ice/ice_rxtx_vec_avx512.c | 105 ++++++++++++++++++++++++--
 1 file changed, 98 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index 6a9d0a8eaa..a2a5d9987a 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -176,8 +176,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 	/* mask to shuffle from desc. to mbuf (4 descriptors)*/
 	const __m512i shuf_msk =
 		_mm512_set4_epi32
-			(/* octet 12~15, 32 bits rss */
-			 15 << 24 | 14 << 16 | 13 << 8 | 12,
+			(/* rss hash parsed separately */
+			 0xFFFFFFFF,
 			 /* octet 10~11, 16 bits vlan_macip */
 			 /* octet 4~5, 16 bits data_len */
 			 11 << 24 | 10 << 16 | 5 << 8 | 4,
@@ -399,6 +399,11 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 
 		mb4_7 = _mm512_mask_blend_epi32(0x1111, mb4_7, ptype4_7);
 		mb0_3 = _mm512_mask_blend_epi32(0x1111, mb0_3, ptype0_3);
+		__m256i mb4_5 = _mm512_extracti64x4_epi64(mb4_7, 0);
+		__m256i mb6_7 = _mm512_extracti64x4_epi64(mb4_7, 1);
+		__m256i mb0_1 = _mm512_extracti64x4_epi64(mb0_3, 0);
+		__m256i mb2_3 = _mm512_extracti64x4_epi64(mb0_3, 1);
+
 		/**
 		 * use permute/extract to get status content
 		 * After the operations, the packets status flags are in the
@@ -438,6 +443,97 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 		/* merge flags */
 		const __m256i mbuf_flags = _mm256_or_si256(l3_l4_flags,
 				rss_vlan_flags);
+
+#ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+		/**
+		 * need to load the 2nd 16B of each desc for RSS hash parsing;
+		 * entering this branch will cause a performance drop.
+		 */
+		if (rxq->vsi->adapter->eth_dev->data->dev_conf.rxmode.offloads &
+				DEV_RX_OFFLOAD_RSS_HASH) {
+			/* load bottom half of every 32B desc */
+			const __m128i raw_desc_bh7 =
+				_mm_load_si128
+					((void *)(&rxdp[7].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh6 =
+				_mm_load_si128
+					((void *)(&rxdp[6].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh5 =
+				_mm_load_si128
+					((void *)(&rxdp[5].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh4 =
+				_mm_load_si128
+					((void *)(&rxdp[4].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh3 =
+				_mm_load_si128
+					((void *)(&rxdp[3].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh2 =
+				_mm_load_si128
+					((void *)(&rxdp[2].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh1 =
+				_mm_load_si128
+					((void *)(&rxdp[1].wb.status_error1));
+			rte_compiler_barrier();
+			const __m128i raw_desc_bh0 =
+				_mm_load_si128
+					((void *)(&rxdp[0].wb.status_error1));
+
+			__m256i raw_desc_bh6_7 =
+				_mm256_inserti128_si256
+					(_mm256_castsi128_si256(raw_desc_bh6),
+					raw_desc_bh7, 1);
+			__m256i raw_desc_bh4_5 =
+				_mm256_inserti128_si256
+					(_mm256_castsi128_si256(raw_desc_bh4),
+					raw_desc_bh5, 1);
+			__m256i raw_desc_bh2_3 =
+				_mm256_inserti128_si256
+					(_mm256_castsi128_si256(raw_desc_bh2),
+					raw_desc_bh3, 1);
+			__m256i raw_desc_bh0_1 =
+				_mm256_inserti128_si256
+					(_mm256_castsi128_si256(raw_desc_bh0),
+					raw_desc_bh1, 1);
+
+			/**
+			 * shift the 32b RSS hash value to the
+			 * highest 32b of each 128b before masking
+			 */
+			__m256i rss_hash6_7 =
+				_mm256_slli_epi64(raw_desc_bh6_7, 32);
+			__m256i rss_hash4_5 =
+				_mm256_slli_epi64(raw_desc_bh4_5, 32);
+			__m256i rss_hash2_3 =
+				_mm256_slli_epi64(raw_desc_bh2_3, 32);
+			__m256i rss_hash0_1 =
+				_mm256_slli_epi64(raw_desc_bh0_1, 32);
+
+			__m256i rss_hash_msk =
+				_mm256_set_epi32(0xFFFFFFFF, 0, 0, 0,
+						 0xFFFFFFFF, 0, 0, 0);
+
+			rss_hash6_7 = _mm256_and_si256
+					(rss_hash6_7, rss_hash_msk);
+			rss_hash4_5 = _mm256_and_si256
+					(rss_hash4_5, rss_hash_msk);
+			rss_hash2_3 = _mm256_and_si256
+					(rss_hash2_3, rss_hash_msk);
+			rss_hash0_1 = _mm256_and_si256
+					(rss_hash0_1, rss_hash_msk);
+
+			mb6_7 = _mm256_or_si256(mb6_7, rss_hash6_7);
+			mb4_5 = _mm256_or_si256(mb4_5, rss_hash4_5);
+			mb2_3 = _mm256_or_si256(mb2_3, rss_hash2_3);
+			mb0_1 = _mm256_or_si256(mb0_1, rss_hash0_1);
+		} /* if() on RSS hash parsing */
+#endif
+
 		/**
 		 * At this point, we have the 8 sets of flags in the low 16-bits
 		 * of each 32-bit value in vlan0.
@@ -471,11 +567,6 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 					    _mm256_srli_si256(mbuf_flags, 4),
 					    0x04);
 
-		const __m256i mb4_5 = _mm512_extracti64x4_epi64(mb4_7, 0);
-		const __m256i mb6_7 = _mm512_extracti64x4_epi64(mb4_7, 1);
-		const __m256i mb0_1 = _mm512_extracti64x4_epi64(mb0_3, 0);
-		const __m256i mb2_3 = _mm512_extracti64x4_epi64(mb0_3, 1);
-
 		/* permute to add in the rx_descriptor e.g. rss fields */
 		rearm6 = _mm256_permute2f128_si256(rearm6, mb6_7, 0x20);
 		rearm4 = _mm256_permute2f128_si256(rearm4, mb4_5, 0x20);
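For review purposes, here is a hypothetical scalar C model (not part of the patch and not driver code) of what the new path does to one 128-bit lane. It assumes little-endian dwords as on x86, and that the RSS hash sits in dword 2 (bytes 8..11) of the 2nd 16B. That placement is what the shift-by-32 followed by the keep-dword-3 mask implies. The 0xFFFFFFFF shuffle control works because vpshufb writes zero for any index byte with bit 7 set, which leaves dword 3 (the mbuf hash.rss slot) empty for the later OR. The helper names pshufb_lane(), rss_hash_lane() and merge_lane() are made up for illustration. The remaining shuf_msk arguments are outside this hunk, so any lower 12 index bytes fed to the model are placeholders.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical scalar model of one 16-byte lane of _mm512_shuffle_epi8:
 * an index byte with bit 7 set writes 0, otherwise its low 4 bits pick
 * a byte from the same lane of the source.
 */
static void pshufb_lane(const uint8_t src[16], const uint8_t idx[16],
			uint8_t dst[16])
{
	for (int i = 0; i < 16; i++)
		dst[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
}

/*
 * Hypothetical scalar model of _mm256_slli_epi64(bh, 32) followed by
 * _mm256_and_si256 with _mm256_set_epi32(0xFFFFFFFF, 0, 0, 0, ...),
 * for one 128-bit lane. bh[0..3] are the dwords of the 2nd 16B of a
 * 32B descriptor; only the value that started in bh[2] survives, and
 * it ends up in dword 3.
 */
static void rss_hash_lane(const uint32_t bh[4], uint32_t out[4])
{
	/* set_epi32 lists the highest dword first, so only dword 3 is kept */
	static const uint32_t msk[4] = { 0, 0, 0, 0xFFFFFFFFu };
	uint32_t shifted[4];

	for (int q = 0; q < 2; q++) {
		/* each 64-bit element is shifted independently */
		uint64_t v = (uint64_t)bh[2 * q + 1] << 32 | bh[2 * q];

		v <<= 32;
		shifted[2 * q] = (uint32_t)v;
		shifted[2 * q + 1] = (uint32_t)(v >> 32);
	}
	for (int d = 0; d < 4; d++)
		out[d] = shifted[d] & msk[d];
}

/*
 * Hypothetical merge step: the shuffle zeroes dword 3, so OR-ing in the
 * masked hash cannot disturb the other fields of the lane.
 */
static void merge_lane(const uint8_t desc_lo[16], const uint8_t idx[16],
		       const uint32_t bh[4], uint32_t lane[4])
{
	uint8_t shuffled[16];
	uint32_t hash[4];

	pshufb_lane(desc_lo, idx, shuffled);
	/* reinterpret as dwords; matches the vector view on little-endian x86 */
	memcpy(lane, shuffled, sizeof(shuffled));
	rss_hash_lane(bh, hash);
	for (int d = 0; d < 4; d++)
		lane[d] |= hash[d];
}
```

For example, with idx bytes 12..15 set to 0xFF and bh = { a, b, h, c }, merge_lane() leaves lane[3] == h while lane[0..2] keep the shuffled descriptor bytes. The vector code performs the same steps for eight descriptors at once, which is why mb0_1..mb6_7 are now extracted earlier and are no longer const.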