From patchwork Wed Apr 17 16:08:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 139454 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4E0E843E94; Wed, 17 Apr 2024 18:08:56 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1D5DB40EE5; Wed, 17 Apr 2024 18:08:36 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id 46DE840E78 for ; Wed, 17 Apr 2024 18:08:30 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6CA7414BF; Wed, 17 Apr 2024 09:08:57 -0700 (PDT) Received: from octeon10-1.usa.Arm.com (unknown [10.118.91.161]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 548143F792; Wed, 17 Apr 2024 09:08:29 -0700 (PDT) From: Yoan Picchi To: Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi , Harjot Singh , Nathan Brown , Ruifeng Wang Subject: [PATCH v8 4/4] hash: add SVE support for bulk key lookup Date: Wed, 17 Apr 2024 16:08:07 +0000 Message-Id: <20240417160807.1249480-5-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240417160807.1249480-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240417160807.1249480-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org - Implemented SVE code for comparing signatures in bulk lookup. - New SVE code is ~5% slower than optimized NEON for N2 processor for 128b vectors. Signed-off-by: Yoan Picchi Signed-off-by: Harjot Singh Reviewed-by: Nathan Brown Reviewed-by: Ruifeng Wang --- lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++ lib/hash/rte_cuckoo_hash.c | 7 +++- lib/hash/rte_cuckoo_hash.h | 1 + 3 files changed, 65 insertions(+), 1 deletion(-) diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h index 2601ed68b3..140ff97b1d 100644 --- a/lib/hash/arch/arm/compare_signatures.h +++ b/lib/hash/arch/arm/compare_signatures.h @@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer, *hitmask_buffer = vaddvq_u16(hit2); } break; +#endif +#if defined(RTE_HAS_SVE_ACLE) + case RTE_HASH_COMPARE_SVE: { + svuint16_t vsign, shift, sv_matches; + svbool_t pred, match, bucket_wide_pred; + int i = 0; + uint64_t vl = svcnth(); + + vsign = svdup_u16(sig); + shift = svindex_u16(0, 1); + + if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) { + svuint16_t primary_array_vect, secondary_array_vect; + bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES); + primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs); + secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs); + + /* We merged the two vectors so we can do both comparisons at once */ + primary_array_vect = svsplice_u16(bucket_wide_pred, + primary_array_vect, + secondary_array_vect); + pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES); + + /* Compare all signatures in the buckets */ + match = svcmpeq_u16(pred, vsign, primary_array_vect); + if (svptest_any(svptrue_b16(), match)) { + sv_matches = svdup_u16(1); + sv_matches = svlsl_u16_z(match, sv_matches, shift); + *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches); + } + } else { + do { + pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES); + uint16_t lower_half = 0; + uint16_t upper_half = 0; + /* Compare all signatures in the primary bucket */ + match = svcmpeq_u16(pred, vsign, svld1_u16(pred, + &prim_bucket_sigs[i])); + if (svptest_any(svptrue_b16(), match)) { + sv_matches = svdup_u16(1); + sv_matches = svlsl_u16_z(match, sv_matches, shift); + lower_half = svorv_u16(svptrue_b16(), sv_matches); + } + /* Compare all signatures in the secondary bucket */ + match = svcmpeq_u16(pred, vsign, svld1_u16(pred, + &sec_bucket_sigs[i])); + if (svptest_any(svptrue_b16(), match)) { + sv_matches = svdup_u16(1); + sv_matches = svlsl_u16_z(match, sv_matches, shift); + upper_half = svorv_u16(svptrue_b16(), sv_matches) + << RTE_HASH_BUCKET_ENTRIES; + } + hitmask_buffer[i / 8] = upper_half | lower_half; + i += vl; + } while (i < RTE_HASH_BUCKET_ENTRIES); + } + } + break; #endif default: for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c index 0697743cdf..75f555ba2c 100644 --- a/lib/hash/rte_cuckoo_hash.c +++ b/lib/hash/rte_cuckoo_hash.c @@ -450,8 +450,13 @@ rte_hash_create(const struct rte_hash_parameters *params) h->sig_cmp_fn = RTE_HASH_COMPARE_SSE; else #elif defined(RTE_ARCH_ARM64) - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) { h->sig_cmp_fn = RTE_HASH_COMPARE_NEON; +#if defined(RTE_HAS_SVE_ACLE) + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE)) + h->sig_cmp_fn = RTE_HASH_COMPARE_SVE; +#endif + } else #endif h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR; diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h index a528f1d1a0..01ad01c258 100644 --- a/lib/hash/rte_cuckoo_hash.h +++ b/lib/hash/rte_cuckoo_hash.h @@ -139,6 +139,7 @@ enum rte_hash_sig_compare_function { RTE_HASH_COMPARE_SCALAR = 0, RTE_HASH_COMPARE_SSE, RTE_HASH_COMPARE_NEON, + RTE_HASH_COMPARE_SVE, RTE_HASH_COMPARE_NUM };