From patchwork Fri Jul 5 17:45:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142156 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id B23A64559B; Fri, 5 Jul 2024 19:45:44 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 6CDEA42FB1; Fri, 5 Jul 2024 19:45:44 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id CBE2542F7F for ; Fri, 5 Jul 2024 19:45:32 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3FC781477; Fri, 5 Jul 2024 10:45:57 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 265403F73B; Fri, 5 Jul 2024 10:45:32 -0700 (PDT) From: Yoan Picchi To: Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi Subject: [PATCH v11 1/7] hash: make compare signature function enum private Date: Fri, 5 Jul 2024 17:45:20 +0000 Message-Id: <20240705174526.3035295-2-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org enum rte_hash_sig_compare_function is only used internally. 
This patch moves it out of the public ABI and into the C file. Signed-off-by: Yoan Picchi --- lib/hash/rte_cuckoo_hash.c | 10 ++++++++++ lib/hash/rte_cuckoo_hash.h | 10 +--------- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c index d87aa52b5b..e1d50e7d40 100644 --- a/lib/hash/rte_cuckoo_hash.c +++ b/lib/hash/rte_cuckoo_hash.c @@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO); #include "rte_cuckoo_hash.h" +/* Enum used to select the implementation of the signature comparison function to use + * eg: A system supporting SVE might want to use a NEON or scalar implementation. + */ +enum rte_hash_sig_compare_function { + RTE_HASH_COMPARE_SCALAR = 0, + RTE_HASH_COMPARE_SSE, + RTE_HASH_COMPARE_NEON, + RTE_HASH_COMPARE_NUM +}; + /* Mask of all flags supported by this version */ #define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \ RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \ diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h index a528f1d1a0..26a992419a 100644 --- a/lib/hash/rte_cuckoo_hash.h +++ b/lib/hash/rte_cuckoo_hash.h @@ -134,14 +134,6 @@ struct rte_hash_key { char key[0]; }; -/* All different signature compare functions */ -enum rte_hash_sig_compare_function { - RTE_HASH_COMPARE_SCALAR = 0, - RTE_HASH_COMPARE_SSE, - RTE_HASH_COMPARE_NEON, - RTE_HASH_COMPARE_NUM -}; - /** Bucket structure */ struct __rte_cache_aligned rte_hash_bucket { uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES]; @@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash { /**< Custom function used to compare keys. */ enum cmp_jump_table_case cmp_jump_table_idx; /**< Indicates which compare function to use. */ - enum rte_hash_sig_compare_function sig_cmp_fn; + unsigned int sig_cmp_fn; /**< Indicates which signature compare function to use. */ uint32_t bucket_bitmask; /**< Bitmask for getting bucket index from hash signature. 
*/ From patchwork Fri Jul 5 17:45:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142157 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 31DC34559B; Fri, 5 Jul 2024 19:45:54 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E091D42FC9; Fri, 5 Jul 2024 19:45:46 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id D69DF42F8F for ; Fri, 5 Jul 2024 19:45:32 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6C8241480; Fri, 5 Jul 2024 10:45:57 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 509B43F73B; Fri, 5 Jul 2024 10:45:32 -0700 (PDT) From: Yoan Picchi To: Thomas Monjalon , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi Subject: [PATCH v11 2/7] hash: split compare signature into arch-specific files Date: Fri, 5 Jul 2024 17:45:21 +0000 Message-Id: <20240705174526.3035295-3-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Move the compare_signatures function into architecture-specific files. They 
all have the default scalar option as an option if we disable vectorisation. Signed-off-by: Yoan Picchi --- .mailmap | 1 + lib/hash/compare_signatures_arm_pvt.h | 55 +++++++++++++++++++ lib/hash/compare_signatures_generic_pvt.h | 33 ++++++++++++ lib/hash/compare_signatures_x86_pvt.h | 48 +++++++++++++++++ lib/hash/rte_cuckoo_hash.c | 65 +++-------------------- 5 files changed, 145 insertions(+), 57 deletions(-) create mode 100644 lib/hash/compare_signatures_arm_pvt.h create mode 100644 lib/hash/compare_signatures_generic_pvt.h create mode 100644 lib/hash/compare_signatures_x86_pvt.h diff --git a/.mailmap b/.mailmap index f76037213d..ec525981fe 100644 --- a/.mailmap +++ b/.mailmap @@ -1661,6 +1661,7 @@ Yixue Wang Yi Yang Yi Zhang Yoann Desmouceaux +Yoan Picchi Yogesh Jangra Yogev Chaimovich Yongjie Gu diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h new file mode 100644 index 0000000000..80b6afb7a5 --- /dev/null +++ b/lib/hash/compare_signatures_arm_pvt.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2016 Intel Corporation + * Copyright(c) 2018-2024 Arm Limited + */ + +#ifndef _COMPARE_SIGNATURE_ARM_PVT_H_ +#define _COMPARE_SIGNATURE_ARM_PVT_H_ + +#include +#include +#include + +#include "rte_cuckoo_hash.h" + +static inline void +compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, + const struct rte_hash_bucket *prim_bkt, + const struct rte_hash_bucket *sec_bkt, + uint16_t sig, + enum rte_hash_sig_compare_function sig_cmp_fn) +{ + unsigned int i; + + /* For match mask the first bit of every two bits indicates the match */ + switch (sig_cmp_fn) { +#if defined(__ARM_NEON) + case RTE_HASH_COMPARE_NEON: { + uint16x8_t vmat, vsig, x; + int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1}; + + vsig = vld1q_dup_u16((uint16_t const *)&sig); + /* Compare all signatures in the primary bucket */ + vmat = vceqq_u16(vsig, + vld1q_u16((uint16_t const *)prim_bkt->sig_current)); + 
x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); + *prim_hash_matches = (uint32_t)(vaddvq_u16(x)); + /* Compare all signatures in the secondary bucket */ + vmat = vceqq_u16(vsig, + vld1q_u16((uint16_t const *)sec_bkt->sig_current)); + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); + *sec_hash_matches = (uint32_t)(vaddvq_u16(x)); + } + break; +#endif + default: + for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { + *prim_hash_matches |= + ((sig == prim_bkt->sig_current[i]) << (i << 1)); + *sec_hash_matches |= + ((sig == sec_bkt->sig_current[i]) << (i << 1)); + } + } +} + +#endif diff --git a/lib/hash/compare_signatures_generic_pvt.h b/lib/hash/compare_signatures_generic_pvt.h new file mode 100644 index 0000000000..43587adcef --- /dev/null +++ b/lib/hash/compare_signatures_generic_pvt.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2016 Intel Corporation + * Copyright(c) 2018-2024 Arm Limited + */ + +#ifndef _COMPARE_SIGNATURE_GENERIC_PVT_H_ +#define _COMPARE_SIGNATURE_GENERIC_PVT_H_ + +#include +#include +#include + +#include "rte_cuckoo_hash.h" + +static inline void +compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, + const struct rte_hash_bucket *prim_bkt, + const struct rte_hash_bucket *sec_bkt, + uint16_t sig, + enum rte_hash_sig_compare_function sig_cmp_fn) +{ + unsigned int i; + + /* For match mask the first bit of every two bits indicates the match */ + for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { + *prim_hash_matches |= + ((sig == prim_bkt->sig_current[i]) << (i << 1)); + *sec_hash_matches |= + ((sig == sec_bkt->sig_current[i]) << (i << 1)); + } +} + +#endif diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h new file mode 100644 index 0000000000..11a82aced9 --- /dev/null +++ b/lib/hash/compare_signatures_x86_pvt.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2016 Intel Corporation + * Copyright(c) 
2018-2024 Arm Limited + */ + +#ifndef _COMPARE_SIGNATURE_X86_PVT_H_ +#define _COMPARE_SIGNATURE_X86_PVT_H_ + +#include +#include +#include + +#include "rte_cuckoo_hash.h" + +static inline void +compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, + const struct rte_hash_bucket *prim_bkt, + const struct rte_hash_bucket *sec_bkt, + uint16_t sig, + enum rte_hash_sig_compare_function sig_cmp_fn) +{ + unsigned int i; + + /* For match mask the first bit of every two bits indicates the match */ + switch (sig_cmp_fn) { +#if defined(__SSE2__) + case RTE_HASH_COMPARE_SSE: + /* Compare all signatures in the bucket */ + *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128( + (__m128i const *)prim_bkt->sig_current), _mm_set1_epi16(sig))); + /* Extract the even-index bits only */ + *prim_hash_matches &= 0x5555; + /* Compare all signatures in the bucket */ + *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128( + (__m128i const *)sec_bkt->sig_current), _mm_set1_epi16(sig))); + /* Extract the even-index bits only */ + *sec_hash_matches &= 0x5555; + break; +#endif + default: + for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { + *prim_hash_matches |= (sig == prim_bkt->sig_current[i]) << (i << 1); + *sec_hash_matches |= (sig == sec_bkt->sig_current[i]) << (i << 1); + } + } +} + +#endif diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c index e1d50e7d40..739f7927b8 100644 --- a/lib/hash/rte_cuckoo_hash.c +++ b/lib/hash/rte_cuckoo_hash.c @@ -43,6 +43,14 @@ enum rte_hash_sig_compare_function { RTE_HASH_COMPARE_NUM }; +#if defined(__ARM_NEON) +#include "compare_signatures_arm_pvt.h" +#elif defined(__SSE2__) +#include "compare_signatures_x86_pvt.h" +#else +#include "compare_signatures_generic_pvt.h" +#endif + /* Mask of all flags supported by this version */ #define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \ RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \ @@ -1890,63 +1898,6 @@ 
rte_hash_free_key_with_position(const struct rte_hash *h, } -static inline void -compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, - const struct rte_hash_bucket *prim_bkt, - const struct rte_hash_bucket *sec_bkt, - uint16_t sig, - enum rte_hash_sig_compare_function sig_cmp_fn) -{ - unsigned int i; - - /* For match mask the first bit of every two bits indicates the match */ - switch (sig_cmp_fn) { -#if defined(__SSE2__) - case RTE_HASH_COMPARE_SSE: - /* Compare all signatures in the bucket */ - *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16( - _mm_load_si128( - (__m128i const *)prim_bkt->sig_current), - _mm_set1_epi16(sig))); - /* Extract the even-index bits only */ - *prim_hash_matches &= 0x5555; - /* Compare all signatures in the bucket */ - *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16( - _mm_load_si128( - (__m128i const *)sec_bkt->sig_current), - _mm_set1_epi16(sig))); - /* Extract the even-index bits only */ - *sec_hash_matches &= 0x5555; - break; -#elif defined(__ARM_NEON) - case RTE_HASH_COMPARE_NEON: { - uint16x8_t vmat, vsig, x; - int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1}; - - vsig = vld1q_dup_u16((uint16_t const *)&sig); - /* Compare all signatures in the primary bucket */ - vmat = vceqq_u16(vsig, - vld1q_u16((uint16_t const *)prim_bkt->sig_current)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); - *prim_hash_matches = (uint32_t)(vaddvq_u16(x)); - /* Compare all signatures in the secondary bucket */ - vmat = vceqq_u16(vsig, - vld1q_u16((uint16_t const *)sec_bkt->sig_current)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); - *sec_hash_matches = (uint32_t)(vaddvq_u16(x)); - } - break; -#endif - default: - for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { - *prim_hash_matches |= - ((sig == prim_bkt->sig_current[i]) << (i << 1)); - *sec_hash_matches |= - ((sig == sec_bkt->sig_current[i]) << (i << 1)); - } - } -} - static inline void __bulk_lookup_l(const struct rte_hash *h, 
const void **keys, const struct rte_hash_bucket **primary_bkt, From patchwork Fri Jul 5 17:45:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142158 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 54FD14559B; Fri, 5 Jul 2024 19:46:00 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 24D1942FCF; Fri, 5 Jul 2024 19:45:48 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id 0FE0942FAD for ; Fri, 5 Jul 2024 19:45:33 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9067214BF; Fri, 5 Jul 2024 10:45:57 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 771303F73B; Fri, 5 Jul 2024 10:45:32 -0700 (PDT) From: Yoan Picchi To: Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi Subject: [PATCH v11 3/7] hash: add a check on hash entry max size Date: Fri, 5 Jul 2024 17:45:22 +0000 Message-Id: <20240705174526.3035295-4-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org If we were to change RTE_HASH_BUCKET_ENTRIES to 
be over 8, it would no longer fit in the vector (8*16b=128b), therefore failing to check some of the signatures. This patch adds a compile time check to fallback to scalar code in this case. Signed-off-by: Yoan Picchi --- lib/hash/compare_signatures_arm_pvt.h | 2 +- lib/hash/compare_signatures_x86_pvt.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h index 80b6afb7a5..74b3286c95 100644 --- a/lib/hash/compare_signatures_arm_pvt.h +++ b/lib/hash/compare_signatures_arm_pvt.h @@ -23,7 +23,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, /* For match mask the first bit of every two bits indicates the match */ switch (sig_cmp_fn) { -#if defined(__ARM_NEON) +#if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8 case RTE_HASH_COMPARE_NEON: { uint16x8_t vmat, vsig, x; int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1}; diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h index 11a82aced9..f77b37f1cd 100644 --- a/lib/hash/compare_signatures_x86_pvt.h +++ b/lib/hash/compare_signatures_x86_pvt.h @@ -23,7 +23,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, /* For match mask the first bit of every two bits indicates the match */ switch (sig_cmp_fn) { -#if defined(__SSE2__) +#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8 case RTE_HASH_COMPARE_SSE: /* Compare all signatures in the bucket */ *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128( From patchwork Fri Jul 5 17:45:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142159 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) 
with ESMTP id 0A5FB4559B; Fri, 5 Jul 2024 19:46:07 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 68A3842FDF; Fri, 5 Jul 2024 19:45:49 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id 3F56842F7F for ; Fri, 5 Jul 2024 19:45:33 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BC99B150C; Fri, 5 Jul 2024 10:45:57 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9E11D3F73B; Fri, 5 Jul 2024 10:45:32 -0700 (PDT) From: Yoan Picchi To: Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi , Ruifeng Wang , Nathan Brown Subject: [PATCH v11 4/7] hash: pack the hitmask for hash in bulk lookup Date: Fri, 5 Jul 2024 17:45:23 +0000 Message-Id: <20240705174526.3035295-5-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org The current hitmask includes padding due to a detail of Intel's SIMD implementation. This patch allows non-Intel SIMD implementations to benefit from a dense hitmask. In addition, the new dense hitmask interweaves the primary and secondary matches, which allows better cache usage and enables future improvements to the SIMD implementations. The default non-SIMD path now uses this dense mask. 
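For illustration only, the two hitmask layouts can be modelled in plain scalar C. This is a sketch, not the code from the patch: BUCKET_ENTRIES stands in for RTE_HASH_BUCKET_ENTRIES, and sparse_hitmask(), dense_hitmask() and first_hit() are hypothetical helper names. The sparse layout sets bit 2*i for entry i, while the dense layout keeps the primary hits in the low byte and the secondary hits in the high byte of one 16-bit word.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumption: stands in for RTE_HASH_BUCKET_ENTRIES */

/* Sparse layout: entry i sets bit 2*i, every other bit is padding. */
static uint32_t sparse_hitmask(const uint16_t *sigs, uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sig == sigs[i]) << (i << 1);
	return mask;
}

/* Dense layout: primary hits in bits 0-7, secondary hits in bits 8-15. */
static uint16_t dense_hitmask(const uint16_t *prim, const uint16_t *sec,
		uint16_t sig)
{
	uint16_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		mask |= (uint16_t)((sig == prim[i]) << i);
		mask |= (uint16_t)(((sig == sec[i]) << i) << BUCKET_ENTRIES);
	}
	return mask;
}

/*
 * Index of the first matching entry in a non-zero mask.
 * padding is 1 for the sparse layout and 0 for the dense one,
 * mirroring the role of hitmask_padding in the bulk lookup.
 */
static unsigned int first_hit(uint32_t mask, unsigned int padding)
{
	unsigned int bit = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}
	return bit >> padding;
}
```

With a dense mask d, the primary and secondary parts are d & 0xff and d >> 8 in this model. The patch reads them through a uint8_t pointer instead, which gives the same split on a little-endian target.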
Signed-off-by: Yoan Picchi Reviewed-by: Ruifeng Wang Reviewed-by: Nathan Brown --- lib/hash/compare_signatures_arm_pvt.h | 47 ++++---- lib/hash/compare_signatures_generic_pvt.h | 31 +++--- lib/hash/compare_signatures_x86_pvt.h | 9 +- lib/hash/rte_cuckoo_hash.c | 124 ++++++++++++++++------ 4 files changed, 145 insertions(+), 66 deletions(-) diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h index 74b3286c95..0fc657c49b 100644 --- a/lib/hash/compare_signatures_arm_pvt.h +++ b/lib/hash/compare_signatures_arm_pvt.h @@ -6,48 +6,57 @@ #ifndef _COMPARE_SIGNATURE_ARM_PVT_H_ #define _COMPARE_SIGNATURE_ARM_PVT_H_ +/* + * Arm's version uses a densely packed hitmask buffer: + * Every bit is in use. + */ + #include #include #include #include "rte_cuckoo_hash.h" +#define DENSE_HASH_BULK_LOOKUP 1 + static inline void -compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, - const struct rte_hash_bucket *prim_bkt, - const struct rte_hash_bucket *sec_bkt, +compare_signatures_dense(uint16_t *hitmask_buffer, + const uint16_t *prim_bucket_sigs, + const uint16_t *sec_bucket_sigs, uint16_t sig, enum rte_hash_sig_compare_function sig_cmp_fn) { - unsigned int i; - /* For match mask the first bit of every two bits indicates the match */ + static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8), + "hitmask_buffer must be wide enough to fit a dense hitmask"); + + /* For match mask every bits indicates the match */ switch (sig_cmp_fn) { #if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8 case RTE_HASH_COMPARE_NEON: { uint16x8_t vmat, vsig, x; - int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1}; + int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7}; + uint16_t low, high; vsig = vld1q_dup_u16((uint16_t const *)&sig); /* Compare all signatures in the primary bucket */ - vmat = vceqq_u16(vsig, - vld1q_u16((uint16_t const *)prim_bkt->sig_current)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); - 
*prim_hash_matches = (uint32_t)(vaddvq_u16(x)); + vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs)); + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); + low = (uint16_t)(vaddvq_u16(x)); /* Compare all signatures in the secondary bucket */ - vmat = vceqq_u16(vsig, - vld1q_u16((uint16_t const *)sec_bkt->sig_current)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); - *sec_hash_matches = (uint32_t)(vaddvq_u16(x)); + vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs)); + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); + high = (uint16_t)(vaddvq_u16(x)); + *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES; + } break; #endif default: - for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { - *prim_hash_matches |= - ((sig == prim_bkt->sig_current[i]) << (i << 1)); - *sec_hash_matches |= - ((sig == sec_bkt->sig_current[i]) << (i << 1)); + for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { + *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i; + *hitmask_buffer |= + ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES; } } } diff --git a/lib/hash/compare_signatures_generic_pvt.h b/lib/hash/compare_signatures_generic_pvt.h index 43587adcef..1d065d4c28 100644 --- a/lib/hash/compare_signatures_generic_pvt.h +++ b/lib/hash/compare_signatures_generic_pvt.h @@ -6,27 +6,34 @@ #ifndef _COMPARE_SIGNATURE_GENERIC_PVT_H_ #define _COMPARE_SIGNATURE_GENERIC_PVT_H_ +/* + * The generic version could use either a dense or sparsely packed hitmask buffer, + * but the dense one is slightly faster. 
+ */ + #include #include #include #include "rte_cuckoo_hash.h" +#define DENSE_HASH_BULK_LOOKUP 1 + static inline void -compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, - const struct rte_hash_bucket *prim_bkt, - const struct rte_hash_bucket *sec_bkt, +compare_signatures_dense(uint16_t *hitmask_buffer, + const uint16_t *prim_bucket_sigs, + const uint16_t *sec_bucket_sigs, uint16_t sig, - enum rte_hash_sig_compare_function sig_cmp_fn) + __rte_unused enum rte_hash_sig_compare_function sig_cmp_fn) { - unsigned int i; - - /* For match mask the first bit of every two bits indicates the match */ - for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { - *prim_hash_matches |= - ((sig == prim_bkt->sig_current[i]) << (i << 1)); - *sec_hash_matches |= - ((sig == sec_bkt->sig_current[i]) << (i << 1)); + + static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8), + "hitmask_buffer must be wide enough to fit a dense hitmask"); + + /* For match mask every bits indicates the match */ + for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { + *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i; + *hitmask_buffer |= ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES; } } diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h index f77b37f1cd..03e9c44e53 100644 --- a/lib/hash/compare_signatures_x86_pvt.h +++ b/lib/hash/compare_signatures_x86_pvt.h @@ -6,14 +6,21 @@ #ifndef _COMPARE_SIGNATURE_X86_PVT_H_ #define _COMPARE_SIGNATURE_X86_PVT_H_ +/* + * x86's version uses a sparsely packed hitmask buffer: + * Every other bit is padding. 
+ */ + #include #include #include #include "rte_cuckoo_hash.h" +#define DENSE_HASH_BULK_LOOKUP 0 + static inline void -compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, const struct rte_hash_bucket *prim_bkt, const struct rte_hash_bucket *sec_bkt, uint16_t sig, diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c index 739f7927b8..187918a05a 100644 --- a/lib/hash/rte_cuckoo_hash.c +++ b/lib/hash/rte_cuckoo_hash.c @@ -1908,22 +1908,41 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, uint64_t hits = 0; int32_t i; int32_t ret; - uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; - uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; struct rte_hash_bucket *cur_bkt, *next_bkt; +#if DENSE_HASH_BULK_LOOKUP + const int hitmask_padding = 0; + uint16_t hitmask_buffer[RTE_HASH_LOOKUP_BULK_MAX] = {0}; +#else + const int hitmask_padding = 1; + uint32_t prim_hitmask_buffer[RTE_HASH_LOOKUP_BULK_MAX] = {0}; + uint32_t sec_hitmask_buffer[RTE_HASH_LOOKUP_BULK_MAX] = {0}; +#endif + __hash_rw_reader_lock(h); /* Compare signatures and prefetch key slot of first hit */ for (i = 0; i < num_keys; i++) { - compare_signatures(&prim_hitmask[i], &sec_hitmask[i], +#if DENSE_HASH_BULK_LOOKUP + uint16_t *hitmask = &hitmask_buffer[i]; + compare_signatures_dense(hitmask, + primary_bkt[i]->sig_current, + secondary_bkt[i]->sig_current, + sig[i], h->sig_cmp_fn); + const unsigned int prim_hitmask = *(uint8_t *)(hitmask); + const unsigned int sec_hitmask = *((uint8_t *)(hitmask)+1); +#else + compare_signatures_sparse(&prim_hitmask_buffer[i], &sec_hitmask_buffer[i], primary_bkt[i], secondary_bkt[i], sig[i], h->sig_cmp_fn); + const unsigned int prim_hitmask = prim_hitmask_buffer[i]; + const unsigned int sec_hitmask = sec_hitmask_buffer[i]; +#endif - if (prim_hitmask[i]) { + if (prim_hitmask) { uint32_t first_hit = - rte_ctz32(prim_hitmask[i]) - >> 1; + 
rte_ctz32(prim_hitmask) + >> hitmask_padding; uint32_t key_idx = primary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -1934,10 +1953,10 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, continue; } - if (sec_hitmask[i]) { + if (sec_hitmask) { uint32_t first_hit = - rte_ctz32(sec_hitmask[i]) - >> 1; + rte_ctz32(sec_hitmask) + >> hitmask_padding; uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -1951,10 +1970,18 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, /* Compare keys, first hits in primary first */ for (i = 0; i < num_keys; i++) { positions[i] = -ENOENT; - while (prim_hitmask[i]) { +#if DENSE_HASH_BULK_LOOKUP + uint16_t *hitmask = &hitmask_buffer[i]; + unsigned int prim_hitmask = *(uint8_t *)(hitmask); + unsigned int sec_hitmask = *((uint8_t *)(hitmask)+1); +#else + unsigned int prim_hitmask = prim_hitmask_buffer[i]; + unsigned int sec_hitmask = sec_hitmask_buffer[i]; +#endif + while (prim_hitmask) { uint32_t hit_index = - rte_ctz32(prim_hitmask[i]) - >> 1; + rte_ctz32(prim_hitmask) + >> hitmask_padding; uint32_t key_idx = primary_bkt[i]->key_idx[hit_index]; const struct rte_hash_key *key_slot = @@ -1976,13 +2003,13 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - prim_hitmask[i] &= ~(3ULL << (hit_index << 1)); + prim_hitmask &= ~(1 << (hit_index << hitmask_padding)); } - while (sec_hitmask[i]) { + while (sec_hitmask) { uint32_t hit_index = - rte_ctz32(sec_hitmask[i]) - >> 1; + rte_ctz32(sec_hitmask) + >> hitmask_padding; uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index]; const struct rte_hash_key *key_slot = @@ -2005,7 +2032,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - sec_hitmask[i] &= ~(3ULL << (hit_index << 1)); + sec_hitmask &= ~(1 << (hit_index << hitmask_padding)); } next_key: continue; @@ -2055,11 +2082,20 @@ 
__bulk_lookup_lf(const struct rte_hash *h, const void **keys, uint64_t hits = 0; int32_t i; int32_t ret; - uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; - uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; struct rte_hash_bucket *cur_bkt, *next_bkt; uint32_t cnt_b, cnt_a; +#if DENSE_HASH_BULK_LOOKUP + const int hitmask_padding = 0; + uint16_t hitmask_buffer[RTE_HASH_LOOKUP_BULK_MAX] = {0}; + static_assert(sizeof(*hitmask_buffer)*8/2 == RTE_HASH_BUCKET_ENTRIES, + "The hitmask must be exactly wide enough to accept the whole hitmask chen it is dense"); +#else + const int hitmask_padding = 1; + uint32_t prim_hitmask_buffer[RTE_HASH_LOOKUP_BULK_MAX] = {0}; + uint32_t sec_hitmask_buffer[RTE_HASH_LOOKUP_BULK_MAX] = {0}; +#endif + for (i = 0; i < num_keys; i++) positions[i] = -ENOENT; @@ -2073,14 +2109,26 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, /* Compare signatures and prefetch key slot of first hit */ for (i = 0; i < num_keys; i++) { - compare_signatures(&prim_hitmask[i], &sec_hitmask[i], +#if DENSE_HASH_BULK_LOOKUP + uint16_t *hitmask = &hitmask_buffer[i]; + compare_signatures_dense(hitmask, + primary_bkt[i]->sig_current, + secondary_bkt[i]->sig_current, + sig[i], h->sig_cmp_fn); + const unsigned int prim_hitmask = *(uint8_t *)(hitmask); + const unsigned int sec_hitmask = *((uint8_t *)(hitmask)+1); +#else + compare_signatures_sparse(&prim_hitmask_buffer[i], &sec_hitmask_buffer[i], primary_bkt[i], secondary_bkt[i], sig[i], h->sig_cmp_fn); + const unsigned int prim_hitmask = prim_hitmask_buffer[i]; + const unsigned int sec_hitmask = sec_hitmask_buffer[i]; +#endif - if (prim_hitmask[i]) { + if (prim_hitmask) { uint32_t first_hit = - rte_ctz32(prim_hitmask[i]) - >> 1; + rte_ctz32(prim_hitmask) + >> hitmask_padding; uint32_t key_idx = primary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -2091,10 +2139,10 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, continue; } - if (sec_hitmask[i]) { + if 
(sec_hitmask) { uint32_t first_hit = - rte_ctz32(sec_hitmask[i]) - >> 1; + rte_ctz32(sec_hitmask) + >> hitmask_padding; uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -2107,10 +2155,18 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, /* Compare keys, first hits in primary first */ for (i = 0; i < num_keys; i++) { - while (prim_hitmask[i]) { +#if DENSE_HASH_BULK_LOOKUP + uint16_t *hitmask = &hitmask_buffer[i]; + unsigned int prim_hitmask = *(uint8_t *)(hitmask); + unsigned int sec_hitmask = *((uint8_t *)(hitmask)+1); +#else + unsigned int prim_hitmask = prim_hitmask_buffer[i]; + unsigned int sec_hitmask = sec_hitmask_buffer[i]; +#endif + while (prim_hitmask) { uint32_t hit_index = - rte_ctz32(prim_hitmask[i]) - >> 1; + rte_ctz32(prim_hitmask) + >> hitmask_padding; uint32_t key_idx = rte_atomic_load_explicit( &primary_bkt[i]->key_idx[hit_index], @@ -2136,13 +2192,13 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - prim_hitmask[i] &= ~(3ULL << (hit_index << 1)); + prim_hitmask &= ~(1 << (hit_index << hitmask_padding)); } - while (sec_hitmask[i]) { + while (sec_hitmask) { uint32_t hit_index = - rte_ctz32(sec_hitmask[i]) - >> 1; + rte_ctz32(sec_hitmask) + >> hitmask_padding; uint32_t key_idx = rte_atomic_load_explicit( &secondary_bkt[i]->key_idx[hit_index], @@ -2169,7 +2225,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - sec_hitmask[i] &= ~(3ULL << (hit_index << 1)); + sec_hitmask &= ~(1 << (hit_index << hitmask_padding)); } next_key: continue; From patchwork Fri Jul 5 17:45:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142160 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from 
mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id BC5B24559B; Fri, 5 Jul 2024 19:46:12 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id A718C42FED; Fri, 5 Jul 2024 19:45:50 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id 66FA942F8F for ; Fri, 5 Jul 2024 19:45:33 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E17291576; Fri, 5 Jul 2024 10:45:57 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C5B403F73B; Fri, 5 Jul 2024 10:45:32 -0700 (PDT) From: Yoan Picchi To: Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi , Ruifeng Wang , Nathan Brown Subject: [PATCH v11 5/7] hash: optimize compare signature for NEON Date: Fri, 5 Jul 2024 17:45:24 +0000 Message-Id: <20240705174526.3035295-6-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Upon a successful comparison, NEON sets all the bits in the lane to 1. We can therefore skip the per-lane shift and simply mask each lane with its own bit mask.
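As a rough illustration of the masking trick described above, the scalar C model below mimics what the reworked NEON path computes. It is only a sketch, not the patch's code: `dense_hitmask_model` and `BUCKET_ENTRIES` are hypothetical names, with `BUCKET_ENTRIES` standing in for RTE_HASH_BUCKET_ENTRIES (assumed to be 8 here), and the comments map each step to the intrinsic the patch uses.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumption: stands in for RTE_HASH_BUCKET_ENTRIES */

/* Hypothetical scalar model of the dense NEON compare; not a DPDK function. */
static uint16_t
dense_hitmask_model(const uint16_t *prim_sigs, const uint16_t *sec_sigs,
		uint16_t sig)
{
	uint16_t sum = 0;

	for (unsigned int lane = 0; lane < BUCKET_ENTRIES; lane++) {
		/* vceqq_u16: a matching lane becomes 0xFFFF, any other lane 0 */
		uint16_t eq1 = (prim_sigs[lane] == sig) ? 0xFFFF : 0;
		uint16_t eq2 = (sec_sigs[lane] == sig) ? 0xFFFF : 0;
		/* vandq_u16 with {0x1, 0x2, ..., 0x80}: keep one bit per lane */
		uint16_t mask = (uint16_t)(1u << lane);
		uint16_t hit1 = eq1 & mask;
		uint16_t hit2 = eq2 & mask;
		/* vshlq_n_u16 + vorrq_u16: secondary hits move to the high byte */
		uint16_t merged = (uint16_t)(hit1 | (uint16_t)(hit2 << BUCKET_ENTRIES));
		/* vaddvq_u16: every lane holds disjoint bits, so the sum acts as an OR */
		sum = (uint16_t)(sum + merged);
	}
	return sum;
}
```

With a primary hit in entry 2 and secondary hits in entries 1 and 7, the model returns 0x8204: the low byte carries the primary hits and the high byte the secondary ones. The horizontal add is safe only because no two lanes can set the same bit.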
Signed-off-by: Yoan Picchi Reviewed-by: Ruifeng Wang Reviewed-by: Nathan Brown --- lib/hash/compare_signatures_arm_pvt.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h index 0fc657c49b..0245fec26f 100644 --- a/lib/hash/compare_signatures_arm_pvt.h +++ b/lib/hash/compare_signatures_arm_pvt.h @@ -34,21 +34,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer, switch (sig_cmp_fn) { #if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8 case RTE_HASH_COMPARE_NEON: { - uint16x8_t vmat, vsig, x; - int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7}; - uint16_t low, high; + uint16x8_t vmat, hit1, hit2; + const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80}; + const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig); - vsig = vld1q_dup_u16((uint16_t const *)&sig); /* Compare all signatures in the primary bucket */ - vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); - low = (uint16_t)(vaddvq_u16(x)); + vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs)); + hit1 = vandq_u16(vmat, mask); + /* Compare all signatures in the secondary bucket */ - vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); - high = (uint16_t)(vaddvq_u16(x)); - *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES; + vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs)); + hit2 = vandq_u16(vmat, mask); + hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES); + hit2 = vorrq_u16(hit1, hit2); + *hitmask_buffer = vaddvq_u16(hit2); } break; #endif From patchwork Fri Jul 5 17:45:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142161 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: 
patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id A68374559B; Fri, 5 Jul 2024 19:46:18 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 094F243007; Fri, 5 Jul 2024 19:45:52 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id 92C5E42FA9 for ; Fri, 5 Jul 2024 19:45:33 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 20B861595; Fri, 5 Jul 2024 10:45:58 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EC30D3F73B; Fri, 5 Jul 2024 10:45:32 -0700 (PDT) From: Yoan Picchi To: Thomas Monjalon , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi , Harjot Singh , Ruifeng Wang , Nathan Brown Subject: [PATCH v11 6/7] test/hash: check bulk lookup of keys after collision Date: Fri, 5 Jul 2024 17:45:25 +0000 Message-Id: <20240705174526.3035295-7-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org This patch adds a unit test for rte_hash_lookup_bulk(). It also updates the test_full_bucket test to match the current number of entries in a hash bucket.
Signed-off-by: Yoan Picchi Signed-off-by: Harjot Singh Reviewed-by: Ruifeng Wang Reviewed-by: Nathan Brown --- .mailmap | 1 + app/test/test_hash.c | 99 ++++++++++++++++++++++++++++++++++---------- 2 files changed, 77 insertions(+), 23 deletions(-) diff --git a/.mailmap b/.mailmap index ec525981fe..41a8a99a7c 100644 --- a/.mailmap +++ b/.mailmap @@ -505,6 +505,7 @@ Hari Kumar Vemula Harini Ramakrishnan Hariprasad Govindharajan Harish Patil +Harjot Singh Harman Kalra Harneet Singh Harold Huang diff --git a/app/test/test_hash.c b/app/test/test_hash.c index 24d3b547ad..ab3b37de3f 100644 --- a/app/test/test_hash.c +++ b/app/test/test_hash.c @@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys, __rte_unused uint32_t key_len, __rte_unused uint32_t init_val) { - return 3; + return 3 | (3 << 16); } RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO); @@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key, rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos); } +#define KEY_PER_BUCKET 8 + /* Keys used by unit test functions */ -static struct flow_key keys[5] = { { +static struct flow_key keys[KEY_PER_BUCKET+1] = { { .ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00), .ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04), .port_src = 0x0908, @@ -146,6 +148,30 @@ static struct flow_key keys[5] = { { .port_src = 0x4948, .port_dst = 0x4b4a, .proto = 0x4c, +}, { + .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50), + .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54), + .port_src = 0x5958, + .port_dst = 0x5b5a, + .proto = 0x5c, +}, { + .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60), + .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64), + .port_src = 0x6968, + .port_dst = 0x6b6a, + .proto = 0x6c, +}, { + .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70), + .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74), + .port_src = 0x7978, + .port_dst = 0x7b7a, + .proto = 0x7c, +}, { + .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80), + .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84), + 
.port_src = 0x8988, + .port_dst = 0x8b8a, + .proto = 0x8c, } }; /* Parameters used for hash table in unit test functions. Name set later. */ @@ -783,13 +809,15 @@ static int test_five_keys(void) /* * Add keys to the same bucket until bucket full. - * - add 5 keys to the same bucket (hash created with 4 keys per bucket): - * first 4 successful, 5th successful, pushing existing item in bucket - * - lookup the 5 keys: 5 hits - * - add the 5 keys again: 5 OK - * - lookup the 5 keys: 5 hits (updated data) - * - delete the 5 keys: 5 OK - * - lookup the 5 keys: 5 misses + * - add 9 keys to the same bucket (hash created with 8 keys per bucket): + * first 8 successful, 9th successful, pushing existing item in bucket + * - lookup the 9 keys: 9 hits + * - bulk lookup for all the 9 keys: 9 hits + * - add the 9 keys again: 9 OK + * - lookup the 9 keys: 9 hits (updated data) + * - delete the 9 keys: 9 OK + * - lookup the 9 keys: 9 misses + * - bulk lookup for all the 9 keys: 9 misses */ static int test_full_bucket(void) { @@ -801,16 +829,17 @@ static int test_full_bucket(void) .hash_func_init_val = 0, .socket_id = 0, }; + const void *key_array[KEY_PER_BUCKET+1] = {0}; struct rte_hash *handle; - int pos[5]; - int expected_pos[5]; + int pos[KEY_PER_BUCKET+1]; + int expected_pos[KEY_PER_BUCKET+1]; unsigned i; - + int ret; handle = rte_hash_create(&params_pseudo_hash); RETURN_IF_ERROR(handle == NULL, "hash creation failed"); /* Fill bucket */ - for (i = 0; i < 4; i++) { + for (i = 0; i < KEY_PER_BUCKET; i++) { pos[i] = rte_hash_add_key(handle, &keys[i]); print_key_info("Add", &keys[i], pos[i]); RETURN_IF_ERROR(pos[i] < 0, @@ -821,22 +850,36 @@ static int test_full_bucket(void) * This should work and will push one of the items * in the bucket because it is full */ - pos[4] = rte_hash_add_key(handle, &keys[4]); - print_key_info("Add", &keys[4], pos[4]); - RETURN_IF_ERROR(pos[4] < 0, - "failed to add key (pos[4]=%d)", pos[4]); - expected_pos[4] = pos[4]; + pos[KEY_PER_BUCKET] = 
rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]); + print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]); + RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0, + "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]); + expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET]; /* Lookup */ - for (i = 0; i < 5; i++) { + for (i = 0; i < KEY_PER_BUCKET+1; i++) { pos[i] = rte_hash_lookup(handle, &keys[i]); print_key_info("Lkp", &keys[i], pos[i]); RETURN_IF_ERROR(pos[i] != expected_pos[i], "failed to find key (pos[%u]=%d)", i, pos[i]); } + for (i = 0; i < KEY_PER_BUCKET+1; i++) + key_array[i] = &keys[i]; + + /*Bulk lookup after add with same hash*/ + ret = rte_hash_lookup_bulk(handle, key_array, KEY_PER_BUCKET+1, (int32_t *)pos); + RETURN_IF_ERROR(ret, "rte_hash_lookup_bulk returned an error: %d\n", ret); + for (i = 0; i < KEY_PER_BUCKET+1; i++) { + print_key_info("Blk_Lkp", key_array[i], pos[i]); + RETURN_IF_ERROR(pos[i] != expected_pos[i], + "failed to find key (pos[%u]=%d)", i, pos[i]); + } + + + /* Add - update */ - for (i = 0; i < 5; i++) { + for (i = 0; i < KEY_PER_BUCKET+1; i++) { pos[i] = rte_hash_add_key(handle, &keys[i]); print_key_info("Add", &keys[i], pos[i]); RETURN_IF_ERROR(pos[i] != expected_pos[i], @@ -844,7 +887,7 @@ static int test_full_bucket(void) } /* Lookup */ - for (i = 0; i < 5; i++) { + for (i = 0; i < KEY_PER_BUCKET+1; i++) { pos[i] = rte_hash_lookup(handle, &keys[i]); print_key_info("Lkp", &keys[i], pos[i]); RETURN_IF_ERROR(pos[i] != expected_pos[i], @@ -869,7 +912,7 @@ static int test_full_bucket(void) RETURN_IF_ERROR(pos[1] < 0, "failed to add key (pos[1]=%d)", pos[1]); /* Delete */ - for (i = 0; i < 5; i++) { + for (i = 0; i < KEY_PER_BUCKET+1; i++) { pos[i] = rte_hash_del_key(handle, &keys[i]); print_key_info("Del", &keys[i], pos[i]); RETURN_IF_ERROR(pos[i] != expected_pos[i], @@ -877,13 +920,23 @@ static int test_full_bucket(void) } /* Lookup */ - for (i = 0; i < 5; i++) { + for (i = 0; i < KEY_PER_BUCKET+1; i++) { pos[i] = 
rte_hash_lookup(handle, &keys[i]); print_key_info("Lkp", &keys[i], pos[i]); RETURN_IF_ERROR(pos[i] != -ENOENT, "fail: found non-existent key (pos[%u]=%d)", i, pos[i]); } + /* Bulk Lookup on empty table*/ + ret = rte_hash_lookup_bulk(handle, &key_array[0], KEY_PER_BUCKET+1, (int32_t *)pos); + RETURN_IF_ERROR(ret, "rte_hash_lookup_bulk returned an error: %d\n", ret); + for (i = 0; i < KEY_PER_BUCKET+1; i++) { + print_key_info("Blk_Lkp", key_array[i], pos[i]); + RETURN_IF_ERROR(pos[i] != -ENOENT, + "fail: found non-existent key (pos[%u]=%d)", i, pos[i]); + } + + rte_hash_free(handle); /* Cover the NULL case. */ From patchwork Fri Jul 5 17:45:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yoan Picchi X-Patchwork-Id: 142162 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1B8574559B; Fri, 5 Jul 2024 19:46:24 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 6187B43036; Fri, 5 Jul 2024 19:45:53 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id BCADA42F7F for ; Fri, 5 Jul 2024 19:45:33 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 487EF1596; Fri, 5 Jul 2024 10:45:58 -0700 (PDT) Received: from ampere-altra-2-3.austin.arm.com (ampere-altra-2-3.austin.arm.com [10.118.14.97]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2B59A3F73B; Fri, 5 Jul 2024 10:45:33 -0700 (PDT) From: Yoan Picchi To: Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi , Harjot Singh , Nathan Brown , Ruifeng Wang Subject: [PATCH v11 7/7] hash: add SVE support 
for bulk key lookup Date: Fri, 5 Jul 2024 17:45:26 +0000 Message-Id: <20240705174526.3035295-8-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240705174526.3035295-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240705174526.3035295-1-yoan.picchi@arm.com> MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org - Implemented SVE code for comparing signatures in bulk lookup. - New SVE code is ~5% slower than optimized NEON for N2 processor for 128b vectors. Signed-off-by: Yoan Picchi Signed-off-by: Harjot Singh Reviewed-by: Nathan Brown Reviewed-by: Ruifeng Wang --- lib/hash/compare_signatures_arm_pvt.h | 57 +++++++++++++++++++++++++++ lib/hash/rte_cuckoo_hash.c | 8 +++- 2 files changed, 64 insertions(+), 1 deletion(-) diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h index 0245fec26f..86843b8a8a 100644 --- a/lib/hash/compare_signatures_arm_pvt.h +++ b/lib/hash/compare_signatures_arm_pvt.h @@ -51,6 +51,63 @@ compare_signatures_dense(uint16_t *hitmask_buffer, *hitmask_buffer = vaddvq_u16(hit2); } break; +#endif +#if defined(RTE_HAS_SVE_ACLE) + case RTE_HASH_COMPARE_SVE: { + svuint16_t vsign, shift, sv_matches; + svbool_t pred, match, bucket_wide_pred; + int i = 0; + uint64_t vl = svcnth(); + + vsign = svdup_u16(sig); + shift = svindex_u16(0, 1); + + if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) { + svuint16_t primary_array_vect, secondary_array_vect; + bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES); + primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs); + secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs); + + /* We merged the two vectors so we can do both comparisons at once */ + primary_array_vect = 
svsplice_u16(bucket_wide_pred, primary_array_vect, + secondary_array_vect); + pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES); + + /* Compare all signatures in the buckets */ + match = svcmpeq_u16(pred, vsign, primary_array_vect); + if (svptest_any(svptrue_b16(), match)) { + sv_matches = svdup_u16(1); + sv_matches = svlsl_u16_z(match, sv_matches, shift); + *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches); + } + } else { + do { + pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES); + uint16_t lower_half = 0; + uint16_t upper_half = 0; + /* Compare all signatures in the primary bucket */ + match = svcmpeq_u16(pred, vsign, svld1_u16(pred, + &prim_bucket_sigs[i])); + if (svptest_any(svptrue_b16(), match)) { + sv_matches = svdup_u16(1); + sv_matches = svlsl_u16_z(match, sv_matches, shift); + lower_half = svorv_u16(svptrue_b16(), sv_matches); + } + /* Compare all signatures in the secondary bucket */ + match = svcmpeq_u16(pred, vsign, svld1_u16(pred, + &sec_bucket_sigs[i])); + if (svptest_any(svptrue_b16(), match)) { + sv_matches = svdup_u16(1); + sv_matches = svlsl_u16_z(match, sv_matches, shift); + upper_half = svorv_u16(svptrue_b16(), sv_matches) + << RTE_HASH_BUCKET_ENTRIES; + } + hitmask_buffer[i / 8] = upper_half | lower_half; + i += vl; + } while (i < RTE_HASH_BUCKET_ENTRIES); + } + } + break; #endif default: for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c index 187918a05a..c30ea13000 100644 --- a/lib/hash/rte_cuckoo_hash.c +++ b/lib/hash/rte_cuckoo_hash.c @@ -40,6 +40,7 @@ enum rte_hash_sig_compare_function { RTE_HASH_COMPARE_SCALAR = 0, RTE_HASH_COMPARE_SSE, RTE_HASH_COMPARE_NEON, + RTE_HASH_COMPARE_SVE, RTE_HASH_COMPARE_NUM }; @@ -461,8 +462,13 @@ rte_hash_create(const struct rte_hash_parameters *params) h->sig_cmp_fn = RTE_HASH_COMPARE_SSE; else #elif defined(RTE_ARCH_ARM64) - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) 
{ h->sig_cmp_fn = RTE_HASH_COMPARE_NEON; +#if defined(RTE_HAS_SVE_ACLE) + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE)) + h->sig_cmp_fn = RTE_HASH_COMPARE_SVE; +#endif + } else #endif h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
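The arm64 selection logic in the hunk above can be summarized with the small standalone sketch below. It is an illustration under stated assumptions, not DPDK code: `select_arm64_sig_cmp` and its three boolean parameters are hypothetical stand-ins for the `rte_cpu_get_flag_enabled()` checks and the RTE_HAS_SVE_ACLE build-time guard, and the enum only mirrors the names of the private enum in rte_cuckoo_hash.c.

```c
#include <stdbool.h>

/* Mirrors the private enum in rte_cuckoo_hash.c after this series. */
enum sig_compare_function {
	COMPARE_SCALAR = 0,
	COMPARE_SSE,
	COMPARE_NEON,
	COMPARE_SVE,
	COMPARE_NUM
};

/*
 * Hypothetical model of the arm64 branch in rte_hash_create():
 * SVE is only considered once NEON has been detected, and only when
 * the build provides the SVE ACLE (RTE_HAS_SVE_ACLE in the patch).
 */
static enum sig_compare_function
select_arm64_sig_cmp(bool cpu_has_neon, bool cpu_has_sve,
		bool built_with_sve_acle)
{
	if (cpu_has_neon) {
		if (built_with_sve_acle && cpu_has_sve)
			return COMPARE_SVE;
		return COMPARE_NEON;
	}
	/* No NEON: fall back to the scalar compare, as in the patch */
	return COMPARE_SCALAR;
}
```

Note that SVE takes precedence whenever it is available, even though the commit message reports it as about 5% slower than the optimized NEON path on N2 with 128b vectors.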