diff mbox series

[1/2] hash: add hash bulk lookup with hash signatures array

Message ID 1583757860-375294-1-git-send-email-vladimir.medvedkin@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers show
Series [1/2] hash: add hash bulk lookup with hash signatures array | expand

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/checkpatch success coding style OK

Commit Message

Medvedkin, Vladimir March 9, 2020, 12:44 p.m. UTC
Implement rte_hash_lookup_with_hash_bulk_data() - lookup
function with precomputed hash signatures.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c    | 402 +++++++++++++++++++++++++++++++++++
 lib/librte_hash/rte_hash.h           |  27 +++
 lib/librte_hash/rte_hash_version.map |   1 +
 3 files changed, 430 insertions(+)

Comments

Wang, Yipeng1 March 17, 2020, 5:27 p.m. UTC | #1
> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Monday, March 9, 2020 5:44 AM
> To: dev@dpdk.org
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh
> <sameh.gobriel@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: [PATCH 1/2] hash: add hash bulk lookup with hash signatures array
> 
> Implement rte_hash_lookup_with_hash_bulk_data() - lookup function with
> precomputed hash signatures.
> 
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> ---
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -528,6 +528,33 @@ rte_hash_lookup_bulk_data(const struct rte_hash
> *h, const void **keys,
>   *   Hash table to look in.
>   * @param keys
>   *   A pointer to a list of keys to look for.
> + * @param sig
> + *   A pointer to a list of precomputed hash values for keys.
> + * @param num_keys
> + *   How many keys are in the keys list (less than
> RTE_HASH_LOOKUP_BULK_MAX).
> + * @param hit_mask
> + *   Output containing a bitmask with all successful lookups.
> + * @param data
> + *   Output containing array of data returned from all the successful lookups.
> + * @return
> + *   -EINVAL if there's an error, otherwise number of successful lookups.
> + */
> +__rte_experimental
> +int
> +rte_hash_lookup_with_hash_bulk_data(const struct rte_hash *h,
> +		const void **keys, hash_sig_t *prim_hash,
[Wang, Yipeng] hash_sig_t *sig
> +		uint32_t num_keys, uint64_t *hit_mask, void *data[]);
> +
> +/**
> + * Find multiple keys in the hash table.
[Wang, Yipeng] ...with precomputed hash value array.
> + * This operation is multi-thread safe with regarding to other lookup threads.
> + * Read-write concurrency can be enabled by setting flag during
> + * table creation.
> + *

[Wang, Yipeng]
Hi, Vladimir, thanks for the patch!

Besides the minor comments above, my major concern is the code duplication here.
It is after all hundred's lines of code that in future if we want to extend features, we need
to modify both function blocks.

Have you tried to just do if-else and reuse the current code block, and measure the performance?
If that causes performance difference, then we may justify adding the duplication, but we still
Need to optimize the software pipelining accordingly.
Medvedkin, Vladimir March 26, 2020, 2:16 p.m. UTC | #2
Hi Yipeng,

On 17/03/2020 17:27, Wang, Yipeng1 wrote:
>> -----Original Message-----
>> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
>> Sent: Monday, March 9, 2020 5:44 AM
>> To: dev@dpdk.org
>> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh
>> <sameh.gobriel@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>
>> Subject: [PATCH 1/2] hash: add hash bulk lookup with hash signatures array
>>
>> Implement rte_hash_lookup_with_hash_bulk_data() - lookup function with
>> precomputed hash signatures.
>>
>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>> ---
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -528,6 +528,33 @@ rte_hash_lookup_bulk_data(const struct rte_hash
>> *h, const void **keys,
>>    *   Hash table to look in.
>>    * @param keys
>>    *   A pointer to a list of keys to look for.
>> + * @param sig
>> + *   A pointer to a list of precomputed hash values for keys.
>> + * @param num_keys
>> + *   How many keys are in the keys list (less than
>> RTE_HASH_LOOKUP_BULK_MAX).
>> + * @param hit_mask
>> + *   Output containing a bitmask with all successful lookups.
>> + * @param data
>> + *   Output containing array of data returned from all the successful lookups.
>> + * @return
>> + *   -EINVAL if there's an error, otherwise number of successful lookups.
>> + */
>> +__rte_experimental
>> +int
>> +rte_hash_lookup_with_hash_bulk_data(const struct rte_hash *h,
>> +		const void **keys, hash_sig_t *prim_hash,
> [Wang, Yipeng] hash_sig_t *sig
>> +		uint32_t num_keys, uint64_t *hit_mask, void *data[]);
>> +
>> +/**
>> + * Find multiple keys in the hash table.
> [Wang, Yipeng] ...with precomputed hash value array.
>> + * This operation is multi-thread safe with regarding to other lookup threads.
>> + * Read-write concurrency can be enabled by setting flag during
>> + * table creation.
>> + *
> [Wang, Yipeng]
> Hi, Vladimir, thanks for the patch!
>
> Besides the minor comments above, my major concern is the code duplication here.
> It is after all hundred's lines of code that in future if we want to extend features, we need
> to modify both function blocks.
>
> Have you tried to just do if-else and reuse the current code block, and measure the performance?
> If that causes performance difference, then we may justify adding the duplication, but we still
> Need to optimize the software pipelining accordingly.


Agree, I'll get rid of code duplication in v2.


>
diff mbox series

Patch

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 6af8ca4..9a054eb 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -2165,6 +2165,408 @@  rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
 	return __builtin_popcountl(*hit_mask);
 }
 
+static inline void
+__rte_hash_lookup_with_hash_bulk_l(const struct rte_hash *h,
+			const void **keys, hash_sig_t *prim_hash,
+			int32_t num_keys, int32_t *positions,
+			uint64_t *hit_mask, void *data[])
+{
+	uint64_t hits = 0;
+	int32_t i;
+	int32_t ret;
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
+	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
+	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
+
+	/*
+	 * Prefetch keys, calculate primary and
+	 * secondary bucket and prefetch them
+	 */
+	for (i = 0; i < num_keys; i++) {
+		rte_prefetch0(keys[i]);
+
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
+
+		rte_prefetch0(primary_bkt[i]);
+		rte_prefetch0(secondary_bkt[i]);
+	}
+
+	__hash_rw_reader_lock(h);
+
+	/* Compare signatures and prefetch key slot of first hit */
+	for (i = 0; i < num_keys; i++) {
+		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
+			primary_bkt[i], secondary_bkt[i],
+			sig[i], h->sig_cmp_fn);
+
+		if (prim_hitmask[i]) {
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i])
+					>> 1;
+			uint32_t key_idx =
+				primary_bkt[i]->key_idx[first_hit];
+			const struct rte_hash_key *key_slot =
+				(const struct rte_hash_key *)(
+				(const char *)h->key_store +
+				key_idx * h->key_entry_size);
+			rte_prefetch0(key_slot);
+			continue;
+		}
+
+		if (sec_hitmask[i]) {
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i])
+					>> 1;
+			uint32_t key_idx =
+				secondary_bkt[i]->key_idx[first_hit];
+			const struct rte_hash_key *key_slot =
+				(const struct rte_hash_key *)(
+				(const char *)h->key_store +
+				key_idx * h->key_entry_size);
+			rte_prefetch0(key_slot);
+		}
+	}
+
+	/* Compare keys, first hits in primary first */
+	for (i = 0; i < num_keys; i++) {
+		positions[i] = -ENOENT;
+		while (prim_hitmask[i]) {
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i])
+					>> 1;
+			uint32_t key_idx =
+				primary_bkt[i]->key_idx[hit_index];
+			const struct rte_hash_key *key_slot =
+				(const struct rte_hash_key *)(
+				(const char *)h->key_store +
+				key_idx * h->key_entry_size);
+
+			/*
+			 * If key index is 0, do not compare key,
+			 * as it is checking the dummy slot
+			 */
+			if (!!key_idx &
+				!rte_hash_cmp_eq(
+					key_slot->key, keys[i], h)) {
+				if (data != NULL)
+					data[i] = key_slot->pdata;
+
+				hits |= 1ULL << i;
+				positions[i] = key_idx - 1;
+				goto next_key;
+			}
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
+		}
+
+		while (sec_hitmask[i]) {
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i])
+					>> 1;
+			uint32_t key_idx =
+				secondary_bkt[i]->key_idx[hit_index];
+			const struct rte_hash_key *key_slot =
+				(const struct rte_hash_key *)(
+				(const char *)h->key_store +
+				key_idx * h->key_entry_size);
+
+			/*
+			 * If key index is 0, do not compare key,
+			 * as it is checking the dummy slot
+			 */
+
+			if (!!key_idx &
+				!rte_hash_cmp_eq(
+					key_slot->key, keys[i], h)) {
+				if (data != NULL)
+					data[i] = key_slot->pdata;
+
+				hits |= 1ULL << i;
+				positions[i] = key_idx - 1;
+				goto next_key;
+			}
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
+		}
+next_key:
+		continue;
+	}
+
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket_l(h, keys[i],
+						sig[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket_l(h, keys[i],
+						sig[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
+	__hash_rw_reader_unlock(h);
+
+	if (hit_mask != NULL)
+		*hit_mask = hits;
+}
+
+static inline void
+__rte_hash_lookup_with_hash_bulk_lf(const struct rte_hash *h,
+			const void **keys, hash_sig_t *prim_hash,
+			int32_t num_keys, int32_t *positions,
+			uint64_t *hit_mask, void *data[])
+{
+	uint64_t hits = 0;
+	int32_t i;
+	int32_t ret;
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
+	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
+	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
+	uint32_t cnt_b, cnt_a;
+
+	/*
+	 * Prefetch keys, calculate primary and
+	 * secondary bucket and prefetch them
+	 */
+	for (i = 0; i < num_keys; i++) {
+		rte_prefetch0(keys[i]);
+
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
+
+		rte_prefetch0(primary_bkt[i]);
+		rte_prefetch0(secondary_bkt[i]);
+	}
+
+	for (i = 0; i < num_keys; i++)
+		positions[i] = -ENOENT;
+
+	do {
+		/* Load the table change counter before the lookup
+		 * starts. Acquire semantics will make sure that
+		 * loads in compare_signatures are not hoisted.
+		 */
+		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
+					__ATOMIC_ACQUIRE);
+
+		/* Compare signatures and prefetch key slot of first hit */
+		for (i = 0; i < num_keys; i++) {
+			compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
+				primary_bkt[i], secondary_bkt[i],
+				sig[i], h->sig_cmp_fn);
+
+			if (prim_hitmask[i]) {
+				uint32_t first_hit =
+						__builtin_ctzl(prim_hitmask[i])
+						>> 1;
+				uint32_t key_idx =
+					primary_bkt[i]->key_idx[first_hit];
+				const struct rte_hash_key *key_slot =
+					(const struct rte_hash_key *)(
+					(const char *)h->key_store +
+					key_idx * h->key_entry_size);
+				rte_prefetch0(key_slot);
+				continue;
+			}
+
+			if (sec_hitmask[i]) {
+				uint32_t first_hit =
+						__builtin_ctzl(sec_hitmask[i])
+						>> 1;
+				uint32_t key_idx =
+					secondary_bkt[i]->key_idx[first_hit];
+				const struct rte_hash_key *key_slot =
+					(const struct rte_hash_key *)(
+					(const char *)h->key_store +
+					key_idx * h->key_entry_size);
+				rte_prefetch0(key_slot);
+			}
+		}
+
+		/* Compare keys, first hits in primary first */
+		for (i = 0; i < num_keys; i++) {
+			while (prim_hitmask[i]) {
+				uint32_t hit_index =
+						__builtin_ctzl(prim_hitmask[i])
+						>> 1;
+				uint32_t key_idx =
+				__atomic_load_n(
+					&primary_bkt[i]->key_idx[hit_index],
+					__ATOMIC_ACQUIRE);
+				const struct rte_hash_key *key_slot =
+					(const struct rte_hash_key *)(
+					(const char *)h->key_store +
+					key_idx * h->key_entry_size);
+
+				/*
+				 * If key index is 0, do not compare key,
+				 * as it is checking the dummy slot
+				 */
+				if (!!key_idx &
+					!rte_hash_cmp_eq(
+						key_slot->key, keys[i], h)) {
+					if (data != NULL)
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
+
+					hits |= 1ULL << i;
+					positions[i] = key_idx - 1;
+					goto next_key;
+				}
+				prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
+			}
+
+			while (sec_hitmask[i]) {
+				uint32_t hit_index =
+						__builtin_ctzl(sec_hitmask[i])
+						>> 1;
+				uint32_t key_idx =
+				__atomic_load_n(
+					&secondary_bkt[i]->key_idx[hit_index],
+					__ATOMIC_ACQUIRE);
+				const struct rte_hash_key *key_slot =
+					(const struct rte_hash_key *)(
+					(const char *)h->key_store +
+					key_idx * h->key_entry_size);
+
+				/*
+				 * If key index is 0, do not compare key,
+				 * as it is checking the dummy slot
+				 */
+
+				if (!!key_idx &
+					!rte_hash_cmp_eq(
+						key_slot->key, keys[i], h)) {
+					if (data != NULL)
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
+
+					hits |= 1ULL << i;
+					positions[i] = key_idx - 1;
+					goto next_key;
+				}
+				sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
+			}
+next_key:
+			continue;
+		}
+
+		/* all found, do not need to go through ext bkt */
+		if (hits == ((1ULL << num_keys) - 1)) {
+			if (hit_mask != NULL)
+				*hit_mask = hits;
+			return;
+		}
+		/* need to check ext buckets for match */
+		if (h->ext_table_support) {
+			for (i = 0; i < num_keys; i++) {
+				if ((hits & (1ULL << i)) != 0)
+					continue;
+				next_bkt = secondary_bkt[i]->next;
+				FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+					if (data != NULL)
+						ret = search_one_bucket_lf(h,
+							keys[i], sig[i],
+							&data[i], cur_bkt);
+					else
+						ret = search_one_bucket_lf(h,
+								keys[i], sig[i],
+								NULL, cur_bkt);
+					if (ret != -1) {
+						positions[i] = ret;
+						hits |= 1ULL << i;
+						break;
+					}
+				}
+			}
+		}
+		/* The loads of sig_current in compare_signatures
+		 * should not move below the load from tbl_chng_cnt.
+		 */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* Re-read the table change counter to check if the
+		 * table has changed during search. If yes, re-do
+		 * the search.
+		 * This load should not get hoisted. The load
+		 * acquires on cnt_b, primary key index and secondary
+		 * key index will make sure that it does not get
+		 * hoisted.
+		 */
+		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
+					__ATOMIC_ACQUIRE);
+	} while (cnt_b != cnt_a);
+
+	if (hit_mask != NULL)
+		*hit_mask = hits;
+}
+
+static inline void
+__rte_hash_lookup_with_hash_bulk(const struct rte_hash *h, const void **keys,
+			hash_sig_t *prim_hash, int32_t num_keys,
+			int32_t *positions, uint64_t *hit_mask, void *data[])
+{
+	if (h->readwrite_concur_lf_support)
+		__rte_hash_lookup_with_hash_bulk_lf(h, keys, prim_hash,
+				num_keys, positions, hit_mask, data);
+	else
+		__rte_hash_lookup_with_hash_bulk_l(h, keys, prim_hash,
+				num_keys, positions, hit_mask, data);
+}
+
+int
+rte_hash_lookup_with_hash_bulk_data(const struct rte_hash *h,
+		const void **keys, hash_sig_t *prim_hash,
+		uint32_t num_keys, uint64_t *hit_mask, void *data[])
+{
+	RETURN_IF_TRUE(((h == NULL) || (keys == NULL) ||
+			(prim_hash == NULL) || (num_keys == 0) ||
+			(num_keys > RTE_HASH_LOOKUP_BULK_MAX) ||
+			(hit_mask == NULL)), -EINVAL);
+
+	int32_t positions[num_keys];
+
+	__rte_hash_lookup_with_hash_bulk(h, keys, prim_hash, num_keys,
+			positions, hit_mask, data);
+
+	/* Return number of hits */
+	return __builtin_popcountl(*hit_mask);
+}
+
 int32_t
 rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
 {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index ed0673b..9d3f0fa 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -528,6 +528,33 @@  rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
  *   Hash table to look in.
  * @param keys
  *   A pointer to a list of keys to look for.
+ * @param sig
+ *   A pointer to a list of precomputed hash values for keys.
+ * @param num_keys
+ *   How many keys are in the keys list (less than RTE_HASH_LOOKUP_BULK_MAX).
+ * @param hit_mask
+ *   Output containing a bitmask with all successful lookups.
+ * @param data
+ *   Output containing array of data returned from all the successful lookups.
+ * @return
+ *   -EINVAL if there's an error, otherwise number of successful lookups.
+ */
+__rte_experimental
+int
+rte_hash_lookup_with_hash_bulk_data(const struct rte_hash *h,
+		const void **keys, hash_sig_t *prim_hash,
+		uint32_t num_keys, uint64_t *hit_mask, void *data[]);
+
+/**
+ * Find multiple keys in the hash table.
+ * This operation is multi-thread safe with regarding to other lookup threads.
+ * Read-write concurrency can be enabled by setting flag during
+ * table creation.
+ *
+ * @param h
+ *   Hash table to look in.
+ * @param keys
+ *   A pointer to a list of keys to look for.
  * @param num_keys
  *   How many keys are in the keys list (less than RTE_HASH_LOOKUP_BULK_MAX).
  * @param positions
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index a8fbbc3..fc3cb60 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -33,6 +33,7 @@  EXPERIMENTAL {
 	global:
 
 	rte_hash_free_key_with_position;
+	rte_hash_lookup_with_hash_bulk_data;
 	rte_hash_max_key_id;
 
 };