[v3] hash table: add an iterator over conflicting entries

Message ID 20180831165101.20026-1-qiaobinf@bu.edu
State Superseded, archived
Delegated to: Thomas Monjalon
Headers show
Series
  • [v3] hash table: add an iterator over conflicting entries
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Fu, Qiaobin Aug. 31, 2018, 4:51 p.m.
Function rte_hash_iterate_conflict_entries() iterates over
the entries that conflict with an incoming entry.

Iterating over conflicting entries enables one to decide
if the incoming entry is more valuable than the entries already
in the hash table. This is particularly useful after
an insertion failure.

v3:
* Make the rte_hash_iterate() API similar to
  rte_hash_iterate_conflict_entries()

v2:
* Fix the style issue

* Make the API more universal

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/Makefile             |   1 +
 lib/librte_hash/rte_cuckoo_hash.c    | 132 +++++++++++++++++++++++----
 lib/librte_hash/rte_hash.h           |  80 ++++++++++++++--
 lib/librte_hash/rte_hash_version.map |   8 ++
 test/test/test_hash.c                |   7 +-
 test/test/test_hash_multiwriter.c    |   8 +-
 test/test/test_hash_readwrite.c      |  16 ++--
 7 files changed, 219 insertions(+), 33 deletions(-)

Comments

Gaëtan Rivet Aug. 31, 2018, 10:53 p.m. | #1
Hi Qiaobin,

This work seems interesting, but is difficult to follow because
the previous discussion is not referenced.

You can find a how-to there:

http://doc.dpdk.org/guides/contributing/patches.html#sending-patches

--in-reply-to is useful to check which comments were already made and
understand the work previously done on a patchset.

On Fri, Aug 31, 2018 at 04:51:01PM +0000, Qiaobin Fu wrote:
> Function rte_hash_iterate_conflict_entries() iterates over
> the entries that conflict with an incoming entry.
> 
> Iterating over conflicting entries enables one to decide
> if the incoming entry is more valuable than the entries already
> in the hash table. This is particularly useful after
> an insertion failure.
> 
> v3:
> * Make the rte_hash_iterate() API similar to
>   rte_hash_iterate_conflict_entries()
> 
> v2:
> * Fix the style issue
> 
> * Make the API more universal
> 
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Cody Doucette <doucette@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> Reviewed-by: Keith Wiles <keith.wiles@intel.com>
> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_hash/Makefile             |   1 +
>  lib/librte_hash/rte_cuckoo_hash.c    | 132 +++++++++++++++++++++++----
>  lib/librte_hash/rte_hash.h           |  80 ++++++++++++++--
>  lib/librte_hash/rte_hash_version.map |   8 ++
>  test/test/test_hash.c                |   7 +-
>  test/test/test_hash_multiwriter.c    |   8 +-
>  test/test/test_hash_readwrite.c      |  16 ++--
>  7 files changed, 219 insertions(+), 33 deletions(-)
> 
> diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
> index c8c435dfd..9be58a205 100644
> --- a/lib/librte_hash/Makefile
> +++ b/lib/librte_hash/Makefile
> @@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_hash.a
>  
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>  LDLIBS += -lrte_eal -lrte_ring
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8c9..cf5b28196 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -1300,45 +1300,143 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
>  	return __builtin_popcountl(*hit_mask);
>  }
>  
> +/* istate stands for internal state. */

Is a name requiring a comment to explain a good name?
Maybe rte_hash_iterator_priv?

> +struct rte_hash_iterator_istate {
> +	const struct rte_hash *h;
> +	uint32_t              next;
> +	uint32_t              total_entries;
> +};

You should check that your private structure does not grow beyond
the public one, using RTE_BUILD_BUG_ON(sizeof(priv) < sizeof(pub)) somewhere.

"rte_hash_iterator_[i]state" seems unnecessarily verbose.
The memory you are manipulating through this variable is already holding
the state of your iterator. It is useless to append "_state".

    struct rte_hash_iterator_priv *state;

is also clear and reads better.
On the other hand "h" is maybe not verbose enough. Why not "hash"?

Also, please do not align field names in a structure. It forces
future changes to either break the pattern or edit the whole structure
when someone attempts to insert a field with a name that is too long.

> +
> +int32_t
> +rte_hash_iterator_init(const struct rte_hash *h,
> +	struct rte_hash_iterator_state *state)
> +{
> +	struct rte_hash_iterator_istate *__state;

Please do not use the "__" prefix to convey that
you are using a private version of the structure.

You could use "istate" or "it", the common shorthand for
iterator handles.

> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
> +	__state->h = h;
> +	__state->next = 0;
> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +
> +	return 0;
> +}
> +
>  int32_t
> -rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
> +rte_hash_iterate(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)

Why an empty first line of parameters here?

rte_hash_iterate(struct rte_hash_iterator_state *state,
                 const void **key,
                 void **data)

reads better.

>  {
> +	struct rte_hash_iterator_istate *__state;
>  	uint32_t bucket_idx, idx, position;
>  	struct rte_hash_key *next_key;
>  
> -	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
>  
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>  	/* Out of bounds */
> -	if (*next >= total_entries)
> +	if (__state->next >= __state->total_entries)
>  		return -ENOENT;
>  
>  	/* Calculate bucket and index of current iterator */
> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>  
>  	/* If current position is empty, go to the next one */
> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> -		(*next)++;
> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		__state->next++;
>  		/* End of table */
> -		if (*next == total_entries)
> +		if (__state->next == __state->total_entries)
>  			return -ENOENT;
> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> -	__hash_rw_reader_lock(h);
> +	__hash_rw_reader_lock(__state->h);
>  	/* Get position of entry in key table */
> -	position = h->buckets[bucket_idx].key_idx[idx];
> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> -				position * h->key_entry_size);
> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
> +				position * __state->h->key_entry_size);
>  	/* Return key and data */
>  	*key = next_key->key;
>  	*data = next_key->pdata;
>  
> -	__hash_rw_reader_unlock(h);
> +	__hash_rw_reader_unlock(__state->h);
>  
>  	/* Increment iterator */
> -	(*next)++;
> +	__state->next++;
>  
>  	return position - 1;
>  }
> +
> +/* istate stands for internal state. */
> +struct rte_hash_iterator_conflict_entries_istate {

I find "conflict_entries" awkward, how about

rte_hash_dup_iterator

instead? It is shorter and conveys that you will iterate duplicate
entries.

> +	const struct rte_hash *h;
> +	uint32_t              vnext;
> +	uint32_t              primary_bidx;
> +	uint32_t              secondary_bidx;
> +};
> +
> +int32_t __rte_experimental
> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,

rte_hash_dup_iterator_init() maybe?

Why is _with_hash mentioned here? Is it possible to initialize this kind
of iterator without a reference to compare against? That this reference
is an rte_hash is already given by the parameter list.

In any case, 49 characters for a name is too long.

> +	hash_sig_t sig, struct rte_hash_iterator_state *state)
> +{
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +	__state->h = h;
> +	__state->vnext = 0;
> +
> +	/* Get the primary bucket index given the precomputed hash value. */
> +	__state->primary_bidx = sig & h->bucket_bitmask;
> +	/* Get the secondary bucket index given the precomputed hash value. */
> +	__state->secondary_bidx =
> +		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +
> +	return 0;
> +}
> +
> +int32_t __rte_experimental
> +rte_hash_iterate_conflict_entries(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)

How about "rte_hash_dup_next()"?
Also, please break the parameter list instead of having an empty first
line.

> +{
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +
> +	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
> +		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
> +			__state->primary_bidx : __state->secondary_bidx;
> +		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
> +		uint32_t position = __state->h->buckets[bidx].key_idx[next];
> +		struct rte_hash_key *next_key;
> +
> +		/* Increment iterator. */
> +		__state->vnext++;
> +
> +		/*
> +		 * The test below is unlikely because this iterator is meant
> +		 * to be used after a failed insert.
> +		 */
> +		if (unlikely(position == EMPTY_SLOT))
> +			continue;
> +
> +		/* Get the entry in key table. */
> +		next_key = (struct rte_hash_key *) (
> +			(char *)__state->h->key_store +
> +			position * __state->h->key_entry_size);
> +		/* Return key and data. */
> +		*key = next_key->key;
> +		*data = next_key->pdata;
> +
> +		return position - 1;
> +	}
> +
> +	return -ENOENT;
> +}

[snip]

> diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
> index e216ac8e2..301d4638c 100644
> --- a/lib/librte_hash/rte_hash_version.map
> +++ b/lib/librte_hash/rte_hash_version.map
> @@ -24,6 +24,7 @@ DPDK_2.1 {
>  
>  	rte_hash_add_key_data;
>  	rte_hash_add_key_with_hash_data;
> +	rte_hash_iterator_init;
>  	rte_hash_iterate;
>  	rte_hash_lookup_bulk_data;
>  	rte_hash_lookup_data;
> @@ -53,3 +54,10 @@ DPDK_18.08 {
>  	rte_hash_count;
>  
>  } DPDK_16.07;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_hash_iterator_conflict_entries_init_with_hash;
> +	rte_hash_iterate_conflict_entries;
> +};

A blank line may be missing here.

Regards,
Honnappa Nagarahalli Sept. 2, 2018, 10:05 p.m. | #2
Hi Qiaobin,
	Thank you for the patch. Please see few comments inline.

-----Original Message-----
From: Qiaobin Fu <qiaobinf@bu.edu> 
Sent: Friday, August 31, 2018 11:51 AM
To: bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; yipeng1.wang@intel.com; michel@digirati.com.br; qiaobinf@bu.edu
Subject: [PATCH v3] hash table: add an iterator over conflicting entries

Function rte_hash_iterate_conflict_entries() iterates over the entries that conflict with an incoming entry.

Iterating over conflicting entries enables one to decide if the incoming entry is more valuable than the entries already in the hash table. This is particularly useful after an insertion failure.

v3:
* Make the rte_hash_iterate() API similar to
  rte_hash_iterate_conflict_entries()

v2:
* Fix the style issue

* Make the API more universal

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/Makefile             |   1 +
 lib/librte_hash/rte_cuckoo_hash.c    | 132 +++++++++++++++++++++++----
 lib/librte_hash/rte_hash.h           |  80 ++++++++++++++--
 lib/librte_hash/rte_hash_version.map |   8 ++
 test/test/test_hash.c                |   7 +-
 test/test/test_hash_multiwriter.c    |   8 +-
 test/test/test_hash_readwrite.c      |  16 ++--
 7 files changed, 219 insertions(+), 33 deletions(-)

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index c8c435dfd..9be58a205 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk  # library name  LIB = librte_hash.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 LDLIBS += -lrte_eal -lrte_ring
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8c9..cf5b28196 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1300,45 +1300,143 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
 	return __builtin_popcountl(*hit_mask);  }
 
+/* istate stands for internal state. */ struct rte_hash_iterator_istate 
+{
+	const struct rte_hash *h;
This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.

+	uint32_t              next;
+	uint32_t              total_entries;
+};
This structure can be moved to rte_cuckoo_hash.h file.

+

+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+	struct rte_hash_iterator_state *state) {
+	struct rte_hash_iterator_istate *__state;
'__state' can be replaced by 's'.

+
+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_istate *)state;
+	__state->h = h;
+	__state->next = 0;
+	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+	return 0;
+}
IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.

+
 int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
+rte_hash_iterate(
+	struct rte_hash_iterator_state *state, const void **key, void **data)

IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)

This will help keep the API signature consistent with existing APIs.

This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.

 {
+	struct rte_hash_iterator_istate *__state;
'__state' can be replaced with 's'.

 	uint32_t bucket_idx, idx, position;
 	struct rte_hash_key *next_key;
 
-	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+		(data == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_istate *)state;
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
 	/* Out of bounds */
-	if (*next >= total_entries)
+	if (__state->next >= __state->total_entries)
 		return -ENOENT;
 
'if (__state->next == 0)' is required to avoid creating 'rte_hash_iterator_init' API.

 	/* Calculate bucket and index of current iterator */
-	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-	idx = *next % RTE_HASH_BUCKET_ENTRIES;
+	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
-		(*next)++;
+	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		__state->next++;
 		/* End of table */
-		if (*next == total_entries)
+		if (__state->next == __state->total_entries)
 			return -ENOENT;
-		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-		idx = *next % RTE_HASH_BUCKET_ENTRIES;
+		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
 	}
-	__hash_rw_reader_lock(h);
+	__hash_rw_reader_lock(__state->h);
 	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
-	next_key = (struct rte_hash_key *) ((char *)h->key_store +
-				position * h->key_entry_size);
+	position = __state->h->buckets[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
+				position * __state->h->key_entry_size);
 	/* Return key and data */
 	*key = next_key->key;
 	*data = next_key->pdata;
 
-	__hash_rw_reader_unlock(h);
+	__hash_rw_reader_unlock(__state->h);
 
 	/* Increment iterator */
-	(*next)++;
+	__state->next++;
This comment is not related to this change, it is better to place this inside the lock.
 
 	return position - 1;
 }
+
+/* istate stands for internal state. */ struct 
+rte_hash_iterator_conflict_entries_istate {
+	const struct rte_hash *h;
This can be moved outside of this structure. 

+	uint32_t              vnext;
+	uint32_t              primary_bidx;
+	uint32_t              secondary_bidx;
+};
+
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
+	hash_sig_t sig, struct rte_hash_iterator_state *state) {
+	struct rte_hash_iterator_conflict_entries_istate *__state;
+
+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
+	__state->h = h;
+	__state->vnext = 0;
+
+	/* Get the primary bucket index given the precomputed hash value. */
+	__state->primary_bidx = sig & h->bucket_bitmask;
+	/* Get the secondary bucket index given the precomputed hash value. */
+	__state->secondary_bidx =
+		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
+
+	return 0;
+}
IMO, as mentioned above, it is possible to avoid creating this API.

+
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+	struct rte_hash_iterator_state *state, const void **key, void **data) 
Signature of this API can be changed as follows:
rte_hash_iterate_conflict_entries(
	struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)

+{
+	struct rte_hash_iterator_conflict_entries_istate *__state;
+
+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+		(data == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
+
+	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
+		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
+			__state->primary_bidx : __state->secondary_bidx;
+		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);

take the reader lock before reading bucket entry

+		uint32_t position = __state->h->buckets[bidx].key_idx[next];
+		struct rte_hash_key *next_key;
+
+		/* Increment iterator. */
+		__state->vnext++;
+
+		/*
+		 * The test below is unlikely because this iterator is meant
+		 * to be used after a failed insert.
+		 */
+		if (unlikely(position == EMPTY_SLOT))
+			continue;
+
+		/* Get the entry in key table. */
+		next_key = (struct rte_hash_key *) (
+			(char *)__state->h->key_store +
+			position * __state->h->key_entry_size);
+		/* Return key and data. */
+		*key = next_key->key;
+		*data = next_key->pdata;
give the reader lock

+
+		return position - 1;
+	}
+
+	return -ENOENT;
+}
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index 9e7d9315f..fdb01023e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -14,6 +14,8 @@
 #include <stdint.h>
 #include <stddef.h>
 
+#include <rte_compat.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -64,6 +66,16 @@ struct rte_hash_parameters {
 /** @internal A hash table structure. */  struct rte_hash;
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal A hash table iterator state structure.
+ */
+struct rte_hash_iterator_state {
+	uint8_t space[64];
I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
+} __rte_cache_aligned;
+
 /**
  * Create a new hash table.
  *
@@ -443,26 +455,82 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		      uint32_t num_keys, int32_t *positions);
 
 /**
- * Iterate through the hash table, returning key-value pairs.
+ * Initialize the iterator over the hash table.
  *
  * @param h
- *   Hash table to iterate
+ *   Hash table to iterate.
+ * @param state
+ *   Pointer to the iterator state.
+ * @return
+ *   - 0 if successful.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+	struct rte_hash_iterator_state *state);
+
+/**
+ * Iterate through the hash table, returning key-value pairs.
+ *
+ * @param state
+ *   Pointer to the iterator state.
  * @param key
  *   Output containing the key where current iterator
  *   was pointing at
  * @param data
  *   Output containing the data associated with key.
  *   Returns NULL if data was not stored.
- * @param next
- *   Pointer to iterator. Should be 0 to start iterating the hash table.
- *   Iterator is incremented after each call of this function.
  * @return
  *   Position where key was stored, if successful.
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if end of the hash table.
  */
 int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
+rte_hash_iterate(
+	struct rte_hash_iterator_state *state, const void **key, void **data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the iterator over entries that conflict with a given hash.
+ *
+ * @param h
+ *   Hash table to iterate.
+ * @param sig
+ *   Precomputed hash value with which the returning entries conflict.
+ * @param state
+ *   Pointer to the iterator state.
+ * @return
+ *   - 0 if successful.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
+	hash_sig_t sig, struct rte_hash_iterator_state *state);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Iterate over entries that conflict with a given hash.
+ *
+ * @param state
+ *   Pointer to the iterator state.
+ * @param key
+ *   Output containing the key at where the iterator is currently pointing.
+ * @param data
+ *   Output containing the data associated with key.
+ *   Returns NULL if data was not stored.
+ * @return
+ *   Position where key was stored, if successful.
+ *   - -EINVAL if the parameters are invalid.
+ *   - -ENOENT if there is no more conflicting entries.
+ */
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+	struct rte_hash_iterator_state *state, const void **key, void **data);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index e216ac8e2..301d4638c 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -24,6 +24,7 @@ DPDK_2.1 {
 
 	rte_hash_add_key_data;
 	rte_hash_add_key_with_hash_data;
+	rte_hash_iterator_init;
 	rte_hash_iterate;
 	rte_hash_lookup_bulk_data;
 	rte_hash_lookup_data;
@@ -53,3 +54,10 @@ DPDK_18.08 {
 	rte_hash_count;
 
 } DPDK_16.07;
+
+EXPERIMENTAL {
+	global:
+
+	rte_hash_iterator_conflict_entries_init_with_hash;
+	rte_hash_iterate_conflict_entries;
+};
diff --git a/test/test/test_hash.c b/test/test/test_hash.c index b3db9fd10..bf57004c3 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -1170,8 +1170,8 @@ static int test_hash_iteration(void)
 	void *next_data;
 	void *data[NUM_ENTRIES];
 	unsigned added_keys;
-	uint32_t iter = 0;
 	int ret = 0;
+	struct rte_hash_iterator_state state;
 
 	ut_params.entries = NUM_ENTRIES;
 	ut_params.name = "test_hash_iteration"; @@ -1180,6 +1180,9 @@ static int test_hash_iteration(void)
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
+			"initialization of the hash iterator failed");
+
 	/* Add random entries until key cannot be added */
 	for (added_keys = 0; added_keys < NUM_ENTRIES; added_keys++) {
 		data[added_keys] = (void *) ((uintptr_t) rte_rand()); @@ -1191,7 +1194,7 @@ static int test_hash_iteration(void)
 	}
 
 	/* Iterate through the hash table */
-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added */
 		for (i = 0; i < NUM_ENTRIES; i++) {
 			if (memcmp(next_key, keys[i], ut_params.key_len) == 0) { diff --git a/test/test/test_hash_multiwriter.c b/test/test/test_hash_multiwriter.c
index 6a3eb10bd..48db8007d 100644
--- a/test/test/test_hash_multiwriter.c
+++ b/test/test/test_hash_multiwriter.c
@@ -125,18 +125,22 @@ test_hash_multiwriter(void)
 
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	uint32_t count;
 
+	struct rte_hash_iterator_state state;
+
 	snprintf(name, 32, "test%u", calledCount++);
 	hash_params.name = name;
 
 	handle = rte_hash_create(&hash_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
+			"initialization of the hash iterator failed");
+
 	tbl_multiwriter_test_params.h = handle;
 	tbl_multiwriter_test_params.nb_tsx_insertion =
 		nb_total_tsx_insertion / rte_lcore_count(); @@ -203,7 +207,7 @@ test_hash_multiwriter(void)
 		goto err3;
 	}
 
-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added .*/
 		i = *(const uint32_t *)next_key;
 		tbl_multiwriter_test_params.found[i]++;
diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c index 55ae33d80..9cdab9992 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -166,12 +166,13 @@ test_hash_readwrite_functional(int use_htm)
 	unsigned int i;
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	int use_jhash = 1;
 
+	struct rte_hash_iterator_state state;
+
 	rte_atomic64_init(&gcycles);
 	rte_atomic64_clear(&gcycles);
 
@@ -188,6 +189,8 @@ test_hash_readwrite_functional(int use_htm)
 		tbl_rw_test_param.num_insert
 		* rte_lcore_count();
 
+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
+
 	printf("++++++++Start function tests:+++++++++\n");
 
 	/* Fire all threads. */
@@ -195,8 +198,7 @@ test_hash_readwrite_functional(int use_htm)
 				 NULL, CALL_MASTER);
 	rte_eal_mp_wait_lcore();
 
-	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
-			&next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added .*/
 		i = *(const uint32_t *)next_key;
 		tbl_rw_test_param.found[i]++;
@@ -315,9 +317,10 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 	int use_jhash = 0;
 
+	struct rte_hash_iterator_state state;
+
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 
@@ -336,6 +339,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 	if (init_params(use_htm, use_jhash) != 0)
 		goto err;
 
+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
+
 	/*
 	 * Do a readers finish faster or writers finish faster test.
 	 * When readers finish faster, we timing the readers, and when writers @@ -484,8 +489,7 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 
 		rte_eal_mp_wait_lcore();
 
-		while (rte_hash_iterate(tbl_rw_test_param.h,
-				&next_key, &next_data, &iter) >= 0) {
+		while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 			/* Search for the key in the list of keys added .*/
 			i = *(const uint32_t *)next_key;
 			tbl_rw_test_param.found[i]++;
--
2.17.1
Michel Machado Sept. 4, 2018, 6:46 p.m. | #3
Hi Gaëtan,

On 08/31/2018 06:53 PM, Gaëtan Rivet wrote:
> Hi Qiaobin,
> 
> This work seems interesting, but is difficult to follow because
> the previous discussion is not referenced.
> 
> You can find a how-to there:
> 
> http://doc.dpdk.org/guides/contributing/patches.html#sending-patches
> 
> --in-reply-to is useful to check which comments were already made and
> understand the work previously done on a patchset.

    Thanks for bringing this to our attention.

>> +/* istate stands for internal state. */
> 
> Is a name requiring a comment to explain a good name?
> Maybe rte_hash_iterator_priv?
> 
>> +struct rte_hash_iterator_istate {
>> +	const struct rte_hash *h;
>> +	uint32_t              next;
>> +	uint32_t              total_entries;
>> +};

    We agree that the suffix _priv is better.

> You should check that your private structure does not grow beyond
> the public one, using RTE_BUILD_BUG_ON(sizeof(priv) < sizeof(pub)) somewhere.

    We have overlooked the macro RTE_BUILD_BUG_ON(). We'll use it.

> "rte_hash_iterator_[i]state" seems unnecessarily verbose.
> The memory you are manipulating through this variable is already holding
> the state of your iterator. It is useless to append "_state".
> 
>      struct rte_hash_iterator_priv *state;
> 
> is also clear and reads better.
> On the other hand "h" is maybe not verbose enough. Why not "hash"?

    We'll keep the parameter name "state" and rename the variable 
"__state" to "it" as you suggest in a comment later in your email.

    About the variable "h", we are following the coding convention in 
the library. You can find plenty of examples of using "h" for a hash table.

> Also, please do not align field names in a structure. It forces
> future changes to either break the pattern or edit the whole structure
> when someone attempts to insert a field with a name that is too long.

    Your suggestion goes against the coding style of DPDK. See section 
"1.5.5. Structure Declarations" on the page:

https://doc.dpdk.org/guides-18.08/contributing/coding_style.html

>> +
>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> +	struct rte_hash_iterator_state *state)
>> +{
>> +	struct rte_hash_iterator_istate *__state;
> 
> Please do not use the "__" prefix to convey that
> you are using a private version of the structure.
> 
> You could use "istate" or "it", the common shorthand for
> iterator handles.

    We'll do it as explained before.

>>   int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
>> +rte_hash_iterate(
>> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> 
> Why an empty first line of parameters here?
> 
> rte_hash_iterate(struct rte_hash_iterator_state *state,
>                   const void **key,
>                   void **data)
> 
> reads better.

   Okay.

>> +
>> +/* istate stands for internal state. */
>> +struct rte_hash_iterator_conflict_entries_istate {
> 
> I find "conflict_entries" awkward, how about
> 
> rte_hash_dup_iterator
> 
> instead? It is shorter and conveys that you will iterate duplicate
> entries.

    Yipeng Wang suggested the expression "conflict_entries" in his 
review of the first version of this patch. You find his suggestion here:

http://mails.dpdk.org/archives/dev/2018-August/109103.html

    I find the name "dup" misleading because it suggests that the 
returned entries have the same key or refer to the same object. For 
example, the file descriptor returned by dup(2) refers to the same file.

>> +	const struct rte_hash *h;
>> +	uint32_t              vnext;
>> +	uint32_t              primary_bidx;
>> +	uint32_t              secondary_bidx;
>> +};
>> +
>> +int32_t __rte_experimental
>> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
> 
> rte_hash_dup_iterator_init() maybe?
> 
> Why is _with_hash mentioned here? Is it possible to initialize this kind
> of iterator without a reference to compare against? That this reference
> is an rte_hash is already given by the parameter list.
> 
> In any case, 49 characters for a name is too long.

    Honnappa Nagarahalli suggested adding the suffix "_with_hash" during 
his review of the second version of this patch. His argument went as 
follows: "Let me elaborate. For the API 'rte_hash_lookup', there are 
multiple variations such as 'rte_hash_lookup_with_hash', 
'rte_hash_lookup_data', 'rte_hash_lookup_with_hash_data' etc. We do not 
need to create similar variations for 
'rte_hash_iterate_conflict_entries' API right now. But the naming of the 
API should be such that these variations can be created in the future."

    You find a copy of his original message here:

https://www.mail-archive.com/dev@dpdk.org/msg109653.html

>> +int32_t __rte_experimental
>> +rte_hash_iterate_conflict_entries(
>> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> 
> How about "rte_hash_dup_next()"?
> Also, please break the parameter list instead of having an empty first
> line.

    We are preserving the name convention used in DPDK (e.g. 
rte_hash_iterate()).

    We'll break the parameter list to avoid the empty first line.

>> diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
>> index e216ac8e2..301d4638c 100644
>> --- a/lib/librte_hash/rte_hash_version.map
>> +++ b/lib/librte_hash/rte_hash_version.map
>> @@ -24,6 +24,7 @@ DPDK_2.1 {
>>   
>>   	rte_hash_add_key_data;
>>   	rte_hash_add_key_with_hash_data;
>> +	rte_hash_iterator_init;
>>   	rte_hash_iterate;
>>   	rte_hash_lookup_bulk_data;
>>   	rte_hash_lookup_data;
>> @@ -53,3 +54,10 @@ DPDK_18.08 {
>>   	rte_hash_count;
>>   
>>   } DPDK_16.07;
>> +
>> +EXPERIMENTAL {
>> +	global:
>> +
>> +	rte_hash_iterator_conflict_entries_init_with_hash;
>> +	rte_hash_iterate_conflict_entries;
>> +};
> 
> A blank line may be missing here.

    Do you mean a blank line before "};"?

    Thank you for the careful review.

[ ]'s
Michel Machado
Wang, Yipeng1 Sept. 4, 2018, 6:55 p.m. | #4
Thanks for the patch.

Do we need both of the state and istate struct? struct rte_hash_iterator_state  seems not doing much.
How about we only have one "state" struct and just not expose the internals to the public API, similar to the
rte_hash struct or rte_member_setsum struct. 
And in _init function use rte_malloc to allocate the state and add a _free function to free it.

Thanks
Yipeng


>-----Original Message-----
>From: Qiaobin Fu [mailto:qiaobinf@bu.edu]
>Sent: Friday, August 31, 2018 9:51 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; doucette@bu.edu; Wiles, Keith <keith.wiles@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Tai, Charlie
><charlie.tai@intel.com>; stephen@networkplumber.org; nd@arm.com; honnappa.nagarahalli@arm.com; Wang, Yipeng1
><yipeng1.wang@intel.com>; michel@digirati.com.br; qiaobinf@bu.edu
>Subject: [PATCH v3] hash table: add an iterator over conflicting entries
>
>Function rte_hash_iterate_conflict_entries() iterates over
>the entries that conflict with an incoming entry.
>
>Iterating over conflicting entries enables one to decide
>if the incoming entry is more valuable than the entries already
>in the hash table. This is particularly useful after
>an insertion failure.
>
>v3:
>* Make the rte_hash_iterate() API similar to
>  rte_hash_iterate_conflict_entries()
>
>v2:
>* Fix the style issue
>
>* Make the API more universal
>
>Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>Reviewed-by: Cody Doucette <doucette@bu.edu>
>Reviewed-by: Michel Machado <michel@digirati.com.br>
>Reviewed-by: Keith Wiles <keith.wiles@intel.com>
>Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
>Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>---
> lib/librte_hash/Makefile             |   1 +
> lib/librte_hash/rte_cuckoo_hash.c    | 132 +++++++++++++++++++++++----
> lib/librte_hash/rte_hash.h           |  80 ++++++++++++++--
> lib/librte_hash/rte_hash_version.map |   8 ++
> test/test/test_hash.c                |   7 +-
> test/test/test_hash_multiwriter.c    |   8 +-
> test/test/test_hash_readwrite.c      |  16 ++--
> 7 files changed, 219 insertions(+), 33 deletions(-)
>
>diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
>index c8c435dfd..9be58a205 100644
>--- a/lib/librte_hash/Makefile
>+++ b/lib/librte_hash/Makefile
>@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
> # library name
> LIB = librte_hash.a
>
>+CFLAGS += -DALLOW_EXPERIMENTAL_API
> CFLAGS += -O3
> CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> LDLIBS += -lrte_eal -lrte_ring
>diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>index f7b86c8c9..cf5b28196 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.c
>+++ b/lib/librte_hash/rte_cuckoo_hash.c
>@@ -1300,45 +1300,143 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
> 	return __builtin_popcountl(*hit_mask);
> }
>
>+/* istate stands for internal state. */
>+struct rte_hash_iterator_istate {
>+	const struct rte_hash *h;
>+	uint32_t              next;
>+	uint32_t              total_entries;
>+};
>+
>+int32_t
>+rte_hash_iterator_init(const struct rte_hash *h,
>+	struct rte_hash_iterator_state *state)
>+{
>+	struct rte_hash_iterator_istate *__state;
>+
>+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>+
>+	__state = (struct rte_hash_iterator_istate *)state;
>+	__state->h = h;
>+	__state->next = 0;
>+	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>+
>+	return 0;
>+}
>+
> int32_t
>-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
>+rte_hash_iterate(
>+	struct rte_hash_iterator_state *state, const void **key, void **data)
> {
>+	struct rte_hash_iterator_istate *__state;
> 	uint32_t bucket_idx, idx, position;
> 	struct rte_hash_key *next_key;
>
>-	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
>+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
>+		(data == NULL)), -EINVAL);
>+
>+	__state = (struct rte_hash_iterator_istate *)state;
>
>-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> 	/* Out of bounds */
>-	if (*next >= total_entries)
>+	if (__state->next >= __state->total_entries)
> 		return -ENOENT;
>
> 	/* Calculate bucket and index of current iterator */
>-	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>-	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>+	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>+	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>
> 	/* If current position is empty, go to the next one */
>-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>-		(*next)++;
>+	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>+		__state->next++;
> 		/* End of table */
>-		if (*next == total_entries)
>+		if (__state->next == __state->total_entries)
> 			return -ENOENT;
>-		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>-		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>+		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>+		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
> 	}
>-	__hash_rw_reader_lock(h);
>+	__hash_rw_reader_lock(__state->h);
> 	/* Get position of entry in key table */
>-	position = h->buckets[bucket_idx].key_idx[idx];
>-	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>-				position * h->key_entry_size);
>+	position = __state->h->buckets[bucket_idx].key_idx[idx];
>+	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>+				position * __state->h->key_entry_size);
> 	/* Return key and data */
> 	*key = next_key->key;
> 	*data = next_key->pdata;
>
>-	__hash_rw_reader_unlock(h);
>+	__hash_rw_reader_unlock(__state->h);
>
> 	/* Increment iterator */
>-	(*next)++;
>+	__state->next++;
>
> 	return position - 1;
> }
>+
>+/* istate stands for internal state. */
>+struct rte_hash_iterator_conflict_entries_istate {
>+	const struct rte_hash *h;
>+	uint32_t              vnext;
>+	uint32_t              primary_bidx;
>+	uint32_t              secondary_bidx;
>+};
>+
>+int32_t __rte_experimental
>+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
>+	hash_sig_t sig, struct rte_hash_iterator_state *state)
>+{
>+	struct rte_hash_iterator_conflict_entries_istate *__state;
>+
>+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>+
>+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
>+	__state->h = h;
>+	__state->vnext = 0;
>+
>+	/* Get the primary bucket index given the precomputed hash value. */
>+	__state->primary_bidx = sig & h->bucket_bitmask;
>+	/* Get the secondary bucket index given the precomputed hash value. */
>+	__state->secondary_bidx =
>+		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
>+
>+	return 0;
>+}
>+
>+int32_t __rte_experimental
>+rte_hash_iterate_conflict_entries(
>+	struct rte_hash_iterator_state *state, const void **key, void **data)
>+{
>+	struct rte_hash_iterator_conflict_entries_istate *__state;
>+
>+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
>+		(data == NULL)), -EINVAL);
>+
>+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
>+
>+	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
>+		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
>+			__state->primary_bidx : __state->secondary_bidx;
>+		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
>+		uint32_t position = __state->h->buckets[bidx].key_idx[next];
>+		struct rte_hash_key *next_key;
>+
>+		/* Increment iterator. */
>+		__state->vnext++;
>+
>+		/*
>+		 * The test below is unlikely because this iterator is meant
>+		 * to be used after a failed insert.
>+		 */
>+		if (unlikely(position == EMPTY_SLOT))
>+			continue;
>+
>+		/* Get the entry in key table. */
>+		next_key = (struct rte_hash_key *) (
>+			(char *)__state->h->key_store +
>+			position * __state->h->key_entry_size);
>+		/* Return key and data. */
>+		*key = next_key->key;
>+		*data = next_key->pdata;
>+
>+		return position - 1;
>+	}
>+
>+	return -ENOENT;
>+}
>diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>index 9e7d9315f..fdb01023e 100644
>--- a/lib/librte_hash/rte_hash.h
>+++ b/lib/librte_hash/rte_hash.h
>@@ -14,6 +14,8 @@
> #include <stdint.h>
> #include <stddef.h>
>
>+#include <rte_compat.h>
>+
> #ifdef __cplusplus
> extern "C" {
> #endif
>@@ -64,6 +66,16 @@ struct rte_hash_parameters {
> /** @internal A hash table structure. */
> struct rte_hash;
>
>+/**
>+ * @warning
>+ * @b EXPERIMENTAL: this API may change without prior notice.
>+ *
>+ * @internal A hash table iterator state structure.
>+ */
>+struct rte_hash_iterator_state {
>+	uint8_t space[64];
>+} __rte_cache_aligned;
>+
> /**
>  * Create a new hash table.
>  *
>@@ -443,26 +455,82 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> 		      uint32_t num_keys, int32_t *positions);
>
> /**
>- * Iterate through the hash table, returning key-value pairs.
>+ * Initialize the iterator over the hash table.
>  *
>  * @param h
>- *   Hash table to iterate
>+ *   Hash table to iterate.
>+ * @param state
>+ *   Pointer to the iterator state.
>+ * @return
>+ *   - 0 if successful.
>+ *   - -EINVAL if the parameters are invalid.
>+ */
>+int32_t
>+rte_hash_iterator_init(const struct rte_hash *h,
>+	struct rte_hash_iterator_state *state);
>+
>+/**
>+ * Iterate through the hash table, returning key-value pairs.
>+ *
>+ * @param state
>+ *   Pointer to the iterator state.
>  * @param key
>  *   Output containing the key where current iterator
>  *   was pointing at
>  * @param data
>  *   Output containing the data associated with key.
>  *   Returns NULL if data was not stored.
>- * @param next
>- *   Pointer to iterator. Should be 0 to start iterating the hash table.
>- *   Iterator is incremented after each call of this function.
>  * @return
>  *   Position where key was stored, if successful.
>  *   - -EINVAL if the parameters are invalid.
>  *   - -ENOENT if end of the hash table.
>  */
> int32_t
>-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
>+rte_hash_iterate(
>+	struct rte_hash_iterator_state *state, const void **key, void **data);
>+
>+/**
>+ * @warning
>+ * @b EXPERIMENTAL: this API may change without prior notice.
>+ *
>+ * Initialize the iterator over entries that conflict with a given hash.
>+ *
>+ * @param h
>+ *   Hash table to iterate.
>+ * @param sig
>+ *   Precomputed hash value with which the returning entries conflict.
>+ * @param state
>+ *   Pointer to the iterator state.
>+ * @return
>+ *   - 0 if successful.
>+ *   - -EINVAL if the parameters are invalid.
>+ */
>+int32_t __rte_experimental
>+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
>+	hash_sig_t sig, struct rte_hash_iterator_state *state);
>+
>+/**
>+ * @warning
>+ * @b EXPERIMENTAL: this API may change without prior notice.
>+ *
>+ * Iterate over entries that conflict with a given hash.
>+ *
>+ * @param state
>+ *   Pointer to the iterator state.
>+ * @param key
>+ *   Output containing the key at where the iterator is currently pointing.
>+ * @param data
>+ *   Output containing the data associated with key.
>+ *   Returns NULL if data was not stored.
>+ * @return
>+ *   Position where key was stored, if successful.
>+ *   - -EINVAL if the parameters are invalid.
>+ *   - -ENOENT if there is no more conflicting entries.
>+ */
>+int32_t __rte_experimental
>+rte_hash_iterate_conflict_entries(
>+	struct rte_hash_iterator_state *state, const void **key, void **data);
>+
> #ifdef __cplusplus
> }
> #endif
>diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
>index e216ac8e2..301d4638c 100644
>--- a/lib/librte_hash/rte_hash_version.map
>+++ b/lib/librte_hash/rte_hash_version.map
>@@ -24,6 +24,7 @@ DPDK_2.1 {
>
> 	rte_hash_add_key_data;
> 	rte_hash_add_key_with_hash_data;
>+	rte_hash_iterator_init;
> 	rte_hash_iterate;
> 	rte_hash_lookup_bulk_data;
> 	rte_hash_lookup_data;
>@@ -53,3 +54,10 @@ DPDK_18.08 {
> 	rte_hash_count;
>
> } DPDK_16.07;
>+
>+EXPERIMENTAL {
>+	global:
>+
>+	rte_hash_iterator_conflict_entries_init_with_hash;
>+	rte_hash_iterate_conflict_entries;
>+};
>diff --git a/test/test/test_hash.c b/test/test/test_hash.c
>index b3db9fd10..bf57004c3 100644
>--- a/test/test/test_hash.c
>+++ b/test/test/test_hash.c
>@@ -1170,8 +1170,8 @@ static int test_hash_iteration(void)
> 	void *next_data;
> 	void *data[NUM_ENTRIES];
> 	unsigned added_keys;
>-	uint32_t iter = 0;
> 	int ret = 0;
>+	struct rte_hash_iterator_state state;
>
> 	ut_params.entries = NUM_ENTRIES;
> 	ut_params.name = "test_hash_iteration";
>@@ -1180,6 +1180,9 @@ static int test_hash_iteration(void)
> 	handle = rte_hash_create(&ut_params);
> 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
>
>+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
>+			"initialization of the hash iterator failed");
>+
> 	/* Add random entries until key cannot be added */
> 	for (added_keys = 0; added_keys < NUM_ENTRIES; added_keys++) {
> 		data[added_keys] = (void *) ((uintptr_t) rte_rand());
>@@ -1191,7 +1194,7 @@ static int test_hash_iteration(void)
> 	}
>
> 	/* Iterate through the hash table */
>-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
>+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
> 		/* Search for the key in the list of keys added */
> 		for (i = 0; i < NUM_ENTRIES; i++) {
> 			if (memcmp(next_key, keys[i], ut_params.key_len) == 0) {
>diff --git a/test/test/test_hash_multiwriter.c b/test/test/test_hash_multiwriter.c
>index 6a3eb10bd..48db8007d 100644
>--- a/test/test/test_hash_multiwriter.c
>+++ b/test/test/test_hash_multiwriter.c
>@@ -125,18 +125,22 @@ test_hash_multiwriter(void)
>
> 	const void *next_key;
> 	void *next_data;
>-	uint32_t iter = 0;
>
> 	uint32_t duplicated_keys = 0;
> 	uint32_t lost_keys = 0;
> 	uint32_t count;
>
>+	struct rte_hash_iterator_state state;
>+
> 	snprintf(name, 32, "test%u", calledCount++);
> 	hash_params.name = name;
>
> 	handle = rte_hash_create(&hash_params);
> 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
>
>+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
>+			"initialization of the hash iterator failed");
>+
> 	tbl_multiwriter_test_params.h = handle;
> 	tbl_multiwriter_test_params.nb_tsx_insertion =
> 		nb_total_tsx_insertion / rte_lcore_count();
>@@ -203,7 +207,7 @@ test_hash_multiwriter(void)
> 		goto err3;
> 	}
>
>-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
>+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
> 		/* Search for the key in the list of keys added .*/
> 		i = *(const uint32_t *)next_key;
> 		tbl_multiwriter_test_params.found[i]++;
>diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
>index 55ae33d80..9cdab9992 100644
>--- a/test/test/test_hash_readwrite.c
>+++ b/test/test/test_hash_readwrite.c
>@@ -166,12 +166,13 @@ test_hash_readwrite_functional(int use_htm)
> 	unsigned int i;
> 	const void *next_key;
> 	void *next_data;
>-	uint32_t iter = 0;
>
> 	uint32_t duplicated_keys = 0;
> 	uint32_t lost_keys = 0;
> 	int use_jhash = 1;
>
>+	struct rte_hash_iterator_state state;
>+
> 	rte_atomic64_init(&gcycles);
> 	rte_atomic64_clear(&gcycles);
>
>@@ -188,6 +189,8 @@ test_hash_readwrite_functional(int use_htm)
> 		tbl_rw_test_param.num_insert
> 		* rte_lcore_count();
>
>+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
>+
> 	printf("++++++++Start function tests:+++++++++\n");
>
> 	/* Fire all threads. */
>@@ -195,8 +198,7 @@ test_hash_readwrite_functional(int use_htm)
> 				 NULL, CALL_MASTER);
> 	rte_eal_mp_wait_lcore();
>
>-	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
>-			&next_data, &iter) >= 0) {
>+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
> 		/* Search for the key in the list of keys added .*/
> 		i = *(const uint32_t *)next_key;
> 		tbl_rw_test_param.found[i]++;
>@@ -315,9 +317,10 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
>
> 	const void *next_key;
> 	void *next_data;
>-	uint32_t iter = 0;
> 	int use_jhash = 0;
>
>+	struct rte_hash_iterator_state state;
>+
> 	uint32_t duplicated_keys = 0;
> 	uint32_t lost_keys = 0;
>
>@@ -336,6 +339,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
> 	if (init_params(use_htm, use_jhash) != 0)
> 		goto err;
>
>+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
>+
> 	/*
> 	 * Do a readers finish faster or writers finish faster test.
> 	 * When readers finish faster, we timing the readers, and when writers
>@@ -484,8 +489,7 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
>
> 		rte_eal_mp_wait_lcore();
>
>-		while (rte_hash_iterate(tbl_rw_test_param.h,
>-				&next_key, &next_data, &iter) >= 0) {
>+		while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
> 			/* Search for the key in the list of keys added .*/
> 			i = *(const uint32_t *)next_key;
> 			tbl_rw_test_param.found[i]++;
>--
>2.17.1
Michel Machado Sept. 4, 2018, 7:07 p.m. | #5
Hi Yipeng,

On 09/04/2018 02:55 PM, Wang, Yipeng1 wrote:
> Do we need both of the state and istate struct? struct rte_hash_iterator_state  seems not doing much.
> How about we only have one "state" struct and just not expose the internals to the public API, similar to the
> rte_hash struct or rte_member_setsum struct.
> And in _init function use rte_malloc to allocate the state and add a _free function to free it.

    The purpose of have struct state is to enable applications to 
allocate iterator states on their execution stack or embedding iterator 
states in larger structs to avoid an extra malloc()/free().

    Do you foresee that the upcoming new underlying algorithm of hash 
tables will need to dynamically allocate iterator states?

[ ]'s
Michel Machado
Michel Machado Sept. 4, 2018, 7:36 p.m. | #6
Hi Honnappa,

On 09/02/2018 06:05 PM, Honnappa Nagarahalli wrote:
> +/* istate stands for internal state. */ struct rte_hash_iterator_istate
> +{
> +	const struct rte_hash *h;
> This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.

    Discussed later.

> +	uint32_t              next;
> +	uint32_t              total_entries;
> +};
> This structure can be moved to rte_cuckoo_hash.h file.

    What's the purpose of moving this struct to a header file since it's 
only used in the C file rte_cuckoo_hash.c?

> +int32_t
> +rte_hash_iterator_init(const struct rte_hash *h,
> +	struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced by 's'.
> 
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
> +	__state->h = h;
> +	__state->next = 0;
> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +
> +	return 0;
> +}
> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.

    Applications would have to initialize struct rte_hash_iterator_state 
*state before calling rte_hash_iterate() anyway. Why not initializing 
the fields of a state only once?

>   int32_t
> -rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
> +rte_hash_iterate(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> 
> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)
> 
> This will help keep the API signature consistent with existing APIs.
> 
> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.

    The ABI will change in a way or another, so why not going for a 
single state instead of requiring parameters that are already needed for 
the initialization of the state?

    Thank you for the link. We'll check how to proceed with the ABI change.

>   {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced with 's'.

    Gaëtan Rivet has already pointed this out in his review of this 
version of our patch.

>   	uint32_t bucket_idx, idx, position;
>   	struct rte_hash_key *next_key;
>   
> -	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
>   
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>   	/* Out of bounds */
> -	if (*next >= total_entries)
> +	if (__state->next >= __state->total_entries)
>   		return -ENOENT;
>   
> 'if (__state->next == 0)' is required to avoid creating 'rte_hash_iterator_init' API.

    The argument to keep _init() is presented above in this email.

>   	/* Calculate bucket and index of current iterator */
> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   
>   	/* If current position is empty, go to the next one */
> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> -		(*next)++;
> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		__state->next++;
>   		/* End of table */
> -		if (*next == total_entries)
> +		if (__state->next == __state->total_entries)
>   			return -ENOENT;
> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   	}
> -	__hash_rw_reader_lock(h);
> +	__hash_rw_reader_lock(__state->h);
>   	/* Get position of entry in key table */
> -	position = h->buckets[bucket_idx].key_idx[idx];
> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> -				position * h->key_entry_size);
> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
> +				position * __state->h->key_entry_size);
>   	/* Return key and data */
>   	*key = next_key->key;
>   	*data = next_key->pdata;
>   
> -	__hash_rw_reader_unlock(h);
> +	__hash_rw_reader_unlock(__state->h);
>   
>   	/* Increment iterator */
> -	(*next)++;
> +	__state->next++;
> This comment is not related to this change, it is better to place this inside the lock.

    Even though __state->next does not depend on the lock?

>   	return position - 1;
>   }
> +
> +/* istate stands for internal state. */ struct
> +rte_hash_iterator_conflict_entries_istate {
> +	const struct rte_hash *h;
> This can be moved outside of this structure.

    Discussed earlier.

> +	uint32_t              vnext;
> +	uint32_t              primary_bidx;
> +	uint32_t              secondary_bidx;
> +};
> +
> +int32_t __rte_experimental
> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
> +	hash_sig_t sig, struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +	__state->h = h;
> +	__state->vnext = 0;
> +
> +	/* Get the primary bucket index given the precomputed hash value. */
> +	__state->primary_bidx = sig & h->bucket_bitmask;
> +	/* Get the secondary bucket index given the precomputed hash value. */
> +	__state->secondary_bidx =
> +		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +
> +	return 0;
> +}
> IMO, as mentioned above, it is possible to avoid creating this API.

    Discussed earlier.

> +
> +int32_t __rte_experimental
> +rte_hash_iterate_conflict_entries(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> Signature of this API can be changed as follows:
> rte_hash_iterate_conflict_entries(
> 	struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)

    Discussed earlier.

> +{
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +
> +	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
> +		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
> +			__state->primary_bidx : __state->secondary_bidx;
> +		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
> 
> take the reader lock before reading bucket entry

    Thanks for pointing this out. We are going to do so. The lock came 
in as we go through the versions of this patch.

> +		uint32_t position = __state->h->buckets[bidx].key_idx[next];
> +		struct rte_hash_key *next_key;
> +
> +		/* Increment iterator. */
> +		__state->vnext++;
> +
> +		/*
> +		 * The test below is unlikely because this iterator is meant
> +		 * to be used after a failed insert.
> +		 */
> +		if (unlikely(position == EMPTY_SLOT))
> +			continue;
> +
> +		/* Get the entry in key table. */
> +		next_key = (struct rte_hash_key *) (
> +			(char *)__state->h->key_store +
> +			position * __state->h->key_entry_size);
> +		/* Return key and data. */
> +		*key = next_key->key;
> +		*data = next_key->pdata;
> give the reader lock

    We'll do so.

> +
> +		return position - 1;
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index 9e7d9315f..fdb01023e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -14,6 +14,8 @@
>   #include <stdint.h>
>   #include <stddef.h>
>   
> +#include <rte_compat.h>
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>   /** @internal A hash table structure. */  struct rte_hash;
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal A hash table iterator state structure.
> + */
> +struct rte_hash_iterator_state {
> +	uint8_t space[64];
> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.

    Okay.

[ ]'s
Michel Machado
Wang, Yipeng1 Sept. 4, 2018, 7:51 p.m. | #7
Hmm, I guess my comment is for code readability. If we don’t need the extra state that would be great.

I think "rte_hash" is defined as an internal data structure but expose the type to the public header. Would this work?

I propose to malloc inside function mostly because I think it is cleaner to the user. But your argument is
valid. Depending on use case I think it is OK.

Another comment is you put the total_entry in the state, is it for performance of the rte_hash_iterate?
If you use it to iterate conflict entries, especially If you reuse same "state" struct and init it again and again for different keys,
would this slow down the performance for your specific use case? 

Also iterate_conflic_entry may need reader lock protection.

Thanks
Yipeng

>-----Original Message-----
>From: Michel Machado [mailto:michel@digirati.com.br]
>Sent: Tuesday, September 4, 2018 12:08 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Qiaobin Fu <qiaobinf@bu.edu>; Richardson, Bruce <bruce.richardson@intel.com>; De
>Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; doucette@bu.edu; Wiles, Keith <keith.wiles@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Tai, Charlie
><charlie.tai@intel.com>; stephen@networkplumber.org; nd@arm.com; honnappa.nagarahalli@arm.com
>Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries
>
>Hi Yipeng,
>
>On 09/04/2018 02:55 PM, Wang, Yipeng1 wrote:
>> Do we need both of the state and istate struct? struct rte_hash_iterator_state  seems not doing much.
>> How about we only have one "state" struct and just not expose the internals to the public API, similar to the
>> rte_hash struct or rte_member_setsum struct.
>> And in _init function use rte_malloc to allocate the state and add a _free function to free it.
>
>    The purpose of have struct state is to enable applications to
>allocate iterator states on their execution stack or embedding iterator
>states in larger structs to avoid an extra malloc()/free().
>
>    Do you foresee that the upcoming new underlying algorithm of hash
>tables will need to dynamically allocate iterator states?
>
>[ ]'s
>Michel Machado
Michel Machado Sept. 4, 2018, 8:26 p.m. | #8
Hi Yipeng,

On 09/04/2018 03:51 PM, Wang, Yipeng1 wrote:
> Hmm, I guess my comment is for code readability. If we don’t need the extra state that would be great.

    Notice that applications only see the public, opaque state. And the 
private versions are scoped to the C file where they are needed.

> I think "rte_hash" is defined as an internal data structure but expose the type to the public header. Would this work?

    Exposing the private fields would bind the interface with the 
current implementation of the hash table. In the way we are proposing, 
one should be able to replace the underlying algorithm and not touching 
the header files that applications use. But, yes, your solution would 
enable applications to allocate iterator states as local variables as well.

> I propose to malloc inside function mostly because I think it is cleaner to the user. But your argument is
> valid. Depending on use case I think it is OK.

    I don't know how other applications will use this iterator, but we 
use it when our application is already overloaded. So avoiding an 
expensive operation like malloc() is a win.

> Another comment is you put the total_entry in the state, is it for performance of the rte_hash_iterate?

    We are saving one integer multiplication per call of 
rte_hash_iterate(). It's not a big deal, but since there's room in the 
state variable, we thought this would be a good idea because it grows 
with the size of the table. We didn't actually measure the effect of 
this decision.

> If you use it to iterate conflict entries, especially If you reuse same "state" struct and init it again and again for different keys,
> would this slow down the performance for your specific use case?

    Notice that the field total_entry only exists for 
rte_hash_iterate(). But even if total_entry were in the state of 
rte_hash_iterate_conflict_entries(), it would still save on the 
multiplication as long as rte_hash_iterate_conflict_entries() is called 
at least twice. Calling rte_hash_iterate_conflict_entries() once evens 
out, and calling rte_hash_iterate_conflict_entries() more times adds 
further savings. As a side note. in our application, whenever an 
iterator of conflicting entries is initialized, we call 
rte_hash_iterate_conflict_entries() at least once.

> Also iterate_conflic_entry may need reader lock protection.

    We are going to add the reader lock protection. Thanks.

[ ]'s
Michel Machado
Wang, Yipeng1 Sept. 4, 2018, 8:57 p.m. | #9
Thanks for your reply.

>-----Original Message-----
>From: Michel Machado [mailto:michel@digirati.com.br]

>    Exposing the private fields would bind the interface with the
>current implementation of the hash table. In the way we are proposing,
>one should be able to replace the underlying algorithm and not touching
>the header files that applications use. But, yes, your solution would
>enable applications to allocate iterator states as local variables as well.
>

[Wang, Yipeng] I didn't mean to expose the private fields. But only the 
Type. For example, rte_hash does not expose its private fields to users.
One can change the fields without changing API.

>
>    Notice that the field total_entry only exists for
>rte_hash_iterate(). But even if total_entry were in the state of
>rte_hash_iterate_conflict_entries(), it would still save on the
>multiplication as long as rte_hash_iterate_conflict_entries() is called
>at least twice. Calling rte_hash_iterate_conflict_entries() once evens
>out, and calling rte_hash_iterate_conflict_entries() more times adds
>further savings. As a side note. in our application, whenever an
>iterator of conflicting entries is initialized, we call
>rte_hash_iterate_conflict_entries() at least once.
>

[Wang, Yipeng] I mean the extra overhead of _init function which does the
Calculation.  For iterate_conflict_entries, since it does not need
Total_entries, _init will cost extra cycles for the calculation, especially in my
imaginary use cases, one may call _init multiple times on the same state to iterate
different keys. I guess your application is not sensitive to this overhead,
So I think I am OK with current implementation.
 
>
>[ ]'s
>Michel Machado
Michel Machado Sept. 5, 2018, 5:52 p.m. | #10
Hi Yipeng,

On 09/04/2018 04:57 PM, Wang, Yipeng1 wrote:
>> -----Original Message-----
>> From: Michel Machado [mailto:michel@digirati.com.br]
> 
>>     Exposing the private fields would bind the interface with the
>> current implementation of the hash table. In the way we are proposing,
>> one should be able to replace the underlying algorithm and not touching
>> the header files that applications use. But, yes, your solution would
>> enable applications to allocate iterator states as local variables as well.
>>
> 
> [Wang, Yipeng] I didn't mean to expose the private fields. But only the
> Type. For example, rte_hash does not expose its private fields to users.
> One can change the fields without changing API.

    The fact that struct rte_hash does not expose its private fields but 
only its type to applications means that a compiler cannot find out the 
byte length of struct rte_hash using only the header rte_hash.h. Thus, 
an application cannot allocate memory on its own (e.g. as a local 
variable) for a struct rte_hash. An application can, however, have a 
pointer to a struct rte_hash since the byte length of a pointer only 
depends on the architecture of the machine. This is the motivation 
behind having struct rte_hash_iterator_state in rte_hash.h only holding 
an array of bytes.

    There are good reasons to implement struct rte_hash as it is. For 
examples, struct rte_hash can change its byte length between versions of 
DPDK even if applications are dynamically linked to DPDK and not 
recompiled. Moreover a hash table is unlikely to be so short-lived as an 
iterator.

[ ]'s
Michel Machado
Wang, Yipeng1 Sept. 5, 2018, 8:27 p.m. | #11
Hmm I see, it falls back to my original thought to have malloc inside the init function..
Thanks for the explanation. :) 

So I guess with your implementation, in future if we change the internal state to be larger,
the ABI will be broken. BTW, this patch set also changes API so proper notice is needed.
People more familiar with API/ABI change policies may be able to help here.

Just to confirm, is there anyway like I said for your application to have some long-live states
and reuse them throughout the application so that you don’t have to have short-lived ones in stack?

Thanks
Yipeng

>
>    The fact that struct rte_hash does not expose its private fields but
>only its type to applications means that a compiler cannot find out the
>byte length of struct rte_hash using only the header rte_hash.h. Thus,
>an application cannot allocate memory on its own (e.g. as a local
>variable) for a struct rte_hash. An application can, however, have a
>pointer to a struct rte_hash since the byte length of a pointer only
>depends on the architecture of the machine. This is the motivation
>behind having struct rte_hash_iterator_state in rte_hash.h only holding
>an array of bytes.
>
>    There are good reasons to implement struct rte_hash as it is. For
>examples, struct rte_hash can change its byte length between versions of
>DPDK even if applications are dynamically linked to DPDK and not
>recompiled. Moreover a hash table is unlikely to be so short-lived as an
>iterator.
>
>[ ]'s
>Michel Machado
Honnappa Nagarahalli Sept. 5, 2018, 10:13 p.m. | #12
-----Original Message-----
From: Michel Machado <michel@digirati.com.br> 
Sent: Tuesday, September 4, 2018 2:37 PM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Qiaobin Fu <qiaobinf@bu.edu>; bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; yipeng1.wang@intel.com
Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries

Hi Honnappa,

On 09/02/2018 06:05 PM, Honnappa Nagarahalli wrote:
> +/* istate stands for internal state. */ struct 
> +rte_hash_iterator_istate {
> +	const struct rte_hash *h;
> This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.

    Discussed later.

> +	uint32_t              next;
> +	uint32_t              total_entries;
> +};
> This structure can be moved to rte_cuckoo_hash.h file.

    What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?

This is to maintain consistency. For ex: 'struct queue_node', which is an internal structure, is kept in rte_cuckoo_hash.h

> +int32_t
> +rte_hash_iterator_init(const struct rte_hash *h,
> +	struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced by 's'.
> 
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
> +	__state->h = h;
> +	__state->next = 0;
> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +
> +	return 0;
> +}
> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.

    Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?

My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.

>   int32_t
> -rte_hash_iterate(const struct rte_hash *h, const void **key, void 
> **data, uint32_t *next)
> +rte_hash_iterate(
> +	struct rte_hash_iterator_state *state, const void **key, void 
> +**data)
> 
> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
> rte_hash_iterate(const struct rte_hash *h, const void **key, void 
> **data, struct rte_hash_iterator_state *state)
> 
> This will help keep the API signature consistent with existing APIs.
> 
> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.

    The ABI will change in a way or another, so why not going for a single state instead of requiring parameters that are already needed for the initialization of the state?

Are there any cost savings we can achieve by keeping the 'h' in the iterator state?

    Thank you for the link. We'll check how to proceed with the ABI change.

>   {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced with 's'.

    Gaëtan Rivet has already pointed this out in his review of this version of our patch.

>   	uint32_t bucket_idx, idx, position;
>   	struct rte_hash_key *next_key;
>   
> -	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
>   
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>   	/* Out of bounds */
> -	if (*next >= total_entries)
> +	if (__state->next >= __state->total_entries)
>   		return -ENOENT;
>   
> 'if (__state->next == 0)' is required to avoid creating 'rte_hash_iterator_init' API.

    The argument to keep _init() is presented above in this email.

>   	/* Calculate bucket and index of current iterator */
> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   
>   	/* If current position is empty, go to the next one */
> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> -		(*next)++;
> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		__state->next++;
>   		/* End of table */
> -		if (*next == total_entries)
> +		if (__state->next == __state->total_entries)
>   			return -ENOENT;
> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   	}
> -	__hash_rw_reader_lock(h);
> +	__hash_rw_reader_lock(__state->h);
>   	/* Get position of entry in key table */
> -	position = h->buckets[bucket_idx].key_idx[idx];
> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> -				position * h->key_entry_size);
> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
> +				position * __state->h->key_entry_size);
>   	/* Return key and data */
>   	*key = next_key->key;
>   	*data = next_key->pdata;
>   
> -	__hash_rw_reader_unlock(h);
> +	__hash_rw_reader_unlock(__state->h);
>   
>   	/* Increment iterator */
> -	(*next)++;
> +	__state->next++;
> This comment is not related to this change, it is better to place this inside the lock.

    Even though __state->next does not depend on the lock?

It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?

>   	return position - 1;
>   }
> +
> +/* istate stands for internal state. */ struct 
> +rte_hash_iterator_conflict_entries_istate {
> +	const struct rte_hash *h;
> This can be moved outside of this structure.

    Discussed earlier.

> +	uint32_t              vnext;
> +	uint32_t              primary_bidx;
> +	uint32_t              secondary_bidx;
> +};
> +
> +int32_t __rte_experimental
> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
> +	hash_sig_t sig, struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +	__state->h = h;
> +	__state->vnext = 0;
> +
> +	/* Get the primary bucket index given the precomputed hash value. */
> +	__state->primary_bidx = sig & h->bucket_bitmask;
> +	/* Get the secondary bucket index given the precomputed hash value. */
> +	__state->secondary_bidx =
> +		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +
> +	return 0;
> +}
> IMO, as mentioned above, it is possible to avoid creating this API.

    Discussed earlier.

> +
> +int32_t __rte_experimental
> +rte_hash_iterate_conflict_entries(
> +	struct rte_hash_iterator_state *state, const void **key, void 
> +**data)
> Signature of this API can be changed as follows:
> rte_hash_iterate_conflict_entries(
> 	struct rte_hash *h, const void **key, void **data, struct 
> rte_hash_iterator_state *state)

    Discussed earlier.

> +{
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +
> +	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
> +		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
> +			__state->primary_bidx : __state->secondary_bidx;
> +		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
> 
> take the reader lock before reading bucket entry

    Thanks for pointing this out. We are going to do so. The lock came in as we go through the versions of this patch.

> +		uint32_t position = __state->h->buckets[bidx].key_idx[next];
> +		struct rte_hash_key *next_key;
> +
> +		/* Increment iterator. */
> +		__state->vnext++;
> +
> +		/*
> +		 * The test below is unlikely because this iterator is meant
> +		 * to be used after a failed insert.
> +		 */
> +		if (unlikely(position == EMPTY_SLOT))
> +			continue;
> +
> +		/* Get the entry in key table. */
> +		next_key = (struct rte_hash_key *) (
> +			(char *)__state->h->key_store +
> +			position * __state->h->key_entry_size);
> +		/* Return key and data. */
> +		*key = next_key->key;
> +		*data = next_key->pdata;
> give the reader lock

    We'll do so.

> +
> +		return position - 1;
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h 
> index 9e7d9315f..fdb01023e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -14,6 +14,8 @@
>   #include <stdint.h>
>   #include <stddef.h>
>   
> +#include <rte_compat.h>
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>   /** @internal A hash table structure. */  struct rte_hash;
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal A hash table iterator state structure.
> + */
> +struct rte_hash_iterator_state {
> +	uint8_t space[64];
> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.

    Okay.

[ ]'s
Michel Machado
Michel Machado Sept. 6, 2018, 1:34 p.m. | #13
On 09/05/2018 04:27 PM, Wang, Yipeng1 wrote:
> Hmm I see, it falls back to my original thought to have malloc inside the init function..
> Thanks for the explanation. :)
> 
> So I guess with your implementation, in future if we change the internal state to be larger,
> the ABI will be broken.

    If that happens, yes, the ABI would need to change again. But this 
concern is overblown for two reasons. First, this event is unlikely to 
happen because struct rte_hash_iterator_state is already allocating 64 
bytes while struct rte_hash_iterator_istate and struct 
rte_hash_iterator_conflict_entries_istate consume 16 and 20 bytes, 
respectively. Thus, the complexity of the underlying hash algorithm 
would need to grow substantially to force the necessary state of these 
iterators to grow more than 4x and 3x, respectively. This is unlikely to 
happen, and, if it does, it would likely break the ABI somewhere else 
and have a high impact on applications anyway.

    Second, even if the unlikely event happens, all one would need to do 
is to increase the size of struct rte_hash_iterator_state, mark the new 
API as a new version, and applications would be ready for the new ABI 
just recompiling.

> BTW, this patch set also changes API so proper notice is needed.
> People more familiar with API/ABI change policies may be able to help here.

    We'd be happy to get feedback on this aspect.

> Just to confirm, is there anyway like I said for your application to have some long-live states
> and reuse them throughout the application so that you don’t have to have short-lived ones in stack?

    Two things would need to happen for this to be possible. The init 
functions would need to accept previously allocated iterator states, 
that is, the init function would act as a reset of the state when acting 
on a previous allocated state. And, applications would now need to carry 
these pre-allocated state to avoid a malloc. In order words, we'll 
increase the complexity of the API.

    To emphasize that the cost of a malloc is not negligible, 
rte_malloc() needs to get a spinlock (see heap_alloc_on_socket()), do 
its thing to allocate memory, and, if the first attempt fails, try to 
allocate the memory on other sockets (see end of malloc_heap_alloc()). 
For an iterator that goes through the whole hash table, this cost may be 
okay, but for an iterator that goes through a couple entries, this cost 
is a lot to add.

    This memory allocation concern is not new. Function 
rte_pktmbuf_read(), for example, let applications pass buffers, which 
are often allocated in the execution stack, to avoid the malloc cost.

[ ]'s
Michel Machado
Michel Machado Sept. 6, 2018, 2:28 p.m. | #14
On 09/05/2018 06:13 PM, Honnappa Nagarahalli wrote:
>> +	uint32_t              next;
>> +	uint32_t              total_entries;
>> +};
>> This structure can be moved to rte_cuckoo_hash.h file.
> 
>      What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
> 
> This is to maintain consistency. For ex: 'struct queue_node', which is an internal structure, is kept in rte_cuckoo_hash.h

    Okay. We'll move it there.

>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> +	struct rte_hash_iterator_state *state) {
>> +	struct rte_hash_iterator_istate *__state;
>> '__state' can be replaced by 's'.
>>
>> +
>> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>> +
>> +	__state = (struct rte_hash_iterator_istate *)state;
>> +	__state->h = h;
>> +	__state->next = 0;
>> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>> +
>> +	return 0;
>> +}
>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
> 
>      Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
> 
> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.

    It's not just about creating one init function for each iterator 
because an iterator may have a couple of init functions. For example, 
someone may eventually find useful to add another init function for the 
conflicting-entry iterator that we are advocating in this patch. A 
possibility would be for this new init function to use the key of the 
new entry instead of its signature to initialize the state. Similar to 
what is already done in rte_hash_lookup*() functions. In spite of 
possibly having multiple init functions, there will be a single iterator 
function.

    About the performance benchmarking, the current API only requites 
applications to initialize a single 32-bit integer. But with the 
adoption of a struct for the state, the initialization will grow to 64 
bytes.

>>    int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, uint32_t *next)
>> +rte_hash_iterate(
>> +	struct rte_hash_iterator_state *state, const void **key, void
>> +**data)
>>
>> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, struct rte_hash_iterator_state *state)
>>
>> This will help keep the API signature consistent with existing APIs.
>>
>> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
> 
>      The ABI will change in a way or another, so why not going for a single state instead of requiring parameters that are already needed for the initialization of the state?
> 
> Are there any cost savings we can achieve by keeping the 'h' in the iterator state?

    There's a tiny cost saving: avoiding to push that parameter in the 
execution stack every time the iterator will get another entry. However, 
the reason I find more important is to make impossible to introduce a 
bug in the code. Consider a function that is dealing with two hash 
tables and two iterators. Without asking for the hash table to make 
progress in an iterator, it's impossible to mix up hash tables and 
iterator states.

    There's even the possibility that an iterator doesn't need the hash 
table after its initialization. This would be an *unlikely* case, but 
consider an iterator that only returns a couple of entries. It could 
cache those entries during initialization.

>>    	/* Calculate bucket and index of current iterator */
>> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    
>>    	/* If current position is empty, go to the next one */
>> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> -		(*next)++;
>> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> +		__state->next++;
>>    		/* End of table */
>> -		if (*next == total_entries)
>> +		if (__state->next == __state->total_entries)
>>    			return -ENOENT;
>> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    	}
>> -	__hash_rw_reader_lock(h);
>> +	__hash_rw_reader_lock(__state->h);
>>    	/* Get position of entry in key table */
>> -	position = h->buckets[bucket_idx].key_idx[idx];
>> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> -				position * h->key_entry_size);
>> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
>> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>> +				position * __state->h->key_entry_size);
>>    	/* Return key and data */
>>    	*key = next_key->key;
>>    	*data = next_key->pdata;
>>    
>> -	__hash_rw_reader_unlock(h);
>> +	__hash_rw_reader_unlock(__state->h);
>>    
>>    	/* Increment iterator */
>> -	(*next)++;
>> +	__state->next++;
>> This comment is not related to this change, it is better to place this inside the lock.
> 
>      Even though __state->next does not depend on the lock?
> 
> It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?

    If an application needs to share an iterator state between threads, 
it has to have a synchronization mechanism for that as it would for any 
other shared variable. The lock above is allowing applications to share 
a hash table between threads, it has no semantic over anything else.

>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>> index 9e7d9315f..fdb01023e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -14,6 +14,8 @@
>>    #include <stdint.h>
>>    #include <stddef.h>
>>    
>> +#include <rte_compat.h>
>> +
>>    #ifdef __cplusplus
>>    extern "C" {
>>    #endif
>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>    /** @internal A hash table structure. */  struct rte_hash;
>>    
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @internal A hash table iterator state structure.
>> + */
>> +struct rte_hash_iterator_state {
>> +	uint8_t space[64];
>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
> 
>      Okay.

    I think we should not replace 64 with RTE_CACHE_LINE_SIZE because 
the ABI would change based on the architecture for which it's compiled.

[ ]'s
Michel Machado
Honnappa Nagarahalli Sept. 12, 2018, 8:37 p.m. | #15
Hi Michel,
	I applied your patch and tweaked the code to run few performance tests on Arm (Cortex-A72, 1.3GHz) and x86 (Intel Xeon CPU E5-2660 v4 @ 2.00GHz). The perf code looks as follows:

        count_b = rte_rdtsc_precise();
        int k = 0;
        rte_hash_iterator_init(tbl_rw_test_param.h, &state);

        while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
                /* Search for the key in the list of keys added .*/
                i = *(const uint32_t *)next_key;
                tbl_rw_test_param.found[i]++;
                k++;
        }

        count_a = rte_rdtsc_precise() - count_b;
        printf("*****Cycles2 per iterate call: %lu\n", count_a/k);

Further, I changed the rte_hash_iterate as follows and ran the same test.
int32_t rte_hash_iterate(const struct rte_hash *h, struct rte_hash_iterator_state *state, const void **key, void **data)

Finally, I used memset in the place of rte_hash_iterator_init with the required changes to rte_hash_iterate.

All these tests show little variation in 'cycles per iterate call' for both architectures.


-----Original Message-----
From: Michel Machado <michel@digirati.com.br> 
Sent: Thursday, September 6, 2018 9:29 AM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Qiaobin Fu <qiaobinf@bu.edu>; bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; yipeng1.wang@intel.com
Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries

On 09/05/2018 06:13 PM, Honnappa Nagarahalli wrote:
>> +	uint32_t              next;
>> +	uint32_t              total_entries;
>> +};
>> This structure can be moved to rte_cuckoo_hash.h file.
> 
>      What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
> 
> This is to maintain consistency. For ex: 'struct queue_node', which is 
> an internal structure, is kept in rte_cuckoo_hash.h

    Okay. We'll move it there.

>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> +	struct rte_hash_iterator_state *state) {
>> +	struct rte_hash_iterator_istate *__state;
>> '__state' can be replaced by 's'.
>>
>> +
>> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>> +
>> +	__state = (struct rte_hash_iterator_istate *)state;
>> +	__state->h = h;
>> +	__state->next = 0;
>> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>> +
>> +	return 0;
>> +}
>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
> 
>      Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
> 
> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.

    It's not just about creating one init function for each iterator because an iterator may have a couple of init functions. For example, someone may eventually find useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. A possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state. Similar to what is already done in rte_hash_lookup*() functions. In spite of possibly having multiple init functions, there will be a single iterator function.

    About the performance benchmarking, the current API only requites applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.

As my tests showed, I do not see any impact of this.

>>    int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void 
>> **data, uint32_t *next)
>> +rte_hash_iterate(
>> +	struct rte_hash_iterator_state *state, const void **key, void
>> +**data)
>>
>> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void 
>> **data, struct rte_hash_iterator_state *state)
>>
>> This will help keep the API signature consistent with existing APIs.
>>
>> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
> 
>      The ABI will change in a way or another, so why not going for a single state instead of requiring parameters that are already needed for the initialization of the state?
> 
> Are there any cost savings we can achieve by keeping the 'h' in the iterator state?

    There's a tiny cost saving: avoiding to push that parameter in the execution stack every time the iterator will get another entry. However, the reason I find more important is to make impossible to introduce a bug in the code. Consider a function that is dealing with two hash tables and two iterators. Without asking for the hash table to make progress in an iterator, it's impossible to mix up hash tables and iterator states.

IMO, similar arguments can be applied for other APIs too. It is more difficult to use the APIs if they are not consistent. I also do not see the benefit of the savings in my tests. 

    There's even the possibility that an iterator doesn't need the hash table after its initialization. This would be an *unlikely* case, but consider an iterator that only returns a couple of entries. It could cache those entries during initialization.

>>    	/* Calculate bucket and index of current iterator */
>> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    
>>    	/* If current position is empty, go to the next one */
>> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> -		(*next)++;
>> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> +		__state->next++;
>>    		/* End of table */
>> -		if (*next == total_entries)
>> +		if (__state->next == __state->total_entries)
>>    			return -ENOENT;
>> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    	}
>> -	__hash_rw_reader_lock(h);
>> +	__hash_rw_reader_lock(__state->h);
>>    	/* Get position of entry in key table */
>> -	position = h->buckets[bucket_idx].key_idx[idx];
>> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> -				position * h->key_entry_size);
>> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
>> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>> +				position * __state->h->key_entry_size);
>>    	/* Return key and data */
>>    	*key = next_key->key;
>>    	*data = next_key->pdata;
>>    
>> -	__hash_rw_reader_unlock(h);
>> +	__hash_rw_reader_unlock(__state->h);
>>    
>>    	/* Increment iterator */
>> -	(*next)++;
>> +	__state->next++;
>> This comment is not related to this change, it is better to place this inside the lock.
> 
>      Even though __state->next does not depend on the lock?
> 
> It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?

    If an application needs to share an iterator state between threads, it has to have a synchronization mechanism for that as it would for any other shared variable. The lock above is allowing applications to share a hash table between threads, it has no semantic over anything else.

Agree, the lock is for protecting the hash table, not the iterator state.

>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h 
>> index 9e7d9315f..fdb01023e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -14,6 +14,8 @@
>>    #include <stdint.h>
>>    #include <stddef.h>
>>    
>> +#include <rte_compat.h>
>> +
>>    #ifdef __cplusplus
>>    extern "C" {
>>    #endif
>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>    /** @internal A hash table structure. */  struct rte_hash;
>>    
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @internal A hash table iterator state structure.
>> + */
>> +struct rte_hash_iterator_state {
>> +	uint8_t space[64];
>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
> 
>      Okay.

    I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.

Ok. May be have a #define for 64?

[ ]'s
Michel Machado
Michel Machado Sept. 20, 2018, 7:50 p.m. | #16
On 09/12/2018 04:37 PM, Honnappa Nagarahalli wrote:
>>> +int32_t
>>> +rte_hash_iterator_init(const struct rte_hash *h,
>>> +	struct rte_hash_iterator_state *state) {
>>> +	struct rte_hash_iterator_istate *__state;
>>> '__state' can be replaced by 's'.
>>>
>>> +
>>> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>>> +
>>> +	__state = (struct rte_hash_iterator_istate *)state;
>>> +	__state->h = h;
>>> +	__state->next = 0;
>>> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>>> +
>>> +	return 0;
>>> +}
>>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
>>
>>       Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
>>
>> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.
> 
>      It's not just about creating one init function for each iterator because an iterator may have a couple of init functions. For example, someone may eventually find useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. A possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state. Similar to what is already done in rte_hash_lookup*() functions. In spite of possibly having multiple init functions, there will be a single iterator function.
> 
>      About the performance benchmarking, the current API only requites applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.
> 
> As my tests showed, I do not see any impact of this.

    Ok, we are going to eliminate the init functions in v4.

>>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>>> index 9e7d9315f..fdb01023e 100644
>>> --- a/lib/librte_hash/rte_hash.h
>>> +++ b/lib/librte_hash/rte_hash.h
>>> @@ -14,6 +14,8 @@
>>>     #include <stdint.h>
>>>     #include <stddef.h>
>>>     
>>> +#include <rte_compat.h>
>>> +
>>>     #ifdef __cplusplus
>>>     extern "C" {
>>>     #endif
>>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>>     /** @internal A hash table structure. */  struct rte_hash;
>>>     
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * @internal A hash table iterator state structure.
>>> + */
>>> +struct rte_hash_iterator_state {
>>> +	uint8_t space[64];
>>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
>>
>>       Okay.
> 
>      I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.
> 
> Ok. May be have a #define for 64?

    Ok.

[ ]'s
Michel Machado

Patch

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index c8c435dfd..9be58a205 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -6,6 +6,7 @@  include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_hash.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 LDLIBS += -lrte_eal -lrte_ring
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8c9..cf5b28196 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1300,45 +1300,143 @@  rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
 	return __builtin_popcountl(*hit_mask);
 }
 
+/* istate stands for internal state. */
+struct rte_hash_iterator_istate {
+	const struct rte_hash *h;
+	uint32_t              next;
+	uint32_t              total_entries;
+};
+
+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+	struct rte_hash_iterator_state *state)
+{
+	struct rte_hash_iterator_istate *__state;
+
+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_istate *)state;
+	__state->h = h;
+	__state->next = 0;
+	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+	return 0;
+}
+
 int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
+rte_hash_iterate(
+	struct rte_hash_iterator_state *state, const void **key, void **data)
 {
+	struct rte_hash_iterator_istate *__state;
 	uint32_t bucket_idx, idx, position;
 	struct rte_hash_key *next_key;
 
-	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+		(data == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_istate *)state;
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
 	/* Out of bounds */
-	if (*next >= total_entries)
+	if (__state->next >= __state->total_entries)
 		return -ENOENT;
 
 	/* Calculate bucket and index of current iterator */
-	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-	idx = *next % RTE_HASH_BUCKET_ENTRIES;
+	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
-		(*next)++;
+	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		__state->next++;
 		/* End of table */
-		if (*next == total_entries)
+		if (__state->next == __state->total_entries)
 			return -ENOENT;
-		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-		idx = *next % RTE_HASH_BUCKET_ENTRIES;
+		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
 	}
-	__hash_rw_reader_lock(h);
+	__hash_rw_reader_lock(__state->h);
 	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
-	next_key = (struct rte_hash_key *) ((char *)h->key_store +
-				position * h->key_entry_size);
+	position = __state->h->buckets[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
+				position * __state->h->key_entry_size);
 	/* Return key and data */
 	*key = next_key->key;
 	*data = next_key->pdata;
 
-	__hash_rw_reader_unlock(h);
+	__hash_rw_reader_unlock(__state->h);
 
 	/* Increment iterator */
-	(*next)++;
+	__state->next++;
 
 	return position - 1;
 }
+
+/* istate stands for internal state. */
+struct rte_hash_iterator_conflict_entries_istate {
+	const struct rte_hash *h;
+	uint32_t              vnext;
+	uint32_t              primary_bidx;
+	uint32_t              secondary_bidx;
+};
+
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
+	hash_sig_t sig, struct rte_hash_iterator_state *state)
+{
+	struct rte_hash_iterator_conflict_entries_istate *__state;
+
+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
+	__state->h = h;
+	__state->vnext = 0;
+
+	/* Get the primary bucket index given the precomputed hash value. */
+	__state->primary_bidx = sig & h->bucket_bitmask;
+	/* Get the secondary bucket index given the precomputed hash value. */
+	__state->secondary_bidx =
+		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
+
+	return 0;
+}
+
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+	struct rte_hash_iterator_state *state, const void **key, void **data)
+{
+	struct rte_hash_iterator_conflict_entries_istate *__state;
+
+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+		(data == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
+
+	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
+		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
+			__state->primary_bidx : __state->secondary_bidx;
+		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
+		uint32_t position = __state->h->buckets[bidx].key_idx[next];
+		struct rte_hash_key *next_key;
+
+		/* Increment iterator. */
+		__state->vnext++;
+
+		/*
+		 * The test below is unlikely because this iterator is meant
+		 * to be used after a failed insert.
+		 */
+		if (unlikely(position == EMPTY_SLOT))
+			continue;
+
+		/* Get the entry in key table. */
+		next_key = (struct rte_hash_key *) (
+			(char *)__state->h->key_store +
+			position * __state->h->key_entry_size);
+		/* Return key and data. */
+		*key = next_key->key;
+		*data = next_key->pdata;
+
+		return position - 1;
+	}
+
+	return -ENOENT;
+}
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d9315f..fdb01023e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -14,6 +14,8 @@ 
 #include <stdint.h>
 #include <stddef.h>
 
+#include <rte_compat.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -64,6 +66,16 @@  struct rte_hash_parameters {
 /** @internal A hash table structure. */
 struct rte_hash;
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal A hash table iterator state structure.
+ */
+struct rte_hash_iterator_state {
+	uint8_t space[64];
+} __rte_cache_aligned;
+
 /**
  * Create a new hash table.
  *
@@ -443,26 +455,82 @@  rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		      uint32_t num_keys, int32_t *positions);
 
 /**
- * Iterate through the hash table, returning key-value pairs.
+ * Initialize the iterator over the hash table.
  *
  * @param h
- *   Hash table to iterate
+ *   Hash table to iterate.
+ * @param state
+ *   Pointer to the iterator state.
+ * @return
+ *   - 0 if successful.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+	struct rte_hash_iterator_state *state);
+
+/**
+ * Iterate through the hash table, returning key-value pairs.
+ *
+ * @param state
+ *   Pointer to the iterator state.
  * @param key
  *   Output containing the key where current iterator
  *   was pointing at
  * @param data
  *   Output containing the data associated with key.
  *   Returns NULL if data was not stored.
- * @param next
- *   Pointer to iterator. Should be 0 to start iterating the hash table.
- *   Iterator is incremented after each call of this function.
  * @return
  *   Position where key was stored, if successful.
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if end of the hash table.
  */
 int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
+rte_hash_iterate(
+	struct rte_hash_iterator_state *state, const void **key, void **data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the iterator over entries that conflict with a given hash.
+ *
+ * @param h
+ *   Hash table to iterate.
+ * @param sig
+ *   Precomputed hash value with which the returning entries conflict.
+ * @param state
+ *   Pointer to the iterator state.
+ * @return
+ *   - 0 if successful.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
+	hash_sig_t sig, struct rte_hash_iterator_state *state);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Iterate over entries that conflict with a given hash.
+ *
+ * @param state
+ *   Pointer to the iterator state.
+ * @param key
+ *   Output containing the key at where the iterator is currently pointing.
+ * @param data
+ *   Output containing the data associated with key.
+ *   Returns NULL if data was not stored.
+ * @return
+ *   Position where key was stored, if successful.
+ *   - -EINVAL if the parameters are invalid.
+ *   - -ENOENT if there is no more conflicting entries.
+ */
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+	struct rte_hash_iterator_state *state, const void **key, void **data);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index e216ac8e2..301d4638c 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -24,6 +24,7 @@  DPDK_2.1 {
 
 	rte_hash_add_key_data;
 	rte_hash_add_key_with_hash_data;
+	rte_hash_iterator_init;
 	rte_hash_iterate;
 	rte_hash_lookup_bulk_data;
 	rte_hash_lookup_data;
@@ -53,3 +54,10 @@  DPDK_18.08 {
 	rte_hash_count;
 
 } DPDK_16.07;
+
+EXPERIMENTAL {
+	global:
+
+	rte_hash_iterator_conflict_entries_init_with_hash;
+	rte_hash_iterate_conflict_entries;
+};
diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd10..bf57004c3 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -1170,8 +1170,8 @@  static int test_hash_iteration(void)
 	void *next_data;
 	void *data[NUM_ENTRIES];
 	unsigned added_keys;
-	uint32_t iter = 0;
 	int ret = 0;
+	struct rte_hash_iterator_state state;
 
 	ut_params.entries = NUM_ENTRIES;
 	ut_params.name = "test_hash_iteration";
@@ -1180,6 +1180,9 @@  static int test_hash_iteration(void)
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
+			"initialization of the hash iterator failed");
+
 	/* Add random entries until key cannot be added */
 	for (added_keys = 0; added_keys < NUM_ENTRIES; added_keys++) {
 		data[added_keys] = (void *) ((uintptr_t) rte_rand());
@@ -1191,7 +1194,7 @@  static int test_hash_iteration(void)
 	}
 
 	/* Iterate through the hash table */
-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added */
 		for (i = 0; i < NUM_ENTRIES; i++) {
 			if (memcmp(next_key, keys[i], ut_params.key_len) == 0) {
diff --git a/test/test/test_hash_multiwriter.c b/test/test/test_hash_multiwriter.c
index 6a3eb10bd..48db8007d 100644
--- a/test/test/test_hash_multiwriter.c
+++ b/test/test/test_hash_multiwriter.c
@@ -125,18 +125,22 @@  test_hash_multiwriter(void)
 
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	uint32_t count;
 
+	struct rte_hash_iterator_state state;
+
 	snprintf(name, 32, "test%u", calledCount++);
 	hash_params.name = name;
 
 	handle = rte_hash_create(&hash_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
+			"initialization of the hash iterator failed");
+
 	tbl_multiwriter_test_params.h = handle;
 	tbl_multiwriter_test_params.nb_tsx_insertion =
 		nb_total_tsx_insertion / rte_lcore_count();
@@ -203,7 +207,7 @@  test_hash_multiwriter(void)
 		goto err3;
 	}
 
-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added .*/
 		i = *(const uint32_t *)next_key;
 		tbl_multiwriter_test_params.found[i]++;
diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d80..9cdab9992 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -166,12 +166,13 @@  test_hash_readwrite_functional(int use_htm)
 	unsigned int i;
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	int use_jhash = 1;
 
+	struct rte_hash_iterator_state state;
+
 	rte_atomic64_init(&gcycles);
 	rte_atomic64_clear(&gcycles);
 
@@ -188,6 +189,8 @@  test_hash_readwrite_functional(int use_htm)
 		tbl_rw_test_param.num_insert
 		* rte_lcore_count();
 
+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
+
 	printf("++++++++Start function tests:+++++++++\n");
 
 	/* Fire all threads. */
@@ -195,8 +198,7 @@  test_hash_readwrite_functional(int use_htm)
 				 NULL, CALL_MASTER);
 	rte_eal_mp_wait_lcore();
 
-	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
-			&next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added .*/
 		i = *(const uint32_t *)next_key;
 		tbl_rw_test_param.found[i]++;
@@ -315,9 +317,10 @@  test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 	int use_jhash = 0;
 
+	struct rte_hash_iterator_state state;
+
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 
@@ -336,6 +339,8 @@  test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 	if (init_params(use_htm, use_jhash) != 0)
 		goto err;
 
+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
+
 	/*
 	 * Do a readers finish faster or writers finish faster test.
 	 * When readers finish faster, we timing the readers, and when writers
@@ -484,8 +489,7 @@  test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 
 		rte_eal_mp_wait_lcore();
 
-		while (rte_hash_iterate(tbl_rw_test_param.h,
-				&next_key, &next_data, &iter) >= 0) {
+		while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 			/* Search for the key in the list of keys added .*/
 			i = *(const uint32_t *)next_key;
 			tbl_rw_test_param.found[i]++;