diff mbox series

hash table: add a bucket iterator function

Message ID 20180728174851.46422-1-qiaobinf@bu.edu (mailing list archive)
State Changes Requested, archived
Headers show
Series hash table: add a bucket iterator function | expand

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Fu, Qiaobin July 28, 2018, 5:48 p.m. UTC
Function rte_hash_bucket_iterate() enables callers to
incrementally iterate over the hash table bucket by bucket,
so that it can avoid creating hiccups and thrashing the cache
of the processor.

This patch mainly deals with cases in which
the hash table is full and one needs to decide if the incoming
entry is more valuable than the entries already in the hash table.

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
---
 lib/librte_hash/rte_cuckoo_hash.c    | 60 +++++++++++++++++++++++++
 lib/librte_hash/rte_hash.h           | 66 ++++++++++++++++++++++++++++
 lib/librte_hash/rte_hash_version.map |  4 ++
 3 files changed, 130 insertions(+)

Comments

Wiles, Keith July 29, 2018, 1:17 p.m. UTC | #1
> On Jul 28, 2018, at 12:48 PM, Qiaobin Fu <qiaobinf@bu.edu> wrote:
> 
> Function rte_hash_bucket_iterate() enables callers to
> incrementally iterate over the hash table bucket by bucket,
> so that it can avoid creating hiccups and thrashing the cache
> of the processor.
> 
> This patch mainly deals with cases in which
> the hash table is full and one needs to decide if the incoming
> entry is more valuable than the entries already in the hash table.
> 
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Cody Doucette <doucette@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> lib/librte_hash/rte_cuckoo_hash.c    | 60 +++++++++++++++++++++++++
> lib/librte_hash/rte_hash.h           | 66 ++++++++++++++++++++++++++++
> lib/librte_hash/rte_hash_version.map |  4 ++
> 3 files changed, 130 insertions(+)
> 
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> index a07543a29..2b90f3593 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -1160,3 +1160,63 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
> 
> 	return position - 1;
> }
> +
> +int
> +rte_hash_get_num_buckets(struct rte_hash *h)
> +{
> +	RETURN_IF_TRUE((h == NULL), -EINVAL);
> +	return h->num_buckets;
> +}
> +
> +uint32_t
> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig)
> +{
> +	return sig & h->bucket_bitmask;
> +}
> +
> +uint32_t
> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig)
> +{
> +	return rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +}

The above two functions should have the RETURN_IF_TRUE((h == NULL), -EINVAL); statements just like the first one.

> +
> +int32_t
> +rte_hash_bucket_iterate(const struct rte_hash *h,
> +	const void **key, void **data, uint32_t *bidx, uint32_t *next)
> +{
> +	uint32_t position;
> +	struct rte_hash_key *next_key;
> +
> +	RETURN_IF_TRUE(((h == NULL) || (key == NULL) || (data == NULL) ||
> +		(bidx == NULL) || (next == NULL)), -EINVAL);
> +
> +	/* Out of bounds. */
> +	if (*bidx >= h->num_buckets || *next >= RTE_HASH_BUCKET_ENTRIES)
> +		goto next_bucket;
> +
> +	/* If current position is empty, go to the next one. */
> +	while (h->buckets[*bidx].key_idx[*next] == EMPTY_SLOT) {
> +		(*next)++;
> +		/* End of bucket. */
> +		if (*next >= RTE_HASH_BUCKET_ENTRIES)
> +			goto next_bucket;
> +	}
> +
> +	/* Get position of entry in key table. */
> +	position = h->buckets[*bidx].key_idx[*next];
> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> +				position * h->key_entry_size);
> +	/* Return key and data. */
> +	*key = next_key->key;
> +	*data = next_key->pdata;
> +
> +	/* Increment iterator. */
> +	(*next)++;
> +
> +	return position - 1;
> +
> +next_bucket:
> +	*bidx = (*bidx + 1) >= h->num_buckets ? 0 : *bidx + 1;
> +	*next = 0;

If this is an error case why are we setting *next to zero and incrementing *bidx ?
I would expect we should not change the values on an error because they cannot debug the values on return.

> +	return -ENOENT;
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index f71ca9fbf..329bf42d6 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -94,6 +94,17 @@ rte_hash_create(const struct rte_hash_parameters *params);
>  */
> void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func);
> 
> +/**
> + * Get the number of buckets in the hash table.
> + * @param h
> + *   Hash table for which the function queries.
> + * @return
> + *   The number of buckets in the hash table h.
> + *    - EINVAL - invalid parameter passed to function
> + */
> +int

Needs to change to ‘int __rte_experimental’

> +rte_hash_get_num_buckets(struct rte_hash *h);
> +
> /**
>  * Find an existing hash table object and return a pointer to it.
>  *
> @@ -261,6 +272,34 @@ int
> rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
> 			       void **key);
> 
> +/**
> + * Get the primary bucket index given the precomputed hash value.
> + * This operation is multi-thread safe.
> + *
> + * @param h
> + *   Hash table to get the key from.
> + * @param sig
> + *   Precomputed hash value.
> + * @return
> + *   The primary bucket index.

Comment the error case here.
> + */
> +uint32_t

Needs to change to ‘uint32_t __rte_experimental’
> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig);
> +
> +/**
> + * Get the secondary bucket index given the precomputed hash value.
> + * This operation is multi-thread safe.
> + *
> + * @param h
> + *   Hash table to get the key from.
> + * @param sig
> + *   Precomputed hash value.
> + * @return
> + *   The primary bucket index.

Comment the error case here.
> + */
> +uint32_t

Needs to change to ‘uint32_t __rte_experimental’
> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig);
> +
> /**
>  * Find a key-value pair in the hash table.
>  * This operation is multi-thread safe.
> @@ -419,6 +458,33 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>  */
> int32_t
> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
> +
> +/**
> + * Iterate through the buckets in hash table, returning key-value pairs.
> + *
> + * @param h
> + *   Hash table to iterate
> + * @param key
> + *   Output containing the key where current iterator
> + *   was pointing at
> + * @param data
> + *   Output containing the data associated with key.
> + *   Returns NULL if data was not stored.
> + * @param bidx
> + *   Pointer to the current bucket index. Should be 0 to start
> + *   iterating the hash table. Iterator is incremented after
> + *   RTE_HASH_BUCKET_ENTRIES calls of this function.

The wording of the last line seem odd to me, should the ‘of’ be removed?

> + * @param next
> + *   Pointer to iterator. Should be 0 to start iterating the bucket.
> + *   Iterator is incremented after each call of this function.
> + * @return
> + *   Position where key was stored, if successful.
> + *   - -EINVAL if the parameters are invalid.
> + *   - -ENOENT if end of the bucket.
> + */
> +int32_t

Needs to change to ‘int32_t __rte_experimental’ and why use ‘int32_t' when everywhere else we are using ‘int'?

> +rte_hash_bucket_iterate(const struct rte_hash *h,
> +	const void **key, void **data, uint32_t *bidx, uint32_t *next);

Should have a blank line here before the #ifdef.
> #ifdef __cplusplus
> }
> #endif
> diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
> index 52a2576f9..7d48f32c9 100644
> --- a/lib/librte_hash/rte_hash_version.map
> +++ b/lib/librte_hash/rte_hash_version.map
> @@ -15,6 +15,10 @@ DPDK_2.0 {
> 	rte_hash_lookup;
> 	rte_hash_lookup_bulk;
> 	rte_hash_lookup_with_hash;
> +	rte_hash_get_num_buckets;
> +	rte_hash_get_primary_bucket;
> +	rte_hash_get_secondary_bucket;
> +	rte_hash_bucket_iterate;

These must be added to a EXPERIMENTAL clause in the file and later you can remove the experimental indictor.

> 
> 	local: *;
> };
> -- 
> 2.17.1
> 

Regards,
Keith
Wang, Yipeng1 July 30, 2018, 11:24 p.m. UTC | #2
Hi, Qiaobin, 

Thanks for the patch. If I understand correctly your use case is to use hash table as a "cache" that new entries should evict stale ones automatically. Could you be more specific on your use scenarios so that I can have more context? 

We are actually working on an extendable version of rte_hash which will link extra buckets to current table if the table is full. As long as the table is sized appropriately, there will be no conflict issues thus you don’t need to replace old entries. Will this solve your issue? Also, if the “cache” mode is what you want, we have the membership library “rte_member”. Is it applicable to your case? 

w.r.t. the patch set you proposed, my concern is that the new API will expose too much internals to the user, e.g. the bucket/entries structure should be implementation dependent. Exposing internal structures would prevent improving the implementation down the road unless we keep changing API which is not recommended.  


>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
>Sent: Sunday, July 29, 2018 6:17 AM
>To: Qiaobin Fu <qiaobinf@bu.edu>
>Cc: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org;
>michel@digirati.com.br; doucette@bu.edu
>Subject: Re: [dpdk-dev] [PATCH] hash table: add a bucket iterator function
>
>
>
>> On Jul 28, 2018, at 12:48 PM, Qiaobin Fu <qiaobinf@bu.edu> wrote:
>>
>> Function rte_hash_bucket_iterate() enables callers to
>> incrementally iterate over the hash table bucket by bucket,
>> so that it can avoid creating hiccups and thrashing the cache
>> of the processor.
>>
>> This patch mainly deals with cases in which
>> the hash table is full and one needs to decide if the incoming
>> entry is more valuable than the entries already in the hash table.
>>
>> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>> Reviewed-by: Cody Doucette <doucette@bu.edu>
>> Reviewed-by: Michel Machado <michel@digirati.com.br>
>> ---
>> lib/librte_hash/rte_cuckoo_hash.c    | 60 +++++++++++++++++++++++++
>> lib/librte_hash/rte_hash.h           | 66 ++++++++++++++++++++++++++++
>> lib/librte_hash/rte_hash_version.map |  4 ++
>> 3 files changed, 130 insertions(+)
>>
>> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>> index a07543a29..2b90f3593 100644
>> --- a/lib/librte_hash/rte_cuckoo_hash.c
>> +++ b/lib/librte_hash/rte_cuckoo_hash.c
>> @@ -1160,3 +1160,63 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>>
>> 	return position - 1;
>> }
>> +
>> +int
>> +rte_hash_get_num_buckets(struct rte_hash *h)
>> +{
>> +	RETURN_IF_TRUE((h == NULL), -EINVAL);
>> +	return h->num_buckets;
>> +}
>> +
>> +uint32_t
>> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig)
>> +{
>> +	return sig & h->bucket_bitmask;
>> +}
>> +
>> +uint32_t
>> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig)
>> +{
>> +	return rte_hash_secondary_hash(sig) & h->bucket_bitmask;
>> +}
>
>The above two functions should have the RETURN_IF_TRUE((h == NULL), -EINVAL); statements just like the first one.
>
>> +
>> +int32_t
>> +rte_hash_bucket_iterate(const struct rte_hash *h,
>> +	const void **key, void **data, uint32_t *bidx, uint32_t *next)
>> +{
>> +	uint32_t position;
>> +	struct rte_hash_key *next_key;
>> +
>> +	RETURN_IF_TRUE(((h == NULL) || (key == NULL) || (data == NULL) ||
>> +		(bidx == NULL) || (next == NULL)), -EINVAL);
>> +
>> +	/* Out of bounds. */
>> +	if (*bidx >= h->num_buckets || *next >= RTE_HASH_BUCKET_ENTRIES)
>> +		goto next_bucket;
>> +
>> +	/* If current position is empty, go to the next one. */
>> +	while (h->buckets[*bidx].key_idx[*next] == EMPTY_SLOT) {
>> +		(*next)++;
>> +		/* End of bucket. */
>> +		if (*next >= RTE_HASH_BUCKET_ENTRIES)
>> +			goto next_bucket;
>> +	}
>> +
>> +	/* Get position of entry in key table. */
>> +	position = h->buckets[*bidx].key_idx[*next];
>> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> +				position * h->key_entry_size);
>> +	/* Return key and data. */
>> +	*key = next_key->key;
>> +	*data = next_key->pdata;
>> +
>> +	/* Increment iterator. */
>> +	(*next)++;
>> +
>> +	return position - 1;
>> +
>> +next_bucket:
>> +	*bidx = (*bidx + 1) >= h->num_buckets ? 0 : *bidx + 1;
>> +	*next = 0;
>
>If this is an error case why are we setting *next to zero and incrementing *bidx ?
>I would expect we should not change the values on an error because they cannot debug the values on return.
>
>> +	return -ENOENT;
>> +}
>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>> index f71ca9fbf..329bf42d6 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -94,6 +94,17 @@ rte_hash_create(const struct rte_hash_parameters *params);
>>  */
>> void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func);
>>
>> +/**
>> + * Get the number of buckets in the hash table.
>> + * @param h
>> + *   Hash table for which the function queries.
>> + * @return
>> + *   The number of buckets in the hash table h.
>> + *    - EINVAL - invalid parameter passed to function
>> + */
>> +int
>
>Needs to change to ‘int __rte_experimental’
>
>> +rte_hash_get_num_buckets(struct rte_hash *h);
>> +
>> /**
>>  * Find an existing hash table object and return a pointer to it.
>>  *
>> @@ -261,6 +272,34 @@ int
>> rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
>> 			       void **key);
>>
>> +/**
>> + * Get the primary bucket index given the precomputed hash value.
>> + * This operation is multi-thread safe.
>> + *
>> + * @param h
>> + *   Hash table to get the key from.
>> + * @param sig
>> + *   Precomputed hash value.
>> + * @return
>> + *   The primary bucket index.
>
>Comment the error case here.
>> + */
>> +uint32_t
>
>Needs to change to ‘uint32_t __rte_experimental’
>> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig);
>> +
>> +/**
>> + * Get the secondary bucket index given the precomputed hash value.
>> + * This operation is multi-thread safe.
>> + *
>> + * @param h
>> + *   Hash table to get the key from.
>> + * @param sig
>> + *   Precomputed hash value.
>> + * @return
>> + *   The primary bucket index.
>
>Comment the error case here.
>> + */
>> +uint32_t
>
>Needs to change to ‘uint32_t __rte_experimental’
>> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig);
>> +
>> /**
>>  * Find a key-value pair in the hash table.
>>  * This operation is multi-thread safe.
>> @@ -419,6 +458,33 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>>  */
>> int32_t
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
>> +
>> +/**
>> + * Iterate through the buckets in hash table, returning key-value pairs.
>> + *
>> + * @param h
>> + *   Hash table to iterate
>> + * @param key
>> + *   Output containing the key where current iterator
>> + *   was pointing at
>> + * @param data
>> + *   Output containing the data associated with key.
>> + *   Returns NULL if data was not stored.
>> + * @param bidx
>> + *   Pointer to the current bucket index. Should be 0 to start
>> + *   iterating the hash table. Iterator is incremented after
>> + *   RTE_HASH_BUCKET_ENTRIES calls of this function.
>
>The wording of the last line seem odd to me, should the ‘of’ be removed?
>
>> + * @param next
>> + *   Pointer to iterator. Should be 0 to start iterating the bucket.
>> + *   Iterator is incremented after each call of this function.
>> + * @return
>> + *   Position where key was stored, if successful.
>> + *   - -EINVAL if the parameters are invalid.
>> + *   - -ENOENT if end of the bucket.
>> + */
>> +int32_t
>
>Needs to change to ‘int32_t __rte_experimental’ and why use ‘int32_t' when everywhere else we are using ‘int'?
>
>> +rte_hash_bucket_iterate(const struct rte_hash *h,
>> +	const void **key, void **data, uint32_t *bidx, uint32_t *next);
>
>Should have a blank line here before the #ifdef.
>> #ifdef __cplusplus
>> }
>> #endif
>> diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
>> index 52a2576f9..7d48f32c9 100644
>> --- a/lib/librte_hash/rte_hash_version.map
>> +++ b/lib/librte_hash/rte_hash_version.map
>> @@ -15,6 +15,10 @@ DPDK_2.0 {
>> 	rte_hash_lookup;
>> 	rte_hash_lookup_bulk;
>> 	rte_hash_lookup_with_hash;
>> +	rte_hash_get_num_buckets;
>> +	rte_hash_get_primary_bucket;
>> +	rte_hash_get_secondary_bucket;
>> +	rte_hash_bucket_iterate;
>
>These must be added to a EXPERIMENTAL clause in the file and later you can remove the experimental indictor.
>
>>
>> 	local: *;
>> };
>> --
>> 2.17.1
>>
>
>Regards,
>Keith
Fu, Qiaobin July 31, 2018, 6:09 a.m. UTC | #3
Hi Yipeng,

Thanks for the feedbacks!

> On Jul 30, 2018, at 4:24 PM, Wang, Yipeng1 <yipeng1.wang@intel.com> wrote:
> 
> Hi, Qiaobin, 
> 
> Thanks for the patch. If I understand correctly your use case is to use hash table as a "cache" that new entries should evict stale ones automatically. Could you be more specific on your use scenarios so that I can have more context? 
> 

Actually, we didn’t use hash table as a “cache”, and there is no automation to evict stale ones. Instead, the functionrte_hash_bucket_iterate() allows the callers to iterate over the hash table in an incremental way, i.e., one bucket by another bucket, so that the caller doesn’t have to iterate over the whole hash table in one time. This way can avoid creating hiccups and achieve better cache performance. One possible use scenario is when DDoS attacks happen, one may want to take more CPU time to process the packets, thus iterating over the hash table incrementally is desirable.

> We are actually working on an extendable version of rte_hash which will link extra buckets to current table if the table is full. As long as the table is sized appropriately, there will be no conflict issues thus you don’t need to replace old entries. Will this solve your issue? Also, if the “cache” mode is what you want, we have the membership library “rte_member”. Is it applicable to your case? 

I agree that adding extra buckets to current table when the table is full can alleviate this scenario. Indeed, we also considered this solution before coming up our proposed solution. However, it’s still highly desirable to have a bucket iterator function. Considering the scenario where the machine’s memory is limited and cannot always allocate new memory to the hash table (e.g., DDoS attacks, switch measurement tasks, etc.), a solution is to allow the callers evict some less important (the criteria for the eviction is based on the caller’s needs) entries.

Indeed, we don’t have the “cache” mode, our implementation is a way to achieve better cache performance. So, the rte_member won’t help here.

> 
> w.r.t. the patch set you proposed, my concern is that the new API will expose too much internals to the user, e.g. the bucket/entries structure should be implementation dependent. Exposing internal structures would prevent improving the implementation down the road unless we keep changing API which is not recommended.  
> 

The functions we add here give a way for the callers to iterate over the specific bucket, which potentially contains the entry. These functions can be made general enough to allow callers to heuristically evict some entries, and add the new ones to the hash table. Otherwise, there is no way to evict some less important entries.

Hope this clarifies the patch.

Best,
Qiaobin

> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
>> Sent: Sunday, July 29, 2018 6:17 AM
>> To: Qiaobin Fu <qiaobinf@bu.edu>
>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org;
>> michel@digirati.com.br; doucette@bu.edu
>> Subject: Re: [dpdk-dev] [PATCH] hash table: add a bucket iterator function
>> 
>> 
>> 
>>> On Jul 28, 2018, at 12:48 PM, Qiaobin Fu <qiaobinf@bu.edu> wrote:
>>> 
>>> Function rte_hash_bucket_iterate() enables callers to
>>> incrementally iterate over the hash table bucket by bucket,
>>> so that it can avoid creating hiccups and thrashing the cache
>>> of the processor.
>>> 
>>> This patch mainly deals with cases in which
>>> the hash table is full and one needs to decide if the incoming
>>> entry is more valuable than the entries already in the hash table.
>>> 
>>> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>>> Reviewed-by: Cody Doucette <doucette@bu.edu>
>>> Reviewed-by: Michel Machado <michel@digirati.com.br>
>>> ---
>>> lib/librte_hash/rte_cuckoo_hash.c    | 60 +++++++++++++++++++++++++
>>> lib/librte_hash/rte_hash.h           | 66 ++++++++++++++++++++++++++++
>>> lib/librte_hash/rte_hash_version.map |  4 ++
>>> 3 files changed, 130 insertions(+)
>>> 
>>> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>>> index a07543a29..2b90f3593 100644
>>> --- a/lib/librte_hash/rte_cuckoo_hash.c
>>> +++ b/lib/librte_hash/rte_cuckoo_hash.c
>>> @@ -1160,3 +1160,63 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>>> 
>>> 	return position - 1;
>>> }
>>> +
>>> +int
>>> +rte_hash_get_num_buckets(struct rte_hash *h)
>>> +{
>>> +	RETURN_IF_TRUE((h == NULL), -EINVAL);
>>> +	return h->num_buckets;
>>> +}
>>> +
>>> +uint32_t
>>> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig)
>>> +{
>>> +	return sig & h->bucket_bitmask;
>>> +}
>>> +
>>> +uint32_t
>>> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig)
>>> +{
>>> +	return rte_hash_secondary_hash(sig) & h->bucket_bitmask;
>>> +}
>> 
>> The above two functions should have the RETURN_IF_TRUE((h == NULL), -EINVAL); statements just like the first one.
>> 
>>> +
>>> +int32_t
>>> +rte_hash_bucket_iterate(const struct rte_hash *h,
>>> +	const void **key, void **data, uint32_t *bidx, uint32_t *next)
>>> +{
>>> +	uint32_t position;
>>> +	struct rte_hash_key *next_key;
>>> +
>>> +	RETURN_IF_TRUE(((h == NULL) || (key == NULL) || (data == NULL) ||
>>> +		(bidx == NULL) || (next == NULL)), -EINVAL);
>>> +
>>> +	/* Out of bounds. */
>>> +	if (*bidx >= h->num_buckets || *next >= RTE_HASH_BUCKET_ENTRIES)
>>> +		goto next_bucket;
>>> +
>>> +	/* If current position is empty, go to the next one. */
>>> +	while (h->buckets[*bidx].key_idx[*next] == EMPTY_SLOT) {
>>> +		(*next)++;
>>> +		/* End of bucket. */
>>> +		if (*next >= RTE_HASH_BUCKET_ENTRIES)
>>> +			goto next_bucket;
>>> +	}
>>> +
>>> +	/* Get position of entry in key table. */
>>> +	position = h->buckets[*bidx].key_idx[*next];
>>> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>>> +				position * h->key_entry_size);
>>> +	/* Return key and data. */
>>> +	*key = next_key->key;
>>> +	*data = next_key->pdata;
>>> +
>>> +	/* Increment iterator. */
>>> +	(*next)++;
>>> +
>>> +	return position - 1;
>>> +
>>> +next_bucket:
>>> +	*bidx = (*bidx + 1) >= h->num_buckets ? 0 : *bidx + 1;
>>> +	*next = 0;
>> 
>> If this is an error case why are we setting *next to zero and incrementing *bidx ?
>> I would expect we should not change the values on an error because they cannot debug the values on return.
>> 
>>> +	return -ENOENT;
>>> +}
>>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>>> index f71ca9fbf..329bf42d6 100644
>>> --- a/lib/librte_hash/rte_hash.h
>>> +++ b/lib/librte_hash/rte_hash.h
>>> @@ -94,6 +94,17 @@ rte_hash_create(const struct rte_hash_parameters *params);
>>> */
>>> void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func);
>>> 
>>> +/**
>>> + * Get the number of buckets in the hash table.
>>> + * @param h
>>> + *   Hash table for which the function queries.
>>> + * @return
>>> + *   The number of buckets in the hash table h.
>>> + *    - EINVAL - invalid parameter passed to function
>>> + */
>>> +int
>> 
>> Needs to change to ‘int __rte_experimental’
>> 
>>> +rte_hash_get_num_buckets(struct rte_hash *h);
>>> +
>>> /**
>>> * Find an existing hash table object and return a pointer to it.
>>> *
>>> @@ -261,6 +272,34 @@ int
>>> rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
>>> 			       void **key);
>>> 
>>> +/**
>>> + * Get the primary bucket index given the precomputed hash value.
>>> + * This operation is multi-thread safe.
>>> + *
>>> + * @param h
>>> + *   Hash table to get the key from.
>>> + * @param sig
>>> + *   Precomputed hash value.
>>> + * @return
>>> + *   The primary bucket index.
>> 
>> Comment the error case here.
>>> + */
>>> +uint32_t
>> 
>> Needs to change to ‘uint32_t __rte_experimental’
>>> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig);
>>> +
>>> +/**
>>> + * Get the secondary bucket index given the precomputed hash value.
>>> + * This operation is multi-thread safe.
>>> + *
>>> + * @param h
>>> + *   Hash table to get the key from.
>>> + * @param sig
>>> + *   Precomputed hash value.
>>> + * @return
>>> + *   The primary bucket index.
>> 
>> Comment the error case here.
>>> + */
>>> +uint32_t
>> 
>> Needs to change to ‘uint32_t __rte_experimental’
>>> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig);
>>> +
>>> /**
>>> * Find a key-value pair in the hash table.
>>> * This operation is multi-thread safe.
>>> @@ -419,6 +458,33 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>>> */
>>> int32_t
>>> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
>>> +
>>> +/**
>>> + * Iterate through the buckets in hash table, returning key-value pairs.
>>> + *
>>> + * @param h
>>> + *   Hash table to iterate
>>> + * @param key
>>> + *   Output containing the key where current iterator
>>> + *   was pointing at
>>> + * @param data
>>> + *   Output containing the data associated with key.
>>> + *   Returns NULL if data was not stored.
>>> + * @param bidx
>>> + *   Pointer to the current bucket index. Should be 0 to start
>>> + *   iterating the hash table. Iterator is incremented after
>>> + *   RTE_HASH_BUCKET_ENTRIES calls of this function.
>> 
>> The wording of the last line seem odd to me, should the ‘of’ be removed?
>> 
>>> + * @param next
>>> + *   Pointer to iterator. Should be 0 to start iterating the bucket.
>>> + *   Iterator is incremented after each call of this function.
>>> + * @return
>>> + *   Position where key was stored, if successful.
>>> + *   - -EINVAL if the parameters are invalid.
>>> + *   - -ENOENT if end of the bucket.
>>> + */
>>> +int32_t
>> 
>> Needs to change to ‘int32_t __rte_experimental’ and why use ‘int32_t' when everywhere else we are using ‘int'?
>> 
>>> +rte_hash_bucket_iterate(const struct rte_hash *h,
>>> +	const void **key, void **data, uint32_t *bidx, uint32_t *next);
>> 
>> Should have a blank line here before the #ifdef.
>>> #ifdef __cplusplus
>>> }
>>> #endif
>>> diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
>>> index 52a2576f9..7d48f32c9 100644
>>> --- a/lib/librte_hash/rte_hash_version.map
>>> +++ b/lib/librte_hash/rte_hash_version.map
>>> @@ -15,6 +15,10 @@ DPDK_2.0 {
>>> 	rte_hash_lookup;
>>> 	rte_hash_lookup_bulk;
>>> 	rte_hash_lookup_with_hash;
>>> +	rte_hash_get_num_buckets;
>>> +	rte_hash_get_primary_bucket;
>>> +	rte_hash_get_secondary_bucket;
>>> +	rte_hash_bucket_iterate;
>> 
>> These must be added to a EXPERIMENTAL clause in the file and later you can remove the experimental indictor.
>> 
>>> 
>>> 	local: *;
>>> };
>>> --
>>> 2.17.1
>>> 
>> 
>> Regards,
>> Keith
>
Wiles, Keith July 31, 2018, 2:57 p.m. UTC | #4
> On Jul 31, 2018, at 1:09 AM, Fu, Qiaobin <qiaobinf@bu.edu> wrote:
> 
> Hi Yipeng,
> 
> Thanks for the feedbacks!
> 
>> On Jul 30, 2018, at 4:24 PM, Wang, Yipeng1 <yipeng1.wang@intel.com> wrote:
>> 
>> Hi, Qiaobin, 
>> 
>> Thanks for the patch. If I understand correctly your use case is to use hash table as a "cache" that new entries should evict stale ones automatically. Could you be more specific on your use scenarios so that I can have more context? 
>> 
> 
> Actually, we didn’t use hash table as a “cache”, and there is no automation to evict stale ones. Instead, the functionrte_hash_bucket_iterate() allows the callers to iterate over the hash table in an incremental way, i.e., one bucket by another bucket, so that the caller doesn’t have to iterate over the whole hash table in one time. This way can avoid creating hiccups and achieve better cache performance. One possible use scenario is when DDoS attacks happen, one may want to take more CPU time to process the packets, thus iterating over the hash table incrementally is desirable.

I do not see this as a cache, but only a way to iterate over the hash table.

> 
>> We are actually working on an extendable version of rte_hash which will link extra buckets to current table if the table is full. As long as the table is sized appropriately, there will be no conflict issues thus you don’t need to replace old entries. Will this solve your issue? Also, if the “cache” mode is what you want, we have the membership library “rte_member”. Is it applicable to your case? 
> 
> I agree that adding extra buckets to current table when the table is full can alleviate this scenario. Indeed, we also considered this solution before coming up our proposed solution. However, it’s still highly desirable to have a bucket iterator function. Considering the scenario where the machine’s memory is limited and cannot always allocate new memory to the hash table (e.g., DDoS attacks, switch measurement tasks, etc.), a solution is to allow the callers evict some less important (the criteria for the eviction is based on the caller’s needs) entries.
> 
> Indeed, we don’t have the “cache” mode, our implementation is a way to achieve better cache performance. So, the rte_member won’t help here.
> 
>> 
>> w.r.t. the patch set you proposed, my concern is that the new API will expose too much internals to the user, e.g. the bucket/entries structure should be implementation dependent. Exposing internal structures would prevent improving the implementation down the road unless we keep changing API which is not recommended.  
>> 
> 
> The functions we add here give a way for the callers to iterate over the specific bucket, which potentially contains the entry. These functions can be made general enough to allow callers to heuristically evict some entries, and add the new ones to the hash table. Otherwise, there is no way to evict some less important entries.

We have many other iterators in DPDK and this one is no different. If we can hide the internals, then it would be good. The callback function should not expose any internals only the value in the hash table, which I believe is do just that, right?

> 
> Hope this clarifies the patch.
> 
> Best,
> Qiaobin
> 
>> 
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
>>> Sent: Sunday, July 29, 2018 6:17 AM
>>> To: Qiaobin Fu <qiaobinf@bu.edu>
>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org;
>>> michel@digirati.com.br; doucette@bu.edu
>>> Subject: Re: [dpdk-dev] [PATCH] hash table: add a bucket iterator function
>>> 
>>> 
>>> 
>>>> On Jul 28, 2018, at 12:48 PM, Qiaobin Fu <qiaobinf@bu.edu> wrote:
>>>> 
>>>> Function rte_hash_bucket_iterate() enables callers to
>>>> incrementally iterate over the hash table bucket by bucket,
>>>> so that it can avoid creating hiccups and thrashing the cache
>>>> of the processor.
>>>> 
>>>> This patch mainly deals with cases in which
>>>> the hash table is full and one needs to decide if the incoming
>>>> entry is more valuable than the entries already in the hash table.
>>>> 
>>>> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>>>> Reviewed-by: Cody Doucette <doucette@bu.edu>
>>>> Reviewed-by: Michel Machado <michel@digirati.com.br>
>>>> ---
>>>> lib/librte_hash/rte_cuckoo_hash.c    | 60 +++++++++++++++++++++++++
>>>> lib/librte_hash/rte_hash.h           | 66 ++++++++++++++++++++++++++++
>>>> lib/librte_hash/rte_hash_version.map |  4 ++
>>>> 3 files changed, 130 insertions(+)
>>>> 
>>>> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>>>> index a07543a29..2b90f3593 100644
>>>> --- a/lib/librte_hash/rte_cuckoo_hash.c
>>>> +++ b/lib/librte_hash/rte_cuckoo_hash.c
>>>> @@ -1160,3 +1160,63 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>>>> 
>>>> 	return position - 1;
>>>> }
>>>> +
>>>> +int
>>>> +rte_hash_get_num_buckets(struct rte_hash *h)
>>>> +{
>>>> +	RETURN_IF_TRUE((h == NULL), -EINVAL);
>>>> +	return h->num_buckets;
>>>> +}
>>>> +
>>>> +uint32_t
>>>> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig)
>>>> +{
>>>> +	return sig & h->bucket_bitmask;
>>>> +}
>>>> +
>>>> +uint32_t
>>>> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig)
>>>> +{
>>>> +	return rte_hash_secondary_hash(sig) & h->bucket_bitmask;
>>>> +}
>>> 
>>> The above two functions should have the RETURN_IF_TRUE((h == NULL), -EINVAL); statements just like the first one.
>>> 
>>>> +
>>>> +int32_t
>>>> +rte_hash_bucket_iterate(const struct rte_hash *h,
>>>> +	const void **key, void **data, uint32_t *bidx, uint32_t *next)
>>>> +{
>>>> +	uint32_t position;
>>>> +	struct rte_hash_key *next_key;
>>>> +
>>>> +	RETURN_IF_TRUE(((h == NULL) || (key == NULL) || (data == NULL) ||
>>>> +		(bidx == NULL) || (next == NULL)), -EINVAL);
>>>> +
>>>> +	/* Out of bounds. */
>>>> +	if (*bidx >= h->num_buckets || *next >= RTE_HASH_BUCKET_ENTRIES)
>>>> +		goto next_bucket;
>>>> +
>>>> +	/* If current position is empty, go to the next one. */
>>>> +	while (h->buckets[*bidx].key_idx[*next] == EMPTY_SLOT) {
>>>> +		(*next)++;
>>>> +		/* End of bucket. */
>>>> +		if (*next >= RTE_HASH_BUCKET_ENTRIES)
>>>> +			goto next_bucket;
>>>> +	}
>>>> +
>>>> +	/* Get position of entry in key table. */
>>>> +	position = h->buckets[*bidx].key_idx[*next];
>>>> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>>>> +				position * h->key_entry_size);
>>>> +	/* Return key and data. */
>>>> +	*key = next_key->key;
>>>> +	*data = next_key->pdata;
>>>> +
>>>> +	/* Increment iterator. */
>>>> +	(*next)++;
>>>> +
>>>> +	return position - 1;
>>>> +
>>>> +next_bucket:
>>>> +	*bidx = (*bidx + 1) >= h->num_buckets ? 0 : *bidx + 1;
>>>> +	*next = 0;
>>> 
>>> If this is an error case why are we setting *next to zero and incrementing *bidx ?
>>> I would expect we should not change the values on an error because they cannot debug the values on return.
>>> 
>>>> +	return -ENOENT;
>>>> +}
>>>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>>>> index f71ca9fbf..329bf42d6 100644
>>>> --- a/lib/librte_hash/rte_hash.h
>>>> +++ b/lib/librte_hash/rte_hash.h
>>>> @@ -94,6 +94,17 @@ rte_hash_create(const struct rte_hash_parameters *params);
>>>> */
>>>> void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func);
>>>> 
>>>> +/**
>>>> + * Get the number of buckets in the hash table.
>>>> + * @param h
>>>> + *   Hash table for which the function queries.
>>>> + * @return
>>>> + *   The number of buckets in the hash table h.
>>>> + *    - EINVAL - invalid parameter passed to function
>>>> + */
>>>> +int
>>> 
>>> Needs to change to ‘int __rte_experimental’
>>> 
>>>> +rte_hash_get_num_buckets(struct rte_hash *h);
>>>> +
>>>> /**
>>>> * Find an existing hash table object and return a pointer to it.
>>>> *
>>>> @@ -261,6 +272,34 @@ int
>>>> rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
>>>> 			       void **key);
>>>> 
>>>> +/**
>>>> + * Get the primary bucket index given the precomputed hash value.
>>>> + * This operation is multi-thread safe.
>>>> + *
>>>> + * @param h
>>>> + *   Hash table to get the key from.
>>>> + * @param sig
>>>> + *   Precomputed hash value.
>>>> + * @return
>>>> + *   The primary bucket index.
>>> 
>>> Comment the error case here.
>>>> + */
>>>> +uint32_t
>>> 
>>> Needs to change to ‘uint32_t __rte_experimental’
>>>> +rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig);
>>>> +
>>>> +/**
>>>> + * Get the secondary bucket index given the precomputed hash value.
>>>> + * This operation is multi-thread safe.
>>>> + *
>>>> + * @param h
>>>> + *   Hash table to get the key from.
>>>> + * @param sig
>>>> + *   Precomputed hash value.
>>>> + * @return
>>>> + *   The primary bucket index.
>>> 
>>> Comment the error case here.
>>>> + */
>>>> +uint32_t
>>> 
>>> Needs to change to ‘uint32_t __rte_experimental’
>>>> +rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig);
>>>> +
>>>> /**
>>>> * Find a key-value pair in the hash table.
>>>> * This operation is multi-thread safe.
>>>> @@ -419,6 +458,33 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>>>> */
>>>> int32_t
>>>> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
>>>> +
>>>> +/**
>>>> + * Iterate through the buckets in hash table, returning key-value pairs.
>>>> + *
>>>> + * @param h
>>>> + *   Hash table to iterate
>>>> + * @param key
>>>> + *   Output containing the key where current iterator
>>>> + *   was pointing at
>>>> + * @param data
>>>> + *   Output containing the data associated with key.
>>>> + *   Returns NULL if data was not stored.
>>>> + * @param bidx
>>>> + *   Pointer to the current bucket index. Should be 0 to start
>>>> + *   iterating the hash table. Iterator is incremented after
>>>> + *   RTE_HASH_BUCKET_ENTRIES calls of this function.
>>> 
>>> The wording of the last line seem odd to me, should the ‘of’ be removed?
>>> 
>>>> + * @param next
>>>> + *   Pointer to iterator. Should be 0 to start iterating the bucket.
>>>> + *   Iterator is incremented after each call of this function.
>>>> + * @return
>>>> + *   Position where key was stored, if successful.
>>>> + *   - -EINVAL if the parameters are invalid.
>>>> + *   - -ENOENT if end of the bucket.
>>>> + */
>>>> +int32_t
>>> 
>>> Needs to change to ‘int32_t __rte_experimental’ and why use ‘int32_t' when everywhere else we are using ‘int'?
>>> 
>>>> +rte_hash_bucket_iterate(const struct rte_hash *h,
>>>> +	const void **key, void **data, uint32_t *bidx, uint32_t *next);
>>> 
>>> Should have a blank line here before the #ifdef.
>>>> #ifdef __cplusplus
>>>> }
>>>> #endif
>>>> diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
>>>> index 52a2576f9..7d48f32c9 100644
>>>> --- a/lib/librte_hash/rte_hash_version.map
>>>> +++ b/lib/librte_hash/rte_hash_version.map
>>>> @@ -15,6 +15,10 @@ DPDK_2.0 {
>>>> 	rte_hash_lookup;
>>>> 	rte_hash_lookup_bulk;
>>>> 	rte_hash_lookup_with_hash;
>>>> +	rte_hash_get_num_buckets;
>>>> +	rte_hash_get_primary_bucket;
>>>> +	rte_hash_get_secondary_bucket;
>>>> +	rte_hash_bucket_iterate;
>>> 
>>> These must be added to a EXPERIMENTAL clause in the file and later you can remove the experimental indictor.
>>> 
>>>> 
>>>> 	local: *;
>>>> };
>>>> --
>>>> 2.17.1
>>>> 
>>> 
>>> Regards,
>>> Keith
>> 
> 

Regards,
Keith
Michel Machado July 31, 2018, 3:32 p.m. UTC | #5
On 07/31/2018 10:57 AM, Wiles, Keith wrote:
> 
>> On Jul 31, 2018, at 1:09 AM, Fu, Qiaobin <qiaobinf@bu.edu> wrote:
>>
>> Hi Yipeng,
>>
>> Thanks for the feedbacks!
>>
>>> On Jul 30, 2018, at 4:24 PM, Wang, Yipeng1 <yipeng1.wang@intel.com> wrote:
>>>
>>> Hi, Qiaobin,
>>>
>>> Thanks for the patch. If I understand correctly your use case is to use hash table as a "cache" that new entries should evict stale ones automatically. Could you be more specific on your use scenarios so that I can have more context?
>>>
>>
>> Actually, we didn’t use hash table as a “cache”, and there is no automation to evict stale ones. Instead, the functionrte_hash_bucket_iterate() allows the callers to iterate over the hash table in an incremental way, i.e., one bucket by another bucket, so that the caller doesn’t have to iterate over the whole hash table in one time. This way can avoid creating hiccups and achieve better cache performance. One possible use scenario is when DDoS attacks happen, one may want to take more CPU time to process the packets, thus iterating over the hash table incrementally is desirable.
> 
> I do not see this as a cache, but only a way to iterate over the hash table.
> 
>>
>>> We are actually working on an extendable version of rte_hash which will link extra buckets to current table if the table is full. As long as the table is sized appropriately, there will be no conflict issues thus you don’t need to replace old entries. Will this solve your issue? Also, if the “cache” mode is what you want, we have the membership library “rte_member”. Is it applicable to your case?
>>
>> I agree that adding extra buckets to current table when the table is full can alleviate this scenario. Indeed, we also considered this solution before coming up our proposed solution. However, it’s still highly desirable to have a bucket iterator function. Considering the scenario where the machine’s memory is limited and cannot always allocate new memory to the hash table (e.g., DDoS attacks, switch measurement tasks, etc.), a solution is to allow the callers evict some less important (the criteria for the eviction is based on the caller’s needs) entries.
>>
>> Indeed, we don’t have the “cache” mode, our implementation is a way to achieve better cache performance. So, the rte_member won’t help here.
>>
>>>
>>> w.r.t. the patch set you proposed, my concern is that the new API will expose too much internals to the user, e.g. the bucket/entries structure should be implementation dependent. Exposing internal structures would prevent improving the implementation down the road unless we keep changing API which is not recommended.
>>>
>>
>> The functions we add here give a way for the callers to iterate over the specific bucket, which potentially contains the entry. These functions can be made general enough to allow callers to heuristically evict some entries, and add the new ones to the hash table. Otherwise, there is no way to evict some less important entries.
> 
> We have many other iterators in DPDK and this one is no different. If we can hide the internals, then it would be good. The callback function should not expose any internals only the value in the hash table, which I believe is do just that, right?

    The use case that led to this iterator is the following: a hash 
table of flows is overloaded due to a DoS attack. The easy option is to 
ignore new flows, but this is not optimal when there's information 
pointing out that a given new flow is more important than one flow in 
the bucket in which the new flow must be inserted. So the high-level 
interpretation for this iterator is to find out which are the candidates 
such that one must be dropped to add a new flow. That is why the patch 
adds rte_hash_get_primary_bucket() and rte_hash_get_secondary_bucket(). 
We don't need more than these candidates because the memory lookups 
would be overwhelming. In fact, we have found that the eight candidates 
that rte_hash_get_primary_bucket() gives is already good enough.

    Once we solved the problem above, we've noticed that with small 
adjustments of the code, it would make it easy to scan a hash table for 
stale entries one bucket at a time instead of an entry at a time (the 
regular iterator). The idea is that once one reads the bucket location, 
the information about the all entries is coming together into the 
processor cache, so we can take advantage of the information that is 
already there.

    While the second feature is good to have, the first one is the 
feature we must have in our software.

[ ]'s
Michel Machado
Wang, Yipeng1 Aug. 1, 2018, 1:40 a.m. UTC | #6
Hi,

Thanks for explaining the use case. I now understand and I agree that replacing stale entries during insertion is an interesting and useful addition to this library. 

My concern is still that it may be not a good practice to expose the bucketized structure of the rte_hash. For example, in future, rte_hash may use three hash functions, or as I mentioned each bucket may have an additional linked list or even a second level hash table, or if the hopscotch hash replaces cuckoo hash as the new algorithm. When these happen, the proposed API may lose their original intention. As a general hash table library, common users should assume a black box of the implementation.

For the iteration function, I agree on the advantage over current entry by entry iteration. But it exposes the bucket index as well.

How about an API that is more universal? For example, an API such as "rte_iterate_conflict_entries".  After an insertion failure, this function will iterate all entries that may conflict with the newly inserted key and you could decide which entry to evict? 

Thanks
Yipeng

>-----Original Message-----
>From: Michel Machado [mailto:michel@digirati.com.br]
>Sent: Tuesday, July 31, 2018 8:33 AM
>To: Wiles, Keith <keith.wiles@intel.com>; Fu, Qiaobin <qiaobinf@bu.edu>
>Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
><pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Doucette, Cody, Joseph <doucette@bu.edu>; Gobriel, Sameh
><sameh.gobriel@intel.com>; Tai, Charlie <charlie.tai@intel.com>
>Subject: Re: [dpdk-dev] [PATCH] hash table: add a bucket iterator function
>
>On 07/31/2018 10:57 AM, Wiles, Keith wrote:
>>
>>> On Jul 31, 2018, at 1:09 AM, Fu, Qiaobin <qiaobinf@bu.edu> wrote:
>>>
>>> Hi Yipeng,
>>>
>>> Thanks for the feedbacks!
>>>
>>>> On Jul 30, 2018, at 4:24 PM, Wang, Yipeng1 <yipeng1.wang@intel.com> wrote:
>>>>
>>>> Hi, Qiaobin,
>>>>
>>>> Thanks for the patch. If I understand correctly your use case is to use hash table as a "cache" that new entries should evict stale
>ones automatically. Could you be more specific on your use scenarios so that I can have more context?
>>>>
>>>
>>> Actually, we didn’t use hash table as a “cache”, and there is no automation to evict stale ones. Instead, the
>functionrte_hash_bucket_iterate() allows the callers to iterate over the hash table in an incremental way, i.e., one bucket by another
>bucket, so that the caller doesn’t have to iterate over the whole hash table in one time. This way can avoid creating hiccups and
>achieve better cache performance. One possible use scenario is when DDoS attacks happen, one may want to take more CPU time to
>process the packets, thus iterating over the hash table incrementally is desirable.
>>
>> I do not see this as a cache, but only a way to iterate over the hash table.
>>
>>>
>>>> We are actually working on an extendable version of rte_hash which will link extra buckets to current table if the table is full. As
>long as the table is sized appropriately, there will be no conflict issues thus you don’t need to replace old entries. Will this solve your
>issue? Also, if the “cache” mode is what you want, we have the membership library “rte_member”. Is it applicable to your case?
>>>
>>> I agree that adding extra buckets to current table when the table is full can alleviate this scenario. Indeed, we also considered this
>solution before coming up our proposed solution. However, it’s still highly desirable to have a bucket iterator function. Considering
>the scenario where the machine’s memory is limited and cannot always allocate new memory to the hash table (e.g., DDoS attacks,
>switch measurement tasks, etc.), a solution is to allow the callers evict some less important (the criteria for the eviction is based on the
>caller’s needs) entries.
>>>
>>> Indeed, we don’t have the “cache” mode, our implementation is a way to achieve better cache performance. So, the rte_member
>won’t help here.
>>>
>>>>
>>>> w.r.t. the patch set you proposed, my concern is that the new API will expose too much internals to the user, e.g. the
>bucket/entries structure should be implementation dependent. Exposing internal structures would prevent improving the
>implementation down the road unless we keep changing API which is not recommended.
>>>>
>>>
>>> The functions we add here give a way for the callers to iterate over the specific bucket, which potentially contains the entry. These
>functions can be made general enough to allow callers to heuristically evict some entries, and add the new ones to the hash table.
>Otherwise, there is no way to evict some less important entries.
>>
>> We have many other iterators in DPDK and this one is no different. If we can hide the internals, then it would be good. The callback
>function should not expose any internals only the value in the hash table, which I believe is do just that, right?
>
>    The use case that led to this iterator is the following: a hash
>table of flows is overloaded due to a DoS attack. The easy option is to
>ignore new flows, but this is not optimal when there's information
>pointing out that a given new flow is more important than one flow in
>the bucket in which the new flow must be inserted. So the high-level
>interpretation for this iterator is to find out which are the candidates
>such that one must be dropped to add a new flow. That is why the patch
>adds rte_hash_get_primary_bucket() and rte_hash_get_secondary_bucket().
>We don't need more than these candidates because the memory lookups
>would be overwhelming. In fact, we have found that the eight candidates
>that rte_hash_get_primary_bucket() gives is already good enough.
>
>    Once we solved the problem above, we've noticed that with small
>adjustments of the code, it would make it easy to scan a hash table for
>stale entries one bucket at a time instead of an entry at a time (the
>regular iterator). The idea is that once one reads the bucket location,
>the information about the all entries is coming together into the
>processor cache, so we can take advantage of the information that is
>already there.
>
>    While the second feature is good to have, the first one is the
>feature we must have in our software.
>
>[ ]'s
>Michel Machado
Michel Machado Aug. 1, 2018, 12:57 p.m. UTC | #7
On 07/31/2018 09:40 PM, Wang, Yipeng1 wrote:
> How about an API that is more universal? For example, an API such as "rte_iterate_conflict_entries".  After an insertion failure, this function will iterate all entries that may conflict with the newly inserted key and you could decide which entry to evict?

    Fine. We'll rewrite the patch to do so.

    Thank you for the feedback.

[ ]'s
Michel Machado
Stephen Hemminger Aug. 3, 2018, 3:24 p.m. UTC | #8
On Wed, 1 Aug 2018 08:57:39 -0400
Michel Machado <michel@digirati.com.br> wrote:

> On 07/31/2018 09:40 PM, Wang, Yipeng1 wrote:
> > How about an API that is more universal? For example, an API such as "rte_iterate_conflict_entries".  After an insertion failure, this function will iterate all entries that may conflict with the newly inserted key and you could decide which entry to evict?  
> 
>     Fine. We'll rewrite the patch to do so.
> 
>     Thank you for the feedback.
> 
> [ ]'s
> Michel Machado

Often for time based cleanup it is better to have a second linked list that is ordered
by time value. Then the cleanup code can start at the oldest stop when it reaches
the last item that could expire.

That does mean having some form of lock and doing delete/insert on every usage.

i.e	
	spinlock(&timer_lock);
	TAILQ_REMOVE(&timer_list, entry, timer_list);
	entry->expiration = new time;
	TAILQ_INSERT_TAIL(&timer_list, entry, timer_list);
	spinunlock(&timer_unlock);
Michel Machado Aug. 7, 2018, 1:24 p.m. UTC | #9
On 08/03/2018 11:24 AM, Stephen Hemminger wrote:
> Often for time based cleanup it is better to have a second linked list that is ordered
> by time value. Then the cleanup code can start at the oldest stop when it reaches
> the last item that could expire.
> 
> That does mean having some form of lock and doing delete/insert on every usage.
> 
> i.e	
> 	spinlock(&timer_lock);
> 	TAILQ_REMOVE(&timer_list, entry, timer_list);
> 	entry->expiration = new time;
> 	TAILQ_INSERT_TAIL(&timer_list, entry, timer_list);
> 	spinunlock(&timer_unlock);

    We'll try it. Thanks for the suggestion, Stephen.

[ ]'s
Michel Machado
diff mbox series

Patch

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index a07543a29..2b90f3593 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1160,3 +1160,63 @@  rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	return position - 1;
 }
+
+int
+rte_hash_get_num_buckets(struct rte_hash *h)
+{
+	RETURN_IF_TRUE((h == NULL), -EINVAL);
+	return h->num_buckets;
+}
+
+uint32_t
+rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig)
+{
+	return sig & h->bucket_bitmask;
+}
+
+uint32_t
+rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig)
+{
+	return rte_hash_secondary_hash(sig) & h->bucket_bitmask;
+}
+
+int32_t
+rte_hash_bucket_iterate(const struct rte_hash *h,
+	const void **key, void **data, uint32_t *bidx, uint32_t *next)
+{
+	uint32_t position;
+	struct rte_hash_key *next_key;
+
+	RETURN_IF_TRUE(((h == NULL) || (key == NULL) || (data == NULL) ||
+		(bidx == NULL) || (next == NULL)), -EINVAL);
+
+	/* Out of bounds. */
+	if (*bidx >= h->num_buckets || *next >= RTE_HASH_BUCKET_ENTRIES)
+		goto next_bucket;
+
+	/* If current position is empty, go to the next one. */
+	while (h->buckets[*bidx].key_idx[*next] == EMPTY_SLOT) {
+		(*next)++;
+		/* End of bucket. */
+		if (*next >= RTE_HASH_BUCKET_ENTRIES)
+			goto next_bucket;
+	}
+
+	/* Get position of entry in key table. */
+	position = h->buckets[*bidx].key_idx[*next];
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data. */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	/* Increment iterator. */
+	(*next)++;
+
+	return position - 1;
+
+next_bucket:
+	*bidx = (*bidx + 1) >= h->num_buckets ? 0 : *bidx + 1;
+	*next = 0;
+	return -ENOENT;
+}
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index f71ca9fbf..329bf42d6 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -94,6 +94,17 @@  rte_hash_create(const struct rte_hash_parameters *params);
  */
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func);
 
+/**
+ * Get the number of buckets in the hash table.
+ * @param h
+ *   Hash table for which the function queries.
+ * @return
+ *   The number of buckets in the hash table h.
+ *    - EINVAL - invalid parameter passed to function
+ */
+int
+rte_hash_get_num_buckets(struct rte_hash *h);
+
 /**
  * Find an existing hash table object and return a pointer to it.
  *
@@ -261,6 +272,34 @@  int
 rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
 			       void **key);
 
+/**
+ * Get the primary bucket index given the precomputed hash value.
+ * This operation is multi-thread safe.
+ *
+ * @param h
+ *   Hash table to get the key from.
+ * @param sig
+ *   Precomputed hash value.
+ * @return
+ *   The primary bucket index.
+ */
+uint32_t
+rte_hash_get_primary_bucket(const struct rte_hash *h, hash_sig_t sig);
+
+/**
+ * Get the secondary bucket index given the precomputed hash value.
+ * This operation is multi-thread safe.
+ *
+ * @param h
+ *   Hash table to get the key from.
+ * @param sig
+ *   Precomputed hash value.
+ * @return
+ *   The primary bucket index.
+ */
+uint32_t
+rte_hash_get_secondary_bucket(const struct rte_hash *h, hash_sig_t sig);
+
 /**
  * Find a key-value pair in the hash table.
  * This operation is multi-thread safe.
@@ -419,6 +458,33 @@  rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
  */
 int32_t
 rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
+
+/**
+ * Iterate through the buckets in hash table, returning key-value pairs.
+ *
+ * @param h
+ *   Hash table to iterate
+ * @param key
+ *   Output containing the key where current iterator
+ *   was pointing at
+ * @param data
+ *   Output containing the data associated with key.
+ *   Returns NULL if data was not stored.
+ * @param bidx
+ *   Pointer to the current bucket index. Should be 0 to start
+ *   iterating the hash table. Iterator is incremented after
+ *   RTE_HASH_BUCKET_ENTRIES calls of this function.
+ * @param next
+ *   Pointer to iterator. Should be 0 to start iterating the bucket.
+ *   Iterator is incremented after each call of this function.
+ * @return
+ *   Position where key was stored, if successful.
+ *   - -EINVAL if the parameters are invalid.
+ *   - -ENOENT if end of the bucket.
+ */
+int32_t
+rte_hash_bucket_iterate(const struct rte_hash *h,
+	const void **key, void **data, uint32_t *bidx, uint32_t *next);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index 52a2576f9..7d48f32c9 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -15,6 +15,10 @@  DPDK_2.0 {
 	rte_hash_lookup;
 	rte_hash_lookup_bulk;
 	rte_hash_lookup_with_hash;
+	rte_hash_get_num_buckets;
+	rte_hash_get_primary_bucket;
+	rte_hash_get_secondary_bucket;
+	rte_hash_bucket_iterate;
 
 	local: *;
 };