[v4,1/4] hash: fix race condition in iterate
Checks
Commit Message
In rte_hash_iterate, the reader lock did not protect the
while loop which checks empty entry. This created a race
condition that the entry may become empty when enters
the lock, then a wrong key data value would be read out.
This commit extends the protected region.
Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
lib/librte_hash/rte_cuckoo_hash.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Comments
>
> In rte_hash_iterate, the reader lock did not protect the while loop which
> checks empty entry. This created a race condition that the entry may become
> empty when enters the lock, then a wrong key data value would be read out.
>
> This commit extends the protected region.
>
> Fixes: f2e3001b53ec ("hash: support read/write concurrency")
> Cc: stable@dpdk.org
>
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> lib/librte_hash/rte_cuckoo_hash.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8..eba13e9 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
> bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> idx = *next % RTE_HASH_BUCKET_ENTRIES;
>
> + __hash_rw_reader_lock(h);
This does not work well with the lock-less changes I am making. We should leave the lock in its original position. Instead change the while loop as follows:
while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
> /* If current position is empty, go to the next one */
> while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> (*next)++;
> /* End of table */
> - if (*next == total_entries)
> + if (*next == total_entries) {
> + __hash_rw_reader_unlock(h);
> return -ENOENT;
> + }
> bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> idx = *next % RTE_HASH_BUCKET_ENTRIES;
> }
> - __hash_rw_reader_lock(h);
> +
> /* Get position of entry in key table */
> position = h->buckets[bucket_idx].key_idx[idx];
If we change the while loop as I suggested as above, we can remove this line.
> next_key = (struct rte_hash_key *) ((char *)h->key_store +
> --
> 2.7.4
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const
>> void **key, void **data, uint32
>> bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> idx = *next % RTE_HASH_BUCKET_ENTRIES;
>>
>> + __hash_rw_reader_lock(h);
>This does not work well with the lock-less changes I am making. We should leave the lock in its original position. Instead change the
>while loop as follows:
>
>while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
>
>> /* If current position is empty, go to the next one */
>> while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> (*next)++;
>> /* End of table */
>> - if (*next == total_entries)
>> + if (*next == total_entries) {
>> + __hash_rw_reader_unlock(h);
>> return -ENOENT;
>> + }
>> bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> }
>> - __hash_rw_reader_lock(h);
>> +
>> /* Get position of entry in key table */
>> position = h->buckets[bucket_idx].key_idx[idx];
>If we change the while loop as I suggested as above, we can remove this line.
>
>> next_key = (struct rte_hash_key *) ((char *)h->key_store +
[Wang, Yipeng] Sorry that I did not realize you already have it in your patch set and I agree.
Do you want to export it as a bug fix in your patch set? I will remove my change.
For the lock free, do we need to protect it with version counter? Imagine the following corner case:
While the iterator read the key and data, there is a writer deleted, removed, and recycled the key-data pair,
and write a new key and data into it. While the writer are writing, will the reader reads out wrong key/data, or
mismatched key/data?
>
> >-----Original Message-----
> >From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> >> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h,
> >> const void **key, void **data, uint32
> >> bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> >> idx = *next % RTE_HASH_BUCKET_ENTRIES;
> >>
> >> + __hash_rw_reader_lock(h);
> >This does not work well with the lock-less changes I am making. We
> >should leave the lock in its original position. Instead change the while loop as
> follows:
> >
> >while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
> >
> >> /* If current position is empty, go to the next one */
> >> while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> >> (*next)++;
> >> /* End of table */
> >> - if (*next == total_entries)
> >> + if (*next == total_entries) {
> >> + __hash_rw_reader_unlock(h);
> >> return -ENOENT;
> >> + }
> >> bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> >> idx = *next % RTE_HASH_BUCKET_ENTRIES;
> >> }
> >> - __hash_rw_reader_lock(h);
> >> +
> >> /* Get position of entry in key table */
> >> position = h->buckets[bucket_idx].key_idx[idx];
> >If we change the while loop as I suggested as above, we can remove this line.
> >
> >> next_key = (struct rte_hash_key *) ((char *)h->key_store +
>
> [Wang, Yipeng] Sorry that I did not realize you already have it in your patch
> set and I agree.
> Do you want to export it as a bug fix in your patch set? I will remove my
> change.
Sure, I will make a separate commit for this.
>
> For the lock free, do we need to protect it with version counter? Imagine the
> following corner case:
> While the iterator read the key and data, there is a writer deleted, removed,
> and recycled the key-data pair, and write a new key and data into it. While the
> writer are writing, will the reader reads out wrong key/data, or mismatched
> key/data?
>
In the lock-free algorithm, the key-data is not 'freed' until the readers have completed all their references to the 'deleted' key-data. Hence, the writers will not be able to allocate the same key store index till the readers have stopped referring to the 'deleted' key-data.
I re-checked my ladder diagrams [1] and I could not find any issues.
[1] https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash (PPT)
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> >> /* Get position of entry in key table */
>> >> position = h->buckets[bucket_idx].key_idx[idx];
>> >If we change the while loop as I suggested as above, we can remove this line.
>> >
>> >> next_key = (struct rte_hash_key *) ((char *)h->key_store +
>>
>> [Wang, Yipeng] Sorry that I did not realize you already have it in your patch
>> set and I agree.
>> Do you want to export it as a bug fix in your patch set? I will remove my
>> change.
>Sure, I will make a separate commit for this.
[Wang, Yipeng] I fixed the issue you mentioned and put it in V5 and I saw you acked it. You don't need to change you patch now.
>>
>> For the lock free, do we need to protect it with version counter? Imagine the
>> following corner case:
>> While the iterator read the key and data, there is a writer deleted, removed,
>> and recycled the key-data pair, and write a new key and data into it. While the
>> writer are writing, will the reader reads out wrong key/data, or mismatched
>> key/data?
>>
>In the lock-free algorithm, the key-data is not 'freed' until the readers have completed all their references to the 'deleted' key-data.
>Hence, the writers will not be able to allocate the same key store index till the readers have stopped referring to the 'deleted' key-
>data.
>I re-checked my ladder diagrams [1] and I could not find any issues.
>
>[1] https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash (PPT)
[Wang, Yipeng]
After checking your slides, I agree. It works with upper level application which should have RCU-like mechanisms to ensure
the grace period has finished before recycling. In this case, the logic is good!
If you haven't done so, could you be more specific in doc and the API comment to indicate that the key-data pair recycle function should only be called
after reader finished (or maybe specifically indicate RCU type of mechanism is needed?). I feel that users not familiar with this could easily
get it wrong.
Besides this, I don't have other concern.
@@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
idx = *next % RTE_HASH_BUCKET_ENTRIES;
+ __hash_rw_reader_lock(h);
/* If current position is empty, go to the next one */
while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
(*next)++;
/* End of table */
- if (*next == total_entries)
+ if (*next == total_entries) {
+ __hash_rw_reader_unlock(h);
return -ENOENT;
+ }
bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
idx = *next % RTE_HASH_BUCKET_ENTRIES;
}
- __hash_rw_reader_lock(h);
+
/* Get position of entry in key table */
position = h->buckets[bucket_idx].key_idx[idx];
next_key = (struct rte_hash_key *) ((char *)h->key_store +