[3/3] lib/hash: adjust tbl_chng_cnt position
Commit Message
tbl_chng_cnt is one of the first fields of the structure used in
the lookup. Move it to the beginning of the cache line to improve
performance.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Comments
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
>tbl_chng_cnt is one of the first elements of the structure used in
>the lookup. Move it to the beginning of the cache line to gain
>performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>index fb19bb27d..af6451b5c 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.h
>+++ b/lib/librte_hash/rte_cuckoo_hash.h
>@@ -170,7 +170,9 @@ struct rte_hash {
>
> /* Fields used in lookup */
>
>- uint32_t key_len __rte_cache_aligned;
>+ uint32_t *tbl_chng_cnt __rte_cache_aligned;
>+ /**< Indicates if the hash table changed from last read. */
>+ uint32_t key_len;
> /**< Length of hash key. */
> uint8_t hw_trans_mem_support;
> /**< If hardware transactional memory is used. */
>@@ -218,8 +220,6 @@ struct rte_hash {
> * is piggy-backed to freeing of the key index.
> */
> uint32_t *ext_bkt_to_free;
>- uint32_t *tbl_chng_cnt;
>- /**< Indicates if the hash table changed from last read. */
> } __rte_cache_aligned;
>
> struct queue_node {
>--
>2.17.1
[Wang, Yipeng]
I am not sure about this change. By moving the counter to the front, I think you push key_store out of the cache line, and key_store is also used in lookup (and more commonly).
My tests also show a performance drop in many cases.
<snip>
> [Wang, Yipeng]
> I am not sure about this change. By moving the counter to the front, I
> think you push key_store out of the cache line, and key_store is also
> used in lookup (and more commonly).
> My tests also show a performance drop in many cases.
I ran the hash_readwrite_lf tests and the L3 fwd application. Both showed improvements for both the lock-free and lock-based configurations on Arm platforms (L3 fwd was not run on x86). Which tests show performance drops for you?
But I do agree that this work is not complete. We can drop this patch and take it up separately.
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Tuesday, July 2, 2019 10:23 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
><snip>
>
>> <snip>
>>
>> [Wang, Yipeng]
>> I am not sure about this change. By moving the counter to the front, I
>> think you push key_store out of the cache line, and key_store is also
>> used in lookup (and more commonly).
>> My tests also show a performance drop in many cases.
>I ran the hash_readwrite_lf tests and the L3 fwd application. Both showed improvements for both the lock-free and lock-based
>configurations on Arm platforms (L3 fwd was not run on x86). Which tests show performance drops for you?
>
>But I do agree that this work is not complete. We can drop this patch and take it up separately.
[Wang, Yipeng]
I ran the LF test on x86. For the 1-reader case it is a clear improvement for lookup without key shifts. The other results are a mixed bag, even for lock-free. For most HTM cases it is a drop.
My opinion is to delay fine-tuning efforts that are tied to architecture specifics.
@@ -170,7 +170,9 @@ struct rte_hash {

 	/* Fields used in lookup */

-	uint32_t key_len __rte_cache_aligned;
+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
+	/**< Indicates if the hash table changed from last read. */
+	uint32_t key_len;
 	/**< Length of hash key. */
 	uint8_t hw_trans_mem_support;
 	/**< If hardware transactional memory is used. */
@@ -218,8 +220,6 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
-	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;

 struct queue_node {