[v3] hash: fix SSE comparison

Message ID 20231007073634.3458294-1-jieqiang.wang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series [v3] hash: fix SSE comparison |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/github-robot: build success github build: passed
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Jieqiang Wang Oct. 7, 2023, 7:36 a.m. UTC
  __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
equal. In original SSE2 implementation for function compare_signatures,
it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
element, while we should only care about the MSB of lower 8-bit in each
16-bit element.
For example, if the comparison result is all equal, SSE2 path returns
0xFFFF while NEON and default scalar path return 0x5555.
Although this bug is not causing any negative effects since the caller
function solely examines the trailing zeros of each match mask, we
recommend this fix to ensure consistency with NEON and default scalar
code behaviors.

Fixes: c7d93df552c2 ("hash: use partial-key hashing")
Cc: stable@dpdk.org

v2:
1. Utilize scalar mask instead of vector mask to save extra loads (Bruce)

v3:
1. Fix coding style warnings

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/hash/rte_cuckoo_hash.c | 4 ++++
 1 file changed, 4 insertions(+)
  

Comments

Bruce Richardson Oct. 9, 2023, 2:33 p.m. UTC | #1
On Sat, Oct 07, 2023 at 03:36:34PM +0800, Jieqiang Wang wrote:
> __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. In original SSE2 implementation for function compare_signatures,
> it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
> element, while we should only care about the MSB of lower 8-bit in each
> 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug is not causing any negative effects since the caller
> function solely examines the trailing zeros of each match mask, we
> recommend this fix to ensure consistency with NEON and default scalar
> code behaviors.
> 
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: stable@dpdk.org
> 
> v2:
> 1. Utilize scalar mask instead of vector mask to save extra loads (Bruce)
> 
> v3:
> 1. Fix coding style warnings
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
David Marchand Oct. 10, 2023, 9:50 a.m. UTC | #2
On Mon, Oct 9, 2023 at 4:33 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Sat, Oct 07, 2023 at 03:36:34PM +0800, Jieqiang Wang wrote:
> > __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> > equal. In original SSE2 implementation for function compare_signatures,
> > it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
> > element, while we should only care about the MSB of lower 8-bit in each
> > 16-bit element.
> > For example, if the comparison result is all equal, SSE2 path returns
> > 0xFFFF while NEON and default scalar path return 0x5555.
> > Although this bug is not causing any negative effects since the caller
> > function solely examines the trailing zeros of each match mask, we
> > recommend this fix to ensure consistency with NEON and default scalar
> > code behaviors.
> >
> > Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> > Cc: stable@dpdk.org
> >
> > v2:
> > 1. Utilize scalar mask instead of vector mask to save extra loads (Bruce)
> >
> > v3:
> > 1. Fix coding style warnings

Changelog and other notes on a patch should not be in the commitlog
itself, but should go after ---.
https://doc.dpdk.org/guides/contributing/patches.html#creating-patches


> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks Jieqiang and Bruce.
  

Patch

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d92a903bb3..19b23f2a97 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1868,11 +1868,15 @@  compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
 				_mm_set1_epi16(sig)));
+		/* Extract the even-index bits only */
+		*prim_hash_matches &= 0x5555;
 		/* Compare all signatures in the bucket */
 		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
 				_mm_set1_epi16(sig)));
+		/* Extract the even-index bits only */
+		*sec_hash_matches &= 0x5555;
 		break;
 #elif defined(__ARM_NEON)
 	case RTE_HASH_COMPARE_NEON: {