[v1] examples/l3fwd: enable hash multi lookup for ARM

Message ID 20190102052826.156605-1-ruifeng.wang@arm.com
State New
Delegated to: Thomas Monjalon

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS
ci/checkpatch success coding style OK

Commit Message

Ruifeng Wang (Arm Technology China) Jan. 2, 2019, 5:28 a.m.
The compile option for hash_multi_lookup was broken, so the feature
could not be enabled on Arm.
This patch makes hash_multi_lookup the default method; sequential
lookup becomes optional.

In a test of 8192 flows with 128-byte packets, throughput increased by
25.6% after enabling hash_multi_lookup.

Fixes: 52c97adc1f0f ("examples/l3fwd: fix exact match performance")
Cc: tomaszx.kulasek@intel.com

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 examples/l3fwd/l3fwd.h | 4 ----
 1 file changed, 4 deletions(-)

Comments

Honnappa Nagarahalli Jan. 2, 2019, 6:23 p.m. | #1
Thanks Ruifeng for the patch. I have one question inline.

Jerin/Hemant,
	It would be good if you could test this on your platforms, since this is being made default.

Thanks,
Honnappa

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, January 1, 2019 11:28 PM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerinj@marvell.com; hemant.agrawal@nxp.com;
> bruce.richardson@intel.com; chaozhu@linux.vnet.ibm.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ruifeng
> Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
> tomaszx.kulasek@intel.com
> Subject: [PATCH v1] examples/l3fwd: enable hash multi lookup for ARM
> 
> Compile option for hash_multi_lookup was broken, and caused feature cannot
> be enabled on Arm.
> This patch sets hash_multi_lookup method as default, and sequential lookup
> becomes optional.
> 
> In test of 8192 flows with 128-byte packets, throughput increased by 25.6%
> after enabling hash_multi_lookup.
> 
I assume these are lookup-hit numbers. Do you have look-up miss numbers?

> Fixes: 52c97adc1f0f ("examples/l3fwd: fix exact match performance")
> Cc: tomaszx.kulasek@intel.com
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  examples/l3fwd/l3fwd.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> index c962deac3..063b80018 100644
> --- a/examples/l3fwd/l3fwd.h
> +++ b/examples/l3fwd/l3fwd.h
> @@ -11,10 +11,6 @@
> 
>  #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
> 
> -#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
> -#define NO_HASH_MULTI_LOOKUP 1
> -#endif
> -
>  #define MAX_PKT_BURST     32
>  #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
> 
> --
> 2.17.1
Ruifeng Wang (Arm Technology China) Jan. 3, 2019, 1:14 a.m. | #2
Hi Honnappa,

> I assume these are lookup-hit numbers. Do you have look-up miss numbers?
> 
Yes, the lookup-hit case had a 25.6% gain.
In lookup-miss tests, throughput gained over 33%.

> > Fixes: 52c97adc1f0f ("examples/l3fwd: fix exact match performance")
> > Cc: tomaszx.kulasek@intel.com
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  examples/l3fwd/l3fwd.h | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> > index c962deac3..063b80018 100644
> > --- a/examples/l3fwd/l3fwd.h
> > +++ b/examples/l3fwd/l3fwd.h
> > @@ -11,10 +11,6 @@
> >
> >  #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
> >
> > -#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
> > -#define NO_HASH_MULTI_LOOKUP 1
> > -#endif
> > -
> >  #define MAX_PKT_BURST     32
> >  #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
> >
> > --
> > 2.17.1
Jerin Jacob Kollanukkaran Jan. 3, 2019, 3:34 a.m. | #3
On Wed, 2019-01-02 at 13:28 +0800, Ruifeng Wang wrote:
> Compile option for hash_multi_lookup was broken, and caused feature
> cannot be enabled on Arm.
> This patch sets hash_multi_lookup method as default, and sequential
> lookup becomes optional.
> 
> In test of 8192 flows with 128-byte packets, throughput increased by
> 25.6% after enabling hash_multi_lookup.

# Are you changing the l3fwd source code to test with 8K flows? As is, it
has only a few flows for the hit case.

# Are you testing with ThunderX2?

# Can you check with a single flow, 64B packets, and one core? In my case,
I am getting a >20% regression on octeontx.

Command used:
#./l3fwd -c 0x800000 -n 4 -- -P -E -p 0x3 --config="(0, 0, 23),(1,0,23)"

In addition to that,

The file examples/l3fwd/l3fwd_em_hlm.h has the following change. I am not
sure why we need an ARM64-specific change there:

#ifdef RTE_ARCH_ARM64
#define EM_HASH_LOOKUP_COUNT 16
#else
#define EM_HASH_LOOKUP_COUNT 8
#endif
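
For context on what this constant affects, here is a minimal sketch. It assumes the multi-lookup path handles a burst in full groups of EM_HASH_LOOKUP_COUNT packets and leaves any remainder to a one-by-one path. The names split_burst and struct burst_split are hypothetical and are not l3fwd code.

```c
#include <stdint.h>

/* Hypothetical helper, not part of l3fwd: how a received burst divides
 * into a grouped part and a per-packet tail for a given group size. */
struct burst_split {
	uint16_t grouped; /* packets handled in full groups */
	uint16_t tail;    /* packets left for the per-packet path */
};

static struct burst_split
split_burst(uint16_t nb_rx, uint16_t group)
{
	struct burst_split s;

	/* Round nb_rx down to a multiple of group.
	 * Assumption: group is a power of two (8 or 16 here). */
	s.grouped = (uint16_t)(nb_rx & ~(group - 1));
	s.tail = (uint16_t)(nb_rx - s.grouped);
	return s;
}
```

Under that assumption, a full burst of 32 packets leaves no tail for either group size. A short burst of 12 packets gives 8 grouped and 4 tail with a group size of 8, but 0 grouped and 12 tail with a group size of 16. Whether this plays any part in the numbers reported in this thread is not established here.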


> Fixes: 52c97adc1f0f ("examples/l3fwd: fix exact match performance")
> Cc: tomaszx.kulasek@intel.com
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  examples/l3fwd/l3fwd.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> index c962deac3..063b80018 100644
> --- a/examples/l3fwd/l3fwd.h
> +++ b/examples/l3fwd/l3fwd.h
> @@ -11,10 +11,6 @@
>  
>  #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
>  
> -#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
> -#define NO_HASH_MULTI_LOOKUP 1
> -#endif
> -
>  #define MAX_PKT_BURST     32
>  #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
>
Hemant Agrawal Jan. 3, 2019, 8:04 a.m. | #4
On 02-Jan-19 11:53 PM, Honnappa Nagarahalli wrote:
> Thanks Ruifeng for the patch. I have one question inline.
>
> Jerin/Hemant,
> 	It would be good if you could test this on your platforms, since this is being made default.
>
> Thanks,
> Honnappa
>
>> -----Original Message-----
>> From: Ruifeng Wang <ruifeng.wang@arm.com>
>> Sent: Tuesday, January 1, 2019 11:28 PM
>> To: dev@dpdk.org
>> Cc: thomas@monjalon.net; jerinj@marvell.com; hemant.agrawal@nxp.com;
>> bruce.richardson@intel.com; chaozhu@linux.vnet.ibm.com; Honnappa
>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ruifeng
>> Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>> tomaszx.kulasek@intel.com
>> Subject: [PATCH v1] examples/l3fwd: enable hash multi lookup for ARM
>>
>> Compile option for hash_multi_lookup was broken, and caused feature cannot
>> be enabled on Arm.
>> This patch sets hash_multi_lookup method as default, and sequential lookup
>> becomes optional.
>>
>> In test of 8192 flows with 128-byte packets, throughput increased by 25.6%
>> after enabling hash_multi_lookup.

Hi,

I tested this patch on the LS2088 (A72) platform, using both the dpaa2
and armv8a build configs.

In both cases, I am seeing a drop of 3-8% across different scenarios in
the 1-core case with small packets.

  - 1 flow case -  7-8 % drop in performance

  - 8 K flow case - 2-5 % drop in performance


Regards,

Hemant

> I assume these are lookup-hit numbers. Do you have look-up miss numbers?
>
>> Fixes: 52c97adc1f0f ("examples/l3fwd: fix exact match performance")
>> Cc: tomaszx.kulasek@intel.com
>>
>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> Reviewed-by: Phil Yang <phil.yang@arm.com>
>> Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>>   examples/l3fwd/l3fwd.h | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
>> index c962deac3..063b80018 100644
>> --- a/examples/l3fwd/l3fwd.h
>> +++ b/examples/l3fwd/l3fwd.h
>> @@ -11,10 +11,6 @@
>>
>>   #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
>>
>> -#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
>> -#define NO_HASH_MULTI_LOOKUP 1
>> -#endif
>> -
>>   #define MAX_PKT_BURST     32
>>   #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
>>
>> --
>> 2.17.1
Ruifeng Wang (Arm Technology China) Jan. 3, 2019, 10:12 a.m. | #5
Hi Hemant,

> Hi,
> 
>    I tested this patch on LS2088 (A72) platform. I have tried both dpaa2 and
> armv8a config for builds.
> 
> In both cases, I am seeing a drop of 3-8%  in different scenario using 1 core
> case with small packets.
> 
>   - 1 flow case -  7-8 % drop in performance
> 
>   - 8 K flow case - 2-5 % drop in performance
> 
> 
> Regards,
> 
> Hemant
> 
Thanks for testing.
Were the 8K flows in your case all hits?
I don't know why, but in my tests the 1-core case with 64B packets showed a notable performance gain with 8K flows,
although performance dropped when only a few hash entries were configured.

Command used:
./l3fwd -n 4 -c 0x80 -- -E -P --parse-ptype -p 0x3 --config="(0,0,7),(1,0,7)" --hash-entry-num=0x8000
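
A note on reading the command: -c takes a hexadecimal core mask, and each --config tuple is (port,queue,lcore), so the lcore named in the tuples has to be a core enabled by the mask. The helper below is a minimal sketch for checking that by hand. The name first_lcore is hypothetical and is not a DPDK API.

```c
#include <stdint.h>

/* Hypothetical helper, not a DPDK function: index of the lowest set bit
 * in a core mask, or -1 if the mask is empty. */
static int
first_lcore(uint64_t coremask)
{
	int i;

	for (i = 0; i < 64; i++) {
		if (coremask & (UINT64_C(1) << i))
			return i;
	}
	return -1;
}
```

first_lcore(0x80) returns 7, which matches the lcore in "(0,0,7),(1,0,7)". The mask 0x800000 used in the earlier octeontx command maps to lcore 23 in the same way.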

> > I assume these are lookup-hit numbers. Do you have look-up miss numbers?
> >
> >> Fixes: 52c97adc1f0f ("examples/l3fwd: fix exact match performance")
> >> Cc: tomaszx.kulasek@intel.com
> >>
> >> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >> Reviewed-by: Phil Yang <phil.yang@arm.com>
> >> Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> ---
> >>   examples/l3fwd/l3fwd.h | 4 ----
> >>   1 file changed, 4 deletions(-)
> >>
> >> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> >> index c962deac3..063b80018 100644
> >> --- a/examples/l3fwd/l3fwd.h
> >> +++ b/examples/l3fwd/l3fwd.h
> >> @@ -11,10 +11,6 @@
> >>
> >>   #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
> >>
> >> -#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
> >> -#define NO_HASH_MULTI_LOOKUP 1
> >> -#endif
> >> -
> >>   #define MAX_PKT_BURST     32
> >>   #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
> >>
> >> --
> >> 2.17.1

Patch

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index c962deac3..063b80018 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -11,10 +11,6 @@ 
 
 #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
 
-#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
-#define NO_HASH_MULTI_LOOKUP 1
-#endif
-
 #define MAX_PKT_BURST     32
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
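
The removed guard defined NO_HASH_MULTI_LOOKUP whenever RTE_MACHINE_CPUFLAG_NEON was set and the macro was not already defined, so a NEON build always ended up with it defined. With the guard gone, the macro is set only if the build defines it. The sketch below illustrates that selection under the assumption that the macro alone picks the lookup method. The function em_lookup_mode is hypothetical and is not l3fwd code.

```c
/* Hypothetical sketch, not l3fwd code: after the patch nothing in l3fwd.h
 * defines NO_HASH_MULTI_LOOKUP, so only the compiler flags can set it. */
static const char *
em_lookup_mode(void)
{
#if defined(NO_HASH_MULTI_LOOKUP)
	/* Opt-out: build with -DNO_HASH_MULTI_LOOKUP */
	return "sequential";
#else
	/* Default once the NEON guard is removed */
	return "multi";
#endif
}
```

Compiled as-is, em_lookup_mode() returns "multi". Adding -DNO_HASH_MULTI_LOOKUP to the compiler flags makes it return "sequential".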