[v2,3/3] lib/lpm: remove unnecessary inline

Message ID 20190619053615.24613-3-ruifeng.wang@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v2,1/3] lib/lpm: memory orderings to avoid race conditions for v1604 |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Ruifeng Wang June 19, 2019, 5:36 a.m. UTC
  Tests showed that the 'inline' keyword caused performance drop
on some x86 platforms after the memory ordering patches applied.
By removing the 'inline' keyword, the performance was recovered
as before on x86 and no impact to arm64 platforms.

Suggested-by: Medvedkin Vladimir <vladimir.medvedkin@intel.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
v2: initail version to recover rte_lpm_add() performance

 lib/librte_lpm/rte_lpm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Vladimir Medvedkin June 19, 2019, 12:51 p.m. UTC | #1
Hi Wang,

1. It is better to explicitly use __rte_noinline with this function 
because my gcc still inlines it even without the inline qualifier.

2. The same should be applied to _v20 functions.

3. Please try running the tests again and show the results.

4. Make this patch the first in a series.

On 19/06/2019 06:36, Ruifeng Wang wrote:
> Tests showed that the 'inline' keyword caused performance drop
> on some x86 platforms after the memory ordering patches applied.
> By removing the 'inline' keyword, the performance was recovered
> as before on x86 and no impact to arm64 platforms.
>
> Suggested-by: Medvedkin Vladimir <vladimir.medvedkin@intel.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
> v2: initail version to recover rte_lpm_add() performance
>
>   lib/librte_lpm/rte_lpm.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 0addff5d4..c97b602e6 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -778,7 +778,7 @@ add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>   	return 0;
>   }
>   
> -static inline int32_t
> +static int32_t
>   add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>   		uint32_t next_hop)
>   {
> @@ -975,7 +975,7 @@ add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>   	return 0;
>   }
>   
> -static inline int32_t
> +static int32_t
>   add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   		uint32_t next_hop)
>   {
  
Ruifeng Wang June 20, 2019, 10:34 a.m. UTC | #2
Hi Vladimir,

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Wednesday, June 19, 2019 20:51
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
> bruce.richardson@intel.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v2 3/3] lib/lpm: remove unnecessary inline
> 
> Hi Wang,
> 
> 1. It is better to explicitly use __rte_noinline with this function because my
> gcc still inlines it even without the inline qualifier.
> 
> 2. The same should be applied to _v20 functions.
> 
> 3. Please try running the tests again and show the results.
For x86, I can do test with E5 platform that we have.  

> 
> 4. Make this patch the first in a series.
> 

Thanks. Will come back with new patch set.

> On 19/06/2019 06:36, Ruifeng Wang wrote:
> > Tests showed that the 'inline' keyword caused performance drop on some
> > x86 platforms after the memory ordering patches applied.
> > By removing the 'inline' keyword, the performance was recovered as
> > before on x86 and no impact to arm64 platforms.
> >
> > Suggested-by: Medvedkin Vladimir <vladimir.medvedkin@intel.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> > v2: initail version to recover rte_lpm_add() performance
> >
> >   lib/librte_lpm/rte_lpm.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 0addff5d4..c97b602e6 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -778,7 +778,7 @@ add_depth_small_v20(struct rte_lpm_v20 *lpm,
> uint32_t ip, uint8_t depth,
> >   	return 0;
> >   }
> >
> > -static inline int32_t
> > +static int32_t
> >   add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
> >   		uint32_t next_hop)
> >   {
> > @@ -975,7 +975,7 @@ add_depth_big_v20(struct rte_lpm_v20 *lpm,
> uint32_t ip_masked, uint8_t depth,
> >   	return 0;
> >   }
> >
> > -static inline int32_t
> > +static int32_t
> >   add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t
> depth,
> >   		uint32_t next_hop)
> >   {
> 
> --
> Regards,
> Vladimir
  

Patch

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 0addff5d4..c97b602e6 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -778,7 +778,7 @@  add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
 	return 0;
 }
 
-static inline int32_t
+static int32_t
 add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
 		uint32_t next_hop)
 {
@@ -975,7 +975,7 @@  add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
 	return 0;
 }
 
-static inline int32_t
+static int32_t
 add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 		uint32_t next_hop)
 {