[v2,2/2] net/mlx5: remove unnecessary wmb for Memory Region cache

Message ID 20210517100002.19905-3-feifei.wang2@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Raslan Darawsheh
Headers
Series remove wmb for net/mlx |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-abi-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot success github build: passed
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Functional success Functional Testing PASS

Commit Message

Feifei Wang May 17, 2021, 10 a.m. UTC
  'dev_gen' is a variable to inform other cores to flush their local cache
when global cache is rebuilt. It is unnecessary to add write memory
barrier (wmb) before or after its updating for synchronization.

This is due to MR cache's R/W lock can maintain synchronization between
threads:
1. dev_gen and global cache update ordering inside the lock protected
section does not matter. Because other threads cannot take the lock
until global cache has been updated. Thus, in out of order platform,
even if other agents firstly observed updated dev_gen but global does
not update, they also needs to wait the lock. As a result, it is
unnecessary to add a wmb between rebuiling global cache and updating
dev_gen to keep the order of rebuilding global cache and updating
dev_gen.

2. Store-Release of unlock can provide the implicit wmb at the level
visible by software. This makes 'rebuiling global cache' and 'updating
dev_gen' be observed before local_cache starts to be updated by other
agents. Thus, wmb after 'updating dev_gen' can be removed.

Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)
  

Comments

Slava Ovsiienko May 17, 2021, 2:15 p.m. UTC | #1
Hi, Feifei

Thanks you for the patch.
Please, see my notes below about typos and minor commit message rewording.

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: Monday, May 17, 2021 13:00
> To: Matan Azrad <matan@nvidia.com>; Shahaf Shuler
> <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; nd@arm.com; Feifei Wang <feifei.wang2@arm.com>;
> Ruifeng Wang <ruifeng.wang@arm.com>
> Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> Region cache
> 
> 'dev_gen' is a variable to inform other cores to flush their local cache when
> global cache is rebuilt. It is unnecessary to add write memory barrier (wmb)
> before or after its updating for synchronization.
> 
Would it be better "to trigger all cores to flush their local caches once the global
MR cache has been rebuilt"  ? 

> This is due to MR cache's R/W lock can maintain synchronization between
> threads:
I would add empty line here.
> 1. dev_gen and global cache update ordering inside the lock protected
> section does not matter. Because other threads cannot take the lock until
> global cache has been updated. Thus, in out of order platform, even if other
> agents firstly observed updated dev_gen but global does not update, they
> also needs to wait the lock. As a result, it is unnecessary to add a wmb
Type: "need" (no S) -> "have to" would be better ? 

> between rebuiling global cache and updating dev_gen to keep the order

rebuiling -> rebuilDing
And let's reword a little bit?
"wmb between global cache rebuilding and updating the dev_gen to keep the memory store order."

> rebuilding global cache and updating dev_gen.
> 
> 2. Store-Release of unlock can provide the implicit wmb at the level visible by
can provide -> provides

> software. This makes 'rebuiling global cache' and 'updating dev_gen' be
Typo: rebuiling -> rebuilDing


> observed before local_cache starts to be updated by other agents. Thus,
> wmb after 'updating dev_gen' can be removed.
> 
> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index
> e791b6338d..85e5865050 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct
> mlx5_dev_ctx_shared *sh,
>  	if (rebuild) {
>  		mlx5_mr_rebuild_cache(&sh->share_cache);
>  		/*
> -		 * Flush local caches by propagating invalidation across cores.
> -		 * rte_smp_wmb() is enough to synchronize this event. If
> one of
> -		 * freed memsegs is seen by other core, that means the
> memseg
> -		 * has been allocated by allocator, which will come after this
> -		 * free call. Therefore, this store instruction (incrementing
> -		 * generation below) will be guaranteed to be seen by other
> core
> -		 * before the core sees the newly allocated memory.
> +		 * No wmb is needed after updating dev_gen due to store-
> release of
> +		 * unlock can provide the implicit wmb at the level visible by
> +		 * software. This makes rebuilt global cache and updated
> dev_gen
> +		 * be observed when local_cache starts to be updating by
> other
> +		 * agents.
>  		 */
Let's make comment a less wordy (and try to keep source code concise), what about this?
"No explicit wmb is needed after updating dev_gen due to store-release ordering
in unlock that provides the implicit barrier at the software visible level."

>  		++sh->share_cache.dev_gen;
>  		DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
>  		      sh->share_cache.dev_gen);
> -		rte_smp_wmb();
>  	}
>  	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
>  }
> @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev,
> void *addr,
>  	      (void *)mr);
>  	mlx5_mr_rebuild_cache(&sh->share_cache);
>  	/*
> -	 * Flush local caches by propagating invalidation across cores.
> -	 * rte_smp_wmb() is enough to synchronize this event. If one of
> -	 * freed memsegs is seen by other core, that means the memseg
> -	 * has been allocated by allocator, which will come after this
> -	 * free call. Therefore, this store instruction (incrementing
> -	 * generation below) will be guaranteed to be seen by other core
> -	 * before the core sees the newly allocated memory.
> +	 * No wmb is needed after updating dev_gen due to store-release of
> +	 * unlock can provide the implicit wmb at the level visible by
> +	 * software. This makes rebuilt global cache and updated dev_gen
> +	 * be observed when local_cache starts to be updating by other
> +	 * agents.
The same as previous comment above.

Please, apply the same comments to the mlx4 patch:
http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2-feifei.wang2@arm.com/

With best regards,
Slava
  
Feifei Wang May 18, 2021, 8:52 a.m. UTC | #2
Hi, Slava

> -----邮件原件-----
> 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> 发送时间: 2021年5月17日 22:15
> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> 抄送: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> 主题: RE: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> Region cache
> 
> Hi, Feifei
> 
> Thanks you for the patch.
> Please, see my notes below about typos and minor commit message
> rewording.

Thanks very much for your very careful reviewing.
I will apply these in the next version.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
> 
> > -----Original Message-----
> > From: Feifei Wang <feifei.wang2@arm.com>
> > Sent: Monday, May 17, 2021 13:00
> > To: Matan Azrad <matan@nvidia.com>; Shahaf Shuler
> > <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; nd@arm.com; Feifei Wang <feifei.wang2@arm.com>;
> > Ruifeng Wang <ruifeng.wang@arm.com>
> > Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> > Region cache
> >
> > 'dev_gen' is a variable to inform other cores to flush their local
> > cache when global cache is rebuilt. It is unnecessary to add write
> > memory barrier (wmb) before or after its updating for synchronization.
> >
> Would it be better "to trigger all cores to flush their local caches once the
> global MR cache has been rebuilt"  ?
> 
1.Yes, I think this can be more clear.

> > This is due to MR cache's R/W lock can maintain synchronization
> > between
> > threads:
> I would add empty line here.
2.Done.

> > 1. dev_gen and global cache update ordering inside the lock protected
> > section does not matter. Because other threads cannot take the lock
> > until global cache has been updated. Thus, in out of order platform,
> > even if other agents firstly observed updated dev_gen but global does
> > not update, they also needs to wait the lock. As a result, it is
> > unnecessary to add a wmb
> Type: "need" (no S) -> "have to" would be better ?
> 
3.Done.

> > between rebuiling global cache and updating dev_gen to keep the order
> 
> rebuiling -> rebuilding
4.Done.

> And let's reword a little bit?
> "wmb between global cache rebuilding and updating the dev_gen to keep
> the memory store order."
> 
5.Done.

> > rebuilding global cache and updating dev_gen.
> >
> > 2. Store-Release of unlock can provide the implicit wmb at the level
> > visible by
> can provide -> provides
> 
6.Done.

> > software. This makes 'rebuiling global cache' and 'updating dev_gen'
> > be
> Typo: rebuiling -> rebuilding
7.Done.

> 
> 
> > observed before local_cache starts to be updated by other agents.
> > Thus, wmb after 'updating dev_gen' can be removed.
> >
> > Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
> >  1 file changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> > index
> > e791b6338d..85e5865050 100644
> > --- a/drivers/net/mlx5/mlx5_mr.c
> > +++ b/drivers/net/mlx5/mlx5_mr.c
> > @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct
> > mlx5_dev_ctx_shared *sh,
> >  	if (rebuild) {
> >  		mlx5_mr_rebuild_cache(&sh->share_cache);
> >  		/*
> > -		 * Flush local caches by propagating invalidation across cores.
> > -		 * rte_smp_wmb() is enough to synchronize this event. If
> > one of
> > -		 * freed memsegs is seen by other core, that means the
> > memseg
> > -		 * has been allocated by allocator, which will come after this
> > -		 * free call. Therefore, this store instruction (incrementing
> > -		 * generation below) will be guaranteed to be seen by other
> > core
> > -		 * before the core sees the newly allocated memory.
> > +		 * No wmb is needed after updating dev_gen due to store-
> > release of
> > +		 * unlock can provide the implicit wmb at the level visible by
> > +		 * software. This makes rebuilt global cache and updated
> > dev_gen
> > +		 * be observed when local_cache starts to be updating by
> > other
> > +		 * agents.
> >  		 */
> Let's make comment a less wordy (and try to keep source code concise),
> what about this?
> "No explicit wmb is needed after updating dev_gen due to store-release
> ordering in unlock that provides the implicit barrier at the software visible
> level."
8.That's better than before. A concise comment works better in the code.

> 
> >  		++sh->share_cache.dev_gen;
> >  		DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
> >  		      sh->share_cache.dev_gen);
> > -		rte_smp_wmb();
> >  	}
> >  	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
> >  }
> > @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev,
> void
> > *addr,
> >  	      (void *)mr);
> >  	mlx5_mr_rebuild_cache(&sh->share_cache);
> >  	/*
> > -	 * Flush local caches by propagating invalidation across cores.
> > -	 * rte_smp_wmb() is enough to synchronize this event. If one of
> > -	 * freed memsegs is seen by other core, that means the memseg
> > -	 * has been allocated by allocator, which will come after this
> > -	 * free call. Therefore, this store instruction (incrementing
> > -	 * generation below) will be guaranteed to be seen by other core
> > -	 * before the core sees the newly allocated memory.
> > +	 * No wmb is needed after updating dev_gen due to store-release of
> > +	 * unlock can provide the implicit wmb at the level visible by
> > +	 * software. This makes rebuilt global cache and updated dev_gen
> > +	 * be observed when local_cache starts to be updating by other
> > +	 * agents.
> The same as previous comment above.
9.Done.

> 
> Please, apply the same comments to the mlx4 patch:
> http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2-
> feifei.wang2@arm.com/
> 
10.Done.

Best Regards
Feifei

> With best regards,
> Slava
  

Patch

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e791b6338d..85e5865050 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -107,18 +107,15 @@  mlx5_mr_mem_event_free_cb(struct mlx5_dev_ctx_shared *sh,
 	if (rebuild) {
 		mlx5_mr_rebuild_cache(&sh->share_cache);
 		/*
-		 * Flush local caches by propagating invalidation across cores.
-		 * rte_smp_wmb() is enough to synchronize this event. If one of
-		 * freed memsegs is seen by other core, that means the memseg
-		 * has been allocated by allocator, which will come after this
-		 * free call. Therefore, this store instruction (incrementing
-		 * generation below) will be guaranteed to be seen by other core
-		 * before the core sees the newly allocated memory.
+		 * No wmb is needed after updating dev_gen due to store-release of
+		 * unlock can provide the implicit wmb at the level visible by
+		 * software. This makes rebuilt global cache and updated dev_gen
+		 * be observed when local_cache starts to be updating by other
+		 * agents.
 		 */
 		++sh->share_cache.dev_gen;
 		DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
 		      sh->share_cache.dev_gen);
-		rte_smp_wmb();
 	}
 	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
 }
@@ -411,18 +408,15 @@  mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	      (void *)mr);
 	mlx5_mr_rebuild_cache(&sh->share_cache);
 	/*
-	 * Flush local caches by propagating invalidation across cores.
-	 * rte_smp_wmb() is enough to synchronize this event. If one of
-	 * freed memsegs is seen by other core, that means the memseg
-	 * has been allocated by allocator, which will come after this
-	 * free call. Therefore, this store instruction (incrementing
-	 * generation below) will be guaranteed to be seen by other core
-	 * before the core sees the newly allocated memory.
+	 * No wmb is needed after updating dev_gen due to store-release of
+	 * unlock can provide the implicit wmb at the level visible by
+	 * software. This makes rebuilt global cache and updated dev_gen
+	 * be observed when local_cache starts to be updating by other
+	 * agents.
 	 */
 	++sh->share_cache.dev_gen;
 	DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
 	      sh->share_cache.dev_gen);
-	rte_smp_wmb();
 	rte_rwlock_read_unlock(&sh->share_cache.rwlock);
 	return 0;
 }