[dpdk-dev,1/2] eal/arm64: modify I/O device memory barriers

Message ID 20171227042824.33373-1-yskoh@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Yongseok Koh Dec. 27, 2017, 4:28 a.m. UTC
Instead of using the system-wide 'dsb' instruction for I/O barriers, 'dmb' is
sufficient and could bring better performance. Using 'dmb' with the Outer
Shareable domain option is also consistent with the Linux kernel.

Cc: Thomas Speier <tspeier@qti.qualcomm.com>

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Thomas Speier <tspeier@qti.qualcomm.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
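
For readers unfamiliar with the two instructions, the following is a minimal sketch of the substitution described above, not the DPDK header itself. The `demo_` and `io_wmb_` names are hypothetical, the expansion of rte_wmb() to dsb(st) is an assumption about the header at the time, and the non-ARM branch exists only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
/* Helpers in the same style as the dsb()/dmb() macros the patch relies on. */
#define dsb(opt) __asm__ volatile("dsb " #opt : : : "memory")
#define dmb(opt) __asm__ volatile("dmb " #opt : : : "memory")
#define io_wmb_before() dsb(st)    /* assumed expansion of rte_wmb() */
#define io_wmb_after()  dmb(oshst) /* value introduced by this patch */
#else
/* Portable stand-ins (assumption) so the sketch also builds off ARM. */
#define io_wmb_before() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define io_wmb_after()  __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Hypothetical descriptor in memory shared with a device. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	volatile uint32_t ready; /* flag the device would poll */
};

/*
 * Fill the descriptor, then publish it. The barrier keeps the payload
 * stores ordered before the store to the ready flag.
 */
static inline void
demo_post(struct demo_desc *d, uint64_t addr, uint32_t len)
{
	assert(d != NULL);
	d->addr = addr;
	d->len = len;
	io_wmb_after();
	d->ready = 1;
}
```

Swapping io_wmb_after() for io_wmb_before() in demo_post() gives the pre-patch behaviour: 'dsb' additionally waits for the earlier accesses to complete, whereas 'dmb' only orders them.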
  

Comments

Jerin Jacob Jan. 4, 2018, 12:58 p.m. UTC | #1
-----Original Message-----
> Date: Tue, 26 Dec 2017 20:28:23 -0800
> From: Yongseok Koh <yskoh@mellanox.com>
> To: adrien.mazarguil@6wind.com, nelio.laranjeiro@6wind.com,
>  jerin.jacob@caviumnetworks.com, jianbo.liu@arm.com
> CC: dev@dpdk.org, Yongseok Koh <yskoh@mellanox.com>, Thomas Speier
>  <tspeier@qti.qualcomm.com>
> Subject: [PATCH 1/2] eal/arm64: modify I/O device memory barriers
> X-Mailer: git-send-email 2.11.0
> 
> Instead of using system-wide 'dsb' instruction for IO barriers, 'dmb' is
> sufficient and could bring better performance. Using 'dmb' with Outer
> Shareable Domain option is also consistent with linux kernel.
> 
> Cc: Thomas Speier <tspeier@qti.qualcomm.com>
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Acked-by: Thomas Speier <tspeier@qti.qualcomm.com>
> Acked-by: Shahaf Shuler <shahafs@mellanox.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

> ---
>  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> index 0b70d6209..8dcce6054 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> @@ -58,11 +58,11 @@ extern "C" {
>  
>  #define rte_smp_rmb() dmb(ishld)
>  
> -#define rte_io_mb() rte_mb()
> +#define rte_io_mb() dmb(osh)
>  
> -#define rte_io_wmb() rte_wmb()
> +#define rte_io_wmb() dmb(oshst)
>  
> -#define rte_io_rmb() rte_rmb()
> +#define rte_io_rmb() dmb(oshld)
>  
>  #ifdef __cplusplus
>  }
> -- 
> 2.11.0
>
  
Jianbo Liu Jan. 8, 2018, 1:55 a.m. UTC | #2
The 12/26/2017 20:28, Yongseok Koh wrote:
> Instead of using system-wide 'dsb' instruction for IO barriers, 'dmb' is
> sufficient and could bring better performance. Using 'dmb' with Outer
> Shareable Domain option is also consistent with linux kernel.

But in the kernel, dsb is used for I/O barriers.
https://github.com/torvalds/linux/blob/master/arch/arm64/include/asm/io.h#L109

Have you considered adding dma_*mb?
https://github.com/torvalds/linux/blob/master/arch/arm64/include/asm/barrier.h#L40

>
> Cc: Thomas Speier <tspeier@qti.qualcomm.com>
>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Acked-by: Thomas Speier <tspeier@qti.qualcomm.com>
> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> index 0b70d6209..8dcce6054 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> @@ -58,11 +58,11 @@ extern "C" {
>
>  #define rte_smp_rmb() dmb(ishld)
>
> -#define rte_io_mb() rte_mb()
> +#define rte_io_mb() dmb(osh)
>
> -#define rte_io_wmb() rte_wmb()
> +#define rte_io_wmb() dmb(oshst)
>
> -#define rte_io_rmb() rte_rmb()
> +#define rte_io_rmb() dmb(oshld)
>
>  #ifdef __cplusplus
>  }
> --
> 2.11.0
>

  
Yongseok Koh Jan. 16, 2018, 12:42 a.m. UTC | #3
> On Jan 7, 2018, at 5:55 PM, Jianbo Liu <Jianbo.Liu@arm.com> wrote:
> 
> The 12/26/2017 20:28, Yongseok Koh wrote:
>> Instead of using system-wide 'dsb' instruction for IO barriers, 'dmb' is
>> sufficient and could bring better performance. Using 'dmb' with Outer
>> Shareable Domain option is also consistent with linux kernel.
> 
> But in the kernel, dsb is used for I/O barriers.
> Have you considered adding dma_*mb?

Right. I'll send out a patchset today that adds rte_dma_rmb/wmb().

Thanks
Yongseok
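
The distinction raised in this exchange can be sketched as follows: the Linux kernel keeps 'dsb'-based barriers for ordering against MMIO accesses and offers lighter 'dmb'-based dma_*mb() barriers for memory that is coherent between CPU and device. This is an illustrative sketch only; every `demo_` name is hypothetical, and the non-ARM branch is a stand-in so it builds on other hosts.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#define dsb(opt) __asm__ volatile("dsb " #opt : : : "memory")
#define dmb(opt) __asm__ volatile("dmb " #opt : : : "memory")
#define demo_wmb()     dsb(st)    /* heavier barrier, used before MMIO */
#define demo_dma_wmb() dmb(oshst) /* lighter barrier, coherent memory only */
#else
/* Portable stand-ins (assumption) for non-ARM hosts. */
#define demo_wmb()     __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define demo_dma_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Hypothetical Tx descriptor in coherent memory. */
struct demo_txd {
	uint64_t buf;
	uint32_t len;
	uint32_t own; /* 1 = owned by the device */
};

static inline void
demo_xmit(struct demo_txd *ring, volatile uint32_t *doorbell,
	  uint32_t idx, uint64_t buf, uint32_t len)
{
	struct demo_txd *d;

	assert(ring && doorbell);
	d = &ring[idx];
	d->buf = buf;
	d->len = len;
	demo_dma_wmb(); /* fields are ordered before ownership is handed over */
	d->own = 1;
	demo_wmb();     /* descriptor is ordered before the doorbell write */
	*doorbell = idx + 1;
}
```

Only the doorbell write, which stands in for a device register here, needs the heavier barrier; the stores within the descriptor are ordered by the cheaper one.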
  
Yongseok Koh Jan. 16, 2018, 1:10 a.m. UTC | #4
This patchset introduces DMA memory barriers, which could be more
efficient for memory that is coherent between an I/O device and the CPU,
especially on ARMv8.

Yongseok Koh (8):
  eal: introduce DMA memory barriers
  eal/x86: define DMA memory barriers
  eal/ppc64: define DMA device memory barriers
  eal/armv7: define DMA memory barriers
  eal/arm64: define DMA memory barriers
  net/mlx5: remove unnecessary memory barrier
  net/mlx5: replace IO memory barrier with DMA memory barrier
  net/mlx5: fix synchronization on polling Rx completions

 drivers/net/mlx5/mlx5_rxq.c                        |  1 -
 drivers/net/mlx5/mlx5_rxtx.c                       |  5 +-
 drivers/net/mlx5/mlx5_rxtx.h                       |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                   |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h              | 53 ++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h               |  2 +-
 .../common/include/arch/arm/rte_atomic_32.h        |  4 ++
 .../common/include/arch/arm/rte_atomic_64.h        |  4 ++
 .../common/include/arch/ppc_64/rte_atomic.h        |  4 ++
 .../common/include/arch/x86/rte_atomic.h           |  4 ++
 lib/librte_eal/common/include/generic/rte_atomic.h | 18 ++++++++
 11 files changed, 70 insertions(+), 29 deletions(-)
  
Yongseok Koh Jan. 19, 2018, 12:44 a.m. UTC | #5
This patchset introduces DMA memory barriers, which could be more
efficient for memory that is coherent between an I/O device and the CPU,
especially on ARMv8.

v3:
* add more detailed comments about the new memory barriers.

v2:
* introduce DMA memory barriers.

Yongseok Koh (8):
  eal: introduce DMA memory barriers
  eal/x86: define DMA memory barriers
  eal/ppc64: define DMA memory barriers
  eal/armv7: define DMA memory barriers
  eal/arm64: define DMA memory barriers
  net/mlx5: remove unnecessary memory barrier
  net/mlx5: replace IO memory barrier with DMA memory barrier
  net/mlx5: fix synchronization on polling Rx completions

 drivers/net/mlx5/mlx5_rxq.c                        |  1 -
 drivers/net/mlx5/mlx5_rxtx.c                       |  5 +-
 drivers/net/mlx5/mlx5_rxtx.h                       |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                   |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h              | 53 ++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h               |  2 +-
 .../common/include/arch/arm/rte_atomic_32.h        |  4 ++
 .../common/include/arch/arm/rte_atomic_64.h        |  4 ++
 .../common/include/arch/ppc_64/rte_atomic.h        |  4 ++
 .../common/include/arch/x86/rte_atomic.h           |  4 ++
 lib/librte_eal/common/include/generic/rte_atomic.h | 52 +++++++++++++++++++++
 11 files changed, 104 insertions(+), 29 deletions(-)
  
Yongseok Koh Jan. 25, 2018, 9:02 p.m. UTC | #6
This patchset introduces coherent I/O memory barriers, which could be more
efficient for memory that is coherent between an I/O device and the CPU,
especially on ARMv8.

v4:
* rename barriers to "coherent I/O memory barrier".
* Make groups for various barriers in Doxygen doc.

v3:
* add more detailed comments about the new memory barriers.

v2:
* introduce DMA memory barriers.

Yongseok Koh (9):
  eal: add Doxygen grouping for memory barriers
  eal: introduce coherent I/O memory barriers
  eal/x86: define coherent I/O memory barriers
  eal/ppc64: define coherent I/O memory barriers
  eal/armv7: define coherent I/O memory barriers
  eal/arm64: define coherent I/O memory barriers
  net/mlx5: remove unnecessary memory barrier
  net/mlx5: replace I/O memory barrier with coherent version
  net/mlx5: fix synchronization on polling Rx completions

 drivers/net/mlx5/mlx5_rxq.c                        |  1 -
 drivers/net/mlx5/mlx5_rxtx.c                       |  5 +-
 drivers/net/mlx5/mlx5_rxtx.h                       |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                   |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h              | 53 ++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h               |  2 +-
 .../common/include/arch/arm/rte_atomic_32.h        |  4 ++
 .../common/include/arch/arm/rte_atomic_64.h        |  4 ++
 .../common/include/arch/ppc_64/rte_atomic.h        |  4 ++
 .../common/include/arch/x86/rte_atomic.h           |  4 ++
 lib/librte_eal/common/include/generic/rte_atomic.h | 51 +++++++++++++++++++++
 11 files changed, 103 insertions(+), 29 deletions(-)
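
As an illustration of where a read-side coherent I/O barrier fits when polling completions, here is a minimal sketch. It is not the mlx5 code: the completion layout and all `demo_` names are hypothetical, and using dmb(oshld) as the ARMv8 definition is an assumption based on the value in the original patch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#define dmb(opt) __asm__ volatile("dmb " #opt : : : "memory")
#define demo_cio_rmb() dmb(oshld) /* assumed ARMv8 read barrier */
#else
/* Portable stand-in (assumption) for non-ARM hosts. */
#define demo_cio_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

/* Hypothetical completion entry written by the device. */
struct demo_cqe {
	uint32_t byte_cnt;
	volatile uint8_t owner; /* toggled by the device when valid */
};

/*
 * Return 1 and store the byte count if the entry belongs to software,
 * otherwise return 0 and leave *len untouched.
 */
static inline int
demo_poll(const struct demo_cqe *cqe, uint8_t expected_owner, uint32_t *len)
{
	assert(cqe && len);
	if (cqe->owner != expected_owner)
		return 0;
	demo_cio_rmb(); /* owner is read before the rest of the entry */
	*len = cqe->byte_cnt;
	return 1;
}
```

Without the barrier, the load of byte_cnt could be satisfied before the load of owner, so software might pair a valid owner flag with stale payload.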
  
Thomas Monjalon Jan. 28, 2018, 7:32 a.m. UTC | #7
25/01/2018 22:02, Yongseok Koh:
> This patchset is to introduce coherent I/O memory barriers, which could be more
> efficient for coherent memory between I/O device and CPU, especially for ARMv8.
> 
> v4:
> * rename barriers to "coherent I/O memory barrier".
> * Make groups for various barriers in Doxygen doc.
> 
> v3:
> * add more detailed comments about the new memory barriers.
> 
> v2:
> * introduce DMA memory barriers.
> 
> Yongseok Koh (9):
>   eal: add Doxygen grouping for memory barriers
>   eal: introduce coherent I/O memory barriers
>   eal/x86: define coherent I/O memory barriers
>   eal/ppc64: define coherent I/O memory barriers
>   eal/armv7: define coherent I/O memory barriers
>   eal/arm64: define coherent I/O memory barriers
>   net/mlx5: remove unnecessary memory barrier
>   net/mlx5: replace I/O memory barrier with coherent version
>   net/mlx5: fix synchronization on polling Rx completions

Applied, thanks
  

Patch

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 0b70d6209..8dcce6054 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -58,11 +58,11 @@  extern "C" {
 
 #define rte_smp_rmb() dmb(ishld)
 
-#define rte_io_mb() rte_mb()
+#define rte_io_mb() dmb(osh)
 
-#define rte_io_wmb() rte_wmb()
+#define rte_io_wmb() dmb(oshst)
 
-#define rte_io_rmb() rte_rmb()
+#define rte_io_rmb() dmb(oshld)
 
 #ifdef __cplusplus
 }