[v3,09/21] net/ena/base: use optimized memcpy version also on Arm

Message ID 20220223121944.24156-10-mk@semihalf.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers
Series net/ena: v2.6.0 driver update |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Michal Krawczyk Feb. 23, 2022, 12:19 p.m. UTC
  As the default behavior for arm64 is to alias rte_memcpy as memcpy, ENA
cannot redefine memcpy as rte_memcpy as it would cause nested
declaration.

To make it possible to use optimized memcpy in the ena_com layer on Arm,
the driver now redefines memcpy when it is beneficial:
  * For arm64 only when the flag RTE_ARCH_ARM64_MEMCPY was defined
  * For arm only when the flag RTE_ARCH_ARM_NEON_MEMCPY was defined

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
 doc/guides/rel_notes/release_22_03.rst | 1 +
 drivers/net/ena/base/ena_plat_dpdk.h   | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)
  

Comments

Ferruh Yigit Feb. 23, 2022, 5:25 p.m. UTC | #1
On 2/23/2022 12:19 PM, Michal Krawczyk wrote:
> As the default behavior for arm64 is to alias rte_memcpy as memcpy, ENA
> cannot redefine memcpy as rte_memcpy as it would cause nested
> declaration.
> 
> To make it possible to use optimized memcpy in the ena_com layer on Arm,

Out of curiosity, do you have any performance measurements for
the optimized memcpy usage?

> the driver now redefines memcpy when it is beneficial:
>    * For arm64 only when the flag RTE_ARCH_ARM64_MEMCPY was defined
>    * For arm only when the flag RTE_ARCH_ARM_NEON_MEMCPY was defined
> 
> Signed-off-by: Michal Krawczyk <mk@semihalf.com>
> Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
> Reviewed-by: Shai Brandes <shaibran@amazon.com>
> ---
>   doc/guides/rel_notes/release_22_03.rst | 1 +
>   drivers/net/ena/base/ena_plat_dpdk.h   | 7 +++++--
>   2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index c8e38d4c70..92490afd60 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -112,6 +112,7 @@ New Features
>     * Added new checksum related xstats: ``l3_csum_bad``, ``l4_csum_bad`` and
>       ``l4_csum_good``.
>     * Added support for the link status configuration.
> +  * Added optimized memcpy support for the ARM platforms.
>   
>   * **Updated Cisco enic driver.**
>   
> diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/base/ena_plat_dpdk.h
> index 4e7f52881a..41db883c63 100644
> --- a/drivers/net/ena/base/ena_plat_dpdk.h
> +++ b/drivers/net/ena/base/ena_plat_dpdk.h
> @@ -66,8 +66,11 @@ typedef uint64_t dma_addr_t;
>   #define ENA_UDELAY(x) rte_delay_us_block(x)
>   
>   #define ENA_TOUCH(x) ((void)(x))
> -/* Avoid nested declaration on arm64, as it may define rte_memcpy as memcpy. */
> -#if defined(RTE_ARCH_X86)
> +/* Redefine memcpy with caution: rte_memcpy can be simply aliased to memcpy, so
> + * make the redefinition only if it's safe (and beneficial) to do so.
> + */
> +#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64_MEMCPY) || \
> +	defined(RTE_ARCH_ARM_NEON_MEMCPY)
>   #undef memcpy
>   #define memcpy rte_memcpy
>   #endif

I can see there is 'ena_plat_dpdk.h', which seems like an osdep header,

it is possible to use 'ena_memcpy' in the code and in the 'ena_plat_dpdk.h'
define it as:
#define ena_memcpy rte_memcpy


This is just for your information if it helps, usage is up to you.
  
Michal Krawczyk Feb. 23, 2022, 5:40 p.m. UTC | #2
śr., 23 lut 2022 o 18:26 Ferruh Yigit <ferruh.yigit@intel.com> napisał(a):
>
> On 2/23/2022 12:19 PM, Michal Krawczyk wrote:
> > As the default behavior for arm64 is to alias rte_memcpy as memcpy, ENA
> > cannot redefine memcpy as rte_memcpy as it would cause nested
> > declaration.
> >
> > To make it possible to use optimized memcpy in the ena_com layer on Arm,
>
> Out of curiosity, do you have any performance measurements for
> the optimized memcpy usage?
>

Unfortunately no - on AWS it's hard to see noticeable performance
improvement due to limitations in terms of the PPS number that can be
processed at maximum. The basic packet generator applications are
mostly idle in that situation.

> > the driver now redefines memcpy when it is beneficial:
> >    * For arm64 only when the flag RTE_ARCH_ARM64_MEMCPY was defined
> >    * For arm only when the flag RTE_ARCH_ARM_NEON_MEMCPY was defined
> >
> > Signed-off-by: Michal Krawczyk <mk@semihalf.com>
> > Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
> > Reviewed-by: Shai Brandes <shaibran@amazon.com>
> > ---
> >   doc/guides/rel_notes/release_22_03.rst | 1 +
> >   drivers/net/ena/base/ena_plat_dpdk.h   | 7 +++++--
> >   2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> > index c8e38d4c70..92490afd60 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -112,6 +112,7 @@ New Features
> >     * Added new checksum related xstats: ``l3_csum_bad``, ``l4_csum_bad`` and
> >       ``l4_csum_good``.
> >     * Added support for the link status configuration.
> > +  * Added optimized memcpy support for the ARM platforms.
> >
> >   * **Updated Cisco enic driver.**
> >
> > diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/base/ena_plat_dpdk.h
> > index 4e7f52881a..41db883c63 100644
> > --- a/drivers/net/ena/base/ena_plat_dpdk.h
> > +++ b/drivers/net/ena/base/ena_plat_dpdk.h
> > @@ -66,8 +66,11 @@ typedef uint64_t dma_addr_t;
> >   #define ENA_UDELAY(x) rte_delay_us_block(x)
> >
> >   #define ENA_TOUCH(x) ((void)(x))
> > -/* Avoid nested declaration on arm64, as it may define rte_memcpy as memcpy. */
> > -#if defined(RTE_ARCH_X86)
> > +/* Redefine memcpy with caution: rte_memcpy can be simply aliased to memcpy, so
> > + * make the redefinition only if it's safe (and beneficial) to do so.
> > + */
> > +#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64_MEMCPY) || \
> > +     defined(RTE_ARCH_ARM_NEON_MEMCPY)
> >   #undef memcpy
> >   #define memcpy rte_memcpy
> >   #endif
>
> I can see there is 'ena_plat_dpdk.h', which seems like an osdep header,
>
> it is possible to use 'ena_memcpy' in the code and in the 'ena_plat_dpdk.h'
> define it as:
> #define ena_memcpy rte_memcpy
>
>
> This is just for your information if it helps, usage is up to you.

Thanks for the input! It looks like a robust solution, but
unfortunately we can't modify other ena_com files which use raw memcpy
calls.The preferred approach is to do some work-arounds in the osdep
headers instead of modifying the common files.
  

Patch

diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index c8e38d4c70..92490afd60 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -112,6 +112,7 @@  New Features
   * Added new checksum related xstats: ``l3_csum_bad``, ``l4_csum_bad`` and
     ``l4_csum_good``.
   * Added support for the link status configuration.
+  * Added optimized memcpy support for the ARM platforms.
 
 * **Updated Cisco enic driver.**
 
diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/base/ena_plat_dpdk.h
index 4e7f52881a..41db883c63 100644
--- a/drivers/net/ena/base/ena_plat_dpdk.h
+++ b/drivers/net/ena/base/ena_plat_dpdk.h
@@ -66,8 +66,11 @@  typedef uint64_t dma_addr_t;
 #define ENA_UDELAY(x) rte_delay_us_block(x)
 
 #define ENA_TOUCH(x) ((void)(x))
-/* Avoid nested declaration on arm64, as it may define rte_memcpy as memcpy. */
-#if defined(RTE_ARCH_X86)
+/* Redefine memcpy with caution: rte_memcpy can be simply aliased to memcpy, so
+ * make the redefinition only if it's safe (and beneficial) to do so.
+ */
+#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64_MEMCPY) || \
+	defined(RTE_ARCH_ARM_NEON_MEMCPY)
 #undef memcpy
 #define memcpy rte_memcpy
 #endif