x86: rte_mov256 was missing for AVX2

Message ID 20220820103032.119741-1-mb@smartsharesystems.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series x86: rte_mov256 was missing for AVX2

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/iol-mellanox-Performance      success  Performance Testing PASS
ci/iol-aarch64-unit-testing      success  Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS
ci/github-robot: build           success  github build: passed
ci/iol-x86_64-compile-testing    fail     Testing issues
ci/iol-x86_64-unit-testing       fail     Testing issues

Commit Message

Morten Brørup Aug. 20, 2022, 10:30 a.m. UTC
The rte_mov256 function was missing for AVX2.
Does nobody build test for AVX2 and check the compiler output?

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/eal/x86/include/rte_memcpy.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
  

Comments

Thomas Monjalon Aug. 29, 2022, 10:55 a.m. UTC | #1
20/08/2022 12:30, Morten Brørup:
> The rte_mov256 function was missing for AVX2.
> Does nobody build test for AVX2 and check the compiler output?

Please could you specify the context/setup to reproduce the issue?

An error message would be nice to paste here as well.
Thanks
  
Morten Brørup Aug. 29, 2022, 12:18 p.m. UTC | #2
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 29 August 2022 12.56
> 
> 20/08/2022 12:30, Morten Brørup:
> > The rte_mov256 function was missing for AVX2.
> > Does nobody build test for AVX2 and check the compiler output?
> 
> Please could you specify the context/setup to reproduce the issue?

I stumbled upon it while working on the new non-temporal memcpy function.

Reproduction described below.

> 
> An error message would be nice to paste here as well.
> Thanks

The rte_memcpy declarations are in the lib/eal/include/generic/rte_memcpy.h header file. To reproduce, include this declaration header in the x86 implementation header. (I wonder why it is not already there?)

lib/eal/x86/include/rte_memcpy.h:

  #include <rte_common.h>
  #include <rte_config.h>
  #include <rte_debug.h>
+ #include "generic/rte_memcpy.h"

  #ifdef __cplusplus
  extern "C" {
  #endif


The resulting warnings from ninja look like this:

[46/2597] Compiling C object lib/acl/libavx2_tmp.a.p/acl_run_avx2.c.o
In file included from ../lib/eal/x86/include/rte_memcpy.h:24,
                 from ../lib/acl/rte_acl_osdep.h:40,
                 from ../lib/acl/rte_acl.h:14,
                 from ../lib/acl/acl_run.h:8,
                 from ../lib/acl/acl_run_sse.h:5,
                 from ../lib/acl/acl_run_avx2.h:5,
                 from ../lib/acl/acl_run_avx2.c:6:
../lib/eal/include/generic/rte_memcpy.h:89:1: warning: 'rte_mov256' declared 'static' but never defined [-Wunused-function]
   89 | rte_mov256(uint8_t *dst, const uint8_t *src);
      | ^~~~~~~~~~
[52/2597] Compiling C object lib/acl/libavx512_tmp.a.p/acl_run_avx512.c.o
In file included from ../lib/eal/x86/include/rte_memcpy.h:24,
                 from ../lib/acl/rte_acl_osdep.h:40,
                 from ../lib/acl/rte_acl.h:14,
                 from ../lib/acl/acl_run.h:8,
                 from ../lib/acl/acl_run_sse.h:5,
                 from ../lib/acl/acl_run_avx512.c:5:
../lib/eal/include/generic/rte_memcpy.h:89:1: warning: 'rte_mov256' declared 'static' but never defined [-Wunused-function]
   89 | rte_mov256(uint8_t *dst, const uint8_t *src);
      | ^~~~~~~~~~


At SmartShare Systems we follow a coding convention of including the declaration header file at the absolute top of the file implementing it. This reveals at build time if anything is missing in the declaration header file. The DPDK Project could do the same, and find bugs like this.

Here's an example:

foo.h:
------
// Declaration
static inline uint32_t bar(uint32_t x);

foo.c:
------
#include <foo.h> // <-- Note: At the absolute top!
#include <stdint.h>

// Implementation
static inline uint32_t bar(uint32_t x)
{
	return x * 2;
}

Following our coding convention will reveal that <stdint.h> is required for using <foo.h>, and thus should be included in foo.h (not in foo.c) - because someone else might include <foo.h>, and then <stdint.h> could be missing there.
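
A minimal sketch of the corrected layout, shown as one translation unit so it builds on its own. It assumes the fix is simply to move the <stdint.h> include into foo.h; bar_check() is a hypothetical helper added only to exercise bar().

```c
/* --- foo.h (sketch) --- */
#include <stdint.h>	/* moved here: the declaration below needs uint32_t */

/* Declaration */
static inline uint32_t bar(uint32_t x);

/* --- foo.c (sketch) --- */
/* In the real file, #include <foo.h> would be the very first line. */
#include <assert.h>

/* Implementation */
static inline uint32_t bar(uint32_t x)
{
	return x * 2;
}

/* Hypothetical usage check, not part of the original example. */
static inline void bar_check(void)
{
	assert(bar(21) == 42);
}
```

With the include in foo.h, any other file can include <foo.h> first without depending on what happened to be included before it.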

-Morten
  
Thomas Monjalon Aug. 29, 2022, 1:12 p.m. UTC | #3
29/08/2022 14:18, Morten Brørup:
> At SmartShare Systems we follow a coding convention of including the declaration header file at the absolute top of the file implementing it. This reveals at build time if anything is missing in the declaration header file. The DPDK Project could do the same, and find bugs like this.
> 
> Here's an example:
> 
> foo.h:
> ------
> // Declaration
> static inline uint32_t bar(uint32_t x);
> 
> foo.c:
> ------
> #include <foo.h> // <-- Note: At the absolute top!
> #include <stdint.h>
> 
> // Implementation
> static inline uint32_t bar(uint32_t x)
> {
> 	return x * 2;
> }
> 
> Following our coding convention will reveal that <stdint.h> is required for using <foo.h>, and thus should be included in foo.h (not in foo.c) - because someone else might include <foo.h>, and then <stdint.h> could be missing there.

Yes we could follow this convention.
  
Morten Brørup Sept. 28, 2022, 7:44 p.m. UTC | #4
Bruce, David, Thomas,

PING. Please ack or review this simple patch, so it can be merged.

Details were already discussed on the list with Thomas.

NB: The test errors in Patchwork are bogus: "ERROR: Could not detect Ninja v1.5 or newer" is clearly not related to the patch.

-Morten

> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Saturday, 20 August 2022 12.31
> 
> The rte_mov256 function was missing for AVX2.
> Does nobody build test for AVX2 and check the compiler output?
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/eal/x86/include/rte_memcpy.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index b678b5c942..d4d7a5cfc8 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -371,6 +371,23 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
>  	rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
>  }
> 
> +/**
> + * Copy 256 bytes from one location to another,
> + * locations should not overlap.
> + */
> +static __rte_always_inline void
> +rte_mov256(uint8_t *dst, const uint8_t *src)
> +{
> +	rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
> +	rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
> +	rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32);
> +	rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
> +	rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32);
> +	rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32);
> +	rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32);
> +	rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32);
> +}
> +
>  /**
>   * Copy 128-byte blocks from one location to another,
>   * locations should not overlap.
> --
> 2.17.1
  
Bruce Richardson Sept. 29, 2022, 8:25 a.m. UTC | #5
On Wed, Sep 28, 2022 at 09:44:35PM +0200, Morten Brørup wrote:
> Bruce, David, Thomas,
> 
> PING. Please ack or review this simple patch, so it can be merged.
> 
> Details were already discussed on the list with Thomas.
> 
> NB: The test errors in Patchwork are bogus: "ERROR: Could not detect Ninja v1.5 or newer" is clearly not related to the patch.
> 
> -Morten
> 
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Saturday, 20 August 2022 12.31
> > 
> > The rte_mov256 function was missing for AVX2.
> > Does nobody build test for AVX2 and check the compiler output?
> > 
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
David Marchand Sept. 30, 2022, 8:34 a.m. UTC | #6
On Sat, Aug 20, 2022 at 12:30 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> The rte_mov256 function was missing for AVX2.

Afaiu:
Fixes: 9144d6bcdefd ("eal/x86: optimize memcpy for SSE and AVX")

This has been missing for a long time, so I guess nobody actually uses it.

>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks.

If you think it is worth always including the generic/ headers in all
arch specific headers, can you work on it?
Thanks.
  

Patch

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index b678b5c942..d4d7a5cfc8 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -371,6 +371,23 @@  rte_mov128(uint8_t *dst, const uint8_t *src)
 	rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
 }
 
+/**
+ * Copy 256 bytes from one location to another,
+ * locations should not overlap.
+ */
+static __rte_always_inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
+	rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
+	rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32);
+	rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
+	rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32);
+	rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32);
+	rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32);
+	rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32);
+}
+
 /**
  * Copy 128-byte blocks from one location to another,
  * locations should not overlap.
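
The composition used by the patch (eight 32-byte copies making up one 256-byte copy) can be checked in isolation with a small portable sketch. The names mov32_sketch, mov256_sketch and mov256_selfcheck are hypothetical, and mov32_sketch uses memcpy as a stand-in for DPDK's AVX2 rte_mov32, so this only illustrates the offsets and the copied length, not the generated instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for rte_mov32: copies 32 bytes with memcpy so the sketch
 * builds on any target (assumption: not the DPDK implementation). */
static inline void
mov32_sketch(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 32);
}

/* Same shape as the added rte_mov256: eight consecutive 32-byte copies,
 * written here as a loop instead of eight unrolled calls. */
static inline void
mov256_sketch(uint8_t *dst, const uint8_t *src)
{
	unsigned int i;

	for (i = 0; i < 8; i++)
		mov32_sketch(dst + i * 32, src + i * 32);
}

/* Returns 0 if exactly 256 bytes were copied; asserts that the guard
 * bytes on both sides of the destination were left untouched. */
static inline int
mov256_selfcheck(void)
{
	uint8_t src[256];
	uint8_t dst[256 + 2];	/* one guard byte before and after */
	unsigned int i;

	for (i = 0; i < 256; i++)
		src[i] = (uint8_t)i;
	memset(dst, 0xAA, sizeof(dst));

	mov256_sketch(dst + 1, src);

	assert(dst[0] == 0xAA && dst[257] == 0xAA);
	return memcmp(dst + 1, src, 256) == 0 ? 0 : -1;
}
```

As in the patch, source and destination must not overlap.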