[v4] eal: non-temporal memcpy

Message ID 20221010064600.16495-1-mb@smartsharesystems.com (mailing list archive)
State Changes Requested, archived
Delegated to: Thomas Monjalon
Series: [v4] eal: non-temporal memcpy

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build fail github build: failed
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/intel-Testing success Testing PASS

Commit Message

Morten Brørup Oct. 10, 2022, 6:46 a.m. UTC
  This patch provides a function for memory copy using non-temporal store,
load or both, controlled by flags passed to the function.

Applications sometimes copy data to another memory location, which is only
used much later.
In this case, it is inefficient to pollute the data cache with the copied
data.

An example use case (originating from a real life application):
Copying filtered packets, or the first part of them, into a capture buffer
for offline analysis.
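
For illustration, such a capture path could use the new function along
these lines (sketch only; capture_buf, m and len are hypothetical):

    /* Copy packet data into the capture buffer without polluting the
     * data cache; the copied data is not accessed again any time soon.
     */
    rte_memcpy_ex(capture_buf, rte_pktmbuf_mtod(m, const void *), len,
                  RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);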

The purpose of the function is to achieve a performance gain by not
polluting the cache when copying data.
Although the throughput can be improved by further optimization, I do not
have time to do it now.

The functional tests and performance tests for memory copy have been
expanded to include non-temporal copying.

A non-temporal version of the mbuf library's function to create a full
copy of a given packet mbuf is provided.

The packet capture and packet dump libraries have been updated to use
non-temporal memory copy of the packets.

Implementation notes:

Implementations for non-x86 architectures can be provided by anyone at a
later time. I am not going to do it.

x86 non-temporal load instructions must be 16 byte aligned [1], and
non-temporal store instructions must be 4, 8 or 16 byte aligned [2].

ARM non-temporal load and store instructions seem to require 4 byte
alignment [3].

[1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
index.html#text=_mm_stream_load
[2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
index.html#text=_mm_stream_si
[3] https://developer.arm.com/documentation/100076/0100/
A64-Instruction-Set-Reference/A64-Floating-point-Instructions/
LDNP--SIMD-and-FP-
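
For example, a caller that knows both pointers and the length are 16 byte
aligned could hint this via the flags introduced by this patch (sketch):

    rte_memcpy_ex(dst, src, len, RTE_MEMOPS_F_DST_NT |
                  RTE_MEMOPS_F_SRC16A | RTE_MEMOPS_F_DST16A |
                  RTE_MEMOPS_F_LEN16A);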

This patch is a major rewrite from the RFC v3, so no version log comparing
to the RFC is provided.

v4
* Also ignore the warning for clang in the workaround for
  _mm_stream_load_si128() missing const in the parameter.
* Add missing C linkage specifier in rte_memcpy.h.

v3
* _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
  use it on 64-bit x86 architecture.
* CLANG warns that _mm_stream_load_si128_const() and
  rte_memcpy_nt_15_or_less_s16a() are not public,
  so remove __rte_internal from them. It also affects the documentation
  for the functions, so the fix can't be limited to CLANG.
* Use __rte_experimental instead of __rte_internal.
* Replace <n> with nnn in function documentation; it doesn't look like
  HTML.
* Slightly modify the workaround for _mm_stream_load_si128() missing const
  in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
  #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
  #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
* Fixed one coding style issue missed in v2.

v2
* The last 16 byte block of data, incl. any trailing bytes, was not
  copied from the source memory area in rte_memcpy_nt_buf().
* Fix many coding style issues.
* Add some missing header files.
* Fix build time warning for non-x86 architectures by using a different
  method to mark the flags parameter unused.
* CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
  so omit it when using CLANG.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_memcpy.c               |   65 +-
 app/test/test_memcpy_perf.c          |  187 ++--
 lib/eal/include/generic/rte_memcpy.h |  127 +++
 lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
 lib/mbuf/rte_mbuf.c                  |   77 ++
 lib/mbuf/rte_mbuf.h                  |   32 +
 lib/mbuf/version.map                 |    1 +
 lib/pcapng/rte_pcapng.c              |    3 +-
 lib/pdump/rte_pdump.c                |    6 +-
 9 files changed, 1645 insertions(+), 91 deletions(-)
  

Comments

Mattias Rönnblom Oct. 16, 2022, 2:27 p.m. UTC | #1
On 2022-10-10 08:46, Morten Brørup wrote:
> This patch provides a function for memory copy using non-temporal store,
> load or both, controlled by flags passed to the function.
> 
> Applications sometimes copy data to another memory location, which is only
> used much later.
> In this case, it is inefficient to pollute the data cache with the copied
> data.
> 
> An example use case (originating from a real life application):
> Copying filtered packets, or the first part of them, into a capture buffer
> for offline analysis.
> 
> The purpose of the function is to achieve a performance gain by not
> polluting the cache when copying data.
> Although the throughput can be improved by further optimization, I do not
> have time to do it now.
> 

The above section is a little repetitive, and only indirectly explains 
what NT loads/stores are.

"This patch provides a new function rte_memcpy_ex() for copying data 
between non-overlapping memory regions. The primary aim of 
rte_memcpy_ex() is to provide a drop-in replacement for rte_memcpy() (and
memcpy()), where the user may opt for loads and/or stores with
non-temporal hints to be used while copying the data.

By using a non-temporal hint, the program informs the system that it
does not intend to access the data again any time soon.

This in turn allows the CPU to bypass the caches, or by other means
prevent this unlikely-to-be-used-soon data from evicting cache lines or
forcing future evictions of more useful cache lines."

You should also say something about the memory ordering issue.

> The functional tests and performance tests for memory copy have been
> expanded to include non-temporal copying.
> 
> A non-temporal version of the mbuf library's function to create a full
> copy of a given packet mbuf is provided.
> 
> The packet capture and packet dump libraries have been updated to use
> non-temporal memory copy of the packets.
> 
> Implementation notes:
> 
> Implementations for non-x86 architectures can be provided by anyone at a
> later time. I am not going to do it.
> 
> x86 non-temporal load instructions must be 16 byte aligned [1], and
> non-temporal store instructions must be 4, 8 or 16 byte aligned [2].
> 
> ARM non-temporal load and store instructions seem to require 4 byte
> alignment [3].
> 

Would this patch be better off as a series? And maybe leave some of this 
information to a cover letter?

> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
> index.html#text=_mm_stream_load
> [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
> index.html#text=_mm_stream_si
> [3] https://developer.arm.com/documentation/100076/0100/
> A64-Instruction-Set-Reference/A64-Floating-point-Instructions/
> LDNP--SIMD-and-FP-
> 
> This patch is a major rewrite from the RFC v3, so no version log comparing
> to the RFC is provided.
> 
> v4
> * Also ignore the warning for clang in the workaround for
>    _mm_stream_load_si128() missing const in the parameter.
> * Add missing C linkage specifier in rte_memcpy.h.
> 
> v3
> * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
>    use it on 64-bit x86 architecture.
> * CLANG warns that _mm_stream_load_si128_const() and
>    rte_memcpy_nt_15_or_less_s16a() are not public,
>    so remove __rte_internal from them. It also affects the documentation
>    for the functions, so the fix can't be limited to CLANG.
> * Use __rte_experimental instead of __rte_internal.
> * Replace <n> with nnn in function documentation; it doesn't look like
>    HTML.
> * Slightly modify the workaround for _mm_stream_load_si128() missing const
>    in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
>    #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
>    #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
> * Fixed one coding style issue missed in v2.
> 
> v2
> * The last 16 byte block of data, incl. any trailing bytes, was not
>    copied from the source memory area in rte_memcpy_nt_buf().
> * Fix many coding style issues.
> * Add some missing header files.
> * Fix build time warning for non-x86 architectures by using a different
>    method to mark the flags parameter unused.
> * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
>    so omit it when using CLANG.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>   app/test/test_memcpy.c               |   65 +-
>   app/test/test_memcpy_perf.c          |  187 ++--
>   lib/eal/include/generic/rte_memcpy.h |  127 +++
>   lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
>   lib/mbuf/rte_mbuf.c                  |   77 ++
>   lib/mbuf/rte_mbuf.h                  |   32 +
>   lib/mbuf/version.map                 |    1 +
>   lib/pcapng/rte_pcapng.c              |    3 +-
>   lib/pdump/rte_pdump.c                |    6 +-
>   9 files changed, 1645 insertions(+), 91 deletions(-)
> 
> diff --git a/app/test/test_memcpy.c b/app/test/test_memcpy.c
> index 1ab86f4967..12410ce413 100644
> --- a/app/test/test_memcpy.c
> +++ b/app/test/test_memcpy.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #include <stdint.h>
> @@ -36,6 +37,19 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
>   /* Data is aligned on this many bytes (power of 2) */
>   #define ALIGNMENT_UNIT          32
>   
> +const uint64_t nt_mode_flags[4] = {

Delete "4".

> +	0,
> +	RTE_MEMOPS_F_SRC_NT,
> +	RTE_MEMOPS_F_DST_NT,
> +	RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT
> +};
> +const char * const nt_mode_str[4] = {

Delete "4".

> +	"none",
> +	"src",
> +	"dst",
> +	"src+dst"
> +};
> +
>   
>   /*
>    * Create two buffers, and initialise one with random values. These are copied
> @@ -44,12 +58,13 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
>    * changed.
>    */
>   static int
> -test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
> +test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size, unsigned int nt_mode)
>   {
>   	unsigned int i;
>   	uint8_t dest[SMALL_BUFFER_SIZE + ALIGNMENT_UNIT];
>   	uint8_t src[SMALL_BUFFER_SIZE + ALIGNMENT_UNIT];
>   	void * ret;
> +	const uint64_t flags = nt_mode_flags[nt_mode];
>   
>   	/* Setup buffers */
>   	for (i = 0; i < SMALL_BUFFER_SIZE + ALIGNMENT_UNIT; i++) {
> @@ -58,18 +73,23 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   	}
>   
>   	/* Do the copy */
> -	ret = rte_memcpy(dest + off_dst, src + off_src, size);
> -	if (ret != (dest + off_dst)) {
> -		printf("rte_memcpy() returned %p, not %p\n",
> -		       ret, dest + off_dst);
> +	if (nt_mode) {
> +		rte_memcpy_ex(dest + off_dst, src + off_src, size, flags);
> +	} else {
> +		ret = rte_memcpy(dest + off_dst, src + off_src, size);
> +		if (ret != (dest + off_dst)) {
> +			printf("rte_memcpy() returned %p, not %p\n",
> +			       ret, dest + off_dst);
> +		}
>   	}
>   
>   	/* Check nothing before offset is affected */
>   	for (i = 0; i < off_dst; i++) {
>   		if (dest[i] != 0) {
> -			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
> +			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
>   			       "[modified before start of dst].\n",
> -			       (unsigned)size, off_src, off_dst);
> +			       nt_mode ? "_ex" : "",

Introduce nt_mode_name() helper, which returns a string.
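
E.g. something like (untested sketch):

    static const char *
    nt_mode_name(unsigned int nt_mode)
    {
            return nt_mode ? "_ex" : "";
    }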

> +			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode]);
>   			return -1;
>   		}
>   	}
> @@ -77,9 +97,11 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   	/* Check everything was copied */
>   	for (i = 0; i < size; i++) {
>   		if (dest[i + off_dst] != src[i + off_src]) {
> -			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
> -			       "[didn't copy byte %u].\n",
> -			       (unsigned)size, off_src, off_dst, i);
> +			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
> +			       "[didn't copy byte %u: 0x%02x!=0x%02x].\n",
> +			       nt_mode ? "_ex" : "",
> +			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode], i,
> +			       dest[i + off_dst], src[i + off_src]);
>   			return -1;
>   		}
>   	}
> @@ -87,9 +109,10 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   	/* Check nothing after copy was affected */
>   	for (i = size; i < SMALL_BUFFER_SIZE; i++) {
>   		if (dest[i + off_dst] != 0) {
> -			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
> +			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
>   			       "[copied too many].\n",
> -			       (unsigned)size, off_src, off_dst);
> +			       nt_mode ? "_ex" : "",
> +			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode]);

For the size_t argument, use the 'z' length modifier, instead of a cast.
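
E.g.:

    printf("rte_memcpy%s() failed for %zu bytes (offsets=%u,%u nt=%s): "
           "[copied too many].\n",
           nt_mode ? "_ex" : "", size, off_src, off_dst, nt_mode_str[nt_mode]);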

>   			return -1;
>   		}
>   	}
> @@ -102,16 +125,18 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   static int
>   func_test(void)
>   {
> -	unsigned int off_src, off_dst, i;
> +	unsigned int off_src, off_dst, i, nt_mode;
>   	int ret;
>   
> -	for (off_src = 0; off_src < ALIGNMENT_UNIT; off_src++) {
> -		for (off_dst = 0; off_dst < ALIGNMENT_UNIT; off_dst++) {
> -			for (i = 0; i < RTE_DIM(buf_sizes); i++) {
> -				ret = test_single_memcpy(off_src, off_dst,
> -				                         buf_sizes[i]);
> -				if (ret != 0)
> -					return -1;
> +	for (nt_mode = 0; nt_mode < 4; nt_mode++) {
> +		for (off_src = 0; off_src < ALIGNMENT_UNIT; off_src++) {
> +			for (off_dst = 0; off_dst < ALIGNMENT_UNIT; off_dst++) {
> +				for (i = 0; i < RTE_DIM(buf_sizes); i++) {
> +					ret = test_single_memcpy(off_src, off_dst,
> +								 buf_sizes[i], nt_mode);
> +					if (ret != 0)
> +						return -1;
> +				}
>   			}
>   		}
>   	}
> diff --git a/app/test/test_memcpy_perf.c b/app/test/test_memcpy_perf.c
> index 3727c160e6..6bb52cba88 100644
> --- a/app/test/test_memcpy_perf.c
> +++ b/app/test/test_memcpy_perf.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #include <stdint.h>
> @@ -15,6 +16,7 @@
>   #include <rte_malloc.h>
>   
>   #include <rte_memcpy.h>
> +#include <rte_atomic.h>
>   
>   #include "test.h"
>   
> @@ -27,9 +29,9 @@
>   /* List of buffer sizes to test */
>   #if TEST_VALUE_RANGE == 0
>   static size_t buf_sizes[] = {
> -	1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128,
> -	129, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447, 448,
> -	449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 1600,
> +	1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 40, 48, 60, 63, 64, 65, 80, 92, 124,
> +	127, 128, 129, 140, 152, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447,
> +	448, 449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 1600,
>   	2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192
>   };
>   /* MUST be as large as largest packet size above */
> @@ -72,7 +74,7 @@ static uint8_t *small_buf_read, *small_buf_write;
>   static int
>   init_buffers(void)
>   {
> -	unsigned i;
> +	unsigned int i;
>   
>   	large_buf_read = rte_malloc("memcpy", LARGE_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
>   	if (large_buf_read == NULL)
> @@ -151,7 +153,7 @@ static void
>   do_uncached_write(uint8_t *dst, int is_dst_cached,
>   				  const uint8_t *src, int is_src_cached, size_t size)
>   {
> -	unsigned i, j;
> +	unsigned int i, j;
>   	size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];
>   
>   	for (i = 0; i < (TEST_ITERATIONS / TEST_BATCH_SIZE); i++) {
> @@ -167,66 +169,112 @@ do_uncached_write(uint8_t *dst, int is_dst_cached,
>    * Run a single memcpy performance test. This is a macro to ensure that if
>    * the "size" parameter is a constant it won't be converted to a variable.
>    */
> -#define SINGLE_PERF_TEST(dst, is_dst_cached, dst_uoffset,                   \
> -                         src, is_src_cached, src_uoffset, size)             \
> -do {                                                                        \
> -    unsigned int iter, t;                                                   \
> -    size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];          \
> -    uint64_t start_time, total_time = 0;                                    \
> -    uint64_t total_time2 = 0;                                               \
> -    for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {    \
> -        fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,             \
> -                         src_addrs, is_src_cached, src_uoffset);            \
> -        start_time = rte_rdtsc();                                           \
> -        for (t = 0; t < TEST_BATCH_SIZE; t++)                               \
> -            rte_memcpy(dst+dst_addrs[t], src+src_addrs[t], size);           \
> -        total_time += rte_rdtsc() - start_time;                             \
> -    }                                                                       \
> -    for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {    \
> -        fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,             \
> -                         src_addrs, is_src_cached, src_uoffset);            \
> -        start_time = rte_rdtsc();                                           \
> -        for (t = 0; t < TEST_BATCH_SIZE; t++)                               \
> -            memcpy(dst+dst_addrs[t], src+src_addrs[t], size);               \
> -        total_time2 += rte_rdtsc() - start_time;                            \
> -    }                                                                       \
> -    printf("%3.0f -", (double)total_time  / TEST_ITERATIONS);                 \
> -    printf("%3.0f",   (double)total_time2 / TEST_ITERATIONS);                 \
> -    printf("(%6.2f%%) ", ((double)total_time - total_time2)*100/total_time2); \
> +#define SINGLE_PERF_TEST(dst, is_dst_cached, dst_uoffset,					  \
> +			 src, is_src_cached, src_uoffset, size)					  \
> +do {												  \
> +	unsigned int iter, t;									  \
> +	size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];				  \
> +	uint64_t start_time;									  \
> +	uint64_t total_time_rte = 0, total_time_std = 0;					  \
> +	uint64_t total_time_ntd = 0, total_time_nts = 0, total_time_nt = 0;			  \
> +	const uint64_t flags = ((dst_uoffset == 0) ?						  \
> +				(ALIGNMENT_UNIT << RTE_MEMOPS_F_DSTA_SHIFT) : 0) |		  \
> +			       ((src_uoffset == 0) ?						  \
> +				(ALIGNMENT_UNIT << RTE_MEMOPS_F_SRCA_SHIFT) : 0);		  \
> +	for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {			  \
> +		fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,				  \
> +				 src_addrs, is_src_cached, src_uoffset);			  \
> +		start_time = rte_rdtsc();							  \
> +		for (t = 0; t < TEST_BATCH_SIZE; t++)						  \
> +			rte_memcpy(dst + dst_addrs[t], src + src_addrs[t], size);		  \
> +		total_time_rte += rte_rdtsc() - start_time;					  \
> +	}											  \
> +	for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {			  \
> +		fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,				  \
> +				 src_addrs, is_src_cached, src_uoffset);			  \
> +		start_time = rte_rdtsc();							  \
> +		for (t = 0; t < TEST_BATCH_SIZE; t++)						  \
> +			memcpy(dst + dst_addrs[t], src + src_addrs[t], size);			  \
> +		total_time_std += rte_rdtsc() - start_time;					  \
> +	}											  \
> +	if (!(is_dst_cached && is_src_cached)) {						  \
> +		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
> +			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
> +					 src_addrs, is_src_cached, src_uoffset);		  \
> +			start_time = rte_rdtsc();						  \
> +			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
> +				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
> +					      flags | RTE_MEMOPS_F_DST_NT);			  \
> +			total_time_ntd += rte_rdtsc() - start_time;				  \
> +		}										  \
> +		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
> +			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
> +					 src_addrs, is_src_cached, src_uoffset);		  \
> +			start_time = rte_rdtsc();						  \
> +			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
> +				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
> +					      flags | RTE_MEMOPS_F_SRC_NT);			  \
> +			total_time_nts += rte_rdtsc() - start_time;				  \
> +		}										  \
> +		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
> +			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
> +					 src_addrs, is_src_cached, src_uoffset);		  \
> +			start_time = rte_rdtsc();						  \
> +			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
> +				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
> +					      flags | RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT); \
> +			total_time_nt += rte_rdtsc() - start_time;				  \
> +		}										  \
> +	}											  \
> +	printf(" %4.0f-", (double)total_time_rte / TEST_ITERATIONS);				  \
> +	printf("%4.0f",   (double)total_time_std / TEST_ITERATIONS);				  \
> +	printf("(%+4.0f%%)", ((double)total_time_rte - total_time_std) * 100 / total_time_std);   \
> +	if (!(is_dst_cached && is_src_cached)) {						  \
> +		printf(" %4.0f", (double)total_time_ntd / TEST_ITERATIONS);			  \
> +		printf(" %4.0f", (double)total_time_nts / TEST_ITERATIONS);			  \
> +		printf(" %4.0f", (double)total_time_nt / TEST_ITERATIONS);			  \
> +		if (total_time_nt / total_time_std > 9)						  \
> +			printf("(*%4.1f)", (double)total_time_nt / total_time_std);		  \
> +		else										  \
> +			printf("(%+4.0f%%)",							  \
> +			       ((double)total_time_nt - total_time_std) * 100 / total_time_std);  \
> +	}											  \
>   } while (0)
>   
>   /* Run aligned memcpy tests for each cached/uncached permutation */
> -#define ALL_PERF_TESTS_FOR_SIZE(n)                                       \
> -do {                                                                     \
> -    if (__builtin_constant_p(n))                                         \
> -        printf("\nC%6u", (unsigned)n);                                   \
> -    else                                                                 \
> -        printf("\n%7u", (unsigned)n);                                    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 0, small_buf_read, 1, 0, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 0, small_buf_read, 1, 0, n);    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 0, large_buf_read, 0, 0, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 0, large_buf_read, 0, 0, n);    \
> +#define ALL_PERF_TESTS_FOR_SIZE(n)						\
> +do {										\
> +	if (__builtin_constant_p(n))						\
> +		printf("\nC%6u", (unsigned int)n);				\
> +	else									\
> +		printf("\n%7u", (unsigned int)n);				\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 0, small_buf_read, 1, 0, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 0, small_buf_read, 1, 0, n);	\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 0, large_buf_read, 0, 0, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 0, large_buf_read, 0, 0, n);	\
>   } while (0)
>   
>   /* Run unaligned memcpy tests for each cached/uncached permutation */
> -#define ALL_PERF_TESTS_FOR_SIZE_UNALIGNED(n)                             \
> -do {                                                                     \
> -    if (__builtin_constant_p(n))                                         \
> -        printf("\nC%6u", (unsigned)n);                                   \
> -    else                                                                 \
> -        printf("\n%7u", (unsigned)n);                                    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 1, small_buf_read, 1, 5, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 1, small_buf_read, 1, 5, n);    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 1, large_buf_read, 0, 5, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 1, large_buf_read, 0, 5, n);    \
> +#define ALL_PERF_TESTS_FOR_SIZE_UNALIGNED(n)					\
> +do {										\
> +	if (__builtin_constant_p(n))						\
> +		printf("\nC%6u", (unsigned int)n);				\
> +	else									\
> +		printf("\n%7u", (unsigned int)n);				\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 1, small_buf_read, 1, 5, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 1, small_buf_read, 1, 5, n);	\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 1, large_buf_read, 0, 5, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 1, large_buf_read, 0, 5, n);	\
>   } while (0)
>   
>   /* Run memcpy tests for constant length */
> -#define ALL_PERF_TEST_FOR_CONSTANT                                      \
> -do {                                                                    \
> -    TEST_CONSTANT(6U); TEST_CONSTANT(64U); TEST_CONSTANT(128U);         \
> -    TEST_CONSTANT(192U); TEST_CONSTANT(256U); TEST_CONSTANT(512U);      \
> -    TEST_CONSTANT(768U); TEST_CONSTANT(1024U); TEST_CONSTANT(1536U);    \
> +#define ALL_PERF_TEST_FOR_CONSTANT						\
> +do {										\
> +	TEST_CONSTANT(4U); TEST_CONSTANT(6U); TEST_CONSTANT(8U);		\
> +	TEST_CONSTANT(16U); TEST_CONSTANT(64U); TEST_CONSTANT(128U);		\
> +	TEST_CONSTANT(192U); TEST_CONSTANT(256U); TEST_CONSTANT(512U);		\
> +	TEST_CONSTANT(768U); TEST_CONSTANT(1024U); TEST_CONSTANT(1536U);	\
> +	TEST_CONSTANT(2048U);							\
>   } while (0)
>   
>   /* Run all memcpy tests for aligned constant cases */
> @@ -251,7 +299,7 @@ perf_test_constant_unaligned(void)
>   static inline void
>   perf_test_variable_aligned(void)
>   {
> -	unsigned i;
> +	unsigned int i;
>   	for (i = 0; i < RTE_DIM(buf_sizes); i++) {
>   		ALL_PERF_TESTS_FOR_SIZE((size_t)buf_sizes[i]);
>   	}
> @@ -261,7 +309,7 @@ perf_test_variable_aligned(void)
>   static inline void
>   perf_test_variable_unaligned(void)
>   {
> -	unsigned i;
> +	unsigned int i;
>   	for (i = 0; i < RTE_DIM(buf_sizes); i++) {
>   		ALL_PERF_TESTS_FOR_SIZE_UNALIGNED((size_t)buf_sizes[i]);
>   	}
> @@ -282,7 +330,7 @@ perf_test(void)
>   
>   #if TEST_VALUE_RANGE != 0
>   	/* Set up buf_sizes array, if required */
> -	unsigned i;
> +	unsigned int i;
>   	for (i = 0; i < TEST_VALUE_RANGE; i++)
>   		buf_sizes[i] = i;
>   #endif
> @@ -290,13 +338,14 @@ perf_test(void)
>   	/* See function comment */
>   	do_uncached_write(large_buf_write, 0, small_buf_read, 1, SMALL_BUFFER_SIZE);
>   
> -	printf("\n** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **\n"
> -		   "======= ================= ================= ================= =================\n"
> -		   "   Size   Cache to cache     Cache to mem      Mem to cache        Mem to mem\n"
> -		   "(bytes)          (ticks)          (ticks)           (ticks)           (ticks)\n"
> -		   "------- ----------------- ----------------- ----------------- -----------------");
> +	printf("\n** rte_memcpy(RTE)/memcpy(STD)/rte_memcpy_ex(NTD/NTS/NT) - memcpy perf. tests (C = compile-time constant) **\n"
> +		   "======= ================ ====================================== ====================================== ======================================\n"
> +		   "   Size  Cache to cache               Cache to mem                           Mem to cache                            Mem to mem\n"
> +		   "(bytes)         (ticks)                    (ticks)                                (ticks)                               (ticks)\n"
> +		   "         RTE- STD(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)\n"
> +		   "------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
>   
> -	printf("\n================================= %2dB aligned =================================",
> +	printf("\n================================================================ %2dB aligned ===============================================================",
>   		ALIGNMENT_UNIT);
>   	/* Do aligned tests where size is a variable */
>   	timespec_get(&tv_begin, TIME_UTC);
> @@ -304,28 +353,28 @@ perf_test(void)
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_aligned = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n------- ----------------- ----------------- ----------------- -----------------");
> +	printf("\n------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
>   	/* Do aligned tests where size is a compile-time constant */
>   	timespec_get(&tv_begin, TIME_UTC);
>   	perf_test_constant_aligned();
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_aligned_const = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n================================== Unaligned ==================================");
> +	printf("\n================================================================= Unaligned =================================================================");
>   	/* Do unaligned tests where size is a variable */
>   	timespec_get(&tv_begin, TIME_UTC);
>   	perf_test_variable_unaligned();
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_unaligned = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n------- ----------------- ----------------- ----------------- -----------------");
> +	printf("\n------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
>   	/* Do unaligned tests where size is a compile-time constant */
>   	timespec_get(&tv_begin, TIME_UTC);
>   	perf_test_constant_unaligned();
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_unaligned_const = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n======= ================= ================= ================= =================\n\n");
> +	printf("\n======= ================ ====================================== ====================================== ======================================\n\n");
>   
>   	printf("Test Execution Time (seconds):\n");
>   	printf("Aligned variable copy size   = %8.3f\n", time_aligned);
> diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
> index e7f0f8eaa9..b087f09c35 100644
> --- a/lib/eal/include/generic/rte_memcpy.h
> +++ b/lib/eal/include/generic/rte_memcpy.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #ifndef _RTE_MEMCPY_H_
> @@ -11,6 +12,13 @@
>    * Functions for vectorised implementation of memcpy().
>    */
>   
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>   /**
>    * Copy 16 bytes from one location to another using optimised
>    * instructions. The locations should not overlap.
> @@ -113,4 +121,123 @@ rte_memcpy(void *dst, const void *src, size_t n);
>   
>   #endif /* __DOXYGEN__ */
>   
> +/*
> + * Advanced/Non-Temporal Memory Operations Flags.
> + */
> +
> +/** Length alignment hint mask. */
> +#define RTE_MEMOPS_F_LENA_MASK  (UINT64_C(0xFE) << 0)
> +/** Length alignment hint shift. */
> +#define RTE_MEMOPS_F_LENA_SHIFT 0
> +/** Hint: Length is 2 byte aligned. */
> +#define RTE_MEMOPS_F_LEN2A      (UINT64_C(2) << 0)
> +/** Hint: Length is 4 byte aligned. */
> +#define RTE_MEMOPS_F_LEN4A      (UINT64_C(4) << 0)
> +/** Hint: Length is 8 byte aligned. */
> +#define RTE_MEMOPS_F_LEN8A      (UINT64_C(8) << 0)
> +/** Hint: Length is 16 byte aligned. */
> +#define RTE_MEMOPS_F_LEN16A     (UINT64_C(16) << 0)
> +/** Hint: Length is 32 byte aligned. */
> +#define RTE_MEMOPS_F_LEN32A     (UINT64_C(32) << 0)
> +/** Hint: Length is 64 byte aligned. */
> +#define RTE_MEMOPS_F_LEN64A     (UINT64_C(64) << 0)
> +/** Hint: Length is 128 byte aligned. */
> +#define RTE_MEMOPS_F_LEN128A    (UINT64_C(128) << 0)
> +
> +/** Prefer non-temporal access to source memory area.
> + */
> +#define RTE_MEMOPS_F_SRC_NT     (UINT64_C(1) << 8)
> +/** Source address alignment hint mask. */
> +#define RTE_MEMOPS_F_SRCA_MASK  (UINT64_C(0xFE) << 8)
> +/** Source address alignment hint shift. */
> +#define RTE_MEMOPS_F_SRCA_SHIFT 8
> +/** Hint: Source address is 2 byte aligned. */
> +#define RTE_MEMOPS_F_SRC2A      (UINT64_C(2) << 8)
> +/** Hint: Source address is 4 byte aligned. */
> +#define RTE_MEMOPS_F_SRC4A      (UINT64_C(4) << 8)
> +/** Hint: Source address is 8 byte aligned. */
> +#define RTE_MEMOPS_F_SRC8A      (UINT64_C(8) << 8)
> +/** Hint: Source address is 16 byte aligned. */
> +#define RTE_MEMOPS_F_SRC16A     (UINT64_C(16) << 8)
> +/** Hint: Source address is 32 byte aligned. */
> +#define RTE_MEMOPS_F_SRC32A     (UINT64_C(32) << 8)
> +/** Hint: Source address is 64 byte aligned. */
> +#define RTE_MEMOPS_F_SRC64A     (UINT64_C(64) << 8)
> +/** Hint: Source address is 128 byte aligned. */
> +#define RTE_MEMOPS_F_SRC128A    (UINT64_C(128) << 8)
> +
> +/** Prefer non-temporal access to destination memory area.
> + *
> + * On x86 architecture:
> + * Remember to call rte_wmb() after a sequence of copy operations.
> + */

NT memcpy should have memcpy() semantics by default, and there should be
a flag if you don't want an sfence after any NT stores, or an lfence
before any NT loads, on x86. That is, assuming the x86 memcpy_ex w/ NT
hints will always be using NT stores, as opposed to regular stores +
clflushopt. For the latter case, or in x86 cases where the NT store
variants aren't supported, the fencing isn't needed, even on x86.

I don't know what the "ignore ordering" flag should be called.

RTE_MEMOPS_F_NO_MB
RTE_MEMOPS_F_UNORDERED
RTE_MEMOPS_F_NO_WMB
RTE_MEMOPS_F_NO_RMB

For those that use this "ignore ordering" flag (or for anyone using the 
API this patch proposes), there will be a need to insert a barrier at 
some point, unless the application is completely serial. It should be 
possible to do this in a portable manner. No #ifdef x86.

One way to attack this is to have two new functions rte_nt_wmb() and 
rte_nt_rmb() (or maybe rte_memcpy_nt_w|rmb()), which calls sfence/lfence 
(or whatever is needed on that architecture), to order the NT loads 
and/or NT stores with load/stores in the default memory consistency model.
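
A minimal sketch of the store-side function, assuming the hypothetical
name proposed above:

    /* Hypothetical; orders preceding NT stores before subsequent stores.
     * On x86 this requires sfence; on an architecture where NT stores
     * follow the normal memory model it could compile to nothing.
     */
    static inline void
    rte_memcpy_nt_wmb(void)
    {
    #ifdef RTE_ARCH_X86
            rte_wmb(); /* sfence */
    #endif
    }

The rmb counterpart would be the analogous lfence-based wrapper for NT
loads.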

> +#define RTE_MEMOPS_F_DST_NT     (UINT64_C(1) << 16)
> +/** Destination address alignment hint mask. */
> +#define RTE_MEMOPS_F_DSTA_MASK  (UINT64_C(0xFE) << 16)
> +/** Destination address alignment hint shift. */
> +#define RTE_MEMOPS_F_DSTA_SHIFT 16
> +/** Hint: Destination address is 2 byte aligned. */
> +#define RTE_MEMOPS_F_DST2A      (UINT64_C(2) << 16)
> +/** Hint: Destination address is 4 byte aligned. */
> +#define RTE_MEMOPS_F_DST4A      (UINT64_C(4) << 16)
> +/** Hint: Destination address is 8 byte aligned. */
> +#define RTE_MEMOPS_F_DST8A      (UINT64_C(8) << 16)
> +/** Hint: Destination address is 16 byte aligned. */
> +#define RTE_MEMOPS_F_DST16A     (UINT64_C(16) << 16)
> +/** Hint: Destination address is 32 byte aligned. */
> +#define RTE_MEMOPS_F_DST32A     (UINT64_C(32) << 16)
> +/** Hint: Destination address is 64 byte aligned. */
> +#define RTE_MEMOPS_F_DST64A     (UINT64_C(64) << 16)
> +/** Hint: Destination address is 128 byte aligned. */
> +#define RTE_MEMOPS_F_DST128A    (UINT64_C(128) << 16)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Advanced/non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the destination memory area.
> + * @param src
> + *   Pointer to the source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + *   Any of the RTE_MEMOPS_F_(SRC|DST)_NT, RTE_MEMOPS_F_(LEN|SRC|DST)nnnA flags.
> + *   Must be constant at build time.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags);
> +
> +#ifndef RTE_MEMCPY_EX_ARCH_DEFINED
> +
> +/* Fallback implementation, if no arch-specific implementation is provided. */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)

I like the rte_memcpy_ex() name, in particular that it doesn't say 
anything about NT.

Is there a point in having flags declared const?

> +{
> +	RTE_SET_USED(flags);
> +	memcpy(dst, src, len);

Fall back to rte_memcpy().
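
I.e. (sketch):

    RTE_SET_USED(flags);
    rte_memcpy(dst, src, len);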

> +}
> +
> +#endif /* RTE_MEMCPY_EX_ARCH_DEFINED */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
>   #endif /* _RTE_MEMCPY_H_ */
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index d4d7a5cfc8..31d0faf7a8 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #ifndef _RTE_MEMCPY_X86_64_H_
> @@ -17,6 +18,10 @@
>   #include <rte_vect.h>
>   #include <rte_common.h>
>   #include <rte_config.h>
> +#include <rte_debug.h>
> +
> +#define RTE_MEMCPY_EX_ARCH_DEFINED
> +#include "generic/rte_memcpy.h"
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -868,6 +873,1239 @@ rte_memcpy(void *dst, const void *src, size_t n)
>   		return rte_memcpy_generic(dst, src, n);
>   }
>   
> +/*
> + * Advanced/Non-Temporal Memory Operations.
> + */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Workaround for _mm_stream_load_si128() missing const in the parameter.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__m128i _mm_stream_load_si128_const(const __m128i *const mem_addr)

I'm not sure it's wise to use the _mm namespace for this wrapper. There
could be an upstream fix to this issue, and that fix could land on
exactly the name you chose here.

__rte_mm_stream_load_si128()?

> +{
> +	/* GCC 4.8.5 (in RHEL7) doesn't support the #pragma to ignore "-Wdiscarded-qualifiers".
> +	 * So we explicitly type cast mem_addr and use the #pragma to ignore "-Wcast-qual".
> +	 */
> +#if defined(RTE_TOOLCHAIN_GCC)
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#elif defined(RTE_TOOLCHAIN_CLANG)
> +#pragma clang diagnostic push
> +#pragma clang diagnostic ignored "-Wcast-qual"
> +#endif
> +	return _mm_stream_load_si128((__m128i *)mem_addr);
> +#if defined(RTE_TOOLCHAIN_GCC)
> +#pragma GCC diagnostic pop
> +#elif defined(RTE_TOOLCHAIN_CLANG)
> +#pragma clang diagnostic pop
> +#endif
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Memory copy from non-temporal source area.
> + *
> + * @note
> + * Performance is optimal when source pointer is 16 byte aligned.
> + *
> + * @param dst
> + *   Pointer to the destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + *   Any of the RTE_MEMOPS_F_(LEN|SRC)nnnA flags.
> + *   The RTE_MEMOPS_F_SRC_NT flag must be set.
> + *   The RTE_MEMOPS_F_DST_NT flag must be clear.
> + *   The RTE_MEMOPS_F_DSTnnnA flags are ignored.
> + *   Must be constant at build time.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nts(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)

Why not have rte_memcpy_ex() as the single addition to the public API? 
Then you may have __rte-prefixed helpers as well, but not to be directly 
called by the application. Would simplify things from a 
documentation/user comprehension point of view, I think.

> +{
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;

Declare the xmm<N> variables in the scope where they are used (those
that are used).

Aren't you supposed to have a single whitespace between the type and the 
name in DPDK? I may be mistaken.

> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) == RTE_MEMOPS_F_SRC_NT);
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* If source is not 16 byte aligned, then copy first part of data via bounce buffer,
> +	 * to achieve 16 byte alignment of source pointer.
> +	 * This invalidates the source, destination and length alignment flags, and
> +	 * potentially makes the destination pointer unaligned.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)) {

I think it's worth giving this expression a name, especially since it's
repeatedly used.

const bool src_atleast_16a =
	(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A;

An alternative would be to have a macro RTE_MEMOPS_ATLEAST_SRC16A(flags).
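
E.g. (hypothetical):

    #define RTE_MEMOPS_ATLEAST_SRC16A(flags) \
            (((flags) & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)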

> +		/* Source is not known to be 16 byte aligned, but might be. */
> +		/** How many bytes is source offset from 16 byte alignment (floor rounding). */
> +		const size_t    offset = (uintptr_t)src & 15;
> +
> +		if (offset) {
offset > 0
> +			/* Source is not 16 byte aligned. */
> +			char            buffer[16] __rte_aligned(16);
> +			/** How many bytes is source away from 16 byte alignment
> +			 * (ceiling rounding).
> +			 */
> +			const size_t    first = 16 - offset;
> +
> +			xmm0 = _mm_stream_load_si128_const(RTE_PTR_SUB(src, offset));
> +			_mm_store_si128((void *)buffer, xmm0);
> +
> +			/* Test for short length.
> +			 *
> +			 * Omitted if length is known to be >= 16.
> +			 */
> +			if (!(__builtin_constant_p(len) && len >= 16) &&

Why is __builtin_constant_p() used here?

> +					unlikely(len <= first)) {
> +				/* Short length. */
> +				rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), len);
> +				return;
> +			}
> +
> +			/* Copy until source pointer is 16 byte aligned. */
> +			rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), first);
> +			src = RTE_PTR_ADD(src, first);
> +			dst = RTE_PTR_ADD(dst, first);
> +			len -= first;
> +		}
> +	}
> +
> +	/* Source pointer is now 16 byte aligned. */
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);

Is this some attempt at manual register allocation, or why is "xmm2" 
used, and not "xmm0"?

> +		_mm_storeu_si128(dst, xmm2);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, if any, via bounce buffer.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid) and length is known to be 16 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
> +			(len & 15)) {
> +		char    buffer[16] __rte_aligned(16);
> +
> +		xmm3 = _mm_stream_load_si128_const(src);

If this is indeed a register allocation trick, it should be mentioned in
a comment. Otherwise it's just confusing. If it's a trick, does it
actually have a positive effect? I wouldn't expect the compiler to take
"xmm3" so literally, and secondly, I would expect register renaming in
the CPU to fix the false dependency.

> +		_mm_store_si128((void *)buffer, xmm3);
> +		rte_mov15_or_less(dst, buffer, len & 15);
> +	}
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Memory copy to non-temporal destination area.
> + *
> + * @note
> + * If the destination and/or length is unaligned, the first and/or last copied
> + * bytes will be stored in the destination memory area using temporal access.
> + * @note
> + * Performance is optimal when destination pointer is 16 byte aligned.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + *   Any of the RTE_MEMOPS_F_(LEN|DST)nnnA flags.
> + *   The RTE_MEMOPS_F_SRC_NT flag must be clear.
> + *   The RTE_MEMOPS_F_DST_NT flag must be set.
> + *   The RTE_MEMOPS_F_SRCnnnA flags are ignored.
> + *   Must be constant at build time.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ntd(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)

This should also go into the __rte_memcpy namespace, rather than 
rte_memcpy*.

> +{
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) == RTE_MEMOPS_F_DST_NT);
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) ||
> +			len >= 16) {

See my comments on the SRCA mask handling.

> +		/* Length >= 16 and/or destination is known to be 16 byte aligned. */
> +		register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +		/* If destination is not 16 byte aligned, then copy first part of data,
> +		 * to achieve 16 byte alignment of destination pointer.
> +		 * This invalidates the source, destination and length alignment flags, and
> +		 * potentially makes the source pointer unaligned.
> +		 *
> +		 * Omitted if destination is known to be 16 byte aligned.
> +		 */
> +		if (!((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A)) {
> +			/* Destination is not known to be 16 byte aligned, but might be. */
> +			/** How many bytes is destination offset from 16 byte alignment
> +			 * (floor rounding).
> +			 */
> +			const size_t    offset = (uintptr_t)dst & 15;
> +
> +			if (offset) {
> +				/* Destination is not 16 byte aligned. */
> +				/** How many bytes is destination away from 16 byte alignment
> +				 * (ceiling rounding).
> +				 */
> +				const size_t    first = 16 - offset;
> +
> +				if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ||
> +						(offset & 3) == 0) {
> +					/* Destination is (known to be) 4 byte aligned. */
> +					int32_t r0, r1, r2;
> +
> +					/* Copy until destination pointer is 16 byte aligned. */
> +					if (first & 8) {
> +						memcpy(&r0, RTE_PTR_ADD(src, 0 * 4), 4);
> +						memcpy(&r1, RTE_PTR_ADD(src, 1 * 4), 4);
> +						_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), r0);
> +						_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), r1);
> +						src = RTE_PTR_ADD(src, 8);
> +						dst = RTE_PTR_ADD(dst, 8);
> +						len -= 8;
> +					}
> +					if (first & 4) {
> +						memcpy(&r2, src, 4);
> +						_mm_stream_si32(dst, r2);
> +						src = RTE_PTR_ADD(src, 4);
> +						dst = RTE_PTR_ADD(dst, 4);
> +						len -= 4;
> +					}
> +				} else {
> +					/* Destination is not 4 byte aligned. */
> +					/* Copy until destination pointer is 16 byte aligned. */
> +					rte_mov15_or_less(dst, src, first);
> +					src = RTE_PTR_ADD(src, first);
> +					dst = RTE_PTR_ADD(dst, first);
> +					len -= first;
> +				}
> +			}
> +		}
> +
> +		/* Destination pointer is now 16 byte aligned. */
> +		RTE_ASSERT(rte_is_aligned(dst, 16));
> +
> +		/* Copy large portion of data in chunks of 64 byte. */
> +		while (len >= 64) {
> +			xmm0 = _mm_loadu_si128(RTE_PTR_ADD(src, 0 * 16));
> +			xmm1 = _mm_loadu_si128(RTE_PTR_ADD(src, 1 * 16));
> +			xmm2 = _mm_loadu_si128(RTE_PTR_ADD(src, 2 * 16));
> +			xmm3 = _mm_loadu_si128(RTE_PTR_ADD(src, 3 * 16));
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
> +			src = RTE_PTR_ADD(src, 64);
> +			dst = RTE_PTR_ADD(dst, 64);
> +			len -= 64;
> +		}
> +
> +		/* Copy following 32 and 16 byte portions of data.
> +		 *
> +		 * Omitted if destination is known to be 16 byte aligned (so the alignment
> +		 * flags are still valid)
> +		 * and length is known to be respectively 64 or 32 byte aligned.
> +		 */
> +		if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +				((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
> +				(len & 32)) {
> +			xmm0 = _mm_loadu_si128(RTE_PTR_ADD(src, 0 * 16));
> +			xmm1 = _mm_loadu_si128(RTE_PTR_ADD(src, 1 * 16));
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +			src = RTE_PTR_ADD(src, 32);
> +			dst = RTE_PTR_ADD(dst, 32);
> +		}
> +		if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +				((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
> +				(len & 16)) {
> +			xmm2 = _mm_loadu_si128(src);
> +			_mm_stream_si128(dst, xmm2);
> +			src = RTE_PTR_ADD(src, 16);
> +			dst = RTE_PTR_ADD(dst, 16);
> +		}
> +	} else {
> +		/* Length <= 15, and
> +		 * destination is not known to be 16 byte aligned (but might be).
> +		 */
> +		/* If destination is not 4 byte aligned, then
> +		 * use normal copy and return.
> +		 *
> +		 * Omitted if destination is known to be 4 byte aligned.
> +		 */
> +		if (!((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) &&
> +				!rte_is_aligned(dst, 4)) {
> +			/* Destination is not 4 byte aligned. Non-temporal store is unavailable. */
> +			rte_mov15_or_less(dst, src, len);
> +			return;
> +		}
> +		/* Destination is (known to be) 4 byte aligned. Proceed. */
> +	}
> +
> +	/* Destination pointer is now 4 byte (or 16 byte) aligned. */
> +	RTE_ASSERT(rte_is_aligned(dst, 4));
> +
> +	/* Copy following 8 and 4 byte portions of data.
> +	 *
> +	 * Omitted if destination is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 16 or 8 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
> +			(len & 8)) {
> +		int32_t r0, r1;
> +
> +		memcpy(&r0, RTE_PTR_ADD(src, 0 * 4), 4);
> +		memcpy(&r1, RTE_PTR_ADD(src, 1 * 4), 4);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), r0);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), r1);
> +		src = RTE_PTR_ADD(src, 8);
> +		dst = RTE_PTR_ADD(dst, 8);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN8A)) &&
> +			(len & 4)) {
> +		int32_t r2;
> +
> +		memcpy(&r2, src, 4);
> +		_mm_stream_si32(dst, r2);
> +		src = RTE_PTR_ADD(src, 4);
> +		dst = RTE_PTR_ADD(dst, 4);
> +	}
> +
> +	/* Copy remaining 2 and 1 byte portions of data.
> +	 *
> +	 * Omitted if destination is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 4 and 2 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A)) &&
> +			(len & 2)) {
> +		int16_t r3;
> +
> +		memcpy(&r3, src, 2);
> +		*(int16_t *)dst = r3;

Writing to 'dst' both through an int16_t pointer and a void pointer 
could cause type-based aliasing issues.

There's no reason not to use memcpy() here.
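
E.g.:

    memcpy(dst, &r3, 2); /* no alignment or aliasing assumptions */

or simply memcpy(dst, src, 2), dropping r3 altogether.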

> +		src = RTE_PTR_ADD(src, 2);
> +		dst = RTE_PTR_ADD(dst, 2);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN2A)) &&
> +			(len & 1))
> +		*(char *)dst = *(const char *)src;
> +}
> +
> +/**
> + * Non-temporal memory copy of 15 or less byte
> + * from 16 byte aligned source via bounce buffer.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Only the 4 least significant bits of this parameter are used.
> + *   The 4 least significant bits of this holds the number of remaining bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_15_or_less_s16a(void *__rte_restrict dst,
> +		const void *__rte_restrict src, size_t len, const uint64_t flags)
> +{
> +	int32_t             buffer[4] __rte_aligned(16);
> +	register __m128i    xmm0;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if ((len & 15) == 0)
> +		return;
> +
> +	/* Non-temporal load into bounce buffer. */
> +	xmm0 = _mm_stream_load_si128_const(src);
> +	_mm_store_si128((void *)buffer, xmm0);
> +
> +	/* Store from bounce buffer. */
> +	if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ||
> +			rte_is_aligned(dst, 4)) {
> +		/* Destination is (known to be) 4 byte aligned. */
> +		src = (const void *)buffer;

Redundant cast.
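
I.e., plain

	src = buffer;

suffices; the array decays to a pointer, and the const qualification is
added implicitly.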

> +		if (len & 8) {
> +#ifdef RTE_ARCH_X86_64
> +			if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A) {
> +				/* Destination is known to be 8 byte aligned. */
> +				_mm_stream_si64(dst, *(const int64_t *)src);
> +			} else {
> +#endif /* RTE_ARCH_X86_64 */
> +				_mm_stream_si32(RTE_PTR_ADD(dst, 0), buffer[0]);
> +				_mm_stream_si32(RTE_PTR_ADD(dst, 4), buffer[1]);
> +#ifdef RTE_ARCH_X86_64
> +			}
> +#endif /* RTE_ARCH_X86_64 */
> +			src = RTE_PTR_ADD(src, 8);
> +			dst = RTE_PTR_ADD(dst, 8);
> +		}
> +		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN8A) &&
> +				(len & 4)) {
> +			_mm_stream_si32(dst, *(const int32_t *)src);
> +			src = RTE_PTR_ADD(src, 4);
> +			dst = RTE_PTR_ADD(dst, 4);
> +		}
> +
> +		/* Non-temporal store is unavailable for the remaining 3 byte or less. */
> +		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) &&
> +				(len & 2)) {
> +			*(int16_t *)dst = *(const int16_t *)src;

Looks like another type-based aliasing issue.
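
Same remedy as above; since no register staging is needed here, a
direct

	memcpy(dst, src, 2);

does the job.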

> +			src = RTE_PTR_ADD(src, 2);
> +			dst = RTE_PTR_ADD(dst, 2);
> +		}
> +		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN2A) &&
> +				(len & 1)) {
> +			*(char *)dst = *(const char *)src;
> +		}
> +	} else {
> +		/* Destination is not 4 byte aligned. Non-temporal store is unavailable. */
> +		rte_mov15_or_less(dst, (const void *)buffer, len & 15);

This cast is not needed.

> +	}
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 16 byte aligned addresses non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + *   Must be 16 byte aligned.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d16s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)

This function should not be public. That goes for all the other public 
functions below as well.

That said, maybe there's precedent against this, with all the various
rte_memcpy() helpers being public already. I don't know.

> +{
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;

Reduce the scope of these variable declarations.
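
E.g. (sketch):

	while (len >= 64) {
		__m128i xmm0, xmm1, xmm2, xmm3;

		/* ... loads and stores as before ... */
	}

with separate declarations in the 32 and 16 byte tail blocks. The
'register' keyword could go at the same time; it has no effect with
modern compilers.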

> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 16));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_stream_si128(dst, xmm2);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, via bounce buffer.
> +	 *
> +	 * Omitted if length is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
> +		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
> +				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
> +				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) ?
> +				flags : RTE_MEMOPS_F_DST16A) |
> +				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
> +				flags : RTE_MEMOPS_F_SRC16A));
> +}
> +
> +#ifdef RTE_ARCH_X86_64
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 8/16 byte aligned destination/source addresses non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + *   Must be 8 byte aligned.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d8s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	int64_t             buffer[8] __rte_cache_aligned /* at least __rte_aligned(16) */;
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 8));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 2], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 2], xmm1);
> +		_mm_store_si128((void *)&buffer[2 * 2], xmm2);
> +		_mm_store_si128((void *)&buffer[3 * 2], xmm3);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[0]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[1]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 2 * 8), buffer[2]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 3 * 8), buffer[3]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 4 * 8), buffer[4]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 5 * 8), buffer[5]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 6 * 8), buffer[6]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 7 * 8), buffer[7]);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 2], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 2], xmm1);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[0]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[1]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 2 * 8), buffer[2]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 3 * 8), buffer[3]);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_store_si128((void *)&buffer[2 * 2], xmm2);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[4]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[5]);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, via bounce buffer.
> +	 *
> +	 * Omitted if length is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
> +		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
> +				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
> +				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A) ?
> +				flags : RTE_MEMOPS_F_DST8A) |
> +				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
> +				flags : RTE_MEMOPS_F_SRC16A));
> +}
> +#endif /* RTE_ARCH_X86_64 */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 4/16 byte aligned destination/source addresses non-temporal memory copy.

/../ non-temporal source and destination /../

> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.

Delete "non-temporal" here and below. NT is not a property of a memory area.

> + *   Must be 4 byte aligned.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d4s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	int32_t             buffer[16] __rte_cache_aligned /* at least __rte_aligned(16) */;
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 4));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 4], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 4], xmm1);
> +		_mm_store_si128((void *)&buffer[2 * 4], xmm2);
> +		_mm_store_si128((void *)&buffer[3 * 4], xmm3);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  0 * 4), buffer[0]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  1 * 4), buffer[1]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  2 * 4), buffer[2]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  3 * 4), buffer[3]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  4 * 4), buffer[4]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  5 * 4), buffer[5]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  6 * 4), buffer[6]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  7 * 4), buffer[7]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  8 * 4), buffer[8]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  9 * 4), buffer[9]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 10 * 4), buffer[10]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 11 * 4), buffer[11]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 12 * 4), buffer[12]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 13 * 4), buffer[13]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 14 * 4), buffer[14]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 15 * 4), buffer[15]);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 4], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 4], xmm1);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[0]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[1]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[2]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 3 * 4), buffer[3]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 4 * 4), buffer[4]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 5 * 4), buffer[5]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 6 * 4), buffer[6]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 7 * 4), buffer[7]);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_store_si128((void *)&buffer[2 * 4], xmm2);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[8]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[9]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[10]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 3 * 4), buffer[11]);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, via bounce buffer.
> +	 *
> +	 * Omitted if length is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
> +		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
> +				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
> +				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ?
> +				flags : RTE_MEMOPS_F_DST4A) |
> +				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
> +				flags : RTE_MEMOPS_F_SRC16A));
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 4 byte aligned addresses (non-temporal) memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the (non-temporal) destination memory area.
> + *   Must be 4 byte aligned if using non-temporal store.
> + * @param src
> + *   Pointer to the (non-temporal) source memory area.
> + *   Must be 4 byte aligned if using non-temporal load.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d4s4a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)

If this isn't an NT memcpy, why is it named _nt_?

Why is it needed at all? Why not use rte_memcpy() in this case?

> +{
> +	/** How many bytes is source offset from 16 byte alignment (floor rounding). */
> +	const size_t    offset = (flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A ?
> +			0 : (uintptr_t)src & 15;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 4));
> +	RTE_ASSERT(rte_is_aligned(src, 4));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	if (offset == 0) {
> +		/* Source is 16 byte aligned. */
> +		/* Copy everything, using upgraded source alignment flags. */
> +		rte_memcpy_nt_d4s16a(dst, src, len,
> +				(flags & ~RTE_MEMOPS_F_SRCA_MASK) | RTE_MEMOPS_F_SRC16A);
> +	} else {
> +		/* Source is not 16 byte aligned, so make it 16 byte aligned. */
> +		int32_t             buffer[4] __rte_aligned(16);
> +		const size_t        first = 16 - offset;
> +		register __m128i    xmm0;
> +
> +		/* First, copy first part of data in chunks of 4 byte,
> +		 * to achieve 16 byte alignment of source.
> +		 * This invalidates the source, destination and length alignment flags, and
> +		 * may change whether the destination pointer is 16 byte aligned.
> +		 */
> +
> +		/** Copy from 16 byte aligned source pointer (floor rounding). */
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_SUB(src, offset));
> +		_mm_store_si128((void *)buffer, xmm0);
> +
> +		if (unlikely(len + offset <= 16)) {
> +			/* Short length. */
> +			if (((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ||
> +					(len & 3) == 0) {
> +				/* Length is 4 byte aligned. */
> +				switch (len) {
> +				case 1 * 4:
> +					/* Offset can be 1 * 4, 2 * 4 or 3 * 4. */
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4),
> +							buffer[offset / 4]);
> +					break;
> +				case 2 * 4:
> +					/* Offset can be 1 * 4 or 2 * 4. */
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4),
> +							buffer[offset / 4]);
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4),
> +							buffer[offset / 4 + 1]);
> +					break;
> +				case 3 * 4:
> +					/* Offset can only be 1 * 4. */
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[1]);
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[2]);
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[3]);
> +					break;
> +				}
> +			} else {
> +				/* Length is not 4 byte aligned. */
> +				rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), len);
> +			}
> +			return;
> +		}
> +
> +		switch (first) {
> +		case 1 * 4:
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[3]);
> +			break;
> +		case 2 * 4:
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[2]);
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[3]);
> +			break;
> +		case 3 * 4:
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[1]);
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[2]);
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[3]);
> +			break;
> +		}
> +
> +		src = RTE_PTR_ADD(src, first);
> +		dst = RTE_PTR_ADD(dst, first);
> +		len -= first;
> +
> +		/* Source pointer is now 16 byte aligned. */
> +		RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +		/* Then, copy the rest, using corrected alignment flags. */
> +		if (rte_is_aligned(dst, 16))
> +			rte_memcpy_nt_d16s16a(dst, src, len, (flags &
> +					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
> +					RTE_MEMOPS_F_LENA_MASK)) |
> +					RTE_MEMOPS_F_DST16A | RTE_MEMOPS_F_SRC16A |
> +					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
> +					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
> +#ifdef RTE_ARCH_X86_64
> +		else if (rte_is_aligned(dst, 8))
> +			rte_memcpy_nt_d8s16a(dst, src, len, (flags &
> +					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
> +					RTE_MEMOPS_F_LENA_MASK)) |
> +					RTE_MEMOPS_F_DST8A | RTE_MEMOPS_F_SRC16A |
> +					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
> +					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
> +#endif /* RTE_ARCH_X86_64 */
> +		else
> +			rte_memcpy_nt_d4s16a(dst, src, len, (flags &
> +					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
> +					RTE_MEMOPS_F_LENA_MASK)) |
> +					RTE_MEMOPS_F_DST4A | RTE_MEMOPS_F_SRC16A |
> +					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
> +					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
> +	}
> +}
> +
> +#ifndef RTE_MEMCPY_NT_BUFSIZE
> +
> +#include <lib/mbuf/rte_mbuf_core.h>
> +
> +/** Bounce buffer size for non-temporal memcpy.
> + *
> + * Must be 2^N and >= 128.
> + * The actual buffer will be slightly larger, due to added padding.
> + * The default is chosen to be able to handle a non-segmented packet.
> + */
> +#define RTE_MEMCPY_NT_BUFSIZE RTE_MBUF_DEFAULT_DATAROOM
> +
> +#endif  /* RTE_MEMCPY_NT_BUFSIZE */
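
As the #ifndef suggests, an application can override the bounce buffer
size at compile time, e.g. (sketch):

	#define RTE_MEMCPY_NT_BUFSIZE 4096
	#include <rte_memcpy.h>

provided the value remains a power of two and at least 128.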
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Non-temporal memory copy via bounce buffer.
> + *
> + * @note
> + * If the destination and/or length is unaligned, the first and/or last copied
> + * bytes will be stored in the destination memory area using temporal access.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + *   Must be <= RTE_MEMCPY_NT_BUFSIZE.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_buf(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	/** Cache line aligned bounce buffer with preceding and trailing padding.
> +	 *
> +	 * The preceding padding is one cache line, so the data area itself
> +	 * is cache line aligned.
> +	 * The trailing padding is 16 bytes, leaving room for the trailing bytes
> +	 * of a 16 byte store operation.
> +	 */
> +	char			buffer[RTE_CACHE_LINE_SIZE + RTE_MEMCPY_NT_BUFSIZE +  16]
> +				__rte_cache_aligned;
> +	/** Pointer to bounce buffer's aligned data area. */
> +	char		* const buf0 = &buffer[RTE_CACHE_LINE_SIZE];
> +	void		       *buf;
> +	/** Number of bytes to copy from source, incl. any extra preceding bytes. */
> +	size_t			srclen;
> +	register __m128i	xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(len <= RTE_MEMCPY_NT_BUFSIZE);
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Step 1:
> +	 * Copy data from the source to the bounce buffer's aligned data area,
> +	 * using aligned non-temporal load from the source,
> +	 * and unaligned store in the bounce buffer.
> +	 *
> +	 * If the source is unaligned, the additional bytes preceding the data will be copied
> +	 * to the padding area preceding the bounce buffer's aligned data area.
> +	 * Similarly, if the source data ends at an unaligned address, the additional bytes
> +	 * trailing the data will be copied to the padding area trailing the bounce buffer's
> +	 * aligned data area.
> +	 */
> +
> +	/* Adjust for extra preceding bytes, unless source is known to be 16 byte aligned. */
> +	if ((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) {
> +		buf = buf0;
> +		srclen = len;
> +	} else {
> +		/** How many bytes is source offset from 16 byte alignment (floor rounding). */
> +		const size_t offset = (uintptr_t)src & 15;
> +
> +		buf = RTE_PTR_SUB(buf0, offset);
> +		src = RTE_PTR_SUB(src, offset);
> +		srclen = len + offset;
> +	}
> +
> +	/* Copy large portion of data from source to bounce buffer in chunks of 64 byte. */
> +	while (srclen >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 1 * 16), xmm1);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 2 * 16), xmm2);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 3 * 16), xmm3);
> +		src = RTE_PTR_ADD(src, 64);
> +		buf = RTE_PTR_ADD(buf, 64);
> +		srclen -= 64;
> +	}
> +
> +	/* Copy remaining 32 and 16 byte portions of data from source to bounce buffer.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the length alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
> +			(srclen & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 1 * 16), xmm1);
> +		src = RTE_PTR_ADD(src, 32);
> +		buf = RTE_PTR_ADD(buf, 32);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
> +			(srclen & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_storeu_si128(buf, xmm2);
> +		src = RTE_PTR_ADD(src, 16);
> +		buf = RTE_PTR_ADD(buf, 16);
> +	}
> +	/* Copy any trailing bytes of data from source to bounce buffer.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the length alignment
> +	 * flags are still valid)
> +	 * and length is known to be 16 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
> +			(srclen & 15)) {
> +		xmm3 = _mm_stream_load_si128_const(src);
> +		_mm_storeu_si128(buf, xmm3);
> +	}
> +
> +	/* Step 2:
> +	 * Copy from the aligned bounce buffer to the non-temporal destination.
> +	 */
> +	rte_memcpy_ntd(dst, buf0, len,
> +			(flags & ~(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_SRCA_MASK)) |
> +			(RTE_CACHE_LINE_SIZE << RTE_MEMOPS_F_SRCA_SHIFT));
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @note
> + * If the destination and/or length is unaligned, some copied bytes will be
> + * stored in the destination memory area using temporal access.

Is temporal access the proper term?

I would describe it as "stored in the destination memory area without
the use of non-temporal hints", or something like that.

> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_generic(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +
> +	while (len > RTE_MEMCPY_NT_BUFSIZE) {
> +		rte_memcpy_nt_buf(dst, src, RTE_MEMCPY_NT_BUFSIZE,
> +				(flags & ~RTE_MEMOPS_F_LENA_MASK) | RTE_MEMOPS_F_LEN128A);
> +		dst = RTE_PTR_ADD(dst, RTE_MEMCPY_NT_BUFSIZE);
> +		src = RTE_PTR_ADD(src, RTE_MEMCPY_NT_BUFSIZE);
> +		len -= RTE_MEMCPY_NT_BUFSIZE;
> +	}
> +	rte_memcpy_nt_buf(dst, src, len, flags);
> +}
> +
> +/* Implementation. Refer to function declaration for documentation. */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	if ((flags & (RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT)) ==
> +			(RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT)) {
> +		/* Copy between non-temporal source and destination. */
> +		if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
> +			rte_memcpy_nt_d16s16a(dst, src, len, flags);
> +#ifdef RTE_ARCH_X86_64
> +		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
> +			rte_memcpy_nt_d8s16a(dst, src, len, flags);
> +#endif /* RTE_ARCH_X86_64 */
> +		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
> +			rte_memcpy_nt_d4s16a(dst, src, len, flags);
> +		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC4A)
> +			rte_memcpy_nt_d4s4a(dst, src, len, flags);
> +		else if (len <= RTE_MEMCPY_NT_BUFSIZE)
> +			rte_memcpy_nt_buf(dst, src, len, flags);
> +		else
> +			rte_memcpy_nt_generic(dst, src, len, flags);
> +	} else if (flags & RTE_MEMOPS_F_SRC_NT) {
> +		/* Copy from non-temporal source. */
> +		rte_memcpy_nts(dst, src, len, flags);
> +	} else if (flags & RTE_MEMOPS_F_DST_NT) {
> +		/* Copy to non-temporal destination. */
> +		rte_memcpy_ntd(dst, src, len, flags);
> +	} else
> +		rte_memcpy(dst, src, len);
> +}
> +
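
As the RTE_BUILD_BUG_ON above enforces, flags must be a compile-time
constant, so a typical call site looks like this (sketch, assuming 16
byte aligned buffers):

	rte_memcpy_ex(dst, src, len,
			RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT |
			RTE_MEMOPS_F_DST16A | RTE_MEMOPS_F_SRC16A);

which lets the if/else chain above collapse to a single call at compile
time.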
>   #undef ALIGNMENT_MASK
>   
>   #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
> index a2307cebe6..aa96fb4cc8 100644
> --- a/lib/mbuf/rte_mbuf.c
> +++ b/lib/mbuf/rte_mbuf.c
> @@ -660,6 +660,83 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
>   	return mc;
>   }
>   
> +/* Create a deep copy of mbuf, using non-temporal memory access */
> +struct rte_mbuf *
> +rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
> +		 uint32_t off, uint32_t len, const uint64_t flags)
> +{
> +	const struct rte_mbuf *seg = m;
> +	struct rte_mbuf *mc, *m_last, **prev;
> +
> +	/* garbage in check */
> +	__rte_mbuf_sanity_check(m, 1);
> +
> +	/* check for request to copy at offset past end of mbuf */
> +	if (unlikely(off >= m->pkt_len))
> +		return NULL;
> +
> +	mc = rte_pktmbuf_alloc(mp);
> +	if (unlikely(mc == NULL))
> +		return NULL;
> +
> +	/* truncate requested length to available data */
> +	if (len > m->pkt_len - off)
> +		len = m->pkt_len - off;
> +
> +	__rte_pktmbuf_copy_hdr(mc, m);
> +
> +	/* copied mbuf is not indirect or external */
> +	mc->ol_flags = m->ol_flags & ~(RTE_MBUF_F_INDIRECT|RTE_MBUF_F_EXTERNAL);
> +
> +	prev = &mc->next;
> +	m_last = mc;
> +	while (len > 0) {
> +		uint32_t copy_len;
> +
> +		/* skip leading mbuf segments */
> +		while (off >= seg->data_len) {
> +			off -= seg->data_len;
> +			seg = seg->next;
> +		}
> +
> +		/* current buffer is full, chain a new one */
> +		if (rte_pktmbuf_tailroom(m_last) == 0) {
> +			m_last = rte_pktmbuf_alloc(mp);
> +			if (unlikely(m_last == NULL)) {
> +				rte_pktmbuf_free(mc);
> +				return NULL;
> +			}
> +			++mc->nb_segs;
> +			*prev = m_last;
> +			prev = &m_last->next;
> +		}
> +
> +		/*
> +		 * copy the min of data in input segment (seg)
> +		 * vs space available in output (m_last)
> +		 */
> +		copy_len = RTE_MIN(seg->data_len - off, len);
> +		if (copy_len > rte_pktmbuf_tailroom(m_last))
> +			copy_len = rte_pktmbuf_tailroom(m_last);
> +
> +		/* append from seg to m_last */
> +		rte_memcpy_ex(rte_pktmbuf_mtod_offset(m_last, char *,
> +						   m_last->data_len),
> +			   rte_pktmbuf_mtod_offset(seg, char *, off),
> +			   copy_len, flags);
> +
> +		/* update offsets and lengths */
> +		m_last->data_len += copy_len;
> +		mc->pkt_len += copy_len;
> +		off += copy_len;
> +		len -= copy_len;
> +	}
> +
> +	/* garbage out check */
> +	__rte_mbuf_sanity_check(mc, 1);
> +	return mc;
> +}
> +

This looks like a cut-and-paste from rte_pktmbuf_copy(). Make a
__rte_pktmbuf_copy_generic() which takes either a memcpy()-style
function pointer plus flags, or just flags, as input, and have both the
new copy_ex() and the old copy function delegate to it.
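
A rough sketch of the flags-only variant:

	static struct rte_mbuf *
	__rte_pktmbuf_copy_generic(const struct rte_mbuf *m, struct rte_mempool *mp,
			uint32_t off, uint32_t len, const uint64_t flags)
	{
		/* current rte_pktmbuf_copy() body, with the inner copy
		 * performed by rte_memcpy_ex(..., flags)
		 */
	}

	struct rte_mbuf *
	rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
			uint32_t off, uint32_t len)
	{
		return __rte_pktmbuf_copy_generic(m, mp, off, len, 0);
	}

	struct rte_mbuf *
	rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
			uint32_t off, uint32_t len, const uint64_t flags)
	{
		return __rte_pktmbuf_copy_generic(m, mp, off, len, flags);
	}

With flags == 0, rte_memcpy_ex() falls through to plain rte_memcpy(),
so the existing copy function keeps its current behaviour.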

>   /* dump a mbuf on console */
>   void
>   rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index b6e23d98ce..030df396a3 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -1443,6 +1443,38 @@ struct rte_mbuf *
>   rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
>   		 uint32_t offset, uint32_t length);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Create a full copy of a given packet mbuf,
> + * using non-temporal memory access as specified by flags.
> + *
> + * Copies all the data from a given packet mbuf to a newly allocated
> + * set of mbufs. The private data is not copied.
> + *
> + * @param m
> + *   The packet mbuf to be copied.
> + * @param mp
> + *   The mempool from which the "clone" mbufs are allocated.
> + * @param offset
> + *   The number of bytes to skip before copying.
> + *   If the mbuf does not have that many bytes, it is an error
> + *   and NULL is returned.
> + * @param length
> + *   The upper limit on bytes to copy.  Passing UINT32_MAX
> + *   means all data (after offset).
> + * @param flags
> + *   Non-temporal memory access hints for rte_memcpy_ex.
> + * @return
> + *   - The pointer to the new "clone" mbuf on success.
> + *   - NULL if allocation fails.
> + */
> +__rte_experimental
> +struct rte_mbuf *
> +rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
> +		    uint32_t offset, uint32_t length, const uint64_t flags);

The same question about why flags is const.

> +
>   /**
>    * Adds given value to the refcnt of all packet mbuf segments.
>    *
> diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
> index ed486ed14e..b583364ad4 100644
> --- a/lib/mbuf/version.map
> +++ b/lib/mbuf/version.map
> @@ -47,5 +47,6 @@ EXPERIMENTAL {
>   	global:
>   
>   	rte_pktmbuf_pool_create_extbuf;
> +	rte_pktmbuf_copy_ex;
>   
>   };
> diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
> index af2b814251..ae871c4865 100644
> --- a/lib/pcapng/rte_pcapng.c
> +++ b/lib/pcapng/rte_pcapng.c
> @@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>   	orig_len = rte_pktmbuf_pkt_len(md);
>   
>   	/* Take snapshot of the data */
> -	mc = rte_pktmbuf_copy(md, mp, 0, length);
> +	mc = rte_pktmbuf_copy_ex(md, mp, 0, length,
> +				 RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
>   	if (unlikely(mc == NULL))
>   		return NULL;
>   
> diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
> index 98dcbc037b..6e61c75407 100644
> --- a/lib/pdump/rte_pdump.c
> +++ b/lib/pdump/rte_pdump.c
> @@ -124,7 +124,8 @@ pdump_copy(uint16_t port_id, uint16_t queue,
>   					    pkts[i], mp, cbs->snaplen,
>   					    ts, direction);
>   		else
> -			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
> +			p = rte_pktmbuf_copy_ex(pkts[i], mp, 0, cbs->snaplen,
> +						RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
>   
>   		if (unlikely(p == NULL))
>   			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
> @@ -134,6 +135,9 @@ pdump_copy(uint16_t port_id, uint16_t queue,
>   
>   	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
>   
> +	/* Flush non-temporal stores regarding the packet copies. */
> +	rte_wmb();
> +

This is an unnecessary barrier for many architectures.

>   	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
>   	if (unlikely(ring_enq < d_pkts)) {
>   		unsigned int drops = d_pkts - ring_enq;
  
Mattias Rönnblom Oct. 16, 2022, 7:55 p.m. UTC | #2
On 2022-10-10 08:46, Morten Brørup wrote:
> This patch provides a function for memory copy using non-temporal store,
> load or both, controlled by flags passed to the function.
> 
> Applications sometimes copy data to another memory location, which is only
> used much later.
> In this case, it is inefficient to pollute the data cache with the copied
> data.
> 
> An example use case (originating from a real life application):
> Copying filtered packets, or the first part of them, into a capture buffer
> for offline analysis.
> 
> The purpose of the function is to achieve a performance gain by not
> polluting the cache when copying data.
> Although the throughput can be improved by further optimization, I do not
> have time to do it now.
> 
> The functional tests and performance tests for memory copy have been
> expanded to include non-temporal copying.
> 
> A non-temporal version of the mbuf library's function to create a full
> copy of a given packet mbuf is provided.
> 
> The packet capture and packet dump libraries have been updated to use
> non-temporal memory copy of the packets.
> 
> Implementation notes:
> 
> Implementations for non-x86 architectures can be provided by anyone at a
> later time. I am not going to do it.
> 
> x86 non-temporal load instructions must be 16 byte aligned [1], and
> non-temporal store instructions must be 4, 8 or 16 byte aligned [2].
> 
> ARM non-temporal load and store instructions seem to require 4 byte
> alignment [3].
> 
> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
> index.html#text=_mm_stream_load
> [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
> index.html#text=_mm_stream_si
> [3] https://developer.arm.com/documentation/100076/0100/
> A64-Instruction-Set-Reference/A64-Floating-point-Instructions/
> LDNP--SIMD-and-FP-
> 
> This patch is a major rewrite from the RFC v3, so no version log comparing
> to the RFC is provided.
> 
> v4
> * Also ignore the warning for clang int the workaround for
>    _mm_stream_load_si128() missing const in the parameter.
> * Add missing C linkage specifier in rte_memcpy.h.
> 
> v3
> * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
>    use it on 64-bit x86 architecture.
> * CLANG warns that _mm_stream_load_si128_const() and
>    rte_memcpy_nt_15_or_less_s16a() are not public,
>    so remove __rte_internal from them. It also affects the documentation
>    for the functions, so the fix can't be limited to CLANG.
> * Use __rte_experimental instead of __rte_internal.
> * Replace <n> with nnn in function documentation; it doesn't look like
>    HTML.
> * Slightly modify the workaround for _mm_stream_load_si128() missing const
>    in the parameter; the ancient GCC 4.5.8 in RHEL7 doesn't understand
>    #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
>    #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
> * Fixed one coding style issue missed in v2.
> 
> v2
> * The last 16 byte block of data, incl. any trailing bytes, were not
>    copied from the source memory area in rte_memcpy_nt_buf().
> * Fix many coding style issues.
> * Add some missing header files.
> * Fix build time warning for non-x86 architectures by using a different
>    method to mark the flags parameter unused.
> * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
>    so omit it when using CLANG.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>   app/test/test_memcpy.c               |   65 +-
>   app/test/test_memcpy_perf.c          |  187 ++--
>   lib/eal/include/generic/rte_memcpy.h |  127 +++
>   lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
>   lib/mbuf/rte_mbuf.c                  |   77 ++
>   lib/mbuf/rte_mbuf.h                  |   32 +
>   lib/mbuf/version.map                 |    1 +
>   lib/pcapng/rte_pcapng.c              |    3 +-
>   lib/pdump/rte_pdump.c                |    6 +-
>   9 files changed, 1645 insertions(+), 91 deletions(-)
> 
> diff --git a/app/test/test_memcpy.c b/app/test/test_memcpy.c
> index 1ab86f4967..12410ce413 100644
> --- a/app/test/test_memcpy.c
> +++ b/app/test/test_memcpy.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #include <stdint.h>
> @@ -36,6 +37,19 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
>   /* Data is aligned on this many bytes (power of 2) */
>   #define ALIGNMENT_UNIT          32
>   
> +const uint64_t nt_mode_flags[4] = {
> +	0,
> +	RTE_MEMOPS_F_SRC_NT,
> +	RTE_MEMOPS_F_DST_NT,
> +	RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT
> +};
> +const char * const nt_mode_str[4] = {
> +	"none",
> +	"src",
> +	"dst",
> +	"src+dst"
> +};
> +
>   
>   /*
>    * Create two buffers, and initialise one with random values. These are copied
> @@ -44,12 +58,13 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
>    * changed.
>    */
>   static int
> -test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
> +test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size, unsigned int nt_mode)
>   {
>   	unsigned int i;
>   	uint8_t dest[SMALL_BUFFER_SIZE + ALIGNMENT_UNIT];
>   	uint8_t src[SMALL_BUFFER_SIZE + ALIGNMENT_UNIT];
>   	void * ret;
> +	const uint64_t flags = nt_mode_flags[nt_mode];
>   
>   	/* Setup buffers */
>   	for (i = 0; i < SMALL_BUFFER_SIZE + ALIGNMENT_UNIT; i++) {
> @@ -58,18 +73,23 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   	}
>   
>   	/* Do the copy */
> -	ret = rte_memcpy(dest + off_dst, src + off_src, size);
> -	if (ret != (dest + off_dst)) {
> -		printf("rte_memcpy() returned %p, not %p\n",
> -		       ret, dest + off_dst);
> +	if (nt_mode) {
> +		rte_memcpy_ex(dest + off_dst, src + off_src, size, flags);
> +	} else {
> +		ret = rte_memcpy(dest + off_dst, src + off_src, size);
> +		if (ret != (dest + off_dst)) {
> +			printf("rte_memcpy() returned %p, not %p\n",
> +			       ret, dest + off_dst);
> +		}
>   	}
>   
>   	/* Check nothing before offset is affected */
>   	for (i = 0; i < off_dst; i++) {
>   		if (dest[i] != 0) {
> -			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
> +			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
>   			       "[modified before start of dst].\n",
> -			       (unsigned)size, off_src, off_dst);
> +			       nt_mode ? "_ex" : "",
> +			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode]);
>   			return -1;
>   		}
>   	}
> @@ -77,9 +97,11 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   	/* Check everything was copied */
>   	for (i = 0; i < size; i++) {
>   		if (dest[i + off_dst] != src[i + off_src]) {
> -			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
> -			       "[didn't copy byte %u].\n",
> -			       (unsigned)size, off_src, off_dst, i);
> +			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
> +			       "[didn't copy byte %u: 0x%02x!=0x%02x].\n",
> +			       nt_mode ? "_ex" : "",
> +			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode], i,
> +			       dest[i + off_dst], src[i + off_src]);
>   			return -1;
>   		}
>   	}
> @@ -87,9 +109,10 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   	/* Check nothing after copy was affected */
>   	for (i = size; i < SMALL_BUFFER_SIZE; i++) {
>   		if (dest[i + off_dst] != 0) {
> -			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
> +			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
>   			       "[copied too many].\n",
> -			       (unsigned)size, off_src, off_dst);
> +			       nt_mode ? "_ex" : "",
> +			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode]);
>   			return -1;
>   		}
>   	}
> @@ -102,16 +125,18 @@ test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
>   static int
>   func_test(void)
>   {
> -	unsigned int off_src, off_dst, i;
> +	unsigned int off_src, off_dst, i, nt_mode;
>   	int ret;
>   
> -	for (off_src = 0; off_src < ALIGNMENT_UNIT; off_src++) {
> -		for (off_dst = 0; off_dst < ALIGNMENT_UNIT; off_dst++) {
> -			for (i = 0; i < RTE_DIM(buf_sizes); i++) {
> -				ret = test_single_memcpy(off_src, off_dst,
> -				                         buf_sizes[i]);
> -				if (ret != 0)
> -					return -1;
> +	for (nt_mode = 0; nt_mode < 4; nt_mode++) {
> +		for (off_src = 0; off_src < ALIGNMENT_UNIT; off_src++) {
> +			for (off_dst = 0; off_dst < ALIGNMENT_UNIT; off_dst++) {
> +				for (i = 0; i < RTE_DIM(buf_sizes); i++) {
> +					ret = test_single_memcpy(off_src, off_dst,
> +								 buf_sizes[i], nt_mode);
> +					if (ret != 0)
> +						return -1;
> +				}
>   			}
>   		}
>   	}
> diff --git a/app/test/test_memcpy_perf.c b/app/test/test_memcpy_perf.c
> index 3727c160e6..6bb52cba88 100644
> --- a/app/test/test_memcpy_perf.c
> +++ b/app/test/test_memcpy_perf.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #include <stdint.h>
> @@ -15,6 +16,7 @@
>   #include <rte_malloc.h>
>   
>   #include <rte_memcpy.h>
> +#include <rte_atomic.h>
>   
>   #include "test.h"
>   
> @@ -27,9 +29,9 @@
>   /* List of buffer sizes to test */
>   #if TEST_VALUE_RANGE == 0
>   static size_t buf_sizes[] = {
> -	1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128,
> -	129, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447, 448,
> -	449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 1600,
> +	1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 40, 48, 60, 63, 64, 65, 80, 92, 124,
> +	127, 128, 129, 140, 152, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447,
> +	448, 449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 1600,
>   	2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192
>   };
>   /* MUST be as large as largest packet size above */
> @@ -72,7 +74,7 @@ static uint8_t *small_buf_read, *small_buf_write;
>   static int
>   init_buffers(void)
>   {
> -	unsigned i;
> +	unsigned int i;
>   
>   	large_buf_read = rte_malloc("memcpy", LARGE_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
>   	if (large_buf_read == NULL)
> @@ -151,7 +153,7 @@ static void
>   do_uncached_write(uint8_t *dst, int is_dst_cached,
>   				  const uint8_t *src, int is_src_cached, size_t size)
>   {
> -	unsigned i, j;
> +	unsigned int i, j;
>   	size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];
>   
>   	for (i = 0; i < (TEST_ITERATIONS / TEST_BATCH_SIZE); i++) {
> @@ -167,66 +169,112 @@ do_uncached_write(uint8_t *dst, int is_dst_cached,
>    * Run a single memcpy performance test. This is a macro to ensure that if
>    * the "size" parameter is a constant it won't be converted to a variable.
>    */
> -#define SINGLE_PERF_TEST(dst, is_dst_cached, dst_uoffset,                   \
> -                         src, is_src_cached, src_uoffset, size)             \
> -do {                                                                        \
> -    unsigned int iter, t;                                                   \
> -    size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];          \
> -    uint64_t start_time, total_time = 0;                                    \
> -    uint64_t total_time2 = 0;                                               \
> -    for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {    \
> -        fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,             \
> -                         src_addrs, is_src_cached, src_uoffset);            \
> -        start_time = rte_rdtsc();                                           \
> -        for (t = 0; t < TEST_BATCH_SIZE; t++)                               \
> -            rte_memcpy(dst+dst_addrs[t], src+src_addrs[t], size);           \
> -        total_time += rte_rdtsc() - start_time;                             \
> -    }                                                                       \
> -    for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {    \
> -        fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,             \
> -                         src_addrs, is_src_cached, src_uoffset);            \
> -        start_time = rte_rdtsc();                                           \
> -        for (t = 0; t < TEST_BATCH_SIZE; t++)                               \
> -            memcpy(dst+dst_addrs[t], src+src_addrs[t], size);               \
> -        total_time2 += rte_rdtsc() - start_time;                            \
> -    }                                                                       \
> -    printf("%3.0f -", (double)total_time  / TEST_ITERATIONS);                 \
> -    printf("%3.0f",   (double)total_time2 / TEST_ITERATIONS);                 \
> -    printf("(%6.2f%%) ", ((double)total_time - total_time2)*100/total_time2); \
> +#define SINGLE_PERF_TEST(dst, is_dst_cached, dst_uoffset,					  \
> +			 src, is_src_cached, src_uoffset, size)					  \
> +do {												  \
> +	unsigned int iter, t;									  \
> +	size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];				  \
> +	uint64_t start_time;									  \
> +	uint64_t total_time_rte = 0, total_time_std = 0;					  \
> +	uint64_t total_time_ntd = 0, total_time_nts = 0, total_time_nt = 0;			  \
> +	const uint64_t flags = ((dst_uoffset == 0) ?						  \
> +				(ALIGNMENT_UNIT << RTE_MEMOPS_F_DSTA_SHIFT) : 0) |		  \
> +			       ((src_uoffset == 0) ?						  \
> +				(ALIGNMENT_UNIT << RTE_MEMOPS_F_SRCA_SHIFT) : 0);		  \
> +	for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {			  \
> +		fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,				  \
> +				 src_addrs, is_src_cached, src_uoffset);			  \
> +		start_time = rte_rdtsc();							  \
> +		for (t = 0; t < TEST_BATCH_SIZE; t++)						  \
> +			rte_memcpy(dst + dst_addrs[t], src + src_addrs[t], size);		  \
> +		total_time_rte += rte_rdtsc() - start_time;					  \
> +	}											  \
> +	for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {			  \
> +		fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,				  \
> +				 src_addrs, is_src_cached, src_uoffset);			  \
> +		start_time = rte_rdtsc();							  \
> +		for (t = 0; t < TEST_BATCH_SIZE; t++)						  \
> +			memcpy(dst + dst_addrs[t], src + src_addrs[t], size);			  \
> +		total_time_std += rte_rdtsc() - start_time;					  \
> +	}											  \
> +	if (!(is_dst_cached && is_src_cached)) {						  \
> +		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
> +			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
> +					 src_addrs, is_src_cached, src_uoffset);		  \
> +			start_time = rte_rdtsc();						  \
> +			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
> +				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
> +					      flags | RTE_MEMOPS_F_DST_NT);			  \
> +			total_time_ntd += rte_rdtsc() - start_time;				  \
> +		}										  \
> +		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
> +			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
> +					 src_addrs, is_src_cached, src_uoffset);		  \
> +			start_time = rte_rdtsc();						  \
> +			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
> +				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
> +					      flags | RTE_MEMOPS_F_SRC_NT);			  \
> +			total_time_nts += rte_rdtsc() - start_time;				  \
> +		}										  \
> +		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
> +			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
> +					 src_addrs, is_src_cached, src_uoffset);		  \
> +			start_time = rte_rdtsc();						  \
> +			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
> +				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
> +					      flags | RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT); \
> +			total_time_nt += rte_rdtsc() - start_time;				  \
> +		}										  \
> +	}											  \
> +	printf(" %4.0f-", (double)total_time_rte / TEST_ITERATIONS);				  \
> +	printf("%4.0f",   (double)total_time_std / TEST_ITERATIONS);				  \
> +	printf("(%+4.0f%%)", ((double)total_time_rte - total_time_std) * 100 / total_time_std);   \
> +	if (!(is_dst_cached && is_src_cached)) {						  \
> +		printf(" %4.0f", (double)total_time_ntd / TEST_ITERATIONS);			  \
> +		printf(" %4.0f", (double)total_time_nts / TEST_ITERATIONS);			  \
> +		printf(" %4.0f", (double)total_time_nt / TEST_ITERATIONS);			  \
> +		if (total_time_nt / total_time_std > 9)						  \
> +			printf("(*%4.1f)", (double)total_time_nt / total_time_std);		  \
> +		else										  \
> +			printf("(%+4.0f%%)",							  \
> +			       ((double)total_time_nt - total_time_std) * 100 / total_time_std);  \
> +	}											  \
>   } while (0)
>   
>   /* Run aligned memcpy tests for each cached/uncached permutation */
> -#define ALL_PERF_TESTS_FOR_SIZE(n)                                       \
> -do {                                                                     \
> -    if (__builtin_constant_p(n))                                         \
> -        printf("\nC%6u", (unsigned)n);                                   \
> -    else                                                                 \
> -        printf("\n%7u", (unsigned)n);                                    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 0, small_buf_read, 1, 0, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 0, small_buf_read, 1, 0, n);    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 0, large_buf_read, 0, 0, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 0, large_buf_read, 0, 0, n);    \
> +#define ALL_PERF_TESTS_FOR_SIZE(n)						\
> +do {										\
> +	if (__builtin_constant_p(n))						\
> +		printf("\nC%6u", (unsigned int)n);				\
> +	else									\
> +		printf("\n%7u", (unsigned int)n);				\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 0, small_buf_read, 1, 0, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 0, small_buf_read, 1, 0, n);	\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 0, large_buf_read, 0, 0, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 0, large_buf_read, 0, 0, n);	\
>   } while (0)
>   
>   /* Run unaligned memcpy tests for each cached/uncached permutation */
> -#define ALL_PERF_TESTS_FOR_SIZE_UNALIGNED(n)                             \
> -do {                                                                     \
> -    if (__builtin_constant_p(n))                                         \
> -        printf("\nC%6u", (unsigned)n);                                   \
> -    else                                                                 \
> -        printf("\n%7u", (unsigned)n);                                    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 1, small_buf_read, 1, 5, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 1, small_buf_read, 1, 5, n);    \
> -    SINGLE_PERF_TEST(small_buf_write, 1, 1, large_buf_read, 0, 5, n);    \
> -    SINGLE_PERF_TEST(large_buf_write, 0, 1, large_buf_read, 0, 5, n);    \
> +#define ALL_PERF_TESTS_FOR_SIZE_UNALIGNED(n)					\
> +do {										\
> +	if (__builtin_constant_p(n))						\
> +		printf("\nC%6u", (unsigned int)n);				\
> +	else									\
> +		printf("\n%7u", (unsigned int)n);				\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 1, small_buf_read, 1, 5, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 1, small_buf_read, 1, 5, n);	\
> +	SINGLE_PERF_TEST(small_buf_write, 1, 1, large_buf_read, 0, 5, n);	\
> +	SINGLE_PERF_TEST(large_buf_write, 0, 1, large_buf_read, 0, 5, n);	\
>   } while (0)
>   
>   /* Run memcpy tests for constant length */
> -#define ALL_PERF_TEST_FOR_CONSTANT                                      \
> -do {                                                                    \
> -    TEST_CONSTANT(6U); TEST_CONSTANT(64U); TEST_CONSTANT(128U);         \
> -    TEST_CONSTANT(192U); TEST_CONSTANT(256U); TEST_CONSTANT(512U);      \
> -    TEST_CONSTANT(768U); TEST_CONSTANT(1024U); TEST_CONSTANT(1536U);    \
> +#define ALL_PERF_TEST_FOR_CONSTANT						\
> +do {										\
> +	TEST_CONSTANT(4U); TEST_CONSTANT(6U); TEST_CONSTANT(8U);		\
> +	TEST_CONSTANT(16U); TEST_CONSTANT(64U); TEST_CONSTANT(128U);		\
> +	TEST_CONSTANT(192U); TEST_CONSTANT(256U); TEST_CONSTANT(512U);		\
> +	TEST_CONSTANT(768U); TEST_CONSTANT(1024U); TEST_CONSTANT(1536U);	\
> +	TEST_CONSTANT(2048U);							\
>   } while (0)
>   
>   /* Run all memcpy tests for aligned constant cases */
> @@ -251,7 +299,7 @@ perf_test_constant_unaligned(void)
>   static inline void
>   perf_test_variable_aligned(void)
>   {
> -	unsigned i;
> +	unsigned int i;
>   	for (i = 0; i < RTE_DIM(buf_sizes); i++) {
>   		ALL_PERF_TESTS_FOR_SIZE((size_t)buf_sizes[i]);
>   	}
> @@ -261,7 +309,7 @@ perf_test_variable_aligned(void)
>   static inline void
>   perf_test_variable_unaligned(void)
>   {
> -	unsigned i;
> +	unsigned int i;
>   	for (i = 0; i < RTE_DIM(buf_sizes); i++) {
>   		ALL_PERF_TESTS_FOR_SIZE_UNALIGNED((size_t)buf_sizes[i]);
>   	}
> @@ -282,7 +330,7 @@ perf_test(void)
>   
>   #if TEST_VALUE_RANGE != 0
>   	/* Set up buf_sizes array, if required */
> -	unsigned i;
> +	unsigned int i;
>   	for (i = 0; i < TEST_VALUE_RANGE; i++)
>   		buf_sizes[i] = i;
>   #endif
> @@ -290,13 +338,14 @@ perf_test(void)
>   	/* See function comment */
>   	do_uncached_write(large_buf_write, 0, small_buf_read, 1, SMALL_BUFFER_SIZE);
>   
> -	printf("\n** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **\n"
> -		   "======= ================= ================= ================= =================\n"
> -		   "   Size   Cache to cache     Cache to mem      Mem to cache        Mem to mem\n"
> -		   "(bytes)          (ticks)          (ticks)           (ticks)           (ticks)\n"
> -		   "------- ----------------- ----------------- ----------------- -----------------");
> +	printf("\n** rte_memcpy(RTE)/memcpy(STD)/rte_memcpy_ex(NTD/NTS/NT) - memcpy perf. tests (C = compile-time constant) **\n"
> +		   "======= ================ ====================================== ====================================== ======================================\n"
> +		   "   Size  Cache to cache               Cache to mem                           Mem to cache                            Mem to mem\n"
> +		   "(bytes)         (ticks)                    (ticks)                                (ticks)                               (ticks)\n"
> +		   "         RTE- STD(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)\n"
> +		   "------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
>   
> -	printf("\n================================= %2dB aligned =================================",
> +	printf("\n================================================================ %2dB aligned ===============================================================",
>   		ALIGNMENT_UNIT);
>   	/* Do aligned tests where size is a variable */
>   	timespec_get(&tv_begin, TIME_UTC);
> @@ -304,28 +353,28 @@ perf_test(void)
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_aligned = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n------- ----------------- ----------------- ----------------- -----------------");
> +	printf("\n------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
>   	/* Do aligned tests where size is a compile-time constant */
>   	timespec_get(&tv_begin, TIME_UTC);
>   	perf_test_constant_aligned();
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_aligned_const = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n================================== Unaligned ==================================");
> +	printf("\n================================================================= Unaligned =================================================================");
>   	/* Do unaligned tests where size is a variable */
>   	timespec_get(&tv_begin, TIME_UTC);
>   	perf_test_variable_unaligned();
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_unaligned = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n------- ----------------- ----------------- ----------------- -----------------");
> +	printf("\n------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
>   	/* Do unaligned tests where size is a compile-time constant */
>   	timespec_get(&tv_begin, TIME_UTC);
>   	perf_test_constant_unaligned();
>   	timespec_get(&tv_end, TIME_UTC);
>   	time_unaligned_const = (double)(tv_end.tv_sec - tv_begin.tv_sec)
>   		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
> -	printf("\n======= ================= ================= ================= =================\n\n");
> +	printf("\n======= ================ ====================================== ====================================== ======================================\n\n");
>   
>   	printf("Test Execution Time (seconds):\n");
>   	printf("Aligned variable copy size   = %8.3f\n", time_aligned);
> diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
> index e7f0f8eaa9..b087f09c35 100644
> --- a/lib/eal/include/generic/rte_memcpy.h
> +++ b/lib/eal/include/generic/rte_memcpy.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #ifndef _RTE_MEMCPY_H_
> @@ -11,6 +12,13 @@
>    * Functions for vectorised implementation of memcpy().
>    */
>   
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>   /**
>    * Copy 16 bytes from one location to another using optimised
>    * instructions. The locations should not overlap.
> @@ -113,4 +121,123 @@ rte_memcpy(void *dst, const void *src, size_t n);
>   
>   #endif /* __DOXYGEN__ */
>   
> +/*
> + * Advanced/Non-Temporal Memory Operations Flags.
> + */
> +
> +/** Length alignment hint mask. */
> +#define RTE_MEMOPS_F_LENA_MASK  (UINT64_C(0xFE) << 0)
> +/** Length alignment hint shift. */
> +#define RTE_MEMOPS_F_LENA_SHIFT 0
> +/** Hint: Length is 2 byte aligned. */
> +#define RTE_MEMOPS_F_LEN2A      (UINT64_C(2) << 0)
> +/** Hint: Length is 4 byte aligned. */
> +#define RTE_MEMOPS_F_LEN4A      (UINT64_C(4) << 0)
> +/** Hint: Length is 8 byte aligned. */
> +#define RTE_MEMOPS_F_LEN8A      (UINT64_C(8) << 0)
> +/** Hint: Length is 16 byte aligned. */
> +#define RTE_MEMOPS_F_LEN16A     (UINT64_C(16) << 0)
> +/** Hint: Length is 32 byte aligned. */
> +#define RTE_MEMOPS_F_LEN32A     (UINT64_C(32) << 0)
> +/** Hint: Length is 64 byte aligned. */
> +#define RTE_MEMOPS_F_LEN64A     (UINT64_C(64) << 0)
> +/** Hint: Length is 128 byte aligned. */
> +#define RTE_MEMOPS_F_LEN128A    (UINT64_C(128) << 0)
> +
> +/** Prefer non-temporal access to source memory area.
> + */
> +#define RTE_MEMOPS_F_SRC_NT     (UINT64_C(1) << 8)
> +/** Source address alignment hint mask. */
> +#define RTE_MEMOPS_F_SRCA_MASK  (UINT64_C(0xFE) << 8)
> +/** Source address alignment hint shift. */
> +#define RTE_MEMOPS_F_SRCA_SHIFT 8
> +/** Hint: Source address is 2 byte aligned. */
> +#define RTE_MEMOPS_F_SRC2A      (UINT64_C(2) << 8)
> +/** Hint: Source address is 4 byte aligned. */
> +#define RTE_MEMOPS_F_SRC4A      (UINT64_C(4) << 8)
> +/** Hint: Source address is 8 byte aligned. */
> +#define RTE_MEMOPS_F_SRC8A      (UINT64_C(8) << 8)
> +/** Hint: Source address is 16 byte aligned. */
> +#define RTE_MEMOPS_F_SRC16A     (UINT64_C(16) << 8)
> +/** Hint: Source address is 32 byte aligned. */
> +#define RTE_MEMOPS_F_SRC32A     (UINT64_C(32) << 8)
> +/** Hint: Source address is 64 byte aligned. */
> +#define RTE_MEMOPS_F_SRC64A     (UINT64_C(64) << 8)
> +/** Hint: Source address is 128 byte aligned. */
> +#define RTE_MEMOPS_F_SRC128A    (UINT64_C(128) << 8)
> +
> +/** Prefer non-temporal access to destination memory area.
> + *
> + * On x86 architecture:
> + * Remember to call rte_wmb() after a sequence of copy operations.
> + */
> +#define RTE_MEMOPS_F_DST_NT     (UINT64_C(1) << 16)
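> +/* Example (illustrative, names are placeholders): after a burst of
> + * non-temporal copies, the weakly ordered streaming stores should be
> + * fenced before the data is handed over to another agent:
> + *
> + *   for (i = 0; i < nb_pkts; i++)
> + *           rte_memcpy_ex(cap[i], pkt[i], len[i], RTE_MEMOPS_F_DST_NT);
> + *   rte_wmb();
> + */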
> +/** Destination address alignment hint mask. */
> +#define RTE_MEMOPS_F_DSTA_MASK  (UINT64_C(0xFE) << 16)
> +/** Destination address alignment hint shift. */
> +#define RTE_MEMOPS_F_DSTA_SHIFT 16
> +/** Hint: Destination address is 2 byte aligned. */
> +#define RTE_MEMOPS_F_DST2A      (UINT64_C(2) << 16)
> +/** Hint: Destination address is 4 byte aligned. */
> +#define RTE_MEMOPS_F_DST4A      (UINT64_C(4) << 16)
> +/** Hint: Destination address is 8 byte aligned. */
> +#define RTE_MEMOPS_F_DST8A      (UINT64_C(8) << 16)
> +/** Hint: Destination address is 16 byte aligned. */
> +#define RTE_MEMOPS_F_DST16A     (UINT64_C(16) << 16)
> +/** Hint: Destination address is 32 byte aligned. */
> +#define RTE_MEMOPS_F_DST32A     (UINT64_C(32) << 16)
> +/** Hint: Destination address is 64 byte aligned. */
> +#define RTE_MEMOPS_F_DST64A     (UINT64_C(64) << 16)
> +/** Hint: Destination address is 128 byte aligned. */
> +#define RTE_MEMOPS_F_DST128A    (UINT64_C(128) << 16)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Advanced/non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the destination memory area.
> + * @param src
> + *   Pointer to the source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + *   Any of the RTE_MEMOPS_F_(SRC|DST)_NT, RTE_MEMOPS_F_(LEN|SRC|DST)nnnA flags.
> + *   Must be constant at build time.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags);
> +
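> +/* Example (illustrative): copying into a capture buffer where the
> + * destination address and the length are known to be 16 byte aligned:
> + *
> + *   rte_memcpy_ex(dst, src, len,
> + *           RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_DST16A | RTE_MEMOPS_F_LEN16A);
> + */
> +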
> +#ifndef RTE_MEMCPY_EX_ARCH_DEFINED
> +
> +/* Fallback implementation, if no arch-specific implementation is provided. */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	RTE_SET_USED(flags);
> +	memcpy(dst, src, len);
> +}
> +
> +#endif /* RTE_MEMCPY_EX_ARCH_DEFINED */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
>   #endif /* _RTE_MEMCPY_H_ */
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index d4d7a5cfc8..31d0faf7a8 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>    */
>   
>   #ifndef _RTE_MEMCPY_X86_64_H_
> @@ -17,6 +18,10 @@
>   #include <rte_vect.h>
>   #include <rte_common.h>
>   #include <rte_config.h>
> +#include <rte_debug.h>
> +
> +#define RTE_MEMCPY_EX_ARCH_DEFINED
> +#include "generic/rte_memcpy.h"
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -868,6 +873,1239 @@ rte_memcpy(void *dst, const void *src, size_t n)
>   		return rte_memcpy_generic(dst, src, n);
>   }
>   
> +/*
> + * Advanced/Non-Temporal Memory Operations.
> + */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Workaround for _mm_stream_load_si128() missing const in the parameter.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__m128i _mm_stream_load_si128_const(const __m128i *const mem_addr)
> +{
> +	/* GCC 4.8.5 (in RHEL7) doesn't support the #pragma to ignore "-Wdiscarded-qualifiers".
> +	 * So we explicitly type cast mem_addr and use the #pragma to ignore "-Wcast-qual".
> +	 */
> +#if defined(RTE_TOOLCHAIN_GCC)
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#elif defined(RTE_TOOLCHAIN_CLANG)
> +#pragma clang diagnostic push
> +#pragma clang diagnostic ignored "-Wcast-qual"
> +#endif
> +	return _mm_stream_load_si128((__m128i *)mem_addr);
> +#if defined(RTE_TOOLCHAIN_GCC)
> +#pragma GCC diagnostic pop
> +#elif defined(RTE_TOOLCHAIN_CLANG)
> +#pragma clang diagnostic pop
> +#endif
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Memory copy from non-temporal source area.
> + *
> + * @note
> + * Performance is optimal when the source pointer is 16 byte aligned.
> + *
> + * @param dst
> + *   Pointer to the destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + *   Any of the RTE_MEMOPS_F_(LEN|SRC)nnnA flags.
> + *   The RTE_MEMOPS_F_SRC_NT flag must be set.
> + *   The RTE_MEMOPS_F_DST_NT flag must be clear.
> + *   The RTE_MEMOPS_F_DSTnnnA flags are ignored.
> + *   Must be constant at build time.

Why do the flags need to be build-time constants?

> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nts(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) == RTE_MEMOPS_F_SRC_NT);
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* If source is not 16 byte aligned, then copy first part of data via bounce buffer,
> +	 * to achieve 16 byte alignment of source pointer.
> +	 * This invalidates the source, destination and length alignment flags, and
> +	 * potentially makes the destination pointer unaligned.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned.
> +	 */
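> +	/* Worked example (illustrative): src = 0x1005 gives offset = 5 and
> +	 * first = 11; one aligned stream load from 0x1000 fills the bounce
> +	 * buffer, the 11 bytes at buffer[5..15] are copied to dst, and src
> +	 * is then 16 byte aligned for the main loop below.
> +	 */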
> +	if (!((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)) {

An alternative to relying on compiler constant propagation to eliminate
conditionals when various things are known to be aligned would be to use
GCC's __builtin_assume_aligned().

The basic pattern then would look something like:

const void *aligned_source;
void *aligned_dst;
size_t aligned_len;

if ((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
	aligned_source = __builtin_assume_aligned(source, 16);
else
	aligned_source = source;

You would then go on to do the same for dst and len (with some uintptr_t
casting required).

After this, the code may be written as if the pointer alignments were not
known, and the compiler would properly eliminate any sections that deal
with the unaligned cases whenever the proper flags are set.

Another, more radical, change would be to just drop all the src, dst, and
len flags altogether, and provide a __builtin_assume_aligned() wrapper
instead, for the application to use, i.e.:

#define rte_assume_aligned(ptr, n) __builtin_assume_aligned(ptr, n)

With this API, the user code would look something like:

rte_memcpy_ex(rte_assume_aligned(my_dst, 16), my_src, len, 
RTE_MEMOPS_F_DST_NT);

...if it knew my_dst to have a particular alignment. The rte_memcpy_ex()
implementation wouldn't assume any particular alignment of any input
parameters.

> +		/* Source is not known to be 16 byte aligned, but might be. */
> +		/** How many bytes is source offset from 16 byte alignment (floor rounding). */
> +		const size_t    offset = (uintptr_t)src & 15;

I would argue "(uintptr_t)src % 16" is more readable, and it generates 
the same code.

Sorry for breaking up the review into two parts.

> +
> +		if (offset) {
> +			/* Source is not 16 byte aligned. */
> +			char            buffer[16] __rte_aligned(16);
> +			/** How many bytes is source away from 16 byte alignment
> +			 * (ceiling rounding).
> +			 */
> +			const size_t    first = 16 - offset;
> +
> +			xmm0 = _mm_stream_load_si128_const(RTE_PTR_SUB(src, offset));
> +			_mm_store_si128((void *)buffer, xmm0);
> +
> +			/* Test for short length.
> +			 *
> +			 * Omitted if length is known to be >= 16.
> +			 */
> +			if (!(__builtin_constant_p(len) && len >= 16) &&
> +					unlikely(len <= first)) {
> +				/* Short length. */
> +				rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), len);
> +				return;
> +			}
> +
> +			/* Copy until source pointer is 16 byte aligned. */
> +			rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), first);
> +			src = RTE_PTR_ADD(src, first);
> +			dst = RTE_PTR_ADD(dst, first);
> +			len -= first;
> +		}
> +	}
> +
> +	/* Source pointer is now 16 byte aligned. */
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 64 or 32 byte aligned.
> +	 */
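> +	/* After the 64 byte loop, len < 64, so each power-of-two bit of len
> +	 * identifies exactly one remaining chunk to copy.
> +	 */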
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_storeu_si128(dst, xmm2);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, if any, via bounce buffer.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid) and length is known to be 16 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
> +			(len & 15)) {
> +		char    buffer[16] __rte_aligned(16);
> +
> +		xmm3 = _mm_stream_load_si128_const(src);
> +		_mm_store_si128((void *)buffer, xmm3);
> +		rte_mov15_or_less(dst, buffer, len & 15);
> +	}
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Memory copy to non-temporal destination area.
> + *
> + * @note
> + * If the destination and/or length is unaligned, the first and/or last copied
> + * bytes will be stored in the destination memory area using temporal access.
> + * @note
> + * Performance is optimal when the destination pointer is 16 byte aligned.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + *   Any of the RTE_MEMOPS_F_(LEN|DST)nnnA flags.
> + *   The RTE_MEMOPS_F_SRC_NT flag must be clear.
> + *   The RTE_MEMOPS_F_DST_NT flag must be set.
> + *   The RTE_MEMOPS_F_SRCnnnA flags are ignored.
> + *   Must be constant at build time.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ntd(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) == RTE_MEMOPS_F_DST_NT);
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) ||
> +			len >= 16) {
> +		/* Length >= 16 and/or destination is known to be 16 byte aligned. */
> +		register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +		/* If destination is not 16 byte aligned, then copy first part of data,
> +		 * to achieve 16 byte alignment of destination pointer.
> +		 * This invalidates the source, destination and length alignment flags, and
> +		 * potentially makes the source pointer unaligned.
> +		 *
> +		 * Omitted if destination is known to be 16 byte aligned.
> +		 */
> +		if (!((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A)) {
> +			/* Destination is not known to be 16 byte aligned, but might be. */
> +			/** How many bytes is destination offset from 16 byte alignment
> +			 * (floor rounding).
> +			 */
> +			const size_t    offset = (uintptr_t)dst & 15;
> +
> +			if (offset) {
> +				/* Destination is not 16 byte aligned. */
> +				/** How many bytes is destination away from 16 byte alignment
> +				 * (ceiling rounding).
> +				 */
> +				const size_t    first = 16 - offset;
> +
> +				if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ||
> +						(offset & 3) == 0) {
> +					/* Destination is (known to be) 4 byte aligned. */
> +					int32_t r0, r1, r2;
> +
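> +					/* Destination is 4, but not 16, byte
> +					 * aligned here, so first is 4, 8 or 12;
> +					 * testing bits 8 and 4 of first covers
> +					 * all cases with NT dword stores.
> +					 */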
> +					/* Copy until destination pointer is 16 byte aligned. */
> +					if (first & 8) {
> +						memcpy(&r0, RTE_PTR_ADD(src, 0 * 4), 4);
> +						memcpy(&r1, RTE_PTR_ADD(src, 1 * 4), 4);
> +						_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), r0);
> +						_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), r1);
> +						src = RTE_PTR_ADD(src, 8);
> +						dst = RTE_PTR_ADD(dst, 8);
> +						len -= 8;
> +					}
> +					if (first & 4) {
> +						memcpy(&r2, src, 4);
> +						_mm_stream_si32(dst, r2);
> +						src = RTE_PTR_ADD(src, 4);
> +						dst = RTE_PTR_ADD(dst, 4);
> +						len -= 4;
> +					}
> +				} else {
> +					/* Destination is not 4 byte aligned. */
> +					/* Copy until destination pointer is 16 byte aligned. */
> +					rte_mov15_or_less(dst, src, first);
> +					src = RTE_PTR_ADD(src, first);
> +					dst = RTE_PTR_ADD(dst, first);
> +					len -= first;
> +				}
> +			}
> +		}
> +
> +		/* Destination pointer is now 16 byte aligned. */
> +		RTE_ASSERT(rte_is_aligned(dst, 16));
> +
> +		/* Copy large portion of data in chunks of 64 byte. */
> +		while (len >= 64) {
> +			xmm0 = _mm_loadu_si128(RTE_PTR_ADD(src, 0 * 16));
> +			xmm1 = _mm_loadu_si128(RTE_PTR_ADD(src, 1 * 16));
> +			xmm2 = _mm_loadu_si128(RTE_PTR_ADD(src, 2 * 16));
> +			xmm3 = _mm_loadu_si128(RTE_PTR_ADD(src, 3 * 16));
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
> +			src = RTE_PTR_ADD(src, 64);
> +			dst = RTE_PTR_ADD(dst, 64);
> +			len -= 64;
> +		}
> +
> +		/* Copy following 32 and 16 byte portions of data.
> +		 *
> +		 * Omitted if destination is known to be 16 byte aligned (so the alignment
> +		 * flags are still valid)
> +		 * and length is known to be respectively 64 or 32 byte aligned.
> +		 */
> +		if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +				((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
> +				(len & 32)) {
> +			xmm0 = _mm_loadu_si128(RTE_PTR_ADD(src, 0 * 16));
> +			xmm1 = _mm_loadu_si128(RTE_PTR_ADD(src, 1 * 16));
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +			_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +			src = RTE_PTR_ADD(src, 32);
> +			dst = RTE_PTR_ADD(dst, 32);
> +		}
> +		if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +				((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
> +				(len & 16)) {
> +			xmm2 = _mm_loadu_si128(src);
> +			_mm_stream_si128(dst, xmm2);
> +			src = RTE_PTR_ADD(src, 16);
> +			dst = RTE_PTR_ADD(dst, 16);
> +		}
> +	} else {
> +		/* Length <= 15, and
> +		 * destination is not known to be 16 byte aligned (but might be).
> +		 */
> +		/* If destination is not 4 byte aligned, then
> +		 * use normal copy and return.
> +		 *
> +		 * Omitted if destination is known to be 4 byte aligned.
> +		 */
> +		if (!((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) &&
> +				!rte_is_aligned(dst, 4)) {
> +			/* Destination is not 4 byte aligned. Non-temporal store is unavailable. */
> +			rte_mov15_or_less(dst, src, len);
> +			return;
> +		}
> +		/* Destination is (known to be) 4 byte aligned. Proceed. */
> +	}
> +
> +	/* Destination pointer is now 4 byte (or 16 byte) aligned. */
> +	RTE_ASSERT(rte_is_aligned(dst, 4));
> +
> +	/* Copy following 8 and 4 byte portions of data.
> +	 *
> +	 * Omitted if destination is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 16 or 8 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
> +			(len & 8)) {
> +		int32_t r0, r1;
> +
> +		memcpy(&r0, RTE_PTR_ADD(src, 0 * 4), 4);
> +		memcpy(&r1, RTE_PTR_ADD(src, 1 * 4), 4);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), r0);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), r1);
> +		src = RTE_PTR_ADD(src, 8);
> +		dst = RTE_PTR_ADD(dst, 8);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN8A)) &&
> +			(len & 4)) {
> +		int32_t r2;
> +
> +		memcpy(&r2, src, 4);
> +		_mm_stream_si32(dst, r2);
> +		src = RTE_PTR_ADD(src, 4);
> +		dst = RTE_PTR_ADD(dst, 4);
> +	}
> +
> +	/* Copy remaining 2 and 1 byte portions of data.
> +	 *
> +	 * Omitted if destination is known to be 16 byte aligned (so the alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 4 and 2 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A)) &&
> +			(len & 2)) {
> +		int16_t r3;
> +
> +		memcpy(&r3, src, 2);
> +		*(int16_t *)dst = r3;
> +		src = RTE_PTR_ADD(src, 2);
> +		dst = RTE_PTR_ADD(dst, 2);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN2A)) &&
> +			(len & 1))
> +		*(char *)dst = *(const char *)src;
> +}
> +
> +/**
> + * Non-temporal memory copy of 15 byte or less
> + * from a 16 byte aligned source via bounce buffer.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Only the 4 least significant bits of this parameter are used;
> + *   they hold the number of remaining bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_15_or_less_s16a(void *__rte_restrict dst,
> +		const void *__rte_restrict src, size_t len, const uint64_t flags)
> +{
> +	int32_t             buffer[4] __rte_aligned(16);
> +	register __m128i    xmm0;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if ((len & 15) == 0)
> +		return;
> +
> +	/* Non-temporal load into bounce buffer. */
> +	xmm0 = _mm_stream_load_si128_const(src);
> +	_mm_store_si128((void *)buffer, xmm0);
> +
> +	/* Store from bounce buffer. */
> +	if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ||
> +			rte_is_aligned(dst, 4)) {
> +		/* Destination is (known to be) 4 byte aligned. */
> +		src = (const void *)buffer;
> +		if (len & 8) {
> +#ifdef RTE_ARCH_X86_64
> +			if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A) {
> +				/* Destination is known to be 8 byte aligned. */
> +				_mm_stream_si64(dst, *(const int64_t *)src);
> +			} else {
> +#endif /* RTE_ARCH_X86_64 */
> +				_mm_stream_si32(RTE_PTR_ADD(dst, 0), buffer[0]);
> +				_mm_stream_si32(RTE_PTR_ADD(dst, 4), buffer[1]);
> +#ifdef RTE_ARCH_X86_64
> +			}
> +#endif /* RTE_ARCH_X86_64 */
> +			src = RTE_PTR_ADD(src, 8);
> +			dst = RTE_PTR_ADD(dst, 8);
> +		}
> +		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN8A) &&
> +				(len & 4)) {
> +			_mm_stream_si32(dst, *(const int32_t *)src);
> +			src = RTE_PTR_ADD(src, 4);
> +			dst = RTE_PTR_ADD(dst, 4);
> +		}
> +
> +		/* Non-temporal store is unavailable for the remaining 3 byte or less. */
> +		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) &&
> +				(len & 2)) {
> +			*(int16_t *)dst = *(const int16_t *)src;
> +			src = RTE_PTR_ADD(src, 2);
> +			dst = RTE_PTR_ADD(dst, 2);
> +		}
> +		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN2A) &&
> +				(len & 1)) {
> +			*(char *)dst = *(const char *)src;
> +		}
> +	} else {
> +		/* Destination is not 4 byte aligned. Non-temporal store is unavailable. */
> +		rte_mov15_or_less(dst, (const void *)buffer, len & 15);
> +	}
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 16 byte aligned addresses non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + *   Must be 16 byte aligned.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d16s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 16));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
> +		_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_stream_si128(dst, xmm2);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, via bounce buffer.
> +	 *
> +	 * Omitted if length is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
> +		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
> +				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
> +				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) ?
> +				flags : RTE_MEMOPS_F_DST16A) |
> +				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
> +				flags : RTE_MEMOPS_F_SRC16A));
> +}
> +
> +#ifdef RTE_ARCH_X86_64
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 8/16 byte aligned destination/source addresses non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + *   Must be 8 byte aligned.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d8s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	int64_t             buffer[8] __rte_cache_aligned /* at least __rte_aligned(16) */;
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 8));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
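> +	/* There is no 8 byte aligned 16 byte NT store, so the stream loaded
> +	 * data is staged in a cached bounce buffer and streamed to the
> +	 * destination with 8 byte _mm_stream_si64() stores.
> +	 */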
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 2], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 2], xmm1);
> +		_mm_store_si128((void *)&buffer[2 * 2], xmm2);
> +		_mm_store_si128((void *)&buffer[3 * 2], xmm3);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[0]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[1]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 2 * 8), buffer[2]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 3 * 8), buffer[3]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 4 * 8), buffer[4]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 5 * 8), buffer[5]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 6 * 8), buffer[6]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 7 * 8), buffer[7]);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 2], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 2], xmm1);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[0]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[1]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 2 * 8), buffer[2]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 3 * 8), buffer[3]);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_store_si128((void *)&buffer[2 * 2], xmm2);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[4]);
> +		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[5]);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, via bounce buffer.
> +	 *
> +	 * Omitted if length is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
> +		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
> +				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
> +				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A) ?
> +				flags : RTE_MEMOPS_F_DST8A) |
> +				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
> +				flags : RTE_MEMOPS_F_SRC16A));
> +}
> +#endif /* RTE_ARCH_X86_64 */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 4/16 byte aligned destination/source addresses non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + *   Must be 4 byte aligned.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + *   Must be 16 byte aligned.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d4s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	int32_t             buffer[16] __rte_cache_aligned /* at least __rte_aligned(16) */;
> +	register __m128i    xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 4));
> +	RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Copy large portion of data in chunks of 64 byte. */
> +	while (len >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 4], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 4], xmm1);
> +		_mm_store_si128((void *)&buffer[2 * 4], xmm2);
> +		_mm_store_si128((void *)&buffer[3 * 4], xmm3);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  0 * 4), buffer[0]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  1 * 4), buffer[1]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  2 * 4), buffer[2]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  3 * 4), buffer[3]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  4 * 4), buffer[4]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  5 * 4), buffer[5]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  6 * 4), buffer[6]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  7 * 4), buffer[7]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  8 * 4), buffer[8]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst,  9 * 4), buffer[9]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 10 * 4), buffer[10]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 11 * 4), buffer[11]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 12 * 4), buffer[12]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 13 * 4), buffer[13]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 14 * 4), buffer[14]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 15 * 4), buffer[15]);
> +		src = RTE_PTR_ADD(src, 64);
> +		dst = RTE_PTR_ADD(dst, 64);
> +		len -= 64;
> +	}
> +
> +	/* Copy following 32 and 16 byte portions of data.
> +	 *
> +	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
> +			(len & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_store_si128((void *)&buffer[0 * 4], xmm0);
> +		_mm_store_si128((void *)&buffer[1 * 4], xmm1);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[0]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[1]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[2]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 3 * 4), buffer[3]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 4 * 4), buffer[4]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 5 * 4), buffer[5]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 6 * 4), buffer[6]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 7 * 4), buffer[7]);
> +		src = RTE_PTR_ADD(src, 32);
> +		dst = RTE_PTR_ADD(dst, 32);
> +	}
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
> +			(len & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_store_si128((void *)&buffer[2 * 4], xmm2);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[8]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[9]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[10]);
> +		_mm_stream_si32(RTE_PTR_ADD(dst, 3 * 4), buffer[11]);
> +		src = RTE_PTR_ADD(src, 16);
> +		dst = RTE_PTR_ADD(dst, 16);
> +	}
> +
> +	/* Copy remaining data, 15 byte or less, via bounce buffer.
> +	 *
> +	 * Omitted if length is known to be 16 byte aligned.
> +	 */
> +	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
> +		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
> +				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
> +				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ?
> +				flags : RTE_MEMOPS_F_DST4A) |
> +				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
> +				flags : RTE_MEMOPS_F_SRC16A));
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * 4 byte aligned addresses (non-temporal) memory copy.
> + * The memory areas must not overlap.
> + *
> + * @param dst
> + *   Pointer to the (non-temporal) destination memory area.
> + *   Must be 4 byte aligned if using non-temporal store.
> + * @param src
> + *   Pointer to the (non-temporal) source memory area.
> + *   Must be 4 byte aligned if using non-temporal load.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_d4s4a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	/** How many bytes is source offset from 16 byte alignment (floor rounding). */
> +	const size_t    offset = (flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A ?
> +			0 : (uintptr_t)src & 15;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(rte_is_aligned(dst, 4));
> +	RTE_ASSERT(rte_is_aligned(src, 4));
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	if (offset == 0) {
> +		/* Source is 16 byte aligned. */
> +		/* Copy everything, using upgraded source alignment flags. */
> +		rte_memcpy_nt_d4s16a(dst, src, len,
> +				(flags & ~RTE_MEMOPS_F_SRCA_MASK) | RTE_MEMOPS_F_SRC16A);
> +	} else {
> +		/* Source is not 16 byte aligned, so make it 16 byte aligned. */
> +		int32_t             buffer[4] __rte_aligned(16);
> +		const size_t        first = 16 - offset;
> +		register __m128i    xmm0;
> +
> +		/* First, copy first part of data in chunks of 4 byte,
> +		 * to achieve 16 byte alignment of source.
> +		 * This invalidates the source, destination and length alignment flags, and
> +		 * may change whether the destination pointer is 16 byte aligned.
> +		 */
> +
> +		/** Copy from 16 byte aligned source pointer (floor rounding). */
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_SUB(src, offset));
> +		_mm_store_si128((void *)buffer, xmm0);
> +
> +		if (unlikely(len + offset <= 16)) {
> +			/* Short length. */
> +			if (((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ||
> +					(len & 3) == 0) {
> +				/* Length is 4 byte aligned. */
> +				switch (len) {
> +				case 1 * 4:
> +					/* Offset can be 1 * 4, 2 * 4 or 3 * 4. */
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4),
> +							buffer[offset / 4]);
> +					break;
> +				case 2 * 4:
> +					/* Offset can be 1 * 4 or 2 * 4. */
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4),
> +							buffer[offset / 4]);
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4),
> +							buffer[offset / 4 + 1]);
> +					break;
> +				case 3 * 4:
> +					/* Offset can only be 1 * 4. */
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[1]);
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[2]);
> +					_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[3]);
> +					break;
> +				}
> +			} else {
> +				/* Length is not 4 byte aligned. */
> +				rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), len);
> +			}
> +			return;
> +		}
> +
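> +		/* The bounce buffer holds the 16 bytes at src - offset, so the
> +		 * head of the data occupies its last first / 4 dwords, i.e.
> +		 * buffer[offset / 4] through buffer[3].
> +		 */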
> +		switch (first) {
> +		case 1 * 4:
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[3]);
> +			break;
> +		case 2 * 4:
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[2]);
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[3]);
> +			break;
> +		case 3 * 4:
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[1]);
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[2]);
> +			_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[3]);
> +			break;
> +		}
> +
> +		src = RTE_PTR_ADD(src, first);
> +		dst = RTE_PTR_ADD(dst, first);
> +		len -= first;
> +
> +		/* Source pointer is now 16 byte aligned. */
> +		RTE_ASSERT(rte_is_aligned(src, 16));
> +
> +		/* Then, copy the rest, using corrected alignment flags. */
> +		if (rte_is_aligned(dst, 16))
> +			rte_memcpy_nt_d16s16a(dst, src, len, (flags &
> +					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
> +					RTE_MEMOPS_F_LENA_MASK)) |
> +					RTE_MEMOPS_F_DST16A | RTE_MEMOPS_F_SRC16A |
> +					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
> +					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
> +#ifdef RTE_ARCH_X86_64
> +		else if (rte_is_aligned(dst, 8))
> +			rte_memcpy_nt_d8s16a(dst, src, len, (flags &
> +					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
> +					RTE_MEMOPS_F_LENA_MASK)) |
> +					RTE_MEMOPS_F_DST8A | RTE_MEMOPS_F_SRC16A |
> +					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
> +					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
> +#endif /* RTE_ARCH_X86_64 */
> +		else
> +			rte_memcpy_nt_d4s16a(dst, src, len, (flags &
> +					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
> +					RTE_MEMOPS_F_LENA_MASK)) |
> +					RTE_MEMOPS_F_DST4A | RTE_MEMOPS_F_SRC16A |
> +					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
> +					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
> +	}
> +}
> +
> +#ifndef RTE_MEMCPY_NT_BUFSIZE
> +
> +#include <lib/mbuf/rte_mbuf_core.h>
> +
> +/** Bounce buffer size for non-temporal memcpy.
> + *
> + * Must be 2^N and >= 128.
> + * The actual buffer will be slightly larger, due to added padding.
> + * The default is chosen to be able to handle a non-segmented packet.
> + */
> +#define RTE_MEMCPY_NT_BUFSIZE RTE_MBUF_DEFAULT_DATAROOM
> +
> +#endif  /* RTE_MEMCPY_NT_BUFSIZE */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Non-temporal memory copy via bounce buffer.
> + *
> + * @note
> + * If the destination and/or length is unaligned, the first and/or last copied
> + * bytes will be stored in the destination memory area using temporal access.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + *   Must be <= RTE_MEMCPY_NT_BUFSIZE.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_buf(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +	/** Cache line aligned bounce buffer with preceding and trailing padding.
> +	 *
> +	 * The preceding padding is one cache line, so the data area itself
> +	 * is cache line aligned.
> +	 * The trailing padding is 16 bytes, leaving room for the trailing bytes
> +	 * of a 16 byte store operation.
> +	 */
> +	char			buffer[RTE_CACHE_LINE_SIZE + RTE_MEMCPY_NT_BUFSIZE +  16]
> +				__rte_cache_aligned;
> +	/** Pointer to bounce buffer's aligned data area. */
> +	char		* const buf0 = &buffer[RTE_CACHE_LINE_SIZE];
> +	void		       *buf;
> +	/** Number of bytes to copy from source, incl. any extra preceding bytes. */
> +	size_t			srclen;
> +	register __m128i	xmm0, xmm1, xmm2, xmm3;
> +
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
> +			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
> +	RTE_ASSERT(len <= RTE_MEMCPY_NT_BUFSIZE);
> +
> +	if (unlikely(len == 0))
> +		return;
> +
> +	/* Step 1:
> +	 * Copy data from the source to the bounce buffer's aligned data area,
> +	 * using aligned non-temporal load from the source,
> +	 * and unaligned store in the bounce buffer.
> +	 *
> +	 * If the source is unaligned, the additional bytes preceding the data will be copied
> +	 * to the padding area preceding the bounce buffer's aligned data area.
> +	 * Similarly, if the source data ends at an unaligned address, the additional bytes
> +	 * trailing the data will be copied to the padding area trailing the bounce buffer's
> +	 * aligned data area.
> +	 */
> +
> +	/* Adjust for extra preceding bytes, unless source is known to be 16 byte aligned. */
> +	if ((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) {
> +		buf = buf0;
> +		srclen = len;
> +	} else {
> +		/** How many bytes is source offset from 16 byte alignment (floor rounding). */
> +		const size_t offset = (uintptr_t)src & 15;
> +
> +		buf = RTE_PTR_SUB(buf0, offset);
> +		src = RTE_PTR_SUB(src, offset);
> +		srclen = len + offset;
> +	}
> +
> +	/* Copy large portion of data from source to bounce buffer in chunks of 64 byte. */
> +	while (srclen >= 64) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
> +		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 1 * 16), xmm1);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 2 * 16), xmm2);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 3 * 16), xmm3);
> +		src = RTE_PTR_ADD(src, 64);
> +		buf = RTE_PTR_ADD(buf, 64);
> +		srclen -= 64;
> +	}
> +
> +	/* Copy remaining 32 and 16 byte portions of data from source to bounce buffer.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the length alignment
> +	 * flags are still valid)
> +	 * and length is known to be respectively 64 or 32 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
> +			(srclen & 32)) {
> +		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
> +		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 0 * 16), xmm0);
> +		_mm_storeu_si128(RTE_PTR_ADD(buf, 1 * 16), xmm1);
> +		src = RTE_PTR_ADD(src, 32);
> +		buf = RTE_PTR_ADD(buf, 32);
> +	}
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
> +			(srclen & 16)) {
> +		xmm2 = _mm_stream_load_si128_const(src);
> +		_mm_storeu_si128(buf, xmm2);
> +		src = RTE_PTR_ADD(src, 16);
> +		buf = RTE_PTR_ADD(buf, 16);
> +	}
> +	/* Copy any trailing bytes of data from source to bounce buffer.
> +	 *
> +	 * Omitted if source is known to be 16 byte aligned (so the length alignment
> +	 * flags are still valid)
> +	 * and length is known to be 16 byte aligned.
> +	 */
> +	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
> +			(srclen & 15)) {
> +		xmm3 = _mm_stream_load_si128_const(src);
> +		_mm_storeu_si128(buf, xmm3);
> +	}
> +
> +	/* Step 2:
> +	 * Copy from the aligned bounce buffer to the non-temporal destination.
> +	 */
> +	rte_memcpy_ntd(dst, buf0, len,
> +			(flags & ~(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_SRCA_MASK)) |
> +			(RTE_CACHE_LINE_SIZE << RTE_MEMOPS_F_SRCA_SHIFT));
> +}
> +
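To make the two steps concrete with a worked example: for len = 100 with the source 7 bytes past a 16 byte boundary, step 1 rounds the source pointer down and issues 16 byte aligned non-temporal loads covering 64 + 32 + 16 = 112 bytes into the bounce buffer; the 7 leading and 5 trailing extra bytes land in the preceding and trailing padding areas, which is exactly what the padding is sized for. Step 2 then copies the 100 payload bytes from the cache line aligned data area to the destination.
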
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Non-temporal memory copy.
> + * The memory areas must not overlap.
> + *
> + * @note
> + * If the destination and/or length is unaligned, some copied bytes will be
> + * stored in the destination memory area using temporal access.
> + *
> + * @param dst
> + *   Pointer to the non-temporal destination memory area.
> + * @param src
> + *   Pointer to the non-temporal source memory area.
> + * @param len
> + *   Number of bytes to copy.
> + * @param flags
> + *   Hints for memory access.
> + */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_nt_generic(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +
> +	while (len > RTE_MEMCPY_NT_BUFSIZE) {
> +		rte_memcpy_nt_buf(dst, src, RTE_MEMCPY_NT_BUFSIZE,
> +				(flags & ~RTE_MEMOPS_F_LENA_MASK) | RTE_MEMOPS_F_LEN128A);
> +		dst = RTE_PTR_ADD(dst, RTE_MEMCPY_NT_BUFSIZE);
> +		src = RTE_PTR_ADD(src, RTE_MEMCPY_NT_BUFSIZE);
> +		len -= RTE_MEMCPY_NT_BUFSIZE;
> +	}
> +	rte_memcpy_nt_buf(dst, src, len, flags);
> +}
> +
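Worked example: with the default RTE_MEMCPY_NT_BUFSIZE of RTE_MBUF_DEFAULT_DATAROOM (2048 bytes), a 5000 byte copy is split into two full 2048 byte bounce-buffer passes, each with the length hint forced to RTE_MEMOPS_F_LEN128A (valid because the buffer size must be a power of two >= 128), followed by a final 904 byte pass that keeps the caller's original flags.
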
> +/* Implementation. Refer to function declaration for documentation. */
> +__rte_experimental
> +static __rte_always_inline
> +__attribute__((__nonnull__(1, 2)))
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
> +#endif
> +void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
> +		const uint64_t flags)
> +{
> +#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
> +	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
> +#endif /* !RTE_TOOLCHAIN_CLANG */
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
> +			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
> +			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
> +	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
> +			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
> +
> +	if ((flags & (RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT)) ==
> +			(RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT)) {
> +		/* Copy between non-temporal source and destination. */
> +		if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
> +			rte_memcpy_nt_d16s16a(dst, src, len, flags);
> +#ifdef RTE_ARCH_X86_64
> +		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
> +			rte_memcpy_nt_d8s16a(dst, src, len, flags);
> +#endif /* RTE_ARCH_X86_64 */
> +		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
> +			rte_memcpy_nt_d4s16a(dst, src, len, flags);
> +		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A &&
> +				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC4A)
> +			rte_memcpy_nt_d4s4a(dst, src, len, flags);
> +		else if (len <= RTE_MEMCPY_NT_BUFSIZE)
> +			rte_memcpy_nt_buf(dst, src, len, flags);
> +		else
> +			rte_memcpy_nt_generic(dst, src, len, flags);
> +	} else if (flags & RTE_MEMOPS_F_SRC_NT) {
> +		/* Copy from non-temporal source. */
> +		rte_memcpy_nts(dst, src, len, flags);
> +	} else if (flags & RTE_MEMOPS_F_DST_NT) {
> +		/* Copy to non-temporal destination. */
> +		rte_memcpy_ntd(dst, src, len, flags);
> +	} else
> +		rte_memcpy(dst, src, len);
> +}
> +
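Because the flags must be compile-time constants (enforced by the RTE_BUILD_BUG_ON above), callers will typically wrap rte_memcpy_ex() once per use case. A minimal sketch, with a hypothetical wrapper name:

	static inline void
	copy_to_capture(void *dst, const void *src, size_t len)
	{
		/* Bypass the cache on both the load and the store side. */
		rte_memcpy_ex(dst, src, len,
				RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
	}

Remember rte_wmb() after a burst of such copies, per the RTE_MEMOPS_F_DST_NT documentation.
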
>   #undef ALIGNMENT_MASK
>   
>   #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
> index a2307cebe6..aa96fb4cc8 100644
> --- a/lib/mbuf/rte_mbuf.c
> +++ b/lib/mbuf/rte_mbuf.c
> @@ -660,6 +660,83 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
>   	return mc;
>   }
>   
> +/* Create a deep copy of mbuf, using non-temporal memory access */
> +struct rte_mbuf *
> +rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
> +		 uint32_t off, uint32_t len, const uint64_t flags)
> +{
> +	const struct rte_mbuf *seg = m;
> +	struct rte_mbuf *mc, *m_last, **prev;
> +
> +	/* garbage in check */
> +	__rte_mbuf_sanity_check(m, 1);
> +
> +	/* check for request to copy at offset past end of mbuf */
> +	if (unlikely(off >= m->pkt_len))
> +		return NULL;
> +
> +	mc = rte_pktmbuf_alloc(mp);
> +	if (unlikely(mc == NULL))
> +		return NULL;
> +
> +	/* truncate requested length to available data */
> +	if (len > m->pkt_len - off)
> +		len = m->pkt_len - off;
> +
> +	__rte_pktmbuf_copy_hdr(mc, m);
> +
> +	/* copied mbuf is not indirect or external */
> +	mc->ol_flags = m->ol_flags & ~(RTE_MBUF_F_INDIRECT|RTE_MBUF_F_EXTERNAL);
> +
> +	prev = &mc->next;
> +	m_last = mc;
> +	while (len > 0) {
> +		uint32_t copy_len;
> +
> +		/* skip leading mbuf segments */
> +		while (off >= seg->data_len) {
> +			off -= seg->data_len;
> +			seg = seg->next;
> +		}
> +
> +		/* current buffer is full, chain a new one */
> +		if (rte_pktmbuf_tailroom(m_last) == 0) {
> +			m_last = rte_pktmbuf_alloc(mp);
> +			if (unlikely(m_last == NULL)) {
> +				rte_pktmbuf_free(mc);
> +				return NULL;
> +			}
> +			++mc->nb_segs;
> +			*prev = m_last;
> +			prev = &m_last->next;
> +		}
> +
> +		/*
> +		 * copy the min of data in input segment (seg)
> +		 * vs space available in output (m_last)
> +		 */
> +		copy_len = RTE_MIN(seg->data_len - off, len);
> +		if (copy_len > rte_pktmbuf_tailroom(m_last))
> +			copy_len = rte_pktmbuf_tailroom(m_last);
> +
> +		/* append from seg to m_last */
> +		rte_memcpy_ex(rte_pktmbuf_mtod_offset(m_last, char *,
> +						   m_last->data_len),
> +			   rte_pktmbuf_mtod_offset(seg, char *, off),
> +			   copy_len, flags);
> +
> +		/* update offsets and lengths */
> +		m_last->data_len += copy_len;
> +		mc->pkt_len += copy_len;
> +		off += copy_len;
> +		len -= copy_len;
> +	}
> +
> +	/* garbage out check */
> +	__rte_mbuf_sanity_check(mc, 1);
> +	return mc;
> +}
> +
>   /* dump a mbuf on console */
>   void
>   rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index b6e23d98ce..030df396a3 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -1443,6 +1443,38 @@ struct rte_mbuf *
>   rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
>   		 uint32_t offset, uint32_t length);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Create a full copy of a given packet mbuf,
> + * using non-temporal memory access as specified by flags.
> + *
> + * Copies all the data from a given packet mbuf to a newly allocated
> + * set of mbufs. The private data is not copied.
> + *
> + * @param m
> + *   The packet mbuf to be copied.
> + * @param mp
> + *   The mempool from which the "clone" mbufs are allocated.
> + * @param offset
> + *   The number of bytes to skip before copying.
> + *   If the mbuf does not have that many bytes, it is an error
> + *   and NULL is returned.
> + * @param length
> + *   The upper limit on bytes to copy.  Passing UINT32_MAX
> + *   means all data (after offset).
> + * @param flags
> + *   Non-temporal memory access hints for rte_memcpy_ex.
> + * @return
> + *   - The pointer to the new "clone" mbuf on success.
> + *   - NULL if allocation fails.
> + */
> +__rte_experimental
> +struct rte_mbuf *
> +rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
> +		    uint32_t offset, uint32_t length, const uint64_t flags);
> +
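For illustration, a hedged usage sketch of the new function (capture_pool and snaplen are hypothetical names, mirroring the pdump usage below):

	struct rte_mbuf *mc;

	mc = rte_pktmbuf_copy_ex(m, capture_pool, 0, snaplen,
				 RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
	if (unlikely(mc == NULL))
		return;	/* mempool exhausted, or offset past end of data */
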
>   /**
>    * Adds given value to the refcnt of all packet mbuf segments.
>    *
> diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
> index ed486ed14e..b583364ad4 100644
> --- a/lib/mbuf/version.map
> +++ b/lib/mbuf/version.map
> @@ -47,5 +47,6 @@ EXPERIMENTAL {
>   	global:
>   
>   	rte_pktmbuf_pool_create_extbuf;
> +	rte_pktmbuf_copy_ex;
>   
>   };
> diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
> index af2b814251..ae871c4865 100644
> --- a/lib/pcapng/rte_pcapng.c
> +++ b/lib/pcapng/rte_pcapng.c
> @@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>   	orig_len = rte_pktmbuf_pkt_len(md);
>   
>   	/* Take snapshot of the data */
> -	mc = rte_pktmbuf_copy(md, mp, 0, length);
> +	mc = rte_pktmbuf_copy_ex(md, mp, 0, length,
> +				 RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
>   	if (unlikely(mc == NULL))
>   		return NULL;
>   
> diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
> index 98dcbc037b..6e61c75407 100644
> --- a/lib/pdump/rte_pdump.c
> +++ b/lib/pdump/rte_pdump.c
> @@ -124,7 +124,8 @@ pdump_copy(uint16_t port_id, uint16_t queue,
>   					    pkts[i], mp, cbs->snaplen,
>   					    ts, direction);
>   		else
> -			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
> +			p = rte_pktmbuf_copy_ex(pkts[i], mp, 0, cbs->snaplen,
> +						RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
>   
>   		if (unlikely(p == NULL))
>   			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
> @@ -134,6 +135,9 @@ pdump_copy(uint16_t port_id, uint16_t queue,
>   
>   	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
>   
> +	/* Flush the non-temporal stores of the packet copies. */
> +	rte_wmb();
> +
>   	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
>   	if (unlikely(ring_enq < d_pkts)) {
>   		unsigned int drops = d_pkts - ring_enq;
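
The rte_wmb() added here is needed for correctness, not just performance: the non-temporal stores used for the packet copies are weakly ordered on x86, so the write barrier makes the copied data globally visible before the ring enqueue publishes the mbuf pointers to the consumer.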
  
Thomas Monjalon July 31, 2023, 12:14 p.m. UTC | #3
Hello,

What's the status of this feature?


10/10/2022 08:46, Morten Brørup:
> [...]
  
Morten Brørup July 31, 2023, 12:25 p.m. UTC | #4
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 31 July 2023 14.14
> 
> Hello,
> 
> What's the status of this feature?

I haven't given up on upstreaming this feature, but there doesn't seem to be much demand for it, so working on it has low priority.

> [...]
  
Mattias Rönnblom Aug. 4, 2023, 5:49 a.m. UTC | #5
On 2023-07-31 14:25, Morten Brørup wrote:
>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>> Sent: Monday, 31 July 2023 14.14
>>
>> Hello,
>>
>> What's the status of this feature?
> 
> I haven't given up on upstreaming this feature, but there doesn't seem to be much demand for it, so working on it has low priority.
> 

This would definitely be a useful addition to the EAL, IMO.

It's also a case where it's difficult to provide a generic and portable
solution with both good performance and reasonable semantics. The upside
is that you seem to have come pretty far already.

>> [...]
  

Patch

diff --git a/app/test/test_memcpy.c b/app/test/test_memcpy.c
index 1ab86f4967..12410ce413 100644
--- a/app/test/test_memcpy.c
+++ b/app/test/test_memcpy.c
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */
 
 #include <stdint.h>
@@ -36,6 +37,19 @@  static size_t buf_sizes[TEST_VALUE_RANGE];
 /* Data is aligned on this many bytes (power of 2) */
 #define ALIGNMENT_UNIT          32
 
+const uint64_t nt_mode_flags[4] = {
+	0,
+	RTE_MEMOPS_F_SRC_NT,
+	RTE_MEMOPS_F_DST_NT,
+	RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT
+};
+const char * const nt_mode_str[4] = {
+	"none",
+	"src",
+	"dst",
+	"src+dst"
+};
+
 
 /*
  * Create two buffers, and initialise one with random values. These are copied
@@ -44,12 +58,13 @@  static size_t buf_sizes[TEST_VALUE_RANGE];
  * changed.
  */
 static int
-test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
+test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size, unsigned int nt_mode)
 {
 	unsigned int i;
 	uint8_t dest[SMALL_BUFFER_SIZE + ALIGNMENT_UNIT];
 	uint8_t src[SMALL_BUFFER_SIZE + ALIGNMENT_UNIT];
 	void * ret;
+	const uint64_t flags = nt_mode_flags[nt_mode];
 
 	/* Setup buffers */
 	for (i = 0; i < SMALL_BUFFER_SIZE + ALIGNMENT_UNIT; i++) {
@@ -58,18 +73,23 @@  test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
 	}
 
 	/* Do the copy */
-	ret = rte_memcpy(dest + off_dst, src + off_src, size);
-	if (ret != (dest + off_dst)) {
-		printf("rte_memcpy() returned %p, not %p\n",
-		       ret, dest + off_dst);
+	if (nt_mode) {
+		rte_memcpy_ex(dest + off_dst, src + off_src, size, flags);
+	} else {
+		ret = rte_memcpy(dest + off_dst, src + off_src, size);
+		if (ret != (dest + off_dst)) {
+			printf("rte_memcpy() returned %p, not %p\n",
+			       ret, dest + off_dst);
+		}
 	}
 
 	/* Check nothing before offset is affected */
 	for (i = 0; i < off_dst; i++) {
 		if (dest[i] != 0) {
-			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
+			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
 			       "[modified before start of dst].\n",
-			       (unsigned)size, off_src, off_dst);
+			       nt_mode ? "_ex" : "",
+			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode]);
 			return -1;
 		}
 	}
@@ -77,9 +97,11 @@  test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
 	/* Check everything was copied */
 	for (i = 0; i < size; i++) {
 		if (dest[i + off_dst] != src[i + off_src]) {
-			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
-			       "[didn't copy byte %u].\n",
-			       (unsigned)size, off_src, off_dst, i);
+			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
+			       "[didn't copy byte %u: 0x%02x!=0x%02x].\n",
+			       nt_mode ? "_ex" : "",
+			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode], i,
+			       dest[i + off_dst], src[i + off_src]);
 			return -1;
 		}
 	}
@@ -87,9 +109,10 @@  test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
 	/* Check nothing after copy was affected */
 	for (i = size; i < SMALL_BUFFER_SIZE; i++) {
 		if (dest[i + off_dst] != 0) {
-			printf("rte_memcpy() failed for %u bytes (offsets=%u,%u): "
+			printf("rte_memcpy%s() failed for %u bytes (offsets=%u,%u nt=%s): "
 			       "[copied too many].\n",
-			       (unsigned)size, off_src, off_dst);
+			       nt_mode ? "_ex" : "",
+			       (unsigned int)size, off_src, off_dst, nt_mode_str[nt_mode]);
 			return -1;
 		}
 	}
@@ -102,16 +125,18 @@  test_single_memcpy(unsigned int off_src, unsigned int off_dst, size_t size)
 static int
 func_test(void)
 {
-	unsigned int off_src, off_dst, i;
+	unsigned int off_src, off_dst, i, nt_mode;
 	int ret;
 
-	for (off_src = 0; off_src < ALIGNMENT_UNIT; off_src++) {
-		for (off_dst = 0; off_dst < ALIGNMENT_UNIT; off_dst++) {
-			for (i = 0; i < RTE_DIM(buf_sizes); i++) {
-				ret = test_single_memcpy(off_src, off_dst,
-				                         buf_sizes[i]);
-				if (ret != 0)
-					return -1;
+	for (nt_mode = 0; nt_mode < 4; nt_mode++) {
+		for (off_src = 0; off_src < ALIGNMENT_UNIT; off_src++) {
+			for (off_dst = 0; off_dst < ALIGNMENT_UNIT; off_dst++) {
+				for (i = 0; i < RTE_DIM(buf_sizes); i++) {
+					ret = test_single_memcpy(off_src, off_dst,
+								 buf_sizes[i], nt_mode);
+					if (ret != 0)
+						return -1;
+				}
 			}
 		}
 	}
diff --git a/app/test/test_memcpy_perf.c b/app/test/test_memcpy_perf.c
index 3727c160e6..6bb52cba88 100644
--- a/app/test/test_memcpy_perf.c
+++ b/app/test/test_memcpy_perf.c
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */
 
 #include <stdint.h>
@@ -15,6 +16,7 @@ 
 #include <rte_malloc.h>
 
 #include <rte_memcpy.h>
+#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -27,9 +29,9 @@ 
 /* List of buffer sizes to test */
 #if TEST_VALUE_RANGE == 0
 static size_t buf_sizes[] = {
-	1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128,
-	129, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447, 448,
-	449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 1600,
+	1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 40, 48, 60, 63, 64, 65, 80, 92, 124,
+	127, 128, 129, 140, 152, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447,
+	448, 449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 1600,
 	2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192
 };
 /* MUST be as large as largest packet size above */
@@ -72,7 +74,7 @@  static uint8_t *small_buf_read, *small_buf_write;
 static int
 init_buffers(void)
 {
-	unsigned i;
+	unsigned int i;
 
 	large_buf_read = rte_malloc("memcpy", LARGE_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
 	if (large_buf_read == NULL)
@@ -151,7 +153,7 @@  static void
 do_uncached_write(uint8_t *dst, int is_dst_cached,
 				  const uint8_t *src, int is_src_cached, size_t size)
 {
-	unsigned i, j;
+	unsigned int i, j;
 	size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];
 
 	for (i = 0; i < (TEST_ITERATIONS / TEST_BATCH_SIZE); i++) {
@@ -167,66 +169,112 @@  do_uncached_write(uint8_t *dst, int is_dst_cached,
  * Run a single memcpy performance test. This is a macro to ensure that if
  * the "size" parameter is a constant it won't be converted to a variable.
  */
-#define SINGLE_PERF_TEST(dst, is_dst_cached, dst_uoffset,                   \
-                         src, is_src_cached, src_uoffset, size)             \
-do {                                                                        \
-    unsigned int iter, t;                                                   \
-    size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];          \
-    uint64_t start_time, total_time = 0;                                    \
-    uint64_t total_time2 = 0;                                               \
-    for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {    \
-        fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,             \
-                         src_addrs, is_src_cached, src_uoffset);            \
-        start_time = rte_rdtsc();                                           \
-        for (t = 0; t < TEST_BATCH_SIZE; t++)                               \
-            rte_memcpy(dst+dst_addrs[t], src+src_addrs[t], size);           \
-        total_time += rte_rdtsc() - start_time;                             \
-    }                                                                       \
-    for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {    \
-        fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,             \
-                         src_addrs, is_src_cached, src_uoffset);            \
-        start_time = rte_rdtsc();                                           \
-        for (t = 0; t < TEST_BATCH_SIZE; t++)                               \
-            memcpy(dst+dst_addrs[t], src+src_addrs[t], size);               \
-        total_time2 += rte_rdtsc() - start_time;                            \
-    }                                                                       \
-    printf("%3.0f -", (double)total_time  / TEST_ITERATIONS);                 \
-    printf("%3.0f",   (double)total_time2 / TEST_ITERATIONS);                 \
-    printf("(%6.2f%%) ", ((double)total_time - total_time2)*100/total_time2); \
+#define SINGLE_PERF_TEST(dst, is_dst_cached, dst_uoffset,					  \
+			 src, is_src_cached, src_uoffset, size)					  \
+do {												  \
+	unsigned int iter, t;									  \
+	size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];				  \
+	uint64_t start_time;									  \
+	uint64_t total_time_rte = 0, total_time_std = 0;					  \
+	uint64_t total_time_ntd = 0, total_time_nts = 0, total_time_nt = 0;			  \
+	const uint64_t flags = ((dst_uoffset == 0) ?						  \
+				(ALIGNMENT_UNIT << RTE_MEMOPS_F_DSTA_SHIFT) : 0) |		  \
+			       ((src_uoffset == 0) ?						  \
+				(ALIGNMENT_UNIT << RTE_MEMOPS_F_SRCA_SHIFT) : 0);		  \
+	for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {			  \
+		fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,				  \
+				 src_addrs, is_src_cached, src_uoffset);			  \
+		start_time = rte_rdtsc();							  \
+		for (t = 0; t < TEST_BATCH_SIZE; t++)						  \
+			rte_memcpy(dst + dst_addrs[t], src + src_addrs[t], size);		  \
+		total_time_rte += rte_rdtsc() - start_time;					  \
+	}											  \
+	for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {			  \
+		fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,				  \
+				 src_addrs, is_src_cached, src_uoffset);			  \
+		start_time = rte_rdtsc();							  \
+		for (t = 0; t < TEST_BATCH_SIZE; t++)						  \
+			memcpy(dst + dst_addrs[t], src + src_addrs[t], size);			  \
+		total_time_std += rte_rdtsc() - start_time;					  \
+	}											  \
+	if (!(is_dst_cached && is_src_cached)) {						  \
+		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
+			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
+					 src_addrs, is_src_cached, src_uoffset);		  \
+			start_time = rte_rdtsc();						  \
+			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
+				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
+					      flags | RTE_MEMOPS_F_DST_NT);			  \
+			total_time_ntd += rte_rdtsc() - start_time;				  \
+		}										  \
+		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
+			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
+					 src_addrs, is_src_cached, src_uoffset);		  \
+			start_time = rte_rdtsc();						  \
+			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
+				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
+					      flags | RTE_MEMOPS_F_SRC_NT);			  \
+			total_time_nts += rte_rdtsc() - start_time;				  \
+		}										  \
+		for (iter = 0; iter < (TEST_ITERATIONS / TEST_BATCH_SIZE); iter++) {		  \
+			fill_addr_arrays(dst_addrs, is_dst_cached, dst_uoffset,			  \
+					 src_addrs, is_src_cached, src_uoffset);		  \
+			start_time = rte_rdtsc();						  \
+			for (t = 0; t < TEST_BATCH_SIZE; t++)					  \
+				rte_memcpy_ex(dst + dst_addrs[t], src + src_addrs[t], size,       \
+					      flags | RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT); \
+			total_time_nt += rte_rdtsc() - start_time;				  \
+		}										  \
+	}											  \
+	printf(" %4.0f-", (double)total_time_rte / TEST_ITERATIONS);				  \
+	printf("%4.0f",   (double)total_time_std / TEST_ITERATIONS);				  \
+	printf("(%+4.0f%%)", ((double)total_time_rte - total_time_std) * 100 / total_time_std);   \
+	if (!(is_dst_cached && is_src_cached)) {						  \
+		printf(" %4.0f", (double)total_time_ntd / TEST_ITERATIONS);			  \
+		printf(" %4.0f", (double)total_time_nts / TEST_ITERATIONS);			  \
+		printf(" %4.0f", (double)total_time_nt / TEST_ITERATIONS);			  \
+		if (total_time_nt / total_time_std > 9)						  \
+			printf("(*%4.1f)", (double)total_time_nt / total_time_std);		  \
+		else										  \
+			printf("(%+4.0f%%)",							  \
+			       ((double)total_time_nt - total_time_std) * 100 / total_time_std);  \
+	}											  \
 } while (0)
 
 /* Run aligned memcpy tests for each cached/uncached permutation */
-#define ALL_PERF_TESTS_FOR_SIZE(n)                                       \
-do {                                                                     \
-    if (__builtin_constant_p(n))                                         \
-        printf("\nC%6u", (unsigned)n);                                   \
-    else                                                                 \
-        printf("\n%7u", (unsigned)n);                                    \
-    SINGLE_PERF_TEST(small_buf_write, 1, 0, small_buf_read, 1, 0, n);    \
-    SINGLE_PERF_TEST(large_buf_write, 0, 0, small_buf_read, 1, 0, n);    \
-    SINGLE_PERF_TEST(small_buf_write, 1, 0, large_buf_read, 0, 0, n);    \
-    SINGLE_PERF_TEST(large_buf_write, 0, 0, large_buf_read, 0, 0, n);    \
+#define ALL_PERF_TESTS_FOR_SIZE(n)						\
+do {										\
+	if (__builtin_constant_p(n))						\
+		printf("\nC%6u", (unsigned int)n);				\
+	else									\
+		printf("\n%7u", (unsigned int)n);				\
+	SINGLE_PERF_TEST(small_buf_write, 1, 0, small_buf_read, 1, 0, n);	\
+	SINGLE_PERF_TEST(large_buf_write, 0, 0, small_buf_read, 1, 0, n);	\
+	SINGLE_PERF_TEST(small_buf_write, 1, 0, large_buf_read, 0, 0, n);	\
+	SINGLE_PERF_TEST(large_buf_write, 0, 0, large_buf_read, 0, 0, n);	\
 } while (0)
 
 /* Run unaligned memcpy tests for each cached/uncached permutation */
-#define ALL_PERF_TESTS_FOR_SIZE_UNALIGNED(n)                             \
-do {                                                                     \
-    if (__builtin_constant_p(n))                                         \
-        printf("\nC%6u", (unsigned)n);                                   \
-    else                                                                 \
-        printf("\n%7u", (unsigned)n);                                    \
-    SINGLE_PERF_TEST(small_buf_write, 1, 1, small_buf_read, 1, 5, n);    \
-    SINGLE_PERF_TEST(large_buf_write, 0, 1, small_buf_read, 1, 5, n);    \
-    SINGLE_PERF_TEST(small_buf_write, 1, 1, large_buf_read, 0, 5, n);    \
-    SINGLE_PERF_TEST(large_buf_write, 0, 1, large_buf_read, 0, 5, n);    \
+#define ALL_PERF_TESTS_FOR_SIZE_UNALIGNED(n)					\
+do {										\
+	if (__builtin_constant_p(n))						\
+		printf("\nC%6u", (unsigned int)n);				\
+	else									\
+		printf("\n%7u", (unsigned int)n);				\
+	SINGLE_PERF_TEST(small_buf_write, 1, 1, small_buf_read, 1, 5, n);	\
+	SINGLE_PERF_TEST(large_buf_write, 0, 1, small_buf_read, 1, 5, n);	\
+	SINGLE_PERF_TEST(small_buf_write, 1, 1, large_buf_read, 0, 5, n);	\
+	SINGLE_PERF_TEST(large_buf_write, 0, 1, large_buf_read, 0, 5, n);	\
 } while (0)
 
 /* Run memcpy tests for constant length */
-#define ALL_PERF_TEST_FOR_CONSTANT                                      \
-do {                                                                    \
-    TEST_CONSTANT(6U); TEST_CONSTANT(64U); TEST_CONSTANT(128U);         \
-    TEST_CONSTANT(192U); TEST_CONSTANT(256U); TEST_CONSTANT(512U);      \
-    TEST_CONSTANT(768U); TEST_CONSTANT(1024U); TEST_CONSTANT(1536U);    \
+#define ALL_PERF_TEST_FOR_CONSTANT						\
+do {										\
+	TEST_CONSTANT(4U); TEST_CONSTANT(6U); TEST_CONSTANT(8U);		\
+	TEST_CONSTANT(16U); TEST_CONSTANT(64U); TEST_CONSTANT(128U);		\
+	TEST_CONSTANT(192U); TEST_CONSTANT(256U); TEST_CONSTANT(512U);		\
+	TEST_CONSTANT(768U); TEST_CONSTANT(1024U); TEST_CONSTANT(1536U);	\
+	TEST_CONSTANT(2048U);							\
 } while (0)
 
 /* Run all memcpy tests for aligned constant cases */
@@ -251,7 +299,7 @@  perf_test_constant_unaligned(void)
 static inline void
 perf_test_variable_aligned(void)
 {
-	unsigned i;
+	unsigned int i;
 	for (i = 0; i < RTE_DIM(buf_sizes); i++) {
 		ALL_PERF_TESTS_FOR_SIZE((size_t)buf_sizes[i]);
 	}
@@ -261,7 +309,7 @@  perf_test_variable_aligned(void)
 static inline void
 perf_test_variable_unaligned(void)
 {
-	unsigned i;
+	unsigned int i;
 	for (i = 0; i < RTE_DIM(buf_sizes); i++) {
 		ALL_PERF_TESTS_FOR_SIZE_UNALIGNED((size_t)buf_sizes[i]);
 	}
@@ -282,7 +330,7 @@  perf_test(void)
 
 #if TEST_VALUE_RANGE != 0
 	/* Set up buf_sizes array, if required */
-	unsigned i;
+	unsigned int i;
 	for (i = 0; i < TEST_VALUE_RANGE; i++)
 		buf_sizes[i] = i;
 #endif
@@ -290,13 +338,14 @@  perf_test(void)
 	/* See function comment */
 	do_uncached_write(large_buf_write, 0, small_buf_read, 1, SMALL_BUFFER_SIZE);
 
-	printf("\n** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **\n"
-		   "======= ================= ================= ================= =================\n"
-		   "   Size   Cache to cache     Cache to mem      Mem to cache        Mem to mem\n"
-		   "(bytes)          (ticks)          (ticks)           (ticks)           (ticks)\n"
-		   "------- ----------------- ----------------- ----------------- -----------------");
+	printf("\n** rte_memcpy(RTE)/memcpy(STD)/rte_memcpy_ex(NTD/NTS/NT) - memcpy perf. tests (C = compile-time constant) **\n"
+		   "======= ================ ====================================== ====================================== ======================================\n"
+		   "   Size  Cache to cache               Cache to mem                           Mem to cache                            Mem to mem\n"
+		   "(bytes)         (ticks)                    (ticks)                                (ticks)                               (ticks)\n"
+		   "         RTE- STD(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)  RTE- STD(diff%%)  NTD  NTS   NT(diff%%)\n"
+		   "------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
 
-	printf("\n================================= %2dB aligned =================================",
+	printf("\n================================================================ %2dB aligned ===============================================================",
 		ALIGNMENT_UNIT);
 	/* Do aligned tests where size is a variable */
 	timespec_get(&tv_begin, TIME_UTC);
@@ -304,28 +353,28 @@  perf_test(void)
 	timespec_get(&tv_end, TIME_UTC);
 	time_aligned = (double)(tv_end.tv_sec - tv_begin.tv_sec)
 		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
-	printf("\n------- ----------------- ----------------- ----------------- -----------------");
+	printf("\n------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
 	/* Do aligned tests where size is a compile-time constant */
 	timespec_get(&tv_begin, TIME_UTC);
 	perf_test_constant_aligned();
 	timespec_get(&tv_end, TIME_UTC);
 	time_aligned_const = (double)(tv_end.tv_sec - tv_begin.tv_sec)
 		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
-	printf("\n================================== Unaligned ==================================");
+	printf("\n================================================================= Unaligned =================================================================");
 	/* Do unaligned tests where size is a variable */
 	timespec_get(&tv_begin, TIME_UTC);
 	perf_test_variable_unaligned();
 	timespec_get(&tv_end, TIME_UTC);
 	time_unaligned = (double)(tv_end.tv_sec - tv_begin.tv_sec)
 		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
-	printf("\n------- ----------------- ----------------- ----------------- -----------------");
+	printf("\n------- ---------------- -------------------------------------- -------------------------------------- --------------------------------------");
 	/* Do unaligned tests where size is a compile-time constant */
 	timespec_get(&tv_begin, TIME_UTC);
 	perf_test_constant_unaligned();
 	timespec_get(&tv_end, TIME_UTC);
 	time_unaligned_const = (double)(tv_end.tv_sec - tv_begin.tv_sec)
 		+ ((double)tv_end.tv_nsec - tv_begin.tv_nsec) / NS_PER_S;
-	printf("\n======= ================= ================= ================= =================\n\n");
+	printf("\n======= ================ ====================================== ====================================== ======================================\n\n");
 
 	printf("Test Execution Time (seconds):\n");
 	printf("Aligned variable copy size   = %8.3f\n", time_aligned);
diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
index e7f0f8eaa9..b087f09c35 100644
--- a/lib/eal/include/generic/rte_memcpy.h
+++ b/lib/eal/include/generic/rte_memcpy.h
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */
 
 #ifndef _RTE_MEMCPY_H_
@@ -11,6 +12,13 @@ 
  * Functions for vectorised implementation of memcpy().
  */
 
+#include <rte_common.h>
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy 16 bytes from one location to another using optimised
  * instructions. The locations should not overlap.
@@ -113,4 +121,123 @@  rte_memcpy(void *dst, const void *src, size_t n);
 
 #endif /* __DOXYGEN__ */
 
+/*
+ * Advanced/Non-Temporal Memory Operations Flags.
+ */
+
+/** Length alignment hint mask. */
+#define RTE_MEMOPS_F_LENA_MASK  (UINT64_C(0xFE) << 0)
+/** Length alignment hint shift. */
+#define RTE_MEMOPS_F_LENA_SHIFT 0
+/** Hint: Length is 2 byte aligned. */
+#define RTE_MEMOPS_F_LEN2A      (UINT64_C(2) << 0)
+/** Hint: Length is 4 byte aligned. */
+#define RTE_MEMOPS_F_LEN4A      (UINT64_C(4) << 0)
+/** Hint: Length is 8 byte aligned. */
+#define RTE_MEMOPS_F_LEN8A      (UINT64_C(8) << 0)
+/** Hint: Length is 16 byte aligned. */
+#define RTE_MEMOPS_F_LEN16A     (UINT64_C(16) << 0)
+/** Hint: Length is 32 byte aligned. */
+#define RTE_MEMOPS_F_LEN32A     (UINT64_C(32) << 0)
+/** Hint: Length is 64 byte aligned. */
+#define RTE_MEMOPS_F_LEN64A     (UINT64_C(64) << 0)
+/** Hint: Length is 128 byte aligned. */
+#define RTE_MEMOPS_F_LEN128A    (UINT64_C(128) << 0)
+
+/** Prefer non-temporal access to source memory area.
+ */
+#define RTE_MEMOPS_F_SRC_NT     (UINT64_C(1) << 8)
+/** Source address alignment hint mask. */
+#define RTE_MEMOPS_F_SRCA_MASK  (UINT64_C(0xFE) << 8)
+/** Source address alignment hint shift. */
+#define RTE_MEMOPS_F_SRCA_SHIFT 8
+/** Hint: Source address is 2 byte aligned. */
+#define RTE_MEMOPS_F_SRC2A      (UINT64_C(2) << 8)
+/** Hint: Source address is 4 byte aligned. */
+#define RTE_MEMOPS_F_SRC4A      (UINT64_C(4) << 8)
+/** Hint: Source address is 8 byte aligned. */
+#define RTE_MEMOPS_F_SRC8A      (UINT64_C(8) << 8)
+/** Hint: Source address is 16 byte aligned. */
+#define RTE_MEMOPS_F_SRC16A     (UINT64_C(16) << 8)
+/** Hint: Source address is 32 byte aligned. */
+#define RTE_MEMOPS_F_SRC32A     (UINT64_C(32) << 8)
+/** Hint: Source address is 64 byte aligned. */
+#define RTE_MEMOPS_F_SRC64A     (UINT64_C(64) << 8)
+/** Hint: Source address is 128 byte aligned. */
+#define RTE_MEMOPS_F_SRC128A    (UINT64_C(128) << 8)
+
+/** Prefer non-temporal access to destination memory area.
+ *
+ * On x86 architecture:
+ * Remember to call rte_wmb() after a sequence of copy operations.
+ */
+#define RTE_MEMOPS_F_DST_NT     (UINT64_C(1) << 16)
+/** Destination address alignment hint mask. */
+#define RTE_MEMOPS_F_DSTA_MASK  (UINT64_C(0xFE) << 16)
+/** Destination address alignment hint shift. */
+#define RTE_MEMOPS_F_DSTA_SHIFT 16
+/** Hint: Destination address is 2 byte aligned. */
+#define RTE_MEMOPS_F_DST2A      (UINT64_C(2) << 16)
+/** Hint: Destination address is 4 byte aligned. */
+#define RTE_MEMOPS_F_DST4A      (UINT64_C(4) << 16)
+/** Hint: Destination address is 8 byte aligned. */
+#define RTE_MEMOPS_F_DST8A      (UINT64_C(8) << 16)
+/** Hint: Destination address is 16 byte aligned. */
+#define RTE_MEMOPS_F_DST16A     (UINT64_C(16) << 16)
+/** Hint: Destination address is 32 byte aligned. */
+#define RTE_MEMOPS_F_DST32A     (UINT64_C(32) << 16)
+/** Hint: Destination address is 64 byte aligned. */
+#define RTE_MEMOPS_F_DST64A     (UINT64_C(64) << 16)
+/** Hint: Destination address is 128 byte aligned. */
+#define RTE_MEMOPS_F_DST128A    (UINT64_C(128) << 16)
+
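The alignment hints above encode the alignment in bytes directly in the flag word, so they can be composed and extracted without lookup tables. A small sketch (not part of the patch):

	/* Compose: non-temporal, 64 byte aligned destination; length is a
	 * multiple of 16.
	 */
	const uint64_t flags = RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_DST64A |
			RTE_MEMOPS_F_LEN16A;

	/* Extract the destination alignment in bytes (here: 64). */
	const size_t dst_align = (flags & RTE_MEMOPS_F_DSTA_MASK) >>
			RTE_MEMOPS_F_DSTA_SHIFT;
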
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Advanced/non-temporal memory copy.
+ * The memory areas must not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination memory area.
+ * @param src
+ *   Pointer to the source memory area.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Any of the RTE_MEMOPS_F_(SRC|DST)_NT, RTE_MEMOPS_F_(LEN|SRC|DST)nnnA flags.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags);
+
+#ifndef RTE_MEMCPY_EX_ARCH_DEFINED
+
+/* Fallback implementation, if no arch-specific implementation is provided. */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	RTE_SET_USED(flags);
+	memcpy(dst, src, len);
+}
+
+#endif /* RTE_MEMCPY_EX_ARCH_DEFINED */
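
Note that this fallback ignores the hints entirely and degrades to a plain memcpy(), so code using rte_memcpy_ex() stays functionally correct on architectures without a non-temporal implementation; only the cache-bypassing effect is lost.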
+
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_MEMCPY_H_ */
diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index d4d7a5cfc8..31d0faf7a8 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */
 
 #ifndef _RTE_MEMCPY_X86_64_H_
@@ -17,6 +18,10 @@ 
 #include <rte_vect.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_debug.h>
+
+#define RTE_MEMCPY_EX_ARCH_DEFINED
+#include "generic/rte_memcpy.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -868,6 +873,1239 @@  rte_memcpy(void *dst, const void *src, size_t n)
 		return rte_memcpy_generic(dst, src, n);
 }
 
+/*
+ * Advanced/Non-Temporal Memory Operations.
+ */
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Workaround for _mm_stream_load_si128() missing const in the parameter.
+ */
+__rte_experimental
+static __rte_always_inline
+__m128i _mm_stream_load_si128_const(const __m128i *const mem_addr)
+{
+	/* GCC 4.8.5 (in RHEL7) doesn't support the #pragma to ignore "-Wdiscarded-qualifiers".
+	 * So we explicitly type cast mem_addr and use the #pragma to ignore "-Wcast-qual".
+	 */
+#if defined(RTE_TOOLCHAIN_GCC)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#elif defined(RTE_TOOLCHAIN_CLANG)
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wcast-qual"
+#endif
+	return _mm_stream_load_si128((__m128i *)mem_addr);
+#if defined(RTE_TOOLCHAIN_GCC)
+#pragma GCC diagnostic pop
+#elif defined(RTE_TOOLCHAIN_CLANG)
+#pragma clang diagnostic pop
+#endif
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Memory copy from non-temporal source area.
+ *
+ * @note
+ * Performance is optimal when the source pointer is 16 byte aligned.
+ *
+ * @param dst
+ *   Pointer to the destination memory area.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Any of the RTE_MEMOPS_F_(LEN|SRC)nnnA flags.
+ *   The RTE_MEMOPS_F_SRC_NT flag must be set.
+ *   The RTE_MEMOPS_F_DST_NT flag must be clear.
+ *   The RTE_MEMOPS_F_DSTnnnA flags are ignored.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nts(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	register __m128i    xmm0, xmm1, xmm2, xmm3;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) == RTE_MEMOPS_F_SRC_NT);
+
+	if (unlikely(len == 0))
+		return;
+
+	/* If source is not 16 byte aligned, then copy first part of data via bounce buffer,
+	 * to achieve 16 byte alignment of source pointer.
+	 * This invalidates the source, destination and length alignment flags, and
+	 * potentially makes the destination pointer unaligned.
+	 *
+	 * Omitted if source is known to be 16 byte aligned.
+	 */
+	if (!((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)) {
+		/* Source is not known to be 16 byte aligned, but might be. */
+		/** How many bytes is source offset from 16 byte alignment (floor rounding). */
+		const size_t    offset = (uintptr_t)src & 15;
+
+		if (offset) {
+			/* Source is not 16 byte aligned. */
+			char            buffer[16] __rte_aligned(16);
+			/** How many bytes is source away from 16 byte alignment
+			 * (ceiling rounding).
+			 */
+			const size_t    first = 16 - offset;
+
+			xmm0 = _mm_stream_load_si128_const(RTE_PTR_SUB(src, offset));
+			_mm_store_si128((void *)buffer, xmm0);
+
+			/* Test for short length.
+			 *
+			 * Omitted if length is known to be >= 16.
+			 */
+			if (!(__builtin_constant_p(len) && len >= 16) &&
+					unlikely(len <= first)) {
+				/* Short length. */
+				rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), len);
+				return;
+			}
+
+			/* Copy until source pointer is 16 byte aligned. */
+			rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), first);
+			src = RTE_PTR_ADD(src, first);
+			dst = RTE_PTR_ADD(dst, first);
+			len -= first;
+		}
+	}
+
+	/* Source pointer is now 16 byte aligned. */
+	RTE_ASSERT(rte_is_aligned(src, 16));
+
+	/* Copy large portion of data in chunks of 64 byte. */
+	while (len >= 64) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
+		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
+		_mm_storeu_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
+		_mm_storeu_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
+		_mm_storeu_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
+		_mm_storeu_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
+		src = RTE_PTR_ADD(src, 64);
+		dst = RTE_PTR_ADD(dst, 64);
+		len -= 64;
+	}
+
+	/* Copy following 32 and 16 byte portions of data.
+	 *
+	 * Omitted if source is known to be 16 byte aligned (so the alignment
+	 * flags are still valid)
+	 * and length is known to be respectively 64 or 32 byte aligned.
+	 */
+	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
+			(len & 32)) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		_mm_storeu_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
+		_mm_storeu_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
+		src = RTE_PTR_ADD(src, 32);
+		dst = RTE_PTR_ADD(dst, 32);
+	}
+	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
+			(len & 16)) {
+		xmm2 = _mm_stream_load_si128_const(src);
+		_mm_storeu_si128(dst, xmm2);
+		src = RTE_PTR_ADD(src, 16);
+		dst = RTE_PTR_ADD(dst, 16);
+	}
+
+	/* Copy remaining data, 15 byte or less, if any, via bounce buffer.
+	 *
+	 * Omitted if source is known to be 16 byte aligned (so the alignment
+	 * flags are still valid) and length is known to be 16 byte aligned.
+	 */
+	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
+			(len & 15)) {
+		char    buffer[16] __rte_aligned(16);
+
+		xmm3 = _mm_stream_load_si128_const(src);
+		_mm_store_si128((void *)buffer, xmm3);
+		rte_mov15_or_less(dst, buffer, len & 15);
+	}
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Memory copy to non-temporal destination area.
+ *
+ * @note
+ * If the destination and/or length is unaligned, the first and/or last copied
+ * bytes will be stored in the destination memory area using temporal access.
+ * @note
+ * Performance is optimal when destination pointer is 16 byte aligned.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ * @param src
+ *   Pointer to the source memory area.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Any of the RTE_MEMOPS_F_(LEN|DST)nnnA flags.
+ *   The RTE_MEMOPS_F_SRC_NT flag must be clear.
+ *   The RTE_MEMOPS_F_DST_NT flag must be set.
+ *   The RTE_MEMOPS_F_SRCnnnA flags are ignored.
+ *   Must be constant at build time.
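+ *
+ * A minimal usage sketch; dst, src and len are application-provided, and
+ * dst is here assumed to be 4 byte aligned:
+ *
+ * @code{.c}
+ * rte_memcpy_ntd(dst, src, len, RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_DST4A);
+ * @endcode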
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_ntd(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) == RTE_MEMOPS_F_DST_NT);
+
+	if (unlikely(len == 0))
+		return;
+
+	if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) ||
+			len >= 16) {
+		/* Length >= 16 and/or destination is known to be 16 byte aligned. */
+		register __m128i    xmm0, xmm1, xmm2, xmm3;
+
+		/* If destination is not 16 byte aligned, then copy first part of data,
+		 * to achieve 16 byte alignment of destination pointer.
+		 * This invalidates the source, destination and length alignment flags, and
+		 * potentially makes the source pointer unaligned.
+		 *
+		 * Omitted if destination is known to be 16 byte aligned.
+		 */
+		if (!((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A)) {
+			/* Destination is not known to be 16 byte aligned, but might be. */
+			/** How many bytes is destination offset from 16 byte alignment
+			 * (floor rounding).
+			 */
+			const size_t    offset = (uintptr_t)dst & 15;
+
+			if (offset) {
+				/* Destination is not 16 byte aligned. */
+				/** How many bytes is destination away from 16 byte alignment
+				 * (ceiling rounding).
+				 */
+				const size_t    first = 16 - offset;
+
+				if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ||
+						(offset & 3) == 0) {
+					/* Destination is (known to be) 4 byte aligned. */
+					int32_t r0, r1, r2;
+
+					/* Copy until destination pointer is 16 byte aligned. */
+					if (first & 8) {
+						memcpy(&r0, RTE_PTR_ADD(src, 0 * 4), 4);
+						memcpy(&r1, RTE_PTR_ADD(src, 1 * 4), 4);
+						_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), r0);
+						_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), r1);
+						src = RTE_PTR_ADD(src, 8);
+						dst = RTE_PTR_ADD(dst, 8);
+						len -= 8;
+					}
+					if (first & 4) {
+						memcpy(&r2, src, 4);
+						_mm_stream_si32(dst, r2);
+						src = RTE_PTR_ADD(src, 4);
+						dst = RTE_PTR_ADD(dst, 4);
+						len -= 4;
+					}
+				} else {
+					/* Destination is not 4 byte aligned. */
+					/* Copy until destination pointer is 16 byte aligned. */
+					rte_mov15_or_less(dst, src, first);
+					src = RTE_PTR_ADD(src, first);
+					dst = RTE_PTR_ADD(dst, first);
+					len -= first;
+				}
+			}
+		}
+
+		/* Destination pointer is now 16 byte aligned. */
+		RTE_ASSERT(rte_is_aligned(dst, 16));
+
+		/* Copy large portion of data in chunks of 64 byte. */
+		while (len >= 64) {
+			xmm0 = _mm_loadu_si128(RTE_PTR_ADD(src, 0 * 16));
+			xmm1 = _mm_loadu_si128(RTE_PTR_ADD(src, 1 * 16));
+			xmm2 = _mm_loadu_si128(RTE_PTR_ADD(src, 2 * 16));
+			xmm3 = _mm_loadu_si128(RTE_PTR_ADD(src, 3 * 16));
+			_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
+			_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
+			_mm_stream_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
+			_mm_stream_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
+			src = RTE_PTR_ADD(src, 64);
+			dst = RTE_PTR_ADD(dst, 64);
+			len -= 64;
+		}
+
+		/* Copy following 32 and 16 byte portions of data.
+		 *
+		 * Omitted if destination is known to be 16 byte aligned (so the alignment
+		 * flags are still valid)
+		 * and length is known to be respectively 64 or 32 byte aligned.
+		 */
+		if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
+				((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
+				(len & 32)) {
+			xmm0 = _mm_loadu_si128(RTE_PTR_ADD(src, 0 * 16));
+			xmm1 = _mm_loadu_si128(RTE_PTR_ADD(src, 1 * 16));
+			_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
+			_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
+			src = RTE_PTR_ADD(src, 32);
+			dst = RTE_PTR_ADD(dst, 32);
+		}
+		if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
+				((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
+				(len & 16)) {
+			xmm2 = _mm_loadu_si128(src);
+			_mm_stream_si128(dst, xmm2);
+			src = RTE_PTR_ADD(src, 16);
+			dst = RTE_PTR_ADD(dst, 16);
+		}
+	} else {
+		/* Length <= 15, and
+		 * destination is not known to be 16 byte aligned (but might be).
+		 */
+		/* If destination is not 4 byte aligned, then
+		 * use normal copy and return.
+		 *
+		 * Omitted if destination is known to be 4 byte aligned.
+		 */
+		if (!((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) &&
+				!rte_is_aligned(dst, 4)) {
+			/* Destination is not 4 byte aligned. Non-temporal store is unavailable. */
+			rte_mov15_or_less(dst, src, len);
+			return;
+		}
+		/* Destination is (known to be) 4 byte aligned. Proceed. */
+	}
+
+	/* Destination pointer is now 4 byte (or 16 byte) aligned. */
+	RTE_ASSERT(rte_is_aligned(dst, 4));
+
+	/* Copy following 8 and 4 byte portions of data.
+	 *
+	 * Omitted if destination is known to be 16 byte aligned (so the alignment
+	 * flags are still valid)
+	 * and length is known to be respectively 16 or 8 byte aligned.
+	 */
+	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
+			(len & 8)) {
+		int32_t r0, r1;
+
+		memcpy(&r0, RTE_PTR_ADD(src, 0 * 4), 4);
+		memcpy(&r1, RTE_PTR_ADD(src, 1 * 4), 4);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), r0);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), r1);
+		src = RTE_PTR_ADD(src, 8);
+		dst = RTE_PTR_ADD(dst, 8);
+	}
+	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN8A)) &&
+			(len & 4)) {
+		int32_t r2;
+
+		memcpy(&r2, src, 4);
+		_mm_stream_si32(dst, r2);
+		src = RTE_PTR_ADD(src, 4);
+		dst = RTE_PTR_ADD(dst, 4);
+	}
+
+	/* Copy remaining 2 and 1 byte portions of data.
+	 *
+	 * Omitted if destination is known to be 16 byte aligned (so the alignment
+	 * flags are still valid)
+	 * and length is known to be respectively 4 and 2 byte aligned.
+	 */
+	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A)) &&
+			(len & 2)) {
+		int16_t r3;
+
+		memcpy(&r3, src, 2);
+		*(int16_t *)dst = r3;
+		src = RTE_PTR_ADD(src, 2);
+		dst = RTE_PTR_ADD(dst, 2);
+	}
+	if (!(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN2A)) &&
+			(len & 1))
+		*(char *)dst = *(const char *)src;
+}
+
+/**
+ * Non-temporal memory copy of 15 byte or less
+ * from 16 byte aligned source via bounce buffer.
+ * The memory areas must not overlap.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ *   Must be 16 byte aligned.
+ * @param len
+ *   Only the 4 least significant bits of this parameter are used;
+ *   they hold the number of remaining bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_15_or_less_s16a(void *__rte_restrict dst,
+		const void *__rte_restrict src, size_t len, const uint64_t flags)
+{
+	int32_t             buffer[4] __rte_aligned(16);
+	register __m128i    xmm0;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
+			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
+	RTE_ASSERT(rte_is_aligned(src, 16));
+
+	if ((len & 15) == 0)
+		return;
+
+	/* Non-temporal load into bounce buffer. */
+	xmm0 = _mm_stream_load_si128_const(src);
+	_mm_store_si128((void *)buffer, xmm0);
+
+	/* Store from bounce buffer. */
+	if (((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ||
+			rte_is_aligned(dst, 4)) {
+		/* Destination is (known to be) 4 byte aligned. */
+		src = (const void *)buffer;
+		if (len & 8) {
+#ifdef RTE_ARCH_X86_64
+			if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A) {
+				/* Destination is known to be 8 byte aligned. */
+				_mm_stream_si64(dst, *(const int64_t *)src);
+			} else {
+#endif /* RTE_ARCH_X86_64 */
+				_mm_stream_si32(RTE_PTR_ADD(dst, 0), buffer[0]);
+				_mm_stream_si32(RTE_PTR_ADD(dst, 4), buffer[1]);
+#ifdef RTE_ARCH_X86_64
+			}
+#endif /* RTE_ARCH_X86_64 */
+			src = RTE_PTR_ADD(src, 8);
+			dst = RTE_PTR_ADD(dst, 8);
+		}
+		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN8A) &&
+				(len & 4)) {
+			_mm_stream_si32(dst, *(const int32_t *)src);
+			src = RTE_PTR_ADD(src, 4);
+			dst = RTE_PTR_ADD(dst, 4);
+		}
+
+		/* Non-temporal store is unavailable for the remaining 3 byte or less. */
+		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) &&
+				(len & 2)) {
+			*(int16_t *)dst = *(const int16_t *)src;
+			src = RTE_PTR_ADD(src, 2);
+			dst = RTE_PTR_ADD(dst, 2);
+		}
+		if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN2A) &&
+				(len & 1)) {
+			*(char *)dst = *(const char *)src;
+		}
+	} else {
+		/* Destination is not 4 byte aligned. Non-temporal store is unavailable. */
+		rte_mov15_or_less(dst, (const void *)buffer, len & 15);
+	}
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * 16 byte aligned addresses non-temporal memory copy.
+ * The memory areas must not overlap.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ *   Must be 16 byte aligned.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ *   Must be 16 byte aligned.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_d16s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	register __m128i    xmm0, xmm1, xmm2, xmm3;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
+			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
+	RTE_ASSERT(rte_is_aligned(dst, 16));
+	RTE_ASSERT(rte_is_aligned(src, 16));
+
+	if (unlikely(len == 0))
+		return;
+
+	/* Copy large portion of data in chunks of 64 byte. */
+	while (len >= 64) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
+		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
+		_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
+		_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
+		_mm_stream_si128(RTE_PTR_ADD(dst, 2 * 16), xmm2);
+		_mm_stream_si128(RTE_PTR_ADD(dst, 3 * 16), xmm3);
+		src = RTE_PTR_ADD(src, 64);
+		dst = RTE_PTR_ADD(dst, 64);
+		len -= 64;
+	}
+
+	/* Copy following 32 and 16 byte portions of data.
+	 *
+	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
+	 */
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
+			(len & 32)) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		_mm_stream_si128(RTE_PTR_ADD(dst, 0 * 16), xmm0);
+		_mm_stream_si128(RTE_PTR_ADD(dst, 1 * 16), xmm1);
+		src = RTE_PTR_ADD(src, 32);
+		dst = RTE_PTR_ADD(dst, 32);
+	}
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
+			(len & 16)) {
+		xmm2 = _mm_stream_load_si128_const(src);
+		_mm_stream_si128(dst, xmm2);
+		src = RTE_PTR_ADD(src, 16);
+		dst = RTE_PTR_ADD(dst, 16);
+	}
+
+	/* Copy remaining data, 15 byte or less, via bounce buffer.
+	 *
+	 * Omitted if length is known to be 16 byte aligned.
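+	 * The source and destination alignment flags passed on are upgraded to
+	 * (at least) 16 byte, which is guaranteed to hold at this point.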
+	 */
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
+		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
+				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
+				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A) ?
+				flags : RTE_MEMOPS_F_DST16A) |
+				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
+				flags : RTE_MEMOPS_F_SRC16A));
+}
+
+#ifdef RTE_ARCH_X86_64
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * 8/16 byte aligned destination/source addresses non-temporal memory copy.
+ * The memory areas must not overlap.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ *   Must be 8 byte aligned.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ *   Must be 16 byte aligned.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_d8s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	int64_t             buffer[8] __rte_cache_aligned /* at least __rte_aligned(16) */;
+	register __m128i    xmm0, xmm1, xmm2, xmm3;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
+			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
+	RTE_ASSERT(rte_is_aligned(dst, 8));
+	RTE_ASSERT(rte_is_aligned(src, 16));
+
+	if (unlikely(len == 0))
+		return;
+
+	/* Copy large portion of data in chunks of 64 byte. */
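+	/* The 16 byte non-temporal loads are bounced through a 16 byte aligned
+	 * buffer, because the 8 byte aligned destination only permits 8 byte
+	 * non-temporal stores (_mm_stream_si64).
+	 */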
+	while (len >= 64) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
+		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
+		_mm_store_si128((void *)&buffer[0 * 2], xmm0);
+		_mm_store_si128((void *)&buffer[1 * 2], xmm1);
+		_mm_store_si128((void *)&buffer[2 * 2], xmm2);
+		_mm_store_si128((void *)&buffer[3 * 2], xmm3);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[0]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[1]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 2 * 8), buffer[2]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 3 * 8), buffer[3]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 4 * 8), buffer[4]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 5 * 8), buffer[5]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 6 * 8), buffer[6]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 7 * 8), buffer[7]);
+		src = RTE_PTR_ADD(src, 64);
+		dst = RTE_PTR_ADD(dst, 64);
+		len -= 64;
+	}
+
+	/* Copy following 32 and 16 byte portions of data.
+	 *
+	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
+	 */
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
+			(len & 32)) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		_mm_store_si128((void *)&buffer[0 * 2], xmm0);
+		_mm_store_si128((void *)&buffer[1 * 2], xmm1);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[0]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[1]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 2 * 8), buffer[2]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 3 * 8), buffer[3]);
+		src = RTE_PTR_ADD(src, 32);
+		dst = RTE_PTR_ADD(dst, 32);
+	}
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
+			(len & 16)) {
+		xmm2 = _mm_stream_load_si128_const(src);
+		_mm_store_si128((void *)&buffer[2 * 2], xmm2);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 0 * 8), buffer[4]);
+		_mm_stream_si64(RTE_PTR_ADD(dst, 1 * 8), buffer[5]);
+		src = RTE_PTR_ADD(src, 16);
+		dst = RTE_PTR_ADD(dst, 16);
+	}
+
+	/* Copy remaining data, 15 byte or less, via bounce buffer.
+	 *
+	 * Omitted if length is known to be 16 byte aligned.
+	 */
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
+		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
+				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
+				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A) ?
+				flags : RTE_MEMOPS_F_DST8A) |
+				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
+				flags : RTE_MEMOPS_F_SRC16A));
+}
+#endif /* RTE_ARCH_X86_64 */
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * 4/16 byte aligned destination/source addresses non-temporal memory copy.
+ * The memory areas must not overlap.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ *   Must be 4 byte aligned.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ *   Must be 16 byte aligned.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_d4s16a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	int32_t             buffer[16] __rte_cache_aligned /* at least __rte_aligned(16) */;
+	register __m128i    xmm0, xmm1, xmm2, xmm3;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
+			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
+	RTE_ASSERT(rte_is_aligned(dst, 4));
+	RTE_ASSERT(rte_is_aligned(src, 16));
+
+	if (unlikely(len == 0))
+		return;
+
+	/* Copy large portion of data in chunks of 64 byte. */
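+	/* Like in rte_memcpy_nt_d8s16a(), the 16 byte non-temporal loads are
+	 * bounced through an aligned buffer; the 4 byte aligned destination only
+	 * permits 4 byte non-temporal stores (_mm_stream_si32).
+	 */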
+	while (len >= 64) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
+		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
+		_mm_store_si128((void *)&buffer[0 * 4], xmm0);
+		_mm_store_si128((void *)&buffer[1 * 4], xmm1);
+		_mm_store_si128((void *)&buffer[2 * 4], xmm2);
+		_mm_store_si128((void *)&buffer[3 * 4], xmm3);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  0 * 4), buffer[0]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  1 * 4), buffer[1]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  2 * 4), buffer[2]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  3 * 4), buffer[3]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  4 * 4), buffer[4]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  5 * 4), buffer[5]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  6 * 4), buffer[6]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  7 * 4), buffer[7]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  8 * 4), buffer[8]);
+		_mm_stream_si32(RTE_PTR_ADD(dst,  9 * 4), buffer[9]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 10 * 4), buffer[10]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 11 * 4), buffer[11]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 12 * 4), buffer[12]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 13 * 4), buffer[13]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 14 * 4), buffer[14]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 15 * 4), buffer[15]);
+		src = RTE_PTR_ADD(src, 64);
+		dst = RTE_PTR_ADD(dst, 64);
+		len -= 64;
+	}
+
+	/* Copy following 32 and 16 byte portions of data.
+	 *
+	 * Omitted if length is known to be respectively 64 or 32 byte aligned.
+	 */
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A) &&
+			(len & 32)) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		_mm_store_si128((void *)&buffer[0 * 4], xmm0);
+		_mm_store_si128((void *)&buffer[1 * 4], xmm1);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[0]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[1]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[2]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 3 * 4), buffer[3]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 4 * 4), buffer[4]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 5 * 4), buffer[5]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 6 * 4), buffer[6]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 7 * 4), buffer[7]);
+		src = RTE_PTR_ADD(src, 32);
+		dst = RTE_PTR_ADD(dst, 32);
+	}
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A) &&
+			(len & 16)) {
+		xmm2 = _mm_stream_load_si128_const(src);
+		_mm_store_si128((void *)&buffer[2 * 4], xmm2);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[8]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[9]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[10]);
+		_mm_stream_si32(RTE_PTR_ADD(dst, 3 * 4), buffer[11]);
+		src = RTE_PTR_ADD(src, 16);
+		dst = RTE_PTR_ADD(dst, 16);
+	}
+
+	/* Copy remaining data, 15 byte or less, via bounce buffer.
+	 *
+	 * Omitted if length is known to be 16 byte aligned.
+	 */
+	if (!((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A))
+		rte_memcpy_nt_15_or_less_s16a(dst, src, len,
+				(flags & ~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK)) |
+				(((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A) ?
+				flags : RTE_MEMOPS_F_DST4A) |
+				(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) ?
+				flags : RTE_MEMOPS_F_SRC16A));
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * 4 byte aligned addresses (non-temporal) memory copy.
+ * The memory areas must not overlap.
+ *
+ * @param dst
+ *   Pointer to the (non-temporal) destination memory area.
+ *   Must be 4 byte aligned if using non-temporal store.
+ * @param src
+ *   Pointer to the (non-temporal) source memory area.
+ *   Must be 4 byte aligned if using non-temporal load.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_d4s4a(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	/** How many bytes is source offset from 16 byte alignment (floor rounding). */
+	const size_t    offset = (flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A ?
+			0 : (uintptr_t)src & 15;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
+			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
+	RTE_ASSERT(rte_is_aligned(dst, 4));
+	RTE_ASSERT(rte_is_aligned(src, 4));
+
+	if (unlikely(len == 0))
+		return;
+
+	if (offset == 0) {
+		/* Source is 16 byte aligned. */
+		/* Copy everything, using upgraded source alignment flags. */
+		rte_memcpy_nt_d4s16a(dst, src, len,
+				(flags & ~RTE_MEMOPS_F_SRCA_MASK) | RTE_MEMOPS_F_SRC16A);
+	} else {
+		/* Source is not 16 byte aligned, so make it 16 byte aligned. */
+		int32_t             buffer[4] __rte_aligned(16);
+		const size_t        first = 16 - offset;
+		register __m128i    xmm0;
+
+		/* First, copy first part of data in chunks of 4 byte,
+		 * to achieve 16 byte alignment of source.
+		 * This invalidates the source, destination and length alignment flags, and
+		 * potentially makes the destination pointer 16 byte unaligned/aligned.
+		 */
+
+		/** Copy from 16 byte aligned source pointer (floor rounding). */
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_SUB(src, offset));
+		_mm_store_si128((void *)buffer, xmm0);
+
+		if (unlikely(len + offset <= 16)) {
+			/* Short length. */
+			if (((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ||
+					(len & 3) == 0) {
+				/* Length is 4 byte aligned. */
+				switch (len) {
+				case 1 * 4:
+					/* Offset can be 1 * 4, 2 * 4 or 3 * 4. */
+					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4),
+							buffer[offset / 4]);
+					break;
+				case 2 * 4:
+					/* Offset can be 1 * 4 or 2 * 4. */
+					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4),
+							buffer[offset / 4]);
+					_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4),
+							buffer[offset / 4 + 1]);
+					break;
+				case 3 * 4:
+					/* Offset can only be 1 * 4. */
+					_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[1]);
+					_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[2]);
+					_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[3]);
+					break;
+				}
+			} else {
+				/* Length is not 4 byte aligned. */
+				rte_mov15_or_less(dst, RTE_PTR_ADD(buffer, offset), len);
+			}
+			return;
+		}
+
+		switch (first) {
+		case 1 * 4:
+			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[3]);
+			break;
+		case 2 * 4:
+			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[2]);
+			_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[3]);
+			break;
+		case 3 * 4:
+			_mm_stream_si32(RTE_PTR_ADD(dst, 0 * 4), buffer[1]);
+			_mm_stream_si32(RTE_PTR_ADD(dst, 1 * 4), buffer[2]);
+			_mm_stream_si32(RTE_PTR_ADD(dst, 2 * 4), buffer[3]);
+			break;
+		}
+
+		src = RTE_PTR_ADD(src, first);
+		dst = RTE_PTR_ADD(dst, first);
+		len -= first;
+
+		/* Source pointer is now 16 byte aligned. */
+		RTE_ASSERT(rte_is_aligned(src, 16));
+
+		/* Then, copy the rest, using corrected alignment flags. */
+		if (rte_is_aligned(dst, 16))
+			rte_memcpy_nt_d16s16a(dst, src, len, (flags &
+					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
+					RTE_MEMOPS_F_LENA_MASK)) |
+					RTE_MEMOPS_F_DST16A | RTE_MEMOPS_F_SRC16A |
+					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
+					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
+#ifdef RTE_ARCH_X86_64
+		else if (rte_is_aligned(dst, 8))
+			rte_memcpy_nt_d8s16a(dst, src, len, (flags &
+					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
+					RTE_MEMOPS_F_LENA_MASK)) |
+					RTE_MEMOPS_F_DST8A | RTE_MEMOPS_F_SRC16A |
+					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
+					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
+#endif /* RTE_ARCH_X86_64 */
+		else
+			rte_memcpy_nt_d4s16a(dst, src, len, (flags &
+					~(RTE_MEMOPS_F_DSTA_MASK | RTE_MEMOPS_F_SRCA_MASK |
+					RTE_MEMOPS_F_LENA_MASK)) |
+					RTE_MEMOPS_F_DST4A | RTE_MEMOPS_F_SRC16A |
+					(((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN4A) ?
+					RTE_MEMOPS_F_LEN4A : (flags & RTE_MEMOPS_F_LEN2A)));
+	}
+}
+
+#ifndef RTE_MEMCPY_NT_BUFSIZE
+
+#include <lib/mbuf/rte_mbuf_core.h>
+
+/** Bounce buffer size for non-temporal memcpy.
+ *
+ * Must be 2^N and >= 128.
+ * The actual buffer will be slightly larger, due to added padding.
+ * The default is chosen to be able to handle a non-segmented packet.
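+ * The definition is guarded by #ifndef, so the application can override it
+ * at build time.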
+ */
+#define RTE_MEMCPY_NT_BUFSIZE RTE_MBUF_DEFAULT_DATAROOM
+
+#endif  /* RTE_MEMCPY_NT_BUFSIZE */
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Non-temporal memory copy via bounce buffer.
+ *
+ * @note
+ * If the destination and/or length is unaligned, the first and/or last copied
+ * bytes will be stored in the destination memory area using temporal access.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ * @param len
+ *   Number of bytes to copy.
+ *   Must be <= RTE_MEMCPY_NT_BUFSIZE.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_buf(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+	/** Cache line aligned bounce buffer with preceding and trailing padding.
+	 *
+	 * The preceding padding is one cache line, so the data area itself
+	 * is cache line aligned.
+	 * The trailing padding is 16 bytes, leaving room for the trailing bytes
+	 * of a 16 byte store operation.
+	 */
+	char			buffer[RTE_CACHE_LINE_SIZE + RTE_MEMCPY_NT_BUFSIZE +  16]
+				__rte_cache_aligned;
+	/** Pointer to bounce buffer's aligned data area. */
+	char		* const buf0 = &buffer[RTE_CACHE_LINE_SIZE];
+	void		       *buf;
+	/** Number of bytes to copy from source, incl. any extra preceding bytes. */
+	size_t			srclen;
+	register __m128i	xmm0, xmm1, xmm2, xmm3;
+
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	RTE_ASSERT((flags & (RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT)) ==
+			(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT));
+	RTE_ASSERT(len <= RTE_MEMCPY_NT_BUFSIZE);
+
+	if (unlikely(len == 0))
+		return;
+
+	/* Step 1:
+	 * Copy data from the source to the bounce buffer's aligned data area,
+	 * using aligned non-temporal load from the source,
+	 * and unaligned store in the bounce buffer.
+	 *
+	 * If the source is unaligned, the additional bytes preceding the data will be copied
+	 * to the padding area preceding the bounce buffer's aligned data area.
+	 * Similarly, if the source data ends at an unaligned address, the additional bytes
+	 * trailing the data will be copied to the padding area trailing the bounce buffer's
+	 * aligned data area.
+	 */
+
+	/* Adjust for extra preceding bytes, unless source is known to be 16 byte aligned. */
+	if ((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) {
+		buf = buf0;
+		srclen = len;
+	} else {
+		/** How many bytes is source offset from 16 byte alignment (floor rounding). */
+		const size_t offset = (uintptr_t)src & 15;
+
+		buf = RTE_PTR_SUB(buf0, offset);
+		src = RTE_PTR_SUB(src, offset);
+		srclen = len + offset;
+	}
+
+	/* Copy large portion of data from source to bounce buffer in chunks of 64 byte. */
+	while (srclen >= 64) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		xmm2 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 2 * 16));
+		xmm3 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 3 * 16));
+		_mm_storeu_si128(RTE_PTR_ADD(buf, 0 * 16), xmm0);
+		_mm_storeu_si128(RTE_PTR_ADD(buf, 1 * 16), xmm1);
+		_mm_storeu_si128(RTE_PTR_ADD(buf, 2 * 16), xmm2);
+		_mm_storeu_si128(RTE_PTR_ADD(buf, 3 * 16), xmm3);
+		src = RTE_PTR_ADD(src, 64);
+		buf = RTE_PTR_ADD(buf, 64);
+		srclen -= 64;
+	}
+
+	/* Copy remaining 32 and 16 byte portions of data from source to bounce buffer.
+	 *
+	 * Omitted if source is known to be 16 byte aligned (so the length alignment
+	 * flags are still valid)
+	 * and length is known to be respectively 64 or 32 byte aligned.
+	 */
+	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN64A)) &&
+			(srclen & 32)) {
+		xmm0 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 0 * 16));
+		xmm1 = _mm_stream_load_si128_const(RTE_PTR_ADD(src, 1 * 16));
+		_mm_storeu_si128(RTE_PTR_ADD(buf, 0 * 16), xmm0);
+		_mm_storeu_si128(RTE_PTR_ADD(buf, 1 * 16), xmm1);
+		src = RTE_PTR_ADD(src, 32);
+		buf = RTE_PTR_ADD(buf, 32);
+	}
+	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN32A)) &&
+			(srclen & 16)) {
+		xmm2 = _mm_stream_load_si128_const(src);
+		_mm_storeu_si128(buf, xmm2);
+		src = RTE_PTR_ADD(src, 16);
+		buf = RTE_PTR_ADD(buf, 16);
+	}
+	/* Copy any trailing bytes of data from source to bounce buffer.
+	 *
+	 * Omitted if source is known to be 16 byte aligned (so the length alignment
+	 * flags are still valid)
+	 * and length is known to be 16 byte aligned.
+	 */
+	if (!(((flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A) &&
+			((flags & RTE_MEMOPS_F_LENA_MASK) >= RTE_MEMOPS_F_LEN16A)) &&
+			(srclen & 15)) {
+		xmm3 = _mm_stream_load_si128_const(src);
+		_mm_storeu_si128(buf, xmm3);
+	}
+
+	/* Step 2:
+	 * Copy from the aligned bounce buffer to the non-temporal destination.
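+	 * The RTE_MEMOPS_F_SRC_NT flag is cleared, because the bounce buffer is
+	 * cache hot, and the source alignment flag is raised to cache line
+	 * alignment, matching the bounce buffer's aligned data area.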
+	 */
+	rte_memcpy_ntd(dst, buf0, len,
+			(flags & ~(RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_SRCA_MASK)) |
+			(RTE_CACHE_LINE_SIZE << RTE_MEMOPS_F_SRCA_SHIFT));
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Non-temporal memory copy.
+ * The memory areas must not overlap.
+ *
+ * @note
+ * If the destination and/or length is unaligned, some copied bytes will be
+ * stored in the destination memory area using temporal access.
+ *
+ * @param dst
+ *   Pointer to the non-temporal destination memory area.
+ * @param src
+ *   Pointer to the non-temporal source memory area.
+ * @param len
+ *   Number of bytes to copy.
+ * @param flags
+ *   Hints for memory access.
+ *   Must be constant at build time.
+ */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_nt_generic(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+
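+	/* Copy in chunks of RTE_MEMCPY_NT_BUFSIZE, which is 2^N and >= 128, so
+	 * the length alignment flag can be upgraded to 128 byte for the full
+	 * chunks; the last, possibly shorter, chunk keeps the original flags.
+	 */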
+	while (len > RTE_MEMCPY_NT_BUFSIZE) {
+		rte_memcpy_nt_buf(dst, src, RTE_MEMCPY_NT_BUFSIZE,
+				(flags & ~RTE_MEMOPS_F_LENA_MASK) | RTE_MEMOPS_F_LEN128A);
+		dst = RTE_PTR_ADD(dst, RTE_MEMCPY_NT_BUFSIZE);
+		src = RTE_PTR_ADD(src, RTE_MEMCPY_NT_BUFSIZE);
+		len -= RTE_MEMCPY_NT_BUFSIZE;
+	}
+	rte_memcpy_nt_buf(dst, src, len, flags);
+}
+
+/* Implementation. Refer to function declaration for documentation. */
+__rte_experimental
+static __rte_always_inline
+__attribute__((__nonnull__(1, 2)))
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+__attribute__((__access__(write_only, 1, 3), __access__(read_only, 2, 3)))
+#endif
+void rte_memcpy_ex(void *__rte_restrict dst, const void *__rte_restrict src, size_t len,
+		const uint64_t flags)
+{
+#ifndef RTE_TOOLCHAIN_CLANG /* Clang doesn't support using __builtin_constant_p() like this. */
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(flags));
+#endif /* !RTE_TOOLCHAIN_CLANG */
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_DSTA_MASK) || rte_is_aligned(dst,
+			(flags & RTE_MEMOPS_F_DSTA_MASK) >> RTE_MEMOPS_F_DSTA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_SRCA_MASK) || rte_is_aligned(src,
+			(flags & RTE_MEMOPS_F_SRCA_MASK) >> RTE_MEMOPS_F_SRCA_SHIFT));
+	RTE_ASSERT(!(flags & RTE_MEMOPS_F_LENA_MASK) || (len &
+			((flags & RTE_MEMOPS_F_LENA_MASK) >> RTE_MEMOPS_F_LENA_SHIFT) - 1) == 0);
+
+	if ((flags & (RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT)) ==
+			(RTE_MEMOPS_F_DST_NT | RTE_MEMOPS_F_SRC_NT)) {
+		/* Copy between non-temporal source and destination. */
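+		/* Dispatch to the most specialized variant permitted by the
+		 * alignment flags, falling back to the bounce buffer variants.
+		 */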
+		if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST16A &&
+				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
+			rte_memcpy_nt_d16s16a(dst, src, len, flags);
+#ifdef RTE_ARCH_X86_64
+		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST8A &&
+				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
+			rte_memcpy_nt_d8s16a(dst, src, len, flags);
+#endif /* RTE_ARCH_X86_64 */
+		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A &&
+				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC16A)
+			rte_memcpy_nt_d4s16a(dst, src, len, flags);
+		else if ((flags & RTE_MEMOPS_F_DSTA_MASK) >= RTE_MEMOPS_F_DST4A &&
+				(flags & RTE_MEMOPS_F_SRCA_MASK) >= RTE_MEMOPS_F_SRC4A)
+			rte_memcpy_nt_d4s4a(dst, src, len, flags);
+		else if (len <= RTE_MEMCPY_NT_BUFSIZE)
+			rte_memcpy_nt_buf(dst, src, len, flags);
+		else
+			rte_memcpy_nt_generic(dst, src, len, flags);
+	} else if (flags & RTE_MEMOPS_F_SRC_NT) {
+		/* Copy from non-temporal source. */
+		rte_memcpy_nts(dst, src, len, flags);
+	} else if (flags & RTE_MEMOPS_F_DST_NT) {
+		/* Copy to non-temporal destination. */
+		rte_memcpy_ntd(dst, src, len, flags);
+	} else
+		rte_memcpy(dst, src, len);
+}
+
 #undef ALIGNMENT_MASK
 
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index a2307cebe6..aa96fb4cc8 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -660,6 +660,83 @@  rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	return mc;
 }
 
+/* Create a deep copy of mbuf, using non-temporal memory access */
+struct rte_mbuf *
+rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
+		 uint32_t off, uint32_t len, const uint64_t flags)
+{
+	const struct rte_mbuf *seg = m;
+	struct rte_mbuf *mc, *m_last, **prev;
+
+	/* garbage in check */
+	__rte_mbuf_sanity_check(m, 1);
+
+	/* check for request to copy at offset past end of mbuf */
+	if (unlikely(off >= m->pkt_len))
+		return NULL;
+
+	mc = rte_pktmbuf_alloc(mp);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* truncate requested length to available data */
+	if (len > m->pkt_len - off)
+		len = m->pkt_len - off;
+
+	__rte_pktmbuf_copy_hdr(mc, m);
+
+	/* copied mbuf is not indirect or external */
+	mc->ol_flags = m->ol_flags & ~(RTE_MBUF_F_INDIRECT|RTE_MBUF_F_EXTERNAL);
+
+	prev = &mc->next;
+	m_last = mc;
+	while (len > 0) {
+		uint32_t copy_len;
+
+		/* skip leading mbuf segments */
+		while (off >= seg->data_len) {
+			off -= seg->data_len;
+			seg = seg->next;
+		}
+
+		/* current buffer is full, chain a new one */
+		if (rte_pktmbuf_tailroom(m_last) == 0) {
+			m_last = rte_pktmbuf_alloc(mp);
+			if (unlikely(m_last == NULL)) {
+				rte_pktmbuf_free(mc);
+				return NULL;
+			}
+			++mc->nb_segs;
+			*prev = m_last;
+			prev = &m_last->next;
+		}
+
+		/*
+		 * copy the min of data in input segment (seg)
+		 * vs space available in output (m_last)
+		 */
+		copy_len = RTE_MIN(seg->data_len - off, len);
+		if (copy_len > rte_pktmbuf_tailroom(m_last))
+			copy_len = rte_pktmbuf_tailroom(m_last);
+
+		/* append from seg to m_last */
+		rte_memcpy_ex(rte_pktmbuf_mtod_offset(m_last, char *,
+						   m_last->data_len),
+			   rte_pktmbuf_mtod_offset(seg, char *, off),
+			   copy_len, flags);
+
+		/* update offsets and lengths */
+		m_last->data_len += copy_len;
+		mc->pkt_len += copy_len;
+		off += copy_len;
+		len -= copy_len;
+	}
+
+	/* garbage out check */
+	__rte_mbuf_sanity_check(mc, 1);
+	return mc;
+}
+
 /* dump a mbuf on console */
 void
 rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index b6e23d98ce..030df396a3 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -1443,6 +1443,38 @@  struct rte_mbuf *
 rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 		 uint32_t offset, uint32_t length);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a full copy of a given packet mbuf,
+ * using non-temporal memory access as specified by flags.
+ *
+ * Copies all the data from a given packet mbuf to a newly allocated
+ * set of mbufs. The private data is not copied.
+ *
+ * @param m
+ *   The packet mbuf to be copied.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param offset
+ *   The number of bytes to skip before copying.
+ *   If the mbuf does not have that many bytes, it is an error
+ *   and NULL is returned.
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param flags
+ *   Non-temporal memory access hints for rte_memcpy_ex.
+ * @return
+ *   - The pointer to the new "clone" mbuf on success.
+ *   - NULL if allocation fails.
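+ *
+ * A usage sketch, mirroring the packet capture use case in this patch;
+ * md and mp are application-provided:
+ *
+ * @code{.c}
+ * mc = rte_pktmbuf_copy_ex(md, mp, 0, UINT32_MAX,
+ *			    RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
+ * @endcode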
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pktmbuf_copy_ex(const struct rte_mbuf *m, struct rte_mempool *mp,
+		    uint32_t offset, uint32_t length, const uint64_t flags);
+
 /**
  * Adds given value to the refcnt of all packet mbuf segments.
  *
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index ed486ed14e..b583364ad4 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -47,5 +47,6 @@  EXPERIMENTAL {
 	global:
 
+	rte_pktmbuf_copy_ex;
 	rte_pktmbuf_pool_create_extbuf;
 
 };
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index af2b814251..ae871c4865 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -466,7 +466,8 @@  rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	orig_len = rte_pktmbuf_pkt_len(md);
 
 	/* Take snapshot of the data */
-	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	mc = rte_pktmbuf_copy_ex(md, mp, 0, length,
+				 RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
 	if (unlikely(mc == NULL))
 		return NULL;
 
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 98dcbc037b..6e61c75407 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -124,7 +124,8 @@  pdump_copy(uint16_t port_id, uint16_t queue,
 					    pkts[i], mp, cbs->snaplen,
 					    ts, direction);
 		else
-			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+			p = rte_pktmbuf_copy_ex(pkts[i], mp, 0, cbs->snaplen,
+						RTE_MEMOPS_F_SRC_NT | RTE_MEMOPS_F_DST_NT);
 
 		if (unlikely(p == NULL))
 			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
@@ -134,6 +135,9 @@  pdump_copy(uint16_t port_id, uint16_t queue,
 
 	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
 
+	/* Flush the non-temporal stores of the packet copies before enqueuing. */
+	rte_wmb();
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;