From patchwork Thu Oct 15 23:20:03 2020
X-Patchwork-Submitter: Omkar Maslekar
X-Patchwork-Id: 81030
X-Patchwork-Delegate: david.marchand@redhat.com
From: Omkar Maslekar
To: dev@dpdk.org
Cc: bruce.richardson@intel.com, ciara.loftus@intel.com, omkar.maslekar@intel.com, drc@linux.vnet.ibm.com, jerinj@marvell.com, ruifeng.wang@arm.com, honnappa.nagarahalli@arm.com
Date: Thu, 15 Oct 2020 16:20:03 -0700
Message-Id: <1602804003-9417-2-git-send-email-omkar.maslekar@intel.com>
In-Reply-To:
 <1602804003-9417-1-git-send-email-omkar.maslekar@intel.com>
References: <1599700614-22809-1-git-send-email-omkar.maslekar@intel.com>
 <1602804003-9417-1-git-send-email-omkar.maslekar@intel.com>
Subject: [dpdk-dev] [PATCH v9] eal: add cache-line demote support
List-Id: DPDK patches and discussions

rte_cldemote is similar to a prefetch hint, but in reverse.
cldemote(addr) lets software hint to hardware that a cache line is
likely to be shared. This is useful in core-to-core communication,
where a cache line is likely to be shared between cores.

The ARM and PPC implementations are provided as NOPs. Real
implementations can be added if equivalent instructions are available
on those architectures.

Signed-off-by: Omkar Maslekar
Acked-by: Bruce Richardson
Acked-by: David Christensen
Acked-by: Jerin Jacob
Reviewed-by: Ruifeng Wang
---
v9: added experimental tag in arch specific files
v8: removed unnecessary comment in test_prefetch.h
    removed header file rte_compat.h from specific arch
    rearranged sequence in the release notes
    fixed coding style in test_prefetch.h and grammar issue in documentation
    added tag Reviewed-by: Ruifeng Wang
v7: fixed experimental tag
v6: marked rte_cldemote as experimental
    added rte_cldemote call in existing app/test_prefetch.c
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson
* v4: updated bold text for title and fixed margin in release notes
* v3: fixed warning regarding whitespace
* v2: documentation updated
---
 app/test/test_prefetch.c                      |  2 ++
 doc/guides/rel_notes/release_20_11.rst        |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  7 +++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 18 ++++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  7 +++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 11 +++++++++++
 7 files changed, 60 insertions(+)

diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
index 32e08f8..5489885 100644
--- a/app/test/test_prefetch.c
+++ b/app/test/test_prefetch.c
@@ -30,6 +30,8 @@
 	rte_prefetch1_write(&a);
 	rte_prefetch2_write(&a);
 
+	rte_cldemote(&a);
+
 	return 0;
 }
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index cda5b2f..7095727 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -68,6 +68,14 @@ New Features
   which allow the programmer to prefetch a cache line and also indicate
   the intention to write.
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication. This is an
+  x86-specific implementation.
+
 * **Updated CRC modules of the net library.**
 
   * Added runtime selection of the optimal architecture-specific CRC path.
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..303caaa 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..e28b66f 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index df9764e..f9fab5e 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -116,4 +116,22 @@
 	__builtin_prefetch(p, 1, 1);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Demote a cache line to a more distant level of cache from the processor.
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guaranteed. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that
+ * a line may be accessed by a different core in the future.
+ *
+ * @param p
+ *   Address to demote
+ */
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..6df8087 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..05d49fc 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,17 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we use raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
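
For readers who want to see how the hint might be used in the
core-to-core case the commit message describes, here is a minimal
standalone sketch. It is not part of the patch and does not use DPDK:
demo_cldemote, demo_slot, demo_publish, demo_poll and DEMO_CACHE_LINE
are hypothetical names made up for illustration, and the 64-byte
cache-line size is an assumption. On x86 the helper emits the same raw
bytes as the patch; elsewhere it is a no-op, mirroring the ARM and PPC
stubs. The producer fills a cache-line-sized slot, publishes it with a
release store, and then issues the demote hint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_CACHE_LINE 64 /* assumed cache-line size */

/* Hypothetical stand-in for rte_cldemote() so the sketch builds without DPDK. */
static inline void
demo_cldemote(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Same raw encoding as the patch; the address is passed in (R|E)SI. */
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p; /* no equivalent hint assumed on other architectures */
#endif
}

/* One message slot occupying exactly one (assumed) cache line. */
struct demo_slot {
	_Atomic uint64_t seq;
	char payload[48];
} __attribute__((aligned(DEMO_CACHE_LINE)));

/* Producer side: write the payload, publish it, then hint the demotion. */
static void
demo_publish(struct demo_slot *slot, uint64_t seq, const char *msg)
{
	assert(slot != NULL && msg != NULL);
	snprintf(slot->payload, sizeof(slot->payload), "%s", msg);
	atomic_store_explicit(&slot->seq, seq, memory_order_release);
	demo_cldemote(slot); /* only a hint; correctness never depends on it */
}

/* Consumer side: return 1 and copy the payload once 'expected' is published. */
static int
demo_poll(struct demo_slot *slot, uint64_t expected, char *out, size_t len)
{
	if (atomic_load_explicit(&slot->seq, memory_order_acquire) != expected)
		return 0;
	snprintf(out, len, "%s", slot->payload);
	return 1;
}
```

In a DPDK application the helper would presumably be replaced by
rte_cldemote() from rte_prefetch.h. Because the function is tagged
__rte_experimental, the build would likely also need experimental APIs
enabled. The hint is issued after the release store so that the line
is demoted only once the slot is complete. Whether this helps in
practice depends on the hardware and should be measured.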