eal: add new prefetch0_write variant

Message ID 20200911091919.62167-1-harry.van.haaren@intel.com (mailing list archive)
State Superseded, archived
Headers
Series eal: add new prefetch0_write variant |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-testing fail Testing issues
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Van Haaren, Harry Sept. 11, 2020, 9:19 a.m. UTC
  This commit adds a new rte_prefetch0_write() variant, suggests to the
compiler to use a prefetch instruction with intention to write. As a
compiler builtin, the compiler can choose based on compilation target
what the best implementation for this instruction is.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---

The integer constants passed to the builtin are not available as
a #define value, and doing #defines just for this write variant
does not seems a nice solution to me... particularly for those using
IDEs where any #define value is auto-hinted for code-completion.

---
 lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
  

Comments

Pavan Nikhilesh Bhagavatula Sept. 13, 2020, 8:11 p.m. UTC | #1
>This commit adds a new rte_prefetch0_write() variant, suggests to the
>compiler to use a prefetch instruction with intention to write. As a
>compiler builtin, the compiler can choose based on compilation target
>what the best implementation for this instruction is.	

Why not have the other variants too i.e. l2/l3/temporal store prefetches too?


>
>Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
>---
>
>The integer constants passed to the builtin are not available as
>a #define value, and doing #defines just for this write variant
>does not seems a nice solution to me... particularly for those using
>IDEs where any #define value is auto-hinted for code-completion.
>
>---
> lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
>diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
>b/lib/librte_eal/include/generic/rte_prefetch.h
>index 6e47bdfbad..44e2e9abfc 100644
>--- a/lib/librte_eal/include/generic/rte_prefetch.h
>+++ b/lib/librte_eal/include/generic/rte_prefetch.h
>@@ -51,4 +51,20 @@ static inline void rte_prefetch2(const volatile
>void *p);
>  */
> static inline void rte_prefetch_non_temporal(const volatile void *p);
>
>+/**
>+ * Prefetch a cache line into all cache levels, with intention to write.
>This
>+ * prefetch variant hints to the CPU that the program is expecting to
>write to
>+ * the cache line being prefetched.
>+ *
>+ * @param p Address to prefetch
>+ */
>+static inline void rte_prefetch0_write(const void *p)
>+{
>+	/* 1 indicates intention to write, 3 sets target cache level to L1.
>See
>+	 * GCC docs where these integer constants are described in
>more detail:
>+	 *  https://urldefense.proofpoint.com/v2/url?u=https-
>3A__gcc.gnu.org_onlinedocs_gcc_Other-
>2DBuiltins.html&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1cjuAHrG
>h745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=OMPc_t21vsWlzVl0UjdGB
>hTTv1Gngqfk8vth9UQtUSA&s=51jgfiV2UC5B3xxBE-
>4gRAte3lCZP3v370U7Fpn6LaE&e=
>+	 */
>+	__builtin_prefetch(p, 1, 3);
>+}
>+
> #endif /* _RTE_PREFETCH_H_ */
>--
>2.17.1
  
Van Haaren, Harry Sept. 14, 2020, 8:12 a.m. UTC | #2
> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Sunday, September 13, 2020 9:11 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] eal: add new prefetch0_write variant
> 
> >This commit adds a new rte_prefetch0_write() variant, suggests to the
> >compiler to use a prefetch instruction with intention to write. As a
> >compiler builtin, the compiler can choose based on compilation target
> >what the best implementation for this instruction is.
> 
> Why not have the other variants too i.e. l2/l3/temporal store prefetches too?

Hi Pavan,

Are there architectures that actually implement those? Usually for a WB mem store to complete,
the data must be present in L1 cache (on x86 at least), and that's what the patch below with write0 achieves.

I'm against adding all the variants "just in case", it leads to API bloat, and increases
cognitive load on the programmer. My expectation is that in 99% of usage the prefetch
write instruction should target L1.

Cheers, -Harry

> >Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> >
> >---
> >
> >The integer constants passed to the builtin are not available as
> >a #define value, and doing #defines just for this write variant
> >does not seems a nice solution to me... particularly for those using
> >IDEs where any #define value is auto-hinted for code-completion.
> >
> >---
> > lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)

<snip patch contents>
  
Pavan Nikhilesh Bhagavatula Sept. 14, 2020, 10:39 a.m. UTC | #3
>> >This commit adds a new rte_prefetch0_write() variant, suggests to
>the
>> >compiler to use a prefetch instruction with intention to write. As a
>> >compiler builtin, the compiler can choose based on compilation
>target
>> >what the best implementation for this instruction is.
>>
>> Why not have the other variants too i.e. l2/l3/temporal store
>prefetches too?
>
>Hi Pavan,
>
Hi Harry,
(LTNS)

>Are there architectures that actually implement those? Usually for a WB
>mem store to complete,
>the data must be present in L1 cache (on x86 at least), and that's what
>the patch below with write0 achieves.

ARM64 does supports all modes of store prefetch
"
<type> is one of:
PLD Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.
PLI Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
<target> is one of:
L1 Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.
L2 Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
L3 Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
<policy> is one of:
KEEP Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>"
field as 0.
STRM Streaming or non-temporal prefetch, for data that is used only once. Encoded in the
"Rt<0>" field as 1.
For more information on these prefetch
"

>
>I'm against adding all the variants "just in case", it leads to API bloat,
>and increases
>cognitive load on the programmer. My expectation is that in 99% of
>usage the prefetch
>write instruction should target L1.
>

There is a use case when cache mode is write through and application is 
pipelining work across cores sharing same L2 cluster.

>Cheers, -Harry

Regards,
Pavan.

>
>> >Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>> >
>> >---
>> >
>> >The integer constants passed to the builtin are not available as
>> >a #define value, and doing #defines just for this write variant
>> >does not seems a nice solution to me... particularly for those using
>> >IDEs where any #define value is auto-hinted for code-completion.
>> >
>> >---
>> > lib/librte_eal/include/generic/rte_prefetch.h | 16
>++++++++++++++++
>> > 1 file changed, 16 insertions(+)
>
><snip patch contents>
  
Van Haaren, Harry Sept. 14, 2020, 3:10 p.m. UTC | #4
> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Monday, September 14, 2020 11:39 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] eal: add new prefetch0_write variant
> 
> >> >This commit adds a new rte_prefetch0_write() variant, suggests to
> >the
> >> >compiler to use a prefetch instruction with intention to write. As a
> >> >compiler builtin, the compiler can choose based on compilation
> >target
> >> >what the best implementation for this instruction is.
> >>
> >> Why not have the other variants too i.e. l2/l3/temporal store
> >prefetches too?
> >
> >Hi Pavan,
> >
> Hi Harry,
> (LTNS)
> 
> >Are there architectures that actually implement those? Usually for a WB
> >mem store to complete,
> >the data must be present in L1 cache (on x86 at least), and that's what
> >the patch below with write0 achieves.
> 
> ARM64 does supports all modes of store prefetch
> "
> <type> is one of:
> PLD Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.
> PLI Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
> PST Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
> <target> is one of:
> L1 Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.
> L2 Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
> L3 Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
> <policy> is one of:
> KEEP Retained or temporal prefetch, allocated in the cache normally. Encoded in
> the "Rt<0>"
> field as 0.
> STRM Streaming or non-temporal prefetch, for data that is used only once. Encoded
> in the
> "Rt<0>" field as 1.
> For more information on these prefetch
> "
> 
> >
> >I'm against adding all the variants "just in case", it leads to API bloat,
> >and increases
> >cognitive load on the programmer. My expectation is that in 99% of
> >usage the prefetch
> >write instruction should target L1.
> >
> 
> There is a use case when cache mode is write through and application is
> pipelining work across cores sharing same L2 cluster.

OK - v2 sent: http://patches.dpdk.org/patch/77632/

APIs matching the existing prefetch APIs:
rte_prefetch0_write() L1 and all below
rte_prefetch1_write() L2 and all below
rte_prefetch2_write() L3

Cheers, -Harry
  

Patch

diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdfbad..44e2e9abfc 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,20 @@  static inline void rte_prefetch2(const volatile void *p);
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Prefetch a cache line into all cache levels, with intention to write. This
+ * prefetch variant hints to the CPU that the program is expecting to write to
+ * the cache line being prefetched.
+ *
+ * @param p Address to prefetch
+ */
+static inline void rte_prefetch0_write(const void *p)
+{
+	/* 1 indicates intention to write, 3 sets target cache level to L1. See
+	 * GCC docs where these integer constants are described in more detail:
+	 *  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+	 */
+	__builtin_prefetch(p, 1, 3);
+}
+
 #endif /* _RTE_PREFETCH_H_ */