eal: add new prefetch0_write variant
Checks
Commit Message
This commit adds a new rte_prefetch0_write() variant, suggests to the
compiler to use a prefetch instruction with intention to write. As a
compiler builtin, the compiler can choose based on compilation target
what the best implementation for this instruction is.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
The integer constants passed to the builtin are not available as
a #define value, and doing #defines just for this write variant
does not seems a nice solution to me... particularly for those using
IDEs where any #define value is auto-hinted for code-completion.
---
lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Comments
>This commit adds a new rte_prefetch0_write() variant, suggests to the
>compiler to use a prefetch instruction with intention to write. As a
>compiler builtin, the compiler can choose based on compilation target
>what the best implementation for this instruction is.
Why not have the other variants too i.e. l2/l3/temporal store prefetches too?
>
>Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
>---
>
>The integer constants passed to the builtin are not available as
>a #define value, and doing #defines just for this write variant
>does not seems a nice solution to me... particularly for those using
>IDEs where any #define value is auto-hinted for code-completion.
>
>---
> lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
>diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
>b/lib/librte_eal/include/generic/rte_prefetch.h
>index 6e47bdfbad..44e2e9abfc 100644
>--- a/lib/librte_eal/include/generic/rte_prefetch.h
>+++ b/lib/librte_eal/include/generic/rte_prefetch.h
>@@ -51,4 +51,20 @@ static inline void rte_prefetch2(const volatile
>void *p);
> */
> static inline void rte_prefetch_non_temporal(const volatile void *p);
>
>+/**
>+ * Prefetch a cache line into all cache levels, with intention to write.
>This
>+ * prefetch variant hints to the CPU that the program is expecting to
>write to
>+ * the cache line being prefetched.
>+ *
>+ * @param p Address to prefetch
>+ */
>+static inline void rte_prefetch0_write(const void *p)
>+{
>+ /* 1 indicates intention to write, 3 sets target cache level to L1.
>See
>+ * GCC docs where these integer constants are described in
>more detail:
>+ * https://urldefense.proofpoint.com/v2/url?u=https-
>3A__gcc.gnu.org_onlinedocs_gcc_Other-
>2DBuiltins.html&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1cjuAHrG
>h745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=OMPc_t21vsWlzVl0UjdGB
>hTTv1Gngqfk8vth9UQtUSA&s=51jgfiV2UC5B3xxBE-
>4gRAte3lCZP3v370U7Fpn6LaE&e=
>+ */
>+ __builtin_prefetch(p, 1, 3);
>+}
>+
> #endif /* _RTE_PREFETCH_H_ */
>--
>2.17.1
> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Sunday, September 13, 2020 9:11 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] eal: add new prefetch0_write variant
>
> >This commit adds a new rte_prefetch0_write() variant, suggests to the
> >compiler to use a prefetch instruction with intention to write. As a
> >compiler builtin, the compiler can choose based on compilation target
> >what the best implementation for this instruction is.
>
> Why not have the other variants too i.e. l2/l3/temporal store prefetches too?
Hi Pavan,
Are there architectures that actually implement those? Usually for a WB mem store to complete,
the data must be present in L1 cache (on x86 at least), and that's what the patch below with write0 achieves.
I'm against adding all the variants "just in case", it leads to API bloat, and increases
cognitive load on the programmer. My expectation is that in 99% of usage the prefetch
write instruction should target L1.
Cheers, -Harry
> >Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> >
> >---
> >
> >The integer constants passed to the builtin are not available as
> >a #define value, and doing #defines just for this write variant
> >does not seems a nice solution to me... particularly for those using
> >IDEs where any #define value is auto-hinted for code-completion.
> >
> >---
> > lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
<snip patch contents>
>> >This commit adds a new rte_prefetch0_write() variant, suggests to
>the
>> >compiler to use a prefetch instruction with intention to write. As a
>> >compiler builtin, the compiler can choose based on compilation
>target
>> >what the best implementation for this instruction is.
>>
>> Why not have the other variants too i.e. l2/l3/temporal store
>prefetches too?
>
>Hi Pavan,
>
Hi Harry,
(LTNS)
>Are there architectures that actually implement those? Usually for a WB
>mem store to complete,
>the data must be present in L1 cache (on x86 at least), and that's what
>the patch below with write0 achieves.
ARM64 does supports all modes of store prefetch
"
<type> is one of:
PLD Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.
PLI Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
<target> is one of:
L1 Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.
L2 Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
L3 Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
<policy> is one of:
KEEP Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>"
field as 0.
STRM Streaming or non-temporal prefetch, for data that is used only once. Encoded in the
"Rt<0>" field as 1.
For more information on these prefetch
"
>
>I'm against adding all the variants "just in case", it leads to API bloat,
>and increases
>cognitive load on the programmer. My expectation is that in 99% of
>usage the prefetch
>write instruction should target L1.
>
There is a use case when cache mode is write through and application is
pipelining work across cores sharing same L2 cluster.
>Cheers, -Harry
Regards,
Pavan.
>
>> >Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>> >
>> >---
>> >
>> >The integer constants passed to the builtin are not available as
>> >a #define value, and doing #defines just for this write variant
>> >does not seems a nice solution to me... particularly for those using
>> >IDEs where any #define value is auto-hinted for code-completion.
>> >
>> >---
>> > lib/librte_eal/include/generic/rte_prefetch.h | 16
>++++++++++++++++
>> > 1 file changed, 16 insertions(+)
>
><snip patch contents>
> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Monday, September 14, 2020 11:39 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] eal: add new prefetch0_write variant
>
> >> >This commit adds a new rte_prefetch0_write() variant, suggests to
> >the
> >> >compiler to use a prefetch instruction with intention to write. As a
> >> >compiler builtin, the compiler can choose based on compilation
> >target
> >> >what the best implementation for this instruction is.
> >>
> >> Why not have the other variants too i.e. l2/l3/temporal store
> >prefetches too?
> >
> >Hi Pavan,
> >
> Hi Harry,
> (LTNS)
>
> >Are there architectures that actually implement those? Usually for a WB
> >mem store to complete,
> >the data must be present in L1 cache (on x86 at least), and that's what
> >the patch below with write0 achieves.
>
> ARM64 does supports all modes of store prefetch
> "
> <type> is one of:
> PLD Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.
> PLI Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
> PST Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
> <target> is one of:
> L1 Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.
> L2 Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
> L3 Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
> <policy> is one of:
> KEEP Retained or temporal prefetch, allocated in the cache normally. Encoded in
> the "Rt<0>"
> field as 0.
> STRM Streaming or non-temporal prefetch, for data that is used only once. Encoded
> in the
> "Rt<0>" field as 1.
> For more information on these prefetch
> "
>
> >
> >I'm against adding all the variants "just in case", it leads to API bloat,
> >and increases
> >cognitive load on the programmer. My expectation is that in 99% of
> >usage the prefetch
> >write instruction should target L1.
> >
>
> There is a use case when cache mode is write through and application is
> pipelining work across cores sharing same L2 cluster.
OK - v2 sent: http://patches.dpdk.org/patch/77632/
APIs matching the existing prefetch APIs:
rte_prefetch0_write() L1 and all below
rte_prefetch1_write() L2 and all below
rte_prefetch2_write() L3
Cheers, -Harry
@@ -51,4 +51,20 @@ static inline void rte_prefetch2(const volatile void *p);
*/
static inline void rte_prefetch_non_temporal(const volatile void *p);
+/**
+ * Prefetch a cache line into all cache levels, with intention to write. This
+ * prefetch variant hints to the CPU that the program is expecting to write to
+ * the cache line being prefetched.
+ *
+ * @param p Address to prefetch
+ */
+static inline void rte_prefetch0_write(const void *p)
+{
+ /* 1 indicates intention to write, 3 sets target cache level to L1. See
+ * GCC docs where these integer constants are described in more detail:
+ * https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+ */
+ __builtin_prefetch(p, 1, 3);
+}
+
#endif /* _RTE_PREFETCH_H_ */