
[1/4] examples/l3fwd: tune prefetch for better performance

Message ID 20210318102550.59265-2-ruifeng.wang@arm.com
State New
Delegated to: David Marchand
Series l3fwd improvements

Checks

Context         Check    Description
ci/checkpatch   success  coding style OK

Commit Message

Ruifeng Wang March 18, 2021, 10:25 a.m. UTC
The packet header is prefetched before packet processing to improve
memory access performance. Since l3fwd updates the L2 header, a
prefetch with a store hint puts the cache line in the proper state for
the upcoming write and reduces cache maintenance overhead.

With this change, a 12.9% performance uplift was measured on the N1SDP
platform with an MLX5 NIC.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 examples/l3fwd/l3fwd_lpm_neon.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Jerin Jacob April 13, 2021, 6:50 p.m. UTC | #1
On Thu, Mar 18, 2021 at 3:56 PM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> The packet header is prefetched before packet processing to improve
> memory access performance. Since l3fwd updates the L2 header, a
> prefetch with a store hint puts the cache line in the proper state for
> the upcoming write and reduces cache maintenance overhead.

The code also reads the cache line, right?

>
> With this change, a 12.9% performance uplift was measured on the N1SDP
> platform with an MLX5 NIC.
>
> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>


On the octeontx2 platform, it is a 2% regression.

This looks like a microarchitecture-specific matter of how a write hint
is handled on a memory area that is both read and written.


I am testing the LPM lookup miss case.

My test command:
./build/examples/dpdk-l3fwd  -c 0x0100  -- -p 0x1 --config="(0,0,8)" -P



> ---
>  examples/l3fwd/l3fwd_lpm_neon.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
> index d6c0ba64a..ae8840694 100644
> --- a/examples/l3fwd/l3fwd_lpm_neon.h
> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
> @@ -97,13 +97,13 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>
>         if (k) {
>                 for (i = 0; i < FWDSTEP; i++) {
> -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
> +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[i],
>                                                 struct rte_ether_hdr *) + 1);
>                 }
>
>                 for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>                         for (i = 0; i < FWDSTEP; i++) {
> -                               rte_prefetch0(rte_pktmbuf_mtod(
> +                               rte_prefetch0_write(rte_pktmbuf_mtod(
>                                                 pkts_burst[j + i + FWDSTEP],
>                                                 struct rte_ether_hdr *) + 1);
>                         }
> @@ -124,17 +124,17 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>                 /* Prefetch last up to 3 packets one by one */
>                 switch (m) {
>                 case 3:
> -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
>                                                 struct rte_ether_hdr *) + 1);
>                         j++;
>                         /* fallthrough */
>                 case 2:
> -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
>                                                 struct rte_ether_hdr *) + 1);
>                         j++;
>                         /* fallthrough */
>                 case 1:
> -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
>                                                 struct rte_ether_hdr *) + 1);
>                         j++;
>                 }
> --
> 2.25.1
>
Honnappa Nagarahalli April 13, 2021, 8 p.m. UTC | #2
<snip>

> On Thu, Mar 18, 2021 at 3:56 PM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> >
> > The packet header is prefetched before packet processing to improve
> > memory access performance. Since l3fwd updates the L2 header, a
> > prefetch with a store hint puts the cache line in the proper state
> > for the upcoming write and reduces cache maintenance overhead.
> 
> The code also reads the cache line, right?
> 
> >
> > With this change, a 12.9% performance uplift was measured on the N1SDP
> > platform with an MLX5 NIC.
> >
> > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> 
> On the octeontx2 platform, it is a 2% regression.
Thanks, Jerin, for testing.
It would be good to see results from others on A72 platforms.

> 
> This looks like a microarchitecture-specific matter of how a write hint
> is handled on a memory area that is both read and written.
> 
> 
> I am testing the LPM lookup miss case.
> 
> My test command:
> ./build/examples/dpdk-l3fwd  -c 0x0100  -- -p 0x1 --config="(0,0,8)" -P
> 
> 
> 
> > ---
> >  examples/l3fwd/l3fwd_lpm_neon.h | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
> > index d6c0ba64a..ae8840694 100644
> > --- a/examples/l3fwd/l3fwd_lpm_neon.h
> > +++ b/examples/l3fwd/l3fwd_lpm_neon.h
> > @@ -97,13 +97,13 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
> >
> >         if (k) {
> >                 for (i = 0; i < FWDSTEP; i++) {
> > -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
> > +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[i],
> >                                                 struct rte_ether_hdr *) + 1);
> >                 }
> >
> >                 for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
> >                         for (i = 0; i < FWDSTEP; i++) {
> > -                               rte_prefetch0(rte_pktmbuf_mtod(
> > +                               rte_prefetch0_write(rte_pktmbuf_mtod(
> >                                                 pkts_burst[j + i + FWDSTEP],
> >                                                 struct rte_ether_hdr *) + 1);
> >                         }
> > @@ -124,17 +124,17 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
> >                 /* Prefetch last up to 3 packets one by one */
> >                 switch (m) {
> >                 case 3:
> > -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> > +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
> >                                                 struct rte_ether_hdr *) + 1);
> >                         j++;
> >                         /* fallthrough */
> >                 case 2:
> > -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> > +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
> >                                                 struct rte_ether_hdr *) + 1);
> >                         j++;
> >                         /* fallthrough */
> >                 case 1:
> > -                       rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
> > +                       rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
> >                                                 struct rte_ether_hdr *) + 1);
> >                         j++;
> >                 }
> > --
> > 2.25.1
> >

Patch

diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
index d6c0ba64a..ae8840694 100644
--- a/examples/l3fwd/l3fwd_lpm_neon.h
+++ b/examples/l3fwd/l3fwd_lpm_neon.h
@@ -97,13 +97,13 @@  l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 
 	if (k) {
 		for (i = 0; i < FWDSTEP; i++) {
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
+			rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[i],
 						struct rte_ether_hdr *) + 1);
 		}
 
 		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
 			for (i = 0; i < FWDSTEP; i++) {
-				rte_prefetch0(rte_pktmbuf_mtod(
+				rte_prefetch0_write(rte_pktmbuf_mtod(
 						pkts_burst[j + i + FWDSTEP],
 						struct rte_ether_hdr *) + 1);
 			}
@@ -124,17 +124,17 @@  l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 		/* Prefetch last up to 3 packets one by one */
 		switch (m) {
 		case 3:
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
+			rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
 						struct rte_ether_hdr *) + 1);
 			j++;
 			/* fallthrough */
 		case 2:
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
+			rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
 						struct rte_ether_hdr *) + 1);
 			j++;
 			/* fallthrough */
 		case 1:
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
+			rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
 						struct rte_ether_hdr *) + 1);
 			j++;
 		}