netvsc: optimize stats counters performance

Message ID 20240802144048.270152-1-mb@smartsharesystems.com (mailing list archive)
State Accepted
Delegated to: Ferruh Yigit
Headers
Series netvsc: optimize stats counters performance |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS

Commit Message

Morten Brørup Aug. 2, 2024, 2:40 p.m. UTC
Optimized the performance of updating the statistics counters by reducing
the number of branches.

Ordered the packet size comparisons according to the probability with
typical internet traffic mix.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 drivers/net/netvsc/hn_rxtx.c | 32 ++++++++++----------------------
 1 file changed, 10 insertions(+), 22 deletions(-)
  

Comments

Long Li Aug. 2, 2024, 4:48 p.m. UTC | #1
> Subject: [PATCH] netvsc: optimize stats counters performance
> 
> Optimized the performance of updating the statistics counters by reducing the
> number of branches.
> 
> Ordered the packet size comparisons according to the probability with typical
> internet traffic mix.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  drivers/net/netvsc/hn_rxtx.c | 32 ++++++++++----------------------
>  1 file changed, 10 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c index
> 9bf1ec5509..b704b2c971 100644
> --- a/drivers/net/netvsc/hn_rxtx.c
> +++ b/drivers/net/netvsc/hn_rxtx.c
> @@ -110,30 +110,18 @@ hn_update_packet_stats(struct hn_stats *stats,
> const struct rte_mbuf *m)
>  	uint32_t s = m->pkt_len;
>  	const struct rte_ether_addr *ea;
> 
> -	if (s == 64) {
> -		stats->size_bins[1]++;
> -	} else if (s > 64 && s < 1024) {
> -		uint32_t bin;
> -
> -		/* count zeros, and offset into correct bin */
> -		bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
> -		stats->size_bins[bin]++;
> -	} else {
> -		if (s < 64)
> -			stats->size_bins[0]++;
> -		else if (s < 1519)
> -			stats->size_bins[6]++;
> -		else
> -			stats->size_bins[7]++;
> -	}
> +	if (s >= 1024)
> +		stats->size_bins[6 + (s > 1518)]++;
> +	else if (s <= 64)
> +		stats->size_bins[s >> 6]++;
> +	else
> +		stats->size_bins[32UL - rte_clz32(s) - 5]++;

This part looks good.

> 
>  	ea = rte_pktmbuf_mtod(m, const struct rte_ether_addr *);
> -	if (rte_is_multicast_ether_addr(ea)) {
> -		if (rte_is_broadcast_ether_addr(ea))
> -			stats->broadcast++;
> -		else
> -			stats->multicast++;
> -	}
> +	RTE_BUILD_BUG_ON(offsetof(struct hn_stats, broadcast) !=
> +			offsetof(struct hn_stats, multicast) + sizeof(uint64_t));
> +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
>  }

This makes the code a little harder to read. How about just add "unlikely" to rte_is_multicast_ether_addr(ea) and keep the rest unchanged?

Thank you,

Long
  
Morten Brørup Aug. 2, 2024, 5:28 p.m. UTC | #2
> From: Long Li [mailto:longli@microsoft.com]
> Sent: Friday, 2 August 2024 18.49
> 
> > Subject: [PATCH] netvsc: optimize stats counters performance
> >
> > Optimized the performance of updating the statistics counters by
> reducing the
> > number of branches.
> >
> > Ordered the packet size comparisons according to the probability with
> typical
> > internet traffic mix.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  drivers/net/netvsc/hn_rxtx.c | 32 ++++++++++----------------------
> >  1 file changed, 10 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/net/netvsc/hn_rxtx.c
> b/drivers/net/netvsc/hn_rxtx.c index
> > 9bf1ec5509..b704b2c971 100644
> > --- a/drivers/net/netvsc/hn_rxtx.c
> > +++ b/drivers/net/netvsc/hn_rxtx.c
> > @@ -110,30 +110,18 @@ hn_update_packet_stats(struct hn_stats *stats,
> > const struct rte_mbuf *m)
> >  	uint32_t s = m->pkt_len;
> >  	const struct rte_ether_addr *ea;
> >
> > -	if (s == 64) {
> > -		stats->size_bins[1]++;
> > -	} else if (s > 64 && s < 1024) {
> > -		uint32_t bin;
> > -
> > -		/* count zeros, and offset into correct bin */
> > -		bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
> > -		stats->size_bins[bin]++;
> > -	} else {
> > -		if (s < 64)
> > -			stats->size_bins[0]++;
> > -		else if (s < 1519)
> > -			stats->size_bins[6]++;
> > -		else
> > -			stats->size_bins[7]++;
> > -	}
> > +	if (s >= 1024)
> > +		stats->size_bins[6 + (s > 1518)]++;
> > +	else if (s <= 64)
> > +		stats->size_bins[s >> 6]++;
> > +	else
> > +		stats->size_bins[32UL - rte_clz32(s) - 5]++;
> 
> This part looks good.
> 
> >
> >  	ea = rte_pktmbuf_mtod(m, const struct rte_ether_addr *);
> > -	if (rte_is_multicast_ether_addr(ea)) {
> > -		if (rte_is_broadcast_ether_addr(ea))
> > -			stats->broadcast++;
> > -		else
> > -			stats->multicast++;
> > -	}
> > +	RTE_BUILD_BUG_ON(offsetof(struct hn_stats, broadcast) !=
> > +			offsetof(struct hn_stats, multicast) +
> sizeof(uint64_t));
> > +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> > +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
> >  }
> 
> This makes the code a little harder to read.

I agree it is somewhat convoluted.
It's a tradeoff... I preferred performance at the cost of making the code somewhat harder to read.
The RTE_BUILD_BUG_ON() also helps showing what is going on with the weird indexing.

> How about just add
> "unlikely" to rte_is_multicast_ether_addr(ea) and keep the rest
> unchanged?

Then I could also add "unlikely" to is_broadcast().
In theory, there should be minimal broadcast traffic in any normal network. (Except when experiencing broadcast storms due to network loops or other network problems.)
But in reality, I think most real life networks see less multicast (non-broadcast) than broadcast traffic.

I don' think the following alternative makes the code significantly more readable than the method in this patch, but I'll mention it for the sake of discussion:

I could modify the hn_stats type like this:

struct hn_stats {
other fields...
+	union {
+		struct {
			uint64_t	multicast;
			uint64_t	broadcast;
+		};
+		uint64_t	multicast_broadcast[2];
+	};
other fields...
};

So the code would become:

+	if (unlikely(rte_is_multicast_ether_addr(ea)))
+		 stats->multicast_broadcast[rte_is_broadcast_ether_addr(ea)]++;

Whatever we decide, we should use the same method in all three patches (netvsc, virtio, vhost-user).
The choices are:
1. Stick with the original code (with two branches), and add unlikely().
2. Use the method provided in this patch (with only one branch).
3. Introduce a multicast_broadcast[2] to the stats structure as described above (also with only one branch).

@Long, @Wei, @Maxime, @Chenbo, @Stephen and anyone else interested, please cast your votes.
Comments are also welcome! :-)

I'm in favor of #2.
  
Stephen Hemminger Aug. 2, 2024, 5:33 p.m. UTC | #3
On Fri, 2 Aug 2024 19:28:26 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > >  	ea = rte_pktmbuf_mtod(m, const struct rte_ether_addr *);
> > > -	if (rte_is_multicast_ether_addr(ea)) {
> > > -		if (rte_is_broadcast_ether_addr(ea))
> > > -			stats->broadcast++;
> > > -		else
> > > -			stats->multicast++;
> > > -	}
> > > +	RTE_BUILD_BUG_ON(offsetof(struct hn_stats, broadcast) !=
> > > +			offsetof(struct hn_stats, multicast) +  
> > sizeof(uint64_t));  
> > > +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> > > +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
> > >  }  
> > 
> > This makes the code a little harder to read.  
> 
> I agree it is somewhat convoluted.
> It's a tradeoff... I preferred performance at the cost of making the code somewhat harder to read.
> The RTE_BUILD_BUG_ON() also helps showing what is going on with the weird indexing.

Optimizing for multicast packets is not worth bothering.
Keep the original code it is simpler.
  
Morten Brørup Sept. 19, 2024, 1:51 p.m. UTC | #4
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, 2 August 2024 19.33
> 
> On Fri, 2 Aug 2024 19:28:26 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > >  	ea = rte_pktmbuf_mtod(m, const struct rte_ether_addr *);
> > > > -	if (rte_is_multicast_ether_addr(ea)) {
> > > > -		if (rte_is_broadcast_ether_addr(ea))
> > > > -			stats->broadcast++;
> > > > -		else
> > > > -			stats->multicast++;
> > > > -	}
> > > > +	RTE_BUILD_BUG_ON(offsetof(struct hn_stats, broadcast) !=
> > > > +			offsetof(struct hn_stats, multicast) +
> > > sizeof(uint64_t));
> > > > +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> > > > +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
> > > >  }
> > >
> > > This makes the code a little harder to read.
> >
> > I agree it is somewhat convoluted.
> > It's a tradeoff... I preferred performance at the cost of making the code
> somewhat harder to read.
> > The RTE_BUILD_BUG_ON() also helps showing what is going on with the weird
> indexing.

Similar patches have been accepted by other drivers:
[virtio]: https://patchwork.dpdk.org/project/dpdk/patch/20240801160312.205281-1-mb@smartsharesystems.com/
[vhost-user]: https://patchwork.dpdk.org/project/dpdk/patch/20240802143259.269827-1-mb@smartsharesystems.com/

> 
> Optimizing for multicast packets is not worth bothering.

Optimizing for multicast/broadcast comes into play in multicast environments, and during network broadcast storms.
Although I don't know if any of those two scenarios are relevant for this specific driver.

> Keep the original code it is simpler.

Let's keep similar code similar across drivers.

@Long, @Wei, please Review/Ack, so the patch can be applied.
  
Long Li Sept. 19, 2024, 6:06 p.m. UTC | #5
> Subject: RE: [PATCH] netvsc: optimize stats counters performance
> 
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Friday, 2 August 2024 19.33
> >
> > On Fri, 2 Aug 2024 19:28:26 +0200
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > > >  	ea = rte_pktmbuf_mtod(m, const struct rte_ether_addr *);
> > > > > -	if (rte_is_multicast_ether_addr(ea)) {
> > > > > -		if (rte_is_broadcast_ether_addr(ea))
> > > > > -			stats->broadcast++;
> > > > > -		else
> > > > > -			stats->multicast++;
> > > > > -	}
> > > > > +	RTE_BUILD_BUG_ON(offsetof(struct hn_stats, broadcast) !=
> > > > > +			offsetof(struct hn_stats, multicast) +
> > > > sizeof(uint64_t));
> > > > > +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> > > > > +		(&stats-
> >multicast)[rte_is_broadcast_ether_addr(ea)]++;
> > > > >  }
> > > >
> > > > This makes the code a little harder to read.
> > >
> > > I agree it is somewhat convoluted.
> > > It's a tradeoff... I preferred performance at the cost of making the code
> > somewhat harder to read.
> > > The RTE_BUILD_BUG_ON() also helps showing what is going on with the
> weird
> > indexing.
> 
> Similar patches have been accepted by other drivers:
> [virtio]:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwor
> k.dpdk.org%2Fproject%2Fdpdk%2Fpatch%2F20240801160312.205281-1-
> mb%40smartsharesystems.com%2F&data=05%7C02%7Clongli%40microsoft.com
> %7C014c845ad8ec4ea25fb508dcd8b22474%7C72f988bf86f141af91ab2d7cd011
> db47%7C1%7C0%7C638623506854135960%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0
> %7C%7C%7C&sdata=DrVP6TuTruREAxAoSEBED11TOapSqDN5stsDJJTuux0%3D&r
> eserved=0
> [vhost-user]:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwor
> k.dpdk.org%2Fproject%2Fdpdk%2Fpatch%2F20240802143259.269827-1-
> mb%40smartsharesystems.com%2F&data=05%7C02%7Clongli%40microsoft.com
> %7C014c845ad8ec4ea25fb508dcd8b22474%7C72f988bf86f141af91ab2d7cd011
> db47%7C1%7C0%7C638623506854150865%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0
> %7C%7C%7C&sdata=85T9qnC4nbAwU53sxnfQ2lMayp5soX8BMEJsbB58ZBw%3D
> &reserved=0
> 
> >
> > Optimizing for multicast packets is not worth bothering.
> 
> Optimizing for multicast/broadcast comes into play in multicast environments,
> and during network broadcast storms.
> Although I don't know if any of those two scenarios are relevant for this specific
> driver.
> 
> > Keep the original code it is simpler.
> 
> Let's keep similar code similar across drivers.
> 
> @Long, @Wei, please Review/Ack, so the patch can be applied.

Reviewed-by: Long Li <longli@microsoft.com>
  
Ferruh Yigit Sept. 22, 2024, 2:36 a.m. UTC | #6
On 9/19/2024 7:06 PM, Long Li wrote:
>> Subject: RE: [PATCH] netvsc: optimize stats counters performance
>>
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>> Sent: Friday, 2 August 2024 19.33
>>>
>>> On Fri, 2 Aug 2024 19:28:26 +0200
>>> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> Reviewed-by: Long Li <longli@microsoft.com>
> 

Applied to dpdk-next-net/main, thanks.
  

Patch

diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 9bf1ec5509..b704b2c971 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -110,30 +110,18 @@  hn_update_packet_stats(struct hn_stats *stats, const struct rte_mbuf *m)
 	uint32_t s = m->pkt_len;
 	const struct rte_ether_addr *ea;
 
-	if (s == 64) {
-		stats->size_bins[1]++;
-	} else if (s > 64 && s < 1024) {
-		uint32_t bin;
-
-		/* count zeros, and offset into correct bin */
-		bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
-		stats->size_bins[bin]++;
-	} else {
-		if (s < 64)
-			stats->size_bins[0]++;
-		else if (s < 1519)
-			stats->size_bins[6]++;
-		else
-			stats->size_bins[7]++;
-	}
+	if (s >= 1024)
+		stats->size_bins[6 + (s > 1518)]++;
+	else if (s <= 64)
+		stats->size_bins[s >> 6]++;
+	else
+		stats->size_bins[32UL - rte_clz32(s) - 5]++;
 
 	ea = rte_pktmbuf_mtod(m, const struct rte_ether_addr *);
-	if (rte_is_multicast_ether_addr(ea)) {
-		if (rte_is_broadcast_ether_addr(ea))
-			stats->broadcast++;
-		else
-			stats->multicast++;
-	}
+	RTE_BUILD_BUG_ON(offsetof(struct hn_stats, broadcast) !=
+			offsetof(struct hn_stats, multicast) + sizeof(uint64_t));
+	if (unlikely(rte_is_multicast_ether_addr(ea)))
+		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
 }
 
 static inline unsigned int hn_rndis_pktlen(const struct rndis_packet_msg *pkt)