[v5] virtio: optimize stats counters performance

Message ID 20240801160312.205281-1-mb@smartsharesystems.com (mailing list archive)
State Awaiting Upstream
Delegated to: Maxime Coquelin
Headers
Series [v5] virtio: optimize stats counters performance |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/github-robot: build success github build: passed
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Morten Brørup Aug. 1, 2024, 4:03 p.m. UTC
Optimized the performance of updating the virtio statistics counters by
reducing the number of branches.

Ordered the packet size comparisons according to the probability with
typical internet traffic mix.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v5:
* Do not inline the function. (Stephen)
v4:
* Consider multicast/broadcast packets unlikely.
v3:
* Eliminated a local variable.
* Note: Substituted sizeof(uint32_t)*4 by 32UL, using unsigned long type
  to keep optimal offsetting in generated assembler output.
* Removed unnecessary curly braces.
v2:
* Fixed checkpatch warning about line length.
---
 drivers/net/virtio/virtio_rxtx.c | 39 ++++++++++++--------------------
 drivers/net/virtio/virtio_rxtx.h |  4 ++--
 2 files changed, 16 insertions(+), 27 deletions(-)
  

Comments

Stephen Hemminger Aug. 1, 2024, 4:17 p.m. UTC | #1
On Thu,  1 Aug 2024 16:03:12 +0000
Morten Brørup <mb@smartsharesystems.com> wrote:

> Optimized the performance of updating the virtio statistics counters by
> reducing the number of branches.
> 
> Ordered the packet size comparisons according to the probability with
> typical internet traffic mix.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

LGTM
Acked-by: Stephen Hemminger <stephen@networkplumber.org>

I wonder if other software drivers could use similar counters?
  
Morten Brørup Aug. 1, 2024, 8:38 p.m. UTC | #2
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, 1 August 2024 18.18
> 
> On Thu,  1 Aug 2024 16:03:12 +0000
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > Optimized the performance of updating the virtio statistics counters
> by
> > reducing the number of branches.
> >
> > Ordered the packet size comparisons according to the probability with
> > typical internet traffic mix.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> 
> LGTM
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> I wonder if other software drivers could use similar counters?

While working on this, I noticed the netvsc driver and vhost lib also have size_bins [1], so they are likely candidates.
 
The netvsc's hn_update_packet_stats() function [2] seems like 100 % copy-paste, and should be easy to paste over with the new implementation.
The vhost lib's vhost_queue_stats_update() function [3] also looks rather similar to the original variant, and could benefit too.

[1]: https://elixir.bootlin.com/dpdk/v24.07/A/ident/size_bins
[2]: https://elixir.bootlin.com/dpdk/v24.07/source/drivers/net/netvsc/hn_rxtx.c#L108
[3]: https://elixir.bootlin.com/dpdk/v24.07/source/lib/vhost/virtio_net.c#L56

I'll take a look around and add similar patches for what I find. Thanks for the reminder.
  
Huisong Li Aug. 2, 2024, 2:23 a.m. UTC | #3
在 2024/8/2 0:03, Morten Brørup 写道:
> Optimized the performance of updating the virtio statistics counters by
> reducing the number of branches.
>
> Ordered the packet size comparisons according to the probability with
> typical internet traffic mix.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v5:
> * Do not inline the function. (Stephen)
> v4:
> * Consider multicast/broadcast packets unlikely.
> v3:
> * Eliminated a local variable.
> * Note: Substituted sizeof(uint32_t)*4 by 32UL, using unsigned long type
>    to keep optimal offsetting in generated assembler output.
> * Removed unnecessary curly braces.
> v2:
> * Fixed checkpatch warning about line length.
> ---
>   drivers/net/virtio/virtio_rxtx.c | 39 ++++++++++++--------------------
>   drivers/net/virtio/virtio_rxtx.h |  4 ++--
>   2 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index f69b9453a2..b67f063b31 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -82,37 +82,26 @@ vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
>   }
>   
>   void
> -virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
> +virtio_update_packet_stats(struct virtnet_stats *const stats,
> +		const struct rte_mbuf *const mbuf)
The two const is also for performace?  Is there gain?
>   {
>   	uint32_t s = mbuf->pkt_len;
> -	struct rte_ether_addr *ea;
> +	const struct rte_ether_addr *const ea =
> +			rte_pktmbuf_mtod(mbuf, const struct rte_ether_addr *);
>   
>   	stats->bytes += s;
>   
> -	if (s == 64) {
> -		stats->size_bins[1]++;
> -	} else if (s > 64 && s < 1024) {
> -		uint32_t bin;
> -
> -		/* count zeros, and offset into correct bin */
> -		bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
> -		stats->size_bins[bin]++;
> -	} else {
> -		if (s < 64)
> -			stats->size_bins[0]++;
> -		else if (s < 1519)
> -			stats->size_bins[6]++;
> -		else
> -			stats->size_bins[7]++;
> -	}
> +	if (s >= 1024)
> +		stats->size_bins[6 + (s > 1518)]++;
> +	else if (s <= 64)
> +		stats->size_bins[s >> 6]++;
> +	else
> +		stats->size_bins[32UL - rte_clz32(s) - 5]++;
>   
> -	ea = rte_pktmbuf_mtod(mbuf, struct rte_ether_addr *);
> -	if (rte_is_multicast_ether_addr(ea)) {
> -		if (rte_is_broadcast_ether_addr(ea))
> -			stats->broadcast++;
> -		else
> -			stats->multicast++;
> -	}
> +	RTE_BUILD_BUG_ON(offsetof(struct virtnet_stats, broadcast) !=
> +			offsetof(struct virtnet_stats, multicast) + sizeof(uint64_t));
> +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
The "rte_is_broadcast_ether_addr(ea) " will be calculated twice if 
packet is mulitcast.
How about coding like:
-->
is_mulitcast = rte_is_multicast_ether_addr(ea);
if (unlikely(is_mulitcast))
(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
>   }
>   
>   static inline void
> diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
> index afc4b74534..68034c914b 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -35,7 +35,7 @@ struct virtnet_tx {
>   };
>   
>   int virtio_rxq_vec_setup(struct virtnet_rx *rxvq);
> -void virtio_update_packet_stats(struct virtnet_stats *stats,
> -				struct rte_mbuf *mbuf);
> +void virtio_update_packet_stats(struct virtnet_stats *const stats,
> +		const struct rte_mbuf *const mbuf);
>   
>   #endif /* _VIRTIO_RXTX_H_ */
  
Stephen Hemminger Aug. 2, 2024, 2:42 a.m. UTC | #4
On Fri, 2 Aug 2024 10:23:12 +0800
"lihuisong (C)" <lihuisong@huawei.com> wrote:

> >   void
> > -virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
> > +virtio_update_packet_stats(struct virtnet_stats *const stats,
> > +		const struct rte_mbuf *const mbuf)  
> The two const is also for performace?  Is there gain?

If you look at resulting code (ie godbolt.org) the resulting code never
changes when const is added.  The compiler already
knows what is modified. Const is only a programmer and correctness thing.
  
Huisong Li Aug. 2, 2024, 3:17 a.m. UTC | #5
在 2024/8/2 10:42, Stephen Hemminger 写道:
> On Fri, 2 Aug 2024 10:23:12 +0800
> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>
>>>    void
>>> -virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
>>> +virtio_update_packet_stats(struct virtnet_stats *const stats,
>>> +		const struct rte_mbuf *const mbuf)
>> The two const is also for performace?  Is there gain?
> If you look at resulting code (ie godbolt.org) the resulting code never
> changes when const is added.  The compiler already
> knows what is modified. Const is only a programmer and correctness thing.
I know this. Thanks for your clarifying and recommending the site that 
is good to use.😁
This patch is just for optimizing stats counters performance.
And the new added const has nothing to do with this optimization.
But it's ok for me.
> .
  
Morten Brørup Aug. 2, 2024, 11:27 a.m. UTC | #6
> From: lihuisong (C) [mailto:lihuisong@huawei.com]
> Sent: Friday, 2 August 2024 04.23
> 
> 在 2024/8/2 0:03, Morten Brørup 写道:
> >   void
> > -virtio_update_packet_stats(struct virtnet_stats *stats, struct
> rte_mbuf *mbuf)
> > +virtio_update_packet_stats(struct virtnet_stats *const stats,
> > +		const struct rte_mbuf *const mbuf)
> The two const is also for performace?  Is there gain?

The "const struct rte_mbuf * mbuf" informs the optimizer that the function does not modify the mbuf.  This is for the benefit of callers of the function; they can rely on the mbuf being unchanged when the function returns.
So, if the optimizer has cached some mbuf field before calling the function, it does not need to read the mbuf field again, but can continue using the cached value after the function call.

The two "type *const ptr" probably make no performance difference with the compilers and function call conventions used by the CPU architectures supported by DPDK.

> > +	RTE_BUILD_BUG_ON(offsetof(struct virtnet_stats, broadcast) !=
> > +			offsetof(struct virtnet_stats, multicast) +
> sizeof(uint64_t));
> > +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> > +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
> The "rte_is_broadcast_ether_addr(ea) " will be calculated twice if
> packet is mulitcast.
> How about coding like:
> -->
> is_mulitcast = rte_is_multicast_ether_addr(ea);
> if (unlikely(is_mulitcast))
> (&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;

I don't think "rte_is_broadcast_ether_addr(ea)" is calculated twice for multicast packets.
My code essentially does this:
	if (mc(ea))
		stats[bc(ea)]++;

Changing to this shouldn't make a difference:
	m = mc(ea);
	if (m)
		stats[bc(ea)]++;
  
Morten Brørup Aug. 2, 2024, 2:49 p.m. UTC | #7
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Thursday, 1 August 2024 22.38
> 
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, 1 August 2024 18.18
> >
> > On Thu,  1 Aug 2024 16:03:12 +0000
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > Optimized the performance of updating the virtio statistics counters
> > by
> > > reducing the number of branches.
> > >
> > > Ordered the packet size comparisons according to the probability
> with
> > > typical internet traffic mix.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> >
> > LGTM
> > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > I wonder if other software drivers could use similar counters?
> 
> While working on this, I noticed the netvsc driver and vhost lib also
> have size_bins [1], so they are likely candidates.
> 
> The netvsc's hn_update_packet_stats() function [2] seems like 100 %
> copy-paste, and should be easy to paste over with the new
> implementation.
> The vhost lib's vhost_queue_stats_update() function [3] also looks
> rather similar to the original variant, and could benefit too.
> 
> [1]: https://elixir.bootlin.com/dpdk/v24.07/A/ident/size_bins
> [2]:
> https://elixir.bootlin.com/dpdk/v24.07/source/drivers/net/netvsc/hn_rxtx
> .c#L108
> [3]:
> https://elixir.bootlin.com/dpdk/v24.07/source/lib/vhost/virtio_net.c#L56
> 
> I'll take a look around and add similar patches for what I find. Thanks
> for the reminder.

Separate patches for vhost-user [4] and netvsc [5] now provided.

[4]: https://inbox.dpdk.org/dev/20240802143259.269827-1-mb@smartsharesystems.com/T/#u
[5]: https://inbox.dpdk.org/dev/20240802144048.270152-1-mb@smartsharesystems.com/T/#u
  
Huisong Li Aug. 5, 2024, 1:14 a.m. UTC | #8
在 2024/8/2 19:27, Morten Brørup 写道:
>> From: lihuisong (C) [mailto:lihuisong@huawei.com]
>> Sent: Friday, 2 August 2024 04.23
>>
>> 在 2024/8/2 0:03, Morten Brørup 写道:
>>>    void
>>> -virtio_update_packet_stats(struct virtnet_stats *stats, struct
>> rte_mbuf *mbuf)
>>> +virtio_update_packet_stats(struct virtnet_stats *const stats,
>>> +		const struct rte_mbuf *const mbuf)
>> The two const is also for performace?  Is there gain?
> The "const struct rte_mbuf * mbuf" informs the optimizer that the function does not modify the mbuf.  This is for the benefit of callers of the function; they can rely on the mbuf being unchanged when the function returns.
> So, if the optimizer has cached some mbuf field before calling the function, it does not need to read the mbuf field again, but can continue using the cached value after the function call.
>
> The two "type *const ptr" probably make no performance difference with the compilers and function call conventions used by the CPU architectures supported by DPDK.
ok
>
>>> +	RTE_BUILD_BUG_ON(offsetof(struct virtnet_stats, broadcast) !=
>>> +			offsetof(struct virtnet_stats, multicast) +
>> sizeof(uint64_t));
>>> +	if (unlikely(rte_is_multicast_ether_addr(ea)))
>>> +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
>> The "rte_is_broadcast_ether_addr(ea) " will be calculated twice if
>> packet is mulitcast.
>> How about coding like:
>> -->
>> is_mulitcast = rte_is_multicast_ether_addr(ea);
>> if (unlikely(is_mulitcast))
>> (&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
> I don't think "rte_is_broadcast_ether_addr(ea)" is calculated twice for multicast packets.
> My code essentially does this:
> 	if (mc(ea))
> 		stats[bc(ea)]++;
>
> Changing to this shouldn't make a difference:
> 	m = mc(ea);
> 	if (m)
> 		stats[bc(ea)]++;
Yeah,you are right.
>
  
Huisong Li Aug. 5, 2024, 1:19 a.m. UTC | #9
LGTM,

Acked-by: Huisong Li <lihuisong@huawei.com>

在 2024/8/2 0:03, Morten Brørup 写道:
> Optimized the performance of updating the virtio statistics counters by
> reducing the number of branches.
>
> Ordered the packet size comparisons according to the probability with
> typical internet traffic mix.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v5:
> * Do not inline the function. (Stephen)
> v4:
> * Consider multicast/broadcast packets unlikely.
> v3:
> * Eliminated a local variable.
> * Note: Substituted sizeof(uint32_t)*4 by 32UL, using unsigned long type
>    to keep optimal offsetting in generated assembler output.
> * Removed unnecessary curly braces.
> v2:
> * Fixed checkpatch warning about line length.
> ---
>   drivers/net/virtio/virtio_rxtx.c | 39 ++++++++++++--------------------
>   drivers/net/virtio/virtio_rxtx.h |  4 ++--
>   2 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index f69b9453a2..b67f063b31 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -82,37 +82,26 @@ vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
>   }
>   
>   void
> -virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
> +virtio_update_packet_stats(struct virtnet_stats *const stats,
> +		const struct rte_mbuf *const mbuf)
>   {
>   	uint32_t s = mbuf->pkt_len;
> -	struct rte_ether_addr *ea;
> +	const struct rte_ether_addr *const ea =
> +			rte_pktmbuf_mtod(mbuf, const struct rte_ether_addr *);
>   
>   	stats->bytes += s;
>   
> -	if (s == 64) {
> -		stats->size_bins[1]++;
> -	} else if (s > 64 && s < 1024) {
> -		uint32_t bin;
> -
> -		/* count zeros, and offset into correct bin */
> -		bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
> -		stats->size_bins[bin]++;
> -	} else {
> -		if (s < 64)
> -			stats->size_bins[0]++;
> -		else if (s < 1519)
> -			stats->size_bins[6]++;
> -		else
> -			stats->size_bins[7]++;
> -	}
> +	if (s >= 1024)
> +		stats->size_bins[6 + (s > 1518)]++;
> +	else if (s <= 64)
> +		stats->size_bins[s >> 6]++;
> +	else
> +		stats->size_bins[32UL - rte_clz32(s) - 5]++;
>   
> -	ea = rte_pktmbuf_mtod(mbuf, struct rte_ether_addr *);
> -	if (rte_is_multicast_ether_addr(ea)) {
> -		if (rte_is_broadcast_ether_addr(ea))
> -			stats->broadcast++;
> -		else
> -			stats->multicast++;
> -	}
> +	RTE_BUILD_BUG_ON(offsetof(struct virtnet_stats, broadcast) !=
> +			offsetof(struct virtnet_stats, multicast) + sizeof(uint64_t));
> +	if (unlikely(rte_is_multicast_ether_addr(ea)))
> +		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
>   }
>   
>   static inline void
> diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
> index afc4b74534..68034c914b 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -35,7 +35,7 @@ struct virtnet_tx {
>   };
>   
>   int virtio_rxq_vec_setup(struct virtnet_rx *rxvq);
> -void virtio_update_packet_stats(struct virtnet_stats *stats,
> -				struct rte_mbuf *mbuf);
> +void virtio_update_packet_stats(struct virtnet_stats *const stats,
> +		const struct rte_mbuf *const mbuf);
>   
>   #endif /* _VIRTIO_RXTX_H_ */
  
Chenbo Xia Aug. 6, 2024, 8:23 a.m. UTC | #10
> On Aug 2, 2024, at 00:03, Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Optimized the performance of updating the virtio statistics counters by
> reducing the number of branches.
> 
> Ordered the packet size comparisons according to the probability with
> typical internet traffic mix.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v5:
> * Do not inline the function. (Stephen)
> v4:
> * Consider multicast/broadcast packets unlikely.
> v3:
> * Eliminated a local variable.
> * Note: Substituted sizeof(uint32_t)*4 by 32UL, using unsigned long type
>  to keep optimal offsetting in generated assembler output.
> * Removed unnecessary curly braces.
> v2:
> * Fixed checkpatch warning about line length.
> ---
> drivers/net/virtio/virtio_rxtx.c | 39 ++++++++++++--------------------
> drivers/net/virtio/virtio_rxtx.h |  4 ++--
> 2 files changed, 16 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index f69b9453a2..b67f063b31 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -82,37 +82,26 @@ vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
> }
> 
> void
> -virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
> +virtio_update_packet_stats(struct virtnet_stats *const stats,
> +               const struct rte_mbuf *const mbuf)
> {
>        uint32_t s = mbuf->pkt_len;
> -       struct rte_ether_addr *ea;
> +       const struct rte_ether_addr *const ea =
> +                       rte_pktmbuf_mtod(mbuf, const struct rte_ether_addr *);
> 
>        stats->bytes += s;
> 
> -       if (s == 64) {
> -               stats->size_bins[1]++;
> -       } else if (s > 64 && s < 1024) {
> -               uint32_t bin;
> -
> -               /* count zeros, and offset into correct bin */
> -               bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
> -               stats->size_bins[bin]++;
> -       } else {
> -               if (s < 64)
> -                       stats->size_bins[0]++;
> -               else if (s < 1519)
> -                       stats->size_bins[6]++;
> -               else
> -                       stats->size_bins[7]++;
> -       }
> +       if (s >= 1024)
> +               stats->size_bins[6 + (s > 1518)]++;
> +       else if (s <= 64)
> +               stats->size_bins[s >> 6]++;
> +       else
> +               stats->size_bins[32UL - rte_clz32(s) - 5]++;
> 
> -       ea = rte_pktmbuf_mtod(mbuf, struct rte_ether_addr *);
> -       if (rte_is_multicast_ether_addr(ea)) {
> -               if (rte_is_broadcast_ether_addr(ea))
> -                       stats->broadcast++;
> -               else
> -                       stats->multicast++;
> -       }
> +       RTE_BUILD_BUG_ON(offsetof(struct virtnet_stats, broadcast) !=
> +                       offsetof(struct virtnet_stats, multicast) + sizeof(uint64_t));
> +       if (unlikely(rte_is_multicast_ether_addr(ea)))
> +               (&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
> }
> 
> static inline void
> diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
> index afc4b74534..68034c914b 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -35,7 +35,7 @@ struct virtnet_tx {
> };
> 
> int virtio_rxq_vec_setup(struct virtnet_rx *rxvq);
> -void virtio_update_packet_stats(struct virtnet_stats *stats,
> -                               struct rte_mbuf *mbuf);
> +void virtio_update_packet_stats(struct virtnet_stats *const stats,
> +               const struct rte_mbuf *const mbuf);
> 
> #endif /* _VIRTIO_RXTX_H_ */
> —
> 2.43.0
> 

Reviewed-by: Chenbo Xia <chenbox@nvidia.com>
  
Maxime Coquelin Sept. 10, 2024, 3:01 p.m. UTC | #11
On 8/1/24 18:03, Morten Brørup wrote:
> Optimized the performance of updating the virtio statistics counters by
> reducing the number of branches.
> 
> Ordered the packet size comparisons according to the probability with
> typical internet traffic mix.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v5:
> * Do not inline the function. (Stephen)
> v4:
> * Consider multicast/broadcast packets unlikely.
> v3:
> * Eliminated a local variable.
> * Note: Substituted sizeof(uint32_t)*4 by 32UL, using unsigned long type
>    to keep optimal offsetting in generated assembler output.
> * Removed unnecessary curly braces.
> v2:
> * Fixed checkpatch warning about line length.
> ---
>   drivers/net/virtio/virtio_rxtx.c | 39 ++++++++++++--------------------
>   drivers/net/virtio/virtio_rxtx.h |  4 ++--
>   2 files changed, 16 insertions(+), 27 deletions(-)
>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime
  
Maxime Coquelin Sept. 19, 2024, 12:04 p.m. UTC | #12
On 8/1/24 18:03, Morten Brørup wrote:
> Optimized the performance of updating the virtio statistics counters by
> reducing the number of branches.
> 
> Ordered the packet size comparisons according to the probability with
> typical internet traffic mix.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v5:
> * Do not inline the function. (Stephen)
> v4:
> * Consider multicast/broadcast packets unlikely.
> v3:
> * Eliminated a local variable.
> * Note: Substituted sizeof(uint32_t)*4 by 32UL, using unsigned long type
>    to keep optimal offsetting in generated assembler output.
> * Removed unnecessary curly braces.
> v2:
> * Fixed checkpatch warning about line length.
> ---
>   drivers/net/virtio/virtio_rxtx.c | 39 ++++++++++++--------------------
>   drivers/net/virtio/virtio_rxtx.h |  4 ++--
>   2 files changed, 16 insertions(+), 27 deletions(-)
> 

Applied to next-virtio/for-next-net.

Thanks,
Maxime
  

Patch

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index f69b9453a2..b67f063b31 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -82,37 +82,26 @@  vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 }
 
 void
-virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
+virtio_update_packet_stats(struct virtnet_stats *const stats,
+		const struct rte_mbuf *const mbuf)
 {
 	uint32_t s = mbuf->pkt_len;
-	struct rte_ether_addr *ea;
+	const struct rte_ether_addr *const ea =
+			rte_pktmbuf_mtod(mbuf, const struct rte_ether_addr *);
 
 	stats->bytes += s;
 
-	if (s == 64) {
-		stats->size_bins[1]++;
-	} else if (s > 64 && s < 1024) {
-		uint32_t bin;
-
-		/* count zeros, and offset into correct bin */
-		bin = (sizeof(s) * 8) - rte_clz32(s) - 5;
-		stats->size_bins[bin]++;
-	} else {
-		if (s < 64)
-			stats->size_bins[0]++;
-		else if (s < 1519)
-			stats->size_bins[6]++;
-		else
-			stats->size_bins[7]++;
-	}
+	if (s >= 1024)
+		stats->size_bins[6 + (s > 1518)]++;
+	else if (s <= 64)
+		stats->size_bins[s >> 6]++;
+	else
+		stats->size_bins[32UL - rte_clz32(s) - 5]++;
 
-	ea = rte_pktmbuf_mtod(mbuf, struct rte_ether_addr *);
-	if (rte_is_multicast_ether_addr(ea)) {
-		if (rte_is_broadcast_ether_addr(ea))
-			stats->broadcast++;
-		else
-			stats->multicast++;
-	}
+	RTE_BUILD_BUG_ON(offsetof(struct virtnet_stats, broadcast) !=
+			offsetof(struct virtnet_stats, multicast) + sizeof(uint64_t));
+	if (unlikely(rte_is_multicast_ether_addr(ea)))
+		(&stats->multicast)[rte_is_broadcast_ether_addr(ea)]++;
 }
 
 static inline void
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index afc4b74534..68034c914b 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -35,7 +35,7 @@  struct virtnet_tx {
 };
 
 int virtio_rxq_vec_setup(struct virtnet_rx *rxvq);
-void virtio_update_packet_stats(struct virtnet_stats *stats,
-				struct rte_mbuf *mbuf);
+void virtio_update_packet_stats(struct virtnet_stats *const stats,
+		const struct rte_mbuf *const mbuf);
 
 #endif /* _VIRTIO_RXTX_H_ */