[RFC] net/af_packet: make stats reset reliable

Message ID 20240425174617.2126159-1-ferruh.yigit@amd.com (mailing list archive)
State Superseded
Delegated to: Ferruh Yigit
Headers
Series [RFC] net/af_packet: make stats reset reliable |

Checks

Context Check Description
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-compile-amd64-testing warning Testing issues

Commit Message

Ferruh Yigit April 25, 2024, 5:46 p.m. UTC
  For stats reset, use an offset instead of zeroing out actual stats values,
get_stats() displays diff between stats and offset.
This way stats only updated in datapath and offset only updated in stats
reset function. This makes stats reset function more reliable.

As stats only written by single thread, we can remove 'volatile' qualifier
which should improve the performance in datapath.

Signed-off-by: Ferruh Yigit <ferruh.yigit@amd.com>
---
Cc: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Morten Brørup <mb@smartsharesystems.com>

This update triggered by mail list discussion [1].

[1]
https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969f99@lysator.liu.se/
---
 drivers/net/af_packet/rte_eth_af_packet.c | 69 +++++++++++++++--------
 1 file changed, 44 insertions(+), 25 deletions(-)
  

Comments

Morten Brørup April 26, 2024, 11:33 a.m. UTC | #1
> +static uint64_t
> +stats_get_diff(uint64_t stats, uint64_t offset)
> +{
> +	if (stats >= offset)
> +		return stats - offset;
> +	/* unlikely wraparound case */
> +	return UINT64_MAX + stats - offset;

The numbers are unsigned, so wrapping comes for free.

Remove the comparison and always return stats - offset.

Using uint8_t for easier explanation, if offset is 255 and stats is 0, then the diff should be 1.
Returning stats - offset:
stats - offset = 0 - 255 = 0 - 0xFF = 1.

Returning UINT8_MAX + stats - offset is wrong:
UINT8_MAX + stats - offset = 255 - 0 - 255 = 0.

Besides that, it looks good to me.


While reviewing, I came across the rx_mbuf_alloc_failed counter in the rte_eth_dev_data structure:
https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/rte_ethdev.c#L3145
https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/ethdev_driver.h#L127

Doesn't it have the same problem?


BTW, the af_packet PMD doesn't increase the rx_mbuf_alloc_failed counter on mbuf allocation failures. But that's a separate bug.
  
Ferruh Yigit April 26, 2024, 1:37 p.m. UTC | #2
On 4/26/2024 12:33 PM, Morten Brørup wrote:
>> +static uint64_t
>> +stats_get_diff(uint64_t stats, uint64_t offset)
>> +{
>> +	if (stats >= offset)
>> +		return stats - offset;
>> +	/* unlikely wraparound case */
>> +	return UINT64_MAX + stats - offset;
> 
> The numbers are unsigned, so wrapping comes for free.
> 
> Remove the comparison and always return stats - offset.
> 
> Using uint8_t for easier explanation, if offset is 255 and stats is 0, then the diff should be 1.
> Returning stats - offset:
> stats - offset = 0 - 255 = 0 - 0xFF = 1.
> 
> Returning UINT8_MAX + stats - offset is wrong:
> UINT8_MAX + stats - offset = 255 - 0 - 255 = 0.
> 
> Besides that, it looks good to me.
> 

Yes, it is wrong, and thanks for removing comparison tip.

But thinking twice, taking wrapping into account for a uint64_t variable
can be being too cautious anyway. I will remove it completely.

> 
> While reviewing, I came across the rx_mbuf_alloc_failed counter in the rte_eth_dev_data structure:
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/rte_ethdev.c#L3145
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/ethdev_driver.h#L127
> 
> Doesn't it have the same problem?
> 

stats reset problem? af_packet is not collecting 'rx_mbuf_alloc_failed',
so nothing to do there for af_packet.

> 
> BTW, the af_packet PMD doesn't increase the rx_mbuf_alloc_failed counter on mbuf allocation failures. But that's a separate bug.
> 

Yes it is missing, but I wouldn't call it a bug, just one of the stats
is missing. And yes this can be handled separately if required.
  
Morten Brørup April 26, 2024, 2:56 p.m. UTC | #3
> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> Sent: Friday, 26 April 2024 15.38
> 
> On 4/26/2024 12:33 PM, Morten Brørup wrote:

[...]

> > While reviewing, I came across the rx_mbuf_alloc_failed counter in the
> rte_eth_dev_data structure:
> > https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/rte_ethdev.c#L3145
> >
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/ethdev_driver.h#L127
> >
> > Doesn't it have the same problem?
> >
> 
> stats reset problem? af_packet is not collecting 'rx_mbuf_alloc_failed',
> so nothing to do there for af_packet.

Agreed, not related to af_packet or this patch.

I'm just wondering if a similar or other patch should be applied to rx_mbuf_alloc_failed in the ethdev layer.
rx_mbuf_alloc_failed is shared by lcores, so perhaps it should be atomic, or atomically incremented by drivers using it.

> 
> >
> > BTW, the af_packet PMD doesn't increase the rx_mbuf_alloc_failed counter on
> mbuf allocation failures. But that's a separate bug.
> >
> 
> Yes it is missing, but I wouldn't call it a bug, just one of the stats
> is missing. And yes this can be handled separately if required.

OK, then just some stats missing, not a bug.
Quite useful stats for debugging a production system, if mbuf allocations fail.
But still, not related to this patch.
  
Patrick Robb April 26, 2024, 9:28 p.m. UTC | #4
Recheck-request: iol-compile-amd64-testing

The DPDK Community Lab updated to the latest Alpine image yesterday, which
resulted in all Alpine builds failing. The failure is unrelated to your
patch, and this recheck should remove the fail on Patchwork, as we have
disabled Alpine testing for now.
  
Mattias Rönnblom April 28, 2024, 3:42 p.m. UTC | #5
On 2024-04-26 13:33, Morten Brørup wrote:
>> +static uint64_t
>> +stats_get_diff(uint64_t stats, uint64_t offset)
>> +{
>> +	if (stats >= offset)
>> +		return stats - offset;
>> +	/* unlikely wraparound case */
>> +	return UINT64_MAX + stats - offset;
> 
> The numbers are unsigned, so wrapping comes for free.
> 

With 64-bit counters, will they ever wrap? If you constantly run 100 
Gbps it'll take > 1000 years before the byte counter wrap.

> Remove the comparison and always return stats - offset.
> 
> Using uint8_t for easier explanation, if offset is 255 and stats is 0, then the diff should be 1.
> Returning stats - offset:
> stats - offset = 0 - 255 = 0 - 0xFF = 1.
> 
> Returning UINT8_MAX + stats - offset is wrong:
> UINT8_MAX + stats - offset = 255 - 0 - 255 = 0.
> 
> Besides that, it looks good to me.
> 
> 
> While reviewing, I came across the rx_mbuf_alloc_failed counter in the rte_eth_dev_data structure:
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/rte_ethdev.c#L3145
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/ethdev/ethdev_driver.h#L127
> 
> Doesn't it have the same problem?
> 
> 
> BTW, the af_packet PMD doesn't increase the rx_mbuf_alloc_failed counter on mbuf allocation failures. But that's a separate bug.
>
  
Morten Brørup May 10, 2024, 5:29 p.m. UTC | #6
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, 10 May 2024 07.01
> 
> This is my attempt to demonstrate:
>   - generic counters for SW drivers, the example does af_packet and tap
>     but same should be applied to af_xdp, virtio, etc.
>   - counters are safe against 64 bit tearing on 32 bit platform

+1 for this RFC

> 
> The naming and organization could be improved:
>   - should this be in rte_ethdev?

I think the part for handling 64 bit counters should be moved out into a generic library, so it can also be used by libraries and applications. It can be extended with more features (feature creep) later. And if some architectures offer (more feature creep) optimized implementations, all 64 bit counters for those architectures will benefit from such improvements.

I like the concept of joining and generalizing the stats handling for SW drivers, to avoid copy-paste of detailed stats handing between SW drivers. Copy-paste is bad, common functions are good.

I'm somewhat skeptical about putting the stats structure first in the rte_eth_dev_data's tx_queues and rx_queues.
These are void* because their types are private to the PMD. Putting the stats structure first is somewhat a hack, partially removing that privacy.
Perhaps we can make it look less like a hack. After all, it is still a private structure type, only the first part is public and must be the same across drivers using the SW stats counters. Overlapping certain parts of a private structure with a public structure is a common design pattern; I've seen it used elsewhere in DPDK too.
If we get used to it for SW ethdev drivers, we might not consider it a hack anymore. ;-)

>   - better name for struct and vairables?
>   - audit and handle errors better.
> 
> Stephen Hemminger (3):
>   ethdev: add internal helper of SW driver statistics
>   net/af_packet: use SW stats helper
>   net/tap: use generic SW stats
> 
>  drivers/net/af_packet/rte_eth_af_packet.c |  97 ++-----
>  drivers/net/tap/rte_eth_tap.c             | 100 ++------
>  drivers/net/tap/rte_eth_tap.h             |  17 +-
>  lib/ethdev/ethdev_swstats.c               | 294 ++++++++++++++++++++++
>  lib/ethdev/ethdev_swstats.h               |  60 +++++
>  lib/ethdev/meson.build                    |   2 +
>  lib/ethdev/version.map                    |   7 +
>  7 files changed, 400 insertions(+), 177 deletions(-)
>  create mode 100644 lib/ethdev/ethdev_swstats.c
>  create mode 100644 lib/ethdev/ethdev_swstats.h
> 
> --
> 2.43.0
  
Stephen Hemminger May 10, 2024, 7:30 p.m. UTC | #7
On Fri, 10 May 2024 19:29:48 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> ad, common functions are good.
> 
> I'm somewhat skeptical about putting the stats structure first in the rte_eth_dev_data's tx_queues and rx_queues.
> These are void* because their types are private to the PMD. Putting the stats structure first is somewhat a hack, partially removing that privacy.
> Perhaps we can make it look less like a hack. After all, it is still a private structure type, only the first part is public and must be the same across drivers using the SW stats counters. Overlapping certain parts of a private structure with a public structure is a common design pattern; I've seen it used elsewhere in DPDK too.
> If we get used to it for SW ethdev drivers, we might not consider it a hack anymore. ;-)

The tradeoff here, is putting them first allows for writing generic version of eth_stats_get and eth_stats_reset.
Other option is having generic code take offset for tx/rx to find the stats.

C doesn't have RTI like C++ so this was best I could think of.
  

Patch

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 397a32db5886..2061cdab4997 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -51,8 +51,10 @@  struct pkt_rx_queue {
 	uint16_t in_port;
 	uint8_t vlan_strip;
 
-	volatile unsigned long rx_pkts;
-	volatile unsigned long rx_bytes;
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_pkts_offset;
+	uint64_t rx_bytes_offset;
 };
 
 struct pkt_tx_queue {
@@ -64,9 +66,12 @@  struct pkt_tx_queue {
 	unsigned int framecount;
 	unsigned int framenum;
 
-	volatile unsigned long tx_pkts;
-	volatile unsigned long err_pkts;
-	volatile unsigned long tx_bytes;
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+	uint64_t tx_pkts_offset;
+	uint64_t err_pkts_offset;
+	uint64_t tx_bytes_offset;
 };
 
 struct pmd_internals {
@@ -385,8 +390,18 @@  eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	return 0;
 }
 
+
+static uint64_t
+stats_get_diff(uint64_t stats, uint64_t offset)
+{
+	if (stats >= offset)
+		return stats - offset;
+	/* unlikely wraparound case */
+	return UINT64_MAX + stats - offset;
+}
+
 static int
-eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
 	unsigned i, imax;
 	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
@@ -396,27 +411,29 @@  eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
 	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
 	        internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (i = 0; i < imax; i++) {
-		igb_stats->q_ipackets[i] = internal->rx_queue[i].rx_pkts;
-		igb_stats->q_ibytes[i] = internal->rx_queue[i].rx_bytes;
-		rx_total += igb_stats->q_ipackets[i];
-		rx_bytes_total += igb_stats->q_ibytes[i];
+		struct pkt_rx_queue *rxq = &internal->rx_queue[i];
+		stats->q_ipackets[i] = stats_get_diff(rxq->rx_pkts, rxq->rx_pkts_offset);
+		stats->q_ibytes[i] = stats_get_diff(rxq->rx_bytes, rxq->rx_bytes_offset);
+		rx_total += stats->q_ipackets[i];
+		rx_bytes_total += stats->q_ibytes[i];
 	}
 
 	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
 	        internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (i = 0; i < imax; i++) {
-		igb_stats->q_opackets[i] = internal->tx_queue[i].tx_pkts;
-		igb_stats->q_obytes[i] = internal->tx_queue[i].tx_bytes;
-		tx_total += igb_stats->q_opackets[i];
-		tx_err_total += internal->tx_queue[i].err_pkts;
-		tx_bytes_total += igb_stats->q_obytes[i];
+		struct pkt_tx_queue *txq = &internal->tx_queue[i];
+		stats->q_opackets[i] = stats_get_diff(txq->tx_pkts, txq->tx_pkts_offset);
+		stats->q_obytes[i] = stats_get_diff(txq->tx_bytes, txq->tx_bytes_offset);
+		tx_total += stats->q_opackets[i];
+		tx_err_total += stats_get_diff(txq->err_pkts, txq->err_pkts_offset);
+		tx_bytes_total += stats->q_obytes[i];
 	}
 
-	igb_stats->ipackets = rx_total;
-	igb_stats->ibytes = rx_bytes_total;
-	igb_stats->opackets = tx_total;
-	igb_stats->oerrors = tx_err_total;
-	igb_stats->obytes = tx_bytes_total;
+	stats->ipackets = rx_total;
+	stats->ibytes = rx_bytes_total;
+	stats->opackets = tx_total;
+	stats->oerrors = tx_err_total;
+	stats->obytes = tx_bytes_total;
 	return 0;
 }
 
@@ -427,14 +444,16 @@  eth_stats_reset(struct rte_eth_dev *dev)
 	struct pmd_internals *internal = dev->data->dev_private;
 
 	for (i = 0; i < internal->nb_queues; i++) {
-		internal->rx_queue[i].rx_pkts = 0;
-		internal->rx_queue[i].rx_bytes = 0;
+		struct pkt_rx_queue *rxq = &internal->rx_queue[i];
+		rxq->rx_pkts_offset = rxq->rx_pkts;
+		rxq->rx_bytes_offset = rxq->rx_bytes;
 	}
 
 	for (i = 0; i < internal->nb_queues; i++) {
-		internal->tx_queue[i].tx_pkts = 0;
-		internal->tx_queue[i].err_pkts = 0;
-		internal->tx_queue[i].tx_bytes = 0;
+		struct pkt_tx_queue *txq = &internal->tx_queue[i];
+		txq->tx_pkts_offset = txq->tx_pkts;
+		txq->err_pkts_offset = txq->err_pkts;
+		txq->tx_bytes_offset = txq->tx_bytes;
 	}
 
 	return 0;