[v2,5/5] examples/kni: improve zeroing statistics

Message ID 20180919195549.5585-6-dg@adax.com (mailing list archive)
State Changes Requested, archived
Delegated to: Thomas Monjalon
Headers
Series kni: add API to set link status on kernel interface |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Dan Gora Sept. 19, 2018, 7:55 p.m. UTC
  The worker threads incrementing the rx/tx_packets race with the signal
handler from the main thread zeroing the entire statistics structure.
This can cause the statistics to fail to be zeroed, even when there
is no traffic on those interfaces.

Improve zeroing the statistics by only incrementing rx/tx_packets
in worker threads by a non-zero amount.  This limits the race to the
periods in which traffic is actually being received or transmitted.

Signed-off-by: Dan Gora <dg@adax.com>
---
 examples/kni/main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

Ferruh Yigit Sept. 26, 2018, 2:01 p.m. UTC | #1
On 9/19/2018 8:55 PM, Dan Gora wrote:
> The worker threads incrementing the rx/tx_packets race with the signal
> handler from the main thread zeroing the entire statistics structure.
> This can cause the statistics to fail to be zeroed, even when there
> is no traffic on those interfaces.
> 
> Improve zeroing the statistics by only incrementing rx/tx_packets
> in worker threads by a non-zero amount.  This limits the race to the
> periods in which traffic is actually being received or transmitted.

Not sure about introducing an extra check to datapath for possible error on
stats zero. I am for dropping this patch, what do you think?

> 
> Signed-off-by: Dan Gora <dg@adax.com>
> ---
>  examples/kni/main.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/kni/main.c b/examples/kni/main.c
> index ca45347d8..d00569740 100644
> --- a/examples/kni/main.c
> +++ b/examples/kni/main.c
> @@ -223,7 +223,8 @@ kni_ingress(struct kni_port_params *p)
>  		}
>  		/* Burst tx to kni */
>  		num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
> -		kni_stats[port_id].rx_packets += num;
> +		if (num)
> +			kni_stats[port_id].rx_packets += num;
>  
>  		rte_kni_handle_request(p->kni[i]);
>  		if (unlikely(num < nb_rx)) {
> @@ -260,7 +261,8 @@ kni_egress(struct kni_port_params *p)
>  		}
>  		/* Burst tx to eth */
>  		nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
> -		kni_stats[port_id].tx_packets += nb_tx;
> +		if (nb_tx)
> +			kni_stats[port_id].tx_packets += nb_tx;
>  		if (unlikely(nb_tx < num)) {
>  			/* Free mbufs not tx to NIC */
>  			kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);
>
  
Dan Gora Sept. 26, 2018, 2:48 p.m. UTC | #2
On Wed, Sep 26, 2018 at 11:01 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 9/19/2018 8:55 PM, Dan Gora wrote:
>> The worker threads incrementing the rx/tx_packets race with the signal
>> handler from the main thread zeroing the entire statistics structure.
>> This can cause the statistics to fail to be zeroed, even when there
>> is no traffic on those interfaces.
>>
>> Improve zeroing the statistics by only incrementing rx/tx_packets
>> in worker threads by a non-zero amount.  This limits the race to the
>> periods in which traffic is actually being received or transmitted.
>
> Not sure about introducing an extra check to datapath for possible error on
> stats zero. I am for dropping this patch, what do you think?

This is literally adding one instruction to the datapath.  Not even an
atomic instruction.  There is no effect on the performance caused by
this change.

Is that not better than the user (like me who experienced this)
wondering why they cannot zero the counters even when there is no
traffic?

-d
  
Ferruh Yigit Sept. 27, 2018, 11:40 a.m. UTC | #3
On 9/26/2018 3:48 PM, Dan Gora wrote:
> On Wed, Sep 26, 2018 at 11:01 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> On 9/19/2018 8:55 PM, Dan Gora wrote:
>>> The worker threads incrementing the rx/tx_packets race with the signal
>>> handler from the main thread zeroing the entire statistics structure.
>>> This can cause the statistics to fail to be zeroed, even when there
>>> is no traffic on those interfaces.
>>>
>>> Improve zeroing the statistics by only incrementing rx/tx_packets
>>> in worker threads by a non-zero amount.  This limits the race to the
>>> periods in which traffic is actually being received or transmitted.
>>
>> Not sure about introducing an extra check to datapath for possible error on
>> stats zero. I am for dropping this patch, what do you think?
> 
> This is literally adding one instruction to the datapath.  Not even an
> atomic instruction.  There is no effect on the performance caused by
> this change.
> 
> Is that not better than the user (like me who experienced this)
> wondering why they cannot zero the counters even when there is no
> traffic?

Can we have something like, stop the forwarding, clear the stats and start
forwarding back?

Yes effect is small but it is for each packet, it doesn't make sense to me to
add extra check for each packet for the rare case of stats reset.
  
Dan Gora Sept. 27, 2018, 3:53 p.m. UTC | #4
On Thu, Sep 27, 2018 at 8:40 AM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >> Not sure about introducing an extra check to datapath for possible error on
> >> stats zero. I am for dropping this patch, what do you think?
> >
> > This is literally adding one instruction to the datapath.  Not even an
> > atomic instruction.  There is no effect on the performance caused by
> > this change.
> >
> > Is that not better than the user (like me who experienced this)
> > wondering why they cannot zero the counters even when there is no
> > traffic?
>
> Can we have something like, stop the forwarding, clear the stats and start
> forwarding back?

This is what is broken.  The stats do not zero because you have the
two worker threads who are constantly performing:

 kni_stats[port_id].rx_packets += num;

and

 kni_stats[port_id].tx_packets += nb_tx;

You know how read/inc/write races work, right?

These are not atomic increments, so the other thread zeroing these
counters is _always_ going to race with these worker threads
overwriting the counters with the old values.

With no traffic it's worse because the worker threads perform these
increments even more often!

> Yes effect is small but it is for each packet, it doesn't make sense to me to
> add extra check for each packet for the rare case of stats reset.

Its not rare at all!  You cannot zero the statistics around 80% of the
time on my machine when there is no traffic.  It's trivial to
reproduce this.  Just run a little traffic, stop the traffic, zero the
stats and check the stats.

If you cannot zero the statistics reliably under any circumstance then
the statistics themselves are worthless and should be removed.  It's
better to have no stats than completely unreliable ones.
  
Ferruh Yigit Sept. 27, 2018, 10:04 p.m. UTC | #5
On 9/27/2018 4:53 PM, Dan Gora wrote:
> On Thu, Sep 27, 2018 at 8:40 AM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>>> Not sure about introducing an extra check to datapath for possible error on
>>>> stats zero. I am for dropping this patch, what do you think?
>>>
>>> This is literally adding one instruction to the datapath.  Not even an
>>> atomic instruction.  There is no effect on the performance caused by
>>> this change.
>>>
>>> Is that not better than the user (like me who experienced this)
>>> wondering why they cannot zero the counters even when there is no
>>> traffic?
>>
>> Can we have something like, stop the forwarding, clear the stats and start
>> forwarding back?
> 
> This is what is broken.  The stats do not zero because you have the
> two worker threads who are constantly performing:
> 
>  kni_stats[port_id].rx_packets += num;
> 
> and
> 
>  kni_stats[port_id].tx_packets += nb_tx;
> 
> You know how read/inc/write races work, right?
> 
> These are not atomic increments, so the other thread zeroing these
> counters is _always_ going to race with these worker threads
> overwriting the counters with the old values.
> 
> With no traffic it's worse because the worker threads perform these
> increments even more often!

Dear Dan,

Your implementation doesn't prevent the race when there is traffic, it can be
useful when there is no traffic, that is why my suggestion was in signal
handler, stop worker threads, zero stats, start workers again.
But for traffic case this will cause packet lost so overall may not be desirable.

> 
>> Yes effect is small but it is for each packet, it doesn't make sense to me to
>> add extra check for each packet for the rare case of stats reset.
> 
> Its not rare at all!  You cannot zero the statistics around 80% of the
> time on my machine when there is no traffic.  It's trivial to
> reproduce this.  Just run a little traffic, stop the traffic, zero the
> stats and check the stats.

I was trying to say, packet transfer is millions per second, so the waste is in
that scale, but clear stats will be called in the scale of seconds/minutes? Or
not at all perhaps for some cases?

> 
> If you cannot zero the statistics reliably under any circumstance then
> the statistics themselves are worthless and should be removed.  It's
> better to have no stats than completely unreliable ones.

How much really you rely on sample application stats clear, and stats are still
useful of course since you can use them without reset.
  
Dan Gora Sept. 27, 2018, 10:40 p.m. UTC | #6
On Thu, Sep 27, 2018 at 7:04 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:

>>
>> With no traffic it's worse because the worker threads perform these
>> increments even more often!
>
> Dear Dan,
>
> Your implementation doesn't prevent the race when there is traffic, it can be

Correct.  To do that would require per-cpu counters or atomic
increments.  That all seems a bit much for a simple example app.

> useful when there is no traffic, that is why my suggestion was in signal
> handler, stop worker threads, zero stats, start workers again.
> But for traffic case this will cause packet lost so overall may not be desirable.

Yeah, of course it would cause packet loss while traffic is running,
that's why I didn't even consider that..

>>> Yes effect is small but it is for each packet, it doesn't make sense to me to
>>> add extra check for each packet for the rare case of stats reset.
>>
>> Its not rare at all!  You cannot zero the statistics around 80% of the
>> time on my machine when there is no traffic.  It's trivial to
>> reproduce this.  Just run a little traffic, stop the traffic, zero the
>> stats and check the stats.
>
> I was trying to say, packet transfer is millions per second, so the waste is in
> that scale, but clear stats will be called in the scale of seconds/minutes? Or
> not at all perhaps for some cases?

Even at 100G 64 byte packets, this one extra instruction is not going
to make any difference.  In fact I would bet money that it will
actually improve performance by decreasing the pressure on that cache
line, which is shared between the tx and rx threads (and cores!
possibly on different sockets!).

Currently you have two cores constantly dirtying the same cache line
every few instructions for no reason other than to read and write back
the same value if no data was tx/rx'd.

>> If you cannot zero the statistics reliably under any circumstance then
>> the statistics themselves are worthless and should be removed.  It's
>> better to have no stats than completely unreliable ones.
>
> How much really you rely on sample application stats clear, and stats are still
> useful of course since you can use them without reset.

Well, you can, but... Say you're doing a data transfer test and you
want to make sure that no packets are lost.  You stop the traffic,
zero the counters, run the test, look at the counters.  Oops, they are
different!  Oh yeah, the silly test app cannot zero the counters, I
forgot... So then you have to go back, find the old stats (if you
checked them first), copy that number into dc, do the subtraction by
hand, etc..

It just seems all a bit much, especially for a toy example application.

That said, this is by no means a hill I want to die on.  If you don't
want to add it, that's fine...

d
  

Patch

diff --git a/examples/kni/main.c b/examples/kni/main.c
index ca45347d8..d00569740 100644
--- a/examples/kni/main.c
+++ b/examples/kni/main.c
@@ -223,7 +223,8 @@  kni_ingress(struct kni_port_params *p)
 		}
 		/* Burst tx to kni */
 		num = rte_kni_tx_burst(p->kni[i], pkts_burst, nb_rx);
-		kni_stats[port_id].rx_packets += num;
+		if (num)
+			kni_stats[port_id].rx_packets += num;
 
 		rte_kni_handle_request(p->kni[i]);
 		if (unlikely(num < nb_rx)) {
@@ -260,7 +261,8 @@  kni_egress(struct kni_port_params *p)
 		}
 		/* Burst tx to eth */
 		nb_tx = rte_eth_tx_burst(port_id, 0, pkts_burst, (uint16_t)num);
-		kni_stats[port_id].tx_packets += nb_tx;
+		if (nb_tx)
+			kni_stats[port_id].tx_packets += nb_tx;
 		if (unlikely(nb_tx < num)) {
 			/* Free mbufs not tx to NIC */
 			kni_burst_free_mbufs(&pkts_burst[nb_tx], num - nb_tx);