[v2] ethdev: warn once for buggy applications

Message ID 20211027120144.20485-1-david.marchand@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers
Series [v2] ethdev: warn once for buggy applications |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/github-robot: build success github build: passed
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-intel-Functional fail Functional Testing issues
ci/iol-aarch64-unit-testing success Testing PASS

Commit Message

David Marchand Oct. 27, 2021, 12:01 p.m. UTC
  Warning continuously is a pain when developping or if a unit test
is/gets broken.

It could also be a problem if application behaves badly only in some
corner cases and a DoS results of those logs being continuously displayed.

Let's warn once per port and per rx/tx.

Getting such a log is scary, but let's make it more eye catching by
dumping a backtrace with it.

Tested by introducing a bug in testpmd:
 static int
 eth_dev_start_mp(uint16_t port_id)
 {
-       if (is_proc_primary())
+       if (!is_proc_primary())
                return rte_eth_dev_start(port_id);

        return 0;

Then, running a basic null test:
$ ./devtools/test-null.sh
...
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
  enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

lcore 0 called rx_pkt_burst for not ready port 0
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
lcore 0 called rx_pkt_burst for not ready port 1
8: [build/app/dpdk-testpmd() [0x59e839]]
7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
4: [build/app/dpdk-testpmd() [0x65e1be]]
3: [build/app/dpdk-testpmd() [0x65a996]]
2: [build/app/dpdk-testpmd() [0xa6cbc7]]
1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0

Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- removed diff banner in commitlog,
- renamed vars,
- removed unneeded cast on anonymous struct in initialisation construct,
- added check on queue pointer,

---
 lib/ethdev/ethdev_private.c | 63 +++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 9 deletions(-)
  

Comments

Ananyev, Konstantin Oct. 27, 2021, 12:15 p.m. UTC | #1
> Warning continuously is a pain when developping or if a unit test
> is/gets broken.
> 
> It could also be a problem if application behaves badly only in some
> corner cases and a DoS results of those logs being continuously displayed.
> 
> Let's warn once per port and per rx/tx.
> 
> Getting such a log is scary, but let's make it more eye catching by
> dumping a backtrace with it.
> 
> Tested by introducing a bug in testpmd:
>  static int
>  eth_dev_start_mp(uint16_t port_id)
>  {
> -       if (is_proc_primary())
> +       if (!is_proc_primary())
>                 return rte_eth_dev_start(port_id);
> 
>         return 0;
> 
> Then, running a basic null test:
> $ ./devtools/test-null.sh
> ...
> Start automatic packet forwarding
> io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
>   enabled, MP allocation mode: native
> Logical Core 1 (socket 0) forwards packets on 2 streams:
>   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
>   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
> 
> lcore 0 called rx_pkt_burst for not ready port 0
> 8: [build/app/dpdk-testpmd() [0x59e839]]
> 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
> 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
> 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
> 4: [build/app/dpdk-testpmd() [0x65e1be]]
> 3: [build/app/dpdk-testpmd() [0x65a996]]
> 2: [build/app/dpdk-testpmd() [0xa6cbc7]]
> 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
> lcore 0 called rx_pkt_burst for not ready port 1
> 8: [build/app/dpdk-testpmd() [0x59e839]]
> 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
> 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
> 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
> 4: [build/app/dpdk-testpmd() [0x65e1be]]
> 3: [build/app/dpdk-testpmd() [0x65a996]]
> 2: [build/app/dpdk-testpmd() [0xa6cbc7]]
> 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
>   io packet forwarding packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=2
>   port 0: RX queue number: 1 Tx queue number: 1
>     Rx offloads=0x0 Tx offloads=0x0
> 
> Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Changes since v1:
> - removed diff banner in commitlog,
> - renamed vars,
> - removed unneeded cast on anonymous struct in initialisation construct,
> - added check on queue pointer,
> 
> ---
>  lib/ethdev/ethdev_private.c | 63 +++++++++++++++++++++++++++++++------
>  1 file changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
> index c905c2df6f..8fca20c7d4 100644
> --- a/lib/ethdev/ethdev_private.c
> +++ b/lib/ethdev/ethdev_private.c
> @@ -2,6 +2,7 @@
>   * Copyright(c) 2018 Gaëtan Rivet
>   */
> 
> +#include <rte_debug.h>
>  #include "rte_ethdev.h"
>  #include "ethdev_driver.h"
>  #include "ethdev_private.h"
> @@ -175,22 +176,58 @@ rte_eth_devargs_parse_representor_ports(char *str, void *data)
>  	return str == NULL ? -1 : 0;
>  }
> 
> +struct dummy_queue {
> +	bool rx_warn_once;
> +	bool tx_warn_once;
> +};
> +static struct dummy_queue *dummy_queues_array[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static struct dummy_queue per_port_queues[RTE_MAX_ETHPORTS];
> +RTE_INIT(dummy_queue_init)
> +{
> +	uint16_t port_id;
> +
> +	for (port_id = 0; port_id < RTE_DIM(per_port_queues); port_id++) {
> +		unsigned int q;
> +
> +		for (q = 0; q < RTE_DIM(dummy_queues_array[port_id]); q++)
> +			dummy_queues_array[port_id][q] = &per_port_queues[port_id];
> +	}
> +}
> +
>  static uint16_t
> -dummy_eth_rx_burst(__rte_unused void *rxq,
> +dummy_eth_rx_burst(void *rxq,
>  		__rte_unused struct rte_mbuf **rx_pkts,
>  		__rte_unused uint16_t nb_pkts)
>  {
> -	RTE_ETHDEV_LOG(ERR, "rx_pkt_burst for not ready port\n");
> +	struct dummy_queue *queue = rxq;
> +	uintptr_t port_id;
> +
> +	port_id = queue - per_port_queues;
> +	if (port_id < RTE_DIM(per_port_queues) && !queue->rx_warn_once) {
> +		RTE_ETHDEV_LOG(ERR, "lcore %u called rx_pkt_burst for not ready port %"PRIuPTR"\n",
> +			rte_lcore_id(), port_id);
> +		rte_dump_stack();
> +		queue->rx_warn_once = true;
> +	}
>  	rte_errno = ENOTSUP;
>  	return 0;
>  }
> 
>  static uint16_t
> -dummy_eth_tx_burst(__rte_unused void *txq,
> +dummy_eth_tx_burst(void *txq,
>  		__rte_unused struct rte_mbuf **tx_pkts,
>  		__rte_unused uint16_t nb_pkts)
>  {
> -	RTE_ETHDEV_LOG(ERR, "tx_pkt_burst for not ready port\n");
> +	struct dummy_queue *queue = txq;
> +	uintptr_t port_id;
> +
> +	port_id = queue - per_port_queues;
> +	if (port_id < RTE_DIM(per_port_queues) && !queue->tx_warn_once) {
> +		RTE_ETHDEV_LOG(ERR, "lcore %u called tx_pkt_burst for not ready port %"PRIuPTR"\n",
> +			rte_lcore_id(), port_id);
> +		rte_dump_stack();
> +		queue->tx_warn_once = true;
> +	}
>  	rte_errno = ENOTSUP;
>  	return 0;
>  }
> @@ -199,14 +236,22 @@ void
>  eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
>  {
>  	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
> -	static const struct rte_eth_fp_ops dummy_ops = {
> +	uintptr_t port_id = fpo - rte_eth_fp_ops;
> +
> +	per_port_queues[port_id].rx_warn_once = false;
> +	per_port_queues[port_id].tx_warn_once = false;
> +	*fpo = (struct rte_eth_fp_ops) {
>  		.rx_pkt_burst = dummy_eth_rx_burst,
>  		.tx_pkt_burst = dummy_eth_tx_burst,
> -		.rxq = {.data = dummy_data, .clbk = dummy_data,},
> -		.txq = {.data = dummy_data, .clbk = dummy_data,},
> +		.rxq = {
> +			.data = (void **)&dummy_queues_array[port_id],
> +			.clbk = dummy_data,
> +		},
> +		.txq = {
> +			.data = (void **)&dummy_queues_array[port_id],
> +			.clbk = dummy_data,
> +		},
>  	};
> -
> -	*fpo = dummy_ops;
>  }
> 
>  void
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.23.0
  
Thomas Monjalon Oct. 27, 2021, 12:46 p.m. UTC | #2
> > Warning continuously is a pain when developping or if a unit test
> > is/gets broken.
> > 
> > It could also be a problem if application behaves badly only in some
> > corner cases and a DoS results of those logs being continuously displayed.
> > 
> > Let's warn once per port and per rx/tx.
> > 
> > Getting such a log is scary, but let's make it more eye catching by
> > dumping a backtrace with it.
> > 
> > Tested by introducing a bug in testpmd:
> >  static int
> >  eth_dev_start_mp(uint16_t port_id)
> >  {
> > -       if (is_proc_primary())
> > +       if (!is_proc_primary())
> >                 return rte_eth_dev_start(port_id);
> > 
> >         return 0;
> > 
> > Then, running a basic null test:
> > $ ./devtools/test-null.sh
> > ...
> > Start automatic packet forwarding
> > io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
> >   enabled, MP allocation mode: native
> > Logical Core 1 (socket 0) forwards packets on 2 streams:
> >   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
> >   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
> > 
> > lcore 0 called rx_pkt_burst for not ready port 0
> > 8: [build/app/dpdk-testpmd() [0x59e839]]
> > 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
> > 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
> > 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
> > 4: [build/app/dpdk-testpmd() [0x65e1be]]
> > 3: [build/app/dpdk-testpmd() [0x65a996]]
> > 2: [build/app/dpdk-testpmd() [0xa6cbc7]]
> > 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
> > lcore 0 called rx_pkt_burst for not ready port 1
> > 8: [build/app/dpdk-testpmd() [0x59e839]]
> > 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
> > 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
> > 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
> > 4: [build/app/dpdk-testpmd() [0x65e1be]]
> > 3: [build/app/dpdk-testpmd() [0x65a996]]
> > 2: [build/app/dpdk-testpmd() [0xa6cbc7]]
> > 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
> >   io packet forwarding packets/burst=32
> >   nb forwarding cores=1 - nb forwarding ports=2
> >   port 0: RX queue number: 1 Tx queue number: 1
> >     Rx offloads=0x0 Tx offloads=0x0
> > 
> > Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")
> > 
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> > Changes since v1:
> > - removed diff banner in commitlog,
> > - renamed vars,
> > - removed unneeded cast on anonymous struct in initialisation construct,
> > - added check on queue pointer,
> > 
> > ---
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>
  
Ferruh Yigit Oct. 27, 2021, 5:31 p.m. UTC | #3
On 10/27/2021 1:46 PM, Thomas Monjalon wrote:
>>> Warning continuously is a pain when developping or if a unit test
>>> is/gets broken.
>>>
>>> It could also be a problem if application behaves badly only in some
>>> corner cases and a DoS results of those logs being continuously displayed.
>>>
>>> Let's warn once per port and per rx/tx.
>>>
>>> Getting such a log is scary, but let's make it more eye catching by
>>> dumping a backtrace with it.
>>>
>>> Tested by introducing a bug in testpmd:
>>>   static int
>>>   eth_dev_start_mp(uint16_t port_id)
>>>   {
>>> -       if (is_proc_primary())
>>> +       if (!is_proc_primary())
>>>                  return rte_eth_dev_start(port_id);
>>>
>>>          return 0;
>>>
>>> Then, running a basic null test:
>>> $ ./devtools/test-null.sh
>>> ...
>>> Start automatic packet forwarding
>>> io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support
>>>    enabled, MP allocation mode: native
>>> Logical Core 1 (socket 0) forwards packets on 2 streams:
>>>    RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
>>>    RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>>>
>>> lcore 0 called rx_pkt_burst for not ready port 0
>>> 8: [build/app/dpdk-testpmd() [0x59e839]]
>>> 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
>>> 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
>>> 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
>>> 4: [build/app/dpdk-testpmd() [0x65e1be]]
>>> 3: [build/app/dpdk-testpmd() [0x65a996]]
>>> 2: [build/app/dpdk-testpmd() [0xa6cbc7]]
>>> 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
>>> lcore 0 called rx_pkt_burst for not ready port 1
>>> 8: [build/app/dpdk-testpmd() [0x59e839]]
>>> 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]]
>>> 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]]
>>> 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]]
>>> 4: [build/app/dpdk-testpmd() [0x65e1be]]
>>> 3: [build/app/dpdk-testpmd() [0x65a996]]
>>> 2: [build/app/dpdk-testpmd() [0xa6cbc7]]
>>> 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]]
>>>    io packet forwarding packets/burst=32
>>>    nb forwarding cores=1 - nb forwarding ports=2
>>>    port 0: RX queue number: 1 Tx queue number: 1
>>>      Rx offloads=0x0 Tx offloads=0x0
>>>
>>> Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")
>>>
>>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>>> ---
>>> Changes since v1:
>>> - removed diff banner in commitlog,
>>> - renamed vars,
>>> - removed unneeded cast on anonymous struct in initialisation construct,
>>> - added check on queue pointer,
>>>
>>> ---
>>
>> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
> 
> 

Applied to dpdk-next-net/main, thanks.
  

Patch

diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index c905c2df6f..8fca20c7d4 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -2,6 +2,7 @@ 
  * Copyright(c) 2018 Gaëtan Rivet
  */
 
+#include <rte_debug.h>
 #include "rte_ethdev.h"
 #include "ethdev_driver.h"
 #include "ethdev_private.h"
@@ -175,22 +176,58 @@  rte_eth_devargs_parse_representor_ports(char *str, void *data)
 	return str == NULL ? -1 : 0;
 }
 
+struct dummy_queue {
+	bool rx_warn_once;
+	bool tx_warn_once;
+};
+static struct dummy_queue *dummy_queues_array[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+static struct dummy_queue per_port_queues[RTE_MAX_ETHPORTS];
+RTE_INIT(dummy_queue_init)
+{
+	uint16_t port_id;
+
+	for (port_id = 0; port_id < RTE_DIM(per_port_queues); port_id++) {
+		unsigned int q;
+
+		for (q = 0; q < RTE_DIM(dummy_queues_array[port_id]); q++)
+			dummy_queues_array[port_id][q] = &per_port_queues[port_id];
+	}
+}
+
 static uint16_t
-dummy_eth_rx_burst(__rte_unused void *rxq,
+dummy_eth_rx_burst(void *rxq,
 		__rte_unused struct rte_mbuf **rx_pkts,
 		__rte_unused uint16_t nb_pkts)
 {
-	RTE_ETHDEV_LOG(ERR, "rx_pkt_burst for not ready port\n");
+	struct dummy_queue *queue = rxq;
+	uintptr_t port_id;
+
+	port_id = queue - per_port_queues;
+	if (port_id < RTE_DIM(per_port_queues) && !queue->rx_warn_once) {
+		RTE_ETHDEV_LOG(ERR, "lcore %u called rx_pkt_burst for not ready port %"PRIuPTR"\n",
+			rte_lcore_id(), port_id);
+		rte_dump_stack();
+		queue->rx_warn_once = true;
+	}
 	rte_errno = ENOTSUP;
 	return 0;
 }
 
 static uint16_t
-dummy_eth_tx_burst(__rte_unused void *txq,
+dummy_eth_tx_burst(void *txq,
 		__rte_unused struct rte_mbuf **tx_pkts,
 		__rte_unused uint16_t nb_pkts)
 {
-	RTE_ETHDEV_LOG(ERR, "tx_pkt_burst for not ready port\n");
+	struct dummy_queue *queue = txq;
+	uintptr_t port_id;
+
+	port_id = queue - per_port_queues;
+	if (port_id < RTE_DIM(per_port_queues) && !queue->tx_warn_once) {
+		RTE_ETHDEV_LOG(ERR, "lcore %u called tx_pkt_burst for not ready port %"PRIuPTR"\n",
+			rte_lcore_id(), port_id);
+		rte_dump_stack();
+		queue->tx_warn_once = true;
+	}
 	rte_errno = ENOTSUP;
 	return 0;
 }
@@ -199,14 +236,22 @@  void
 eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
 {
 	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
-	static const struct rte_eth_fp_ops dummy_ops = {
+	uintptr_t port_id = fpo - rte_eth_fp_ops;
+
+	per_port_queues[port_id].rx_warn_once = false;
+	per_port_queues[port_id].tx_warn_once = false;
+	*fpo = (struct rte_eth_fp_ops) {
 		.rx_pkt_burst = dummy_eth_rx_burst,
 		.tx_pkt_burst = dummy_eth_tx_burst,
-		.rxq = {.data = dummy_data, .clbk = dummy_data,},
-		.txq = {.data = dummy_data, .clbk = dummy_data,},
+		.rxq = {
+			.data = (void **)&dummy_queues_array[port_id],
+			.clbk = dummy_data,
+		},
+		.txq = {
+			.data = (void **)&dummy_queues_array[port_id],
+			.clbk = dummy_data,
+		},
 	};
-
-	*fpo = dummy_ops;
 }
 
 void