mbox series

[00/12] update and simplify telemetry library.

Message ID 20200319171907.60891-1-ciara.power@intel.com (mailing list archive)
Headers
Series update and simplify telemetry library. |

Message

Ciara Power March 19, 2020, 5:18 p.m. UTC
  This patchset extensively reworks the telemetry library adding new
functionality and simplifying much of the existing code, while
maintaining backward compatibility.

This work is based on the previously sent RFC for a "process info"
library: https://patchwork.dpdk.org/project/dpdk/list/?series=7741
However, rather than creating a new library, this patchset takes
that work and merges it into the existing telemetry library, as
mentioned above.

The telemetry library as shipped in 19.11 is based upon the metrics
library and outputs all statistics based on that as a source. However,
this limits the telemetry output to only port-level statistics
information, rather than allowing it to be used as a general scheme for
telemetry information across all DPDK libraries.

With this patchset applied, rather than the telemetry library being
responsible for pulling ethdev stats and pushing them into the metrics
library for retrieval later, each library e.g. ethdev, rawdev, and even
the metrics library itself (for backwards compatiblity) now handle their
own stats.  Any library or app can register a callback function with
telemetry, which will be called if requested by the client connected via
the telemetry socket. The callback function in the library/app then
formats its stats, or other data, into a JSON string, and returns it to
telemetry to be sent to the client.

To maintain backward compatibility, e.g. to allow the dpdk telemetry
collectd plugin to continue to work, some of the existing telemetry
code is kept, but is moved into the metrics library, and callbacks are
registered with telemetry for the legacy commands that were supported
previously.

The new version of the library, apart from the legacy interface support
for backward compatibility, does not have an external dependency on the
Jansson library, allowing the library to be enabled by default.

Note: In this version of the patchset, telemetry output is provided by
the ethdev, rawdev and eal libraries, but this may be expanded further
in later versions which are planned ahead of the merge deadline for
20.05

Bruce Richardson (5):
  telemetry: invert dependency on metrics
  telemetry: introduce new telemetry functionality
  ethdev: add callback support for telemetry
  usertools: add new telemetry python script
  eal: add eal telemetry callbacks

Ciara Power (7):
  telemetry: move code to metrics for later reuse
  metrics: reduce code taken from telemetry
  rawdev: add callback support for telemetry
  examples/l3fwd-power: enable use of new telemetry
  telemetry: introduce telemetry backward compatibility
  telemetry: remove existing telemetry files
  lib: add telemetry as eal dependency

 config/common_base                            |    2 +-
 examples/l3fwd-power/Makefile                 |    4 +
 examples/l3fwd-power/main.c                   |   62 +-
 examples/l3fwd-power/meson.build              |    4 +
 lib/Makefile                                  |   10 +-
 lib/librte_eal/common/eal_common_options.c    |   79 +
 lib/librte_eal/common/eal_internal_cfg.h      |    1 +
 lib/librte_eal/common/eal_options.h           |    7 +
 lib/librte_eal/freebsd/eal/Makefile           |    1 +
 lib/librte_eal/freebsd/eal/eal.c              |   14 +
 lib/librte_eal/freebsd/eal/meson.build        |    2 +-
 lib/librte_eal/linux/eal/Makefile             |    1 +
 lib/librte_eal/linux/eal/eal.c                |   14 +
 lib/librte_eal/linux/eal/meson.build          |    2 +-
 lib/librte_eal/meson.build                    |    2 +-
 lib/librte_ethdev/Makefile                    |    4 +
 lib/librte_ethdev/meson.build                 |    4 +
 lib/librte_ethdev/rte_ethdev.c                |   79 +
 lib/librte_metrics/Makefile                   |   14 +
 lib/librte_metrics/meson.build                |   10 +
 lib/librte_metrics/rte_metrics.c              |    6 +-
 lib/librte_metrics/rte_metrics.h              |    5 +-
 lib/librte_metrics/rte_metrics_telemetry.c    |  539 +++++
 lib/librte_metrics/rte_metrics_telemetry.h    |   65 +
 lib/librte_metrics/rte_metrics_version.map    |    7 +
 lib/librte_rawdev/Makefile                    |    5 +
 lib/librte_rawdev/meson.build                 |    5 +
 lib/librte_rawdev/rte_rawdev.c                |   88 +
 lib/librte_telemetry/Makefile                 |   12 +-
 lib/librte_telemetry/meson.build              |   18 +-
 lib/librte_telemetry/rte_telemetry.c          | 1895 -----------------
 lib/librte_telemetry/rte_telemetry.h          |   53 +-
 lib/librte_telemetry/rte_telemetry_internal.h |  112 -
 lib/librte_telemetry/rte_telemetry_legacy.h   |   48 +
 lib/librte_telemetry/rte_telemetry_parser.c   |  682 ------
 lib/librte_telemetry/rte_telemetry_parser.h   |   15 -
 .../rte_telemetry_parser_test.c               |  533 -----
 .../rte_telemetry_socket_tests.h              |   36 -
 .../rte_telemetry_version.map                 |    5 +-
 lib/librte_telemetry/telemetry.c              |  290 +++
 lib/librte_telemetry/telemetry_legacy.c       |  226 ++
 lib/meson.build                               |    3 +-
 mk/rte.app.mk                                 |    9 +-
 mk/rte.vars.mk                                |    2 +
 usertools/test_new_telemetry.py               |   30 +
 45 files changed, 1659 insertions(+), 3346 deletions(-)
 create mode 100644 lib/librte_metrics/rte_metrics_telemetry.c
 create mode 100644 lib/librte_metrics/rte_metrics_telemetry.h
 delete mode 100644 lib/librte_telemetry/rte_telemetry.c
 delete mode 100644 lib/librte_telemetry/rte_telemetry_internal.h
 create mode 100644 lib/librte_telemetry/rte_telemetry_legacy.h
 delete mode 100644 lib/librte_telemetry/rte_telemetry_parser.c
 delete mode 100644 lib/librte_telemetry/rte_telemetry_parser.h
 delete mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.c
 delete mode 100644 lib/librte_telemetry/rte_telemetry_socket_tests.h
 create mode 100644 lib/librte_telemetry/telemetry.c
 create mode 100644 lib/librte_telemetry/telemetry_legacy.c
 create mode 100755 usertools/test_new_telemetry.py
  

Comments

David Marchand April 1, 2020, 3:42 p.m. UTC | #1
Hello,


On Thu, Mar 19, 2020 at 6:35 PM Ciara Power <ciara.power@intel.com> wrote:
>
> This patchset extensively reworks the telemetry library adding new
> functionality and simplifying much of the existing code, while
> maintaining backward compatibility.
>
> This work is based on the previously sent RFC for a "process info"
> library: https://patchwork.dpdk.org/project/dpdk/list/?series=7741
> However, rather than creating a new library, this patchset takes
> that work and merges it into the existing telemetry library, as
> mentioned above.
>
> The telemetry library as shipped in 19.11 is based upon the metrics
> library and outputs all statistics based on that as a source. However,
> this limits the telemetry output to only port-level statistics
> information, rather than allowing it to be used as a general scheme for
> telemetry information across all DPDK libraries.
>
> With this patchset applied, rather than the telemetry library being
> responsible for pulling ethdev stats and pushing them into the metrics
> library for retrieval later, each library e.g. ethdev, rawdev, and even
> the metrics library itself (for backwards compatiblity) now handle their
> own stats.  Any library or app can register a callback function with
> telemetry, which will be called if requested by the client connected via
> the telemetry socket. The callback function in the library/app then
> formats its stats, or other data, into a JSON string, and returns it to
> telemetry to be sent to the client.
>
> To maintain backward compatibility, e.g. to allow the dpdk telemetry
> collectd plugin to continue to work, some of the existing telemetry
> code is kept, but is moved into the metrics library, and callbacks are
> registered with telemetry for the legacy commands that were supported
> previously.
>
> The new version of the library, apart from the legacy interface support
> for backward compatibility, does not have an external dependency on the
> Jansson library, allowing the library to be enabled by default.
>
> Note: In this version of the patchset, telemetry output is provided by
> the ethdev, rawdev and eal libraries, but this may be expanded further
> in later versions which are planned ahead of the merge deadline for
> 20.05

This patchset does not apply on current master.
Could you rebase it?

CI reported a build failure for Windows, please have a look.

Is there a reason to keep a separate telemetry library rather than
integrate this framework into EAL?

This series removes the only user of the experimental rte_option API,
which can be removed afaiu.


Thanks.
  
Bruce Richardson April 1, 2020, 4:16 p.m. UTC | #2
On Wed, Apr 01, 2020 at 05:42:21PM +0200, David Marchand wrote:
> Hello,
> 
> 
> On Thu, Mar 19, 2020 at 6:35 PM Ciara Power <ciara.power@intel.com> wrote:
> >
> > This patchset extensively reworks the telemetry library adding new
> > functionality and simplifying much of the existing code, while
> > maintaining backward compatibility.
> >
> > This work is based on the previously sent RFC for a "process info"
> > library: https://patchwork.dpdk.org/project/dpdk/list/?series=7741
> > However, rather than creating a new library, this patchset takes
> > that work and merges it into the existing telemetry library, as
> > mentioned above.
> >
> > The telemetry library as shipped in 19.11 is based upon the metrics
> > library and outputs all statistics based on that as a source. However,
> > this limits the telemetry output to only port-level statistics
> > information, rather than allowing it to be used as a general scheme for
> > telemetry information across all DPDK libraries.
> >
> > With this patchset applied, rather than the telemetry library being
> > responsible for pulling ethdev stats and pushing them into the metrics
> > library for retrieval later, each library e.g. ethdev, rawdev, and even
> > the metrics library itself (for backwards compatiblity) now handle their
> > own stats.  Any library or app can register a callback function with
> > telemetry, which will be called if requested by the client connected via
> > the telemetry socket. The callback function in the library/app then
> > formats its stats, or other data, into a JSON string, and returns it to
> > telemetry to be sent to the client.
> >
> > To maintain backward compatibility, e.g. to allow the dpdk telemetry
> > collectd plugin to continue to work, some of the existing telemetry
> > code is kept, but is moved into the metrics library, and callbacks are
> > registered with telemetry for the legacy commands that were supported
> > previously.
> >
> > The new version of the library, apart from the legacy interface support
> > for backward compatibility, does not have an external dependency on the
> > Jansson library, allowing the library to be enabled by default.
> >
> > Note: In this version of the patchset, telemetry output is provided by
> > the ethdev, rawdev and eal libraries, but this may be expanded further
> > in later versions which are planned ahead of the merge deadline for
> > 20.05
> 
> This patchset does not apply on current master.
> Could you rebase it?
> 
> CI reported a build failure for Windows, please have a look.
> 
Yep, we spotted that and will fix in V2 (which naturally will also be
rebased).

> Is there a reason to keep a separate telemetry library rather than
> integrate this framework into EAL?
> 
No reason this could not be done, however, since telemetry library is
already separate, and EAL is already pretty crowded, I think keeping this
separate might lead to easier maintenance.

However, if people generally prefer it just merged into EAL, that can be
done also.

> This series removes the only user of the experimental rte_option API,
> which can be removed afaiu.
> 
Yep, that's something we spotted and intended to flag for discussion on the
V2 patchset.

My slight concern would be that having extra args for particular additional
libraries is something we might want in the future, so we should think
carefully before removing it. However, since overall I like shorter code,
I'd personally vote in favour of removal. :-)

Regards,
/Bruce
  
Wiles, Keith April 1, 2020, 4:48 p.m. UTC | #3
> On Mar 19, 2020, at 12:18 PM, Ciara Power <ciara.power@intel.com> wrote:
> 
> This patchset extensively reworks the telemetry library adding new
> functionality and simplifying much of the existing code, while
> maintaining backward compatibility.
> 
> This work is based on the previously sent RFC for a "process info"
> library: https://patchwork.dpdk.org/project/dpdk/list/?series=7741
> However, rather than creating a new library, this patchset takes
> that work and merges it into the existing telemetry library, as
> mentioned above.
> 
> The telemetry library as shipped in 19.11 is based upon the metrics
> library and outputs all statistics based on that as a source. However,
> this limits the telemetry output to only port-level statistics
> information, rather than allowing it to be used as a general scheme for
> telemetry information across all DPDK libraries.
> 
> With this patchset applied, rather than the telemetry library being
> responsible for pulling ethdev stats and pushing them into the metrics
> library for retrieval later, each library e.g. ethdev, rawdev, and even
> the metrics library itself (for backwards compatiblity) now handle their
> own stats.  Any library or app can register a callback function with
> telemetry, which will be called if requested by the client connected via
> the telemetry socket. The callback function in the library/app then
> formats its stats, or other data, into a JSON string, and returns it to
> telemetry to be sent to the client.
> 
> To maintain backward compatibility, e.g. to allow the dpdk telemetry
> collectd plugin to continue to work, some of the existing telemetry
> code is kept, but is moved into the metrics library, and callbacks are
> registered with telemetry for the legacy commands that were supported
> previously.
> 
> The new version of the library, apart from the legacy interface support
> for backward compatibility, does not have an external dependency on the
> Jansson library, allowing the library to be enabled by default.
> 
> Note: In this version of the patchset, telemetry output is provided by
> the ethdev, rawdev and eal libraries, but this may be expanded further
> in later versions which are planned ahead of the merge deadline for
> 20.05
> 

I am looking at the patch and noticed we are adding new lines to the JSON output, they are not required. The spaces in the text could be removed as well as it will reduce the amount of data being sent. The output can be put into a JSON formatter to see the data better if needed.

Some required information I believe needs to be added to the patch.
- Command to get the Link status of a port Speed/Duplex/LinkState via a new command ‘/ethdev/linkstate,X’ or I added it to the /ethdev/list command. I also added the total devices and available devices via the DPDK APIs.

{
	"/ethdev/list": {
		"ports": [{
			"port": 0,
			"duplex": "Full",
			"state": "UP",
			"rate": 40000
		}, {
			"port": 1,
			"duplex": "Full",
			"state": "UP",
			"rate": 100000
		}, {
			"port": 2,
			"duplex": "Full",
			"state": "UP",
			"rate": 40000
		}, {
			"port": 3,
			"duplex": "Full",
			"state": "UP",
			"rate": 100000
		}],
		"avail": 4,
		"total": 4
	}
}

The next one is the format of the /ethdev/stats are not very usable as most of the fields are not supported in drivers.

{
	"/ethdev/stats": {
		"rx_good_packets": 0,
		"tx_good_packets": 0,
		"rx_good_bytes": 0,
		"tx_good_bytes": 0,
		"rx_missed_errors": 0,
		"rx_errors": 0,
		"tx_errors": 0,
		"rx_mbuf_allocation_errors": 0,
		"rx_q0packets": 0,
		"rx_q0bytes": 0,
		"rx_q0errors": 0,
		"tx_q0packets": 0,
		"tx_q0bytes": 0,
		"rx_unicast_packets": 0,
		"rx_multicast_packets": 0,
		"rx_broadcast_packets": 0,
		"rx_dropped_packets": 0,
		"rx_unknown_protocol_packets": 5,
		"tx_unicast_packets": 0,
		"tx_multicast_packets": 0,
		"tx_broadcast_packets": 0,
		"tx_dropped_packets": 0,
		"tx_link_down_dropped": 0,
		"rx_crc_errors": 0,
		"rx_illegal_byte_errors": 0,
		"rx_error_bytes": 0,
		"mac_local_errors": 0,
		"mac_remote_errors": 0,
		"rx_length_errors": 0,
		"tx_xon_packets": 0,
		"rx_xon_packets": 0,
		"tx_xoff_packets": 0,
		"rx_xoff_packets": 0,
		"rx_size_64_packets": 0,
		"rx_size_65_to_127_packets": 5,
		"rx_size_128_to_255_packets": 0,
		"rx_size_256_to_511_packets": 0,
		"rx_size_512_to_1023_packets": 0,
		"rx_size_1024_to_1522_packets": 0,
		"rx_size_1523_to_max_packets": 0,
		"rx_undersized_errors": 0,
		"rx_oversize_errors": 0,
		"rx_mac_short_dropped": 0,
		"rx_fragmented_errors": 0,
		"rx_jabber_errors": 0,
		"tx_size_64_packets": 0,
		"tx_size_65_to_127_packets": 6,
		"tx_size_128_to_255_packets": 0,
		"tx_size_256_to_511_packets": 0,
		"tx_size_512_to_1023_packets": 0,
		"tx_size_1024_to_1522_packets": 0,
		"tx_size_1523_to_max_packets": 0,
		"rx_flow_director_atr_match_packets": 0,
		"rx_flow_director_sb_match_packets": 0,
		"tx_low_power_idle_status": 0,
		"rx_low_power_idle_status": 0,
		"tx_low_power_idle_count": 0,
		"rx_low_power_idle_count": 0,
		"rx_priority0_xon_packets": 0,
		"rx_priority1_xon_packets": 0,
		"rx_priority2_xon_packets": 0,
		"rx_priority3_xon_packets": 0,
		"rx_priority4_xon_packets": 0,
		"rx_priority5_xon_packets": 0,
		"rx_priority6_xon_packets": 0,
		"rx_priority7_xon_packets": 0,
		"rx_priority0_xoff_packets": 0,
		"rx_priority1_xoff_packets": 0,
		"rx_priority2_xoff_packets": 0,
		"rx_priority3_xoff_packets": 0,
		"rx_priority4_xoff_packets": 0,
		"rx_priority5_xoff_packets": 0,
		"rx_priority6_xoff_packets": 0,
		"rx_priority7_xoff_packets": 0,
		"tx_priority0_xon_packets": 0,
		"tx_priority1_xon_packets": 0,
		"tx_priority2_xon_packets": 0,
		"tx_priority3_xon_packets": 0,
		"tx_priority4_xon_packets": 0,
		"tx_priority5_xon_packets": 0,
		"tx_priority6_xon_packets": 0,
		"tx_priority7_xon_packets": 0,
		"tx_priority0_xoff_packets": 0,
		"tx_priority1_xoff_packets": 0,
		"tx_priority2_xoff_packets": 0,
		"tx_priority3_xoff_packets": 0,
		"tx_priority4_xoff_packets": 0,
		"tx_priority5_xoff_packets": 0,
		"tx_priority6_xoff_packets": 0,
		"tx_priority7_xoff_packets": 0,
		"tx_priority0_xon_to_xoff_packets": 0,
		"tx_priority1_xon_to_xoff_packets": 0,
		"tx_priority2_xon_to_xoff_packets": 0,
		"tx_priority3_xon_to_xoff_packets": 0,
		"tx_priority4_xon_to_xoff_packets": 0,
		"tx_priority5_xon_to_xoff_packets": 0,
		"tx_priority6_xon_to_xoff_packets": 0,
		"tx_priority7_xon_to_xoff_packets": 0
	}
}

Most of the fields above are device dependent and very difficult to this data from device to device. The array based information like queue stats needs to be an array in json. The packet size counters are great, but not all of the ports support hardware versions. I would like to see us add these to all PMDs even if we have to do them in software and then added to the basic /ethdev/stats output.

I create a simpler layout with just the basic values from the dpdk stats get API. Even the queue data is not valid for most ports, but I hope we can add software support to drivers that do not have hardware queue stats. The following is much easier to parse and use then the above format. Note I added the portid value to structure. I used my own names for the fields, just easier to type, but we can use the ones above.
{
    “/ethdev/stats”: {
	"portid": 0,
	"ipackets": 0,
	"opackets": 0,
	"ibytes": 0,
	"obytes": 0,
	"imissed": 0,
	"ierrors": 0,
	"oerrors": 0,
	"rx_nombuf": 0,
	"q_ipackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
	"q_opackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
	"q_ibytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
	"q_obytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
	"q_errors": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    }
}

If we need device specific data then we need to create a new command to dump this data instead e.g. ‘/ethdev/xstats,X'.

—Keith
  
Morten Brørup April 2, 2020, 8:30 a.m. UTC | #4
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, April 1, 2020 6:16 PM
> 
> On Wed, Apr 01, 2020 at 05:42:21PM +0200, David Marchand wrote:
> > Hello,
> >
> > On Thu, Mar 19, 2020 at 6:35 PM Ciara Power <ciara.power@intel.com>
> wrote:
> > >
> > > This patchset extensively reworks the telemetry library adding new
> > > functionality and simplifying much of the existing code, while
> > > maintaining backward compatibility.

[...]
 
> > Is there a reason to keep a separate telemetry library rather than
> > integrate this framework into EAL?
> >
> No reason this could not be done, however, since telemetry library is
> already separate, and EAL is already pretty crowded, I think keeping
> this
> separate might lead to easier maintenance.
> 
> However, if people generally prefer it just merged into EAL, that can
> be
> done also.
> 

No! EAL is the Environment Abstraction Layer. EAL should only provide the common abstraction interface for the different hardware/hypervisors and operating systems that may lie beneath the DPDK application, and nothing else.

If there is consensus that everyone absolutely needs some feature, which is not an abstraction of the underlying execution environment, it should be in a "Common" (or "Framework" or "Support") library instead.

EAL is already way too bloated.

Take Service Cores for instance: It doesn't provide a shim for the underlying execution environment; it provides a process scheduling framework, which is optional for the DPDK application to use or not.

The same goes for the Trace library. Not a shim, but a framework library.

Logging is even worse: It logs to a file, but if it was truly an environment abstraction, it would log to the Event Log on Windows. In other words... Logging is not implemented as an environment abstraction, but at the preference of its implementer. I would consider it a core/framework library, not an EAL library.


Med venlig hilsen / kind regards
- Morten Brørup
  
Thomas Monjalon April 2, 2020, 9:38 a.m. UTC | #5
02/04/2020 10:30, Morten Brørup:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Wednesday, April 1, 2020 6:16 PM
> > 
> > On Wed, Apr 01, 2020 at 05:42:21PM +0200, David Marchand wrote:
> > > Hello,
> > >
> > > On Thu, Mar 19, 2020 at 6:35 PM Ciara Power <ciara.power@intel.com>
> > wrote:
> > > >
> > > > This patchset extensively reworks the telemetry library adding new
> > > > functionality and simplifying much of the existing code, while
> > > > maintaining backward compatibility.
> 
> [...]
>  
> > > Is there a reason to keep a separate telemetry library rather than
> > > integrate this framework into EAL?
> > >
> > No reason this could not be done, however, since telemetry library is
> > already separate, and EAL is already pretty crowded, I think keeping
> > this
> > separate might lead to easier maintenance.
> > 
> > However, if people generally prefer it just merged into EAL, that can
> > be
> > done also.
> > 
> 
> No! EAL is the Environment Abstraction Layer. EAL should only provide the common abstraction interface for the different hardware/hypervisors and operating systems that may lie beneath the DPDK application, and nothing else.
> 
> If there is consensus that everyone absolutely needs some feature, which is not an abstraction of the underlying execution environment, it should be in a "Common" (or "Framework" or "Support") library instead.
> 
> EAL is already way too bloated.

I agree. We should move some features from EAL to a separate library.

> Take Service Cores for instance: It doesn't provide a shim for the underlying execution environment; it provides a process scheduling framework, which is optional for the DPDK application to use or not.
> 
> The same goes for the Trace library. Not a shim, but a framework library.
> 
> Logging is even worse: It logs to a file, but if it was truly an environment abstraction, it would log to the Event Log on Windows. In other words... Logging is not implemented as an environment abstraction, but at the preference of its implementer. I would consider it a core/framework library, not an EAL library.

Logging should be an OS abstraction, yes.
So logging must stay in EAL.
  
Bruce Richardson April 2, 2020, 10:09 a.m. UTC | #6
On Wed, Apr 01, 2020 at 04:48:21PM +0000, Wiles, Keith wrote:
> 
> 
> > On Mar 19, 2020, at 12:18 PM, Ciara Power <ciara.power@intel.com>
> > wrote:
> > 
> > This patchset extensively reworks the telemetry library adding new
> > functionality and simplifying much of the existing code, while
> > maintaining backward compatibility.
> > 
> > This work is based on the previously sent RFC for a "process info"
> > library: https://patchwork.dpdk.org/project/dpdk/list/?series=7741
> > However, rather than creating a new library, this patchset takes that
> > work and merges it into the existing telemetry library, as mentioned
> > above.
> > 
> > The telemetry library as shipped in 19.11 is based upon the metrics
> > library and outputs all statistics based on that as a source. However,
> > this limits the telemetry output to only port-level statistics
> > information, rather than allowing it to be used as a general scheme for
> > telemetry information across all DPDK libraries.
> > 
> > With this patchset applied, rather than the telemetry library being
> > responsible for pulling ethdev stats and pushing them into the metrics
> > library for retrieval later, each library e.g. ethdev, rawdev, and even
> > the metrics library itself (for backwards compatiblity) now handle
> > their own stats.  Any library or app can register a callback function
> > with telemetry, which will be called if requested by the client
> > connected via the telemetry socket. The callback function in the
> > library/app then formats its stats, or other data, into a JSON string,
> > and returns it to telemetry to be sent to the client.
> > 
> > To maintain backward compatibility, e.g. to allow the dpdk telemetry
> > collectd plugin to continue to work, some of the existing telemetry
> > code is kept, but is moved into the metrics library, and callbacks are
> > registered with telemetry for the legacy commands that were supported
> > previously.
> > 
> > The new version of the library, apart from the legacy interface support
> > for backward compatibility, does not have an external dependency on the
> > Jansson library, allowing the library to be enabled by default.
> > 
> > Note: In this version of the patchset, telemetry output is provided by
> > the ethdev, rawdev and eal libraries, but this may be expanded further
> > in later versions which are planned ahead of the merge deadline for
> > 20.05
> > 
> 
Thanks for the feedback Keith, agree with all your points. Some further
comments inline below. 

NOTE: We hope to send out a V2 very soon, so not all feedback about new
commands may make it into that V2, but they should be able to be added
later - either in V3 or subsequent patch. [We'll try and get at least some
of them in for V2, though.]

/Bruce

> I am looking at the patch and noticed we are adding new lines to the JSON output, they are not required. The spaces in the text could be removed as well as it will reduce the amount of data being sent. The output can be put into a JSON formatter to see the data better if needed.
> 

The output from the new telemetry should not include any spaces. The
whitespace in the output of the script shown comes from python, since it
uses the json python library to parse (and therefore verify as correct
json) the received info, and then uses json.dumps to print it out.

+        reply = json.loads(fd.recv(1024 * 12).decode())
+        print(json.dumps(reply))

You can tweak the format you'd like the output displayed in by adding some
more paramters to the json.dumps() call.

As for the legacy support, it will include some whitespace as we have kept
the output byte-for-byte idential as far as possible to avoid any
compatibilty issues.

> Some required information I believe needs to be added to the patch.
> - Command to get the Link status of a port Speed/Duplex/LinkState via a new command ‘/ethdev/linkstate,X’ or I added it to the /ethdev/list command. I also added the total devices and available devices via the DPDK APIs.
> 

I think a linkstate command is a good idea. We'll see about adding it.

> {
> 	"/ethdev/list": {
> 		"ports": [{
> 			"port": 0,
> 			"duplex": "Full",
> 			"state": "UP",
> 			"rate": 40000
> 		}, {
> 			"port": 1,
> 			"duplex": "Full",
> 			"state": "UP",
> 			"rate": 100000
> 		}, {
> 			"port": 2,
> 			"duplex": "Full",
> 			"state": "UP",
> 			"rate": 40000
> 		}, {
> 			"port": 3,
> 			"duplex": "Full",
> 			"state": "UP",
> 			"rate": 100000
> 		}],
> 		"avail": 4,
> 		"total": 4
> 	}
> }
> 
> The next one is the format of the /ethdev/stats are not very usable as most of the fields are not supported in drivers.
> 
Yes, it's a problem. For V2 we are going to rename this to xstats, so that
the standard ethdev stats can be added as a separate command. We did xstats
originally because that was what was done previously in telemetry. We have
planned to add standard stats in future.

> {
> 	"/ethdev/stats": {
> 		"rx_good_packets": 0,
> 		"tx_good_packets": 0,
> 		"rx_good_bytes": 0,
> 		"tx_good_bytes": 0,
> 		"rx_missed_errors": 0,
> 		"rx_errors": 0,
> 		"tx_errors": 0,
> 		"rx_mbuf_allocation_errors": 0,
> 		"rx_q0packets": 0,
> 		"rx_q0bytes": 0,
> 		"rx_q0errors": 0,
> 		"tx_q0packets": 0,
> 		"tx_q0bytes": 0,
> 		"rx_unicast_packets": 0,
> 		"rx_multicast_packets": 0,
> 		"rx_broadcast_packets": 0,
> 		"rx_dropped_packets": 0,
> 		"rx_unknown_protocol_packets": 5,
> 		"tx_unicast_packets": 0,
> 		"tx_multicast_packets": 0,
> 		"tx_broadcast_packets": 0,
> 		"tx_dropped_packets": 0,
> 		"tx_link_down_dropped": 0,
> 		"rx_crc_errors": 0,
> 		"rx_illegal_byte_errors": 0,
> 		"rx_error_bytes": 0,
> 		"mac_local_errors": 0,
> 		"mac_remote_errors": 0,
> 		"rx_length_errors": 0,
> 		"tx_xon_packets": 0,
> 		"rx_xon_packets": 0,
> 		"tx_xoff_packets": 0,
> 		"rx_xoff_packets": 0,
> 		"rx_size_64_packets": 0,
> 		"rx_size_65_to_127_packets": 5,
> 		"rx_size_128_to_255_packets": 0,
> 		"rx_size_256_to_511_packets": 0,
> 		"rx_size_512_to_1023_packets": 0,
> 		"rx_size_1024_to_1522_packets": 0,
> 		"rx_size_1523_to_max_packets": 0,
> 		"rx_undersized_errors": 0,
> 		"rx_oversize_errors": 0,
> 		"rx_mac_short_dropped": 0,
> 		"rx_fragmented_errors": 0,
> 		"rx_jabber_errors": 0,
> 		"tx_size_64_packets": 0,
> 		"tx_size_65_to_127_packets": 6,
> 		"tx_size_128_to_255_packets": 0,
> 		"tx_size_256_to_511_packets": 0,
> 		"tx_size_512_to_1023_packets": 0,
> 		"tx_size_1024_to_1522_packets": 0,
> 		"tx_size_1523_to_max_packets": 0,
> 		"rx_flow_director_atr_match_packets": 0,
> 		"rx_flow_director_sb_match_packets": 0,
> 		"tx_low_power_idle_status": 0,
> 		"rx_low_power_idle_status": 0,
> 		"tx_low_power_idle_count": 0,
> 		"rx_low_power_idle_count": 0,
> 		"rx_priority0_xon_packets": 0,
> 		"rx_priority1_xon_packets": 0,
> 		"rx_priority2_xon_packets": 0,
> 		"rx_priority3_xon_packets": 0,
> 		"rx_priority4_xon_packets": 0,
> 		"rx_priority5_xon_packets": 0,
> 		"rx_priority6_xon_packets": 0,
> 		"rx_priority7_xon_packets": 0,
> 		"rx_priority0_xoff_packets": 0,
> 		"rx_priority1_xoff_packets": 0,
> 		"rx_priority2_xoff_packets": 0,
> 		"rx_priority3_xoff_packets": 0,
> 		"rx_priority4_xoff_packets": 0,
> 		"rx_priority5_xoff_packets": 0,
> 		"rx_priority6_xoff_packets": 0,
> 		"rx_priority7_xoff_packets": 0,
> 		"tx_priority0_xon_packets": 0,
> 		"tx_priority1_xon_packets": 0,
> 		"tx_priority2_xon_packets": 0,
> 		"tx_priority3_xon_packets": 0,
> 		"tx_priority4_xon_packets": 0,
> 		"tx_priority5_xon_packets": 0,
> 		"tx_priority6_xon_packets": 0,
> 		"tx_priority7_xon_packets": 0,
> 		"tx_priority0_xoff_packets": 0,
> 		"tx_priority1_xoff_packets": 0,
> 		"tx_priority2_xoff_packets": 0,
> 		"tx_priority3_xoff_packets": 0,
> 		"tx_priority4_xoff_packets": 0,
> 		"tx_priority5_xoff_packets": 0,
> 		"tx_priority6_xoff_packets": 0,
> 		"tx_priority7_xoff_packets": 0,
> 		"tx_priority0_xon_to_xoff_packets": 0,
> 		"tx_priority1_xon_to_xoff_packets": 0,
> 		"tx_priority2_xon_to_xoff_packets": 0,
> 		"tx_priority3_xon_to_xoff_packets": 0,
> 		"tx_priority4_xon_to_xoff_packets": 0,
> 		"tx_priority5_xon_to_xoff_packets": 0,
> 		"tx_priority6_xon_to_xoff_packets": 0,
> 		"tx_priority7_xon_to_xoff_packets": 0
> 	}
> }
> 
> Most of the fields above are device dependent and very difficult to this data from device to device. The array based information like queue stats needs to be an array in json. The packet size counters are great, but not all of the ports support hardware versions. I would like to see us add these to all PMDs even if we have to do them in software and then added to the basic /ethdev/stats output.
> 
> I create a simpler layout with just the basic values from the dpdk stats get API. Even the queue data is not valid for most ports, but I hope we can add software support to drivers that do not have hardware queue stats. The following is much easier to parse and use then the above format. Note I added the portid value to structure. I used my own names for the fields, just easier to type, but we can use the ones above.
> {
>     “/ethdev/stats”: {
> 	"portid": 0,
> 	"ipackets": 0,
> 	"opackets": 0,
> 	"ibytes": 0,
> 	"obytes": 0,
> 	"imissed": 0,
> 	"ierrors": 0,
> 	"oerrors": 0,
> 	"rx_nombuf": 0,
> 	"q_ipackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
> 	"q_opackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
> 	"q_ibytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
> 	"q_obytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
> 	"q_errors": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>     }
> }
> 
> If we need device specific data then we need to create a new command to dump this data instead e.g. ‘/ethdev/xstats,X'.

Yep, +1.
Stay tuned for V2! :-)

> 
> —Keith