[RFC,0/6] replace telemetry with process_info

Message ID 20191205173128.64543-1-ciara.power@intel.com


Power, Ciara Dec. 5, 2019, 5:31 p.m. UTC
  From: Bruce Richardson <bruce.richardson@intel.com>

This patchset proposes a new library, called "process-info" for now, to
replace the existing telemetry library in DPDK. (Name subject to change
if someone can propose a better one).

The existing telemetry library provides useful capabilities if used:
  - Creates a unix socket on the system to allow external programs
    to connect and gather stats about the DPDK process.
  - Supports outputting the xstats for various network cards on the
    system.
  - Can be used to output any other information exported to the metrics
    library, e.g. by applications.
  - Uses JSON message format, which is widely supported by other
    languages and systems.
  - Is supported by a plugin to collectd.

However, the library also has some issues and limitations that could be
improved upon:
  - Has a dependency on libjansson for JSON processing, so is disabled
    by default.
  - Tied entirely to the metrics library for statistics.
  - No support for sending non-stats data, e.g. something as simple as
    DPDK version string.
  - All data gathering functions are in the library itself, which also
    means it has to be built after the libraries it calls into.
  - No support for libraries or drivers to present their own
    information via the library.

We therefore propose to keep the good points of the existing library,
but change the way things work to rectify the downsides.
This leads to the following design choices in the library:
  - Keep the existing idea of using a unix socket for connection (just
    simplifying the connection handling).
  - We would like to use JSON format, where possible, but the jansson
    library dependency is problematic. However, creating JSON-encoded
    data is easier than trying to parse JSON in C code, so we can keep
    the JSON output format for processing by e.g. collectd and other
    tools, while simplifying the input to be plain text commands:
	- Commands in this RFC are designed to all start with "/" for
	- Any parameters to the commands, e.g. the specific port to get
	  stats for, are placed after a comma ","
  - Have the library only handle socket creation and input handling.
    All data gathering should be provided by functions external to the
    library registered by other components, e.g. have ethdev library
    provide the function to query NIC xstats, etc.
  - Have the library directly initialized by EAL by default, so that
    unless an app explicitly does not want the support, monitoring is
    available on all DPDK apps.
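
As a rough illustration of the design choices above (not the code in
these patches), the command handling can be sketched in Python. All
names here are hypothetical: register_command(), handle_request() and
the "/example/xstats" command are made up for the example, the shape of
the JSON reply is an assumption, and socketpair() merely stands in for a
client connected to the library's unix socket.

```python
import json
import socket

# Hypothetical registry mapping a command string to a callback.
# These names are illustrative only, not the API proposed in the patches.
_callbacks = {}


def register_command(cmd, fn):
    """Register a data-gathering callback for a command starting with '/'."""
    if not cmd.startswith("/"):
        raise ValueError("commands are expected to start with '/'")
    _callbacks[cmd] = fn


def handle_request(text):
    """Split 'command,params' at the first comma and return a JSON reply."""
    cmd, _, params = text.strip().partition(",")
    fn = _callbacks.get(cmd)
    if fn is None:
        return json.dumps({"error": "unknown command", "command": cmd})
    # Assumed reply shape: the command name keys the callback's data.
    return json.dumps({cmd: fn(params)})


def fake_xstats(params):
    """Stand-in for a callback that a component such as ethdev might provide."""
    port = int(params) if params else 0
    return {"port": port, "rx_good_packets": 0}  # placeholder values


# The component registers its own callback; the core only dispatches.
register_command("/example/xstats", fake_xstats)


def serve_one(conn):
    """Read one plain-text request from a connected socket, reply in JSON."""
    request = conn.recv(1024).decode()
    conn.sendall(handle_request(request).encode())


if __name__ == "__main__":
    # socketpair() stands in for a client connected to the unix socket.
    client, server = socket.socketpair()
    client.sendall(b"/example/xstats,0")
    serve_one(server)
    print(client.recv(1024).decode())
    client.close()
    server.close()
```

The point of the sketch is the split of responsibilities: the core only
parses plain-text input and dispatches, while the JSON payload comes
from whichever component registered the callback.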

The obvious question that remains to be answered here is: "why a new
library rather than just fixing the old one?"

The answer is that two requirements conflict. The final two design
choices above require that the library be built early in the build,
as other libraries will call into it to register callbacks. Keeping
backward compatibility, e.g. for use with the collectd plugin,
requires that the existing library code be kept around and built - as
it is now - at the end of the build process, since it calls into other
DPDK libraries. We therefore cannot have one library that meets both
requirements, hence the replacement, which allows us to maintain
backward compatibility by just leaving the old lib in place until
e.g. 20.11.

This is also why the new library is called "process_info", since the
name "telemetry" is already taken. Suggestions for a better name are
welcome.

Bruce Richardson (4):
  process-info: introduce process-info library
  eal: integrate process-info library
  usertools: add process-info python script
  ethdev: add callback support for process-info

Ciara Power (2):
  rawdev: add callback support for process-info
  examples/l3fwd-power: enable use of process-info

 config/common_base                            |   5 +
 examples/l3fwd-power/main.c                   |  83 ++-----
 lib/Makefile                                  |   3 +-
 lib/librte_eal/common/eal_common_options.c    |  75 ++++++
 lib/librte_eal/common/eal_internal_cfg.h      |   1 +
 lib/librte_eal/common/eal_options.h           |   5 +
 lib/librte_eal/freebsd/eal/Makefile           |   1 +
 lib/librte_eal/freebsd/eal/eal.c              |  14 ++
 lib/librte_eal/linux/eal/Makefile             |   1 +
 lib/librte_eal/linux/eal/eal.c                |  15 ++
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_ethdev/Makefile                    |   2 +-
 lib/librte_ethdev/rte_ethdev.c                |  73 ++++++
 lib/librte_process_info/Makefile              |  26 ++
 lib/librte_process_info/meson.build           |   8 +
 lib/librte_process_info/process_info.c        | 231 ++++++++++++++++++
 lib/librte_process_info/rte_process_info.h    |  25 ++
 .../rte_process_info_version.map              |   6 +
 lib/librte_rawdev/Makefile                    |   3 +-
 lib/librte_rawdev/meson.build                 |   1 +
 lib/librte_rawdev/rte_rawdev.c                |  82 +++++++
 lib/meson.build                               |   1 +
 mk/rte.app.mk                                 |   1 +
 usertools/test_new_telemetry.py               |  28 +++
 24 files changed, 630 insertions(+), 62 deletions(-)
 create mode 100644 lib/librte_process_info/Makefile
 create mode 100644 lib/librte_process_info/meson.build
 create mode 100644 lib/librte_process_info/process_info.c
 create mode 100644 lib/librte_process_info/rte_process_info.h
 create mode 100644 lib/librte_process_info/rte_process_info_version.map
 create mode 100755 usertools/test_new_telemetry.py