Message ID | 20181016155802.2067-1-kevin.laatz@intel.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CC2AD1B128; Tue, 16 Oct 2018 17:57:43 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 05BC81B10A for <dev@dpdk.org>; Tue, 16 Oct 2018 17:57:41 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Oct 2018 08:57:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,389,1534834800"; d="scan'208";a="92465317" Received: from silpixa00397517.ir.intel.com (HELO silpixa00397517.ger.corp.intel.com) ([10.237.222.54]) by orsmga003.jf.intel.com with ESMTP; 16 Oct 2018 08:57:38 -0700 From: Kevin Laatz <kevin.laatz@intel.com> To: dev@dpdk.org Cc: harry.van.haaren@intel.com, stephen@networkplumber.org, gaetan.rivet@6wind.com, shreyansh.jain@nxp.com, thomas@monjalon.net, mattias.ronnblom@ericsson.com, bruce.richardson@intel.com, Kevin Laatz <kevin.laatz@intel.com> Date: Tue, 16 Oct 2018 16:57:49 +0100 Message-Id: <20181016155802.2067-1-kevin.laatz@intel.com> X-Mailer: git-send-email 2.9.5 In-Reply-To: <20181011165837.81030-1-kevin.laatz@intel.com> References: <20181011165837.81030-1-kevin.laatz@intel.com> Subject: [dpdk-dev] [PATCH v5 00/13] introduce telemetry library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Series |
introduce telemetry library
|
|
Message
Kevin Laatz
Oct. 16, 2018, 3:57 p.m. UTC
This patchset introduces a Telemetry library for DPDK Service Assurance. This library provides an easy way to query DPDK Ethdev metrics. The telemetry library provides a method for a service assurance component to retrieve metrics from a DPDK packet forwarding application. Communicating from the service assurance component to DPDK is done using a UNIX domain socket, passing a JSON formatted string. A reply is sent (again a JSON formatted string) of the current DPDK metrics. The telemetry component makes use of the existing rte_metrics library to query values. The values to be transmitted via the telemetry infrastructure must be present in the Metrics library. Currently the ethdev values are pushed to the metrics library, and the queried from there there is an open question on how applications would like this to occur. Currently only ethdev to metrics functionality is implemented, however other subsystems like crypto, eventdev, keepalive etc can use similar mechanisms. Exposing DPDK Telemetry via a socket interface enables service assurance agents like collectd to consume data from DPDK. This is vital for monitoring, fault-detection, and error reporting. A collectd plugin has been created to interact with the DPDK Telemetry component, showing how it can be used in practice. The collectd plugin will be upstreamed to collectd at a later stage. A small python script is provided in ./usertools/telemetry_client.py to quick-start using DPDK Telemetry. Note: Despite opterr being set to 0, --telemetry said to be 'unrecognized' as a startup print. This is a cosmetic issue and will be addressed in the future. --- v2: - Reworked telemetry as part of EAL instead of using vdev (Gaetan) - Refactored rte_telemetry_command (Gaetan) - Added MAINTAINERS file entry (Stephen) - Updated docs to reflect vdev to eal rework - Removed collectd patch from patchset (Thomas) - General code clean up from v1 feedback v3: - Reworked registering with eal and moved to rte_param (Gaetan) - Added BSD implementation for rte_param (Gaetan) - Updated the paths to align with the new runtime file location (Mattias) - Fixed pointer checks to align with the coding style 1.8.1 (Mattias) - Added missing decref's and close's (Mattias) - Fixed runtime issue in Meson (was not recognising flag due to linking) - More general clean up v4: - Added Doxygen comments for rte_param.h (Thomas) - Made eal_get_runtime_dir a public function to use outside of EAL (Thomas) - Reworked telemetry to get path using rte_eal_get_runtime_dir (Thomas) - Fixed checkpatch coding style error v5: - Moved the BUF_SIZE define to fix build (Harry) - Set default config for telemetry to 'n' (Harry) - Improved Doxygen comments (Thomas) - Cleaned up rte_param struct (Thomas) Ciara Power, Brian Archbold and Kevin Laatz (10): telemetry: initial telemetry infrastructure telemetry: add initial connection socket telemetry: add client feature and sockets telemetry: add parser for client socket messages telemetry: update metrics before sending stats telemetry: format json response when sending stats telemetry: add tests for telemetry api telemetry: add ability to disable selftest doc: add telemetry documentation usertools: add client python script for telemetry Kevin Laatz (3): eal: add param register infrastructure eal: make get runtime dir function public build: add dependency on telemetry to apps in meson MAINTAINERS | 5 + app/meson.build | 4 +- app/pdump/meson.build | 2 +- app/proc-info/meson.build | 2 +- app/test-bbdev/meson.build | 2 +- app/test-crypto-perf/meson.build | 2 +- app/test-pmd/meson.build | 2 +- config/common_base | 5 + config/meson.build | 3 + doc/guides/howto/index.rst | 1 + doc/guides/howto/telemetry.rst | 85 + lib/Makefile | 2 + lib/librte_eal/bsdapp/eal/Makefile | 1 + lib/librte_eal/bsdapp/eal/eal.c | 20 +- lib/librte_eal/common/Makefile | 1 + lib/librte_eal/common/eal_filesystem.h | 14 +- lib/librte_eal/common/include/rte_eal.h | 9 + lib/librte_eal/common/include/rte_param.h | 91 ++ lib/librte_eal/common/meson.build | 2 + lib/librte_eal/common/rte_param.c | 47 + lib/librte_eal/linuxapp/eal/Makefile | 1 + lib/librte_eal/linuxapp/eal/eal.c | 20 +- lib/librte_eal/rte_eal_version.map | 2 + lib/librte_telemetry/Makefile | 30 + lib/librte_telemetry/meson.build | 9 + lib/librte_telemetry/rte_telemetry.c | 1810 +++++++++++++++++++++ lib/librte_telemetry/rte_telemetry.h | 48 + lib/librte_telemetry/rte_telemetry_internal.h | 81 + lib/librte_telemetry/rte_telemetry_parser.c | 586 +++++++ lib/librte_telemetry/rte_telemetry_parser.h | 13 + lib/librte_telemetry/rte_telemetry_parser_test.c | 534 ++++++ lib/librte_telemetry/rte_telemetry_parser_test.h | 39 + lib/librte_telemetry/rte_telemetry_socket_tests.h | 36 + lib/librte_telemetry/rte_telemetry_version.map | 7 + lib/meson.build | 3 +- meson.build | 1 + mk/rte.app.mk | 1 + usertools/dpdk-telemetry-client.py | 116 ++ 38 files changed, 3618 insertions(+), 19 deletions(-) create mode 100644 doc/guides/howto/telemetry.rst create mode 100644 lib/librte_eal/common/include/rte_param.h create mode 100644 lib/librte_eal/common/rte_param.c create mode 100644 lib/librte_telemetry/Makefile create mode 100644 lib/librte_telemetry/meson.build create mode 100644 lib/librte_telemetry/rte_telemetry.c create mode 100644 lib/librte_telemetry/rte_telemetry.h create mode 100644 lib/librte_telemetry/rte_telemetry_internal.h create mode 100644 lib/librte_telemetry/rte_telemetry_parser.c create mode 100644 lib/librte_telemetry/rte_telemetry_parser.h create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.c create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.h create mode 100644 lib/librte_telemetry/rte_telemetry_socket_tests.h create mode 100644 lib/librte_telemetry/rte_telemetry_version.map create mode 100644 usertools/dpdk-telemetry-client.py
Comments
Most of the issues I pointed out in v2 of this patchset is still here. On 2018-10-16 17:57, Kevin Laatz wrote: > This patchset introduces a Telemetry library for DPDK Service Assurance. > This library provides an easy way to query DPDK Ethdev metrics. > > The telemetry library provides a method for a service assurance component > to retrieve metrics from a DPDK packet forwarding application. > Communicating from the service assurance component to DPDK is done using a > UNIX domain socket, passing a JSON formatted string. A reply is sent (again > a JSON formatted string) of the current DPDK metrics. > > The telemetry component makes use of the existing rte_metrics library to > query values. The values to be transmitted via the telemetry infrastructure > must be present in the Metrics library. Currently the ethdev values are > pushed to the metrics library, and the queried from there there is an open > question on how applications would like this to occur. Currently only > ethdev to metrics functionality is implemented, however other subsystems > like crypto, eventdev, keepalive etc can use similar mechanisms. > > Exposing DPDK Telemetry via a socket interface enables service assurance > agents like collectd to consume data from DPDK. This is vital for > monitoring, fault-detection, and error reporting. A collectd plugin has > been created to interact with the DPDK Telemetry component, showing how it > can be used in practice. The collectd plugin will be upstreamed to collectd > at a later stage. A small python script is provided in > ./usertools/telemetry_client.py to quick-start using DPDK Telemetry. > > Note: Despite opterr being set to 0, --telemetry said to be 'unrecognized' > as a startup print. This is a cosmetic issue and will be addressed in the > future. > > --- > v2: > - Reworked telemetry as part of EAL instead of using vdev (Gaetan) > - Refactored rte_telemetry_command (Gaetan) > - Added MAINTAINERS file entry (Stephen) > - Updated docs to reflect vdev to eal rework > - Removed collectd patch from patchset (Thomas) > - General code clean up from v1 feedback > > v3: > - Reworked registering with eal and moved to rte_param (Gaetan) > - Added BSD implementation for rte_param (Gaetan) > - Updated the paths to align with the new runtime file location (Mattias) > - Fixed pointer checks to align with the coding style 1.8.1 (Mattias) > - Added missing decref's and close's (Mattias) > - Fixed runtime issue in Meson (was not recognising flag due to linking) > - More general clean up > > v4: > - Added Doxygen comments for rte_param.h (Thomas) > - Made eal_get_runtime_dir a public function to use outside of EAL (Thomas) > - Reworked telemetry to get path using rte_eal_get_runtime_dir (Thomas) > - Fixed checkpatch coding style error > > v5: > - Moved the BUF_SIZE define to fix build (Harry) > - Set default config for telemetry to 'n' (Harry) > - Improved Doxygen comments (Thomas) > - Cleaned up rte_param struct (Thomas) > > Ciara Power, Brian Archbold and Kevin Laatz (10): > telemetry: initial telemetry infrastructure > telemetry: add initial connection socket > telemetry: add client feature and sockets > telemetry: add parser for client socket messages > telemetry: update metrics before sending stats > telemetry: format json response when sending stats > telemetry: add tests for telemetry api > telemetry: add ability to disable selftest > doc: add telemetry documentation > usertools: add client python script for telemetry > > Kevin Laatz (3): > eal: add param register infrastructure > eal: make get runtime dir function public > build: add dependency on telemetry to apps in meson > > MAINTAINERS | 5 + > app/meson.build | 4 +- > app/pdump/meson.build | 2 +- > app/proc-info/meson.build | 2 +- > app/test-bbdev/meson.build | 2 +- > app/test-crypto-perf/meson.build | 2 +- > app/test-pmd/meson.build | 2 +- > config/common_base | 5 + > config/meson.build | 3 + > doc/guides/howto/index.rst | 1 + > doc/guides/howto/telemetry.rst | 85 + > lib/Makefile | 2 + > lib/librte_eal/bsdapp/eal/Makefile | 1 + > lib/librte_eal/bsdapp/eal/eal.c | 20 +- > lib/librte_eal/common/Makefile | 1 + > lib/librte_eal/common/eal_filesystem.h | 14 +- > lib/librte_eal/common/include/rte_eal.h | 9 + > lib/librte_eal/common/include/rte_param.h | 91 ++ > lib/librte_eal/common/meson.build | 2 + > lib/librte_eal/common/rte_param.c | 47 + > lib/librte_eal/linuxapp/eal/Makefile | 1 + > lib/librte_eal/linuxapp/eal/eal.c | 20 +- > lib/librte_eal/rte_eal_version.map | 2 + > lib/librte_telemetry/Makefile | 30 + > lib/librte_telemetry/meson.build | 9 + > lib/librte_telemetry/rte_telemetry.c | 1810 +++++++++++++++++++++ > lib/librte_telemetry/rte_telemetry.h | 48 + > lib/librte_telemetry/rte_telemetry_internal.h | 81 + > lib/librte_telemetry/rte_telemetry_parser.c | 586 +++++++ > lib/librte_telemetry/rte_telemetry_parser.h | 13 + > lib/librte_telemetry/rte_telemetry_parser_test.c | 534 ++++++ > lib/librte_telemetry/rte_telemetry_parser_test.h | 39 + > lib/librte_telemetry/rte_telemetry_socket_tests.h | 36 + > lib/librte_telemetry/rte_telemetry_version.map | 7 + > lib/meson.build | 3 +- > meson.build | 1 + > mk/rte.app.mk | 1 + > usertools/dpdk-telemetry-client.py | 116 ++ > 38 files changed, 3618 insertions(+), 19 deletions(-) > create mode 100644 doc/guides/howto/telemetry.rst > create mode 100644 lib/librte_eal/common/include/rte_param.h > create mode 100644 lib/librte_eal/common/rte_param.c > create mode 100644 lib/librte_telemetry/Makefile > create mode 100644 lib/librte_telemetry/meson.build > create mode 100644 lib/librte_telemetry/rte_telemetry.c > create mode 100644 lib/librte_telemetry/rte_telemetry.h > create mode 100644 lib/librte_telemetry/rte_telemetry_internal.h > create mode 100644 lib/librte_telemetry/rte_telemetry_parser.c > create mode 100644 lib/librte_telemetry/rte_telemetry_parser.h > create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.c > create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.h > create mode 100644 lib/librte_telemetry/rte_telemetry_socket_tests.h > create mode 100644 lib/librte_telemetry/rte_telemetry_version.map > create mode 100644 usertools/dpdk-telemetry-client.py >
Hi Mattias, On 18/10/2018 09:07, Mattias Rönnblom wrote: > Most of the issues I pointed out in v2 of this patchset is still here. Will recheck feedback for the next version. With regards to comments on v2, 3/10, could you please help provide clarification on the below? <Copy from the v2 feedback> On 03/10/2018 20:06, Mattias Rönnblom wrote: > On 2018-10-03 19:36, Kevin Laatz wrote: >> From: Ciara Power <ciara.power@intel.com> >> + >> + if (!telemetry->request_client) { >> + TELEMETRY_LOG_ERR("No client has been chosen to write to"); >> + return -1; >> + } > + >> + if (!json_string) { >> + TELEMETRY_LOG_ERR("Invalid JSON string!"); >> + return -1; >> + } >> + >> + ret = send(telemetry->request_client->fd, >> + json_string, strlen(json_string), 0); > > How would this code handle a partial success (as in, for example, half > of the string fits the socket buffer)? In not, maybe switching over to > a SOCK_SEQPACKET AF_UNIX socket would be the best way around it. > Is the suggestion here simply to use socket(AF_UNIX, SOCK_SEQPACKET) instead of (AF_UNIX, SOCK_STREAM) ? > >> + >> + buffer_read = read(telemetry->accept_fd, buf, BUF_SIZE-1); > > This and the below code seem to assume that read() returns one and > only one message, but on a SOCK_STREAM, there is no such thing as a > message. It's a byte stream, and you need to provide your own framing, > or have an application protocol which allows only have one outstanding > request. If you do the latter, you still need to allow for "short" > (partial) read()s (i.e. re-read() until done). > Will the above solve this part of the problem too? Best regards, Kevin
On 2018-10-19 12:16, Laatz, Kevin wrote: > On 03/10/2018 20:06, Mattias Rönnblom wrote: >> On 2018-10-03 19:36, Kevin Laatz wrote: >>> From: Ciara Power <ciara.power@intel.com> >>> + >>> + if (!telemetry->request_client) { >>> + TELEMETRY_LOG_ERR("No client has been chosen to write to"); >>> + return -1; >>> + } > + >>> + if (!json_string) { >>> + TELEMETRY_LOG_ERR("Invalid JSON string!"); >>> + return -1; >>> + } >>> + >>> + ret = send(telemetry->request_client->fd, >>> + json_string, strlen(json_string), 0); >> >> How would this code handle a partial success (as in, for example, half >> of the string fits the socket buffer)? In not, maybe switching over to >> a SOCK_SEQPACKET AF_UNIX socket would be the best way around it. >> > Is the suggestion here simply to use socket(AF_UNIX, SOCK_SEQPACKET) > instead of (AF_UNIX, SOCK_STREAM) ? That would be the most straight-forward to the problem, I think. Linux' SOCK_SEQPACKET implementation has problems with really large messages (since a per-message linear buffer is allocated), but I'm guessing these messages are not in the hundreds-of-kb range. >> >>> + >>> + buffer_read = read(telemetry->accept_fd, buf, BUF_SIZE-1); >> >> This and the below code seem to assume that read() returns one and >> only one message, but on a SOCK_STREAM, there is no such thing as a >> message. It's a byte stream, and you need to provide your own framing, >> or have an application protocol which allows only have one outstanding >> request. If you do the latter, you still need to allow for "short" >> (partial) read()s (i.e. re-read() until done). >> > Will the above solve this part of the problem too? > Yes. The kernel will delivered the application (at most) one message, in its entirety.
Hi Mattias, Thanks for the input and clarification. I will include these changes in the v6. Regards, Kevin On 22/10/2018 08:11, Mattias Rönnblom wrote: > On 2018-10-19 12:16, Laatz, Kevin wrote: >> On 03/10/2018 20:06, Mattias Rönnblom wrote: >>> On 2018-10-03 19:36, Kevin Laatz wrote: >>>> From: Ciara Power <ciara.power@intel.com> >>>> + >>>> + if (!telemetry->request_client) { >>>> + TELEMETRY_LOG_ERR("No client has been chosen to write to"); >>>> + return -1; >>>> + } > + >>>> + if (!json_string) { >>>> + TELEMETRY_LOG_ERR("Invalid JSON string!"); >>>> + return -1; >>>> + } >>>> + >>>> + ret = send(telemetry->request_client->fd, >>>> + json_string, strlen(json_string), 0); >>> >>> How would this code handle a partial success (as in, for example, >>> half of the string fits the socket buffer)? In not, maybe switching >>> over to a SOCK_SEQPACKET AF_UNIX socket would be the best way around >>> it. >>> >> Is the suggestion here simply to use socket(AF_UNIX, SOCK_SEQPACKET) >> instead of (AF_UNIX, SOCK_STREAM) ? > > That would be the most straight-forward to the problem, I think. > Linux' SOCK_SEQPACKET implementation has problems with really large > messages (since a per-message linear buffer is allocated), but I'm > guessing these messages are not in the hundreds-of-kb range. > >>> >>>> + >>>> + buffer_read = read(telemetry->accept_fd, buf, BUF_SIZE-1); >>> >>> This and the below code seem to assume that read() returns one and >>> only one message, but on a SOCK_STREAM, there is no such thing as a >>> message. It's a byte stream, and you need to provide your own >>> framing, or have an application protocol which allows only have one >>> outstanding request. If you do the latter, you still need to allow >>> for "short" (partial) read()s (i.e. re-read() until done). >>> >> Will the above solve this part of the problem too? >> > > Yes. The kernel will delivered the application (at most) one message, > in its entirety.