mbox series

[v5,00/13] introduce telemetry library

Message ID 20181016155802.2067-1-kevin.laatz@intel.com (mailing list archive)
Headers
Series introduce telemetry library |

Message

Kevin Laatz Oct. 16, 2018, 3:57 p.m. UTC
  This patchset introduces a Telemetry library for DPDK Service Assurance.
This library provides an easy way to query DPDK Ethdev metrics.

The telemetry library provides a method for a service assurance component
to retrieve metrics from a DPDK packet forwarding application.
Communicating from the service assurance component to DPDK is done using a
UNIX domain socket, passing a JSON formatted string. A reply is sent (again
a JSON formatted string) of the current DPDK metrics.

The telemetry component makes use of the existing rte_metrics library to
query values. The values to be transmitted via the telemetry infrastructure
must be present in the Metrics library. Currently the ethdev values are
pushed to the metrics library, and the queried from there  there is an open
question on how applications would like this to occur. Currently only
ethdev to metrics functionality is implemented, however other subsystems
like crypto, eventdev, keepalive etc can use similar mechanisms.

Exposing DPDK Telemetry via a socket interface enables service assurance
agents like collectd to consume data from DPDK. This is vital for
monitoring, fault-detection, and error reporting. A collectd plugin has
been created to interact with the DPDK Telemetry component, showing how it
can be used in practice. The collectd plugin will be upstreamed to collectd
at a later stage.  A small python script is provided in
./usertools/telemetry_client.py to quick-start using DPDK Telemetry.

Note: Despite opterr being set to 0, --telemetry said to be 'unrecognized'
as a startup print. This is a cosmetic issue and will be addressed in the
future.

---
v2:
   - Reworked telemetry as part of EAL instead of using vdev (Gaetan)
   - Refactored rte_telemetry_command (Gaetan)
   - Added MAINTAINERS file entry (Stephen)
   - Updated docs to reflect vdev to eal rework
   - Removed collectd patch from patchset (Thomas)
   - General code clean up from v1 feedback

v3:
  - Reworked registering with eal and moved to rte_param (Gaetan)
  - Added BSD implementation for rte_param (Gaetan)
  - Updated the paths to align with the new runtime file location (Mattias)
  - Fixed pointer checks to align with the coding style 1.8.1 (Mattias)
  - Added missing decref's and close's (Mattias)
  - Fixed runtime issue in Meson (was not recognising flag due to linking)
  - More general clean up

v4:
  - Added Doxygen comments for rte_param.h (Thomas)
  - Made eal_get_runtime_dir a public function to use outside of EAL (Thomas)
  - Reworked telemetry to get path using rte_eal_get_runtime_dir (Thomas)
  - Fixed checkpatch coding style error

v5:
  - Moved the BUF_SIZE define to fix build (Harry)
  - Set default config for telemetry to 'n' (Harry)
  - Improved Doxygen comments (Thomas)
  - Cleaned up rte_param struct (Thomas)

Ciara Power, Brian Archbold and Kevin Laatz (10):
  telemetry: initial telemetry infrastructure
  telemetry: add initial connection socket
  telemetry: add client feature and sockets
  telemetry: add parser for client socket messages
  telemetry: update metrics before sending stats
  telemetry: format json response when sending stats
  telemetry: add tests for telemetry api
  telemetry: add ability to disable selftest
  doc: add telemetry documentation
  usertools: add client python script for telemetry

Kevin Laatz (3):
  eal: add param register infrastructure
  eal: make get runtime dir function public
  build: add dependency on telemetry to apps in meson

 MAINTAINERS                                       |    5 +
 app/meson.build                                   |    4 +-
 app/pdump/meson.build                             |    2 +-
 app/proc-info/meson.build                         |    2 +-
 app/test-bbdev/meson.build                        |    2 +-
 app/test-crypto-perf/meson.build                  |    2 +-
 app/test-pmd/meson.build                          |    2 +-
 config/common_base                                |    5 +
 config/meson.build                                |    3 +
 doc/guides/howto/index.rst                        |    1 +
 doc/guides/howto/telemetry.rst                    |   85 +
 lib/Makefile                                      |    2 +
 lib/librte_eal/bsdapp/eal/Makefile                |    1 +
 lib/librte_eal/bsdapp/eal/eal.c                   |   20 +-
 lib/librte_eal/common/Makefile                    |    1 +
 lib/librte_eal/common/eal_filesystem.h            |   14 +-
 lib/librte_eal/common/include/rte_eal.h           |    9 +
 lib/librte_eal/common/include/rte_param.h         |   91 ++
 lib/librte_eal/common/meson.build                 |    2 +
 lib/librte_eal/common/rte_param.c                 |   47 +
 lib/librte_eal/linuxapp/eal/Makefile              |    1 +
 lib/librte_eal/linuxapp/eal/eal.c                 |   20 +-
 lib/librte_eal/rte_eal_version.map                |    2 +
 lib/librte_telemetry/Makefile                     |   30 +
 lib/librte_telemetry/meson.build                  |    9 +
 lib/librte_telemetry/rte_telemetry.c              | 1810 +++++++++++++++++++++
 lib/librte_telemetry/rte_telemetry.h              |   48 +
 lib/librte_telemetry/rte_telemetry_internal.h     |   81 +
 lib/librte_telemetry/rte_telemetry_parser.c       |  586 +++++++
 lib/librte_telemetry/rte_telemetry_parser.h       |   13 +
 lib/librte_telemetry/rte_telemetry_parser_test.c  |  534 ++++++
 lib/librte_telemetry/rte_telemetry_parser_test.h  |   39 +
 lib/librte_telemetry/rte_telemetry_socket_tests.h |   36 +
 lib/librte_telemetry/rte_telemetry_version.map    |    7 +
 lib/meson.build                                   |    3 +-
 meson.build                                       |    1 +
 mk/rte.app.mk                                     |    1 +
 usertools/dpdk-telemetry-client.py                |  116 ++
 38 files changed, 3618 insertions(+), 19 deletions(-)
 create mode 100644 doc/guides/howto/telemetry.rst
 create mode 100644 lib/librte_eal/common/include/rte_param.h
 create mode 100644 lib/librte_eal/common/rte_param.c
 create mode 100644 lib/librte_telemetry/Makefile
 create mode 100644 lib/librte_telemetry/meson.build
 create mode 100644 lib/librte_telemetry/rte_telemetry.c
 create mode 100644 lib/librte_telemetry/rte_telemetry.h
 create mode 100644 lib/librte_telemetry/rte_telemetry_internal.h
 create mode 100644 lib/librte_telemetry/rte_telemetry_parser.c
 create mode 100644 lib/librte_telemetry/rte_telemetry_parser.h
 create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.c
 create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.h
 create mode 100644 lib/librte_telemetry/rte_telemetry_socket_tests.h
 create mode 100644 lib/librte_telemetry/rte_telemetry_version.map
 create mode 100644 usertools/dpdk-telemetry-client.py
  

Comments

Mattias Rönnblom Oct. 18, 2018, 8:07 a.m. UTC | #1
Most of the issues I pointed out in v2 of this patchset is still here.

On 2018-10-16 17:57, Kevin Laatz wrote:
> This patchset introduces a Telemetry library for DPDK Service Assurance.
> This library provides an easy way to query DPDK Ethdev metrics.
> 
> The telemetry library provides a method for a service assurance component
> to retrieve metrics from a DPDK packet forwarding application.
> Communicating from the service assurance component to DPDK is done using a
> UNIX domain socket, passing a JSON formatted string. A reply is sent (again
> a JSON formatted string) of the current DPDK metrics.
> 
> The telemetry component makes use of the existing rte_metrics library to
> query values. The values to be transmitted via the telemetry infrastructure
> must be present in the Metrics library. Currently the ethdev values are
> pushed to the metrics library, and the queried from there  there is an open
> question on how applications would like this to occur. Currently only
> ethdev to metrics functionality is implemented, however other subsystems
> like crypto, eventdev, keepalive etc can use similar mechanisms.
> 
> Exposing DPDK Telemetry via a socket interface enables service assurance
> agents like collectd to consume data from DPDK. This is vital for
> monitoring, fault-detection, and error reporting. A collectd plugin has
> been created to interact with the DPDK Telemetry component, showing how it
> can be used in practice. The collectd plugin will be upstreamed to collectd
> at a later stage.  A small python script is provided in
> ./usertools/telemetry_client.py to quick-start using DPDK Telemetry.
> 
> Note: Despite opterr being set to 0, --telemetry said to be 'unrecognized'
> as a startup print. This is a cosmetic issue and will be addressed in the
> future.
> 
> ---
> v2:
>     - Reworked telemetry as part of EAL instead of using vdev (Gaetan)
>     - Refactored rte_telemetry_command (Gaetan)
>     - Added MAINTAINERS file entry (Stephen)
>     - Updated docs to reflect vdev to eal rework
>     - Removed collectd patch from patchset (Thomas)
>     - General code clean up from v1 feedback
> 
> v3:
>    - Reworked registering with eal and moved to rte_param (Gaetan)
>    - Added BSD implementation for rte_param (Gaetan)
>    - Updated the paths to align with the new runtime file location (Mattias)
>    - Fixed pointer checks to align with the coding style 1.8.1 (Mattias)
>    - Added missing decref's and close's (Mattias)
>    - Fixed runtime issue in Meson (was not recognising flag due to linking)
>    - More general clean up
> 
> v4:
>    - Added Doxygen comments for rte_param.h (Thomas)
>    - Made eal_get_runtime_dir a public function to use outside of EAL (Thomas)
>    - Reworked telemetry to get path using rte_eal_get_runtime_dir (Thomas)
>    - Fixed checkpatch coding style error
> 
> v5:
>    - Moved the BUF_SIZE define to fix build (Harry)
>    - Set default config for telemetry to 'n' (Harry)
>    - Improved Doxygen comments (Thomas)
>    - Cleaned up rte_param struct (Thomas)
> 
> Ciara Power, Brian Archbold and Kevin Laatz (10):
>    telemetry: initial telemetry infrastructure
>    telemetry: add initial connection socket
>    telemetry: add client feature and sockets
>    telemetry: add parser for client socket messages
>    telemetry: update metrics before sending stats
>    telemetry: format json response when sending stats
>    telemetry: add tests for telemetry api
>    telemetry: add ability to disable selftest
>    doc: add telemetry documentation
>    usertools: add client python script for telemetry
> 
> Kevin Laatz (3):
>    eal: add param register infrastructure
>    eal: make get runtime dir function public
>    build: add dependency on telemetry to apps in meson
> 
>   MAINTAINERS                                       |    5 +
>   app/meson.build                                   |    4 +-
>   app/pdump/meson.build                             |    2 +-
>   app/proc-info/meson.build                         |    2 +-
>   app/test-bbdev/meson.build                        |    2 +-
>   app/test-crypto-perf/meson.build                  |    2 +-
>   app/test-pmd/meson.build                          |    2 +-
>   config/common_base                                |    5 +
>   config/meson.build                                |    3 +
>   doc/guides/howto/index.rst                        |    1 +
>   doc/guides/howto/telemetry.rst                    |   85 +
>   lib/Makefile                                      |    2 +
>   lib/librte_eal/bsdapp/eal/Makefile                |    1 +
>   lib/librte_eal/bsdapp/eal/eal.c                   |   20 +-
>   lib/librte_eal/common/Makefile                    |    1 +
>   lib/librte_eal/common/eal_filesystem.h            |   14 +-
>   lib/librte_eal/common/include/rte_eal.h           |    9 +
>   lib/librte_eal/common/include/rte_param.h         |   91 ++
>   lib/librte_eal/common/meson.build                 |    2 +
>   lib/librte_eal/common/rte_param.c                 |   47 +
>   lib/librte_eal/linuxapp/eal/Makefile              |    1 +
>   lib/librte_eal/linuxapp/eal/eal.c                 |   20 +-
>   lib/librte_eal/rte_eal_version.map                |    2 +
>   lib/librte_telemetry/Makefile                     |   30 +
>   lib/librte_telemetry/meson.build                  |    9 +
>   lib/librte_telemetry/rte_telemetry.c              | 1810 +++++++++++++++++++++
>   lib/librte_telemetry/rte_telemetry.h              |   48 +
>   lib/librte_telemetry/rte_telemetry_internal.h     |   81 +
>   lib/librte_telemetry/rte_telemetry_parser.c       |  586 +++++++
>   lib/librte_telemetry/rte_telemetry_parser.h       |   13 +
>   lib/librte_telemetry/rte_telemetry_parser_test.c  |  534 ++++++
>   lib/librte_telemetry/rte_telemetry_parser_test.h  |   39 +
>   lib/librte_telemetry/rte_telemetry_socket_tests.h |   36 +
>   lib/librte_telemetry/rte_telemetry_version.map    |    7 +
>   lib/meson.build                                   |    3 +-
>   meson.build                                       |    1 +
>   mk/rte.app.mk                                     |    1 +
>   usertools/dpdk-telemetry-client.py                |  116 ++
>   38 files changed, 3618 insertions(+), 19 deletions(-)
>   create mode 100644 doc/guides/howto/telemetry.rst
>   create mode 100644 lib/librte_eal/common/include/rte_param.h
>   create mode 100644 lib/librte_eal/common/rte_param.c
>   create mode 100644 lib/librte_telemetry/Makefile
>   create mode 100644 lib/librte_telemetry/meson.build
>   create mode 100644 lib/librte_telemetry/rte_telemetry.c
>   create mode 100644 lib/librte_telemetry/rte_telemetry.h
>   create mode 100644 lib/librte_telemetry/rte_telemetry_internal.h
>   create mode 100644 lib/librte_telemetry/rte_telemetry_parser.c
>   create mode 100644 lib/librte_telemetry/rte_telemetry_parser.h
>   create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.c
>   create mode 100644 lib/librte_telemetry/rte_telemetry_parser_test.h
>   create mode 100644 lib/librte_telemetry/rte_telemetry_socket_tests.h
>   create mode 100644 lib/librte_telemetry/rte_telemetry_version.map
>   create mode 100644 usertools/dpdk-telemetry-client.py
>
  
Kevin Laatz Oct. 19, 2018, 10:16 a.m. UTC | #2
Hi Mattias,

On 18/10/2018 09:07, Mattias Rönnblom wrote:
> Most of the issues I pointed out in v2 of this patchset is still here.

Will recheck feedback for the next version.

With regards to comments on v2, 3/10, could you please help provide 
clarification on the below?

<Copy from the v2 feedback>
On 03/10/2018 20:06, Mattias Rönnblom wrote:
> On 2018-10-03 19:36, Kevin Laatz wrote:
>> From: Ciara Power <ciara.power@intel.com>
>> +
>> +    if (!telemetry->request_client) {
>> +        TELEMETRY_LOG_ERR("No client has been chosen to write to");
>> +        return -1;
>> +    } > +
>> +    if (!json_string) {
>> +        TELEMETRY_LOG_ERR("Invalid JSON string!");
>> +        return -1;
>> +    }
>> +
>> +    ret = send(telemetry->request_client->fd,
>> +            json_string, strlen(json_string), 0);
>
> How would this code handle a partial success (as in, for example, half 
> of the string fits the socket buffer)? In not, maybe switching over to 
> a SOCK_SEQPACKET AF_UNIX socket would be the best way around it.
>
Is the suggestion here simply to use socket(AF_UNIX, SOCK_SEQPACKET) 
instead of (AF_UNIX, SOCK_STREAM) ?
>
>> +
>> +    buffer_read = read(telemetry->accept_fd, buf, BUF_SIZE-1);
>
> This and the below code seem to assume that read() returns one and 
> only one message, but on a SOCK_STREAM, there is no such thing as a 
> message. It's a byte stream, and you need to provide your own framing, 
> or have an application protocol which allows only have one outstanding 
> request. If you do the latter, you still need to allow for "short" 
> (partial) read()s (i.e. re-read() until done).
>
Will the above solve this part of the problem too?

Best regards,
Kevin
  
Mattias Rönnblom Oct. 22, 2018, 7:11 a.m. UTC | #3
On 2018-10-19 12:16, Laatz, Kevin wrote:
> On 03/10/2018 20:06, Mattias Rönnblom wrote:
>> On 2018-10-03 19:36, Kevin Laatz wrote:
>>> From: Ciara Power <ciara.power@intel.com>
>>> +
>>> +    if (!telemetry->request_client) {
>>> +        TELEMETRY_LOG_ERR("No client has been chosen to write to");
>>> +        return -1;
>>> +    } > +
>>> +    if (!json_string) {
>>> +        TELEMETRY_LOG_ERR("Invalid JSON string!");
>>> +        return -1;
>>> +    }
>>> +
>>> +    ret = send(telemetry->request_client->fd,
>>> +            json_string, strlen(json_string), 0);
>>
>> How would this code handle a partial success (as in, for example, half 
>> of the string fits the socket buffer)? In not, maybe switching over to 
>> a SOCK_SEQPACKET AF_UNIX socket would be the best way around it.
>>
> Is the suggestion here simply to use socket(AF_UNIX, SOCK_SEQPACKET) 
> instead of (AF_UNIX, SOCK_STREAM) ?

That would be the most straight-forward to the problem, I think. Linux' 
SOCK_SEQPACKET implementation has problems with really large messages 
(since a per-message linear buffer is allocated), but I'm guessing these 
messages are not in the hundreds-of-kb range.

>>
>>> +
>>> +    buffer_read = read(telemetry->accept_fd, buf, BUF_SIZE-1);
>>
>> This and the below code seem to assume that read() returns one and 
>> only one message, but on a SOCK_STREAM, there is no such thing as a 
>> message. It's a byte stream, and you need to provide your own framing, 
>> or have an application protocol which allows only have one outstanding 
>> request. If you do the latter, you still need to allow for "short" 
>> (partial) read()s (i.e. re-read() until done).
>>
> Will the above solve this part of the problem too?
> 

Yes. The kernel will delivered the application (at most) one message, in 
its entirety.
  
Kevin Laatz Oct. 22, 2018, 9:03 a.m. UTC | #4
Hi Mattias,

Thanks for the input and clarification. I will include these changes in 
the v6.

Regards,
Kevin

On 22/10/2018 08:11, Mattias Rönnblom wrote:
> On 2018-10-19 12:16, Laatz, Kevin wrote:
>> On 03/10/2018 20:06, Mattias Rönnblom wrote:
>>> On 2018-10-03 19:36, Kevin Laatz wrote:
>>>> From: Ciara Power <ciara.power@intel.com>
>>>> +
>>>> +    if (!telemetry->request_client) {
>>>> +        TELEMETRY_LOG_ERR("No client has been chosen to write to");
>>>> +        return -1;
>>>> +    } > +
>>>> +    if (!json_string) {
>>>> +        TELEMETRY_LOG_ERR("Invalid JSON string!");
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    ret = send(telemetry->request_client->fd,
>>>> +            json_string, strlen(json_string), 0);
>>>
>>> How would this code handle a partial success (as in, for example, 
>>> half of the string fits the socket buffer)? In not, maybe switching 
>>> over to a SOCK_SEQPACKET AF_UNIX socket would be the best way around 
>>> it.
>>>
>> Is the suggestion here simply to use socket(AF_UNIX, SOCK_SEQPACKET) 
>> instead of (AF_UNIX, SOCK_STREAM) ?
>
> That would be the most straight-forward to the problem, I think. 
> Linux' SOCK_SEQPACKET implementation has problems with really large 
> messages (since a per-message linear buffer is allocated), but I'm 
> guessing these messages are not in the hundreds-of-kb range.
>
>>>
>>>> +
>>>> +    buffer_read = read(telemetry->accept_fd, buf, BUF_SIZE-1);
>>>
>>> This and the below code seem to assume that read() returns one and 
>>> only one message, but on a SOCK_STREAM, there is no such thing as a 
>>> message. It's a byte stream, and you need to provide your own 
>>> framing, or have an application protocol which allows only have one 
>>> outstanding request. If you do the latter, you still need to allow 
>>> for "short" (partial) read()s (i.e. re-read() until done).
>>>
>> Will the above solve this part of the problem too?
>>
>
> Yes. The kernel will delivered the application (at most) one message, 
> in its entirety.