Message ID | 20220623164245.561371-1-bruce.richardson@intel.com (mailing list archive) |
---|---|
Headers | From: Bruce Richardson <bruce.richardson@intel.com> To: dev@dpdk.org Cc: ciara.power@intel.com, fengchengwen@huawei.com, mb@smartsharesystems.com, Bruce Richardson <bruce.richardson@intel.com> Subject: [RFC PATCH 0/6] add json string escaping to telemetry Date: Thu, 23 Jun 2022 17:42:39 +0100 Message-Id: <20220623164245.561371-1-bruce.richardson@intel.com> |
Series | add json string escaping to telemetry |
Message
Bruce Richardson
June 23, 2022, 4:42 p.m. UTC
This RFC shows one possible approach for escaping strings for the json
output of telemetry library. For now this RFC supports escaping strings
for the cases of returning a single string, or returning an array of
strings. Not done is escaping of strings in objs/dicts [see more below
on TODO]

As well as telemetry lib changes, this patchset includes unit tests for
the above and also little bit of cleanup to the json tests.

TODO:
Beyond what is here in this RFC:

1. we need to decide what to do about name/value pairs. Personally, I
   think we should add the restriction to the "rte_tel_data_add_obj_*" APIs
   to only allow a defined subset of characters in names: e.g. alphanumeric
   chars, underscore and dash. That means that we only need to escape
   the data part in the case of string returns.

2. once agreed, need to implement a patch to escape strings in dicts/objs

3. need to add a patch to escape the input command if it contains
   invalid chars

4. some small refactoring of the main telemetry.c json-encoding function
   may be possible.

Bruce Richardson (6):
  test/telemetry_json: print success or failure per subtest
  telemetry: fix escaping of invalid json characters
  telemetry: use json string function for string outputs
  test/telemetry_json: add test for string character escaping
  telemetry: add escaping of strings in arrays
  test/telemetry-json: add test case for escaping strings in arrays

 app/test/test_telemetry_json.c | 74 +++++++++++++++++++++++++++++-----
 lib/telemetry/telemetry.c      | 11 +++--
 lib/telemetry/telemetry_json.h | 62 ++++++++++++++++++++++++++--
 3 files changed, 132 insertions(+), 15 deletions(-)

--
2.34.1
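For readers unfamiliar with what "escaping strings for the json output" involves, the following is a minimal sketch in C of one way it could be done. It is not the code from the patchset: the function name `json_escape_str`, its signature, and its choice of escapes are assumptions made for illustration only. It escapes the double quote, the backslash, and control characters below 0x20, which are the characters JSON does not permit unescaped inside a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patchset: copy `in` to `out`
 * as the body of a JSON string, escaping '"', '\\' and control chars.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if `out` is too small to hold the escaped result.
 */
static int
json_escape_str(char *out, size_t out_len, const char *in)
{
	size_t used = 0;

	assert(out != NULL && in != NULL);
	if (out_len == 0)
		return -1;

	for (; *in != '\0'; in++) {
		const unsigned char c = (unsigned char)*in;
		char esc[7]; /* longest escape is "\u00XX" plus NUL */
		size_t n;

		switch (c) {
		case '"':  strcpy(esc, "\\\""); break;
		case '\\': strcpy(esc, "\\\\"); break;
		case '\n': strcpy(esc, "\\n"); break;
		case '\r': strcpy(esc, "\\r"); break;
		case '\t': strcpy(esc, "\\t"); break;
		default:
			if (c < 0x20) {
				/* other control chars use the generic \u form */
				snprintf(esc, sizeof(esc), "\\u%04x", (unsigned int)c);
			} else {
				esc[0] = (char)c;
				esc[1] = '\0';
			}
			break;
		}

		n = strlen(esc);
		if (used + n + 1 > out_len) /* keep room for the NUL */
			return -1;
		memcpy(out + used, esc, n);
		used += n;
	}
	out[used] = '\0';
	return (int)used;
}
```

Returning -1 on overflow, rather than truncating, is a design choice of this sketch: a truncated escape sequence would itself be invalid JSON.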
Comments
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 23 June 2022 18.43
>
> This RFC shows one possible approach for escaping strings for the json
> output of telemetry library. For now this RFC supports escaping strings
> for the cases of returning a single string, or returning an array of
> strings. Not done is escaping of strings in objs/dicts [see more below
> on TODO]

Very good initiative.

> As well as telemetry lib changes, this patchset includes unit tests for
> the above and also little bit of cleanup to the json tests.
>
> TODO:
> Beyond what is here in this RFC:
>
> 1. we need to decide what to do about name/value pairs. Personally, I
>    think we should add the restriction to the "rte_tel_data_add_obj_*" APIs
>    to only allow a defined subset of characters in names: e.g. alphanumeric
>    chars, underscore and dash. That means that we only need to escape
>    the data part in the case of string returns.

I agree about only allowing a subset of characters in names, so JSON (and other) encoding is not required.

However, I think we should be less restrictive, and also allow characters commonly used for separation, indexing and wildcard, such as '/', '[', ']', and '*', '?' or '%'.

Obviously, we should disallow characters requiring escaping in not just JSON, but also other foreseeable encodings and protocols. So please bring your crystal ball to the discussion. ;-)

> 2. once agreed, need to implement a patch to escape strings in
>    dicts/objs

Yes.

> 3. need to add a patch to escape the input command if it contains
>    invalid chars

What do you mean here? You mean unescape JSON encoded input (arriving on the JSON telemetry socket) to a proper binary string?

> 4. some small refactoring of the main telemetry.c json-encoding function
>    may be possible.

Perhaps.

On Thu, Jun 23, 2022 at 09:04:31PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Thursday, 23 June 2022 18.43
> >
> > 1. we need to decide what to do about name/value pairs. Personally, I
> >    think we should add the restriction to the "rte_tel_data_add_obj_*" APIs
> >    to only allow a defined subset of characters in names: e.g. alphanumeric
> >    chars, underscore and dash. That means that we only need to escape
> >    the data part in the case of string returns.
>
> I agree about only allowing a subset of characters in names, so JSON (and other) encoding is not required.
>
> However, I think we should be less restrictive, and also allow characters commonly used for separation, indexing and wildcard, such as '/', '[', ']', and '*', '?' or '%'.
>
> Obviously, we should disallow characters requiring escaping in not just JSON, but also other foreseeable encodings and protocols. So please bring your crystal ball to the discussion. ;-)

Exactly why I am looking for feedback - and why I'm looking to have an
explicit allowed list of characters rather than trying to just block the
known-bad in json ones.

For your suggestions: +1 to separators and indexing, i.e. '[', ']' and '/',
though I would probably also add ',' and maybe '.' (unless it's likely to
cause issues with some protocol we are likely to want to use).
For the wildcarding, I find it hard to see why we would want those?

The other advantage of using an allowlist of characters is that it makes it
possible to expand over time, compared to a blocklist which always runs the
risk of breaking something if you expand it. Therefore I suggest we keep
the list as small as we need right now, and expand it only as we need.

> > 2. once agreed, need to implement a patch to escape strings in
> >    dicts/objs
>
> Yes.
>
> > 3. need to add a patch to escape the input command if it contains
> >    invalid chars
>
> What do you mean here? You mean unescape JSON encoded input (arriving on the JSON telemetry socket) to a proper binary string?

The thing with the telemetry socket interface right now is that the input
requests are not json. The reasons for that are to keep them as simple
as possible, and to avoid needing a full json parser inside DPDK.
Therefore, the input sent by the user could contain characters that are
invalid in the json output, so we need to:

1. Guarantee that no command registered with the telemetry library contains
   invalid json characters (though why someone would do so, I don't know!)
2. When we return the command back in the reply, properly escape any
   invalid characters in the error case.

#1 is very important for sanity checking, but now that I think about it #2
is probably optional, since if any user does start sending invalid garbage
input that breaks their json parser on return, they are only hurting
themselves and not affecting anything else on the system.

> > 4. some small refactoring of the main telemetry.c json-encoding
> >    function may be possible.
>
> Perhaps.

I saw some options for cleanup when I was working on the code, so I am
including this as a note-to-self as much as anything else for feedback. :-)

/Bruce

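The allowlist idea discussed in this reply can be sketched in C as below. This is only an illustration under stated assumptions: the function name `tel_name_is_valid` and the exact character set are hypothetical, chosen to match the characters mentioned so far in the thread (alphanumerics, underscore, dash, '[', ']', '/', ',' and '.'), and nothing here is taken from the telemetry library itself.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical allowlist of non-alphanumeric characters permitted in
 * names. Growing this string later cannot reject a name that was valid
 * before, which is the advantage of an allowlist over a blocklist.
 */
static const char tel_name_extra_chars[] = "_-[]/,.";

/*
 * Hypothetical check, not part of the DPDK API: return true if `name`
 * is non-empty and made up only of allowed characters.
 */
static bool
tel_name_is_valid(const char *name)
{
	if (name == NULL || *name == '\0')
		return false;

	for (; *name != '\0'; name++) {
		const unsigned char c = (unsigned char)*name;

		/* isalnum() is locale dependent; the "C" locale is assumed */
		if (!isalnum(c) && strchr(tel_name_extra_chars, c) == NULL)
			return false;
	}
	return true;
}
```

With this set, a name such as `rx_queue[0]/stats` would pass, while a name containing a double quote or a wildcard character such as '*' would be rejected, so no JSON escaping of names would be needed.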
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 24 June 2022 10.14
>
> For your suggestions: +1 to separators and indexing, i.e. '[', ']' and
> '/', though I would probably also add ',' and maybe '.' (unless it's
> likely to cause issues with some protocol we are likely to want to use).

After having slept on it, I think we should also allow characters that could appear in IP and MAC addresses, i.e. '.' and ':' (and '/' for subnetting).

> For the wildcarding, I find it hard to see why we would want those?

Initially, I thought a wildcard might be useful as a placeholder in templates.

But it might also be useful for partial IP or MAC addresses. E.g.:
- The SmartShare Systems OUI could be represented by the MAC address "00:1F:B4:??:??:??".
- A default gateway address in a template configuration could be "192.168.*.1".

On the other hand, wildcard characters could be disallowed or require escaping in other (non-JSON) protocols.

So I'm just being a bit creative here, throwing out ideas in our search for the right balance in the restrictions.

> The other advantage of using an allowlist of characters is that it
> makes it possible to expand over time, compared to a blocklist which
> always runs the risk of breaking something if you expand it. Therefore
> I suggest we keep the list as small as we need right now, and expand it
> only as we need.

+1

On Fri, Jun 24, 2022 at 11:12:05AM +0200, Morten Brørup wrote:
> > For the wildcarding, I find it hard to see why we would want those?
>
> Initially, I thought a wildcard might be useful as a placeholder in templates.
>
> But it might also be useful for partial IP or MAC addresses. E.g.:
> - The SmartShare Systems OUI could be represented by the MAC address "00:1F:B4:??:??:??".
> - A default gateway address in a template configuration could be "192.168.*.1".
>
> On the other hand, wildcard characters could be disallowed or require escaping in other (non-JSON) protocols.
>
> So I'm just being a bit creative here, throwing out ideas in our search for the right balance in the restrictions.

I could see those characters certainly being needed in data values, but do
you foresee them being required in the names of fields?

> > The other advantage of using an allowlist of characters is that it
> > makes it possible to expand over time, compared to a blocklist which
> > always runs the risk of breaking something if you expand it. Therefore
> > I suggest we keep the list as small as we need right now, and expand
> > it only as we need.
>
> +1

From previous on-list discussion, I take it that SNMP is a possible target
protocol you might have in mind. Any other protocols you can think of and
what restrictions (if any) would SNMP or those other protocols add?

/Bruce

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 24 June 2022 11.17
>
> I could see those characters certainly being needed in data values, but
> do you foresee them being required in the names of fields?

We don't use the Telemetry library, because we have our own libraries for similar and related purposes. So I'm mostly speculating, trying to transform our experience into how I would expect the Telemetry library to work, while also trying to look farther into the future.

Answering your question: Yes, if you consider the names as keys in a key/value store, there might be single entries that look like a template. Although the names of such entries might as well be "00:1F:B4:xx:xx:xx" or "192.168.z.1", using 'x' and 'z' as the wildcard characters.

Perhaps we should start with the low risk choice, and not allow the special wildcard characters, such as '*', '?', '%', since 'x' is just as good in those cases.

> From previous on-list discussion, I take it that SNMP is a possible
> target protocol you might have in mind. Any other protocols you can
> think of and what restrictions (if any) would SNMP or those other
> protocols add?

JSON and UTF-8 seem to have taken over the world entirely. SNMP support is usually required for legacy reasons.

The SNMP lookup key is always an OID (Object Identifier), which basically is a sequence of numbers with a well known length of the sequence. In theory, any BLOB could be converted to an OID. With that in mind, I don't think SNMP puts any restrictions on the character set of the Telemetry names. The translation between OID format (i.e. a sequence of numbers) and Telemetry name format (i.e. a string) could be a very simple encoder/decoder, since there are no special characters requiring special treatment.

Going back to the IP address topic above, some of the SNMP MIBs use the IP address as the last four numbers in the OID, e.g. "ipAdEntIfIndex.192.0.1.1" (where ipAdEntIfIndex is short for "1.3.6.1.2.1.4.20.1.2"). My point here is: The names available for lookup in the telemetry database could be highly dynamic.

As for other protocols, there could be something like InfluxDB [1], for direct streaming of statistics and other telemetry, but I don't have real experience with any of them. Our customers currently use scripts to poll the JSON data from our API and push them into their InfluxDB databases.

There could also be limitations in the structured format for SYSLOG [2], but again I don't have any experience with it. We just use classic SYSLOG text messages.

[1] https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/
[2] https://datatracker.ietf.org/doc/html/rfc5424

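The "very simple encoder" between a name string and a sequence of numbers mentioned above could, for example, look like the C sketch below. It is purely illustrative: the function name `name_to_oid_suffix` is hypothetical, and the mapping it uses (the string length followed by each byte value, written in dotted form) is just one possible choice assumed here, not something specified in the thread or in the telemetry library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical encoder: write `name` to `out` as dotted numbers, the
 * length first and then one number per byte, e.g. "rx" -> "2.114.120".
 * Returns the number of characters written (excluding the NUL), or -1
 * if `out` is too small.
 */
static int
name_to_oid_suffix(char *out, size_t out_len, const char *name)
{
	const size_t name_len = strlen(name);
	int used = snprintf(out, out_len, "%zu", name_len);

	if (used < 0 || (size_t)used >= out_len)
		return -1;

	for (size_t i = 0; i < name_len; i++) {
		const size_t space = out_len - (size_t)used;
		/* every byte value maps to a number; nothing needs escaping */
		const int n = snprintf(out + used, space, ".%u",
				(unsigned int)(unsigned char)name[i]);

		if (n < 0 || (size_t)n >= space)
			return -1;
		used += n;
	}
	return used;
}
```

Because every byte maps directly to a number, such an encoding places no constraint on which characters a name may contain, which is consistent with the point made above that SNMP would not restrict the character set.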
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 23 June 2022 18.43
>
> This RFC shows one possible approach for escaping strings for the json
> output of telemetry library. For now this RFC supports escaping strings
> for the cases of returning a single string, or returning an array of
> strings. Not done is escaping of strings in objs/dicts [see more below
> on TODO]

Bugzilla ID: 1037

-Morten

On Thu, Jul 14, 2022 at 05:42:59PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Thursday, 23 June 2022 18.43
> >
> > This RFC shows one possible approach for escaping strings for the json
> > output of telemetry library. For now this RFC supports escaping strings
> > for the cases of returning a single string, or returning an array of
> > strings. Not done is escaping of strings in objs/dicts [see more below
> > on TODO]
>
> Bugzilla ID: 1037

Noted in the v2 patchset now sent to the list.

Thanks for all the feedback on the RFC. Hopefully I've managed to take all -
or at least most of it - correctly into account on the v2.

/Bruce