telemetry: fix autotest failures on Alpine

Message ID 20230310181836.162336-1-bruce.richardson@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series telemetry: fix autotest failures on Alpine

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/loongarch-compilation        success  Compilation OK
ci/loongarch-unit-testing       success  Unit Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/github-robot: build          success  github build: passed
ci/intel-Functional             success  Functional PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-testing                  success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/iol-abi-testing              success  Testing PASS

Commit Message

Bruce Richardson March 10, 2023, 6:18 p.m. UTC
On Alpine Linux, the telemetry_data_autotest was failing for the
test where we had dictionaries embedded in other dictionaries up
to three levels deep. Indications are that this issue is due to
excess data being stored on the stack, so replace stack-allocated
buffer data with dynamically allocated data in the case where we
are doing recursive processing of telemetry data structures into
JSON.

Bugzilla ID: 1177
Fixes: c933bb5177ca ("telemetry: support array values in data object")
Fixes: d2671e642a8e ("telemetry: support dict of dicts")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/telemetry/telemetry.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
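
The change described in the commit message can be illustrated in isolation. The following is only a minimal sketch, not the telemetry library's code: `struct node` and `node_to_json()` are hypothetical names invented for this example. It shows a recursive serializer that takes its per-level scratch buffer from the heap, so that stack usage no longer grows by the full buffer length at every nesting level.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical nested structure, standing in for a dict of dicts. */
struct node {
	const char *name;
	const struct node *child; /* NULL at the innermost level */
};

/*
 * Render n as {"name":<child>} into out (capacity len).
 * Returns the number of characters written, or -1 on allocation
 * failure or if the output does not fit.
 */
static int
node_to_json(const struct node *n, char *out, size_t len)
{
	int ret;

	assert(n != NULL && out != NULL && len > 0);
	if (n->child == NULL) {
		ret = snprintf(out, len, "{\"%s\":{}}", n->name);
	} else {
		/* Heap scratch buffer instead of `char temp[len]` on the stack. */
		char *temp = malloc(len);

		if (temp == NULL)
			return -1;
		if (node_to_json(n->child, temp, len) < 0) {
			free(temp);
			return -1;
		}
		ret = snprintf(out, len, "{\"%s\":%s}", n->name, temp);
		free(temp);
	}
	return (ret < 0 || (size_t)ret >= len) ? -1 : ret;
}
```

With a stack-allocated `temp`, three nested levels would need roughly three times `len` bytes of stack at the deepest call. Here each level costs only a pointer and a few locals, in exchange for an extra failure path when `malloc()` returns NULL.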
  

Comments

Stephen Hemminger March 10, 2023, 7:08 p.m. UTC | #1
On Fri, 10 Mar 2023 18:18:36 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Alpine linux, the telemetry_data_autotest was failing for the
> test where we had dictionaries embedded in other dictionaries up
> to three levels deep. Indications are that this issue is due to
> excess data being stored on the stack, so replace stack-allocated
> buffer data with dynamically allocated data in the case where we
> are doing recursive processing of telemetry data structures into
> json.
> 
> Bugzilla ID: 1177
> Fixes: c933bb5177ca ("telemetry: support array values in data object")
> Fixes: d2671e642a8e ("telemetry: support dict of dicts")
> Cc: stable@dpdk.org

Looking at the telemetry code:

- why so many temporary buffers? Could this be streamed, or redesigned
  so that an allocated buffer is returned?

- why is rte_tel_json_XXX all inline?  These should just be internal
  functions and not in a .h file.

FYI - if this library reused an existing JSON writer it would have
been much simpler.
https://github.com/shemminger/iproute2/blob/main/lib/json_writer.c
  
Bruce Richardson March 13, 2023, 9:38 a.m. UTC | #2
On Fri, Mar 10, 2023 at 11:08:32AM -0800, Stephen Hemminger wrote:
> On Fri, 10 Mar 2023 18:18:36 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Alpine linux, the telemetry_data_autotest was failing for the
> > test where we had dictionaries embedded in other dictionaries up
> > to three levels deep. Indications are that this issue is due to
> > excess data being stored on the stack, so replace stack-allocated
> > buffer data with dynamically allocated data in the case where we
> > are doing recursive processing of telemetry data structures into
> > json.
> > 
> > Bugzilla ID: 1177
> > Fixes: c933bb5177ca ("telemetry: support array values in data object")
> > Fixes: d2671e642a8e ("telemetry: support dict of dicts")
> > Cc: stable@dpdk.org
> 
> Looking at the telemetry code:
> 
> - why so many temporary buffers, could this be streamed or redesigned
>   so that an allocated buffer is returned.
> 

This is largely what the fix patch does. The temporary buffers are used to
ensure we never end up with a partially written buffer from any truncated
snprintf, since that could cause us to emit invalid JSON. I think as a
scheme it works well enough for what it was designed for, but this
particular test case has more levels of recursion than was expected, so the
limits of the design are showing. The downside of allocating memory with
malloc, as done here, is that it adds more failure cases.
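
To make the truncation concern concrete, here is a minimal sketch of the "format into a temporary, copy only if complete" idea. The helper name `checked_snprintf()` and its return convention are assumptions made for this example and are not taken from the telemetry sources. It uses `malloc()` for the scratch space, in line with the patch, and so shows the extra failure case as well.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: format into a scratch buffer first and copy into
 * buf only if the whole result fits. Returns the number of characters
 * written, or 0 if nothing was written (buf is then left untouched).
 */
static int
checked_snprintf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	char *tmp;
	int ret;

	assert(buf != NULL && len > 0 && fmt != NULL);
	tmp = malloc(len);
	if (tmp == NULL)
		return 0; /* the additional failure case malloc introduces */

	va_start(ap, fmt);
	ret = vsnprintf(tmp, len, fmt, ap);
	va_end(ap);

	if (ret > 0 && (size_t)ret < len)
		memcpy(buf, tmp, (size_t)ret + 1); /* include the '\0' */
	else
		ret = 0; /* error or truncation: keep the old contents */

	free(tmp);
	return ret;
}
```

A caller that starts with valid JSON in `buf` therefore still has valid JSON after a failed call, because a truncated result never reaches `buf`.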

> - why is rte_tel_json_XXX all inline?  These should just be internal
>   functions and not in a .h file.
>

Sure. Since these are all just internal functions used by the library, I'm
not sure it really matters. 

> FYI - if this library reused existing json writer it would have
> been much simpler.  
> https://github.com/shemminger/iproute2/blob/main/lib/json_writer.c

Yep. Looked at that previously, but it's at a lower level than what we
have. IMHO, the complicated bit of producing json output is not the
formatting characters for each data-type but ensuring correct termination
of the json string even in case of errors, so each function in our
telemetry json code always includes correct terminators at all points, i.e.
we have no separate functions to be called for terminating objects, arrays
etc. - they are always included in the output of the items. This is why so
many temporary buffers are used - the input string to a function is
well-formed json, and we only actually append to that buffer by copying
from the temporary buffer once we are sure that the output will also be
similarly well-formed.
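
The "terminators are always included" approach described above can be sketched for the simplest case, appending an integer to an array. This is an illustrative example under stated assumptions, not the library's implementation: `array_add_int()` and its parameters are hypothetical. The buffer holds a complete array before and after every call, and the closing `]` is only overwritten once the new item is known to fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: append an integer to a JSON array held in buf.
 * `used` is the current string length; buf holds either nothing or a
 * complete array such as "[1,2]". Returns the new length. If the item
 * does not fit, buf is left unchanged and `used` is returned.
 */
static int
array_add_int(char *buf, int len, int used, int val)
{
	char item[32]; /* large enough for ",<int>]" plus the '\0' */
	int n;

	assert(buf != NULL && len > 0 && used >= 0 && used < len);
	if (used <= 2) {
		/* empty buffer or "[]": write a fresh one-element array */
		n = snprintf(item, sizeof(item), "[%d]", val);
		if (n < 0 || n >= len)
			return used;
		memcpy(buf, item, (size_t)n + 1);
		return n;
	}
	/* replace the trailing ']' with ",<val>]" in a single copy */
	n = snprintf(item, sizeof(item), ",%d]", val);
	if (n < 0 || used - 1 + n >= len)
		return used;
	memcpy(buf + used - 1, item, (size_t)n + 1);
	return used - 1 + n;
}
```

Because no separate "close array" call exists, an error partway through a sequence of appends leaves a shorter but still well-formed array.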

That said, if you want to replace the current JSON implementation with a
better one, I'm not opposed to it. [though I'd definitely rather have no
external dependencies, which would mean we could no longer rely on it always
being available in default builds]

/Bruce
  

Patch

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 7bceadcee7..34d371ab8a 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -208,7 +208,9 @@  container_to_json(const struct rte_tel_data *d, char *out_buf, size_t buf_len)
 				break;
 			case RTE_TEL_CONTAINER:
 			{
-				char temp[buf_len];
+				char *temp = malloc(buf_len);
+				if (temp == NULL)
+					break;
 				const struct container *cont =
 						&v->value.container;
 				if (container_to_json(cont->data,
@@ -219,6 +221,7 @@  container_to_json(const struct rte_tel_data *d, char *out_buf, size_t buf_len)
 							v->name, temp);
 				if (!cont->keep)
 					rte_tel_data_free(cont->data);
+				free(temp);
 				break;
 			}
 			}
@@ -275,7 +278,9 @@  output_json(const char *cmd, const struct rte_tel_data *d, int s)
 				break;
 			case RTE_TEL_CONTAINER:
 			{
-				char temp[buf_len];
+				char *temp = malloc(buf_len);
+				if (temp == NULL)
+					break;
 				const struct container *cont =
 						&v->value.container;
 				if (container_to_json(cont->data,
@@ -286,6 +291,7 @@  output_json(const char *cmd, const struct rte_tel_data *d, int s)
 							v->name, temp);
 				if (!cont->keep)
 					rte_tel_data_free(cont->data);
+				free(temp);
 			}
 			}
 		}
@@ -311,7 +317,9 @@  output_json(const char *cmd, const struct rte_tel_data *d, int s)
 						buf_len, used,
 						d->data.array[i].uval);
 			else if (d->type == TEL_ARRAY_CONTAINER) {
-				char temp[buf_len];
+				char *temp = malloc(buf_len);
+				if (temp == NULL)
+					break;
 				const struct container *rec_data =
 						&d->data.array[i].container;
 				if (container_to_json(rec_data->data,
@@ -321,6 +329,7 @@  output_json(const char *cmd, const struct rte_tel_data *d, int s)
 							buf_len, used, temp);
 				if (!rec_data->keep)
 					rte_tel_data_free(rec_data->data);
+				free(temp);
 			}
 		break;
 	}