eal: add support for TRNG with Arm RNG feature

Message ID 20240723212703.721050-1-shunzhi.wen@arm.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Headers
Series eal: add support for TRNG with Arm RNG feature |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/github-robot: build fail github build: failed
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-compile-amd64-testing fail Testing issues
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-amd64-testing fail Testing issues
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Shunzhi Wen July 23, 2024, 9:27 p.m. UTC
True Random Number Generator (TRNG) is capable of
generating random numbers from a physical entropy source.
TRNG is enabled when compiled on Arm arch that supports
FEAT_RNG introduced in Armv8.5-A flagged by __ARM_FEATURE_RNG.

Signed-off-by: Shunzhi Wen <shunzhi.wen@arm.com>
Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 .mailmap                                      |  2 ++
 app/test/test_rand_perf.c                     | 25 +++++++++++++++++--
 config/arm/meson.build                        |  2 +-
 lib/eal/arm/include/rte_cpuflags_64.h         |  3 +++
 lib/eal/arm/meson.build                       |  1 +
 lib/eal/arm/rte_cpuflags.c                    |  1 +
 lib/eal/arm/rte_random.c                      | 20 +++++++++++++++
 .../{rte_random.c => eal_common_random.c}     |  0
 lib/eal/common/meson.build                    |  2 +-
 lib/eal/include/rte_random.h                  | 17 +++++++++++++
 lib/eal/loongarch/meson.build                 |  1 +
 lib/eal/loongarch/rte_random.c                | 14 +++++++++++
 lib/eal/ppc/meson.build                       |  1 +
 lib/eal/ppc/rte_random.c                      | 14 +++++++++++
 lib/eal/riscv/meson.build                     |  1 +
 lib/eal/riscv/rte_random.c                    | 14 +++++++++++
 lib/eal/version.map                           |  1 +
 lib/eal/x86/meson.build                       |  1 +
 lib/eal/x86/rte_random.c                      | 14 +++++++++++
 19 files changed, 130 insertions(+), 4 deletions(-)
 create mode 100644 lib/eal/arm/rte_random.c
 rename lib/eal/common/{rte_random.c => eal_common_random.c} (100%)
 create mode 100644 lib/eal/loongarch/rte_random.c
 create mode 100644 lib/eal/ppc/rte_random.c
 create mode 100644 lib/eal/riscv/rte_random.c
 create mode 100644 lib/eal/x86/rte_random.c
  

Comments

Mattias Rönnblom July 24, 2024, 6:40 a.m. UTC | #1
On 2024-07-23 23:27, Shunzhi Wen wrote:> True Random Number Generator 
(TRNG) is capable of
 > generating random numbers from a physical entropy source.
 > TRNG is enabled when compiled on Arm arch that supports
 > FEAT_RNG introduced in Armv8.5-A flagged by __ARM_FEATURE_RNG.
 >

I'm missing a rationale here. Why is this useful?

 > Signed-off-by: Shunzhi Wen <shunzhi.wen@arm.com>
 > Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
 > Reviewed-by: Jack Bond-Preston <jack.bond-preston@arm.com>
 > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
 > ---
 >   .mailmap                                      |  2 ++
 >   app/test/test_rand_perf.c                     | 25 +++++++++++++++++--
 >   config/arm/meson.build                        |  2 +-
 >   lib/eal/arm/include/rte_cpuflags_64.h         |  3 +++
 >   lib/eal/arm/meson.build                       |  1 +
 >   lib/eal/arm/rte_cpuflags.c                    |  1 +
 >   lib/eal/arm/rte_random.c                      | 20 +++++++++++++++
 >   .../{rte_random.c => eal_common_random.c}     |  0
 >   lib/eal/common/meson.build                    |  2 +-
 >   lib/eal/include/rte_random.h                  | 17 +++++++++++++
 >   lib/eal/loongarch/meson.build                 |  1 +
 >   lib/eal/loongarch/rte_random.c                | 14 +++++++++++
 >   lib/eal/ppc/meson.build                       |  1 +
 >   lib/eal/ppc/rte_random.c                      | 14 +++++++++++
 >   lib/eal/riscv/meson.build                     |  1 +
 >   lib/eal/riscv/rte_random.c                    | 14 +++++++++++
 >   lib/eal/version.map                           |  1 +
 >   lib/eal/x86/meson.build                       |  1 +
 >   lib/eal/x86/rte_random.c                      | 14 +++++++++++
 >   19 files changed, 130 insertions(+), 4 deletions(-)
 >   create mode 100644 lib/eal/arm/rte_random.c
 >   rename lib/eal/common/{rte_random.c => eal_common_random.c} (100%)
 >   create mode 100644 lib/eal/loongarch/rte_random.c
 >   create mode 100644 lib/eal/ppc/rte_random.c
 >   create mode 100644 lib/eal/riscv/rte_random.c
 >   create mode 100644 lib/eal/x86/rte_random.c
 >
 > diff --git a/.mailmap b/.mailmap
 > index ac06962e82..23209edfd2 100644
 > --- a/.mailmap
 > +++ b/.mailmap
 > @@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
 >   Dexuan Cui <decui@microsoft.com>
 >   Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> 
<dharmik.thakkar@arm.com>
 >   Dheemanth Mallikarjun <dheemanthm@vmware.com>
 > +Dhruv Tripathi <dhruv.tripathi@arm.com>
 >   Diana Wang <na.wang@corigine.com>
 >   Didier Pallard <didier.pallard@6wind.com>
 >   Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
 > @@ -1353,6 +1354,7 @@ Shuki Katzenelson <shuki@lightbitslabs.com>
 >   Shun Hao <shunh@nvidia.com>
 >   Shu Shen <shu.shen@radisys.com>
 >   Shujing Dong <shujing.dong@corigine.com>
 > +Shunzhi Wen <shunzhi.wen@arm.com>
 >   Shweta Choudaha <shweta.choudaha@att.com>
 >   Shyam Kumar Shrivastav <shrivastav.shyam@gmail.com>
 >   Shy Shyman <shys@nvidia.com> <shys@mellanox.com>
 > diff --git a/app/test/test_rand_perf.c b/app/test/test_rand_perf.c
 > index 30204e12c0..b61cc75014 100644
 > --- a/app/test/test_rand_perf.c
 > +++ b/app/test/test_rand_perf.c
 > @@ -20,6 +20,7 @@ static volatile uint64_t vsum;
 >
 >   enum rand_type {
 >   	rand_type_64,
 > +	rand_type_true_rand_64,
 >   	rand_type_float,
 >   	rand_type_bounded_best_case,
 >   	rand_type_bounded_worst_case
 > @@ -31,6 +32,8 @@ rand_type_desc(enum rand_type rand_type)
 >   	switch (rand_type) {
 >   	case rand_type_64:
 >   		return "Full 64-bit [rte_rand()]";
 > +	case rand_type_true_rand_64:
 > +		return "Full 64-bit True Random [rte_trand()]";
 >   	case rand_type_float:
 >   		return "Floating point [rte_drand()]";
 >   	case rand_type_bounded_best_case:
 > @@ -50,6 +53,9 @@ test_rand_perf_type(enum rand_type rand_type)
 >   	uint64_t end;
 >   	uint64_t sum = 0;
 >   	uint64_t op_latency;
 > +	int ret;
 > +	uint64_t val;
 > +	uint32_t fail_count = 0;
 >
 >   	start = rte_rdtsc();
 >
 > @@ -58,6 +64,13 @@ test_rand_perf_type(enum rand_type rand_type)
 >   		case rand_type_64:
 >   			sum += rte_rand();
 >   			break;
 > +		case rand_type_true_rand_64:
 > +			ret = rte_trand(&val);
 > +			if (ret == 0)
 > +				sum += val;
 > +			else
 > +				fail_count++;
 > +			break;
 >   		case rand_type_float:
 >   			sum += 1000. * rte_drand();
 >   			break;
 > @@ -77,8 +90,15 @@ test_rand_perf_type(enum rand_type rand_type)
 >
 >   	op_latency = (end - start) / ITERATIONS;
 >
 > -	printf("%s: %"PRId64" TSC cycles/op\n", rand_type_desc(rand_type),
 > -	       op_latency);
 > +	if (!fail_count)
 > +		printf("%s: %"PRId64" TSC cycles/op\n",
 > +		       rand_type_desc(rand_type),
 > +		       op_latency);
 > +	else
 > +		printf("%s: %"PRId64" TSC cycles/op (failed %d time(s))\n",
 > +		       rand_type_desc(rand_type),
 > +		       op_latency,
 > +			   fail_count);
 >   }
 >
 >   static int
 > @@ -89,6 +109,7 @@ test_rand_perf(void)
 >   	printf("Pseudo-random number generation latencies:\n");
 >
 >   	test_rand_perf_type(rand_type_64);
 > +	test_rand_perf_type(rand_type_true_rand_64);
 >   	test_rand_perf_type(rand_type_float);
 >   	test_rand_perf_type(rand_type_bounded_best_case);
 >   	test_rand_perf_type(rand_type_bounded_worst_case);
 > diff --git a/config/arm/meson.build b/config/arm/meson.build
 > index 012935d5d7..13be94254e 100644
 > --- a/config/arm/meson.build
 > +++ b/config/arm/meson.build
 > @@ -95,7 +95,7 @@ part_number_config_arm = {
 >       },
 >       '0xd49': {
 >           'march': 'armv9-a',
 > -        'march_features': ['sve2'],
 > +        'march_features': ['sve2', 'rng'],
 >           'fallback_march': 'armv8.5-a',
 >           'mcpu': 'neoverse-n2',
 >           'flags': [
 > diff --git a/lib/eal/arm/include/rte_cpuflags_64.h 
b/lib/eal/arm/include/rte_cpuflags_64.h
 > index afe70209c3..6aa067339f 100644
 > --- a/lib/eal/arm/include/rte_cpuflags_64.h
 > +++ b/lib/eal/arm/include/rte_cpuflags_64.h
 > @@ -36,6 +36,9 @@ enum rte_cpu_flag_t {
 >   	RTE_CPUFLAG_SVEF64MM,
 >   	RTE_CPUFLAG_SVEBF16,
 >   	RTE_CPUFLAG_AARCH64,
 > +
 > +	/* RNDR, RNDRRS instructions */
 > +	RTE_CPUFLAG_RNG,
 >   };
 >
 >   #include "generic/rte_cpuflags.h"
 > diff --git a/lib/eal/arm/meson.build b/lib/eal/arm/meson.build
 > index 6fba3d6ba7..e9e468cbf0 100644
 > --- a/lib/eal/arm/meson.build
 > +++ b/lib/eal/arm/meson.build
 > @@ -9,4 +9,5 @@ sources += files(
 >           'rte_hypervisor.c',
 >           'rte_mmu.c',
 >           'rte_power_intrinsics.c',
 > +        'rte_random.c',
 >   )
 > diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
 > index 7ba4f8ba97..56074f0c6a 100644
 > --- a/lib/eal/arm/rte_cpuflags.c
 > +++ b/lib/eal/arm/rte_cpuflags.c
 > @@ -116,6 +116,7 @@ const struct feature_entry 
rte_cpu_feature_table[] = {
 >   	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
 >   	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
 >   	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
 > +	FEAT_DEF(RNG,		REG_HWCAP2,   16)
 >   };
 >   #endif /* RTE_ARCH */
 >
 > diff --git a/lib/eal/arm/rte_random.c b/lib/eal/arm/rte_random.c
 > new file mode 100644
 > index 0000000000..74c8fa733b
 > --- /dev/null
 > +++ b/lib/eal/arm/rte_random.c
 > @@ -0,0 +1,20 @@
 > +/* SPDX-License-Identifier: BSD-3-Clause
 > + * Copyright(c) 2024 Arm Limited
 > + */
 > +
 > +#include "arm_acle.h"
 > +#include "rte_common.h"
 > +#include "rte_random.h"
 > +#include <errno.h>
 > +
 > +int
 > +rte_trand(uint64_t *val)
 > +{
 > +#if defined __ARM_FEATURE_RNG
 > +	int ret = __rndr(val);
 > +	return (ret == 0) ? 0 : -ENODATA;
 > +#else
 > +	RTE_SET_USED(val);
 > +	return -ENOTSUP;
 > +#endif
 > +}
 > diff --git a/lib/eal/common/rte_random.c 
b/lib/eal/common/eal_common_random.c
 > similarity index 100%
 > rename from lib/eal/common/rte_random.c
 > rename to lib/eal/common/eal_common_random.c
 > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
 > index 22a626ba6f..c4405aa48b 100644
 > --- a/lib/eal/common/meson.build
 > +++ b/lib/eal/common/meson.build
 > @@ -32,7 +32,6 @@ sources += files(
 >           'malloc_elem.c',
 >           'malloc_heap.c',
 >           'rte_malloc.c',
 > -        'rte_random.c',
 >           'rte_reciprocal.c',
 >           'rte_service.c',
 >           'rte_version.c',
 > @@ -48,6 +47,7 @@ if not is_windows
 >               'eal_common_trace.c',
 >               'eal_common_trace_ctf.c',
 >               'eal_common_trace_utils.c',
 > +            'eal_common_random.c',
 >               'hotplug_mp.c',
 >               'malloc_mp.c',
 >               'rte_keepalive.c',
 > diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
 > index 5031c6fe5f..e6b5ac46ed 100644
 > --- a/lib/eal/include/rte_random.h
 > +++ b/lib/eal/include/rte_random.h
 > @@ -15,6 +15,7 @@
 >   extern "C" {
 >   #endif
 >
 > +#include <rte_compat.h>
 >   #include <stdint.h>
 >
 >   /**
 > @@ -84,6 +85,22 @@ rte_rand_max(uint64_t upper_bound);
 >    */
 >   double rte_drand(void);
 >
 > +/**
 > + * Get a true random value.
 > + *
 > + * The generator is cryptographically secure.

If you want to extend <rte_random.h> with a cryptographically secure 
random number generator, that's fine.

To have an API that's only available on certain ARM CPUs is not.

NAK

A new function should be called something with "secure", rather than 
"true" (which is a bit silly, since we might well live in a completely 
deterministic universe). "secure" would more clearly communicate the 
intent, and also doesn't imply any particular implementation.

 > + *
 > + * @param val
 > + *   A pointer to store a true random value between 0 and (1<<64)-1.
 > + * @return
 > + *   0 on success
 > + *   -ENODATA on failure
 > + *   -ENOTSUP if unsupported
 > + */
 > +__rte_experimental
 > +int
 > +rte_trand(uint64_t *val);
 > +
 >   #ifdef __cplusplus
 >   }
 >   #endif
 > diff --git a/lib/eal/loongarch/meson.build 
b/lib/eal/loongarch/meson.build
 > index 3acfe6c3bd..f1dc741598 100644
 > --- a/lib/eal/loongarch/meson.build
 > +++ b/lib/eal/loongarch/meson.build
 > @@ -9,4 +9,5 @@ sources += files(
 >           'rte_hypervisor.c',
 >           'rte_mmu.c',
 >           'rte_power_intrinsics.c',
 > +        'rte_random.c',
 >   )
 > diff --git a/lib/eal/loongarch/rte_random.c 
b/lib/eal/loongarch/rte_random.c
 > new file mode 100644
 > index 0000000000..d033c47ea1
 > --- /dev/null
 > +++ b/lib/eal/loongarch/rte_random.c
 > @@ -0,0 +1,14 @@
 > +/* SPDX-License-Identifier: BSD-3-Clause
 > + * Copyright(c) 2024 Arm Limited
 > + */
 > +
 > +#include "rte_common.h"
 > +#include "rte_random.h"
 > +#include <errno.h>
 > +
 > +int
 > +rte_trand(uint64_t *val)
 > +{
 > +	RTE_SET_USED(val);
 > +	return -ENOTSUP;
 > +}
 > diff --git a/lib/eal/ppc/meson.build b/lib/eal/ppc/meson.build
 > index eeeaeee240..4bf3add6f3 100644
 > --- a/lib/eal/ppc/meson.build
 > +++ b/lib/eal/ppc/meson.build
 > @@ -9,4 +9,5 @@ sources += files(
 >           'rte_hypervisor.c',
 >           'rte_mmu.c',
 >           'rte_power_intrinsics.c',
 > +        'rte_random.c',
 >   )
 > diff --git a/lib/eal/ppc/rte_random.c b/lib/eal/ppc/rte_random.c
 > new file mode 100644
 > index 0000000000..d033c47ea1
 > --- /dev/null
 > +++ b/lib/eal/ppc/rte_random.c
 > @@ -0,0 +1,14 @@
 > +/* SPDX-License-Identifier: BSD-3-Clause
 > + * Copyright(c) 2024 Arm Limited
 > + */
 > +
 > +#include "rte_common.h"
 > +#include "rte_random.h"
 > +#include <errno.h>
 > +
 > +int
 > +rte_trand(uint64_t *val)
 > +{
 > +	RTE_SET_USED(val);
 > +	return -ENOTSUP;
 > +}
 > diff --git a/lib/eal/riscv/meson.build b/lib/eal/riscv/meson.build
 > index 6fba3d6ba7..e9e468cbf0 100644
 > --- a/lib/eal/riscv/meson.build
 > +++ b/lib/eal/riscv/meson.build
 > @@ -9,4 +9,5 @@ sources += files(
 >           'rte_hypervisor.c',
 >           'rte_mmu.c',
 >           'rte_power_intrinsics.c',
 > +        'rte_random.c',
 >   )
 > diff --git a/lib/eal/riscv/rte_random.c b/lib/eal/riscv/rte_random.c
 > new file mode 100644
 > index 0000000000..d033c47ea1
 > --- /dev/null
 > +++ b/lib/eal/riscv/rte_random.c
 > @@ -0,0 +1,14 @@
 > +/* SPDX-License-Identifier: BSD-3-Clause
 > + * Copyright(c) 2024 Arm Limited
 > + */
 > +
 > +#include "rte_common.h"
 > +#include "rte_random.h"
 > +#include <errno.h>
 > +
 > +int
 > +rte_trand(uint64_t *val)
 > +{
 > +	RTE_SET_USED(val);
 > +	return -ENOTSUP;
 > +}
 > diff --git a/lib/eal/version.map b/lib/eal/version.map
 > index 3df50c3fbb..d8b69ea609 100644
 > --- a/lib/eal/version.map
 > +++ b/lib/eal/version.map
 > @@ -396,6 +396,7 @@ EXPERIMENTAL {
 >
 >   	# added in 24.03
 >   	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
 > +	rte_trand;
 >   };
 >
 >   INTERNAL {
 > diff --git a/lib/eal/x86/meson.build b/lib/eal/x86/meson.build
 > index e08dffa13d..b67bdb3238 100644
 > --- a/lib/eal/x86/meson.build
 > +++ b/lib/eal/x86/meson.build
 > @@ -10,4 +10,5 @@ sources += files(
 >           'rte_mmu.c',
 >           'rte_spinlock.c',
 >           'rte_power_intrinsics.c',
 > +        'rte_random.c',
 >   )
 > diff --git a/lib/eal/x86/rte_random.c b/lib/eal/x86/rte_random.c
 > new file mode 100644
 > index 0000000000..d033c47ea1
 > --- /dev/null
 > +++ b/lib/eal/x86/rte_random.c
 > @@ -0,0 +1,14 @@
 > +/* SPDX-License-Identifier: BSD-3-Clause
 > + * Copyright(c) 2024 Arm Limited
 > + */
 > +
 > +#include "rte_common.h"
 > +#include "rte_random.h"
 > +#include <errno.h>
 > +
 > +int
 > +rte_trand(uint64_t *val)
 > +{
 > +	RTE_SET_USED(val);
 > +	return -ENOTSUP;
 > +}
  
Stephen Hemminger July 24, 2024, 2:35 p.m. UTC | #2
On Wed, 24 Jul 2024 08:40:39 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> On 2024-07-23 23:27, Shunzhi Wen wrote:> True Random Number Generator 
> (TRNG) is capable of
>  > generating random numbers from a physical entropy source.
>  > TRNG is enabled when compiled on Arm arch that supports
>  > FEAT_RNG introduced in Armv8.5-A flagged by __ARM_FEATURE_RNG.
>  >  
> 
> I'm missing a rationale here. Why is this useful?
> 
>  > Signed-off-by: Shunzhi Wen <shunzhi.wen@arm.com>
>  > Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
>  > Reviewed-by: Jack Bond-Preston <jack.bond-preston@arm.com>
>  > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
>  > ---
>  >   .mailmap                                      |  2 ++
>  >   app/test/test_rand_perf.c                     | 25 +++++++++++++++++--
>  >   config/arm/meson.build                        |  2 +-
>  >   lib/eal/arm/include/rte_cpuflags_64.h         |  3 +++
>  >   lib/eal/arm/meson.build                       |  1 +
>  >   lib/eal/arm/rte_cpuflags.c                    |  1 +
>  >   lib/eal/arm/rte_random.c                      | 20 +++++++++++++++
>  >   .../{rte_random.c => eal_common_random.c}     |  0
>  >   lib/eal/common/meson.build                    |  2 +-
>  >   lib/eal/include/rte_random.h                  | 17 +++++++++++++
>  >   lib/eal/loongarch/meson.build                 |  1 +
>  >   lib/eal/loongarch/rte_random.c                | 14 +++++++++++
>  >   lib/eal/ppc/meson.build                       |  1 +
>  >   lib/eal/ppc/rte_random.c                      | 14 +++++++++++
>  >   lib/eal/riscv/meson.build                     |  1 +
>  >   lib/eal/riscv/rte_random.c                    | 14 +++++++++++
>  >   lib/eal/version.map                           |  1 +
>  >   lib/eal/x86/meson.build                       |  1 +
>  >   lib/eal/x86/rte_random.c                      | 14 +++++++++++
>  >   19 files changed, 130 insertions(+), 4 deletions(-)
>  >   create mode 100644 lib/eal/arm/rte_random.c
>  >   rename lib/eal/common/{rte_random.c => eal_common_random.c} (100%)
>  >   create mode 100644 lib/eal/loongarch/rte_random.c
>  >   create mode 100644 lib/eal/ppc/rte_random.c
>  >   create mode 100644 lib/eal/riscv/rte_random.c
>  >   create mode 100644 lib/eal/x86/rte_random.c
>  >
>  > diff --git a/.mailmap b/.mailmap
>  > index ac06962e82..23209edfd2 100644
>  > --- a/.mailmap
>  > +++ b/.mailmap
>  > @@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
>  >   Dexuan Cui <decui@microsoft.com>
>  >   Dharmik Thakkar <dharmikjayesh.thakkar@arm.com>   
> <dharmik.thakkar@arm.com>
>  >   Dheemanth Mallikarjun <dheemanthm@vmware.com>
>  > +Dhruv Tripathi <dhruv.tripathi@arm.com>
>  >   Diana Wang <na.wang@corigine.com>
>  >   Didier Pallard <didier.pallard@6wind.com>
>  >   Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
>  > @@ -1353,6 +1354,7 @@ Shuki Katzenelson <shuki@lightbitslabs.com>
>  >   Shun Hao <shunh@nvidia.com>
>  >   Shu Shen <shu.shen@radisys.com>
>  >   Shujing Dong <shujing.dong@corigine.com>
>  > +Shunzhi Wen <shunzhi.wen@arm.com>
>  >   Shweta Choudaha <shweta.choudaha@att.com>
>  >   Shyam Kumar Shrivastav <shrivastav.shyam@gmail.com>
>  >   Shy Shyman <shys@nvidia.com> <shys@mellanox.com>
>  > diff --git a/app/test/test_rand_perf.c b/app/test/test_rand_perf.c
>  > index 30204e12c0..b61cc75014 100644
>  > --- a/app/test/test_rand_perf.c
>  > +++ b/app/test/test_rand_perf.c
>  > @@ -20,6 +20,7 @@ static volatile uint64_t vsum;
>  >
>  >   enum rand_type {
>  >   	rand_type_64,
>  > +	rand_type_true_rand_64,
>  >   	rand_type_float,
>  >   	rand_type_bounded_best_case,
>  >   	rand_type_bounded_worst_case
>  > @@ -31,6 +32,8 @@ rand_type_desc(enum rand_type rand_type)
>  >   	switch (rand_type) {
>  >   	case rand_type_64:
>  >   		return "Full 64-bit [rte_rand()]";
>  > +	case rand_type_true_rand_64:
>  > +		return "Full 64-bit True Random [rte_trand()]";
>  >   	case rand_type_float:
>  >   		return "Floating point [rte_drand()]";
>  >   	case rand_type_bounded_best_case:
>  > @@ -50,6 +53,9 @@ test_rand_perf_type(enum rand_type rand_type)
>  >   	uint64_t end;
>  >   	uint64_t sum = 0;
>  >   	uint64_t op_latency;
>  > +	int ret;
>  > +	uint64_t val;
>  > +	uint32_t fail_count = 0;
>  >
>  >   	start = rte_rdtsc();
>  >
>  > @@ -58,6 +64,13 @@ test_rand_perf_type(enum rand_type rand_type)
>  >   		case rand_type_64:
>  >   			sum += rte_rand();
>  >   			break;
>  > +		case rand_type_true_rand_64:
>  > +			ret = rte_trand(&val);
>  > +			if (ret == 0)
>  > +				sum += val;
>  > +			else
>  > +				fail_count++;
>  > +			break;
>  >   		case rand_type_float:
>  >   			sum += 1000. * rte_drand();
>  >   			break;
>  > @@ -77,8 +90,15 @@ test_rand_perf_type(enum rand_type rand_type)
>  >
>  >   	op_latency = (end - start) / ITERATIONS;
>  >
>  > -	printf("%s: %"PRId64" TSC cycles/op\n", rand_type_desc(rand_type),
>  > -	       op_latency);
>  > +	if (!fail_count)
>  > +		printf("%s: %"PRId64" TSC cycles/op\n",
>  > +		       rand_type_desc(rand_type),
>  > +		       op_latency);
>  > +	else
>  > +		printf("%s: %"PRId64" TSC cycles/op (failed %d time(s))\n",
>  > +		       rand_type_desc(rand_type),
>  > +		       op_latency,
>  > +			   fail_count);
>  >   }
>  >
>  >   static int
>  > @@ -89,6 +109,7 @@ test_rand_perf(void)
>  >   	printf("Pseudo-random number generation latencies:\n");
>  >
>  >   	test_rand_perf_type(rand_type_64);
>  > +	test_rand_perf_type(rand_type_true_rand_64);
>  >   	test_rand_perf_type(rand_type_float);
>  >   	test_rand_perf_type(rand_type_bounded_best_case);
>  >   	test_rand_perf_type(rand_type_bounded_worst_case);
>  > diff --git a/config/arm/meson.build b/config/arm/meson.build
>  > index 012935d5d7..13be94254e 100644
>  > --- a/config/arm/meson.build
>  > +++ b/config/arm/meson.build
>  > @@ -95,7 +95,7 @@ part_number_config_arm = {
>  >       },
>  >       '0xd49': {
>  >           'march': 'armv9-a',
>  > -        'march_features': ['sve2'],
>  > +        'march_features': ['sve2', 'rng'],
>  >           'fallback_march': 'armv8.5-a',
>  >           'mcpu': 'neoverse-n2',
>  >           'flags': [
>  > diff --git a/lib/eal/arm/include/rte_cpuflags_64.h   
> b/lib/eal/arm/include/rte_cpuflags_64.h
>  > index afe70209c3..6aa067339f 100644
>  > --- a/lib/eal/arm/include/rte_cpuflags_64.h
>  > +++ b/lib/eal/arm/include/rte_cpuflags_64.h
>  > @@ -36,6 +36,9 @@ enum rte_cpu_flag_t {
>  >   	RTE_CPUFLAG_SVEF64MM,
>  >   	RTE_CPUFLAG_SVEBF16,
>  >   	RTE_CPUFLAG_AARCH64,
>  > +
>  > +	/* RNDR, RNDRRS instructions */
>  > +	RTE_CPUFLAG_RNG,
>  >   };
>  >
>  >   #include "generic/rte_cpuflags.h"
>  > diff --git a/lib/eal/arm/meson.build b/lib/eal/arm/meson.build
>  > index 6fba3d6ba7..e9e468cbf0 100644
>  > --- a/lib/eal/arm/meson.build
>  > +++ b/lib/eal/arm/meson.build
>  > @@ -9,4 +9,5 @@ sources += files(
>  >           'rte_hypervisor.c',
>  >           'rte_mmu.c',
>  >           'rte_power_intrinsics.c',
>  > +        'rte_random.c',
>  >   )
>  > diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
>  > index 7ba4f8ba97..56074f0c6a 100644
>  > --- a/lib/eal/arm/rte_cpuflags.c
>  > +++ b/lib/eal/arm/rte_cpuflags.c
>  > @@ -116,6 +116,7 @@ const struct feature_entry   
> rte_cpu_feature_table[] = {
>  >   	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
>  >   	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
>  >   	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
>  > +	FEAT_DEF(RNG,		REG_HWCAP2,   16)
>  >   };
>  >   #endif /* RTE_ARCH */
>  >
>  > diff --git a/lib/eal/arm/rte_random.c b/lib/eal/arm/rte_random.c
>  > new file mode 100644
>  > index 0000000000..74c8fa733b
>  > --- /dev/null
>  > +++ b/lib/eal/arm/rte_random.c
>  > @@ -0,0 +1,20 @@
>  > +/* SPDX-License-Identifier: BSD-3-Clause
>  > + * Copyright(c) 2024 Arm Limited
>  > + */
>  > +
>  > +#include "arm_acle.h"
>  > +#include "rte_common.h"
>  > +#include "rte_random.h"
>  > +#include <errno.h>
>  > +
>  > +int
>  > +rte_trand(uint64_t *val)
>  > +{
>  > +#if defined __ARM_FEATURE_RNG
>  > +	int ret = __rndr(val);
>  > +	return (ret == 0) ? 0 : -ENODATA;
>  > +#else
>  > +	RTE_SET_USED(val);
>  > +	return -ENOTSUP;
>  > +#endif
>  > +}
>  > diff --git a/lib/eal/common/rte_random.c   
> b/lib/eal/common/eal_common_random.c
>  > similarity index 100%
>  > rename from lib/eal/common/rte_random.c
>  > rename to lib/eal/common/eal_common_random.c
>  > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
>  > index 22a626ba6f..c4405aa48b 100644
>  > --- a/lib/eal/common/meson.build
>  > +++ b/lib/eal/common/meson.build
>  > @@ -32,7 +32,6 @@ sources += files(
>  >           'malloc_elem.c',
>  >           'malloc_heap.c',
>  >           'rte_malloc.c',
>  > -        'rte_random.c',
>  >           'rte_reciprocal.c',
>  >           'rte_service.c',
>  >           'rte_version.c',
>  > @@ -48,6 +47,7 @@ if not is_windows
>  >               'eal_common_trace.c',
>  >               'eal_common_trace_ctf.c',
>  >               'eal_common_trace_utils.c',
>  > +            'eal_common_random.c',
>  >               'hotplug_mp.c',
>  >               'malloc_mp.c',
>  >               'rte_keepalive.c',
>  > diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
>  > index 5031c6fe5f..e6b5ac46ed 100644
>  > --- a/lib/eal/include/rte_random.h
>  > +++ b/lib/eal/include/rte_random.h
>  > @@ -15,6 +15,7 @@
>  >   extern "C" {
>  >   #endif
>  >
>  > +#include <rte_compat.h>
>  >   #include <stdint.h>
>  >
>  >   /**
>  > @@ -84,6 +85,22 @@ rte_rand_max(uint64_t upper_bound);
>  >    */
>  >   double rte_drand(void);
>  >
>  > +/**
>  > + * Get a true random value.
>  > + *
>  > + * The generator is cryptographically secure.  
> 
> If you want to extend <rte_random.h> with a cryptographically secure 
> random number generator, that's fine.
> 
> To have an API that's only available on certain ARM CPUs is not.
> 
> NAK
> 
> A new function should be called something with "secure", rather than 
> "true" (which is a bit silly, since we might well live in a completely 
> deterministic universe). "secure" would more clearly communicate the 
> intent, and also doesn't imply any particular implementation.

Agree, with Mattias. What constitutes a secure random number generator
is a matter of much debate. Most of the HW random generators are taking
diode (Schottky noise) and doing transforms on it to get something uniform.

If a key generation type API was added, why not just use existing and more
researched kernel get_random()?
  
Mattias Rönnblom July 24, 2024, 3:07 p.m. UTC | #3
On 2024-07-24 16:35, Stephen Hemminger wrote:
> On Wed, 24 Jul 2024 08:40:39 +0200
> Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> 
>> On 2024-07-23 23:27, Shunzhi Wen wrote:> True Random Number Generator
>> (TRNG) is capable of
>>   > generating random numbers from a physical entropy source.
>>   > TRNG is enabled when compiled on Arm arch that supports
>>   > FEAT_RNG introduced in Armv8.5-A flagged by __ARM_FEATURE_RNG.
>>   >
>>
>> I'm missing a rationale here. Why is this useful?
>>
>>   > Signed-off-by: Shunzhi Wen <shunzhi.wen@arm.com>
>>   > Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
>>   > Reviewed-by: Jack Bond-Preston <jack.bond-preston@arm.com>
>>   > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
>>   > ---
>>   >   .mailmap                                      |  2 ++
>>   >   app/test/test_rand_perf.c                     | 25 +++++++++++++++++--
>>   >   config/arm/meson.build                        |  2 +-
>>   >   lib/eal/arm/include/rte_cpuflags_64.h         |  3 +++
>>   >   lib/eal/arm/meson.build                       |  1 +
>>   >   lib/eal/arm/rte_cpuflags.c                    |  1 +
>>   >   lib/eal/arm/rte_random.c                      | 20 +++++++++++++++
>>   >   .../{rte_random.c => eal_common_random.c}     |  0
>>   >   lib/eal/common/meson.build                    |  2 +-
>>   >   lib/eal/include/rte_random.h                  | 17 +++++++++++++
>>   >   lib/eal/loongarch/meson.build                 |  1 +
>>   >   lib/eal/loongarch/rte_random.c                | 14 +++++++++++
>>   >   lib/eal/ppc/meson.build                       |  1 +
>>   >   lib/eal/ppc/rte_random.c                      | 14 +++++++++++
>>   >   lib/eal/riscv/meson.build                     |  1 +
>>   >   lib/eal/riscv/rte_random.c                    | 14 +++++++++++
>>   >   lib/eal/version.map                           |  1 +
>>   >   lib/eal/x86/meson.build                       |  1 +
>>   >   lib/eal/x86/rte_random.c                      | 14 +++++++++++
>>   >   19 files changed, 130 insertions(+), 4 deletions(-)
>>   >   create mode 100644 lib/eal/arm/rte_random.c
>>   >   rename lib/eal/common/{rte_random.c => eal_common_random.c} (100%)
>>   >   create mode 100644 lib/eal/loongarch/rte_random.c
>>   >   create mode 100644 lib/eal/ppc/rte_random.c
>>   >   create mode 100644 lib/eal/riscv/rte_random.c
>>   >   create mode 100644 lib/eal/x86/rte_random.c
>>   >
>>   > diff --git a/.mailmap b/.mailmap
>>   > index ac06962e82..23209edfd2 100644
>>   > --- a/.mailmap
>>   > +++ b/.mailmap
>>   > @@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
>>   >   Dexuan Cui <decui@microsoft.com>
>>   >   Dharmik Thakkar <dharmikjayesh.thakkar@arm.com>
>> <dharmik.thakkar@arm.com>
>>   >   Dheemanth Mallikarjun <dheemanthm@vmware.com>
>>   > +Dhruv Tripathi <dhruv.tripathi@arm.com>
>>   >   Diana Wang <na.wang@corigine.com>
>>   >   Didier Pallard <didier.pallard@6wind.com>
>>   >   Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
>>   > @@ -1353,6 +1354,7 @@ Shuki Katzenelson <shuki@lightbitslabs.com>
>>   >   Shun Hao <shunh@nvidia.com>
>>   >   Shu Shen <shu.shen@radisys.com>
>>   >   Shujing Dong <shujing.dong@corigine.com>
>>   > +Shunzhi Wen <shunzhi.wen@arm.com>
>>   >   Shweta Choudaha <shweta.choudaha@att.com>
>>   >   Shyam Kumar Shrivastav <shrivastav.shyam@gmail.com>
>>   >   Shy Shyman <shys@nvidia.com> <shys@mellanox.com>
>>   > diff --git a/app/test/test_rand_perf.c b/app/test/test_rand_perf.c
>>   > index 30204e12c0..b61cc75014 100644
>>   > --- a/app/test/test_rand_perf.c
>>   > +++ b/app/test/test_rand_perf.c
>>   > @@ -20,6 +20,7 @@ static volatile uint64_t vsum;
>>   >
>>   >   enum rand_type {
>>   >   	rand_type_64,
>>   > +	rand_type_true_rand_64,
>>   >   	rand_type_float,
>>   >   	rand_type_bounded_best_case,
>>   >   	rand_type_bounded_worst_case
>>   > @@ -31,6 +32,8 @@ rand_type_desc(enum rand_type rand_type)
>>   >   	switch (rand_type) {
>>   >   	case rand_type_64:
>>   >   		return "Full 64-bit [rte_rand()]";
>>   > +	case rand_type_true_rand_64:
>>   > +		return "Full 64-bit True Random [rte_trand()]";
>>   >   	case rand_type_float:
>>   >   		return "Floating point [rte_drand()]";
>>   >   	case rand_type_bounded_best_case:
>>   > @@ -50,6 +53,9 @@ test_rand_perf_type(enum rand_type rand_type)
>>   >   	uint64_t end;
>>   >   	uint64_t sum = 0;
>>   >   	uint64_t op_latency;
>>   > +	int ret;
>>   > +	uint64_t val;
>>   > +	uint32_t fail_count = 0;
>>   >
>>   >   	start = rte_rdtsc();
>>   >
>>   > @@ -58,6 +64,13 @@ test_rand_perf_type(enum rand_type rand_type)
>>   >   		case rand_type_64:
>>   >   			sum += rte_rand();
>>   >   			break;
>>   > +		case rand_type_true_rand_64:
>>   > +			ret = rte_trand(&val);
>>   > +			if (ret == 0)
>>   > +				sum += val;
>>   > +			else
>>   > +				fail_count++;
>>   > +			break;
>>   >   		case rand_type_float:
>>   >   			sum += 1000. * rte_drand();
>>   >   			break;
>>   > @@ -77,8 +90,15 @@ test_rand_perf_type(enum rand_type rand_type)
>>   >
>>   >   	op_latency = (end - start) / ITERATIONS;
>>   >
>>   > -	printf("%s: %"PRId64" TSC cycles/op\n", rand_type_desc(rand_type),
>>   > -	       op_latency);
>>   > +	if (!fail_count)
>>   > +		printf("%s: %"PRId64" TSC cycles/op\n",
>>   > +		       rand_type_desc(rand_type),
>>   > +		       op_latency);
>>   > +	else
>>   > +		printf("%s: %"PRId64" TSC cycles/op (failed %d time(s))\n",
>>   > +		       rand_type_desc(rand_type),
>>   > +		       op_latency,
>>   > +			   fail_count);
>>   >   }
>>   >
>>   >   static int
>>   > @@ -89,6 +109,7 @@ test_rand_perf(void)
>>   >   	printf("Pseudo-random number generation latencies:\n");
>>   >
>>   >   	test_rand_perf_type(rand_type_64);
>>   > +	test_rand_perf_type(rand_type_true_rand_64);
>>   >   	test_rand_perf_type(rand_type_float);
>>   >   	test_rand_perf_type(rand_type_bounded_best_case);
>>   >   	test_rand_perf_type(rand_type_bounded_worst_case);
>>   > diff --git a/config/arm/meson.build b/config/arm/meson.build
>>   > index 012935d5d7..13be94254e 100644
>>   > --- a/config/arm/meson.build
>>   > +++ b/config/arm/meson.build
>>   > @@ -95,7 +95,7 @@ part_number_config_arm = {
>>   >       },
>>   >       '0xd49': {
>>   >           'march': 'armv9-a',
>>   > -        'march_features': ['sve2'],
>>   > +        'march_features': ['sve2', 'rng'],
>>   >           'fallback_march': 'armv8.5-a',
>>   >           'mcpu': 'neoverse-n2',
>>   >           'flags': [
>>   > diff --git a/lib/eal/arm/include/rte_cpuflags_64.h
>> b/lib/eal/arm/include/rte_cpuflags_64.h
>>   > index afe70209c3..6aa067339f 100644
>>   > --- a/lib/eal/arm/include/rte_cpuflags_64.h
>>   > +++ b/lib/eal/arm/include/rte_cpuflags_64.h
>>   > @@ -36,6 +36,9 @@ enum rte_cpu_flag_t {
>>   >   	RTE_CPUFLAG_SVEF64MM,
>>   >   	RTE_CPUFLAG_SVEBF16,
>>   >   	RTE_CPUFLAG_AARCH64,
>>   > +
>>   > +	/* RNDR, RNDRRS instructions */
>>   > +	RTE_CPUFLAG_RNG,
>>   >   };
>>   >
>>   >   #include "generic/rte_cpuflags.h"
>>   > diff --git a/lib/eal/arm/meson.build b/lib/eal/arm/meson.build
>>   > index 6fba3d6ba7..e9e468cbf0 100644
>>   > --- a/lib/eal/arm/meson.build
>>   > +++ b/lib/eal/arm/meson.build
>>   > @@ -9,4 +9,5 @@ sources += files(
>>   >           'rte_hypervisor.c',
>>   >           'rte_mmu.c',
>>   >           'rte_power_intrinsics.c',
>>   > +        'rte_random.c',
>>   >   )
>>   > diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
>>   > index 7ba4f8ba97..56074f0c6a 100644
>>   > --- a/lib/eal/arm/rte_cpuflags.c
>>   > +++ b/lib/eal/arm/rte_cpuflags.c
>>   > @@ -116,6 +116,7 @@ const struct feature_entry
>> rte_cpu_feature_table[] = {
>>   >   	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
>>   >   	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
>>   >   	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
>>   > +	FEAT_DEF(RNG,		REG_HWCAP2,   16)
>>   >   };
>>   >   #endif /* RTE_ARCH */
>>   >
>>   > diff --git a/lib/eal/arm/rte_random.c b/lib/eal/arm/rte_random.c
>>   > new file mode 100644
>>   > index 0000000000..74c8fa733b
>>   > --- /dev/null
>>   > +++ b/lib/eal/arm/rte_random.c
>>   > @@ -0,0 +1,20 @@
>>   > +/* SPDX-License-Identifier: BSD-3-Clause
>>   > + * Copyright(c) 2024 Arm Limited
>>   > + */
>>   > +
>>   > +#include "arm_acle.h"
>>   > +#include "rte_common.h"
>>   > +#include "rte_random.h"
>>   > +#include <errno.h>
>>   > +
>>   > +int
>>   > +rte_trand(uint64_t *val)
>>   > +{
>>   > +#if defined __ARM_FEATURE_RNG
>>   > +	int ret = __rndr(val);
>>   > +	return (ret == 0) ? 0 : -ENODATA;
>>   > +#else
>>   > +	RTE_SET_USED(val);
>>   > +	return -ENOTSUP;
>>   > +#endif
>>   > +}
>>   > diff --git a/lib/eal/common/rte_random.c
>> b/lib/eal/common/eal_common_random.c
>>   > similarity index 100%
>>   > rename from lib/eal/common/rte_random.c
>>   > rename to lib/eal/common/eal_common_random.c
>>   > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
>>   > index 22a626ba6f..c4405aa48b 100644
>>   > --- a/lib/eal/common/meson.build
>>   > +++ b/lib/eal/common/meson.build
>>   > @@ -32,7 +32,6 @@ sources += files(
>>   >           'malloc_elem.c',
>>   >           'malloc_heap.c',
>>   >           'rte_malloc.c',
>>   > -        'rte_random.c',
>>   >           'rte_reciprocal.c',
>>   >           'rte_service.c',
>>   >           'rte_version.c',
>>   > @@ -48,6 +47,7 @@ if not is_windows
>>   >               'eal_common_trace.c',
>>   >               'eal_common_trace_ctf.c',
>>   >               'eal_common_trace_utils.c',
>>   > +            'eal_common_random.c',
>>   >               'hotplug_mp.c',
>>   >               'malloc_mp.c',
>>   >               'rte_keepalive.c',
>>   > diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
>>   > index 5031c6fe5f..e6b5ac46ed 100644
>>   > --- a/lib/eal/include/rte_random.h
>>   > +++ b/lib/eal/include/rte_random.h
>>   > @@ -15,6 +15,7 @@
>>   >   extern "C" {
>>   >   #endif
>>   >
>>   > +#include <rte_compat.h>
>>   >   #include <stdint.h>
>>   >
>>   >   /**
>>   > @@ -84,6 +85,22 @@ rte_rand_max(uint64_t upper_bound);
>>   >    */
>>   >   double rte_drand(void);
>>   >
>>   > +/**
>>   > + * Get a true random value.
>>   > + *
>>   > + * The generator is cryptographically secure.
>>
>> If you want to extend <rte_random.h> with a cryptographically secure
>> random number generator, that's fine.
>>
>> To have an API that's only available on certain ARM CPUs is not.
>>
>> NAK
>>
>> A new function should be called something with "secure", rather than
>> "true" (which is a bit silly, since we might well live in a completely
>> deterministic universe). "secure" would more clearly communicate the
>> intent, and also doesn't imply any particular implementation.
> 
> Agree, with Mattias. What constitutes a secure random number generator
> is a matter of much debate. Most of the HW random generators are taking
> diode (Schottky noise) and doing transforms on it to get something uniform.
> 
> If a key generation type API was added, why not just use existing and more
> researched kernel get_random()?
> 

Ideally, you want to avoid system calls on lcore workers doing packet 
processing. If you have to do system calls (which I believe is the case 
here), it's better to a simple call, not so often.

getentropy() seems to need about 800 core clock cycles on my x86_64, on 
average. (rte_rand() needs ~11 cc/call.) 800 cc is not too horrible, but 
system calls tend to have some pretty bad tail latencies.

To improve efficiency, one could do a getentropy() on a relatively large 
buffer, and cache the result on a per-lcore basis, amortizing the system 
call overhead over many calls.

You still have the tail latency issue to deal with. We could have a 
control thread providing entropy for the lcores, but that seems like 
massive overkill.
  
Stephen Hemminger July 24, 2024, 4:16 p.m. UTC | #4
On Wed, 24 Jul 2024 17:07:49 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> On 2024-07-24 16:35, Stephen Hemminger wrote:
> > On Wed, 24 Jul 2024 08:40:39 +0200
> > Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> >   
> >> On 2024-07-23 23:27, Shunzhi Wen wrote:> True Random Number Generator
> >> (TRNG) is capable of  
> >>   > generating random numbers from a physical entropy source.
> >>   > TRNG is enabled when compiled on Arm arch that supports
> >>   > FEAT_RNG introduced in Armv8.5-A flagged by __ARM_FEATURE_RNG.
> >>   >  
> >>
> >> I'm missing a rationale here. Why is this useful?
> >>  
> >>   > Signed-off-by: Shunzhi Wen <shunzhi.wen@arm.com>
> >>   > Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> >>   > Reviewed-by: Jack Bond-Preston <jack.bond-preston@arm.com>
> >>   > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> >>   > ---
> >>   >   .mailmap                                      |  2 ++
> >>   >   app/test/test_rand_perf.c                     | 25 +++++++++++++++++--
> >>   >   config/arm/meson.build                        |  2 +-
> >>   >   lib/eal/arm/include/rte_cpuflags_64.h         |  3 +++
> >>   >   lib/eal/arm/meson.build                       |  1 +
> >>   >   lib/eal/arm/rte_cpuflags.c                    |  1 +
> >>   >   lib/eal/arm/rte_random.c                      | 20 +++++++++++++++
> >>   >   .../{rte_random.c => eal_common_random.c}     |  0
> >>   >   lib/eal/common/meson.build                    |  2 +-
> >>   >   lib/eal/include/rte_random.h                  | 17 +++++++++++++
> >>   >   lib/eal/loongarch/meson.build                 |  1 +
> >>   >   lib/eal/loongarch/rte_random.c                | 14 +++++++++++
> >>   >   lib/eal/ppc/meson.build                       |  1 +
> >>   >   lib/eal/ppc/rte_random.c                      | 14 +++++++++++
> >>   >   lib/eal/riscv/meson.build                     |  1 +
> >>   >   lib/eal/riscv/rte_random.c                    | 14 +++++++++++
> >>   >   lib/eal/version.map                           |  1 +
> >>   >   lib/eal/x86/meson.build                       |  1 +
> >>   >   lib/eal/x86/rte_random.c                      | 14 +++++++++++
> >>   >   19 files changed, 130 insertions(+), 4 deletions(-)
> >>   >   create mode 100644 lib/eal/arm/rte_random.c
> >>   >   rename lib/eal/common/{rte_random.c => eal_common_random.c} (100%)
> >>   >   create mode 100644 lib/eal/loongarch/rte_random.c
> >>   >   create mode 100644 lib/eal/ppc/rte_random.c
> >>   >   create mode 100644 lib/eal/riscv/rte_random.c
> >>   >   create mode 100644 lib/eal/x86/rte_random.c
> >>   >
> >>   > diff --git a/.mailmap b/.mailmap
> >>   > index ac06962e82..23209edfd2 100644
> >>   > --- a/.mailmap
> >>   > +++ b/.mailmap
> >>   > @@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
> >>   >   Dexuan Cui <decui@microsoft.com>
> >>   >   Dharmik Thakkar <dharmikjayesh.thakkar@arm.com>  
> >> <dharmik.thakkar@arm.com>  
> >>   >   Dheemanth Mallikarjun <dheemanthm@vmware.com>
> >>   > +Dhruv Tripathi <dhruv.tripathi@arm.com>
> >>   >   Diana Wang <na.wang@corigine.com>
> >>   >   Didier Pallard <didier.pallard@6wind.com>
> >>   >   Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
> >>   > @@ -1353,6 +1354,7 @@ Shuki Katzenelson <shuki@lightbitslabs.com>
> >>   >   Shun Hao <shunh@nvidia.com>
> >>   >   Shu Shen <shu.shen@radisys.com>
> >>   >   Shujing Dong <shujing.dong@corigine.com>
> >>   > +Shunzhi Wen <shunzhi.wen@arm.com>
> >>   >   Shweta Choudaha <shweta.choudaha@att.com>
> >>   >   Shyam Kumar Shrivastav <shrivastav.shyam@gmail.com>
> >>   >   Shy Shyman <shys@nvidia.com> <shys@mellanox.com>
> >>   > diff --git a/app/test/test_rand_perf.c b/app/test/test_rand_perf.c
> >>   > index 30204e12c0..b61cc75014 100644
> >>   > --- a/app/test/test_rand_perf.c
> >>   > +++ b/app/test/test_rand_perf.c
> >>   > @@ -20,6 +20,7 @@ static volatile uint64_t vsum;
> >>   >
> >>   >   enum rand_type {
> >>   >   	rand_type_64,
> >>   > +	rand_type_true_rand_64,
> >>   >   	rand_type_float,
> >>   >   	rand_type_bounded_best_case,
> >>   >   	rand_type_bounded_worst_case
> >>   > @@ -31,6 +32,8 @@ rand_type_desc(enum rand_type rand_type)
> >>   >   	switch (rand_type) {
> >>   >   	case rand_type_64:
> >>   >   		return "Full 64-bit [rte_rand()]";
> >>   > +	case rand_type_true_rand_64:
> >>   > +		return "Full 64-bit True Random [rte_trand()]";
> >>   >   	case rand_type_float:
> >>   >   		return "Floating point [rte_drand()]";
> >>   >   	case rand_type_bounded_best_case:
> >>   > @@ -50,6 +53,9 @@ test_rand_perf_type(enum rand_type rand_type)
> >>   >   	uint64_t end;
> >>   >   	uint64_t sum = 0;
> >>   >   	uint64_t op_latency;
> >>   > +	int ret;
> >>   > +	uint64_t val;
> >>   > +	uint32_t fail_count = 0;
> >>   >
> >>   >   	start = rte_rdtsc();
> >>   >
> >>   > @@ -58,6 +64,13 @@ test_rand_perf_type(enum rand_type rand_type)
> >>   >   		case rand_type_64:
> >>   >   			sum += rte_rand();
> >>   >   			break;
> >>   > +		case rand_type_true_rand_64:
> >>   > +			ret = rte_trand(&val);
> >>   > +			if (ret == 0)
> >>   > +				sum += val;
> >>   > +			else
> >>   > +				fail_count++;
> >>   > +			break;
> >>   >   		case rand_type_float:
> >>   >   			sum += 1000. * rte_drand();
> >>   >   			break;
> >>   > @@ -77,8 +90,15 @@ test_rand_perf_type(enum rand_type rand_type)
> >>   >
> >>   >   	op_latency = (end - start) / ITERATIONS;
> >>   >
> >>   > -	printf("%s: %"PRId64" TSC cycles/op\n", rand_type_desc(rand_type),
> >>   > -	       op_latency);
> >>   > +	if (!fail_count)
> >>   > +		printf("%s: %"PRId64" TSC cycles/op\n",
> >>   > +		       rand_type_desc(rand_type),
> >>   > +		       op_latency);
> >>   > +	else
> >>   > +		printf("%s: %"PRId64" TSC cycles/op (failed %d time(s))\n",
> >>   > +		       rand_type_desc(rand_type),
> >>   > +		       op_latency,
> >>   > +			   fail_count);
> >>   >   }
> >>   >
> >>   >   static int
> >>   > @@ -89,6 +109,7 @@ test_rand_perf(void)
> >>   >   	printf("Pseudo-random number generation latencies:\n");
> >>   >
> >>   >   	test_rand_perf_type(rand_type_64);
> >>   > +	test_rand_perf_type(rand_type_true_rand_64);
> >>   >   	test_rand_perf_type(rand_type_float);
> >>   >   	test_rand_perf_type(rand_type_bounded_best_case);
> >>   >   	test_rand_perf_type(rand_type_bounded_worst_case);
> >>   > diff --git a/config/arm/meson.build b/config/arm/meson.build
> >>   > index 012935d5d7..13be94254e 100644
> >>   > --- a/config/arm/meson.build
> >>   > +++ b/config/arm/meson.build
> >>   > @@ -95,7 +95,7 @@ part_number_config_arm = {
> >>   >       },
> >>   >       '0xd49': {
> >>   >           'march': 'armv9-a',
> >>   > -        'march_features': ['sve2'],
> >>   > +        'march_features': ['sve2', 'rng'],
> >>   >           'fallback_march': 'armv8.5-a',
> >>   >           'mcpu': 'neoverse-n2',
> >>   >           'flags': [
> >>   > diff --git a/lib/eal/arm/include/rte_cpuflags_64.h  
> >> b/lib/eal/arm/include/rte_cpuflags_64.h  
> >>   > index afe70209c3..6aa067339f 100644
> >>   > --- a/lib/eal/arm/include/rte_cpuflags_64.h
> >>   > +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> >>   > @@ -36,6 +36,9 @@ enum rte_cpu_flag_t {
> >>   >   	RTE_CPUFLAG_SVEF64MM,
> >>   >   	RTE_CPUFLAG_SVEBF16,
> >>   >   	RTE_CPUFLAG_AARCH64,
> >>   > +
> >>   > +	/* RNDR, RNDRRS instructions */
> >>   > +	RTE_CPUFLAG_RNG,
> >>   >   };
> >>   >
> >>   >   #include "generic/rte_cpuflags.h"
> >>   > diff --git a/lib/eal/arm/meson.build b/lib/eal/arm/meson.build
> >>   > index 6fba3d6ba7..e9e468cbf0 100644
> >>   > --- a/lib/eal/arm/meson.build
> >>   > +++ b/lib/eal/arm/meson.build
> >>   > @@ -9,4 +9,5 @@ sources += files(
> >>   >           'rte_hypervisor.c',
> >>   >           'rte_mmu.c',
> >>   >           'rte_power_intrinsics.c',
> >>   > +        'rte_random.c',
> >>   >   )
> >>   > diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
> >>   > index 7ba4f8ba97..56074f0c6a 100644
> >>   > --- a/lib/eal/arm/rte_cpuflags.c
> >>   > +++ b/lib/eal/arm/rte_cpuflags.c
> >>   > @@ -116,6 +116,7 @@ const struct feature_entry  
> >> rte_cpu_feature_table[] = {  
> >>   >   	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
> >>   >   	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
> >>   >   	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
> >>   > +	FEAT_DEF(RNG,		REG_HWCAP2,   16)
> >>   >   };
> >>   >   #endif /* RTE_ARCH */
> >>   >
> >>   > diff --git a/lib/eal/arm/rte_random.c b/lib/eal/arm/rte_random.c
> >>   > new file mode 100644
> >>   > index 0000000000..74c8fa733b
> >>   > --- /dev/null
> >>   > +++ b/lib/eal/arm/rte_random.c
> >>   > @@ -0,0 +1,20 @@
> >>   > +/* SPDX-License-Identifier: BSD-3-Clause
> >>   > + * Copyright(c) 2024 Arm Limited
> >>   > + */
> >>   > +
> >>   > +#include "arm_acle.h"
> >>   > +#include "rte_common.h"
> >>   > +#include "rte_random.h"
> >>   > +#include <errno.h>
> >>   > +
> >>   > +int
> >>   > +rte_trand(uint64_t *val)
> >>   > +{
> >>   > +#if defined __ARM_FEATURE_RNG
> >>   > +	int ret = __rndr(val);
> >>   > +	return (ret == 0) ? 0 : -ENODATA;
> >>   > +#else
> >>   > +	RTE_SET_USED(val);
> >>   > +	return -ENOTSUP;
> >>   > +#endif
> >>   > +}
> >>   > diff --git a/lib/eal/common/rte_random.c  
> >> b/lib/eal/common/eal_common_random.c  
> >>   > similarity index 100%
> >>   > rename from lib/eal/common/rte_random.c
> >>   > rename to lib/eal/common/eal_common_random.c
> >>   > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
> >>   > index 22a626ba6f..c4405aa48b 100644
> >>   > --- a/lib/eal/common/meson.build
> >>   > +++ b/lib/eal/common/meson.build
> >>   > @@ -32,7 +32,6 @@ sources += files(
> >>   >           'malloc_elem.c',
> >>   >           'malloc_heap.c',
> >>   >           'rte_malloc.c',
> >>   > -        'rte_random.c',
> >>   >           'rte_reciprocal.c',
> >>   >           'rte_service.c',
> >>   >           'rte_version.c',
> >>   > @@ -48,6 +47,7 @@ if not is_windows
> >>   >               'eal_common_trace.c',
> >>   >               'eal_common_trace_ctf.c',
> >>   >               'eal_common_trace_utils.c',
> >>   > +            'eal_common_random.c',
> >>   >               'hotplug_mp.c',
> >>   >               'malloc_mp.c',
> >>   >               'rte_keepalive.c',
> >>   > diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
> >>   > index 5031c6fe5f..e6b5ac46ed 100644
> >>   > --- a/lib/eal/include/rte_random.h
> >>   > +++ b/lib/eal/include/rte_random.h
> >>   > @@ -15,6 +15,7 @@
> >>   >   extern "C" {
> >>   >   #endif
> >>   >
> >>   > +#include <rte_compat.h>
> >>   >   #include <stdint.h>
> >>   >
> >>   >   /**
> >>   > @@ -84,6 +85,22 @@ rte_rand_max(uint64_t upper_bound);
> >>   >    */
> >>   >   double rte_drand(void);
> >>   >
> >>   > +/**
> >>   > + * Get a true random value.
> >>   > + *
> >>   > + * The generator is cryptographically secure.  
> >>
> >> If you want to extend <rte_random.h> with a cryptographically secure
> >> random number generator, that's fine.
> >>
> >> To have an API that's only available on certain ARM CPUs is not.
> >>
> >> NAK
> >>
> >> A new function should be called something with "secure", rather than
> >> "true" (which is a bit silly, since we might well live in a completely
> >> deterministic universe). "secure" would more clearly communicate the
> >> intent, and also doesn't imply any particular implementation.  
> > 
> > Agree, with Mattias. What constitutes a secure random number generator
> > is a matter of much debate. Most of the HW random generators are taking
> > diode (Schottky noise) and doing transforms on it to get something uniform.
> > 
> > If a key generation type API was added, why not just use existing and more
> > researched kernel get_random()?
> >   
> 
> Ideally, you want to avoid system calls on lcore workers doing packet 
> processing. If you have to do system calls (which I believe is the case 
> here), it's better to a simple call, not so often.
> 
> getentropy() seems to need about 800 core clock cycles on my x86_64, on 
> average. (rte_rand() needs ~11 cc/call.) 800 cc is not too horrible, but 
> system calls tend to have some pretty bad tail latencies.
> 
> To improve efficiency, one could do a getentropy() on a relatively large 
> buffer, and cache the result on a per-lcore basis, amortizing the system 
> call overhead over many calls.
> 
> You still have the tail latency issue to deal with. We could have a 
> control thread providing entropy for the lcores, but that seems like 
> massive overkill.


Getrandom is a vsyscall on current kernels, and it manages use of entropy across
multiple sources. If you are doing lots of key generation, you don't want to
hit the hardware every time.

https://lwn.net/Articles/974468/
  
Mattias Rönnblom July 24, 2024, 7:14 p.m. UTC | #5
On 2024-07-24 18:16, Stephen Hemminger wrote:

<snip>

>>>>    > +/**
>>>>    > + * Get a true random value.
>>>>    > + *
>>>>    > + * The generator is cryptographically secure.
>>>>
>>>> If you want to extend <rte_random.h> with a cryptographically secure
>>>> random number generator, that's fine.
>>>>
>>>> To have an API that's only available on certain ARM CPUs is not.
>>>>
>>>> NAK
>>>>
>>>> A new function should be called something with "secure", rather than
>>>> "true" (which is a bit silly, since we might well live in a completely
>>>> deterministic universe). "secure" would more clearly communicate the
>>>> intent, and also doesn't imply any particular implementation.
>>>
>>> Agree, with Mattias. What constitutes a secure random number generator
>>> is a matter of much debate. Most of the HW random generators are taking
>>> diode (Schottky noise) and doing transforms on it to get something uniform.
>>>
>>> If a key generation type API was added, why not just use existing and more
>>> researched kernel get_random()?
>>>    
>>
>> Ideally, you want to avoid system calls on lcore workers doing packet
>> processing. If you have to do system calls (which I believe is the case
>> here), it's better to a simple call, not so often.
>>
>> getentropy() seems to need about 800 core clock cycles on my x86_64, on
>> average. (rte_rand() needs ~11 cc/call.) 800 cc is not too horrible, but
>> system calls tend to have some pretty bad tail latencies.
>>
>> To improve efficiency, one could do a getentropy() on a relatively large
>> buffer, and cache the result on a per-lcore basis, amortizing the system
>> call overhead over many calls.
>>
>> You still have the tail latency issue to deal with. We could have a
>> control thread providing entropy for the lcores, but that seems like
>> massive overkill.
> 
> 
> Getrandom is a vsyscall on current kernels, and it manages use of entropy across
> multiple sources. If you are doing lots of key generation, you don't want to
> hit the hardware every time.
> 
> https://lwn.net/Articles/974468/
> 
> 

If I understand things correctly, the getrandom() vDSO support was 
mainlined *today*, so you need to be current indeed to have a vDSO 
getrandom(). :)

The above benchmark (rand_perf_autotest with rte_rand() implemented with 
getentropy()) was run on Linux 5.15 and glibc 2.35, so a regular system 
call was used.

(getentropy() delegates to getrandom(), so the performance is the same.)
  
Stephen Hemminger July 24, 2024, 8:02 p.m. UTC | #6
On Wed, 24 Jul 2024 21:14:30 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> >> Ideally, you want to avoid system calls on lcore workers doing packet
> >> processing. If you have to do system calls (which I believe is the case
> >> here), it's better to a simple call, not so often.
> >>
> >> getentropy() seems to need about 800 core clock cycles on my x86_64, on
> >> average. (rte_rand() needs ~11 cc/call.) 800 cc is not too horrible, but
> >> system calls tend to have some pretty bad tail latencies.
> >>
> >> To improve efficiency, one could do a getentropy() on a relatively large
> >> buffer, and cache the result on a per-lcore basis, amortizing the system
> >> call overhead over many calls.
> >>
> >> You still have the tail latency issue to deal with. We could have a
> >> control thread providing entropy for the lcores, but that seems like
> >> massive overkill.  
> > 
> > 
> > Getrandom is a vsyscall on current kernels, and it manages use of entropy across
> > multiple sources. If you are doing lots of key generation, you don't want to
> > hit the hardware every time.
> > 
> > https://lwn.net/Articles/974468/
> > 
> >   
> 
> If I understand things correctly, the getrandom() vDSO support was 
> mainlined *today*, so you need to be current indeed to have a vDSO 
> getrandom(). :)

Yes, it is headed for 6.11, but doubt that any reasonable workload
is going to be constrained by crypto key generation.

> 
> The above benchmark (rand_perf_autotest with rte_rand() implemented with 
> getentropy()) was run on Linux 5.15 and glibc 2.35, so a regular system 
> call was used.
> 
> (getentropy() delegates to getrandom(), so the performance is the same.)

I would trust the upstream kernel support for secure random more than
anything DPDK could develop. As soon as we get deeper into crypto it
opens up a whole new security domain and attack surface.
  
Mattias Rönnblom July 25, 2024, 4:48 a.m. UTC | #7
On 2024-07-24 22:02, Stephen Hemminger wrote:
> On Wed, 24 Jul 2024 21:14:30 +0200
> Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> 
>>>> Ideally, you want to avoid system calls on lcore workers doing packet
>>>> processing. If you have to do system calls (which I believe is the case
>>>> here), it's better to a simple call, not so often.
>>>>
>>>> getentropy() seems to need about 800 core clock cycles on my x86_64, on
>>>> average. (rte_rand() needs ~11 cc/call.) 800 cc is not too horrible, but
>>>> system calls tend to have some pretty bad tail latencies.
>>>>
>>>> To improve efficiency, one could do a getentropy() on a relatively large
>>>> buffer, and cache the result on a per-lcore basis, amortizing the system
>>>> call overhead over many calls.
>>>>
>>>> You still have the tail latency issue to deal with. We could have a
>>>> control thread providing entropy for the lcores, but that seems like
>>>> massive overkill.
>>>
>>>
>>> Getrandom is a vsyscall on current kernels, and it manages use of entropy across
>>> multiple sources. If you are doing lots of key generation, you don't want to
>>> hit the hardware every time.
>>>
>>> https://lwn.net/Articles/974468/
>>>
>>>    
>>
>> If I understand things correctly, the getrandom() vDSO support was
>> mainlined *today*, so you need to be current indeed to have a vDSO
>> getrandom(). :)
> 
> Yes, it is headed for 6.11, but doubt that any reasonable workload
> is going to be constrained by crypto key generation.
> 

For certain *very* latency-sensitive applications the main concern with 
a non-vDSO system call would be jitter, and not necessarily average 
latency (i.e., a throughput degradation).

>>
>> The above benchmark (rand_perf_autotest with rte_rand() implemented with
>> getentropy()) was run on Linux 5.15 and glibc 2.35, so a regular system
>> call was used.
>>
>> (getentropy() delegates to getrandom(), so the performance is the same.)
> 
> I would trust the upstream kernel support for secure random more than
> anything DPDK could develop. As soon as we get deeper into crypto it
> opens up a whole new security domain and attack surface.
> 

I much agree here.

What potentially would be useful is an EAL-level OS wrapper. So 
getrandom() for UNIX-like OSes, and something else for Windows. In 
addition, you could make larger getrandom() calls to shave off some 
cycles on the average (at least for the non-vDSO case).

It seems to me we should defer the introduction of anything like that 
until a) it's needed by a DPDK library, or b) someone on the application 
side is asking for it.
  
Stephen Hemminger July 25, 2024, 2:56 p.m. UTC | #8
On Thu, 25 Jul 2024 06:48:47 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> >>
> >> The above benchmark (rand_perf_autotest with rte_rand() implemented with
> >> getentropy()) was run on Linux 5.15 and glibc 2.35, so a regular system
> >> call was used.
> >>
> >> (getentropy() delegates to getrandom(), so the performance is the same.)  
> > 
> > I would trust the upstream kernel support for secure random more than
> > anything DPDK could develop. As soon as we get deeper into crypto it
> > opens up a whole new security domain and attack surface.
> >   
> 
> I much agree here.
> 
> What potentially would be useful is an EAL-level OS wrapper. So 
> getrandom() for UNIX-like OSes, and something else for Windows. In 
> addition, you could make larger getrandom() calls to shave off some 
> cycles on the average (at least for the non-vDSO case).
> 
> It seems to me we should defer the introduction of anything like that 
> until a) it's needed by a DPDK library, or b) someone on the application 
> side is asking for it.

Agreed. It doesn't make sense for DPDK to become a crypto library.
The community doesn't have the expertise and the infrastructure is missing
and there are several other projects that handle that OpenSSL etc.
  
Shunzhi Wen July 26, 2024, 6:34 p.m. UTC | #9
> I'm missing a rationale here. Why is this useful?
>
This creates an API for HW that supports cryptographically secure random number generation.

> If you want to extend <rte_random.h> with a cryptographically secure
> random number generator, that's fine.
>
> To have an API that's only available on certain ARM CPUs is not.
>
> NAK
>
The primary goal of this patch is to provide a direct interface to HW,
instead of letting kernel handle it. This is not an API just for Arm
CPUs, as other vendors also have similar HW features. For instance,
Intel and AMD has support for x86 RDRAND and RDSEED instructions, thus
can easily implement this API.

> A new function should be called something with "secure", rather than "true"
> (which is a bit silly, since we might well live in a completely deterministic
> universe). "secure" would more clearly communicate the intent, and also
> doesn't imply any particular implementation.
>
Regarding the terminology, “cryptographically secure random number”
is a more accurate and meaningful term than “true random number.”
This change will be made in the description, and the function name will
be replaced with rte_csrand.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
  
Stephen Hemminger July 26, 2024, 7 p.m. UTC | #10
On Fri, 26 Jul 2024 18:34:44 +0000
Shunzhi Wen <Shunzhi.Wen@arm.com> wrote:

> > I'm missing a rationale here. Why is this useful?
> >  
> This creates an API for HW that supports cryptographically secure random number generation.
> 
> > If you want to extend <rte_random.h> with a cryptographically secure
> > random number generator, that's fine.
> >
> > To have an API that's only available on certain ARM CPUs is not.
> >
> > NAK
> >  
> The primary goal of this patch is to provide a direct interface to HW,
> instead of letting kernel handle it. This is not an API just for Arm
> CPUs, as other vendors also have similar HW features. For instance,
> Intel and AMD has support for x86 RDRAND and RDSEED instructions, thus
> can easily implement this API.
> 
> > A new function should be called something with "secure", rather than "true"
> > (which is a bit silly, since we might well live in a completely deterministic
> > universe). "secure" would more clearly communicate the intent, and also
> > doesn't imply any particular implementation.
> >  
> Regarding the terminology, “cryptographically secure random number”
> is a more accurate and meaningful term than “true random number.”
> This change will be made in the description, and the function name will
> be replaced with rte_csrand.

If you decide to rte_csrand() it should fallback to get_random or get_entropy.

Note: many people don't fully trust RDRAND or ARM CPU instructions. 
That is why the Linux entropy calls do not use only the HW instructions.
  
Wathsala Wathawana Vithanage July 26, 2024, 8:12 p.m. UTC | #11
> 
> If you decide to rte_csrand() it should fallback to get_random or get_entropy.
> 
> Note: many people don't fully trust RDRAND or ARM CPU instructions.
> That is why the Linux entropy calls do not use only the HW instructions.

Thanks Stephen. I understand the concern, would it be acceptable for you if
rte_csrand() by default used Linux entropy calls and still had a runtime parameter
to enable HW based csrng for supported CPU in rte_csrand for those who need it?
  
Mattias Rönnblom July 26, 2024, 10:33 p.m. UTC | #12
On 2024-07-26 20:34, Shunzhi Wen wrote:
>> I'm missing a rationale here. Why is this useful?
>>
> This creates an API for HW that supports cryptographically secure random number generation.
> 
>> If you want to extend <rte_random.h> with a cryptographically secure
>> random number generator, that's fine.
>>
>> To have an API that's only available on certain ARM CPUs is not.
>>
>> NAK
>>
> The primary goal of this patch is to provide a direct interface to HW,
> instead of letting kernel handle it. This is not an API just for Arm
> CPUs, as other vendors also have similar HW features. For instance,
> Intel and AMD has support for x86 RDRAND and RDSEED instructions, thus
> can easily implement this API.
> 

No DPDK library (or PMD) currently needs this functionality, and no 
application, to my knowledge, has asked for this. If an app or a DPDK 
library would require cryptographically secure random numbers, it would 
most likely require it on all CPU/OS platforms (and with all DPDK -march 
flags).

RDRAND is only available on certain x86_64 CPUs, and is incredibly slow 
- slower than getting entropy via the kernel, even with non-vDSO syscalls.

Agner Fog lists the RDRAND latency as ~3700 cc for Zen 2. Later 
generations of both AMD and Intel CPUs have much shorter latencies, but 
a reciprocal throughput so low that one have to wait thousands of clock 
cycles before issuing another RDRAND, or risk stalling the core.

My Raptor Lake seems to require ~1000 cc retire RDRAND, which is ~11x 
slower than getting entropy (in bulk) via getentropy().

What is the latency for the ARM equivalent? Does it also have a 
reciprocal throughput issue?

>> A new function should be called something with "secure", rather than "true"
>> (which is a bit silly, since we might well live in a completely deterministic
>> universe). "secure" would more clearly communicate the intent, and also
>> doesn't imply any particular implementation.
>>
> Regarding the terminology, “cryptographically secure random number”
> is a more accurate and meaningful term than “true random number.”
> This change will be made in the description, and the function name will
> be replaced with rte_csrand.
> 
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
  
Wathsala Wathawana Vithanage July 27, 2024, 3:45 p.m. UTC | #13
Hi Mattias,

> > The primary goal of this patch is to provide a direct interface to HW,
> > instead of letting kernel handle it. This is not an API just for Arm
> > CPUs, as other vendors also have similar HW features. For instance,
> > Intel and AMD has support for x86 RDRAND and RDSEED instructions, thus
> > can easily implement this API.
> >
> 
> No DPDK library (or PMD) currently needs this functionality, and no
> application, to my knowledge, has asked for this. If an app or a DPDK library
> would require cryptographically secure random numbers, it would most likely
> require it on all CPU/OS platforms (and with all DPDK -march flags).
> 

I'm sorry, I'm not following this. Are you saying 

(a) adding a feature someone hasn't already asked for is impossible?

(b) it is impossible to add support for a feature that is only available in a few CPUs
of an architecture family?

> RDRAND is only available on certain x86_64 CPUs, and is incredibly slow
> - slower than getting entropy via the kernel, even with non-vDSO syscalls.
> 
> Agner Fog lists the RDRAND latency as ~3700 cc for Zen 2. Later generations of
> both AMD and Intel CPUs have much shorter latencies, but a reciprocal
> throughput so low that one have to wait thousands of clock cycles before
> issuing another RDRAND, or risk stalling the core.
> 
> My Raptor Lake seems to require ~1000 cc retire RDRAND, which is ~11x
> slower than getting entropy (in bulk) via getentropy().
> 
> What is the latency for the ARM equivalent? Does it also have a reciprocal
> throughput issue?
> 

Agree, from the numbers you are showing, it looks like they are all slow and
unsuitable for the data plane. But aren't there multi-plane DPDK applications
with control-plane threads that can benefit from a CSRNG, albeit slow?
  
Stephen Hemminger July 27, 2024, 3:54 p.m. UTC | #14
On Sat, 27 Jul 2024 15:45:55 +0000
Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com> wrote:

> Hi Mattias,
> 
> > > The primary goal of this patch is to provide a direct interface to HW,
> > > instead of letting kernel handle it. This is not an API just for Arm
> > > CPUs, as other vendors also have similar HW features. For instance,
> > > Intel and AMD has support for x86 RDRAND and RDSEED instructions, thus
> > > can easily implement this API.
> > >  
> > 
> > No DPDK library (or PMD) currently needs this functionality, and no
> > application, to my knowledge, has asked for this. If an app or a DPDK library
> > would require cryptographically secure random numbers, it would most likely
> > require it on all CPU/OS platforms (and with all DPDK -march flags).
> >   
> 
> I'm sorry, I'm not following this. Are you saying 
> 
> (a) adding a feature someone hasn't already asked for is impossible?
> 
> (b) it is impossible to add support for a feature that is only available in a few CPUs
> of an architecture family?
> 
> > RDRAND is only available on certain x86_64 CPUs, and is incredibly slow
> > - slower than getting entropy via the kernel, even with non-vDSO syscalls.
> > 
> > Agner Fog lists the RDRAND latency as ~3700 cc for Zen 2. Later generations of
> > both AMD and Intel CPUs have much shorter latencies, but a reciprocal
> > throughput so low that one have to wait thousands of clock cycles before
> > issuing another RDRAND, or risk stalling the core.
> > 
> > My Raptor Lake seems to require ~1000 cc retire RDRAND, which is ~11x
> > slower than getting entropy (in bulk) via getentropy().
> > 
> > What is the latency for the ARM equivalent? Does it also have a reciprocal
> > throughput issue?
> >   
> 
> Agree, from the numbers you are showing, it looks like they are all slow and
> unsuitable for the data plane. But aren't there multi-plane DPDK applications
> with control-plane threads that can benefit from a CSRNG, albeit slow?


The issue is that RDRAND and ARM RNG are both hardware features and the
trust level is debatable. (See Wikipedia on RDRAND). The DPDK should stay
out of this security fight because if DPDK offers direct access to HW
RNG then naive vendors will start using it in applications. Then when
it is revealed that some nation state has compromised HW RNG the blame
will land on DPDK for enabling this. Lets not not get into the supply
chain problem here. DPDK is an application environment, not a benchmark
environment.


The answer is to have API's like (rte_csrand) which then call the OS
level primitives. The trust is then passed to the OS. I trust Linus,
Theo de Raadt, and the rest of the open OS community to evaluate and
integrate the best secure random number generator.
  
Mattias Rönnblom July 27, 2024, 5:07 p.m. UTC | #15
On 2024-07-27 17:45, Wathsala Wathawana Vithanage wrote:
> Hi Mattias,
> 
>>> The primary goal of this patch is to provide a direct interface to HW,
>>> instead of letting kernel handle it. This is not an API just for Arm
>>> CPUs, as other vendors also have similar HW features. For instance,
>>> Intel and AMD has support for x86 RDRAND and RDSEED instructions, thus
>>> can easily implement this API.
>>>
>>
>> No DPDK library (or PMD) currently needs this functionality, and no
>> application, to my knowledge, has asked for this. If an app or a DPDK library
>> would require cryptographically secure random numbers, it would most likely
>> require it on all CPU/OS platforms (and with all DPDK -march flags).
>>
> 
> I'm sorry, I'm not following this. Are you saying
> 
> (a) adding a feature someone hasn't already asked for is impossible?
> 

No, not if you can explain why this feature will be useful. You guys 
made no such attempt.

> (b) it is impossible to add support for a feature that is only available in a few CPUs
> of an architecture family?
> 

Cryptographically secure random numbers are available on all CPUs, via 
the operating system. Arguably, such random numbers are more secure than 
anything that a machine instruction can offer.

If your patch are to have non-zero chance of being accepted, it should 
include a base implementation based on getrandom() (and the Windows 
equivalent), with the proper optimizations (e.g., batching entropy 
requests to the kernel on a per-lcore basis).

You would also need to provide a rationale why ARM CPU HW random numbers 
are better than what the kernel can offer. The only potential reason I 
can think of is performance, so you would need to quantify that in some way.

In addition, reliance on CPU CSRNG would need to be a build-time option, 
and disabled by default.

Plus, what I've mentioned several times, give a rationale why DPDK 
should have this functionality.

>> RDRAND is only available on certain x86_64 CPUs, and is incredibly slow
>> - slower than getting entropy via the kernel, even with non-vDSO syscalls.
>>
>> Agner Fog lists the RDRAND latency as ~3700 cc for Zen 2. Later generations of
>> both AMD and Intel CPUs have much shorter latencies, but a reciprocal
>> throughput so low that one have to wait thousands of clock cycles before
>> issuing another RDRAND, or risk stalling the core.
>>
>> My Raptor Lake seems to require ~1000 cc retire RDRAND, which is ~11x
>> slower than getting entropy (in bulk) via getentropy().
>>
>> What is the latency for the ARM equivalent? Does it also have a reciprocal
>> throughput issue?
>>
> 
> Agree, from the numbers you are showing, it looks like they are all slow and
> unsuitable for the data plane. But aren't there multi-plane DPDK applications
> with control-plane threads that can benefit from a CSRNG, albeit slow?
> 
>
  
Wathsala Wathawana Vithanage July 27, 2024, 10:27 p.m. UTC | #16
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, July 27, 2024 10:54 AM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
> Cc: Mattias Rönnblom <hofors@lysator.liu.se>; Shunzhi Wen
> <Shunzhi.Wen@arm.com>; thomas@monjalon.net; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Bruce Richardson <bruce.richardson@intel.com>;
> Tyler Retzlaff <roretzla@linux.microsoft.com>; Min Zhou
> <zhoumin@loongson.cn>; David Christensen <drc@linux.ibm.com>; Stanislaw
> Kardach <stanislaw.kardach@gmail.com>; Konstantin Ananyev
> <konstantin.v.ananyev@yandex.ru>; dev@dpdk.org; nd <nd@arm.com>; Jack
> Bond-Preston <Jack.Bond-Preston@arm.com>; Dhruv Tripathi
> <Dhruv.Tripathi@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [PATCH] eal: add support for TRNG with Arm RNG feature
> 
> On Sat, 27 Jul 2024 15:45:55 +0000
> Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com> wrote:
> 
> > Hi Mattias,
> >
> > > > The primary goal of this patch is to provide a direct interface to
> > > > HW, instead of letting kernel handle it. This is not an API just
> > > > for Arm CPUs, as other vendors also have similar HW features. For
> > > > instance, Intel and AMD has support for x86 RDRAND and RDSEED
> > > > instructions, thus can easily implement this API.
> > > >
> > >
> > > No DPDK library (or PMD) currently needs this functionality, and no
> > > application, to my knowledge, has asked for this. If an app or a
> > > DPDK library would require cryptographically secure random numbers,
> > > it would most likely require it on all CPU/OS platforms (and with all DPDK -
> march flags).
> > >
> >
> > I'm sorry, I'm not following this. Are you saying
> >
> > (a) adding a feature someone hasn't already asked for is impossible?
> >
> > (b) it is impossible to add support for a feature that is only
> > available in a few CPUs of an architecture family?
> >
> > > RDRAND is only available on certain x86_64 CPUs, and is incredibly
> > > slow
> > > - slower than getting entropy via the kernel, even with non-vDSO syscalls.
> > >
> > > Agner Fog lists the RDRAND latency as ~3700 cc for Zen 2. Later
> > > generations of both AMD and Intel CPUs have much shorter latencies,
> > > but a reciprocal throughput so low that one have to wait thousands
> > > of clock cycles before issuing another RDRAND, or risk stalling the core.
> > >
> > > My Raptor Lake seems to require ~1000 cc retire RDRAND, which is
> > > ~11x slower than getting entropy (in bulk) via getentropy().
> > >
> > > What is the latency for the ARM equivalent? Does it also have a
> > > reciprocal throughput issue?
> > >
> >
> > Agree, from the numbers you are showing, it looks like they are all
> > slow and unsuitable for the data plane. But aren't there multi-plane
> > DPDK applications with control-plane threads that can benefit from a
> CSRNG, albeit slow?
> 
> 
> The issue is that RDRAND and ARM RNG are both hardware features and the
> trust level is debatable. (See Wikipedia on RDRAND). The DPDK should stay
> out of this security fight because if DPDK offers direct access to HW RNG then
> naive vendors will start using it in applications. Then when it is revealed that
> some nation state has compromised HW RNG the blame will land on DPDK for
> enabling this. Lets not not get into the supply chain problem here. DPDK is an
> application environment, not a benchmark environment.
> 
> 
> The answer is to have API's like (rte_csrand) which then call the OS level
> primitives. The trust is then passed to the OS. I trust Linus, Theo de Raadt, and
> the rest of the open OS community to evaluate and integrate the best secure
> random number generator.

Perhaps, you missed my previous email, I understand your concern. Is it acceptable
to you if rte_csrand uses the kernel RNG by default and has a build/run-time
parameter to switch to HW RNG for those who consciously make that decision?
  
Wathsala Wathawana Vithanage July 27, 2024, 10:45 p.m. UTC | #17
> If your patch are to have non-zero chance of being accepted, it should include
> a base implementation based on getrandom() (and the Windows equivalent),
> with the proper optimizations (e.g., batching entropy requests to the kernel
> on a per-lcore basis).
> 
> You would also need to provide a rationale why ARM CPU HW random
> numbers are better than what the kernel can offer. The only potential reason I
> can think of is performance, so you would need to quantify that in some way.
> 
> In addition, reliance on CPU CSRNG would need to be a build-time option, and
> disabled by default.
> 
> Plus, what I've mentioned several times, give a rationale why DPDK should
> have this functionality.
> 

Thank you for the clarification. If I understood correctly following is what's needed.

1- An implementation of rte_csrand that calls getrandom() with per-core entropy caching. Which will be the default for DPDK.
2- Build time flag to enable HW CSRNG for those who want to override the default kernel getrandom() feature. 
3- Rationale for why CPU HW CSRNG is required.

To elaborate on Item 3, the rationale is simple, we want to enable the HW feature for those who need it.
  
Stephen Hemminger July 27, 2024, 11:55 p.m. UTC | #18
On Sat, 27 Jul 2024 22:27:05 +0000
Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com> wrote:

> > The answer is to have API's like (rte_csrand) which then call the OS level
> > primitives. The trust is then passed to the OS. I trust Linus, Theo de Raadt, and
> > the rest of the open OS community to evaluate and integrate the best secure
> > random number generator.  
> 
> Perhaps, you missed my previous email, I understand your concern. Is it acceptable
> to you if rte_csrand uses the kernel RNG by default and has a build/run-time
> parameter to switch to HW RNG for those who consciously make that decision?

No, because then DPDK is endorsing use of HW RNG as sole source of randomness.
If someone really, really wants to do that they can put in their own code
in their own application.
  
Mattias Rönnblom July 28, 2024, 6:42 a.m. UTC | #19
On 2024-07-28 00:45, Wathsala Wathawana Vithanage wrote:
>> If your patch are to have non-zero chance of being accepted, it should include
>> a base implementation based on getrandom() (and the Windows equivalent),
>> with the proper optimizations (e.g., batching entropy requests to the kernel
>> on a per-lcore basis).
>>
>> You would also need to provide a rationale why ARM CPU HW random
>> numbers are better than what the kernel can offer. The only potential reason I
>> can think of is performance, so you would need to quantify that in some way.
>>
>> In addition, reliance on CPU CSRNG would need to be a build-time option, and
>> disabled by default.
>>
>> Plus, what I've mentioned several times, give a rationale why DPDK should
>> have this functionality.
>>
> 
> Thank you for the clarification. If I understood correctly following is what's needed.
> 

...to have a non-zero chance of getting accepted.

> 1- An implementation of rte_csrand that calls getrandom() with per-core entropy caching. Which will be the default for DPDK.
> 2- Build time flag to enable HW CSRNG for those who want to override the default kernel getrandom() feature.
> 3- Rationale for why CPU HW CSRNG is required.
> 
> To elaborate on Item 3, the rationale is simple, we want to enable the HW feature for those who need it.

That's circular. "The reason we want this feature implementation to be 
included is to satisfy those who want this feature implementation."

Stop thinking like an ARM developer on a "software enablement" mission, 
and start thinking like a DPDK library or application developer.
  
Mattias Rönnblom July 28, 2024, 6:46 a.m. UTC | #20
On 2024-07-28 01:55, Stephen Hemminger wrote:
> On Sat, 27 Jul 2024 22:27:05 +0000
> Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com> wrote:
> 
>>> The answer is to have API's like (rte_csrand) which then call the OS level
>>> primitives. The trust is then passed to the OS. I trust Linus, Theo de Raadt, and
>>> the rest of the open OS community to evaluate and integrate the best secure
>>> random number generator.
>>
>> Perhaps, you missed my previous email, I understand your concern. Is it acceptable
>> to you if rte_csrand uses the kernel RNG by default and has a build/run-time
>> parameter to switch to HW RNG for those who consciously make that decision?
> 
> No, because then DPDK is endorsing use of HW RNG as sole source of randomness.
> If someone really, really wants to do that they can put in their own code
> in their own application.

That's a good point. Even a build-time option (with the required caveats 
in the documentation) could be seen as an endorsement.

A DPDK-based security library of some sort could provide hooks for 
custom RNGs.
  
Stephen Hemminger July 28, 2024, 3:52 p.m. UTC | #21
On Sun, 28 Jul 2024 08:42:42 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> > 1- An implementation of rte_csrand that calls getrandom() with per-core entropy caching. Which will be the default for DPDK.
> > 2- Build time flag to enable HW CSRNG for those who want to override the default kernel getrandom() feature.
> > 3- Rationale for why CPU HW CSRNG is required.
> > 
> > To elaborate on Item 3, the rationale is simple, we want to enable the HW feature for those who need it.  
> 
> That's circular. "The reason we want this feature implementation to be 
> included is to satisfy those who want this feature implementation."
> 
> Stop thinking like an ARM developer on a "software enablement" mission, 
> and start thinking like a DPDK library or application developer.

The Linux kernel in arch/arm64/include/asm/archrandom.h has support
already. The support includes lots of checks for capabilities and
caveats for the limitations of the ARM implementation that are not
reflected in this simplistic version.
  
Wathsala Wathawana Vithanage July 29, 2024, 4:34 a.m. UTC | #22
> 
> The Linux kernel in arch/arm64/include/asm/archrandom.h has support
> already. The support includes lots of checks for capabilities and caveats for the
> limitations of the ARM implementation that are not reflected in this simplistic
> version.

No, such checks are unnecessary, because
(a) the patch already uses the capability check carried out by Linux kernel by checking 
the capabilities flag in auxiliary vector (AT_HWCAP). 
(b) it uses Arm acle intrinsic __rndr which is identical to the implementation in the Linux 
kernel where it checks PSTATE.NZCV bits.
  
Wathsala Wathawana Vithanage July 29, 2024, 4:48 a.m. UTC | #23
> > > The answer is to have API's like (rte_csrand) which then call the OS
> > > level primitives. The trust is then passed to the OS. I trust Linus,
> > > Theo de Raadt, and the rest of the open OS community to evaluate and
> > > integrate the best secure random number generator.
> >
> > Perhaps, you missed my previous email, I understand your concern. Is
> > it acceptable to you if rte_csrand uses the kernel RNG by default and
> > has a build/run-time parameter to switch to HW RNG for those who
> consciously make that decision?
> 
> No, because then DPDK is endorsing use of HW RNG as sole source of
> randomness.
> If someone really, really wants to do that they can put in their own code in
> their own application.

How does HW RNG become the sole source of randomness if the default is 
kernel's implementation of CSRNG?

As far as I understand, endorsing is not same as optionality.
  
Wathsala Wathawana Vithanage July 29, 2024, 6:34 a.m. UTC | #24
> ...to have a non-zero chance of getting accepted.
> 

I see too much uncertainty in that statement with "non-zero chance". As the maintainer
of this library and someone with the community's best interest in mind, can you outline
the requirements for a rte_csrand implementation with a higher degree of confidence?

I want to know the criteria that must be satisfied for the acceptance probability to be
close to 1. I'm sorry if I have not been abundantly clear on the requirements that I thought
you would look for in such a patch. Maybe I misread your previous email.

> 
> That's circular. "The reason we want this feature implementation to be
> included is to satisfy those who want this feature implementation."
> 
> Stop thinking like an ARM developer on a "software enablement" mission, and
> start thinking like a DPDK library or application developer.

We all think different in one way or the other, but what's more important is finding a
common ground.
  
Mattias Rönnblom July 29, 2024, 6:47 a.m. UTC | #25
On 2024-07-29 08:34, Wathsala Wathawana Vithanage wrote:
>> ...to have a non-zero chance of getting accepted.
>>
> 
> I see too much uncertainty in that statement with "non-zero chance". As the maintainer
> of this library and someone with the community's best interest in mind, can you outline
> the requirements for a rte_csrand implementation with a higher degree of confidence?
> 
> I want to know the criteria that must be satisfied for the acceptance probability to be
> close to 1. I'm sorry if I have not been abundantly clear on the requirements that I thought
> you would look for in such a patch. Maybe I misread your previous email.
> 

Without a rationale why rte_csrand() functionality is something that 
should be in DPDK, and a rationale why the ARM CPU CSRNG is superior to 
getentropy(), it doesn't really matter how the patch set looks like.

I've repeatedly asked for this information, and you've repeatedly 
ignored it. This does not further your cause.

>>
>> That's circular. "The reason we want this feature implementation to be
>> included is to satisfy those who want this feature implementation."
>>
>> Stop thinking like an ARM developer on a "software enablement" mission, and
>> start thinking like a DPDK library or application developer.
> 
> We all think different in one way or the other, but what's more important is finding a
> common ground.
  
Wathsala Wathawana Vithanage July 29, 2024, 6:16 p.m. UTC | #26
> 
> Without a rationale why rte_csrand() functionality is something that should be
> in DPDK, and a rationale why the ARM CPU CSRNG is superior to getentropy(),
> it doesn't really matter how the patch set looks like.
> 
> I've repeatedly asked for this information, and you've repeatedly ignored it.
> This does not further your cause.
> 
 
I don't want to get into a debate on what's more superior because DPDK already have similar
Setups, take OpenSSL and Marvell's security accelerator for instance. Rationale is simple it boils
down to freedom of choice. 

I have been reiterating that I'm ready to make Kernel getrandom() the default in rte_csrand()
and HW RNG (not limited Arm) a build time option along with your other demands for various
optimizations. Having a build time option to enable HW CSRNG doesn't hinder your freedom to
choose a CSRNG implementation of your linking. 
Neither you nor I are in a place to decide what's right for others; the best we can do is to
collaborate on providing them with options. Leave the decision to users, application developers,
and integrators.

I believe that the coexistence of support for OpenSSL and other HW security accelerators in
DPDK already establishes rationale and precedent.
  
Stephen Hemminger July 29, 2024, 6:31 p.m. UTC | #27
On Mon, 29 Jul 2024 18:16:14 +0000
Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com> wrote:

> > 
> > Without a rationale why rte_csrand() functionality is something that should be
> > in DPDK, and a rationale why the ARM CPU CSRNG is superior to getentropy(),
> > it doesn't really matter how the patch set looks like.
> > 
> > I've repeatedly asked for this information, and you've repeatedly ignored it.
> > This does not further your cause.
> >   
>  
> I don't want to get into a debate on what's more superior because DPDK already have similar
> Setups, take OpenSSL and Marvell's security accelerator for instance. Rationale is simple it boils
> down to freedom of choice. 
> 
> I have been reiterating that I'm ready to make Kernel getrandom() the default in rte_csrand()
> and HW RNG (not limited Arm) a build time option along with your other demands for various
> optimizations. Having a build time option to enable HW CSRNG doesn't hinder your freedom to
> choose a CSRNG implementation of your linking. 
> Neither you nor I are in a place to decide what's right for others; the best we can do is to
> collaborate on providing them with options. Leave the decision to users, application developers,
> and integrators.
> 
> I believe that the coexistence of support for OpenSSL and other HW security accelerators in
> DPDK already establishes rationale and precedent.

Unless there is a specific direct usage in DPDK, my conclusion is that
there is no added benefit to users by adding this to the API in DPDK.
Users are free to use what they want, getentropy, HW instructions.
  
Mattias Rönnblom July 29, 2024, 7:11 p.m. UTC | #28
On 2024-07-29 20:16, Wathsala Wathawana Vithanage wrote:
>>
>> Without a rationale why rte_csrand() functionality is something that should be
>> in DPDK, and a rationale why the ARM CPU CSRNG is superior to getentropy(),
>> it doesn't really matter how the patch set looks like.
>>
>> I've repeatedly asked for this information, and you've repeatedly ignored it.
>> This does not further your cause.
>>
>   
> I don't want to get into a debate on what's more superior because DPDK already have similar
> Setups, take OpenSSL and Marvell's security accelerator for instance. Rationale is simple it boils
> down to freedom of choice.
> 
> I have been reiterating that I'm ready to make Kernel getrandom() the default in rte_csrand()
> and HW RNG (not limited Arm) a build time option along with your other demands for various
> optimizations. Having a build time option to enable HW CSRNG doesn't hinder your freedom to
> choose a CSRNG implementation of your linking.
> Neither you nor I are in a place to decide what's right for others; the best we can do is to
> collaborate on providing them with options. Leave the decision to users, application developers,
> and integrators.
> 
> I believe that the coexistence of support for OpenSSL and other HW security accelerators in
> DPDK already establishes rationale and precedent.
> 
> 

I feel no obligation to offer (potentially relatively uninformed) DPDK 
users with options which I suspect have no benefits, only drawbacks. In 
the x86_64 HW RNG case, I think it's fair to say we *know* it's bad idea 
to use it as a high-performance CSRNG. In the ARM case, you choose to 
leave us in the dark, so I can only assume it looks similar there.

The typical networking SoC's crypto-related hardware offload can pretty 
much always demonstrate benefits.
  
Morten Brørup July 29, 2024, 7:30 p.m. UTC | #29
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Monday, 29 July 2024 21.12
> 
> On 2024-07-29 20:16, Wathsala Wathawana Vithanage wrote:
> >>
> >> Without a rationale why rte_csrand() functionality is something that
> should be
> >> in DPDK, and a rationale why the ARM CPU CSRNG is superior to
> getentropy(),
> >> it doesn't really matter how the patch set looks like.
> >>
> >> I've repeatedly asked for this information, and you've repeatedly
> ignored it.
> >> This does not further your cause.
> >>
> >
> > I don't want to get into a debate on what's more superior because DPDK
> already have similar
> > Setups, take OpenSSL and Marvell's security accelerator for instance.
> Rationale is simple it boils
> > down to freedom of choice.
> >
> > I have been reiterating that I'm ready to make Kernel getrandom() the
> default in rte_csrand()
> > and HW RNG (not limited Arm) a build time option along with your other
> demands for various
> > optimizations. Having a build time option to enable HW CSRNG doesn't
> hinder your freedom to
> > choose a CSRNG implementation of your linking.
> > Neither you nor I are in a place to decide what's right for others;
> the best we can do is to
> > collaborate on providing them with options. Leave the decision to
> users, application developers,
> > and integrators.
> >
> > I believe that the coexistence of support for OpenSSL and other HW
> security accelerators in
> > DPDK already establishes rationale and precedent.
> >
> >
> 
> I feel no obligation to offer (potentially relatively uninformed) DPDK
> users with options which I suspect have no benefits, only drawbacks. In
> the x86_64 HW RNG case, I think it's fair to say we *know* it's bad idea
> to use it as a high-performance CSRNG. In the ARM case, you choose to
> leave us in the dark, so I can only assume it looks similar there.
> 
> The typical networking SoC's crypto-related hardware offload can pretty
> much always demonstrate benefits.

I have been following this discussion, and tend to agree with Mattias and Stephen.

However, I'll take another view at it from a different angle.

DPDK already offers other crypto functions through the crypto drivers. So DPDK is already in the crypto space.

getentropy() is not available on Windows, so adding a cross-platform cryptographically secure random number generator function to DPDK could be useful.


However, the rte_random functions in the EAL is for high-speed pseudo-random number generation, not for crypto use.

No crypto functions - not even true random number generation - belong in the EAL, but in a crypto library. Repeat: No crypto in EAL!

So, either add the new true random number generator as some sort of crypto driver, or add a new crypto library for true random number generation.

Regarding the API, having a 64 bit true random number generator is useless for crypto purposes; the function needs to generate a sequence of random bytes for crypto initialization vectors and similar. It could have an API resembling this:
int rte_crypto_random(char * const buf, size_t num);

However, as Mattias mentions, the underlying implementation must support typical networking SoC's crypto-related hardware offload to be practically useful.
In other words: It is not as simple as we wish for. It seems some sort of "true random generator" driver infrastructure might be required for such SoC hardware offloads.
  

Patch

diff --git a/.mailmap b/.mailmap
index ac06962e82..23209edfd2 100644
--- a/.mailmap
+++ b/.mailmap
@@ -338,6 +338,7 @@  Dexia Li <dexia.li@jaguarmicro.com>
 Dexuan Cui <decui@microsoft.com>
 Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
 Dheemanth Mallikarjun <dheemanthm@vmware.com>
+Dhruv Tripathi <dhruv.tripathi@arm.com>
 Diana Wang <na.wang@corigine.com>
 Didier Pallard <didier.pallard@6wind.com>
 Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
@@ -1353,6 +1354,7 @@  Shuki Katzenelson <shuki@lightbitslabs.com>
 Shun Hao <shunh@nvidia.com>
 Shu Shen <shu.shen@radisys.com>
 Shujing Dong <shujing.dong@corigine.com>
+Shunzhi Wen <shunzhi.wen@arm.com>
 Shweta Choudaha <shweta.choudaha@att.com>
 Shyam Kumar Shrivastav <shrivastav.shyam@gmail.com>
 Shy Shyman <shys@nvidia.com> <shys@mellanox.com>
diff --git a/app/test/test_rand_perf.c b/app/test/test_rand_perf.c
index 30204e12c0..b61cc75014 100644
--- a/app/test/test_rand_perf.c
+++ b/app/test/test_rand_perf.c
@@ -20,6 +20,7 @@  static volatile uint64_t vsum;
 
 enum rand_type {
 	rand_type_64,
+	rand_type_true_rand_64,
 	rand_type_float,
 	rand_type_bounded_best_case,
 	rand_type_bounded_worst_case
@@ -31,6 +32,8 @@  rand_type_desc(enum rand_type rand_type)
 	switch (rand_type) {
 	case rand_type_64:
 		return "Full 64-bit [rte_rand()]";
+	case rand_type_true_rand_64:
+		return "Full 64-bit True Random [rte_trand()]";
 	case rand_type_float:
 		return "Floating point [rte_drand()]";
 	case rand_type_bounded_best_case:
@@ -50,6 +53,9 @@  test_rand_perf_type(enum rand_type rand_type)
 	uint64_t end;
 	uint64_t sum = 0;
 	uint64_t op_latency;
+	int ret;
+	uint64_t val;
+	uint32_t fail_count = 0;
 
 	start = rte_rdtsc();
 
@@ -58,6 +64,13 @@  test_rand_perf_type(enum rand_type rand_type)
 		case rand_type_64:
 			sum += rte_rand();
 			break;
+		case rand_type_true_rand_64:
+			ret = rte_trand(&val);
+			if (ret == 0)
+				sum += val;
+			else
+				fail_count++;
+			break;
 		case rand_type_float:
 			sum += 1000. * rte_drand();
 			break;
@@ -77,8 +90,15 @@  test_rand_perf_type(enum rand_type rand_type)
 
 	op_latency = (end - start) / ITERATIONS;
 
-	printf("%s: %"PRId64" TSC cycles/op\n", rand_type_desc(rand_type),
-	       op_latency);
+	if (!fail_count)
+		printf("%s: %"PRId64" TSC cycles/op\n",
+		       rand_type_desc(rand_type),
+		       op_latency);
+	else
+		printf("%s: %"PRId64" TSC cycles/op (failed %d time(s))\n",
+		       rand_type_desc(rand_type),
+		       op_latency,
+			   fail_count);
 }
 
 static int
@@ -89,6 +109,7 @@  test_rand_perf(void)
 	printf("Pseudo-random number generation latencies:\n");
 
 	test_rand_perf_type(rand_type_64);
+	test_rand_perf_type(rand_type_true_rand_64);
 	test_rand_perf_type(rand_type_float);
 	test_rand_perf_type(rand_type_bounded_best_case);
 	test_rand_perf_type(rand_type_bounded_worst_case);
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 012935d5d7..13be94254e 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -95,7 +95,7 @@  part_number_config_arm = {
     },
     '0xd49': {
         'march': 'armv9-a',
-        'march_features': ['sve2'],
+        'march_features': ['sve2', 'rng'],
         'fallback_march': 'armv8.5-a',
         'mcpu': 'neoverse-n2',
         'flags': [
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..6aa067339f 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -36,6 +36,9 @@  enum rte_cpu_flag_t {
 	RTE_CPUFLAG_SVEF64MM,
 	RTE_CPUFLAG_SVEBF16,
 	RTE_CPUFLAG_AARCH64,
+
+	/* RNDR, RNDRRS instructions */
+	RTE_CPUFLAG_RNG,
 };
 
 #include "generic/rte_cpuflags.h"
diff --git a/lib/eal/arm/meson.build b/lib/eal/arm/meson.build
index 6fba3d6ba7..e9e468cbf0 100644
--- a/lib/eal/arm/meson.build
+++ b/lib/eal/arm/meson.build
@@ -9,4 +9,5 @@  sources += files(
         'rte_hypervisor.c',
         'rte_mmu.c',
         'rte_power_intrinsics.c',
+        'rte_random.c',
 )
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 7ba4f8ba97..56074f0c6a 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -116,6 +116,7 @@  const struct feature_entry rte_cpu_feature_table[] = {
 	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
 	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
 	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
+	FEAT_DEF(RNG,		REG_HWCAP2,   16)
 };
 #endif /* RTE_ARCH */
 
diff --git a/lib/eal/arm/rte_random.c b/lib/eal/arm/rte_random.c
new file mode 100644
index 0000000000..74c8fa733b
--- /dev/null
+++ b/lib/eal/arm/rte_random.c
@@ -0,0 +1,20 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "arm_acle.h"
+#include "rte_common.h"
+#include "rte_random.h"
+#include <errno.h>
+
+int
+rte_trand(uint64_t *val)
+{
+#if defined __ARM_FEATURE_RNG
+	int ret = __rndr(val);
+	return (ret == 0) ? 0 : -ENODATA;
+#else
+	RTE_SET_USED(val);
+	return -ENOTSUP;
+#endif
+}
diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/eal_common_random.c
similarity index 100%
rename from lib/eal/common/rte_random.c
rename to lib/eal/common/eal_common_random.c
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 22a626ba6f..c4405aa48b 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -32,7 +32,6 @@  sources += files(
         'malloc_elem.c',
         'malloc_heap.c',
         'rte_malloc.c',
-        'rte_random.c',
         'rte_reciprocal.c',
         'rte_service.c',
         'rte_version.c',
@@ -48,6 +47,7 @@  if not is_windows
             'eal_common_trace.c',
             'eal_common_trace_ctf.c',
             'eal_common_trace_utils.c',
+            'eal_common_random.c',
             'hotplug_mp.c',
             'malloc_mp.c',
             'rte_keepalive.c',
diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
index 5031c6fe5f..e6b5ac46ed 100644
--- a/lib/eal/include/rte_random.h
+++ b/lib/eal/include/rte_random.h
@@ -15,6 +15,7 @@ 
 extern "C" {
 #endif
 
+#include <rte_compat.h>
 #include <stdint.h>
 
 /**
@@ -84,6 +85,22 @@  rte_rand_max(uint64_t upper_bound);
  */
 double rte_drand(void);
 
+/**
+ * Get a true random value.
+ *
+ * The generator is cryptographically secure.
+ *
+ * @param val
+ *   A pointer to store a true random value between 0 and (1<<64)-1.
+ * @return
+ *   0 on success
+ *   -ENODATA on failure
+ *   -ENOTSUP if unsupported
+ */
+__rte_experimental
+int
+rte_trand(uint64_t *val);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/meson.build b/lib/eal/loongarch/meson.build
index 3acfe6c3bd..f1dc741598 100644
--- a/lib/eal/loongarch/meson.build
+++ b/lib/eal/loongarch/meson.build
@@ -9,4 +9,5 @@  sources += files(
         'rte_hypervisor.c',
         'rte_mmu.c',
         'rte_power_intrinsics.c',
+        'rte_random.c',
 )
diff --git a/lib/eal/loongarch/rte_random.c b/lib/eal/loongarch/rte_random.c
new file mode 100644
index 0000000000..d033c47ea1
--- /dev/null
+++ b/lib/eal/loongarch/rte_random.c
@@ -0,0 +1,14 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "rte_common.h"
+#include "rte_random.h"
+#include <errno.h>
+
+int
+rte_trand(uint64_t *val)
+{
+	RTE_SET_USED(val);
+	return -ENOTSUP;
+}
diff --git a/lib/eal/ppc/meson.build b/lib/eal/ppc/meson.build
index eeeaeee240..4bf3add6f3 100644
--- a/lib/eal/ppc/meson.build
+++ b/lib/eal/ppc/meson.build
@@ -9,4 +9,5 @@  sources += files(
         'rte_hypervisor.c',
         'rte_mmu.c',
         'rte_power_intrinsics.c',
+        'rte_random.c',
 )
diff --git a/lib/eal/ppc/rte_random.c b/lib/eal/ppc/rte_random.c
new file mode 100644
index 0000000000..d033c47ea1
--- /dev/null
+++ b/lib/eal/ppc/rte_random.c
@@ -0,0 +1,14 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "rte_common.h"
+#include "rte_random.h"
+#include <errno.h>
+
+int
+rte_trand(uint64_t *val)
+{
+	RTE_SET_USED(val);
+	return -ENOTSUP;
+}
diff --git a/lib/eal/riscv/meson.build b/lib/eal/riscv/meson.build
index 6fba3d6ba7..e9e468cbf0 100644
--- a/lib/eal/riscv/meson.build
+++ b/lib/eal/riscv/meson.build
@@ -9,4 +9,5 @@  sources += files(
         'rte_hypervisor.c',
         'rte_mmu.c',
         'rte_power_intrinsics.c',
+        'rte_random.c',
 )
diff --git a/lib/eal/riscv/rte_random.c b/lib/eal/riscv/rte_random.c
new file mode 100644
index 0000000000..d033c47ea1
--- /dev/null
+++ b/lib/eal/riscv/rte_random.c
@@ -0,0 +1,14 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "rte_common.h"
+#include "rte_random.h"
+#include <errno.h>
+
+int
+rte_trand(uint64_t *val)
+{
+	RTE_SET_USED(val);
+	return -ENOTSUP;
+}
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 3df50c3fbb..d8b69ea609 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -396,6 +396,7 @@  EXPERIMENTAL {
 
 	# added in 24.03
 	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
+	rte_trand;
 };
 
 INTERNAL {
diff --git a/lib/eal/x86/meson.build b/lib/eal/x86/meson.build
index e08dffa13d..b67bdb3238 100644
--- a/lib/eal/x86/meson.build
+++ b/lib/eal/x86/meson.build
@@ -10,4 +10,5 @@  sources += files(
         'rte_mmu.c',
         'rte_spinlock.c',
         'rte_power_intrinsics.c',
+        'rte_random.c',
 )
diff --git a/lib/eal/x86/rte_random.c b/lib/eal/x86/rte_random.c
new file mode 100644
index 0000000000..d033c47ea1
--- /dev/null
+++ b/lib/eal/x86/rte_random.c
@@ -0,0 +1,14 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "rte_common.h"
+#include "rte_random.h"
+#include <errno.h>
+
+int
+rte_trand(uint64_t *val)
+{
+	RTE_SET_USED(val);
+	return -ENOTSUP;
+}