[2/2] eal: resolve getentropy at run time for random seed

Message ID 20200421195446.1730-3-dg@adax.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series eal: choose initial PRNG seed source at runtime |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply issues
ci/travis-robot success Travis build: passed

Commit Message

Dan Gora April 21, 2020, 7:54 p.m. UTC
The getentropy() function was introduced into glibc v2.25 and so is
not available on all supported platforms.  Previously, if DPDK was
compiled (using meson) on a system which has getentropy(), it would
introduce a dependency on glibc v2.25 which would prevent that binary
from running on a system with an older glibc.  Similarly if DPDK was
compiled on a system which did not have getentropy(), getentropy()
could not be used even if the execution system supported it.

Introduce a new static function, __rte_getentropy() which will try to
resolve the getentropy() function dynamically using dlopen()/dlsym(),
returning failure if the getentropy() function cannot be resolved or
if it fails.

This also allows getentropy() to be used as the random seed source
when the traditional Makefile build for DPDK is used.

Signed-off-by: Dan Gora <dg@adax.com>
---
 lib/librte_eal/common/rte_random.c | 33 ++++++++++++++++++++++++------
 lib/librte_eal/meson.build         |  3 ---
 2 files changed, 27 insertions(+), 9 deletions(-)
  

Comments

Stephen Hemminger April 21, 2020, 9:03 p.m. UTC | #1
On Tue, 21 Apr 2020 16:54:45 -0300
Dan Gora <dg@adax.com> wrote:

> The getentropy() function was introduced into glibc v2.25 and so is
> not available on all supported platforms.  Previously, if DPDK was
> compiled (using meson) on a system which has getentropy(), it would
> introduce a dependency on glibc v2.25 which would prevent that binary
> from running on a system with an older glibc.  Similarly if DPDK was
> compiled on a system which did not have getentropy(), getentropy()
> could not be used even if the execution system supported it.
> 
> Introduce a new static function, __rte_getentropy() which will try to
> resolve the getentropy() function dynamically using dlopen()/dlsym(),
> returning failure if the getentropy() function cannot be resolved or
> if it fails.
> 
> This also allows getentropy() to be used as the random seed source
> when the traditional Makefile build for DPDK is used.
> 
> Signed-off-by: Dan Gora <dg@adax.com>

I don't think DPDK has a requirement that building on newer glibc
has to work when run on older version. Glibc doesn't support it.
  
Dan Gora April 21, 2020, 9:08 p.m. UTC | #2
On Tue, Apr 21, 2020 at 6:03 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 21 Apr 2020 16:54:45 -0300
> Dan Gora <dg@adax.com> wrote:
>
> > The getentropy() function was introduced into glibc v2.25 and so is
> > not available on all supported platforms.  Previously, if DPDK was
> > compiled (using meson) on a system which has getentropy(), it would
> > introduce a dependency on glibc v2.25 which would prevent that binary
> > from running on a system with an older glibc.  Similarly if DPDK was
> > compiled on a system which did not have getentropy(), getentropy()
> > could not be used even if the execution system supported it.
> >
> > Introduce a new static function, __rte_getentropy() which will try to
> > resolve the getentropy() function dynamically using dlopen()/dlsym(),
> > returning failure if the getentropy() function cannot be resolved or
> > if it fails.
> >
> > This also allows getentropy() to be used as the random seed source
> > when the traditional Makefile build for DPDK is used.
> >
> > Signed-off-by: Dan Gora <dg@adax.com>
>
> I don't think DPDK has a requirement that building on newer glibc
> has to work when run on older version. Glibc doesn't support it.

Maybe it's not a requirement, but it's nice to not have to worry about
it.  Plus, you can compile on a machine which doesn't have
getentropy(), but still use it if it's available on the execution
machine.  It just removes one more requirement that has to match
between the  compilation and target environement.

thanks
dan
  
Mattias Rönnblom April 22, 2020, 8:28 a.m. UTC | #3
On 2020-04-21 21:54, Dan Gora wrote:
> The getentropy() function was introduced into glibc v2.25 and so is
> not available on all supported platforms.  Previously, if DPDK was
> compiled (using meson) on a system which has getentropy(), it would
> introduce a dependency on glibc v2.25 which would prevent that binary
> from running on a system with an older glibc.  Similarly if DPDK was
> compiled on a system which did not have getentropy(), getentropy()
> could not be used even if the execution system supported it.
>
> Introduce a new static function, __rte_getentropy() which will try to
> resolve the getentropy() function dynamically using dlopen()/dlsym(),
> returning failure if the getentropy() function cannot be resolved or
> if it fails.


Two other options: providing a DPDK-native syscall wrapper for 
getrandom(), or falling back to reading /dev/urandom. Have you 
considered any of those two options? If so, why do you prefer 
dlopen()/dlsym()?


Failure to run on old libc seems like a non-issue to me.


> This also allows getentropy() to be used as the random seed source
> when the traditional Makefile build for DPDK is used.
>
> Signed-off-by: Dan Gora <dg@adax.com>
> ---
>   lib/librte_eal/common/rte_random.c | 33 ++++++++++++++++++++++++------
>   lib/librte_eal/meson.build         |  3 ---
>   2 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/lib/librte_eal/common/rte_random.c b/lib/librte_eal/common/rte_random.c
> index df02f1307..f8203f4c7 100644
> --- a/lib/librte_eal/common/rte_random.c
> +++ b/lib/librte_eal/common/rte_random.c
> @@ -7,6 +7,7 @@
>   #endif
>   #include <stdlib.h>
>   #include <unistd.h>
> +#include <dlfcn.h>
>   
>   #include <rte_branch_prediction.h>
>   #include <rte_cycles.h>
> @@ -176,18 +177,38 @@ rte_rand_max(uint64_t upper_bound)
>   	return res;
>   }
>   
> +/* Try to use the getentropy() function from glibc >= 2.25 */
> +static int
> +__rte_getentropy(uint64_t *ge_seed)
> +{
> +	void *handle = NULL;
> +	void **sym;
> +	int (*getentropy_p)(void *__buffer, size_t __length);
> +	int gc_rc;
> +
> +	handle = dlopen("libc.so.6", RTLD_LAZY);
> +	if (!handle)
!= NULL
> +		return -1;
> +
> +	sym = dlsym(handle, "getentropy");
> +	if (!sym || !*sym)

!= NULL again


dlsym() returns a "void *". The man page says nothing about 
de-referencing it. Is that allowed?


> +		/* Cannot resolve getentropy */
> +		return -1;
Missing a dlclose()
> +
> +	getentropy_p = (int (*)(void *, size_t)) sym;
> +	gc_rc = (*getentropy_p)((void *)ge_seed, sizeof(*ge_seed));
> +	dlclose(handle);
> +	return gc_rc;
> +}
> +
>   static uint64_t
>   __rte_random_initial_seed(void)
>   {
> -#ifdef RTE_LIBEAL_USE_GETENTROPY
> -	int ge_rc;
>   	uint64_t ge_seed;
>   
> -	ge_rc = getentropy(&ge_seed, sizeof(ge_seed));
> -
> -	if (ge_rc == 0)
> +	if (__rte_getentropy(&ge_seed) == 0)
>   		return ge_seed;
> -#endif
> +
>   #if defined(RTE_ARCH_X86)
>   	/* first fallback: rdseed instruction, if available */
>   	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RDSEED)) {
> diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
> index 0267c3b9d..748359b8c 100644
> --- a/lib/librte_eal/meson.build
> +++ b/lib/librte_eal/meson.build
> @@ -15,9 +15,6 @@ deps += 'kvargs'
>   if dpdk_conf.has('RTE_USE_LIBBSD')
>   	ext_deps += libbsd
>   endif
> -if cc.has_function('getentropy', prefix : '#include <unistd.h>')
> -	cflags += '-DRTE_LIBEAL_USE_GETENTROPY'
> -endif
>   if cc.has_header('getopt.h')
>   	cflags += ['-DHAVE_GETOPT_H', '-DHAVE_GETOPT', '-DHAVE_GETOPT_LONG']
>   endif
  
Dan Gora April 22, 2020, 5:44 p.m. UTC | #4
On Wed, Apr 22, 2020 at 5:28 AM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-21 21:54, Dan Gora wrote:
> > The getentropy() function was introduced into glibc v2.25 and so is
> > not available on all supported platforms.  Previously, if DPDK was
> > compiled (using meson) on a system which has getentropy(), it would
> > introduce a dependency on glibc v2.25 which would prevent that binary
> > from running on a system with an older glibc.  Similarly if DPDK was
> > compiled on a system which did not have getentropy(), getentropy()
> > could not be used even if the execution system supported it.
> >
> > Introduce a new static function, __rte_getentropy() which will try to
> > resolve the getentropy() function dynamically using dlopen()/dlsym(),
> > returning failure if the getentropy() function cannot be resolved or
> > if it fails.
>
>
> Two other options: providing a DPDK-native syscall wrapper for
> getrandom(), or falling back to reading /dev/urandom. Have you
> considered any of those two options? If so, why do you prefer
> dlopen()/dlsym()?

I didn't give any thought at all to using /dev/urandom.  The goal was
not really to change how the thing worked, just to remove the
dependency on glibc 2.25.

Using a syscall wrapper is not really going to be any easier, and in
fact probably more complicated.  Here is the code in glibc for
getentropy():

int
getentropy (void *buffer, size_t length)
{
  /* The interface is documented to return EIO for buffer lengths
     longer than 256 bytes.  */
  if (length > 256)
    {
      __set_errno (EIO);
      return -1;
    }

  /* Try to fill the buffer completely.  Even with the 256 byte limit
     above, we might still receive an EINTR error (when blocking
     during boot).  */
  void *end = buffer + length;
  while (buffer < end)
    {
      /* NB: No cancellation point.  */
      ssize_t bytes = INLINE_SYSCALL_CALL (getrandom, buffer, end - buffer, 0);
      if (bytes < 0)
        {
          if (errno == EINTR)
            /* Try again if interrupted by a signal.  */
            continue;
          else
            return -1;
        }
      if (bytes == 0)
        {
          /* No more bytes available.  This should not happen under
             normal circumstances.  */
          __set_errno (EIO);
          return -1;
        }
      /* Try again in case of a short read.  */
      buffer += bytes;
    }
  return 0;
}

__rte_getentropy() is easier than this.  Plus the syscall interface
does not solve the problem of being able to compile on a system with
the syscall and run it on a system without it or vice versa, which is
what I was trying to solve.

> Failure to run on old libc seems like a non-issue to me.

Well, again, it's a new dependency that didn't exist before.. We sell
to telco customers, so we have to support 10s of different target
platforms of various ages.  If they update their system, we'd have to
recompile our code to be able to use getentropy().  Similarly, if we
compiled on a system which has getentropy(), but the target system
doesn't, then they cannot run our binary because of the glibc 2.25
dependency.  That means that we have to have separate versions with
and without getentropy().  It's a maintenance headache for no real
benefit.

To my mind, since getentropy() can block it seems like it would
probably be better to just remove it entirely, but I suppose that's up
to the person(s) who put it in in the first place.

> > This also allows getentropy() to be used as the random seed source
> > when the traditional Makefile build for DPDK is used.
> >
> > Signed-off-by: Dan Gora <dg@adax.com>
> > ---
> >   lib/librte_eal/common/rte_random.c | 33 ++++++++++++++++++++++++------
> >   lib/librte_eal/meson.build         |  3 ---
> >   2 files changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/rte_random.c b/lib/librte_eal/common/rte_random.c
> > index df02f1307..f8203f4c7 100644
> > --- a/lib/librte_eal/common/rte_random.c
> > +++ b/lib/librte_eal/common/rte_random.c
> > @@ -7,6 +7,7 @@
> >   #endif
> >   #include <stdlib.h>
> >   #include <unistd.h>
> > +#include <dlfcn.h>
> >
> >   #include <rte_branch_prediction.h>
> >   #include <rte_cycles.h>
> > @@ -176,18 +177,38 @@ rte_rand_max(uint64_t upper_bound)
> >       return res;
> >   }
> >
> > +/* Try to use the getentropy() function from glibc >= 2.25 */
> > +static int
> > +__rte_getentropy(uint64_t *ge_seed)
> > +{
> > +     void *handle = NULL;
> > +     void **sym;
> > +     int (*getentropy_p)(void *__buffer, size_t __length);
> > +     int gc_rc;
> > +
> > +     handle = dlopen("libc.so.6", RTLD_LAZY);
> > +     if (!handle)
> != NULL
> > +             return -1;
> > +
> > +     sym = dlsym(handle, "getentropy");
> > +     if (!sym || !*sym)
>
> != NULL again
>
>
> dlsym() returns a "void *". The man page says nothing about
> de-referencing it. Is that allowed?

This was blindly stolen from mlx5_common.c, so blame them :P

> > +             /* Cannot resolve getentropy */
> > +             return -1;
> Missing a dlclose()

This was fixed in version 2 of my patch...
  
Mattias Rönnblom April 22, 2020, 8:14 p.m. UTC | #5
On 2020-04-22 19:44, Dan Gora wrote:
> On Wed, Apr 22, 2020 at 5:28 AM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> On 2020-04-21 21:54, Dan Gora wrote:
>>> The getentropy() function was introduced into glibc v2.25 and so is
>>> not available on all supported platforms.  Previously, if DPDK was
>>> compiled (using meson) on a system which has getentropy(), it would
>>> introduce a dependency on glibc v2.25 which would prevent that binary
>>> from running on a system with an older glibc.  Similarly if DPDK was
>>> compiled on a system which did not have getentropy(), getentropy()
>>> could not be used even if the execution system supported it.
>>>
>>> Introduce a new static function, __rte_getentropy() which will try to
>>> resolve the getentropy() function dynamically using dlopen()/dlsym(),
>>> returning failure if the getentropy() function cannot be resolved or
>>> if it fails.
>>
>> Two other options: providing a DPDK-native syscall wrapper for
>> getrandom(), or falling back to reading /dev/urandom. Have you
>> considered any of those two options? If so, why do you prefer
>> dlopen()/dlsym()?
> I didn't give any thought at all to using /dev/urandom.  The goal was
> not really to change how the thing worked, just to remove the
> dependency on glibc 2.25.


/dev/urandom is basically only a different interface to the same 
underlying mechanism.

Such an alternative would look something like:

static int
getentropy(void *buffer, size_t length)
{
         int rc = -1;
         int old_errno = errno;
         int fd;

         fd = open("/dev/urandom", O_RDONLY);

         if (fd < 0)
                 goto out;

         if (read(fd, buffer, length) != length)
                 goto out_close;

         rc = 0;

out_close:
         close(fd);
out:
         errno = old_errno;

         return rc;
}


> Using a syscall wrapper is not really going to be any easier, and in
> fact probably more complicated.  Here is the code in glibc for
> getentropy():
>
> int
> getentropy (void *buffer, size_t length)
> {
>    /* The interface is documented to return EIO for buffer lengths
>       longer than 256 bytes.  */
>    if (length > 256)
>      {
>        __set_errno (EIO);
>        return -1;
>      }
>
>    /* Try to fill the buffer completely.  Even with the 256 byte limit
>       above, we might still receive an EINTR error (when blocking
>       during boot).  */
>    void *end = buffer + length;
>    while (buffer < end)
>      {
>        /* NB: No cancellation point.  */
>        ssize_t bytes = INLINE_SYSCALL_CALL (getrandom, buffer, end - buffer, 0);
>        if (bytes < 0)
>          {
>            if (errno == EINTR)
>              /* Try again if interrupted by a signal.  */
>              continue;
>            else
>              return -1;
>          }
>        if (bytes == 0)
>          {
>            /* No more bytes available.  This should not happen under
>               normal circumstances.  */
>            __set_errno (EIO);
>            return -1;
>          }
>        /* Try again in case of a short read.  */
>        buffer += bytes;
>      }
>    return 0;
> }
>
> __rte_getentropy() is easier than this.  Plus the syscall interface
> does not solve the problem of being able to compile on a system with
> the syscall and run it on a system without it or vice versa, which is
> what I was trying to solve.
>
>> Failure to run on old libc seems like a non-issue to me.
> Well, again, it's a new dependency that didn't exist before.. We sell
> to telco customers, so we have to support 10s of different target
> platforms of various ages.  If they update their system, we'd have to
> recompile our code to be able to use getentropy().  Similarly, if we
> compiled on a system which has getentropy(), but the target system
> doesn't, then they cannot run our binary because of the glibc 2.25
> dependency.  That means that we have to have separate versions with
> and without getentropy().  It's a maintenance headache for no real
> benefit.


I'm not sure I follow. Why would you need to recompile DPDK in case they 
upgrade their system? It sounds like you care about initial seeding, 
since you want getentropy() if it exists, but then in the next paragraph 
you want to throw it out, so I'm a little confused.


Why doesn't the standard practice of compiling against the oldest 
supported libc work for you?


> To my mind, since getentropy() can block it seems like it would
> probably be better to just remove it entirely, but I suppose that's up
> to the person(s) who put it in in the first place.


Maybe I'm wrong, but I found it unlikely that a DPDK application would 
start before the entropy pool was initialized. After this point, 
getentropy() will not block. Do you consider this a real problem?


>>> This also allows getentropy() to be used as the random seed source
>>> when the traditional Makefile build for DPDK is used.
>>>
>>> Signed-off-by: Dan Gora <dg@adax.com>
>>> ---
>>>    lib/librte_eal/common/rte_random.c | 33 ++++++++++++++++++++++++------
>>>    lib/librte_eal/meson.build         |  3 ---
>>>    2 files changed, 27 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/common/rte_random.c b/lib/librte_eal/common/rte_random.c
>>> index df02f1307..f8203f4c7 100644
>>> --- a/lib/librte_eal/common/rte_random.c
>>> +++ b/lib/librte_eal/common/rte_random.c
>>> @@ -7,6 +7,7 @@
>>>    #endif
>>>    #include <stdlib.h>
>>>    #include <unistd.h>
>>> +#include <dlfcn.h>
>>>
>>>    #include <rte_branch_prediction.h>
>>>    #include <rte_cycles.h>
>>> @@ -176,18 +177,38 @@ rte_rand_max(uint64_t upper_bound)
>>>        return res;
>>>    }
>>>
>>> +/* Try to use the getentropy() function from glibc >= 2.25 */
>>> +static int
>>> +__rte_getentropy(uint64_t *ge_seed)
>>> +{
>>> +     void *handle = NULL;
>>> +     void **sym;
>>> +     int (*getentropy_p)(void *__buffer, size_t __length);
>>> +     int gc_rc;
>>> +
>>> +     handle = dlopen("libc.so.6", RTLD_LAZY);
>>> +     if (!handle)
>> != NULL
>>> +             return -1;
>>> +
>>> +     sym = dlsym(handle, "getentropy");
>>> +     if (!sym || !*sym)
>> != NULL again
>>
>>
>> dlsym() returns a "void *". The man page says nothing about
>> de-referencing it. Is that allowed?
> This was blindly stolen from mlx5_common.c, so blame them :P
>
>>> +             /* Cannot resolve getentropy */
>>> +             return -1;
>> Missing a dlclose()
> This was fixed in version 2 of my patch...
  
Dan Gora April 22, 2020, 8:35 p.m. UTC | #6
On Wed, Apr 22, 2020 at 5:14 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-22 19:44, Dan Gora wrote:
> > On Wed, Apr 22, 2020 at 5:28 AM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >> On 2020-04-21 21:54, Dan Gora wrote:
> >>> The getentropy() function was introduced into glibc v2.25 and so is
> >>> not available on all supported platforms.  Previously, if DPDK was
> >>> compiled (using meson) on a system which has getentropy(), it would
> >>> introduce a dependency on glibc v2.25 which would prevent that binary
> >>> from running on a system with an older glibc.  Similarly if DPDK was
> >>> compiled on a system which did not have getentropy(), getentropy()
> >>> could not be used even if the execution system supported it.
> >>>
> >>> Introduce a new static function, __rte_getentropy() which will try to
> >>> resolve the getentropy() function dynamically using dlopen()/dlsym(),
> >>> returning failure if the getentropy() function cannot be resolved or
> >>> if it fails.
> >>
> >> Two other options: providing a DPDK-native syscall wrapper for
> >> getrandom(), or falling back to reading /dev/urandom. Have you
> >> considered any of those two options? If so, why do you prefer
> >> dlopen()/dlsym()?
> > I didn't give any thought at all to using /dev/urandom.  The goal was
> > not really to change how the thing worked, just to remove the
> > dependency on glibc 2.25.
>
>
> /dev/urandom is basically only a different interface to the same
> underlying mechanism.
>
> Such an alternative would look something like:
>
> static int
> getentropy(void *buffer, size_t length)
> {
>          int rc = -1;
>          int old_errno = errno;
>          int fd;
>
>          fd = open("/dev/urandom", O_RDONLY);
>
>          if (fd < 0)
>                  goto out;
>
>          if (read(fd, buffer, length) != length)
>                  goto out_close;
>
>          rc = 0;
>
> out_close:
>          close(fd);
> out:
>          errno = old_errno;
>
>          return rc;
> }

That's fine with me, but like I said I wasn't trying to change how any
of this worked, just work around glibc dependencies.  There seems to
be some subtle difference between /dev/urandom and /dev/random, but...

https://patches-gcc.linaro.org/comment/14484/

> >> Failure to run on old libc seems like a non-issue to me.
> > Well, again, it's a new dependency that didn't exist before.. We sell
> > to telco customers, so we have to support 10s of different target
> > platforms of various ages.  If they update their system, we'd have to
> > recompile our code to be able to use getentropy().  Similarly, if we
> > compiled on a system which has getentropy(), but the target system
> > doesn't, then they cannot run our binary because of the glibc 2.25
> > dependency.  That means that we have to have separate versions with
> > and without getentropy().  It's a maintenance headache for no real
> > benefit.
>
>
> I'm not sure I follow. Why would you need to recompile DPDK in case they
> upgrade their system? It sounds like you care about initial seeding,
> since you want getentropy() if it exists, but then in the next paragraph
> you want to throw it out, so I'm a little confused.

Well  _I_ wouldn't but maybe someone wants getentropy() for the
initial seed.. I assume that's why it was added in the first place..
For my application we don't care at all.  I just want to get rid of
this dependency on glibc 2.25 and have the behavior be the same on
meson and Makefile builds on the same complication system.

> Why doesn't the standard practice of compiling against the oldest
> supported libc work for you?

I guess I didn't realize that was "standard practice" but even so it
still adds an unnecessary restriction on the complication platform.

> > To my mind, since getentropy() can block it seems like it would
> > probably be better to just remove it entirely, but I suppose that's up
> > to the person(s) who put it in in the first place.
>
>
> Maybe I'm wrong, but I found it unlikely that a DPDK application would
> start before the entropy pool was initialized. After this point,
> getentropy() will not block. Do you consider this a real problem?

Not really.. I don't know the original motivations for adding
getentropy() in the first place.  I assumed that there was some good
reason.  If there's not we could just back out all of these
complications and revert commit faf8fd252785ee8b1 (et al...)

Again, I don't have any dog in the fight about how to get the seed, I
just wanted to remove the glibc dependency and the different behavior
for Makefile and meson builds on the same complication system.

For me, using dlsym() or reading from /dev/[u]random is 6 for half a
dozen.  Both have precedents in other places in the DPDK code base.  I
can fix it either way.

thanks
dan
  
Luca Boccassi April 23, 2020, 10:04 a.m. UTC | #7
On Wed, 2020-04-22 at 17:35 -0300, Dan Gora wrote:
> On Wed, Apr 22, 2020 at 5:14 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
> > On 2020-04-22 19:44, Dan Gora wrote:
> > > On Wed, Apr 22, 2020 at 5:28 AM Mattias Rönnblom
> > > <mattias.ronnblom@ericsson.com> wrote:
> > > > On 2020-04-21 21:54, Dan Gora wrote:
> > > > > The getentropy() function was introduced into glibc v2.25 and so is
> > > > > not available on all supported platforms.  Previously, if DPDK was
> > > > > compiled (using meson) on a system which has getentropy(), it would
> > > > > introduce a dependency on glibc v2.25 which would prevent that binary
> > > > > from running on a system with an older glibc.  Similarly if DPDK was
> > > > > compiled on a system which did not have getentropy(), getentropy()
> > > > > could not be used even if the execution system supported it.
> > > > > 
> > > > > Introduce a new static function, __rte_getentropy() which will try to
> > > > > resolve the getentropy() function dynamically using dlopen()/dlsym(),
> > > > > returning failure if the getentropy() function cannot be resolved or
> > > > > if it fails.
> > > > 
> > > > Two other options: providing a DPDK-native syscall wrapper for
> > > > getrandom(), or falling back to reading /dev/urandom. Have you
> > > > considered any of those two options? If so, why do you prefer
> > > > dlopen()/dlsym()?
> > > I didn't give any thought at all to using /dev/urandom.  The goal was
> > > not really to change how the thing worked, just to remove the
> > > dependency on glibc 2.25.
> > 
> > /dev/urandom is basically only a different interface to the same
> > underlying mechanism.

This is not the whole story though - while the end result when all
works is the same, there are important differences in getting there.
There's a reason a programmatic interface was added - it's just better
in general.
Just to name one - opening files has implications for LSMs like
SELinux. You now need a specific policy to allow it, which means
applications that upgrade from one version of DPDK to the next will
break.

In general, I do not think we should go backwards. The programmatic
interface to the random pools are good and we should use them by
default - of course by all means add fallbacks to urandom if they are
not available.

But as Stephen said glibc generally does not support compiling on new +
running on old - so if it's not this that breaks, it will be something
else.
  
Mattias Rönnblom April 23, 2020, 12:36 p.m. UTC | #8
On 2020-04-22 22:35, Dan Gora wrote:
> On Wed, Apr 22, 2020 at 5:14 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> On 2020-04-22 19:44, Dan Gora wrote:
>>> On Wed, Apr 22, 2020 at 5:28 AM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>> On 2020-04-21 21:54, Dan Gora wrote:
>>>>> The getentropy() function was introduced into glibc v2.25 and so is
>>>>> not available on all supported platforms.  Previously, if DPDK was
>>>>> compiled (using meson) on a system which has getentropy(), it would
>>>>> introduce a dependency on glibc v2.25 which would prevent that binary
>>>>> from running on a system with an older glibc.  Similarly if DPDK was
>>>>> compiled on a system which did not have getentropy(), getentropy()
>>>>> could not be used even if the execution system supported it.
>>>>>
>>>>> Introduce a new static function, __rte_getentropy() which will try to
>>>>> resolve the getentropy() function dynamically using dlopen()/dlsym(),
>>>>> returning failure if the getentropy() function cannot be resolved or
>>>>> if it fails.
>>>> Two other options: providing a DPDK-native syscall wrapper for
>>>> getrandom(), or falling back to reading /dev/urandom. Have you
>>>> considered any of those two options? If so, why do you prefer
>>>> dlopen()/dlsym()?
>>> I didn't give any thought at all to using /dev/urandom.  The goal was
>>> not really to change how the thing worked, just to remove the
>>> dependency on glibc 2.25.
>>
>> /dev/urandom is basically only a different interface to the same
>> underlying mechanism.
>>
>> Such an alternative would look something like:
>>
>> static int
>> getentropy(void *buffer, size_t length)
>> {
>>           int rc = -1;
>>           int old_errno = errno;
>>           int fd;
>>
>>           fd = open("/dev/urandom", O_RDONLY);
>>
>>           if (fd < 0)
>>                   goto out;
>>
>>           if (read(fd, buffer, length) != length)
>>                   goto out_close;
>>
>>           rc = 0;
>>
>> out_close:
>>           close(fd);
>> out:
>>           errno = old_errno;
>>
>>           return rc;
>> }
> That's fine with me, but like I said I wasn't trying to change how any
> of this worked, just work around glibc dependencies.  There seems to
> be some subtle difference between /dev/urandom and /dev/random, but...
>
> https://protect2.fireeye.com/v1/url?k=1705be57-4b8f6b41-1705fecc-862f14a9365e-bb983def357fdfad&q=1&e=10fec9c1-51b3-4bc3-b77d-7eb39787d007&u=https%3A%2F%2Fpatches-gcc.linaro.org%2Fcomment%2F14484%2F
>
>>>> Failure to run on old libc seems like a non-issue to me.
>>> Well, again, it's a new dependency that didn't exist before.. We sell
>>> to telco customers, so we have to support 10s of different target
>>> platforms of various ages.  If they update their system, we'd have to
>>> recompile our code to be able to use getentropy().  Similarly, if we
>>> compiled on a system which has getentropy(), but the target system
>>> doesn't, then they cannot run our binary because of the glibc 2.25
>>> dependency.  That means that we have to have separate versions with
>>> and without getentropy().  It's a maintenance headache for no real
>>> benefit.
>>
>> I'm not sure I follow. Why would you need to recompile DPDK in case they
>> upgrade their system? It sounds like you care about initial seeding,
>> since you want getentropy() if it exists, but then in the next paragraph
>> you want to throw it out, so I'm a little confused.
> Well  _I_ wouldn't but maybe someone wants getentropy() for the
> initial seed.. I assume that's why it was added in the first place..
> For my application we don't care at all.  I just want to get rid of
> this dependency on glibc 2.25 and have the behavior be the same on
> meson and Makefile builds on the same complication system.


The reason for trying to avoid a wall time-based seed as the default is 
that application instances started at the roughly the same time might 
end up having a the same seed, which in turn might impact their behavior 
in an adverse way. For example, random back-off timers may be the same. 
On x86_64, TSC has a high resolution, but on other platforms its 
equivalent the clock rate is much lower.


>> Why doesn't the standard practice of compiling against the oldest
>> supported libc work for you?
> I guess I didn't realize that was "standard practice" but even so it
> still adds an unnecessary restriction on the complication platform.


If DPDK has the policy of attempting to allow DPDK applications compiled 
against one glibc version to run against another, older, version, we can 
go ahead and discuss the details further. That would be up to the tech 
board to decide. I would vote against it.


If the fix was simple, that's one thing. dlopen()/dlsym() doesn't 
qualify as such, nor does a syscall wrapper, as you pointed out.


>>> To my mind, since getentropy() can block it seems like it would
>>> probably be better to just remove it entirely, but I suppose that's up
>>> to the person(s) who put it in in the first place.
>>
>> Maybe I'm wrong, but I found it unlikely that a DPDK application would
>> start before the entropy pool was initialized. After this point,
>> getentropy() will not block. Do you consider this a real problem?
> Not really.. I don't know the original motivations for adding
> getentropy() in the first place.  I assumed that there was some good
> reason.  If there's not we could just back out all of these
> complications and revert commit faf8fd252785ee8b1 (et al...)
>
> Again, I don't have any dog in the fight about how to get the seed, I
> just wanted to remove the glibc dependency and the different behavior
> for Makefile and meson builds on the same complication system.
>
> For me, using dlsym() or reading from /dev/[u]random is 6 for half a
> dozen.  Both have precedents in other places in the DPDK code base.  I
> can fix it either way.
>
> thanks
> dan
  
Dan Gora April 23, 2020, 5:27 p.m. UTC | #9
On Thu, Apr 23, 2020 at 9:36 AM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
> >>
> >> /dev/urandom is basically only a different interface to the same
> >> underlying mechanism.
> >>
> >> Such an alternative would look something like:
> >>
> >> static int
> >> getentropy(void *buffer, size_t length)
> >> {
> >>           int rc = -1;
> >>           int old_errno = errno;
> >>           int fd;
> >>
> >>           fd = open("/dev/urandom", O_RDONLY);
> >>
> >>           if (fd < 0)
> >>                   goto out;
> >>
> >>           if (read(fd, buffer, length) != length)
> >>                   goto out_close;
> >>
> >>           rc = 0;
> >>
> >> out_close:
> >>           close(fd);
> >> out:
> >>           errno = old_errno;
> >>
> >>           return rc;
> >> }
> > That's fine with me, but like I said I wasn't trying to change how any
> > of this worked, just work around glibc dependencies.  There seems to
> > be some subtle difference between /dev/urandom and /dev/random, but...
> >
> > https://protect2.fireeye.com/v1/url?k=1705be57-4b8f6b41-1705fecc-862f14a9365e-bb983def357fdfad&q=1&e=10fec9c1-51b3-4bc3-b77d-7eb39787d007&u=https%3A%2F%2Fpatches-gcc.linaro.org%2Fcomment%2F14484%2F
> >
> >>>> Failure to run on old libc seems like a non-issue to me.
> >>> Well, again, it's a new dependency that didn't exist before.. We sell
> >>> to telco customers, so we have to support 10s of different target
> >>> platforms of various ages.  If they update their system, we'd have to
> >>> recompile our code to be able to use getentropy().  Similarly, if we
> >>> compiled on a system which has getentropy(), but the target system
> >>> doesn't, then they cannot run our binary because of the glibc 2.25
> >>> dependency.  That means that we have to have separate versions with
> >>> and without getentropy().  It's a maintenance headache for no real
> >>> benefit.
> >>
> >> I'm not sure I follow. Why would you need to recompile DPDK in case they
> >> upgrade their system? It sounds like you care about initial seeding,
> >> since you want getentropy() if it exists, but then in the next paragraph
> >> you want to throw it out, so I'm a little confused.
> > Well  _I_ wouldn't but maybe someone wants getentropy() for the
> > initial seed.. I assume that's why it was added in the first place..
> > For my application we don't care at all.  I just want to get rid of
> > this dependency on glibc 2.25 and have the behavior be the same on
> > meson and Makefile builds on the same complication system.
>
>
> The reason for trying to avoid a wall time-based seed as the default is
> that application instances started at the roughly the same time might
> end up having a the same seed, which in turn might impact their behavior
> in an adverse way. For example, random back-off timers may be the same.
> On x86_64, TSC has a high resolution, but on other platforms its
> equivalent the clock rate is much lower.
>
>
> >> Why doesn't the standard practice of compiling against the oldest
> >> supported libc work for you?
> > I guess I didn't realize that was "standard practice" but even so it
> > still adds an unnecessary restriction on the complication platform.
>
>
> If DPDK has the policy of attempting to allow DPDK applications compiled
> against one glibc version to run against another, older, version, we can
> go ahead and discuss the details further. That would be up to the tech
> board to decide. I would vote against it.

I don't know why anyone would vote against removing an unnecessary
dependency, which was only introduced in v19.08 anyways.

> If the fix was simple, that's one thing. dlopen()/dlsym() doesn't
> qualify as such, nor does a syscall wrapper, as you pointed out.

The dlopen/dlsym() method is used in at least 4 other places in DPDK.
It's not that complicated.  There is plenty of precedence for it being
done this way.

I sent a v4 of the patch which emulated getentropy() using
/dev/urandom as you suggested. Did you see that?

thanks
dan
  
Dan Gora April 23, 2020, 5:38 p.m. UTC | #10
On Thu, Apr 23, 2020 at 12:59 PM Luca Boccassi <bluca@debian.org> wrote:
> > >
> > > /dev/urandom is basically only a different interface to the same
> > > underlying mechanism.
>
> This is not the whole story though - while the end result when all
> works is the same, there are important differences in getting there.
> There's a reason a programmatic interface was added - it's just better
> in general.
> Just to name one - opening files has implications for LSMs like
> SELinux. You now need a specific policy to allow it, which means
> applications that upgrade from one version of DPDK to the next will
> break.

DPDK opens _tons_ of files. This would not be the first file that DPDK
has to open.  And it's not like /dev/urandom is a new interface.  It's
been around forever.

If this is such a major problem, then that would argue for using the
dlsym()/dlopen() method to try to find the getentropy glibc function
that I sent in v3 of these patches.

> In general, I do not think we should go backwards. The programmatic
> interface to the random pools are good and we should use them by
> default - of course by all means add fallbacks to urandom if they are
> not available.

The original problem was that the "programmatic interface to the
random pools" (that is, getentropy()) can only be determined at
compilation time and if found introduce a new dependency on glibc 2.25
that can easily be avoided by emulating it (as I did here in v4 of the
patches) or by trying to dynamically find the symbol at run time using
dlopen()/dlsym() (as I did in v3 of the patches).

> But as Stephen said glibc generally does not support compiling on new +
> running on old - so if it's not this that breaks, it will be something
> else.

Well that's not necessarily true.  Most glibc interfaces have been
around forever and you can easily see what versions of glibc are
needed by running ldd on your application.  I don't see the point in
introducing a new dependency on a very recent version of glibc which
is not supported by all supported DPDK platforms when it can easily be
worked around.

The issue here is that the original patch to add getentropy():
1) Added a _new_ dependency on glibc 2.25.
2) Added a _new_ dependency that the rdseed CPU flag on the execution
machine has to match the complication machine.
3) Has different behavior if the DPDK is compiled with meson or with
Make on the same complication platform.

thanks,
dan
  
Luca Boccassi April 27, 2020, 12:44 p.m. UTC | #11
On Thu, 2020-04-23 at 14:38 -0300, Dan Gora wrote:
> On Thu, Apr 23, 2020 at 12:59 PM Luca Boccassi <bluca@debian.org> wrote:
> > > > /dev/urandom is basically only a different interface to the same
> > > > underlying mechanism.
> > 
> > This is not the whole story though - while the end result when all
> > works is the same, there are important differences in getting there.
> > There's a reason a programmatic interface was added - it's just better
> > in general.
> > Just to name one - opening files has implications for LSMs like
> > SELinux. You now need a specific policy to allow it, which means
> > applications that upgrade from one version of DPDK to the next will
> > break.
> 
> DPDK opens _tons_ of files. This would not be the first file that DPDK
> has to open.  And it's not like /dev/urandom is a new interface.  It's
> been around forever.

That might be the case, but it is not reason in itself to make things
harder. Especially in light of the new stability promise - this might
or might not be considered part of it, but it's worth mentioning at the
very least, as it has a real impact on users.

> If this is such a major problem, then that would argue for using the
> dlsym()/dlopen() method to try to find the getentropy glibc function
> that I sent in v3 of these patches.
> 
> > In general, I do not think we should go backwards. The programmatic
> > interface to the random pools are good and we should use them by
> > default - of course by all means add fallbacks to urandom if they are
> > not available.
> 
> The original problem was that the "programmatic interface to the
> random pools" (that is, getentropy()) can only be determined at
> compilation time and if found introduce a new dependency on glibc 2.25
> that can easily be avoided by emulating it (as I did here in v4 of the
> patches) or by trying to dynamically find the symbol at run time using
> dlopen()/dlsym() (as I did in v3 of the patches).
> 
> > But as Stephen said glibc generally does not support compiling on new +
> > running on old - so if it's not this that breaks, it will be something
> > else.
> 
> Well that's not necessarily true.  Most glibc interfaces have been
> around forever and you can easily see what versions of glibc are
> needed by running ldd on your application.  I don't see the point in
> introducing a new dependency on a very recent version of glibc which
> is not supported by all supported DPDK platforms when it can easily be
> worked around.
> 
> The issue here is that the original patch to add getentropy():
> 1) Added a _new_ dependency on glibc 2.25.
> 2) Added a _new_ dependency that the rdseed CPU flag on the execution
> machine has to match the complication machine.
> 3) Has different behavior if the DPDK is compiled with meson or with
> Make on the same complication platform.
> 
> thanks,
> dan

Adding a new dependecy happens only when building with the new version
of the library. If it's not available, then there's no new dependency. 
It sounds to me like you are trying to add workarounds for issues in
your downstream build/deployment model, where your build dependencies
are newer than your runtime dependencies. This in general is rarely
well supported.
Now I'm fine with adding workarounds as _fallbacks_ - what I am
explicitly NACKing is forcibly restricting to the least common
denominator because of issues in a third party build/deployment system
that doesn't follow the common norm.
This is especially true when dealing with RNG APIs, where the tiniest
and most innocent-looking mistake could have disastrous consequences.
  
Dan Gora April 27, 2020, 4:57 p.m. UTC | #12
On Mon, Apr 27, 2020 at 1:19 PM Luca Boccassi <bluca@debian.org> wrote:
>
> On Thu, 2020-04-23 at 14:38 -0300, Dan Gora wrote:
> > On Thu, Apr 23, 2020 at 12:59 PM Luca Boccassi <bluca@debian.org> wrote:
> > > > > /dev/urandom is basically only a different interface to the same
> > > > > underlying mechanism.
> > >
> > > This is not the whole story though - while the end result when all
> > > works is the same, there are important differences in getting there.
> > > There's a reason a programmatic interface was added - it's just better
> > > in general.
> > > Just to name one - opening files has implications for LSMs like
> > > SELinux. You now need a specific policy to allow it, which means
> > > applications that upgrade from one version of DPDK to the next will
> > > break.
> >
> > DPDK opens _tons_ of files. This would not be the first file that DPDK
> > has to open.  And it's not like /dev/urandom is a new interface.  It's
> > been around forever.
>
> That might be the case, but it is not reason in itself to make things
> harder. Especially in light of the new stability promise - this might
> or might not be considered part of it, but it's worth mentioning at the
> very least, as it has a real impact on users.

"make things harder" seems especially subjective.. I would argue that
I am in fact making things much easier by removing unnecessary
dependecies

>
> > If this is such a major problem, then that would argue for using the
> > dlsym()/dlopen() method to try to find the getentropy glibc function
> > that I sent in v3 of these patches.
> >
> > > In general, I do not think we should go backwards. The programmatic
> > > interface to the random pools are good and we should use them by
> > > default - of course by all means add fallbacks to urandom if they are
> > > not available.
> >
> > The original problem was that the "programmatic interface to the
> > random pools" (that is, getentropy()) can only be determined at
> > compilation time and if found introduce a new dependency on glibc 2.25
> > that can easily be avoided by emulating it (as I did here in v4 of the
> > patches) or by trying to dynamically find the symbol at run time using
> > dlopen()/dlsym() (as I did in v3 of the patches).
> >
> > > But as Stephen said glibc generally does not support compiling on new +
> > > running on old - so if it's not this that breaks, it will be something
> > > else.
> >
> > Well that's not necessarily true.  Most glibc interfaces have been
> > around forever and you can easily see what versions of glibc are
> > needed by running ldd on your application.  I don't see the point in
> > introducing a new dependency on a very recent version of glibc which
> > is not supported by all supported DPDK platforms when it can easily be
> > worked around.
> >
> > The issue here is that the original patch to add getentropy():
> > 1) Added a _new_ dependency on glibc 2.25.
> > 2) Added a _new_ dependency that the rdseed CPU flag on the execution
> > machine has to match the complication machine.
> > 3) Has different behavior if the DPDK is compiled with meson or with
> > Make on the same complication platform.
> >
> > thanks,
> > dan
>
> Adding a new dependecy happens only when building with the new version
> of the library. If it's not available, then there's no new dependency.

But you also do not get to use the new getentropy() if you happen to
compile on a system which does not have the latest glibc, or if you
use the makefile system..

> It sounds to me like you are trying to add workarounds for issues in
> your downstream build/deployment model, where your build dependencies
> are newer than your runtime dependencies. This in general is rarely
> well supported.

I am fully aware of that.  I am not adding "workarounds", I am
eliminating unnecessary dependencies which probably never should have
been introduced in the first place.

> Now I'm fine with adding workarounds as _fallbacks_ - what I am
> explicitly NACKing is forcibly restricting to the least common
> denominator because of issues in a third party build/deployment system
> that doesn't follow the common norm.

ugh.. this is the exact _opposite_ of what this patch does.  It is not
restricting anything to a least common denominator.  It is allowing
the DPDK to use the "best" available function, regardless of the build
system.

Restricting to the least common denominator is what the original patch did...

> This is especially true when dealing with RNG APIs, where the tiniest
> and most innocent-looking mistake could have disastrous consequences.

This does not change the functionality of the RNG at all.  It just
makes it work in the way that it was intended.  These changes were
only introduced into 19.08, so they are not historical artifacts or
anything.

thanks
dan
  
Luca Boccassi April 30, 2020, 8:41 a.m. UTC | #13
On Mon, 2020-04-27 at 13:57 -0300, Dan Gora wrote:
> On Mon, Apr 27, 2020 at 1:19 PM Luca Boccassi <bluca@debian.org> wrote:
> > On Thu, 2020-04-23 at 14:38 -0300, Dan Gora wrote:
> > > On Thu, Apr 23, 2020 at 12:59 PM Luca Boccassi <bluca@debian.org> wrote:
> > > > > > /dev/urandom is basically only a different interface to the same
> > > > > > underlying mechanism.
> > > > 
> > > > This is not the whole story though - while the end result when all
> > > > works is the same, there are important differences in getting there.
> > > > There's a reason a programmatic interface was added - it's just better
> > > > in general.
> > > > Just to name one - opening files has implications for LSMs like
> > > > SELinux. You now need a specific policy to allow it, which means
> > > > applications that upgrade from one version of DPDK to the next will
> > > > break.
> > > 
> > > DPDK opens _tons_ of files. This would not be the first file that DPDK
> > > has to open.  And it's not like /dev/urandom is a new interface.  It's
> > > been around forever.
> > 
> > That might be the case, but it is not reason in itself to make things
> > harder. Especially in light of the new stability promise - this might
> > or might not be considered part of it, but it's worth mentioning at the
> > very least, as it has a real impact on users.
> 
> "make things harder" seems especially subjective.. I would argue that
> I am in fact making things much easier by removing unnecessary
> dependecies

For someone with selinux, things would be harder. It's a consequence
worth highlighting, that's all.

> > > If this is such a major problem, then that would argue for using the
> > > dlsym()/dlopen() method to try to find the getentropy glibc function
> > > that I sent in v3 of these patches.
> > > 
> > > > In general, I do not think we should go backwards. The programmatic
> > > > interface to the random pools are good and we should use them by
> > > > default - of course by all means add fallbacks to urandom if they are
> > > > not available.
> > > 
> > > The original problem was that the "programmatic interface to the
> > > random pools" (that is, getentropy()) can only be determined at
> > > compilation time and if found introduce a new dependency on glibc 2.25
> > > that can easily be avoided by emulating it (as I did here in v4 of the
> > > patches) or by trying to dynamically find the symbol at run time using
> > > dlopen()/dlsym() (as I did in v3 of the patches).
> > > 
> > > > But as Stephen said glibc generally does not support compiling on new +
> > > > running on old - so if it's not this that breaks, it will be something
> > > > else.
> > > 
> > > Well that's not necessarily true.  Most glibc interfaces have been
> > > around forever and you can easily see what versions of glibc are
> > > needed by running ldd on your application.  I don't see the point in
> > > introducing a new dependency on a very recent version of glibc which
> > > is not supported by all supported DPDK platforms when it can easily be
> > > worked around.
> > > 
> > > The issue here is that the original patch to add getentropy():
> > > 1) Added a _new_ dependency on glibc 2.25.
> > > 2) Added a _new_ dependency that the rdseed CPU flag on the execution
> > > machine has to match the complication machine.
> > > 3) Has different behavior if the DPDK is compiled with meson or with
> > > Make on the same complication platform.
> > > 
> > > thanks,
> > > dan
> > 
> > Adding a new dependecy happens only when building with the new version
> > of the library. If it's not available, then there's no new dependency.
> 
> But you also do not get to use the new getentropy() if you happen to
> compile on a system which does not have the latest glibc, or if you
> use the makefile system..

And that's perfectly fine - backward compatibility workarounds are not
a problem to me.

> > It sounds to me like you are trying to add workarounds for issues in
> > your downstream build/deployment model, where your build dependencies
> > are newer than your runtime dependencies. This in general is rarely
> > well supported.
> 
> I am fully aware of that.  I am not adding "workarounds", I am
> eliminating unnecessary dependencies which probably never should have
> been introduced in the first place.

It's not unnecessary. It's a better interface, and worth using if
available.

> > Now I'm fine with adding workarounds as _fallbacks_ - what I am
> > explicitly NACKing is forcibly restricting to the least common
> > denominator because of issues in a third party build/deployment system
> > that doesn't follow the common norm.
> 
> ugh.. this is the exact _opposite_ of what this patch does.  It is not
> restricting anything to a least common denominator.  It is allowing
> the DPDK to use the "best" available function, regardless of the build
> system.
> 
> Restricting to the least common denominator is what the original patch did...

This is restricting to the least common denominator of /dev/urandom,
which is a bad interface, frail with issues that everybody is moving
away from, in favour of the programmatic API that this patch is
removing, in order to fix a corner case with a non-standard, third-
party build system that downgrades dependencies at runtime vs build
time.

> > This is especially true when dealing with RNG APIs, where the tiniest
> > and most innocent-looking mistake could have disastrous consequences.
> 
> This does not change the functionality of the RNG at all.  It just
> makes it work in the way that it was intended.  These changes were
> only introduced into 19.08, so they are not historical artifacts or
> anything.

It's reimplementing a syscall using a different interface which has
different semantics. A small mistake there is going to cost us dearly.
  
Dan Gora April 30, 2020, 8:43 p.m. UTC | #14
On Thu, Apr 30, 2020 at 5:29 PM Luca Boccassi <bluca@debian.org> wrote:

> > > Adding a new dependecy happens only when building with the new version
> > > of the library. If it's not available, then there's no new dependency.
> >
> > But you also do not get to use the new getentropy() if you happen to
> > compile on a system which does not have the latest glibc, or if you
> > use the makefile system..
>
> And that's perfectly fine - backward compatibility workarounds are not
> a problem to me.
>
> > > It sounds to me like you are trying to add workarounds for issues in
> > > your downstream build/deployment model, where your build dependencies
> > > are newer than your runtime dependencies. This in general is rarely
> > > well supported.
> >
> > I am fully aware of that.  I am not adding "workarounds", I am
> > eliminating unnecessary dependencies which probably never should have
> > been introduced in the first place.
>
> It's not unnecessary. It's a better interface, and worth using if
> available.

"if available" is the key phrase here.. It not only has to be
available on  the execution machine, it has to be available on the
compilation machine as well...
>
> > > Now I'm fine with adding workarounds as _fallbacks_ - what I am
> > > explicitly NACKing is forcibly restricting to the least common
> > > denominator because of issues in a third party build/deployment system
> > > that doesn't follow the common norm.
> >
> > ugh.. this is the exact _opposite_ of what this patch does.  It is not
> > restricting anything to a least common denominator.  It is allowing
> > the DPDK to use the "best" available function, regardless of the build
> > system.
> >
> > Restricting to the least common denominator is what the original patch did...
>
> This is restricting to the least common denominator of /dev/urandom,
> which is a bad interface, frail with issues that everybody is moving
> away from, in favour of the programmatic API that this patch is
> removing, in order to fix a corner case with a non-standard, third-
> party build system that downgrades dependencies at runtime vs build
> time.

Well, no, because rdseed is used first if available and /dev/urandom
is used next..

And this is not a corner case at all.. There are lots of linux
distributions which DPDK claims to support which do not support
getentropy().  This is hardly a non-standard build system.  You really
compile and support a separate binary for every possible system that
your customers will use?

And what is this about third parties?  Last I checked DPDK was an API
framework, not a proprietary standalone application.  It is designed,
by definition, to be used by "third parties".  Or does it only have to
work in Intel's toolchains?

>
> > > This is especially true when dealing with RNG APIs, where the tiniest
> > > and most innocent-looking mistake could have disastrous consequences.
> >
> > This does not change the functionality of the RNG at all.  It just
> > makes it work in the way that it was intended.  These changes were
> > only introduced into 19.08, so they are not historical artifacts or
> > anything.
>
> It's reimplementing a syscall using a different interface which has
> different semantics. A small mistake there is going to cost us dearly.

The code was copied almost verbatim from the getentropy() function in
glibc, it's hard to see it going that wrong there.  The code can be
tested using the same test cases that the original patch used.  I
don't see how there is a difference in test coverage here.

If it's such a big deal to use /dev/urandom, then what about my v3
patch which used dlopen()/dlsym() to try to find getentropy()?  Is
that not then acceptable?   dlopen/dlsym() are used in several placed
in DPDK.

thanks
dan
  
Luca Boccassi May 1, 2020, 10:33 a.m. UTC | #15
On Thu, 2020-04-30 at 17:43 -0300, Dan Gora wrote:
> On Thu, Apr 30, 2020 at 5:29 PM Luca Boccassi <bluca@debian.org> wrote:
> 
> > > > Adding a new dependecy happens only when building with the new version
> > > > of the library. If it's not available, then there's no new dependency.
> > > 
> > > But you also do not get to use the new getentropy() if you happen to
> > > compile on a system which does not have the latest glibc, or if you
> > > use the makefile system..
> > 
> > And that's perfectly fine - backward compatibility workarounds are not
> > a problem to me.
> > 
> > > > It sounds to me like you are trying to add workarounds for issues in
> > > > your downstream build/deployment model, where your build dependencies
> > > > are newer than your runtime dependencies. This in general is rarely
> > > > well supported.
> > > 
> > > I am fully aware of that.  I am not adding "workarounds", I am
> > > eliminating unnecessary dependencies which probably never should have
> > > been introduced in the first place.
> > 
> > It's not unnecessary. It's a better interface, and worth using if
> > available.
> 
> "if available" is the key phrase here.. It not only has to be
> available on  the execution machine, it has to be available on the
> compilation machine as well...

Yes, same as every other dependency.

> > > > Now I'm fine with adding workarounds as _fallbacks_ - what I am
> > > > explicitly NACKing is forcibly restricting to the least common
> > > > denominator because of issues in a third party build/deployment system
> > > > that doesn't follow the common norm.
> > > 
> > > ugh.. this is the exact _opposite_ of what this patch does.  It is not
> > > restricting anything to a least common denominator.  It is allowing
> > > the DPDK to use the "best" available function, regardless of the build
> > > system.
> > > 
> > > Restricting to the least common denominator is what the original patch did...
> > 
> > This is restricting to the least common denominator of /dev/urandom,
> > which is a bad interface, frail with issues that everybody is moving
> > away from, in favour of the programmatic API that this patch is
> > removing, in order to fix a corner case with a non-standard, third-
> > party build system that downgrades dependencies at runtime vs build
> > time.
> 
> Well, no, because rdseed is used first if available and /dev/urandom
> is used next..
> 
> And this is not a corner case at all.. There are lots of linux
> distributions which DPDK claims to support which do not support
> getentropy().  This is hardly a non-standard build system.  You really
> compile and support a separate binary for every possible system that
> your customers will use?

Yes, that's how Linux distributions work. At the very least, most
deployments ensure dependencies are not downgraded, as that's rarely
supported, for very good reasons.

> And what is this about third parties?  Last I checked DPDK was an API
> framework, not a proprietary standalone application.  It is designed,
> by definition, to be used by "third parties".  Or does it only have to
> work in Intel's toolchains?

Third party was the wrong expression. DPDK is a set of libraries with a
build system (two actually) which links them against their
dependencies. It is the reponsibility of who does deployment to make
sure those dependencies, as established at build time, are satisfied at
runtime. Up until now, downgrading at runtime vs build time was not
supported as far as I know, so what you are effectively asking is to
double the size of the support matrix, and for the project to ensure
that every single dependency can always be built against a new version
and used against an older one.

> > > > This is especially true when dealing with RNG APIs, where the tiniest
> > > > and most innocent-looking mistake could have disastrous consequences.
> > > 
> > > This does not change the functionality of the RNG at all.  It just
> > > makes it work in the way that it was intended.  These changes were
> > > only introduced into 19.08, so they are not historical artifacts or
> > > anything.
> > 
> > It's reimplementing a syscall using a different interface which has
> > different semantics. A small mistake there is going to cost us dearly.
> 
> The code was copied almost verbatim from the getentropy() function in
> glibc, it's hard to see it going that wrong there.  The code can be
> tested using the same test cases that the original patch used.  I
> don't see how there is a difference in test coverage here.

And who's going to keep it updated as the implementation in glibc
changes, if/when a bug is found there?
Also, glibc is LGPL2.1 licensed, which means copying almost verbatim a
non-trivial piece of code can possibly result in librte_eal as a whole
to be covered under the terms of LGPL2.1 (IANAL disclaimer). This is
unlikely to be well received, given it's BSD-3-clause now.

> If it's such a big deal to use /dev/urandom, then what about my v3
> patch which used dlopen()/dlsym() to try to find getentropy()?  Is
> that not then acceptable?   dlopen/dlsym() are used in several placed
> in DPDK.

I don't have a problem with that specifically, but it needs to be clear
to the maintainers that what you are effectively asking is for all
dependencies to support runtime-downgrading transparently and out of
the box. This comes with a huge additional cost, and I do not believe
it is supported at the moment.
  
Dan Gora May 1, 2020, 9:05 p.m. UTC | #16
On Fri, May 1, 2020 at 1:29 PM Luca Boccassi <bluca@debian.org> wrote:
> >
> > Well, no, because rdseed is used first if available and /dev/urandom
> > is used next..
> >
> > And this is not a corner case at all.. There are lots of linux
> > distributions which DPDK claims to support which do not support
> > getentropy().  This is hardly a non-standard build system.  You really
> > compile and support a separate binary for every possible system that
> > your customers will use?
>
> Yes, that's how Linux distributions work. At the very least, most
> deployments ensure dependencies are not downgraded, as that's rarely
> supported, for very good reasons.

I can assure you that most commercial companies do not want to have to
support building a separate binary of their product for every since
version of every single linux distribution.  And again no dependencies
are being "downgraded" here.  The binary will use the _best_ method
available on the target system, not the least common denominator of
the build system as it does now.

I don't really consider that using '/dev/urandom' is a "downgrade"
from getentropy().  getentropy() was only introduced in the last
couple of years, so it's not like it was some super important new
method of getting random numbers, it's just a more convenient method.
Under the hood, I would guess both methods are exactly the same and
will give you the same quality of random number.  It's just the API
interface which has been improved.  And I don't see /dev/urandom going
away any time soon.

Again, if you're still hung up on using /dev/urandom we can use v3 of
my patch to search for the getentropy()  symbol.

> > And what is this about third parties?  Last I checked DPDK was an API
> > framework, not a proprietary standalone application.  It is designed,
> > by definition, to be used by "third parties".  Or does it only have to
> > work in Intel's toolchains?
>
> Third party was the wrong expression. DPDK is a set of libraries with a
> build system (two actually) which links them against their
> dependencies. It is the reponsibility of who does deployment to make
> sure those dependencies, as established at build time, are satisfied at
> runtime. Up until now, downgrading at runtime vs build time was not
> supported as far as I know,

Of course it is.. That why the rte_cpu_get_flag_enabled() function
exists, so that the code can choose the best function at run-time, not
at build time.  There are numerous examples of this in DPDK already.

> so what you are effectively asking is to
> double the size of the support matrix, and for the project to ensure
> that every single dependency can always be built against a new version
> and used against an older one.

No, this is not correct.  The size of the support/test matrix is
exactly the same, what changes is the number of build systems that
have to be supported.

Before my patches you had to test targets with:

no rdseed and no getentropy
rdseed and no getentropy
no rdseed and getentropy
rdseed and getentropy.

For each of these test cases you had to have a separate build system
to generate a different binary (let's say app/test).

Oh and you also had to test with building with make and building with
meson because getentropy was only supported with meson.

Now, with my patches, you can build 'app/test' on *any* of these
systems, and use make _or_ meson, and it will run exactly the same
way as if you compiled it on the matching test system.

That is the difference.

Let's take another example:  Say I have an application which I have to
support on RHEL 7.7 and 7.5.  7.7 has getentropy, 7.5 does not (ignore
the rdseed case for now).

So I am faced with:
1) Build on the RHEL 7.7 system.  This allows the app to use
getentropy(), but it  cannot run on 7.5.
2) Build on the RHEL 7.5 system.  This allows the app to run on 7.5
and 7.7, but only uses the TSC counter.
3) Build on both systems.  Ok, now I have to build and support two
separate binaries.
4) Determine if  getentropy() exists at run time or use /dev/urandom.
Now I can build on either system and it will use getentropy() on 7.7
and the TSC counter on 7.5.

How is #3 "better" than #4?  You still have to test on 7.7 and 7.5.
The amount of work in testing has not changed at all, only the number
of build systems and binary versions that you have to maintain.

And I'm not saying that this has to be a hard and fast rule for
everything for now and forever.  I am fixing a small function which is
used exactly once.  The original patch which introduced getentropy()
in v19.08 created an extra dependency that the target system have
glibc 2.25, _even if it does not use this function at all_.

That said, it seems to me to be a good idea to remove as many build
time dependencies as possible and use things like
rte_cpu_get_flag_enabled() to determine the best code path at run time
(that is, initialization time, not in the fastpath obviously).  It
just makes life a lot easier... If the compiler can generate the code,
then it should be possible in many cases.

> > > > > This is especially true when dealing with RNG APIs, where the tiniest
> > > > > and most innocent-looking mistake could have disastrous consequences.
> > > >
> > > > This does not change the functionality of the RNG at all.  It just
> > > > makes it work in the way that it was intended.  These changes were
> > > > only introduced into 19.08, so they are not historical artifacts or
> > > > anything.
> > >
> > > It's reimplementing a syscall using a different interface which has
> > > different semantics. A small mistake there is going to cost us dearly.
> >
> > The code was copied almost verbatim from the getentropy() function in
> > glibc, it's hard to see it going that wrong there.  The code can be
> > tested using the same test cases that the original patch used.  I
> > don't see how there is a difference in test coverage here.
>
> And who's going to keep it updated as the implementation in glibc
> changes, if/when a bug is found there?

What happens when any bug is found in DPDK... It gets fixed.  Why is
this any different?

> Also, glibc is LGPL2.1 licensed, which means copying almost verbatim a
> non-trivial piece of code can possibly result in librte_eal as a whole
> to be covered under the terms of LGPL2.1 (IANAL disclaimer). This is
> unlikely to be well received, given it's BSD-3-clause now.

Well, verbatim, is an exaggeration and IANAL either, but I would
consider what was written to be general enough to not really run into
any problems.  If others disagree, the v3 patch is there waiting too..

> > If it's such a big deal to use /dev/urandom, then what about my v3
> > patch which used dlopen()/dlsym() to try to find getentropy()?  Is
> > that not then acceptable?   dlopen/dlsym() are used in several placed
> > in DPDK.
>
> I don't have a problem with that specifically, but it needs to be clear
> to the maintainers that what you are effectively asking is for all
> dependencies to support runtime-downgrading transparently and out of
> the box.

Again, I am not asking for all build time dependencies to be
eliminated, just this one, which was only introduced in v19.08, which
creates a dependency on glibc 2.25, even if the application does not
use the RNG at all.

This is already done in numerous places in the DPDK. See
aesni_gcm_create().  See rte_hash_crc_set_alg().  See
rte_hash_create().  They all check to see if certain CPU flags are
supported and use the appropriate function.  This is not "downgrading"
anything.  It is using the best available function that is supported
on the target system, not the build system.  That's exactly what this
(or the v3) patch does.

> This comes with a huge additional cost, and I do not believe
> it is supported at the moment.

It does not come at any additional cost really, all it does is reduce
the number of build systems which have to be supported.  It is exactly
the same idea as my first patch which checks for the RDSEED
instruction at run time and if the target system does not support it
"downgrades" to use the TSC counter.  Except now this is done at
run-time, not at build time.

thanks,
dan
  
Mattias Rönnblom May 4, 2020, 8:04 a.m. UTC | #17
On 2020-05-01 23:05, Dan Gora wrote:
> On Fri, May 1, 2020 at 1:29 PM Luca Boccassi <bluca@debian.org> wrote:
>>> Well, no, because rdseed is used first if available and /dev/urandom
>>> is used next..
>>>
>>> And this is not a corner case at all.. There are lots of linux
>>> distributions which DPDK claims to support which do not support
>>> getentropy().  This is hardly a non-standard build system.  You really
>>> compile and support a separate binary for every possible system that
>>> your customers will use?
>> Yes, that's how Linux distributions work. At the very least, most
>> deployments ensure dependencies are not downgraded, as that's rarely
>> supported, for very good reasons.
> I can assure you that most commercial companies do not want to have to
> support building a separate binary of their product for every since
> version of every single linux distribution.  And again no dependencies
> are being "downgraded" here.  The binary will use the _best_ method
> available on the target system, not the least common denominator of
> the build system as it does now.
>
> I don't really consider that using '/dev/urandom' is a "downgrade"
> from getentropy().  getentropy() was only introduced in the last
> couple of years, so it's not like it was some super important new
> method of getting random numbers, it's just a more convenient method.
> Under the hood, I would guess both methods are exactly the same and
> will give you the same quality of random number.  It's just the API
> interface which has been improved.  And I don't see /dev/urandom going
> away any time soon.
>
> Again, if you're still hung up on using /dev/urandom we can use v3 of
> my patch to search for the getentropy()  symbol.
>
>>> And what is this about third parties?  Last I checked DPDK was an API
>>> framework, not a proprietary standalone application.  It is designed,
>>> by definition, to be used by "third parties".  Or does it only have to
>>> work in Intel's toolchains?
>> Third party was the wrong expression. DPDK is a set of libraries with a
>> build system (two actually) which links them against their
>> dependencies. It is the reponsibility of who does deployment to make
>> sure those dependencies, as established at build time, are satisfied at
>> runtime. Up until now, downgrading at runtime vs build time was not
>> supported as far as I know,
> Of course it is.. That why the rte_cpu_get_flag_enabled() function
> exists, so that the code can choose the best function at run-time, not
> at build time.  There are numerous examples of this in DPDK already.
>
>> so what you are effectively asking is to
>> double the size of the support matrix, and for the project to ensure
>> that every single dependency can always be built against a new version
>> and used against an older one.
> No, this is not correct.  The size of the support/test matrix is
> exactly the same, what changes is the number of build systems that
> have to be supported.
>
> Before my patches you had to test targets with:
>
> no rdseed and no getentropy
> rdseed and no getentropy
> no rdseed and getentropy
> rdseed and getentropy.


rdseed will go away eventually, when all reasonable systems have 
getentropy(). That might well take a couple of years.


> For each of these test cases you had to have a separate build system
> to generate a different binary (let's say app/test).
>
> Oh and you also had to test with building with make and building with
> meson because getentropy was only supported with meson.
>
> Now, with my patches, you can build 'app/test' on *any* of these
> systems, and use make _or_ meson, and it will run exactly the same
> way as if you compiled it on the matching test system.
>
> That is the difference.
>
> Let's take another example:  Say I have an application which I have to
> support on RHEL 7.7 and 7.5.  7.7 has getentropy, 7.5 does not (ignore
> the rdseed case for now).
>
> So I am faced with:
> 1) Build on the RHEL 7.7 system.  This allows the app to use
> getentropy(), but it  cannot run on 7.5.
> 2) Build on the RHEL 7.5 system.  This allows the app to run on 7.5
> and 7.7, but only uses the TSC counter.
> 3) Build on both systems.  Ok, now I have to build and support two
> separate binaries.
> 4) Determine if  getentropy() exists at run time or use /dev/urandom.
> Now I can build on either system and it will use getentropy() on 7.7
> and the TSC counter on 7.5.


You build against the oldest libc supported. The rest of the 
dependencies, you carry with your application. Again, I believe this to 
be standard practice, and has been so for ages. Containers are really an 
extension of this method, but it was common pattern when commercial 
applications were more common (especially true for things like Solaris- 
or HP-UX-based apps). You didn't even care about getentropy(), so I 
can't understand how this is a problem.


> How is #3 "better" than #4?  You still have to test on 7.7 and 7.5.
> The amount of work in testing has not changed at all, only the number
> of build systems and binary versions that you have to maintain.
>
> And I'm not saying that this has to be a hard and fast rule for
> everything for now and forever.  I am fixing a small function which is
> used exactly once.  The original patch which introduced getentropy()
> in v19.08 created an extra dependency that the target system have
> glibc 2.25, _even if it does not use this function at all_.
>
> That said, it seems to me to be a good idea to remove as many build
> time dependencies as possible and use things like
> rte_cpu_get_flag_enabled() to determine the best code path at run time
> (that is, initialization time, not in the fastpath obviously).  It
> just makes life a lot easier... If the compiler can generate the code,
> then it should be possible in many cases.
>
>>>>>> This is especially true when dealing with RNG APIs, where the tiniest
>>>>>> and most innocent-looking mistake could have disastrous consequences.
>>>>> This does not change the functionality of the RNG at all.  It just
>>>>> makes it work in the way that it was intended.  These changes were
>>>>> only introduced into 19.08, so they are not historical artifacts or
>>>>> anything.
>>>> It's reimplementing a syscall using a different interface which has
>>>> different semantics. A small mistake there is going to cost us dearly.
>>> The code was copied almost verbatim from the getentropy() function in
>>> glibc, it's hard to see it going that wrong there.  The code can be
>>> tested using the same test cases that the original patch used.  I
>>> don't see how there is a difference in test coverage here.
>> And who's going to keep it updated as the implementation in glibc
>> changes, if/when a bug is found there?
> What happens when any bug is found in DPDK... It gets fixed.  Why is
> this any different?
>
>> Also, glibc is LGPL2.1 licensed, which means copying almost verbatim a
>> non-trivial piece of code can possibly result in librte_eal as a whole
>> to be covered under the terms of LGPL2.1 (IANAL disclaimer). This is
>> unlikely to be well received, given it's BSD-3-clause now.
> Well, verbatim, is an exaggeration and IANAL either, but I would
> consider what was written to be general enough to not really run into
> any problems.  If others disagree, the v3 patch is there waiting too..
>
>>> If it's such a big deal to use /dev/urandom, then what about my v3
>>> patch which used dlopen()/dlsym() to try to find getentropy()?  Is
>>> that not then acceptable?   dlopen/dlsym() are used in several placed
>>> in DPDK.
>> I don't have a problem with that specifically, but it needs to be clear
>> to the maintainers that what you are effectively asking is for all
>> dependencies to support runtime-downgrading transparently and out of
>> the box.
> Again, I am not asking for all build time dependencies to be
> eliminated, just this one, which was only introduced in v19.08, which
> creates a dependency on glibc 2.25, even if the application does not
> use the RNG at all.


There's no new dependency if you build against an older glibc version.


> This is already done in numerous places in the DPDK. See
> aesni_gcm_create().  See rte_hash_crc_set_alg().  See
> rte_hash_create().  They all check to see if certain CPU flags are
> supported and use the appropriate function.  This is not "downgrading"
> anything.  It is using the best available function that is supported
> on the target system, not the build system.  That's exactly what this
> (or the v3) patch does.
>
>> This comes with a huge additional cost, and I do not believe
>> it is supported at the moment.
> It does not come at any additional cost really, all it does is reduce
> the number of build systems which have to be supported.  It is exactly
> the same idea as my first patch which checks for the RDSEED
> instruction at run time and if the target system does not support it
> "downgrades" to use the TSC counter.  Except now this is done at
> run-time, not at build time.
>
The make build is going away completely.
  
Dan Gora May 4, 2020, 2:13 p.m. UTC | #18
On Mon, May 4, 2020 at 5:04 AM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:

> >> so what you are effectively asking is to
> >> double the size of the support matrix, and for the project to ensure
> >> that every single dependency can always be built against a new version
> >> and used against an older one.
> > No, this is not correct.  The size of the support/test matrix is
> > exactly the same, what changes is the number of build systems that
> > have to be supported.
> >
> > Before my patches you had to test targets with:
> >
> > no rdseed and no getentropy
> > rdseed and no getentropy
> > no rdseed and getentropy
> > rdseed and getentropy.
>
>
> rdseed will go away eventually, when all reasonable systems have
> getentropy(). That might well take a couple of years.

So then why did you include it in the first place?

> > For each of these test cases you had to have a separate build system
> > to generate a different binary (let's say app/test).
> >
> > Oh and you also had to test with building with make and building with
> > meson because getentropy was only supported with meson.
> >
> > Now, with my patches, you can build 'app/test' on *any* of these
> > systems, and use make _or_ meson, and it will run exactly the same
> > way as if you compiled it on the matching test system.
> >
> > That is the difference.
> >
> > Let's take another example:  Say I have an application which I have to
> > support on RHEL 7.7 and 7.5.  7.7 has getentropy, 7.5 does not (ignore
> > the rdseed case for now).
> >
> > So I am faced with:
> > 1) Build on the RHEL 7.7 system.  This allows the app to use
> > getentropy(), but it  cannot run on 7.5.
> > 2) Build on the RHEL 7.5 system.  This allows the app to run on 7.5
> > and 7.7, but only uses the TSC counter.
> > 3) Build on both systems.  Ok, now I have to build and support two
> > separate binaries.
> > 4) Determine if  getentropy() exists at run time or use /dev/urandom.
> > Now I can build on either system and it will use getentropy() on 7.7
> > and the TSC counter on 7.5.
>
>
> You build against the oldest libc supported. The rest of the
> dependencies, you carry with your application.

What does this even mean?   That I'm supposed to distribute a proper
copy of binutils/glibc for each Linux distribution that we have to
support?  Am I supposed to distribute that as source and have our
customers compile it from source or do I have to maintain a proper
version of glibc for every linux distribution that we have to support?

> Again, I believe this to
> be standard practice, and has been so for ages. Containers are really an
> extension of this method, but it was common pattern when commercial
> applications were more common (especially true for things like Solaris-
> or HP-UX-based apps).

This was not at all true for Solaris or HPUX.  Software that we wrote
worked on every release of Solaris from 2.6 up until Solaris 9.  We
had some 8 glorious years of binary stability.  And there were not 15
different distributions of Solaris or HPUX that we had to support.

> You didn't even care about getentropy(), so I
> can't understand how this is a problem.

Because the patch which introduced getentropy() made it so that if I
build on the wrong machine the binary will not work everywhere because
of this unnecessary dependency on glibc 2.25.  If I compile on a
machine which happened to have rdseed, suddenly my application would
not even start on a machine which did not have rdseed.  That was not a
problem until 19.08.

> > How is #3 "better" than #4?  You still have to test on 7.7 and 7.5.
> > The amount of work in testing has not changed at all, only the number
> > of build systems and binary versions that you have to maintain.
> >
> > And I'm not saying that this has to be a hard and fast rule for
> > everything for now and forever.  I am fixing a small function which is
> > used exactly once.  The original patch which introduced getentropy()
> > in v19.08 created an extra dependency that the target system have
> > glibc 2.25, _even if it does not use this function at all_.
> >
> > That said, it seems to me to be a good idea to remove as many build
> > time dependencies as possible and use things like
> > rte_cpu_get_flag_enabled() to determine the best code path at run time
> > (that is, initialization time, not in the fastpath obviously).  It
> > just makes life a lot easier... If the compiler can generate the code,
> > then it should be possible in many cases.
> >
> >>>>>> This is especially true when dealing with RNG APIs, where the tiniest
> >>>>>> and most innocent-looking mistake could have disastrous consequences.
> >>>>> This does not change the functionality of the RNG at all.  It just
> >>>>> makes it work in the way that it was intended.  These changes were
> >>>>> only introduced into 19.08, so they are not historical artifacts or
> >>>>> anything.
> >>>> It's reimplementing a syscall using a different interface which has
> >>>> different semantics. A small mistake there is going to cost us dearly.
> >>> The code was copied almost verbatim from the getentropy() function in
> >>> glibc, it's hard to see it going that wrong there.  The code can be
> >>> tested using the same test cases that the original patch used.  I
> >>> don't see how there is a difference in test coverage here.
> >> And who's going to keep it updated as the implementation in glibc
> >> changes, if/when a bug is found there?
> > What happens when any bug is found in DPDK... It gets fixed.  Why is
> > this any different?
> >
> >> Also, glibc is LGPL2.1 licensed, which means copying almost verbatim a
> >> non-trivial piece of code can possibly result in librte_eal as a whole
> >> to be covered under the terms of LGPL2.1 (IANAL disclaimer). This is
> >> unlikely to be well received, given it's BSD-3-clause now.
> > Well, verbatim, is an exaggeration and IANAL either, but I would
> > consider what was written to be general enough to not really run into
> > any problems.  If others disagree, the v3 patch is there waiting too..
> >
> >>> If it's such a big deal to use /dev/urandom, then what about my v3
> >>> patch which used dlopen()/dlsym() to try to find getentropy()?  Is
> >>> that not then acceptable?   dlopen/dlsym() are used in several placed
> >>> in DPDK.
> >> I don't have a problem with that specifically, but it needs to be clear
> >> to the maintainers that what you are effectively asking is for all
> >> dependencies to support runtime-downgrading transparently and out of
> >> the box.
> > Again, I am not asking for all build time dependencies to be
> > eliminated, just this one, which was only introduced in v19.08, which
> > creates a dependency on glibc 2.25, even if the application does not
> > use the RNG at all.
>
>
> There's no new dependency if you build against an older glibc version.

Because the point is that now I have to care.  Now I have to check and
make sure that the machine I build on has the right glibc version. I
have to find a machine which does not have rdseed on it in order to
build my application.  Why should I even have to care about this?

Why are you arguing against all this now?  You suggested to use
/dev/urandom.  You used it in your original patches for these changes.
You said these changes were "a good idea".  Why is it suddenly a bad
idea now?

> > This is already done in numerous places in the DPDK. See
> > aesni_gcm_create().  See rte_hash_crc_set_alg().  See
> > rte_hash_create().  They all check to see if certain CPU flags are
> > supported and use the appropriate function.  This is not "downgrading"
> > anything.  It is using the best available function that is supported
> > on the target system, not the build system.  That's exactly what this
> > (or the v3) patch does.
> >
> >> This comes with a huge additional cost, and I do not believe
> >> it is supported at the moment.
> > It does not come at any additional cost really, all it does is reduce
> > the number of build systems which have to be supported.  It is exactly
> > the same idea as my first patch which checks for the RDSEED
> > instruction at run time and if the target system does not support it
> > "downgrades" to use the TSC counter.  Except now this is done at
> > run-time, not at build time.
> >
> The make build is going away completely.

Well it's not going away for at least a year or two and that has
nothing to do with the issues at hand here.  And I sure hope that
someone actually writes some documentation for how to build third
party applications using DPDK and meson before that happens because
currently there is none.  But it's pretty clear the DPDK community
doesn't really care about lowly "third party developers", so I'm sure
that will not happen.

thanks
dan
  
Dan Gora May 4, 2020, 2:19 p.m. UTC | #19
On Mon, May 4, 2020 at 11:13 AM Dan Gora <dg@adax.com> wrote:
>
> On Mon, May 4, 2020 at 5:04 AM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>
> > >> so what you are effectively asking is to
> > >> double the size of the support matrix, and for the project to ensure
> > >> that every single dependency can always be built against a new version
> > >> and used against an older one.
> > > No, this is not correct.  The size of the support/test matrix is
> > > exactly the same, what changes is the number of build systems that
> > > have to be supported.
> > >
> > > Before my patches you had to test targets with:
> > >
> > > no rdseed and no getentropy
> > > rdseed and no getentropy
> > > no rdseed and getentropy
> > > rdseed and getentropy.
> >
> >
> > rdseed will go away eventually, when all reasonable systems have
> > getentropy(). That might well take a couple of years.

So then why not just use only /dev/urandom for now, then change DPDK
to use getentropy() when "all reasonable systems have getentropy()"?

*problem solved*

thanks
dan
  
Dan Gora June 2, 2020, 5:10 a.m. UTC | #20
Can these patches be considered again for 20.08?

I thought that I addressed all of the issues or at least provided
reasonable responses to all of the concerns.

thanks
dan

On Mon, May 4, 2020 at 11:19 AM Dan Gora <dg@adax.com> wrote:
>
> On Mon, May 4, 2020 at 11:13 AM Dan Gora <dg@adax.com> wrote:
> >
> > On Mon, May 4, 2020 at 5:04 AM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >
> > > >> so what you are effectively asking is to
> > > >> double the size of the support matrix, and for the project to ensure
> > > >> that every single dependency can always be built against a new version
> > > >> and used against an older one.
> > > > No, this is not correct.  The size of the support/test matrix is
> > > > exactly the same, what changes is the number of build systems that
> > > > have to be supported.
> > > >
> > > > Before my patches you had to test targets with:
> > > >
> > > > no rdseed and no getentropy
> > > > rdseed and no getentropy
> > > > no rdseed and getentropy
> > > > rdseed and getentropy.
> > >
> > >
> > > rdseed will go away eventually, when all reasonable systems have
> > > getentropy(). That might well take a couple of years.
>
> So then why not just use only /dev/urandom for now, then change DPDK
> to use getentropy() when "all reasonable systems have getentropy()"?
>
> *problem solved*
>
> thanks
> dan
  
Dan Gora June 9, 2020, 3:37 p.m. UTC | #21
On Tue, Jun 2, 2020 at 2:10 AM Dan Gora <dg@adax.com> wrote:
>
> Can these patches be considered again for 20.08?
>
> I thought that I addressed all of the issues or at least provided
> reasonable responses to all of the concerns.

So I guess I'll take that as a "no"...

I have to admit that I just laughed when I read the recent thread
about "why or why cannot we not expand our community?"

This is a big part of it.

dan
  
Thomas Monjalon June 10, 2020, 8:07 a.m. UTC | #22
04/05/2020 16:13, Dan Gora:
> On Mon, May 4, 2020 at 5:04 AM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
> > The make build is going away completely.
> 
> Well it's not going away for at least a year or two

It will be removed in 20.11.

> and that has
> nothing to do with the issues at hand here.  And I sure hope that
> someone actually writes some documentation for how to build third
> party applications using DPDK and meson before that happens because
> currently there is none.

meson provides a standard pkg-config file for DPDK,
so there is nothing specific to do.
Plus, our examples are using pkg-config to demonstrate it.

> But it's pretty clear the DPDK community
> doesn't really care about lowly "third party developers", so I'm sure
> that will not happen.

This is a tough assumption.

I did not follow this whole discussion,
but it seems Luca, Stephen and Mattias gave some replies.
If you disagree with their comments, I recommend doing a summary of the
discussion and request the DPDK Technical Board to give a decision.
  
Thomas Monjalon June 10, 2020, 8:15 a.m. UTC | #23
09/06/2020 17:37, Dan Gora:
> On Tue, Jun 2, 2020 at 2:10 AM Dan Gora <dg@adax.com> wrote:
> >
> > Can these patches be considered again for 20.08?
> >
> > I thought that I addressed all of the issues or at least provided
> > reasonable responses to all of the concerns.
> 
> So I guess I'll take that as a "no"...
> 
> I have to admit that I just laughed when I read the recent thread
> about "why or why cannot we not expand our community?"
> 
> This is a big part of it.

It seems there is a communication issue in this thread.
Some maintainers look to be against the solution,
and you seem frustrated because of a blocked usage issue.

Please could you explain what is the initial user problem?
It seems you want to run a recent DPDK with an old glibc. Am I right?
  
Luca Boccassi June 10, 2020, 8:33 a.m. UTC | #24
On Wed, 2020-06-10 at 10:15 +0200, Thomas Monjalon wrote:
> 09/06/2020 17:37, Dan Gora:
> > On Tue, Jun 2, 2020 at 2:10 AM Dan Gora <dg@adax.com> wrote:
> > > Can these patches be considered again for 20.08?
> > > 
> > > I thought that I addressed all of the issues or at least provided
> > > reasonable responses to all of the concerns.
> > 
> > So I guess I'll take that as a "no"...
> > 
> > I have to admit that I just laughed when I read the recent thread
> > about "why or why cannot we not expand our community?"
> > 
> > This is a big part of it.
> 
> It seems there is a communication issue in this thread.
> Some maintainers look to be against the solution,
> and you seem frustrated because of a blocked usage issue.
> 
> Please could you explain what is the initial user problem?
> It seems you want to run a recent DPDK with an old glibc. Am I right?

Building DPDK with a recent glibc and then running it with an old
glibc, to be precise. This model is not supportable.
  
Stephen Hemminger June 12, 2023, 3:55 p.m. UTC | #25
On Wed, 10 Jun 2020 09:33:04 +0100
Luca Boccassi <bluca@debian.org> wrote:

> > It seems there is a communication issue in this thread.
> > Some maintainers look to be against the solution,
> > and you seem frustrated because of a blocked usage issue.
> > 
> > Please could you explain what is the initial user problem?
> > It seems you want to run a recent DPDK with an old glibc. Am I right?  
> 
> Building DPDK with a recent glibc and then running it with an old
> glibc, to be precise. This model is not supportable.

There has not been any further action on this patchset in years.
And the end of the thread describes how the patch is trying to solve a problem
that DPDK and glibc don't want to support. Marking it as rejected.
  

Patch

diff --git a/lib/librte_eal/common/rte_random.c b/lib/librte_eal/common/rte_random.c
index df02f1307..f8203f4c7 100644
--- a/lib/librte_eal/common/rte_random.c
+++ b/lib/librte_eal/common/rte_random.c
@@ -7,6 +7,7 @@ 
 #endif
 #include <stdlib.h>
 #include <unistd.h>
+#include <dlfcn.h>
 
 #include <rte_branch_prediction.h>
 #include <rte_cycles.h>
@@ -176,18 +177,38 @@  rte_rand_max(uint64_t upper_bound)
 	return res;
 }
 
+/* Try to use the getentropy() function from glibc >= 2.25 */
+static int
+__rte_getentropy(uint64_t *ge_seed)
+{
+	void *handle = NULL;
+	void **sym;
+	int (*getentropy_p)(void *__buffer, size_t __length);
+	int gc_rc;
+
+	handle = dlopen("libc.so.6", RTLD_LAZY);
+	if (!handle)
+		return -1;
+
+	sym = dlsym(handle, "getentropy");
+	if (!sym || !*sym)
+		/* Cannot resolve getentropy */
+		return -1;
+
+	getentropy_p = (int (*)(void *, size_t)) sym;
+	gc_rc = (*getentropy_p)((void *)ge_seed, sizeof(*ge_seed));
+	dlclose(handle);
+	return gc_rc;
+}
+
 static uint64_t
 __rte_random_initial_seed(void)
 {
-#ifdef RTE_LIBEAL_USE_GETENTROPY
-	int ge_rc;
 	uint64_t ge_seed;
 
-	ge_rc = getentropy(&ge_seed, sizeof(ge_seed));
-
-	if (ge_rc == 0)
+	if (__rte_getentropy(&ge_seed) == 0)
 		return ge_seed;
-#endif
+
 #if defined(RTE_ARCH_X86)
 	/* first fallback: rdseed instruction, if available */
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RDSEED)) {
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index 0267c3b9d..748359b8c 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -15,9 +15,6 @@  deps += 'kvargs'
 if dpdk_conf.has('RTE_USE_LIBBSD')
 	ext_deps += libbsd
 endif
-if cc.has_function('getentropy', prefix : '#include <unistd.h>')
-	cflags += '-DRTE_LIBEAL_USE_GETENTROPY'
-endif
 if cc.has_header('getopt.h')
 	cflags += ['-DHAVE_GETOPT_H', '-DHAVE_GETOPT', '-DHAVE_GETOPT_LONG']
 endif