[v2,4/4] add ABI checks

Message ID 20200129172621.28565-5-david.marchand@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series add ABI checks

Checks

Context               Check    Description
ci/checkpatch         warning  coding style issues
ci/travis-robot       success  Travis build: passed
ci/Intel-compilation  fail     apply issues

Commit Message

David Marchand Jan. 29, 2020, 5:26 p.m. UTC
  For normal developers, those checks are disabled.

Enabling them requires a configuration that will trigger the ABI dumps
generation as part of the existing devtools/test-build.sh and
devtools/test-meson-builds.sh scripts.

Those checks are enabled in the CI for the default meson options on x86
and aarch64 so that proposed patches are validated via our CI robot.
A cache of the ABI is stored in travis jobs to avoid rebuilding too
often.

Checks can be made informational only by setting ABI_CHECKS_WARN_ONLY when
breaking the ABI in a future release.

Explicit suppression rules have been added on internal structures
exposed to crypto and security drivers as the current ABI policy does
not apply to them.
This could be improved in the future by carefully splitting the headers
content with application and driver "users" in mind.

Issues are currently reported for recent librte_crypto changes, and
suppression rules have been added for those too.

Mellanox glue libraries are explicitly skipped as they are not part of
the application ABI.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changelog since v1:
- reworked the scripts so that the build test scripts clone and build the
  reference automatically. A developer only needs to set one variable to
  enable the checks,
- meson builds are done with debug so that abidiff can inspect the
  structures,
- abidiff checks only public types by looking at installed headers,
- abidiff has some issues when comparing a dump with a .so built with clang
  so all diffs are now done with dump files only,
- suppression rules have been added to waive warnings on exposed internal
  types,
- an ABI breakage has been reported on changes in cryptodev.
  For now, suppression rules have been put in place to let the CI run.

---
 .ci/linux-build.sh                  | 23 +++++++++++
 .travis.yml                         | 20 +++++++++-
 MAINTAINERS                         |  2 +
 devtools/check-abi.sh               | 59 +++++++++++++++++++++++++++++
 devtools/dpdk.abignore              | 20 ++++++++++
 devtools/gen-abi.sh                 | 26 +++++++++++++
 devtools/test-build.sh              | 45 ++++++++++++++++++++--
 devtools/test-meson-builds.sh       | 35 ++++++++++++++++-
 doc/guides/contributing/patches.rst | 13 +++++++
 9 files changed, 236 insertions(+), 7 deletions(-)
 create mode 100755 devtools/check-abi.sh
 create mode 100644 devtools/dpdk.abignore
 create mode 100755 devtools/gen-abi.sh
  

Comments

Thomas Monjalon Jan. 29, 2020, 5:42 p.m. UTC | #1
Anoob, Akhil,

Please we need to revert or fix the ABI breakages in cryptodev very soon.
The FIXME section below must be empty.

Thanks

29/01/2020 18:26, David Marchand:
> We currently have issues reported for librte_crypto recent changes for
> which suppression rules have been added too.
[..]
> --- /dev/null
> +++ b/devtools/dpdk.abignore
> +; FIXME
> +[suppress_type]
> +        type_kind = enum
> +        name = rte_crypto_aead_algorithm
> +        changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
> +[suppress_type]
> +        type_kind = enum
> +        name = rte_crypto_asym_xform_type
> +        changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> +[suppress_variable]
> +        name = rte_crypto_aead_algorithm_strings
  
Anoob Joseph Jan. 29, 2020, 6:10 p.m. UTC | #2
Hi Thomas,

The asymmetric crypto library is experimental. Changes to experimental code paths are allowed, right?

Also, I was wondering why changing the LIST_END would cause breakage. Before we introduced the ABI checks and ABI freeze policy, it was always allowed to add enums to the end. I'm just trying to understand the real impact of this case.

If we don't allow the LIST_END to be modified, then it means no feature can be implemented in between. And the best way to overcome that would be to just remove the LIST_END or set LIST_END to a very high value.

Thanks,
Anoob

  
David Marchand Jan. 29, 2020, 8:03 p.m. UTC | #3
On Wed, Jan 29, 2020 at 7:10 PM Anoob Joseph <anoobj@marvell.com> wrote:
> The asymmetric crypto library is experimental. Changes to experimental code paths is allowed, right?

The asymmetric crypto enum is referenced by a function that is part of the
stable ABI. It is possible to waive this enum if we are sure it cannot be
used outside of the experimental asym crypto APIs.

The rest of the changes touch stable symbols.

Adding the abidiff report:

  [C]'function void rte_cryptodev_info_get(uint8_t, rte_cryptodev_info*)' at rte_cryptodev.c:1115:1 has some indirect sub-type changes:
    parameter 2 of type 'rte_cryptodev_info*' has sub-type changes:
      in pointed to type 'struct rte_cryptodev_info' at rte_cryptodev.h:468:1:
        type size hasn't changed
        1 data member change:
         type of 'const rte_cryptodev_capabilities* rte_cryptodev_info::capabilities' changed:
           in pointed to type 'const rte_cryptodev_capabilities':
             in unqualified underlying type 'struct rte_cryptodev_capabilities' at rte_cryptodev.h:176:1:
               type size hasn't changed
               1 data member change:
                type of '__anonymous_union__ ' changed:
                  type size hasn't changed
                  1 data member change:
                   type of 'rte_cryptodev_asymmetric_capability __anonymous_union__::asym' changed:
                     type size hasn't changed
                     1 data member change:
                      type of 'rte_cryptodev_asymmetric_xform_capability rte_cryptodev_asymmetric_capability::xform_capa' changed:
                        type size hasn't changed
                        1 data member change:
                         type of 'rte_crypto_asym_xform_type rte_cryptodev_asymmetric_xform_capability::xform_type' changed:
                           type size hasn't changed
                           2 enumerator insertions:
                             'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECDSA' value '7'
                             'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECPM' value '8'
                           1 enumerator change:
                             'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END' from value '7' to '9' at rte_crypto_asym.h:60:1


  [C]'function int rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
      in pointed to type 'enum rte_crypto_aead_algorithm' at rte_crypto_sym.h:346:1:
        type size hasn't changed
        1 enumerator insertion:
          'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305' value '3'
        1 enumerator change:
          'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from value '3' to '4' at rte_crypto_sym.h:346:1


  [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at rte_crypto_sym.h:358:1:
    size of symbol (in bytes) changed from 24 to 32
  
Akhil Goyal Jan. 29, 2020, 8:13 p.m. UTC | #4
> 
> On Wed, Jan 29, 2020 at 7:10 PM Anoob Joseph <anoobj@marvell.com> wrote:
> > The asymmetric crypto library is experimental. Changes to experimental code
> paths is allowed, right?
> 
> The asymmetric crypto enum is referenced by a function part of the stable ABI.
> It is possible to waive this enum, if we are sure no use out of the
> experimental asym crypto APIs is possible.
> 
> The rest of the changes touch stable symbols.
> 
> Adding the abidiff report:
> 
>   [C]'function void rte_cryptodev_info_get(uint8_t,
> rte_cryptodev_info*)' at rte_cryptodev.c:1115:1 has some indirect
> sub-type changes:
>     parameter 2 of type 'rte_cryptodev_info*' has sub-type changes:
>       in pointed to type 'struct rte_cryptodev_info' at rte_cryptodev.h:468:1:
>         type size hasn't changed
>         1 data member change:
>          type of 'const rte_cryptodev_capabilities*
> rte_cryptodev_info::capabilities' changed:
>            in pointed to type 'const rte_cryptodev_capabilities':
>              in unqualified underlying type 'struct
> rte_cryptodev_capabilities' at rte_cryptodev.h:176:1:
>                type size hasn't changed
>                1 data member change:
>                 type of '__anonymous_union__ ' changed:
>                   type size hasn't changed
>                   1 data member change:
>                    type of 'rte_cryptodev_asymmetric_capability
> __anonymous_union__::asym' changed:
>                      type size hasn't changed
>                      1 data member change:
>                       type of
> 'rte_cryptodev_asymmetric_xform_capability
> rte_cryptodev_asymmetric_capability::xform_capa' changed:
>                         type size hasn't changed
>                         1 data member change:
>                          type of 'rte_crypto_asym_xform_type
> rte_cryptodev_asymmetric_xform_capability::xform_type' changed:
>                            type size hasn't changed
>                            2 enumerator insertions:
> 
> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECDSA' value '7'
> 
> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECPM' value '8'
>                            1 enumerator change:
> 
> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END'
> from
> value '7' to '9' at rte_crypto_asym.h:60:1
> 

I believe these enums are used only in the ASYM case, which is experimental.

> 
>   [C]'function int
> rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const
> char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
>       in pointed to type 'enum rte_crypto_aead_algorithm' at
> rte_crypto_sym.h:346:1:
>         type size hasn't changed
>         1 enumerator insertion:
>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305'
> value '3'
>         1 enumerator change:
>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from
> value '3' to '4' at rte_crypto_sym.h:346:1
> 
> 
>   [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at
> rte_crypto_sym.h:358:1:
>     size of symbol (in bytes) changed from 24 to 32
> 
> 
+Fiona and Arek 

We may need to revert the chacha-poly patches.

> --
> David Marchand
  
Fiona Trahe Jan. 30, 2020, 1:06 p.m. UTC | #5
We were unaware the LIST_END change could constitute an ABI breakage, but can see how it affects the array size when picked up.
We're exploring options.

I agree with Anoob's point that if we don't allow the LIST_END to be modified, then it means no feature can be implemented without ABI breakage.
Does anyone object to removing those LIST_END elements, or have a better suggestion? It would have to be in 20.11, I suppose.

  
Thomas Monjalon Jan. 30, 2020, 3:59 p.m. UTC | #6
30/01/2020 14:06, Trahe, Fiona:
> We were unaware the LIST_END change could constitute an ABI breakage, but can see how it affects the array size when picked up.
> We're exploring options.
> 
> I agree with Anoob's point that if we don't allow the LIST_END to be modified, then it means no feature can be implemented without ABI breakage.
> Anyone  object to removing those LIST_END elements - or have a better suggestion? Would have to be in 20.11 I suppose.

Yes, having the max value right after the last value is ridiculous,
it prevents adding any value.
In 20.11, we should remove all these *_END and *_MAX from API enums
and replace them with a separate #define with reasonable maximums.
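
One possible reading of this proposal, as a minimal sketch only: the enum carries no end marker, and a separate bound with headroom sizes any table. All names (`enum algo`, `ALGO_MAX`, `algo_name`) and the value 64 are hypothetical and are not taken from DPDK headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum with no end marker: new values can be appended
 * without changing the value of any existing enumerator. */
enum algo { ALGO_AES_CCM = 1, ALGO_AES_GCM, ALGO_CHACHA20_POLY1305 };

/* Hypothetical fixed bound kept outside the enum, chosen with headroom. */
#define ALGO_MAX 64

static const char *algo_names[ALGO_MAX] = {
	[ALGO_AES_CCM] = "aes-ccm",
	[ALGO_AES_GCM] = "aes-gcm",
	[ALGO_CHACHA20_POLY1305] = "chacha20-poly1305",
};

/* Returns the name, or NULL for values outside the bound or not set. */
const char *algo_name(int algo)
{
	if (algo < 0 || algo >= ALGO_MAX)
		return NULL;
	return algo_names[algo];
}

/* The table size depends on ALGO_MAX only, not on the last enumerator. */
size_t algo_table_bytes(void)
{
	return sizeof(algo_names);
}
```

With this layout, appending a fourth algorithm changes neither `ALGO_MAX` nor the size of `algo_names`; the cost is a bound that has to be picked in advance.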
  
Ferruh Yigit Jan. 30, 2020, 4:09 p.m. UTC | #7
On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> 
> 
>>
>> On Wed, Jan 29, 2020 at 7:10 PM Anoob Joseph <anoobj@marvell.com> wrote:
>>> The asymmetric crypto library is experimental. Changes to experimental code
>> paths is allowed, right?
>>
>> The asymmetric crypto enum is referenced by a function part of the stable ABI.
>> It is possible to waive this enum, if we are sure no use out of the
>> experimental asym crypto APIs is possible.
>>
>> The rest of the changes touch stable symbols.
>>
>> Adding the abidiff report:
>>
>>   [C]'function void rte_cryptodev_info_get(uint8_t,
>> rte_cryptodev_info*)' at rte_cryptodev.c:1115:1 has some indirect
>> sub-type changes:
>>     parameter 2 of type 'rte_cryptodev_info*' has sub-type changes:
>>       in pointed to type 'struct rte_cryptodev_info' at rte_cryptodev.h:468:1:
>>         type size hasn't changed
>>         1 data member change:
>>          type of 'const rte_cryptodev_capabilities*
>> rte_cryptodev_info::capabilities' changed:
>>            in pointed to type 'const rte_cryptodev_capabilities':
>>              in unqualified underlying type 'struct
>> rte_cryptodev_capabilities' at rte_cryptodev.h:176:1:
>>                type size hasn't changed
>>                1 data member change:
>>                 type of '__anonymous_union__ ' changed:
>>                   type size hasn't changed
>>                   1 data member change:
>>                    type of 'rte_cryptodev_asymmetric_capability
>> __anonymous_union__::asym' changed:
>>                      type size hasn't changed
>>                      1 data member change:
>>                       type of
>> 'rte_cryptodev_asymmetric_xform_capability
>> rte_cryptodev_asymmetric_capability::xform_capa' changed:
>>                         type size hasn't changed
>>                         1 data member change:
>>                          type of 'rte_crypto_asym_xform_type
>> rte_cryptodev_asymmetric_xform_capability::xform_type' changed:
>>                            type size hasn't changed
>>                            2 enumerator insertions:
>>
>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECDSA' value '7'
>>
>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECPM' value '8'
>>                            1 enumerator change:
>>
>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END'
>> from
>> value '7' to '9' at rte_crypto_asym.h:60:1
>>
> 
> I believe these enums will be used only in case of ASYM case which is experimental.

Regardless of whether this is experimental or not, this shouldn't be a problem;
I think this is a false positive.

An ABI break can happen when a struct is shared between the application
and the library (DPDK) and the layout of that memory is known differently by
the application and the library.

Here, in all cases, there is no layout/size change.

As to the value changes of the enums: since the application was compiled with
the old DPDK, it knows values only up to '6'; 7 and above are invalid to the
application. So it won't send these values, and it should ignore them when they
come from the library. The only consequence is that an old application won't be
able to use the new features those new enums provide, but that is
expected/normal.
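
To illustrate the "no layout/size change" argument, here is a small sketch. The enum and struct names are hypothetical stand-ins, not the cryptodev definitions, and it assumes a compiler that gives both enums the same underlying integer type (the usual case for GCC and clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" and "after" versions of an enum and of a struct
 * that embeds it. Names are illustrative, not taken from DPDK headers. */
enum xform_old { XFORM_OLD_NONE, XFORM_OLD_RSA, XFORM_OLD_LIST_END };
enum xform_new { XFORM_NEW_NONE, XFORM_NEW_RSA, XFORM_NEW_ECDSA,
		 XFORM_NEW_LIST_END };

struct capa_old { enum xform_old type; unsigned int op_types; };
struct capa_new { enum xform_new type; unsigned int op_types; };

/* Compile-time checks: inserting an enumerator moved LIST_END but left
 * the struct size and the member offsets untouched. */
static_assert(sizeof(struct capa_old) == sizeof(struct capa_new),
	      "struct size differs");
static_assert(offsetof(struct capa_old, op_types) ==
	      offsetof(struct capa_new, op_types),
	      "member offset differs");

/* Runtime view of the same facts: returns 1 when both layouts match. */
int layouts_match(void)
{
	return sizeof(struct capa_old) == sizeof(struct capa_new) &&
	       offsetof(struct capa_old, op_types) ==
	       offsetof(struct capa_new, op_types);
}
```

The only thing that differs between the two versions is the numeric value of the end marker, which is what abidiff reports as an "enumerator change".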

> 
>>
>>   [C]'function int
>> rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const
>> char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
>>     parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
>>       in pointed to type 'enum rte_crypto_aead_algorithm' at
>> rte_crypto_sym.h:346:1:
>>         type size hasn't changed
>>         1 enumerator insertion:
>>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305'
>> value '3'
>>         1 enumerator change:
>>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from
>> value '3' to '4' at rte_crypto_sym.h:346:1

Same as above, no layout change.

>>
>>
>>   [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at
>> rte_crypto_sym.h:358:1:
>>     size of symbol (in bytes) changed from 24 to 32
>>

The shared memory size changes, but this is a global variable in the library,
and the values an application can request, 'RTE_CRYPTO_AEAD_AES_CCM' and
'RTE_CRYPTO_AEAD_AES_GCM', are already there, so there is no backward
compatibility issue here.
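
The "24 to 32" figure in the report can be reproduced with a sketch like the one below. It assumes 8-byte pointers and uses hypothetical tables whose indexes start at 1, mirroring the enumerator values shown in the abidiff output; it is not the actual DPDK source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enumerator values, mirroring those in the report. */
enum { AEAD_AES_CCM = 1, AEAD_AES_GCM = 2, AEAD_CHACHA20_POLY1305 = 3 };

/* Table before the change: indexes 0..2, so three pointers. */
static const char *strings_old[] = {
	[AEAD_AES_CCM] = "aes-ccm",
	[AEAD_AES_GCM] = "aes-gcm",
};

/* Table after the change: indexes 0..3, so four pointers. */
static const char *strings_new[] = {
	[AEAD_AES_CCM] = "aes-ccm",
	[AEAD_AES_GCM] = "aes-gcm",
	[AEAD_CHACHA20_POLY1305] = "chacha20-poly1305",
};

size_t old_bytes(void) { return sizeof(strings_old); }
size_t new_bytes(void) { return sizeof(strings_new); }

/* Returns 1 when an index known to the old table names the same
 * algorithm in the new table, 0 for indexes the old table lacks. */
int same_entry(int idx)
{
	if (idx < AEAD_AES_CCM || idx > AEAD_AES_GCM)
		return 0;
	return strcmp(strings_old[idx], strings_new[idx]) == 0;
}
```

With 8-byte pointers the two sizes are 3 * 8 = 24 and 4 * 8 = 32 bytes, and the entries reachable with the old indexes are unchanged.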

>>
> +Fiona and Arek 
> 
> We may need to revert the chacha-poly patches.
> 

I don't see any ABI break in this case, can someone explain if I am missing
anything here?
  
Ferruh Yigit Jan. 30, 2020, 4:42 p.m. UTC | #8
On 1/30/2020 3:59 PM, Thomas Monjalon wrote:
> 30/01/2020 14:06, Trahe, Fiona:
>> We were unaware the LIST_END change could constitute an ABI breakage, but can see how it affects the array size when picked up.
>> We're exploring options.
>>
>> I agree with Anoob's point that if we don't allow the LIST_END to be modified, then it means no feature can be implemented without ABI breakage.
>> Anyone  object to removing those LIST_END elements - or have a better suggestion? Would have to be in 20.11 I suppose.
> 
> Yes, having max value right after the last value is ridiculous,
> it prevents adding any value.
> In 20.11, we should remove all these *_END and *_MAX from API enums
> and replace them with a separate #define with reasonable maximums.
> 
>

I disagree, that kind of usage is common and lets loops iterate over the valid
elements, and it is not a source of ABI break on its own.
On the contrary, not having MAX is problematic: if we don't have the
MAX value to compare against and decide whether a provided value is valid, then
when a new version of the library introduces new values, how can an old
application detect the unsupported new values?

As far as I can see, the problem occurs when that *_END or *_MAX is used to
define the size of an array in a public struct. This usage prevents adding new
values, and I already sent a deprecation notice for it:
https://patches.dpdk.org/patch/65359/
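
A sketch of the problematic pattern, with hypothetical names (`struct stats_old`, `struct stats_new`, `TYPE_MAX_*`) that do not come from any DPDK header: once the end marker sizes an array inside a public struct, bumping it moves every member that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical end markers before and after one value is added. */
enum { TYPE_MAX_OLD = 3, TYPE_MAX_NEW = 4 };

/* Public struct whose array is sized by the end marker. */
struct stats_old { uint64_t per_type[TYPE_MAX_OLD]; uint64_t total; };
struct stats_new { uint64_t per_type[TYPE_MAX_NEW]; uint64_t total; };

/* Both the struct size and the offset of 'total' differ, so an
 * application built against the old header reads the wrong memory. */
static_assert(sizeof(struct stats_old) != sizeof(struct stats_new),
	      "size expected to change");
static_assert(offsetof(struct stats_old, total) !=
	      offsetof(struct stats_new, total),
	      "offset expected to change");

size_t total_offset_old(void) { return offsetof(struct stats_old, total); }
size_t total_offset_new(void) { return offsetof(struct stats_new, total); }
```

Here `total` sits at offset 24 in the old layout and 32 in the new one, which is a real layout change, unlike the enum-only cases discussed in this thread.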
  
Thomas Monjalon Jan. 30, 2020, 8:18 p.m. UTC | #9
30/01/2020 17:09, Ferruh Yigit:
> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > 
> > 
> >>
> >> On Wed, Jan 29, 2020 at 7:10 PM Anoob Joseph <anoobj@marvell.com> wrote:
> >>> The asymmetric crypto library is experimental. Changes to experimental code
> >> paths is allowed, right?
> >>
> >> The asymmetric crypto enum is referenced by a function part of the stable ABI.
> >> It is possible to waive this enum, if we are sure no use out of the
> >> experimental asym crypto APIs is possible.
> >>
> >> The rest of the changes touch stable symbols.
> >>
> >> Adding the abidiff report:
> >>
> >>   [C]'function void rte_cryptodev_info_get(uint8_t,
> >> rte_cryptodev_info*)' at rte_cryptodev.c:1115:1 has some indirect
> >> sub-type changes:
> >>     parameter 2 of type 'rte_cryptodev_info*' has sub-type changes:
> >>       in pointed to type 'struct rte_cryptodev_info' at rte_cryptodev.h:468:1:
> >>         type size hasn't changed
> >>         1 data member change:
> >>          type of 'const rte_cryptodev_capabilities*
> >> rte_cryptodev_info::capabilities' changed:
> >>            in pointed to type 'const rte_cryptodev_capabilities':
> >>              in unqualified underlying type 'struct
> >> rte_cryptodev_capabilities' at rte_cryptodev.h:176:1:
> >>                type size hasn't changed
> >>                1 data member change:
> >>                 type of '__anonymous_union__ ' changed:
> >>                   type size hasn't changed
> >>                   1 data member change:
> >>                    type of 'rte_cryptodev_asymmetric_capability
> >> __anonymous_union__::asym' changed:
> >>                      type size hasn't changed
> >>                      1 data member change:
> >>                       type of
> >> 'rte_cryptodev_asymmetric_xform_capability
> >> rte_cryptodev_asymmetric_capability::xform_capa' changed:
> >>                         type size hasn't changed
> >>                         1 data member change:
> >>                          type of 'rte_crypto_asym_xform_type
> >> rte_cryptodev_asymmetric_xform_capability::xform_type' changed:
> >>                            type size hasn't changed
> >>                            2 enumerator insertions:
> >>
> >> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECDSA' value '7'
> >>
> >> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECPM' value '8'
> >>                            1 enumerator change:
> >>
> >> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END'
> >> from
> >> value '7' to '9' at rte_crypto_asym.h:60:1
> >>
> > 
> > I believe these enums will be used only in case of ASYM case which is experimental.
> 
> Independent from being experiment and not, this shouldn't be a problem, I think
> this is a false positive.
> 
> The ABI break can happen when a struct has been shared between the application
> and the library (DPDK) and the layout of that memory know differently by
> application and the library.
> 
> Here in all cases, there is no layout/size change.
> 
> As to the value changes of the enums, since application compiled with old DPDK,
> it will know only up to '6', 7 and more means invalid to the application. So it
> won't send these values also it should ignore these values from library. Only
> consequence is old application won't able to use new features those new enums
> provide but that is expected/normal.

If the library gives a higher value than expected by the application,
and the application uses this value as an array index,
there can be an out-of-bounds access.
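
A minimal sketch of that scenario, using a hypothetical application-side table (`key_len_by_algo`) sized by the end marker the application was compiled with. The names and key lengths are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: end marker value the application was compiled with. */
#define APP_KNOWN_END 3

/* Application-side table indexed by the enum value (index 0 unused). */
static const int key_len_by_algo[APP_KNOWN_END] = { 0, 16, 16 };

/* Looks up a value received from the library. Without the range check,
 * a newer library returning 3 would index one element past the table. */
int key_len_for(int algo_from_library)
{
	if (algo_from_library < 0 || algo_from_library >= APP_KNOWN_END)
		return -1; /* unknown to this build */
	return key_len_by_algo[algo_from_library];
}
```

The hazard exists only when the application indexes with an unchecked value; the check against its own compile-time end marker removes it.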


> >>   [C]'function int
> >> rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const
> >> char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
> >>     parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
> >>       in pointed to type 'enum rte_crypto_aead_algorithm' at
> >> rte_crypto_sym.h:346:1:
> >>         type size hasn't changed
> >>         1 enumerator insertion:
> >>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305'
> >> value '3'
> >>         1 enumerator change:
> >>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from
> >> value '3' to '4' at rte_crypto_sym.h:346:1
> 
> Same as above, no layout change.
> 
> >>
> >>
> >>   [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at
> >> rte_crypto_sym.h:358:1:
> >>     size of symbol (in bytes) changed from 24 to 32
> >>
> 
> The shared memory size changes, but this is global variable in the library, and
> the values application can request 'RTE_CRYPTO_AEAD_AES_CCM' &
> 'RTE_CRYPTO_AEAD_AES_GCM' is already there, so there is no backward
> compatibility issue here.

For this one, I don't know what the breakage is.


> > +Fiona and Arek 
> > 
> > We may need to revert the chacha-poly patches.
> > 
> 
> I don't see any ABI break in this case, can someone explain if I am missing
> anything here?
  
Ananyev, Konstantin Jan. 30, 2020, 11:49 p.m. UTC | #10
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Thursday, January 30, 2020 4:00 PM
> To: Anoob Joseph <anoobj@marvell.com>; akhil.goyal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> nhorman@tuxdriver.com; Mcnamara, John <john.mcnamara@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>; Kusztal, ArkadiuszX
> <arkadiuszx.kusztal@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> 30/01/2020 14:06, Trahe, Fiona:
> > We were unaware the LIST_END change could constitute an ABI breakage, but can see how it affects the array size when picked up.
> > We're exploring options.
> >
> > I agree with Anoob's point that if we don't allow the LIST_END to be modified, then it means no feature can be implemented without ABI
> breakage.
> > Anyone  object to removing those LIST_END elements - or have a better suggestion? Would have to be in 20.11 I suppose.
> 
> Yes, having max value right after the last value is ridiculous,
> it prevents adding any value.
> In 20.11, we should remove all these *_END and *_MAX from API enums
> and replace them with a separate #define with reasonable maximums.
> 

I think we'd better avoid public structs that have arrays of _MAX elements in them.
  
Ferruh Yigit Jan. 31, 2020, 9:03 a.m. UTC | #11
On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> 30/01/2020 17:09, Ferruh Yigit:
>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>
>>>
>>>>
>>>> On Wed, Jan 29, 2020 at 7:10 PM Anoob Joseph <anoobj@marvell.com> wrote:
>>>>> The asymmetric crypto library is experimental. Changes to experimental code
>>>> paths is allowed, right?
>>>>
>>>> The asymmetric crypto enum is referenced by a function part of the stable ABI.
>>>> It is possible to waive this enum, if we are sure no use out of the
>>>> experimental asym crypto APIs is possible.
>>>>
>>>> The rest of the changes touch stable symbols.
>>>>
>>>> Adding the abidiff report:
>>>>
>>>>   [C]'function void rte_cryptodev_info_get(uint8_t,
>>>> rte_cryptodev_info*)' at rte_cryptodev.c:1115:1 has some indirect
>>>> sub-type changes:
>>>>     parameter 2 of type 'rte_cryptodev_info*' has sub-type changes:
>>>>       in pointed to type 'struct rte_cryptodev_info' at rte_cryptodev.h:468:1:
>>>>         type size hasn't changed
>>>>         1 data member change:
>>>>          type of 'const rte_cryptodev_capabilities*
>>>> rte_cryptodev_info::capabilities' changed:
>>>>            in pointed to type 'const rte_cryptodev_capabilities':
>>>>              in unqualified underlying type 'struct
>>>> rte_cryptodev_capabilities' at rte_cryptodev.h:176:1:
>>>>                type size hasn't changed
>>>>                1 data member change:
>>>>                 type of '__anonymous_union__ ' changed:
>>>>                   type size hasn't changed
>>>>                   1 data member change:
>>>>                    type of 'rte_cryptodev_asymmetric_capability
>>>> __anonymous_union__::asym' changed:
>>>>                      type size hasn't changed
>>>>                      1 data member change:
>>>>                       type of
>>>> 'rte_cryptodev_asymmetric_xform_capability
>>>> rte_cryptodev_asymmetric_capability::xform_capa' changed:
>>>>                         type size hasn't changed
>>>>                         1 data member change:
>>>>                          type of 'rte_crypto_asym_xform_type
>>>> rte_cryptodev_asymmetric_xform_capability::xform_type' changed:
>>>>                            type size hasn't changed
>>>>                            2 enumerator insertions:
>>>>
>>>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECDSA' value '7'
>>>>
>>>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECPM' value '8'
>>>>                            1 enumerator change:
>>>>
>>>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END'
>>>> from
>>>> value '7' to '9' at rte_crypto_asym.h:60:1
>>>>
>>>
>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>
>> Independent from being experiment and not, this shouldn't be a problem, I think
>> this is a false positive.
>>
>> The ABI break can happen when a struct has been shared between the application
>> and the library (DPDK) and the layout of that memory know differently by
>> application and the library.
>>
>> Here in all cases, there is no layout/size change.
>>
>> As to the value changes of the enums, since application compiled with old DPDK,
>> it will know only up to '6', 7 and more means invalid to the application. So it
>> won't send these values also it should ignore these values from library. Only
>> consequence is old application won't able to use new features those new enums
>> provide but that is expected/normal.
> 
> If library give higher value than expected by the application,
> if the application uses this value as array index,
> there can be an access out of bounds.

First, this concern is not an ABI break concern; in any case, the application
should ignore any value bigger than the MAX value it knows.
Otherwise this would mean we can't add any new enum or define to the project,
which I believe is wrong.
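
As a sketch of what "ignore any value bigger than the MAX value it knows" could look like on the application side, assuming a hypothetical list of algorithm values handed back by the library (the names below are not DPDK APIs):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values as seen by an application built against the old
 * headers: it knows 1 and 2 only, and its end marker is 3. */
enum app_aead {
	APP_AEAD_AES_CCM = 1,
	APP_AEAD_AES_GCM = 2,
	APP_AEAD_LIST_END = 3
};

/* Counts the entries the application can use. Anything at or above the
 * end marker it was compiled with, or below the first value, is skipped. */
size_t count_usable(const int *algos, size_t n)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++) {
		if (algos[i] < APP_AEAD_AES_CCM ||
		    algos[i] >= APP_AEAD_LIST_END)
			continue; /* unknown to this build: ignore */
		usable++;
	}
	return usable;
}
```

An old application filtering this way keeps working against a newer library; it simply does not see the added algorithm.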

> 
> 
>>>>   [C]'function int
>>>> rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const
>>>> char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
>>>>     parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
>>>>       in pointed to type 'enum rte_crypto_aead_algorithm' at
>>>> rte_crypto_sym.h:346:1:
>>>>         type size hasn't changed
>>>>         1 enumerator insertion:
>>>>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305'
>>>> value '3'
>>>>         1 enumerator change:
>>>>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from
>>>> value '3' to '4' at rte_crypto_sym.h:346:1
>>
>> Same as above, no layout change.
>>
>>>>
>>>>
>>>>   [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at
>>>> rte_crypto_sym.h:358:1:
>>>>     size of symbol (in bytes) changed from 24 to 32
>>>>
>>
>> The shared memory size changes, but this is global variable in the library, and
>> the values application can request 'RTE_CRYPTO_AEAD_AES_CCM' &
>> 'RTE_CRYPTO_AEAD_AES_GCM' is already there, so there is no backward
>> compatibility issue here.
> 
> For this one, I don't know what is the breakage.
> 
> 
>>> +Fiona and Arek 
>>>
>>> We may need to revert the chacha-poly patches.
>>>
>>
>> I don't see any ABI break in this case, can someone explain if I am missing
>> anything here?
> 
> 
> 
> 
>
  
Ferruh Yigit Jan. 31, 2020, 9:07 a.m. UTC | #12
On 1/30/2020 11:49 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
>> Sent: Thursday, January 30, 2020 4:00 PM
>> To: Anoob Joseph <anoobj@marvell.com>; akhil.goyal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
>> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Richardson, Bruce <bruce.richardson@intel.com>;
>> nhorman@tuxdriver.com; Mcnamara, John <john.mcnamara@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>; Kusztal, ArkadiuszX
>> <arkadiuszx.kusztal@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
>>
>> 30/01/2020 14:06, Trahe, Fiona:
>>> We were unaware the LIST_END change could constitute an ABI breakage, but can see how it affects the array size when picked up.
>>> We're exploring options.
>>>
>>> I agree with Anoob's point that if we don't allow the LIST_END to be modified, then it means no feature can be implemented without ABI
>> breakage.
>>> Anyone  object to removing those LIST_END elements - or have a better suggestion? Would have to be in 20.11 I suppose.
>>
>> Yes, having max value right after the last value is ridiculous,
>> it prevents adding any value.
>> In 20.11, we should remove all these *_END and *_MAX from API enums
>> and replace them with a separate #define with reasonnable maximums.
>>
> 
> I think we'd better avoid public structs that have array of _MAX elems in them.
> 

That should fix it, but we need to wait for 20.11 to make the change, and what
should the new array size be?
  
Ananyev, Konstantin Jan. 31, 2020, 9:37 a.m. UTC | #13
> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yigit@intel.com>
> Sent: Friday, January 31, 2020 9:07 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Anoob Joseph
> <anoobj@marvell.com>; akhil.goyal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> nhorman@tuxdriver.com; Mcnamara, John <john.mcnamara@intel.com>; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> On 1/30/2020 11:49 PM, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> >> Sent: Thursday, January 30, 2020 4:00 PM
> >> To: Anoob Joseph <anoobj@marvell.com>; akhil.goyal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> >> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> >> nhorman@tuxdriver.com; Mcnamara, John <john.mcnamara@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>; Kusztal, ArkadiuszX
> >> <arkadiuszx.kusztal@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> >>
> >> 30/01/2020 14:06, Trahe, Fiona:
> >>> We were unaware the LIST_END change could constitute an ABI breakage, but can see how it affects the array size when picked up.
> >>> We're exploring options.
> >>>
> >>> I agree with Anoob's point that if we don't allow the LIST_END to be modified, then it means no feature can be implemented without
> ABI
> >> breakage.
> >>> Anyone  object to removing those LIST_END elements - or have a better suggestion? Would have to be in 20.11 I suppose.
> >>
> >> Yes, having max value right after the last value is ridiculous,
> >> it prevents adding any value.
> >> In 20.11, we should remove all these *_END and *_MAX from API enums
> >> and replace them with a separate #define with reasonnable maximums.
> >>
> >
> > I think we'd better avoid public structs that have array of _MAX elems in them.
> >
> 
> That should fix, but we need to wait for 20.11 for the change, and what should
> be the new array size?

Make it dynamic whenever possible?
Have input/output arguments provide both a pointer and a size,
or use some predefined value as the terminating element (NULL, -1, etc.)?
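
A minimal sketch of the two options above, for illustration only. Every name
here (lib_get_algos, lib_get_algos_terminated, ALGO_ID_END, enum algo_id) is
hypothetical and not part of DPDK. The point is that neither variant bakes an
array size into the application at compile time.

```c
#include <stddef.h>

/* Hypothetical algorithm ids, not the DPDK enum. */
enum algo_id { ALGO_A = 1, ALGO_B = 2, ALGO_C = 3 };

/* Hypothetical terminator: a value that is never a valid id. */
#define ALGO_ID_END (-1)

static const int supported[] = { ALGO_A, ALGO_B, ALGO_C };
static const int supported_terminated[] = { ALGO_A, ALGO_B, ALGO_C, ALGO_ID_END };

/*
 * Option 1: the caller passes a buffer and its capacity.
 * At most 'cap' ids are copied; the return value is the total number of
 * supported ids, so the caller can detect truncation and retry.
 */
size_t lib_get_algos(int *out, size_t cap)
{
    size_t total = sizeof(supported) / sizeof(supported[0]);
    size_t n = total < cap ? total : cap;

    for (size_t i = 0; i < n; i++)
        out[i] = supported[i];
    return total;
}

/* Option 2: the library returns a list closed by a terminating element. */
const int *lib_get_algos_terminated(void)
{
    return supported_terminated;
}

/* Application side: walk the list until the terminator is found. */
size_t count_terminated(const int *list)
{
    size_t n = 0;

    while (list[n] != ALGO_ID_END)
        n++;
    return n;
}
```

With either variant the library can grow its list in a later release and an
application built earlier keeps working, because the length is discovered at
run time.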
  
Ananyev, Konstantin Jan. 31, 2020, 10:26 a.m. UTC | #14
> 
> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > 30/01/2020 17:09, Ferruh Yigit:
> >> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>
> >>>
> >>>>
> >>>> On Wed, Jan 29, 2020 at 7:10 PM Anoob Joseph <anoobj@marvell.com> wrote:
> >>>>> The asymmetric crypto library is experimental. Changes to experimental code
> >>>> paths is allowed, right?
> >>>>
> >>>> The asymmetric crypto enum is referenced by a function part of the stable ABI.
> >>>> It is possible to waive this enum, if we are sure no use out of the
> >>>> experimental asym crypto APIs is possible.
> >>>>
> >>>> The rest of the changes touch stable symbols.
> >>>>
> >>>> Adding the abidiff report:
> >>>>
> >>>>   [C]'function void rte_cryptodev_info_get(uint8_t,
> >>>> rte_cryptodev_info*)' at rte_cryptodev.c:1115:1 has some indirect
> >>>> sub-type changes:
> >>>>     parameter 2 of type 'rte_cryptodev_info*' has sub-type changes:
> >>>>       in pointed to type 'struct rte_cryptodev_info' at rte_cryptodev.h:468:1:
> >>>>         type size hasn't changed
> >>>>         1 data member change:
> >>>>          type of 'const rte_cryptodev_capabilities*
> >>>> rte_cryptodev_info::capabilities' changed:
> >>>>            in pointed to type 'const rte_cryptodev_capabilities':
> >>>>              in unqualified underlying type 'struct
> >>>> rte_cryptodev_capabilities' at rte_cryptodev.h:176:1:
> >>>>                type size hasn't changed
> >>>>                1 data member change:
> >>>>                 type of '__anonymous_union__ ' changed:
> >>>>                   type size hasn't changed
> >>>>                   1 data member change:
> >>>>                    type of 'rte_cryptodev_asymmetric_capability
> >>>> __anonymous_union__::asym' changed:
> >>>>                      type size hasn't changed
> >>>>                      1 data member change:
> >>>>                       type of
> >>>> 'rte_cryptodev_asymmetric_xform_capability
> >>>> rte_cryptodev_asymmetric_capability::xform_capa' changed:
> >>>>                         type size hasn't changed
> >>>>                         1 data member change:
> >>>>                          type of 'rte_crypto_asym_xform_type
> >>>> rte_cryptodev_asymmetric_xform_capability::xform_type' changed:
> >>>>                            type size hasn't changed
> >>>>                            2 enumerator insertions:
> >>>>
> >>>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECDSA' value '7'
> >>>>
> >>>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_ECPM' value '8'
> >>>>                            1 enumerator change:
> >>>>
> >>>> 'rte_crypto_asym_xform_type::RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END'
> >>>> from
> >>>> value '7' to '9' at rte_crypto_asym.h:60:1
> >>>>
> >>>
> >>> I believe these enums will be used only in case of ASYM case which is experimental.
> >>
> >> Independent from being experiment and not, this shouldn't be a problem, I think
> >> this is a false positive.
> >>
> >> The ABI break can happen when a struct has been shared between the application
> >> and the library (DPDK) and the layout of that memory know differently by
> >> application and the library.
> >>
> >> Here in all cases, there is no layout/size change.
> >>
> >> As to the value changes of the enums, since application compiled with old DPDK,
> >> it will know only up to '6', 7 and more means invalid to the application. So it
> >> won't send these values also it should ignore these values from library. Only
> >> consequence is old application won't able to use new features those new enums
> >> provide but that is expected/normal.
> >
> > If library give higher value than expected by the application,
> > if the application uses this value as array index,
> > there can be an access out of bounds.
> 
> First this concern is not an ABI break concern, but application should ignore
> any value bigger than the MAX value it knows.
> Otherwise this would mean we can't add any new enum or define to the project,
> which is wrong I believe.
> 
> >
> >
> >>>>   [C]'function int
> >>>> rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const
> >>>> char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
> >>>>     parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
> >>>>       in pointed to type 'enum rte_crypto_aead_algorithm' at
> >>>> rte_crypto_sym.h:346:1:
> >>>>         type size hasn't changed
> >>>>         1 enumerator insertion:
> >>>>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305'
> >>>> value '3'
> >>>>         1 enumerator change:
> >>>>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from
> >>>> value '3' to '4' at rte_crypto_sym.h:346:1
> >>
> >> Same as above, no layout change.
> >>
> >>>>
> >>>>
> >>>>   [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at
> >>>> rte_crypto_sym.h:358:1:
> >>>>     size of symbol (in bytes) changed from 24 to 32
> >>>>
> >>
> >> The shared memory size changes, but this is global variable in the library, and
> >> the values application can request 'RTE_CRYPTO_AEAD_AES_CCM' &
> >> 'RTE_CRYPTO_AEAD_AES_GCM' is already there, so there is no backward
> >> compatibility issue here.
> >
> > For this one, I don't know what is the breakage.

Reading through this report, I also don't see why it is considered an ABI breakage.
Yes, the size of rte_crypto_aead_algorithm_strings[] has changed, but this array is not a public one.
Also, I don't see any place where we use RTE_CRYPTO_AEAD_LIST_END to define an array size
in our public API.
At first glance it looks like a false positive to me.
Am I missing something obvious here?
Konstantin

> >
> >
> >>> +Fiona and Arek
> >>>
> >>> We may need to revert the chacha-poly patches.
> >>>
> >>
> >> I don't see any ABI break in this case, can someone explain if I am missing
> >> anything here?
> >
> >
> >
> >
> >
  
Fiona Trahe Jan. 31, 2020, 2:16 p.m. UTC | #15
> > > I believe these enums will be used only in case of ASYM case which is experimental.
> >
> > Independent from being experiment and not, this shouldn't be a problem, I think
> > this is a false positive.
> >
> > The ABI break can happen when a struct has been shared between the application
> > and the library (DPDK) and the layout of that memory know differently by
> > application and the library.
> >
> > Here in all cases, there is no layout/size change.
> >
> > As to the value changes of the enums, since application compiled with old DPDK,
> > it will know only up to '6', 7 and more means invalid to the application. So it
> > won't send these values also it should ignore these values from library. Only
> > consequence is old application won't able to use new features those new enums
> > provide but that is expected/normal.
> 
> If library give higher value than expected by the application,
> if the application uses this value as array index,
> there can be an access out of bounds.
[Fiona] All asymmetric APIs are experimental, so the above shouldn't be a problem.
But for the same issue with sym crypto below, I believe Ferruh's explanation makes
sense and I don't see how there can be an ABI breakage.
If an application hasn't been compiled against the new lib, it will still be using the old value,
which will be within bounds. If it's picking up the higher new value from the lib, it must
have been compiled against the lib, so it shouldn't have problems.
There are also no structs on the API which contain arrays using this for sizing, so I don't see an
opportunity for an application to have a mismatch in memory addresses.

 

> > >>   [C]'function int
> > >> rte_cryptodev_get_aead_algo_enum(rte_crypto_aead_algorithm*, const
> > >> char*)' at rte_cryptodev.c:239:1 has some indirect sub-type changes:
> > >>     parameter 1 of type 'rte_crypto_aead_algorithm*' has sub-type changes:
> > >>       in pointed to type 'enum rte_crypto_aead_algorithm' at
> > >> rte_crypto_sym.h:346:1:
> > >>         type size hasn't changed
> > >>         1 enumerator insertion:
> > >>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_CHACHA20_POLY1305'
> > >> value '3'
> > >>         1 enumerator change:
> > >>           'rte_crypto_aead_algorithm::RTE_CRYPTO_AEAD_LIST_END' from
> > >> value '3' to '4' at rte_crypto_sym.h:346:1
> >
> > Same as above, no layout change.
> >
> > >>
> > >>
> > >>   [C]'const char* rte_crypto_aead_algorithm_strings[1]' was changed at
> > >> rte_crypto_sym.h:358:1:
> > >>     size of symbol (in bytes) changed from 24 to 32
> > >>
> >
> > The shared memory size changes, but this is global variable in the library, and
> > the values application can request 'RTE_CRYPTO_AEAD_AES_CCM' &
> > 'RTE_CRYPTO_AEAD_AES_GCM' is already there, so there is no backward
> > compatibility issue here.
> 
> For this one, I don't know what is the breakage.
> 
> 
> > > +Fiona and Arek
> > >
> > > We may need to revert the chacha-poly patches.
> > >
> >
> > I don't see any ABI break in this case, can someone explain if I am missing
> > anything here?
> 
> 
> 
>
  
Thomas Monjalon Feb. 2, 2020, 1:05 p.m. UTC | #16
31/01/2020 15:16, Trahe, Fiona:
> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > 30/01/2020 17:09, Ferruh Yigit:
> > > On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > > >  
> > > > I believe these enums will be used only in case of ASYM case which is experimental.
> > >
> > > Independent from being experiment and not, this shouldn't be a problem, I think
> > > this is a false positive.
> > >
> > > The ABI break can happen when a struct has been shared between the application
> > > and the library (DPDK) and the layout of that memory know differently by
> > > application and the library.
> > >
> > > Here in all cases, there is no layout/size change.
> > >
> > > As to the value changes of the enums, since application compiled with old DPDK,
> > > it will know only up to '6', 7 and more means invalid to the application. So it
> > > won't send these values also it should ignore these values from library. Only
> > > consequence is old application won't able to use new features those new enums
> > > provide but that is expected/normal.
> > 
> > If library give higher value than expected by the application,
> > if the application uses this value as array index,
> > there can be an access out of bounds.
> 
> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> sense and I don't see how there can be an API breakage.
> So if an application hasn't compiled against the new lib it will be still using the old value 
> which will be within bounds. If it's picking up the higher new value from the lib it must
> have been compiled against the lib so shouldn't have problems.

You say there is no ABI issue because the application will be re-compiled
for the updated library. Indeed, compilation fixes compatibility issues.
But this is not relevant for ABI compatibility.
ABI compatibility means we can upgrade the library without recompiling
the application and it must work.
You think it is a false positive because you assume the application
"picks" the new value. I think you are missing the case where the new value
is returned by a function in the upgraded library.

> There are also no structs on the API which contain arrays using this
> for sizing, so I don't see an opportunity for an appl to have a
> mismatch in memory addresses.

Let me demonstrate where the API may "use" the new value
RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.

Once upon a time, a DPDK application was counting the number of devices
supporting each AEAD algo (in order to find the best supported algo).
This is done in an array indexed by algo id:
	int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
The application is compiled with DPDK 19.11,
where RTE_CRYPTO_AEAD_LIST_END = 3.
So the size of the application array aead_dev_count is 3.
This binary is run with DPDK 20.02,
where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
When calling rte_cryptodev_info_get() on a device QAT_GEN3,
rte_cryptodev_info.capabilities.sym.aead.algo is set to
RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
The application uses this value:
	++ aead_dev_count[info.capabilities.sym.aead.algo];
The application crashes because of the out-of-bounds access.
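
The scenario above can be reproduced in a small self-contained sketch without
actually performing the bad access. The enum below is a stand-in for the
values the application saw at build time (LIST_END = 3, as in the abidiff
report), and NEW_AEAD_CHACHA20_POLY1305 is a stand-in for the value the
upgraded library can report. All identifiers are hypothetical and are not the
DPDK definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* What the application was compiled against (stand-in for the old header). */
enum old_aead_algo {
    OLD_AEAD_AES_CCM = 1,
    OLD_AEAD_AES_GCM = 2,
    OLD_AEAD_LIST_END = 3
};

/* What the upgraded library may hand back (stand-in for the new entry). */
#define NEW_AEAD_CHACHA20_POLY1305 3

/* Sized when the application was built: 3 slots, valid indexes 0..2. */
static int aead_dev_count[OLD_AEAD_LIST_END];

/* True when 'algo' is a valid index into the application's array. */
bool index_fits_old_array(int algo)
{
    size_t slots = sizeof(aead_dev_count) / sizeof(aead_dev_count[0]);

    return algo >= 0 && (size_t)algo < slots;
}
```

An id known at build time fits the array, while the new id equals the old
LIST_END and therefore points one element past the end. Indexing with it is
undefined behaviour in C, so a crash is one possible outcome.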
  
Ananyev, Konstantin Feb. 2, 2020, 2:41 p.m. UTC | #17
> 31/01/2020 15:16, Trahe, Fiona:
> > On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > > 30/01/2020 17:09, Ferruh Yigit:
> > > > On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > > > >
> > > > > I believe these enums will be used only in case of ASYM case which is experimental.
> > > >
> > > > Independent from being experiment and not, this shouldn't be a problem, I think
> > > > this is a false positive.
> > > >
> > > > The ABI break can happen when a struct has been shared between the application
> > > > and the library (DPDK) and the layout of that memory know differently by
> > > > application and the library.
> > > >
> > > > Here in all cases, there is no layout/size change.
> > > >
> > > > As to the value changes of the enums, since application compiled with old DPDK,
> > > > it will know only up to '6', 7 and more means invalid to the application. So it
> > > > won't send these values also it should ignore these values from library. Only
> > > > consequence is old application won't able to use new features those new enums
> > > > provide but that is expected/normal.
> > >
> > > If library give higher value than expected by the application,
> > > if the application uses this value as array index,
> > > there can be an access out of bounds.
> >
> > [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> > But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> > sense and I don't see how there can be an API breakage.
> > So if an application hasn't compiled against the new lib it will be still using the old value
> > which will be within bounds. If it's picking up the higher new value from the lib it must
> > have been compiled against the lib so shouldn't have problems.
> 
> You say there is no ABI issue because the application will be re-compiled
> for the updated library. Indeed, compilation fixes compatibility issues.
> But this is not relevant for ABI compatibility.
> ABI compatibility means we can upgrade the library without recompiling
> the application and it must work.
> You think it is a false positive because you assume the application
> "picks" the new value. I think you miss the case where the new value
> is returned by a function in the upgraded library.
> 
> > There are also no structs on the API which contain arrays using this
> > for sizing, so I don't see an opportunity for an appl to have a
> > mismatch in memory addresses.
> 
> Let me demonstrate where the API may "use" the new value
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> 
> Once upon a time a DPDK application counting the number of devices
> supporting each AEAD algo (in order to find the best supported algo).
> It is done in an array indexed by algo id:
> 	int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> The application is compiled with DPDK 19.11,
> where RTE_CRYPTO_AEAD_LIST_END = 3.
> So the size of the application array aead_dev_count is 3.
> This binary is run with DPDK 20.02,
> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> The application uses this value:
> 	++ aead_dev_count[info.capabilities.sym.aead.algo];
> The application is crashing because of out of bound access.

I'd say this is an example of a badly written app.
It should probably check that the value returned by the library doesn't
exceed its internal array size.
Konstantin
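
A minimal sketch of such a check on the application side. APP_AEAD_SLOTS,
count_aead_dev and unknown_algo_count are hypothetical names; APP_AEAD_SLOTS
stands for whatever LIST_END was when the application was built.

```c
#include <stddef.h>

/* Hypothetical: the value of LIST_END at application build time. */
#define APP_AEAD_SLOTS 3

static int aead_dev_count[APP_AEAD_SLOTS];
static int unknown_algo_count;

/*
 * Count one device for 'algo'.
 * Ids the application was not built to know are tallied separately
 * instead of being used as an array index.
 * Returns 0 when counted, -1 when the id was out of range.
 */
int count_aead_dev(int algo)
{
    if (algo < 0 || (size_t)algo >= APP_AEAD_SLOTS) {
        unknown_algo_count++;
        return -1;
    }
    aead_dev_count[algo]++;
    return 0;
}
```

An old binary written this way simply cannot use the new algorithm after the
library upgrade, which is the expected outcome described earlier in the
thread.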
  
Ferruh Yigit Feb. 3, 2020, 9:30 a.m. UTC | #18
On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> 
>> 31/01/2020 15:16, Trahe, Fiona:
>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>> 30/01/2020 17:09, Ferruh Yigit:
>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>>>>
>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>>>>
>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
>>>>> this is a false positive.
>>>>>
>>>>> The ABI break can happen when a struct has been shared between the application
>>>>> and the library (DPDK) and the layout of that memory know differently by
>>>>> application and the library.
>>>>>
>>>>> Here in all cases, there is no layout/size change.
>>>>>
>>>>> As to the value changes of the enums, since application compiled with old DPDK,
>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
>>>>> won't send these values also it should ignore these values from library. Only
>>>>> consequence is old application won't able to use new features those new enums
>>>>> provide but that is expected/normal.
>>>>
>>>> If library give higher value than expected by the application,
>>>> if the application uses this value as array index,
>>>> there can be an access out of bounds.
>>>
>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>> sense and I don't see how there can be an API breakage.
>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>> have been compiled against the lib so shouldn't have problems.
>>
>> You say there is no ABI issue because the application will be re-compiled
>> for the updated library. Indeed, compilation fixes compatibility issues.
>> But this is not relevant for ABI compatibility.
>> ABI compatibility means we can upgrade the library without recompiling
>> the application and it must work.
>> You think it is a false positive because you assume the application
>> "picks" the new value. I think you miss the case where the new value
>> is returned by a function in the upgraded library.
>>
>>> There are also no structs on the API which contain arrays using this
>>> for sizing, so I don't see an opportunity for an appl to have a
>>> mismatch in memory addresses.
>>
>> Let me demonstrate where the API may "use" the new value
>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>
>> Once upon a time a DPDK application counting the number of devices
>> supporting each AEAD algo (in order to find the best supported algo).
>> It is done in an array indexed by algo id:
>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>> The application is compiled with DPDK 19.11,
>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>> So the size of the application array aead_dev_count is 3.
>> This binary is run with DPDK 20.02,
>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>> The application uses this value:
>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>> The application is crashing because of out of bound access.
> 
> I'd say this is an example of bad written app.
> It probably should check that returned by library value doesn't
> exceed its internal array size.

+1

The application should ignore values >= MAX.

Are you suggesting we don't extend any enum or define between ABI-breaking releases,
to be sure badly written applications are not affected?
  
Neil Horman Feb. 3, 2020, 11:50 a.m. UTC | #19
On Mon, Feb 03, 2020 at 09:30:06AM +0000, Ferruh Yigit wrote:
> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > 
> >> 31/01/2020 15:16, Trahe, Fiona:
> >>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>> 30/01/2020 17:09, Ferruh Yigit:
> >>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>>>>
> >>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
> >>>>>
> >>>>> Independent from being experiment and not, this shouldn't be a problem, I think
> >>>>> this is a false positive.
> >>>>>
> >>>>> The ABI break can happen when a struct has been shared between the application
> >>>>> and the library (DPDK) and the layout of that memory know differently by
> >>>>> application and the library.
> >>>>>
> >>>>> Here in all cases, there is no layout/size change.
> >>>>>
> >>>>> As to the value changes of the enums, since application compiled with old DPDK,
> >>>>> it will know only up to '6', 7 and more means invalid to the application. So it
> >>>>> won't send these values also it should ignore these values from library. Only
> >>>>> consequence is old application won't able to use new features those new enums
> >>>>> provide but that is expected/normal.
> >>>>
> >>>> If library give higher value than expected by the application,
> >>>> if the application uses this value as array index,
> >>>> there can be an access out of bounds.
> >>>
> >>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>> sense and I don't see how there can be an API breakage.
> >>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>> have been compiled against the lib so shouldn't have problems.
> >>
> >> You say there is no ABI issue because the application will be re-compiled
> >> for the updated library. Indeed, compilation fixes compatibility issues.
> >> But this is not relevant for ABI compatibility.
> >> ABI compatibility means we can upgrade the library without recompiling
> >> the application and it must work.
> >> You think it is a false positive because you assume the application
> >> "picks" the new value. I think you miss the case where the new value
> >> is returned by a function in the upgraded library.
> >>
> >>> There are also no structs on the API which contain arrays using this
> >>> for sizing, so I don't see an opportunity for an appl to have a
> >>> mismatch in memory addresses.
> >>
> >> Let me demonstrate where the API may "use" the new value
> >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>
> >> Once upon a time a DPDK application counting the number of devices
> >> supporting each AEAD algo (in order to find the best supported algo).
> >> It is done in an array indexed by algo id:
> >> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >> The application is compiled with DPDK 19.11,
> >> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >> So the size of the application array aead_dev_count is 3.
> >> This binary is run with DPDK 20.02,
> >> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >> The application uses this value:
> >> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >> The application is crashing because of out of bound access.
> > 
> > I'd say this is an example of bad written app.
> > It probably should check that returned by library value doesn't
> > exceed its internal array size.
> 
> +1
> 
> Application should ignore values >= MAX.
> 
The example is still somewhat valid in its general principle though.  While
extending an enumeration may be flagged by libabigail as an ABI breakage, it's
not necessarily a false positive.  By extending the enumeration, all the
previous entries in an array defined by said enumeration remain constant in
their offsets, so you can 'get away with such a change' in terms of preserving
backwards compatibility in the above example, but you cannot, for example,
shuffle the values in the enumeration, as doing so would cause a functional
breakage (i.e. requesting an instance of RTE_CRYPTO_AEAD_CHACHA20_POLY1305 might
instead give you an instance of RTE_CRYPTO_AEAD_AES_GCM).

These sorts of changes are the type that we could collectively waive in terms of
ABI checking, as they should be ok, but the errors from libabigail should be
taken as an indicator that this API could be rewritten (for example by removing
the abi entirely, and adding an API call that returns an array of instance names
and ids), so that changes of the above sort aren't required.


> Do you suggest we don't extend any enum or define between ABI breakage releases
> to be sure bad written applications not affected?
> 
As noted above, we could waive such corner cases, and probably be fine, but the
error from the ABI check still serves a valid purpose in that it's an indicator
that your library API is ABI-sensitive to code changes, which re-architecting may
address.

Neil
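
One way to picture the rewrite suggested above is a call that returns a table
of names and ids, with the application resolving ids by name at run time. This
is only a sketch under assumptions: struct algo_desc, lib_list_algos and
app_find_algo are hypothetical, and the names and ids in the table are
illustrative rather than taken from DPDK.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor returned by the library. */
struct algo_desc {
    const char *name;
    int id;
};

/* Library-private table; it can grow in any release. */
static const struct algo_desc algo_table[] = {
    { "aes-ccm", 1 },
    { "aes-gcm", 2 },
    { "chacha20-poly1305", 3 },
};

/* Library side: hand out the table and its length. */
const struct algo_desc *lib_list_algos(size_t *count)
{
    *count = sizeof(algo_table) / sizeof(algo_table[0]);
    return algo_table;
}

/* Application side: resolve an id by name, or -1 if not supported. */
int app_find_algo(const char *name)
{
    size_t n;
    const struct algo_desc *table = lib_list_algos(&n);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].id;
    }
    return -1;
}
```

Because the application never compiles in a LIST_END value, adding an entry to
the table does not change anything the old binary depends on.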
  
Ferruh Yigit Feb. 3, 2020, 1:09 p.m. UTC | #20
On 2/3/2020 11:50 AM, Neil Horman wrote:
> On Mon, Feb 03, 2020 at 09:30:06AM +0000, Ferruh Yigit wrote:
>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>
>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>> 30/01/2020 17:09, Ferruh Yigit:
>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>>>>>>
>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>>>>>>
>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
>>>>>>> this is a false positive.
>>>>>>>
>>>>>>> The ABI break can happen when a struct has been shared between the application
>>>>>>> and the library (DPDK) and the layout of that memory know differently by
>>>>>>> application and the library.
>>>>>>>
>>>>>>> Here in all cases, there is no layout/size change.
>>>>>>>
>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
>>>>>>> won't send these values also it should ignore these values from library. Only
>>>>>>> consequence is old application won't able to use new features those new enums
>>>>>>> provide but that is expected/normal.
>>>>>>
>>>>>> If library give higher value than expected by the application,
>>>>>> if the application uses this value as array index,
>>>>>> there can be an access out of bounds.
>>>>>
>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>> sense and I don't see how there can be an API breakage.
>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>> have been compiled against the lib so shouldn't have problems.
>>>>
>>>> You say there is no ABI issue because the application will be re-compiled
>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>> But this is not relevant for ABI compatibility.
>>>> ABI compatibility means we can upgrade the library without recompiling
>>>> the application and it must work.
>>>> You think it is a false positive because you assume the application
>>>> "picks" the new value. I think you miss the case where the new value
>>>> is returned by a function in the upgraded library.
>>>>
>>>>> There are also no structs on the API which contain arrays using this
>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>> mismatch in memory addresses.
>>>>
>>>> Let me demonstrate where the API may "use" the new value
>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>
>>>> Once upon a time a DPDK application counting the number of devices
>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>> It is done in an array indexed by algo id:
>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>> The application is compiled with DPDK 19.11,
>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>> So the size of the application array aead_dev_count is 3.
>>>> This binary is run with DPDK 20.02,
>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>> The application uses this value:
>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>> The application is crashing because of out of bound access.
>>>
>>> I'd say this is an example of bad written app.
>>> It probably should check that returned by library value doesn't
>>> exceed its internal array size.
>>
>> +1
>>
>> Application should ignore values >= MAX.
>>
> The example is still somewhat valid in it general principle though.  While
> extending an ennumeration may be flagged by libabigail as an ABI breakage, its
> not necessecarily a false positive.  By extending the ennumeration, all the
> previous entries in an array defined by said ennumeration remain constant in
> their offsets, so you can 'get away with such a change' in terms of preserving
> backwards compatibility in the above example, but you cannot, for example,
> shuffle the values in the ennumeration, as doing so would cause a functional
> breakage (i.e. requesting an instance of RTE_CRYPTO_AEAD_CHACHA20_POLY1305 might
> instead give you an instance of RTE_CRYPTO_AEAD_AES_GCM.  

+1, changing or shuffling the existing values is problematic, but that is not
the case here.

> 
> These sorts of changes are the type that we could collectively waive in terms of
> ABI checking, as they should be ok, but the errors from libabigail should be
> taken as an indicator that this API could be rewritten (for example by removing
> the abi entirely, and adding an API call that returns an array of instance name
> and ids), so that changes of the above sort arent required.

We can spend more time on it, but for now I can't see how to avoid returning an
enumeration as an indication of type, and this looks like legitimate usage as
long as the other side verifies that the received value is within the type's
range.
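
A minimal sketch of that receiving-side check, assuming an application built
against the older headers. The APP_* names are hypothetical stand-ins that
mirror the values discussed in this thread (LIST_END = 3 at build time); they
are not the real DPDK identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum as seen by an application built against the OLD headers. */
enum app_aead_algo {
	APP_AEAD_AES_CCM = 1,
	APP_AEAD_AES_GCM,
	APP_AEAD_LIST_END /* == 3 when the application was compiled */
};

/*
 * Count one device for the given algorithm id.
 * counts must have APP_AEAD_LIST_END entries.
 * Returns 0 on success, -1 if the id is outside the range the
 * application knew about at build time (for example an id added
 * by a newer library).
 */
static int
count_aead_dev(int *counts, int algo)
{
	assert(counts != NULL);
	if (algo < 0 || algo >= APP_AEAD_LIST_END)
		return -1; /* unknown value: ignore instead of indexing */
	counts[algo]++;
	return 0;
}
```

With this guard, an id of 3 coming from a newer library is skipped rather than
written past the end of the array.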

> 
> 
>> Do you suggest we don't extend any enum or define between ABI breakage releases
>> to be sure bad written applications not affected?
>>
> As noted above, we could waive such corner cases, and probably be fine, but the
> error from the ABI check still serves a valid purpose in that its an indicator
> that your library API is ABI sensitive to code changes that re-architecture may
> address
> 

The concern is that when there are cases we can waive, we can't rely directly on
the tool and automate it. These indicators are good for improving the code, but
not good to use as a build-time checker.
Is there any way to reduce the failures to definite ABI breakages only?
  
Dodji Seketeli Feb. 3, 2020, 2 p.m. UTC | #21
Hello,

Ferruh Yigit <ferruh.yigit@intel.com> a écrit:

[...]

> +1 the change/shuffle of the existing values are problematic, but we don't have
> it in this case.

Right.

[...]

> The concern is when there are cases we can waive, we can't directly rely on the
> tool and automate it. These indicators good for improving the code, but not good
> to use it as build time checker.

Well, it depends.  The tooling as it is today has the capability to
automatically "waive" some classes of A{B,P}I change reports that you
guys (the developers) deem harmless, in the context of your project.

For instance, in the precise case of interest here, one could define a
"suppression specification" to teach the ABI verifier that, for the enum
rte_crypto_asym_xform_type, the only enumerator whose numerical value is
allowed to change is the one named RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END.

The content of the suppression specification file would look like:

    [suppress_type]
      # So, in practice, this rule allows adding enumerators
      # only to the end of the rte_crypto_asym_xform_type enum,
      # right before the RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
      # enumerator which is meant to always be the last enumerator.
      type_kind = enum
      name = rte_crypto_asym_xform_type
      changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END

This way, you can hopefully reduce the surface of the changes you want
to see reported, tailored in a way that is specific to your project.
This should hopefully bring the system closer to a state that would
allow you guys to have something that is automated enough to
be triggered at build time.

And if there is some sensibly needed tweaking that the libabigail
tooling doesn't allow you guys to do right away, I'd be happy to hear
about it and try to add the functionality to the framework for you guys.

> Is there any way to reduce the failure only to definite ABI breakages?

I hope my comment above somewhat answers this question of yours.  If it
does not, please tell me.

Cheers,
  
Ferruh Yigit Feb. 3, 2020, 2:46 p.m. UTC | #22
On 2/3/2020 2:00 PM, Dodji Seketeli wrote:
> Hello,
> 
> Ferruh Yigit <ferruh.yigit@intel.com> a écrit:
> 
> [...]
> 
>> +1 the change/shuffle of the existing values are problematic, but we don't have
>> it in this case.
> 
> Right.
> 
> [...]
> 
>> The concern is when there are cases we can waive, we can't directly rely on the
>> tool and automate it. These indicators good for improving the code, but not good
>> to use it as build time checker.
> 
> Well, it depends.  The tooling as it is today have the capability to
> automatically "waive" some classes of A{B,P}I change reports that you
> guys (the developers) deem harmless, in the context of your project.
> 
> For instance, in the precise case of interest here, one could define a
> "suppression specification" to teach the ABI verifier that, for the enum
> rte_crypto_asym_xform_type, the only enumerator which numerical value is
> allowed to change is the one named RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END.
> 
> The content of the suppression specification file would look like:
> 
>     [suppress_type]
>       # So, in practise, this rule is to allow adding enumerators
>       # only to the of the the rte_crypto_asym_xform_type enum,
>       # right before the RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
>       # enumerator which is meant to always be the last enumerator.
>       type_kind = enum
>       name = rte_crypto_asym_xform_type
>       changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> 
> This way, you can hopefully reduce the surface of the changes you want
> to see reported, tailored in a way that is specific to your project.
> This should hopefully bring the system closer to a state that would
> allow you guys to having something that is automated enough to have it
> be triggered at build time.

Thanks, at least this provides a way to silence the warnings that are not an
issue for us, as we hit them.

Is there a more global, "don't warn on new enums" kind of option?
Although I assume this is not possible, since the _END or _MAX enum value will
be changing, and the tool can't know its usage and will report the change.

> 
> And if there is some sensibly needed tweaking that the libabigail
> tooling doesn't allow you guys to do right away, I'd be happy to hear
> about it and try to add the functionality to the framework for you guys.
> 
>> Is there any way to reduce the failure only to definite ABI breakages?
> 
> I hope my comment above somewhat answers this question of yours.  If it
> does not, please tell me.
> 
> Cheers,
>
  
Fiona Trahe Feb. 3, 2020, 3:08 p.m. UTC | #23
> >
> > These sorts of changes are the type that we could collectively waive in terms of
> > ABI checking, as they should be ok, but the errors from libabigail should be
> > taken as an indicator that this API could be rewritten (for example by removing
> > the abi entirely, and adding an API call that returns an array of instance name
> > and ids), so that changes of the above sort arent required.
> 
> We can spend more time on it, but I can't see for now how to escape returning
> enumaration as indication of type, and this looks legitimate sage as long as
> other side verifies the received value is valid in the type range.
[Fiona] Regarding rework to make the original code more robust to ABI breakage:
One option would be to remove LIST_END from the enum, but keep the enum and allow appending values to it.
Instead of LIST_END, have a static var keeping track of MAX_NUM_AEAD_ALGOS
and an API call rte_cryptodev_get_max_aead_algos(), forcing an application to dynamically size any array to accommodate any new values.
This API is safer, I think, but there are other pitfalls with this approach: the MAX can more easily get out of sync with the enum.
And the application still needs to safely handle values it doesn't recognise.
Does anyone think this is a better way?

I still think the best solution is to suppress changes to the LIST_END element.
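
For comparison, a rough sketch of the dynamic-sizing alternative described
above. lib_get_max_aead_algos() is a hypothetical stand-in for the proposed
rte_cryptodev_get_max_aead_algos(), which is not an existing DPDK call; the
value it returns here is an assumption for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed query; a newer library
 * would simply return a larger number. */
static unsigned int
lib_get_max_aead_algos(void)
{
	return 4; /* assumed: ids 0..3 are possible in this library version */
}

struct aead_counters {
	unsigned int n; /* number of slots, taken from the library at run time */
	int *count;     /* one counter per possible algorithm id */
};

/* Size the array from the running library, not from a compile-time enum. */
static int
aead_counters_init(struct aead_counters *c)
{
	assert(c != NULL);
	c->n = lib_get_max_aead_algos();
	c->count = calloc(c->n, sizeof(*c->count));
	return c->count == NULL ? -1 : 0;
}

/* Still range-checked: the application must handle ids it cannot store. */
static int
aead_counters_add(struct aead_counters *c, unsigned int algo)
{
	assert(c != NULL && c->count != NULL);
	if (algo >= c->n)
		return -1;
	c->count[algo]++;
	return 0;
}

static void
aead_counters_free(struct aead_counters *c)
{
	free(c->count);
	c->count = NULL;
	c->n = 0;
}
```

The array can no longer be too small for the running library, but the
application still has to decide what to do with an id it has no name for.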
  
Thomas Monjalon Feb. 3, 2020, 5:09 p.m. UTC | #24
03/02/2020 10:30, Ferruh Yigit:
> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > 02/02/2020 14:05, Thomas Monjalon:
> >> 31/01/2020 15:16, Trahe, Fiona:
> >>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>> 30/01/2020 17:09, Ferruh Yigit:
> >>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>>>>
> >>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
> >>>>>
> >>>>> Independent from being experiment and not, this shouldn't be a problem, I think
> >>>>> this is a false positive.
> >>>>>
> >>>>> The ABI break can happen when a struct has been shared between the application
> >>>>> and the library (DPDK) and the layout of that memory know differently by
> >>>>> application and the library.
> >>>>>
> >>>>> Here in all cases, there is no layout/size change.
> >>>>>
> >>>>> As to the value changes of the enums, since application compiled with old DPDK,
> >>>>> it will know only up to '6', 7 and more means invalid to the application. So it
> >>>>> won't send these values also it should ignore these values from library. Only
> >>>>> consequence is old application won't able to use new features those new enums
> >>>>> provide but that is expected/normal.
> >>>>
> >>>> If library give higher value than expected by the application,
> >>>> if the application uses this value as array index,
> >>>> there can be an access out of bounds.
> >>>
> >>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>> sense and I don't see how there can be an API breakage.
> >>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>> have been compiled against the lib so shouldn't have problems.
> >>
> >> You say there is no ABI issue because the application will be re-compiled
> >> for the updated library. Indeed, compilation fixes compatibility issues.
> >> But this is not relevant for ABI compatibility.
> >> ABI compatibility means we can upgrade the library without recompiling
> >> the application and it must work.
> >> You think it is a false positive because you assume the application
> >> "picks" the new value. I think you miss the case where the new value
> >> is returned by a function in the upgraded library.
> >>
> >>> There are also no structs on the API which contain arrays using this
> >>> for sizing, so I don't see an opportunity for an appl to have a
> >>> mismatch in memory addresses.
> >>
> >> Let me demonstrate where the API may "use" the new value
> >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>
> >> Once upon a time a DPDK application counting the number of devices
> >> supporting each AEAD algo (in order to find the best supported algo).
> >> It is done in an array indexed by algo id:
> >> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >> The application is compiled with DPDK 19.11,
> >> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >> So the size of the application array aead_dev_count is 3.
> >> This binary is run with DPDK 20.02,
> >> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >> The application uses this value:
> >> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >> The application is crashing because of out of bound access.
> > 
> > I'd say this is an example of bad written app.
> > It probably should check that returned by library value doesn't
> > exceed its internal array size.
> 
> +1
> 
> Application should ignore values >= MAX.

Of course, blaming the API user is a lot easier than looking at the API.
Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
as the max value for the application.
Value ranges are part of the ABI compatibility contract.
It seems you expect the application developer to be aware that
DPDK could return a higher value, so the application should
check every enum value after calling an API. CRAZY.

When we decide to announce an ABI compatibility and do some marketing,
everyone is OK. But when we need to really make our ABI compatible,
I see little or no effort. DISAPPOINTING.

> Do you suggest we don't extend any enum or define between ABI breakage releases
> to be sure bad written applications not affected?

I suggest we must consider not breaking any assumption made on the API.
Here we are breaking the enum range because nothing mentions that _LIST_END
is not really the absolute end of the enum.
The solution is to make the change below in 20.02 + backport in 19.11.1:

- _LIST_END
+ _LIST_END, /* an ABI-compatible version may increase this value */
+ _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
};

Then *_LIST_END values could be ignored by libabigail with such a change.
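
To illustrate the effect on an application, here is a small sketch under the
assumption that the application sizes its array with _LIST_MAX instead of
_LIST_END. The DEMO_* names and the value 3 for the newly added id are
hypothetical, chosen to mirror the example earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header content at application build time,
 * following the proposed change. */
enum demo_aead_algo {
	DEMO_AEAD_AES_CCM = 1,
	DEMO_AEAD_AES_GCM,
	DEMO_AEAD_LIST_END, /* an ABI-compatible version may increase this value */
	DEMO_AEAD_LIST_MAX = DEMO_AEAD_LIST_END + 42 /* room for ABI-compatible additions */
};

/* Id that a newer, ABI-compatible library is assumed to return. */
#define DEMO_NEW_ALGO_ID 3

/*
 * Count one device for the given algorithm id.
 * counts must have DEMO_AEAD_LIST_MAX entries, so any id the ABI
 * contract allows (0 up to, but excluding, DEMO_AEAD_LIST_MAX) fits.
 */
static int
demo_count_aead_dev(int *counts, int algo)
{
	assert(counts != NULL);
	if (algo < 0 || algo >= DEMO_AEAD_LIST_MAX)
		return -1;
	counts[algo]++;
	return 0;
}
```

An old binary then has a slot for the new id even though its own _LIST_END is
still 3; the cost is the unused slots between _LIST_END and _LIST_MAX.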

If such a patch is not done by tomorrow, I will have to revert
Chacha-Poly commits before 20.02-rc2, because

1/ LIST_END, without any comment, means "size of range"
2/ we do not blame users for undocumented ABI changes
3/ we respect the ABI compatibility contract
  
Thomas Monjalon Feb. 3, 2020, 5:34 p.m. UTC | #25
03/02/2020 18:09, Thomas Monjalon:
> 03/02/2020 10:30, Ferruh Yigit:
> > On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > > 02/02/2020 14:05, Thomas Monjalon:
> > >> 31/01/2020 15:16, Trahe, Fiona:
> > >>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > >>>> If library give higher value than expected by the application,
> > >>>> if the application uses this value as array index,
> > >>>> there can be an access out of bounds.
> > >>>
> > >>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> > >>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> > >>> sense and I don't see how there can be an API breakage.
> > >>> So if an application hasn't compiled against the new lib it will be still using the old value
> > >>> which will be within bounds. If it's picking up the higher new value from the lib it must
> > >>> have been compiled against the lib so shouldn't have problems.
> > >>
> > >> You say there is no ABI issue because the application will be re-compiled
> > >> for the updated library. Indeed, compilation fixes compatibility issues.
> > >> But this is not relevant for ABI compatibility.
> > >> ABI compatibility means we can upgrade the library without recompiling
> > >> the application and it must work.
> > >> You think it is a false positive because you assume the application
> > >> "picks" the new value. I think you miss the case where the new value
> > >> is returned by a function in the upgraded library.
> > >>
> > >>> There are also no structs on the API which contain arrays using this
> > >>> for sizing, so I don't see an opportunity for an appl to have a
> > >>> mismatch in memory addresses.
> > >>
> > >> Let me demonstrate where the API may "use" the new value
> > >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> > >>
> > >> Once upon a time a DPDK application counting the number of devices
> > >> supporting each AEAD algo (in order to find the best supported algo).
> > >> It is done in an array indexed by algo id:
> > >> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > >> The application is compiled with DPDK 19.11,
> > >> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > >> So the size of the application array aead_dev_count is 3.
> > >> This binary is run with DPDK 20.02,
> > >> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > >> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > >> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > >> The application uses this value:
> > >> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > >> The application is crashing because of out of bound access.
> > > 
> > > I'd say this is an example of bad written app.
> > > It probably should check that returned by library value doesn't
> > > exceed its internal array size.
> > 
> > +1
> > 
> > Application should ignore values >= MAX.
> 
> Of course, blaming the API user is a lot easier than looking at the API.
> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> as the max value for the application.
> Value ranges are part of the ABI compatibility contract.
> It seems you expect the application developer to be aware that
> DPDK could return a higher value, so the application should
> check every enum values after calling an API. CRAZY.
> 
> When we decide to announce an ABI compatibility and do some marketing,
> everyone is OK. But when we need to really make our ABI compatible,
> I see little or no effort. DISAPPOINTING.
> 
> > Do you suggest we don't extend any enum or define between ABI breakage releases
> > to be sure bad written applications not affected?
> 
> I suggest we must consider not breaking any assumption made on the API.
> Here we are breaking the enum range because nothing mentions _LIST_END
> is not really the absolute end of the enum.
> The solution is to make the change below in 20.02 + backport in 19.11.1:

On second thought, merging such a change before 20.11 breaks the
ABI assumption based on the 19.11.0 API.
I ask the release maintainers (Luca, Kevin, David and me) and
the ABI maintainers (Neil and Ray) to vote for solution a or b:
	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
	b) wait for 20.11 and revert Chacha-Poly from 20.02


> - _LIST_END
> + _LIST_END, /* an ABI-compatible version may increase this value */
> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> };
> 
> Then *_LIST_END values could be ignored by libabigail with such a change.
> 
> If such a patch is not done by tomorrow, I will have to revert
> Chacha-Poly commits before 20.02-rc2, because
> 
> 1/ LIST_END, without any comment, means "size of range"
> 2/ we do not blame users for undocumented ABI changes
> 3/ we respect the ABI compatibility contract
  
Ferruh Yigit Feb. 3, 2020, 5:40 p.m. UTC | #26
On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> 03/02/2020 10:30, Ferruh Yigit:
>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>> 02/02/2020 14:05, Thomas Monjalon:
>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>> 30/01/2020 17:09, Ferruh Yigit:
>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>>>>>>
>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>>>>>>
>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
>>>>>>> this is a false positive.
>>>>>>>
>>>>>>> The ABI break can happen when a struct has been shared between the application
>>>>>>> and the library (DPDK) and the layout of that memory know differently by
>>>>>>> application and the library.
>>>>>>>
>>>>>>> Here in all cases, there is no layout/size change.
>>>>>>>
>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
>>>>>>> won't send these values also it should ignore these values from library. Only
>>>>>>> consequence is old application won't able to use new features those new enums
>>>>>>> provide but that is expected/normal.
>>>>>>
>>>>>> If library give higher value than expected by the application,
>>>>>> if the application uses this value as array index,
>>>>>> there can be an access out of bounds.
>>>>>
>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>> sense and I don't see how there can be an API breakage.
>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>> have been compiled against the lib so shouldn't have problems.
>>>>
>>>> You say there is no ABI issue because the application will be re-compiled
>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>> But this is not relevant for ABI compatibility.
>>>> ABI compatibility means we can upgrade the library without recompiling
>>>> the application and it must work.
>>>> You think it is a false positive because you assume the application
>>>> "picks" the new value. I think you miss the case where the new value
>>>> is returned by a function in the upgraded library.
>>>>
>>>>> There are also no structs on the API which contain arrays using this
>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>> mismatch in memory addresses.
>>>>
>>>> Let me demonstrate where the API may "use" the new value
>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>
>>>> Once upon a time a DPDK application counting the number of devices
>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>> It is done in an array indexed by algo id:
>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>> The application is compiled with DPDK 19.11,
>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>> So the size of the application array aead_dev_count is 3.
>>>> This binary is run with DPDK 20.02,
>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>> The application uses this value:
>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>> The application is crashing because of out of bound access.
>>>
>>> I'd say this is an example of bad written app.
>>> It probably should check that returned by library value doesn't
>>> exceed its internal array size.
>>
>> +1
>>
>> Application should ignore values >= MAX.
> 
> Of course, blaming the API user is a lot easier than looking at the API.
> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> as the max value for the application.
> Value ranges are part of the ABI compatibility contract.
> It seems you expect the application developer to be aware that
> DPDK could return a higher value, so the application should
> check every enum values after calling an API. CRAZY.
> 
> When we decide to announce an ABI compatibility and do some marketing,
> everyone is OK. But when we need to really make our ABI compatible,
> I see little or no effort. DISAPPOINTING.

This is not to blame the user or to do less work; it is a saner approach for
the library to provide the _END/_MAX value and for the application to use it as
a valid-range check.

> 
>> Do you suggest we don't extend any enum or define between ABI breakage releases
>> to be sure bad written applications not affected?
> 
> I suggest we must consider not breaking any assumption made on the API.
> Here we are breaking the enum range because nothing mentions _LIST_END
> is not really the absolute end of the enum.
> The solution is to make the change below in 20.02 + backport in 19.11.1:
> 
> - _LIST_END
> + _LIST_END, /* an ABI-compatible version may increase this value */
> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> };
> 

What is the point of "_LIST_MAX" here?

The application should know the "_LIST_END" value from when it was compiled, for
the valid range check. The next time it is compiled, "_LIST_END" may have a
different value, but the same logic applies.

When "_LIST_END" is missing, the application can't protect itself. In that case
the library should send only the values the application knew when it was
compiled, which means either we can't extend our enums/defines until the next
ABI breakage, or we need to do ABI versioning of the functions that return an
enum each time the enum is extended.

I believe it is saner to provide _END/_MAX values for the application to use,
and, if required, to comment them to clarify the expected usage.

But in the above suggestion the application can't use or rely on "_LIST_MAX";
it doesn't mean anything to the application.

> Then *_LIST_END values could be ignored by libabigail with such a change.
> 
> If such a patch is not done by tomorrow, I will have to revert
> Chacha-Poly commits before 20.02-rc2, because
> 
> 1/ LIST_END, without any comment, means "size of range"
> 2/ we do not blame users for undocumented ABI changes
> 3/ we respect the ABI compatibility contract
> 
>
  
Thomas Monjalon Feb. 3, 2020, 6:40 p.m. UTC | #27
03/02/2020 18:40, Ferruh Yigit:
> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > 03/02/2020 10:30, Ferruh Yigit:
> >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >>> 02/02/2020 14:05, Thomas Monjalon:
> >>>> 31/01/2020 15:16, Trahe, Fiona:
> >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>>>>>>
> >>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
> >>>>>>>
> >>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
> >>>>>>> this is a false positive.
> >>>>>>>
> >>>>>>> The ABI break can happen when a struct has been shared between the application
> >>>>>>> and the library (DPDK) and the layout of that memory know differently by
> >>>>>>> application and the library.
> >>>>>>>
> >>>>>>> Here in all cases, there is no layout/size change.
> >>>>>>>
> >>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
> >>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
> >>>>>>> won't send these values also it should ignore these values from library. Only
> >>>>>>> consequence is old application won't able to use new features those new enums
> >>>>>>> provide but that is expected/normal.
> >>>>>>
> >>>>>> If library give higher value than expected by the application,
> >>>>>> if the application uses this value as array index,
> >>>>>> there can be an access out of bounds.
> >>>>>
> >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>>>> sense and I don't see how there can be an API breakage.
> >>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>>>> have been compiled against the lib so shouldn't have problems.
> >>>>
> >>>> You say there is no ABI issue because the application will be re-compiled
> >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >>>> But this is not relevant for ABI compatibility.
> >>>> ABI compatibility means we can upgrade the library without recompiling
> >>>> the application and it must work.
> >>>> You think it is a false positive because you assume the application
> >>>> "picks" the new value. I think you miss the case where the new value
> >>>> is returned by a function in the upgraded library.
> >>>>
> >>>>> There are also no structs on the API which contain arrays using this
> >>>>> for sizing, so I don't see an opportunity for an appl to have a
> >>>>> mismatch in memory addresses.
> >>>>
> >>>> Let me demonstrate where the API may "use" the new value
> >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>>>
> >>>> Once upon a time a DPDK application counting the number of devices
> >>>> supporting each AEAD algo (in order to find the best supported algo).
> >>>> It is done in an array indexed by algo id:
> >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >>>> The application is compiled with DPDK 19.11,
> >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >>>> So the size of the application array aead_dev_count is 3.
> >>>> This binary is run with DPDK 20.02,
> >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >>>> The application uses this value:
> >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >>>> The application is crashing because of out of bound access.
> >>>
> >>> I'd say this is an example of bad written app.
> >>> It probably should check that returned by library value doesn't
> >>> exceed its internal array size.
> >>
> >> +1
> >>
> >> Application should ignore values >= MAX.
> > 
> > Of course, blaming the API user is a lot easier than looking at the API.
> > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > as the max value for the application.
> > Value ranges are part of the ABI compatibility contract.
> > It seems you expect the application developer to be aware that
> > DPDK could return a higher value, so the application should
> > check every enum values after calling an API. CRAZY.
> > 
> > When we decide to announce an ABI compatibility and do some marketing,
> > everyone is OK. But when we need to really make our ABI compatible,
> > I see little or no effort. DISAPPOINTING.
> 
> This is not to blame the user or to do less work, this is more sane approach
> that library provides the _END/_MAX value and application uses it as valid range
> check.
> 
> >> Do you suggest we don't extend any enum or define between ABI breakage releases
> >> to be sure bad written applications not affected?
> > 
> > I suggest we must consider not breaking any assumption made on the API.
> > Here we are breaking the enum range because nothing mentions _LIST_END
> > is not really the absolute end of the enum.
> > The solution is to make the change below in 20.02 + backport in 19.11.1:
> > 
> > - _LIST_END
> > + _LIST_END, /* an ABI-compatible version may increase this value */
> > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > };
> > 
> 
> What is the point of "_LIST_MAX" here?

_LIST_MAX is the range of values that DPDK can return under the ABI contract.
So the application can rely on the range 0.._LIST_MAX.

> Application should know the "_LIST_END" of when it has been compiled for the
> valid range check. Next time it is compiled "_LIST_END" may be different value
> but same logic applies.

No, the ABI compatibility contract means you can compile your application
with DPDK 19.11.0 and run it with DPDK 20.02.
So _LIST_END comes from 19.11 and does not include ChachaPoly.

> When "_LIST_END" is missing, application can't protect itself, in that case
> library should send only the values application knows when it is compiled, this
> means either we can't extend our enum/defines until next ABI breakage, or we
> need to do ABI versioning to the functions that returns an enum each time enum
> value extended.

If we define _LIST_MAX as a bigger value than current _LIST_END,
we have some room to add values in between.
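
To make the headroom idea concrete, here is a minimal sketch in C. The names (demo_aead_algo, demo_count_aead, the DEMO_ constants) are hypothetical and are not the real DPDK header. The sketch also assumes that LIST_MAX is pinned to a literal value, so that it does not move when LIST_END grows in a later ABI-compatible release.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical enum modeled on the proposal in this thread; not the real
 * DPDK definition. LIST_MAX is a fixed literal here (an assumption).
 */
enum demo_aead_algo {
	DEMO_AEAD_AES_CCM = 1,
	DEMO_AEAD_AES_GCM,       /* 2 */
	DEMO_AEAD_LIST_END,      /* 3 at build time; a later version may increase it */
	DEMO_AEAD_LIST_MAX = 45  /* upper bound of the values promised by the ABI */
};

/* Fails the build if the enum ever outgrows the promised range. */
static_assert(DEMO_AEAD_LIST_END <= DEMO_AEAD_LIST_MAX,
	      "enum outgrew its ABI range");

/* Sized by LIST_MAX, so every value inside the promised range is a valid index. */
static int demo_aead_dev_count[DEMO_AEAD_LIST_MAX];

/* Count one device supporting algo; returns 0, or -1 if algo is out of range. */
static int demo_count_aead(int algo)
{
	if (algo < 0 || algo >= DEMO_AEAD_LIST_MAX)
		return -1;
	demo_aead_dev_count[algo]++;
	return 0;
}
```

With this layout, a binary built when LIST_END was 3 can still index the array with a value of 3 returned by a newer library, because the array was sized by LIST_MAX rather than LIST_END.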

If (as of now) we don't have _LIST_MAX room, then yes we must version
the functions returning the enum.
In this case, the proper solution is to implement
rte_cryptodev_info_get_v1911() so it filters out
RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
With this solution, an application compiled with DPDK 19.11 will keep
seeing the same range as before, while a 20.02 application could
see and use ChachaPoly.
This is another proposal that I was expecting from the crypto team,
instead of claiming there is no issue (and wasting precious time).


> I believe it is saner to provide _END/_MAX values to the application to use. And
> if required comment them to clarify the expected usage.
> 
> But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
> mean anything to application.

I don't understand what you mean. I think you misunderstood what ABI compatibility is.


> > Then *_LIST_END values could be ignored by libabigail with such a change.
> > 
> > If such a patch is not done by tomorrow, I will have to revert
> > Chacha-Poly commits before 20.02-rc2, because
> > 
> > 1/ LIST_END, without any comment, means "size of range"
> > 2/ we do not blame users for undocumented ABI changes
> > 3/ we respect the ABI compatibility contract
  
Ray Kinsella Feb. 3, 2020, 6:55 p.m. UTC | #28
On 03/02/2020 17:34, Thomas Monjalon wrote:
> 03/02/2020 18:09, Thomas Monjalon:
>> 03/02/2020 10:30, Ferruh Yigit:
>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>> If library give higher value than expected by the application,
>>>>>>> if the application uses this value as array index,
>>>>>>> there can be an access out of bounds.
>>>>>>
>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>> sense and I don't see how there can be an API breakage.
>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>
>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>> But this is not relevant for ABI compatibility.
>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>> the application and it must work.
>>>>> You think it is a false positive because you assume the application
>>>>> "picks" the new value. I think you miss the case where the new value
>>>>> is returned by a function in the upgraded library.
>>>>>
>>>>>> There are also no structs on the API which contain arrays using this
>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>> mismatch in memory addresses.
>>>>>
>>>>> Let me demonstrate where the API may "use" the new value
>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>
>>>>> Once upon a time a DPDK application counting the number of devices
>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>> It is done in an array indexed by algo id:
>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>> The application is compiled with DPDK 19.11,
>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>> So the size of the application array aead_dev_count is 3.
>>>>> This binary is run with DPDK 20.02,
>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>> The application uses this value:
>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>> The application is crashing because of out of bound access.
>>>>
>>>> I'd say this is an example of bad written app.
>>>> It probably should check that returned by library value doesn't
>>>> exceed its internal array size.
>>>
>>> +1
>>>
>>> Application should ignore values >= MAX.
>>
>> Of course, blaming the API user is a lot easier than looking at the API.
>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>> as the max value for the application.
>> Value ranges are part of the ABI compatibility contract.
>> It seems you expect the application developer to be aware that
>> DPDK could return a higher value, so the application should
>> check every enum values after calling an API. CRAZY.
>>
>> When we decide to announce an ABI compatibility and do some marketing,
>> everyone is OK. But when we need to really make our ABI compatible,
>> I see little or no effort. DISAPPOINTING.
>>
>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>> to be sure bad written applications not affected?
>>
>> I suggest we must consider not breaking any assumption made on the API.
>> Here we are breaking the enum range because nothing mentions _LIST_END
>> is not really the absolute end of the enum.
>> The solution is to make the change below in 20.02 + backport in 19.11.1:
> 
> Thinking twice, merging such change before 20.11 is breaking the
> ABI assumption based on the API 19.11.0.
> I ask the release maintainers (Luca, Kevin, David and me) and
> the ABI maintainers (Neil and Ray) to vote for a or b solution:
> 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1

That would still be an ABI breakage though, right?

> 	b) wait 20.11 and revert Chacha-Poly from 20.02

Thanks for the analysis above, Fiona, Ferruh and all.

That is a nasty one alright - there is no "good" answer here.
I agree with Ferruh's sentiments overall; we should rethink this API for 20.11.
Could we do without an enumeration?

There is a c) though, right?
We could work around the issue by API versioning rte_cryptodev_info_get() and friends.
So they only support/acknowledge the existence of Chacha-Poly for applications built against >= 20.02.

It would be painful, I know.
It would also mean that Chacha-Poly would only be available to those building against >= 20.02.


> 
> 
>> - _LIST_END
>> + _LIST_END, /* an ABI-compatible version may increase this value */
>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>> };
>>
>> Then *_LIST_END values could be ignored by libabigail with such a change.
>>
>> If such a patch is not done by tomorrow, I will have to revert
>> Chacha-Poly commits before 20.02-rc2, because
>>
>> 1/ LIST_END, without any comment, means "size of range"
>> 2/ we do not blame users for undocumented ABI changes
>> 3/ we respect the ABI compatibility contract
  
Thomas Monjalon Feb. 3, 2020, 9:07 p.m. UTC | #29
03/02/2020 19:55, Ray Kinsella:
> On 03/02/2020 17:34, Thomas Monjalon wrote:
> > 03/02/2020 18:09, Thomas Monjalon:
> >> 03/02/2020 10:30, Ferruh Yigit:
> >>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >>>> 02/02/2020 14:05, Thomas Monjalon:
> >>>>> 31/01/2020 15:16, Trahe, Fiona:
> >>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>>>>> If library give higher value than expected by the application,
> >>>>>>> if the application uses this value as array index,
> >>>>>>> there can be an access out of bounds.
> >>>>>>
> >>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>>>>> sense and I don't see how there can be an API breakage.
> >>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>>>>> have been compiled against the lib so shouldn't have problems.
> >>>>>
> >>>>> You say there is no ABI issue because the application will be re-compiled
> >>>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >>>>> But this is not relevant for ABI compatibility.
> >>>>> ABI compatibility means we can upgrade the library without recompiling
> >>>>> the application and it must work.
> >>>>> You think it is a false positive because you assume the application
> >>>>> "picks" the new value. I think you miss the case where the new value
> >>>>> is returned by a function in the upgraded library.
> >>>>>
> >>>>>> There are also no structs on the API which contain arrays using this
> >>>>>> for sizing, so I don't see an opportunity for an appl to have a
> >>>>>> mismatch in memory addresses.
> >>>>>
> >>>>> Let me demonstrate where the API may "use" the new value
> >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>>>>
> >>>>> Once upon a time a DPDK application counting the number of devices
> >>>>> supporting each AEAD algo (in order to find the best supported algo).
> >>>>> It is done in an array indexed by algo id:
> >>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >>>>> The application is compiled with DPDK 19.11,
> >>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >>>>> So the size of the application array aead_dev_count is 3.
> >>>>> This binary is run with DPDK 20.02,
> >>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >>>>> The application uses this value:
> >>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >>>>> The application is crashing because of out of bound access.
> >>>>
> >>>> I'd say this is an example of bad written app.
> >>>> It probably should check that returned by library value doesn't
> >>>> exceed its internal array size.
> >>>
> >>> +1
> >>>
> >>> Application should ignore values >= MAX.
> >>
> >> Of course, blaming the API user is a lot easier than looking at the API.
> >> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> >> as the max value for the application.
> >> Value ranges are part of the ABI compatibility contract.
> >> It seems you expect the application developer to be aware that
> >> DPDK could return a higher value, so the application should
> >> check every enum values after calling an API. CRAZY.
> >>
> >> When we decide to announce an ABI compatibility and do some marketing,
> >> everyone is OK. But when we need to really make our ABI compatible,
> >> I see little or no effort. DISAPPOINTING.
> >>
> >>> Do you suggest we don't extend any enum or define between ABI breakage releases
> >>> to be sure bad written applications not affected?
> >>
> >> I suggest we must consider not breaking any assumption made on the API.
> >> Here we are breaking the enum range because nothing mentions _LIST_END
> >> is not really the absolute end of the enum.
> >> The solution is to make the change below in 20.02 + backport in 19.11.1:
> > 
> > Thinking twice, merging such change before 20.11 is breaking the
> > ABI assumption based on the API 19.11.0.
> > I ask the release maintainers (Luca, Kevin, David and me) and
> > the ABI maintainers (Neil and Ray) to vote for a or b solution:
> > 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
> 
> That would still be an ABI breakage though right.
> 
> > 	b) wait 20.11 and revert Chacha-Poly from 20.02
> 
> Thanks for analysis above Fiona, Ferruh and all. 
> 
> That is a nasty one alright - there is no "good" answer here.
> I agree with Ferruh's sentiments overall, we should rethink this API for 20.11. 
> Could do without an enumeration?
> 
> There a c) though right.
> We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
> So they only support/acknowledge the existence of Chacha-Poly for applications build against > 20.02.

I agree there is a c) as I proposed in another email:
http://mails.dpdk.org/archives/dev/2020-February/156919.html
"
In this case, the proper solution is to implement
rte_cryptodev_info_get_v1911() so it filters out
RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
With this solution, an application compiled with DPDK 19.11 will keep
seeing the same range as before, while a 20.02 application could
see and use ChachaPoly.
"

> It would be painful I know.

Not so painful in my opinion.
We just need to call rte_cryptodev_info_get() from
rte_cryptodev_info_get_v1911() and filter the values
to the 19.11 range: [0..AES_GCM].
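
As a rough illustration of that filtering, here is a minimal sketch in C. The names (demo_info_get, demo_info_get_v1911, struct demo_dev_info) are hypothetical stand-ins, and the flat capability array is a simplification that does not match the real rte_cryptodev_info layout. Binding the two functions to symbol versions is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum: the 20.02 view, with one value added after AES_GCM. */
enum demo_aead_algo {
	DEMO_AEAD_AES_CCM = 1,
	DEMO_AEAD_AES_GCM,            /* last value known to a 19.11 build */
	DEMO_AEAD_CHACHA20_POLY1305,  /* added in 20.02 */
	DEMO_AEAD_LIST_END
};

#define DEMO_MAX_CAPS 8

/* Simplified stand-in for the device info returned to the application. */
struct demo_dev_info {
	size_t nb_caps;
	int aead_algo[DEMO_MAX_CAPS];
};

/* Stand-in for the current (20.02) function: reports every capability. */
static void demo_info_get(struct demo_dev_info *info)
{
	assert(info != NULL);
	info->nb_caps = 3;
	info->aead_algo[0] = DEMO_AEAD_AES_CCM;
	info->aead_algo[1] = DEMO_AEAD_AES_GCM;
	info->aead_algo[2] = DEMO_AEAD_CHACHA20_POLY1305;
}

/* Stand-in for the 19.11 entry point: same data, minus values above AES_GCM. */
static void demo_info_get_v1911(struct demo_dev_info *info)
{
	size_t i, kept = 0;

	demo_info_get(info);
	for (i = 0; i < info->nb_caps; i++) {
		if (info->aead_algo[i] <= DEMO_AEAD_AES_GCM)
			info->aead_algo[kept++] = info->aead_algo[i];
	}
	info->nb_caps = kept;
}
```

An old binary resolved against the v1911 entry point would then only ever see the values it was compiled with, while a newly built application calling the current function sees all three.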

> It would also mean that Chacha-Poly would only be available to
> those building against >= 20.02.

Yes exactly.

The addition of comments and LIST_MAX like below is still valid
to avoid versioning after 20.11.

> >> - _LIST_END
> >> + _LIST_END, /* an ABI-compatible version may increase this value */
> >> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> >> };
> >>
> >> Then *_LIST_END values could be ignored by libabigail with such a change.

To keep the ABI check from complaining, the best option is to remove
LIST_END completely in DPDK 20.11.


> >> If such a patch is not done by tomorrow, I will have to revert
> >> Chacha-Poly commits before 20.02-rc2, because
> >>
> >> 1/ LIST_END, without any comment, means "size of range"
> >> 2/ we do not blame users for undocumented ABI changes
> >> 3/ we respect the ABI compatibility contract
  
Ferruh Yigit Feb. 4, 2020, 9:19 a.m. UTC | #30
On 2/3/2020 6:40 PM, Thomas Monjalon wrote:
> 03/02/2020 18:40, Ferruh Yigit:
>> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
>>> 03/02/2020 10:30, Ferruh Yigit:
>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>> 30/01/2020 17:09, Ferruh Yigit:
>>>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>>>>>>>>
>>>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>>>>>>>>
>>>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
>>>>>>>>> this is a false positive.
>>>>>>>>>
>>>>>>>>> The ABI break can happen when a struct has been shared between the application
>>>>>>>>> and the library (DPDK) and the layout of that memory know differently by
>>>>>>>>> application and the library.
>>>>>>>>>
>>>>>>>>> Here in all cases, there is no layout/size change.
>>>>>>>>>
>>>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
>>>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
>>>>>>>>> won't send these values also it should ignore these values from library. Only
>>>>>>>>> consequence is old application won't able to use new features those new enums
>>>>>>>>> provide but that is expected/normal.
>>>>>>>>
>>>>>>>> If library give higher value than expected by the application,
>>>>>>>> if the application uses this value as array index,
>>>>>>>> there can be an access out of bounds.
>>>>>>>
>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>>> sense and I don't see how there can be an API breakage.
>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>
>>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>>> But this is not relevant for ABI compatibility.
>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>> the application and it must work.
>>>>>> You think it is a false positive because you assume the application
>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>> is returned by a function in the upgraded library.
>>>>>>
>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>> mismatch in memory addresses.
>>>>>>
>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>
>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>> It is done in an array indexed by algo id:
>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>> The application is compiled with DPDK 19.11,
>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>> This binary is run with DPDK 20.02,
>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>> The application uses this value:
>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>> The application is crashing because of out of bound access.
>>>>>
>>>>> I'd say this is an example of bad written app.
>>>>> It probably should check that returned by library value doesn't
>>>>> exceed its internal array size.
>>>>
>>>> +1
>>>>
>>>> Application should ignore values >= MAX.
>>>
>>> Of course, blaming the API user is a lot easier than looking at the API.
>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>> as the max value for the application.
>>> Value ranges are part of the ABI compatibility contract.
>>> It seems you expect the application developer to be aware that
>>> DPDK could return a higher value, so the application should
>>> check every enum values after calling an API. CRAZY.
>>>
>>> When we decide to announce an ABI compatibility and do some marketing,
>>> everyone is OK. But when we need to really make our ABI compatible,
>>> I see little or no effort. DISAPPOINTING.
>>
>> This is not to blame the user or to do less work, this is more sane approach
>> that library provides the _END/_MAX value and application uses it as valid range
>> check.
>>
>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>>> to be sure bad written applications not affected?
>>>
>>> I suggest we must consider not breaking any assumption made on the API.
>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>> is not really the absolute end of the enum.
>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>
>>> - _LIST_END
>>> + _LIST_END, /* an ABI-compatible version may increase this value */
>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>>> };
>>>
>>
>> What is the point of "_LIST_MAX" here?
> 
> _LIST_MAX is range of value that DPDK can return in the ABI contract.
> So the application can rely on the range 0.._LIST_MAX.
> 
>> Application should know the "_LIST_END" of when it has been compiled for the
>> valid range check. Next time it is compiled "_LIST_END" may be different value
>> but same logic applies.
> 
> No, ABI compatibility contract means you can compile your application
> with DPDK 19.11.0 and run it with DPDK 20.02.
> So _LIST_END comes from 19.11 and does not include ChachaPoly.

That is what I mean. Let me try to give an example.

DPDK 19.11 returns A=1, B=2, END=3.

An application compiled with DPDK 19.11 will process A and B, and ignore anything ">= 3".

DPDK 20.02 returns A=1, B=2, C=3, D=4, END=5.

The old application will still only know/use A and B, and can ignore C=3, D=4, etc.
when the library sends them.


If, in the above, you add another limit as you suggested, like MAX=10, and ask the
application to use it:

An application compiled with DPDK 19.11 will be OK, since the library only sends A and B
and the application uses them.

But with DPDK 20.02 the application may have a problem, since the library will be sending
C=3, which is valid according to the check "<= MAX (10)". How will the application
know to ignore it?

So the application should use _END to know which values are valid for it. If so, what
is the point of having _MAX?
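
The application-side check described in this example can be sketched in C as follows. The DEMO_ names and demo_handle() are hypothetical, and the sketch assumes the application validates every value against the END it saw at build time.

```c
#include <assert.h>

/* Values as seen by an application built against the older library. */
enum { DEMO_A = 1, DEMO_B = 2, DEMO_END = 3 };

/* Every value known at build time must sit below END. */
static_assert(DEMO_B < DEMO_END, "known values must be below END");

static int demo_count[DEMO_END];

/* Returns 1 if the value was known and counted, 0 if it was ignored. */
static int demo_handle(int value)
{
	if (value < 0 || value >= DEMO_END)
		return 0; /* unknown to this build (e.g. C=3, D=4): ignore */
	demo_count[value]++;
	return 1;
}
```

The check only protects applications that actually perform it. Whether the API may require it of them is the point under dispute in this thread.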


> 
>> When "_LIST_END" is missing, application can't protect itself, in that case
>> library should send only the values application knows when it is compiled, this
>> means either we can't extend our enum/defines until next ABI breakage, or we
>> need to do ABI versioning to the functions that returns an enum each time enum
>> value extended.
> 
> If we define _LIST_MAX as a bigger value than current _LIST_END,
> we have some room to add values in between.
> 
> If (as of now) we don't have _LIST_MAX room, then yes we must version
> the functions returning the enum.
> In this case, the proper solution is to implement
> rte_cryptodev_info_get_v1911() so it filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> With this solution, an application compiled with DPDK 19.11 will keep
> seeing the same range as before, while a 20.02 application could
> see and use ChachaPoly.
> This is another proposal that I was expecting from the crypto team,
> instead of claiming there is no issue (and wasting precious time).
> 
> 
>> I believe it is saner to provide _END/_MAX values to the application to use. And
>> if required comment them to clarify the expected usage.
>>
>> But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
>> mean anything to application.
> 
> I don't understand what you mean. I think you misunderstood what is ABI compat.
> 
> 
>>> Then *_LIST_END values could be ignored by libabigail with such a change.
>>>
>>> If such a patch is not done by tomorrow, I will have to revert
>>> Chacha-Poly commits before 20.02-rc2, because
>>>
>>> 1/ LIST_END, without any comment, means "size of range"
>>> 2/ we do not blame users for undocumented ABI changes
>>> 3/ we respect the ABI compatibility contract
> 
> 
>
  
Thomas Monjalon Feb. 4, 2020, 9:45 a.m. UTC | #31
04/02/2020 10:19, Ferruh Yigit:
> On 2/3/2020 6:40 PM, Thomas Monjalon wrote:
> > 03/02/2020 18:40, Ferruh Yigit:
> >> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> >>> 03/02/2020 10:30, Ferruh Yigit:
> >>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >>>>> 02/02/2020 14:05, Thomas Monjalon:
> >>>>>> 31/01/2020 15:16, Trahe, Fiona:
> >>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>>>>>> 30/01/2020 17:09, Ferruh Yigit:
> >>>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>>>>>>>>
> >>>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
> >>>>>>>>>
> >>>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
> >>>>>>>>> this is a false positive.
> >>>>>>>>>
> >>>>>>>>> The ABI break can happen when a struct has been shared between the application
> >>>>>>>>> and the library (DPDK) and the layout of that memory know differently by
> >>>>>>>>> application and the library.
> >>>>>>>>>
> >>>>>>>>> Here in all cases, there is no layout/size change.
> >>>>>>>>>
> >>>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
> >>>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
> >>>>>>>>> won't send these values also it should ignore these values from library. Only
> >>>>>>>>> consequence is old application won't able to use new features those new enums
> >>>>>>>>> provide but that is expected/normal.
> >>>>>>>>
> >>>>>>>> If library give higher value than expected by the application,
> >>>>>>>> if the application uses this value as array index,
> >>>>>>>> there can be an access out of bounds.
> >>>>>>>
> >>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>>>>>> sense and I don't see how there can be an API breakage.
> >>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>>>>>> have been compiled against the lib so shouldn't have problems.
> >>>>>>
> >>>>>> You say there is no ABI issue because the application will be re-compiled
> >>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >>>>>> But this is not relevant for ABI compatibility.
> >>>>>> ABI compatibility means we can upgrade the library without recompiling
> >>>>>> the application and it must work.
> >>>>>> You think it is a false positive because you assume the application
> >>>>>> "picks" the new value. I think you miss the case where the new value
> >>>>>> is returned by a function in the upgraded library.
> >>>>>>
> >>>>>>> There are also no structs on the API which contain arrays using this
> >>>>>>> for sizing, so I don't see an opportunity for an appl to have a
> >>>>>>> mismatch in memory addresses.
> >>>>>>
> >>>>>> Let me demonstrate where the API may "use" the new value
> >>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>>>>>
> >>>>>> Once upon a time a DPDK application counting the number of devices
> >>>>>> supporting each AEAD algo (in order to find the best supported algo).
> >>>>>> It is done in an array indexed by algo id:
> >>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >>>>>> The application is compiled with DPDK 19.11,
> >>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >>>>>> So the size of the application array aead_dev_count is 3.
> >>>>>> This binary is run with DPDK 20.02,
> >>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >>>>>> The application uses this value:
> >>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >>>>>> The application is crashing because of out of bound access.
> >>>>>
> >>>>> I'd say this is an example of bad written app.
> >>>>> It probably should check that returned by library value doesn't
> >>>>> exceed its internal array size.
> >>>>
> >>>> +1
> >>>>
> >>>> Application should ignore values >= MAX.
> >>>
> >>> Of course, blaming the API user is a lot easier than looking at the API.
> >>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> >>> as the max value for the application.
> >>> Value ranges are part of the ABI compatibility contract.
> >>> It seems you expect the application developer to be aware that
> >>> DPDK could return a higher value, so the application should
> >>> check every enum values after calling an API. CRAZY.
> >>>
> >>> When we decide to announce an ABI compatibility and do some marketing,
> >>> everyone is OK. But when we need to really make our ABI compatible,
> >>> I see little or no effort. DISAPPOINTING.
> >>
> >> This is not to blame the user or to do less work, this is more sane approach
> >> that library provides the _END/_MAX value and application uses it as valid range
> >> check.
> >>
> >>>> Do you suggest we don't extend any enum or define between ABI breakage releases
> >>>> to be sure bad written applications not affected?
> >>>
> >>> I suggest we must consider not breaking any assumption made on the API.
> >>> Here we are breaking the enum range because nothing mentions _LIST_END
> >>> is not really the absolute end of the enum.
> >>> The solution is to make the change below in 20.02 + backport in 19.11.1:
> >>>
> >>> - _LIST_END
> >>> + _LIST_END, /* an ABI-compatible version may increase this value */
> >>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> >>> };
> >>>
> >>
> >> What is the point of "_LIST_MAX" here?
> > 
> > _LIST_MAX is range of value that DPDK can return in the ABI contract.
> > So the application can rely on the range 0.._LIST_MAX.
> > 
> >> Application should know the "_LIST_END" of when it has been compiled for the
> >> valid range check. Next time it is compiled "_LIST_END" may be different value
> >> but same logic applies.
> > 
> > No, ABI compatibility contract means you can compile your application
> > with DPDK 19.11.0 and run it with DPDK 20.02.
> > So _LIST_END comes from 19.11 and does not include ChachaPoly.
> 
> That is what I mean, let me try to give a sample.
> 
> DPDK19.11 returns, A=1, B=2, END=3
> 
> Application compiled with DPDK19.11, it will process A, B and ignore anything ">= 3"

No, the application will not ignore anything ">=3", as I explained above,
and you blamed the application for it.
Nothing in the API says the application must filter values higher than 3,
because as of now, a value higher than 3 is a PMD bug.


> DPDK20.02 returns A=1, B=2, C=3, D=4, END=5
> 
> Old application will still only will know/use A, B and can ignore when library
> sends C=3, D=4 etc...
> 
> 
> In above, if you add another limit as you suggested, like MAX=10 and ask
> application to use it,
> 
> Application compiled with DPDK19.11 will be OK since library only sends A,B and
> application uses them.
> 
> But with DPDK20.02 application may have problem, since library will be sending
> C=3, which is valid according to the check " <= MAX (10)", how application will
> know to ignore it.

Why should the application ignore the value C=3 with DPDK 20.02?


> So application should use _END to know the valid ones according it, if so what
> is the point of having _MAX.
> 
> 
> >> When "_LIST_END" is missing, application can't protect itself, in that case
> >> library should send only the values application knows when it is compiled, this
> >> means either we can't extend our enum/defines until next ABI breakage, or we
> >> need to do ABI versioning to the functions that returns an enum each time enum
> >> value extended.
> > 
> > If we define _LIST_MAX as a bigger value than current _LIST_END,
> > we have some room to add values in between.
> > 
> > If (as of now) we don't have _LIST_MAX room, then yes we must version
> > the functions returning the enum.
> > In this case, the proper solution is to implement
> > rte_cryptodev_info_get_v1911() so it filters out
> > RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> > With this solution, an application compiled with DPDK 19.11 will keep
> > seeing the same range as before, while a 20.02 application could
> > see and use ChachaPoly.
> > This is another proposal that I was expecting from the crypto team,
> > instead of claiming there is no issue (and wasting precious time).
> > 
> > 
> >> I believe it is saner to provide _END/_MAX values to the application to use. And
> >> if required comment them to clarify the expected usage.
> >>
> >> But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
> >> mean anything to application.
> > 
> > I don't understand what you mean. I think you misunderstood what is ABI compat.
> > 
> > 
> >>> Then *_LIST_END values could be ignored by libabigail with such a change.
> >>>
> >>> If such a patch is not done by tomorrow, I will have to revert
> >>> Chacha-Poly commits before 20.02-rc2, because
> >>>
> >>> 1/ LIST_END, without any comment, means "size of range"
> >>> 2/ we do not blame users for undocumented ABI changes
> >>> 3/ we respect the ABI compatibility contract
  
Ferruh Yigit Feb. 4, 2020, 9:46 a.m. UTC | #32
On 2/3/2020 9:07 PM, Thomas Monjalon wrote:
> 03/02/2020 19:55, Ray Kinsella:
>> On 03/02/2020 17:34, Thomas Monjalon wrote:
>>> 03/02/2020 18:09, Thomas Monjalon:
>>>> 03/02/2020 10:30, Ferruh Yigit:
>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>>> If library give higher value than expected by the application,
>>>>>>>>> if the application uses this value as array index,
>>>>>>>>> there can be an access out of bounds.
>>>>>>>>
>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>>>> sense and I don't see how there can be an API breakage.
>>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>>
>>>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>>>> But this is not relevant for ABI compatibility.
>>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>>> the application and it must work.
>>>>>>> You think it is a false positive because you assume the application
>>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>>> is returned by a function in the upgraded library.
>>>>>>>
>>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>>> mismatch in memory addresses.
>>>>>>>
>>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>>
>>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>>> It is done in an array indexed by algo id:
>>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>>> The application is compiled with DPDK 19.11,
>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>>> This binary is run with DPDK 20.02,
>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>>> The application uses this value:
>>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>>> The application is crashing because of out of bound access.
>>>>>>
>>>>>> I'd say this is an example of bad written app.
>>>>>> It probably should check that returned by library value doesn't
>>>>>> exceed its internal array size.
>>>>>
>>>>> +1
>>>>>
>>>>> Application should ignore values >= MAX.
>>>>
>>>> Of course, blaming the API user is a lot easier than looking at the API.
>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>>> as the max value for the application.
>>>> Value ranges are part of the ABI compatibility contract.
>>>> It seems you expect the application developer to be aware that
>>>> DPDK could return a higher value, so the application should
>>>> check every enum values after calling an API. CRAZY.
>>>>
>>>> When we decide to announce an ABI compatibility and do some marketing,
>>>> everyone is OK. But when we need to really make our ABI compatible,
>>>> I see little or no effort. DISAPPOINTING.
>>>>
>>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>>>> to be sure bad written applications not affected?
>>>>
>>>> I suggest we must consider not breaking any assumption made on the API.
>>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>>> is not really the absolute end of the enum.
>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>
>>> Thinking twice, merging such change before 20.11 is breaking the
>>> ABI assumption based on the API 19.11.0.
>>> I ask the release maintainers (Luca, Kevin, David and me) and
>>> the ABI maintainers (Neil and Ray) to vote for a or b solution:
>>> 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
>>
>> That would still be an ABI breakage though right.
>>
>>> 	b) wait 20.11 and revert Chacha-Poly from 20.02
>>
>> Thanks for analysis above Fiona, Ferruh and all. 
>>
>> That is a nasty one alright - there is no "good" answer here.
>> I agree with Ferruh's sentiments overall, we should rethink this API for 20.11. 
>> Could do without an enumeration?
>>
>> There a c) though right.
>> We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
>> So they only support/acknowledge the existence of Chacha-Poly for applications build against > 20.02.
> 
> I agree there is a c) as I proposed in another email:
> http://mails.dpdk.org/archives/dev/2020-February/156919.html
> "
> In this case, the proper solution is to implement
> rte_cryptodev_info_get_v1911() so it filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> With this solution, an application compiled with DPDK 19.11 will keep
> seeing the same range as before, while a 20.02 application could
> see and use ChachaPoly.
> "
> 
>> It would be painful I know.
> 
> Not so painful in my opinion.
> Just need to call rte_cryptodev_info_get() from
> rte_cryptodev_info_get_v1911() and filter the value
> in the 19.11 range: [0..AES_GCM].
> 
>> It would also mean that Chacha-Poly would only be available to
>> those building against >= 20.02.
> 
> Yes exactly.
> 
> The addition of comments and LIST_MAX like below are still valid
> to avoid versioning after 20.11.
> 
>>>> - _LIST_END
>>>> + _LIST_END, /* an ABI-compatible version may increase this value */
>>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>>>> };
>>>>
>>>> Then *_LIST_END values could be ignored by libabigail with such a change.
> 
> In order to avoid ABI check complaining, the best is to completely
> remove LIST_END in DPDK 20.11.

We can remove LIST_END only if we go with option (c).

There are two different approaches:
- Provide LIST_END and expect the application to protect itself against new
values that may come in a newer version of the library.
- Do ABI versioning to prevent the application from receiving new values at all,
i.e. option (c).

We can select one, but I believe the selection shouldn't be based on just
silencing the ABI check tool.
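
To make the first approach concrete, here is a minimal sketch of an
application-side guard. The enum and function names are hypothetical stand-ins
(not the real cryptodev definitions); the only assumption taken from the thread
is that LIST_END was 3 when the application was compiled.

```c
/* Hypothetical stand-in for the AEAD algo enum as seen at 19.11 build time. */
enum aead_algo {
	AEAD_AES_CCM = 1,
	AEAD_AES_GCM,
	AEAD_LIST_END /* == 3 when this application is compiled */
};

/* Per-algo device counter, sized with the build-time LIST_END. */
static int aead_dev_count[AEAD_LIST_END];

/*
 * Count one device for the reported algo.
 * Returns 0 on success, or -1 if the value is outside the range known
 * at compile time, in which case it is ignored instead of being used
 * as an array index.
 */
static int count_aead_dev(int algo)
{
	if (algo < 0 || algo >= AEAD_LIST_END)
		return -1;
	aead_dev_count[algo]++;
	return 0;
}
```

With this guard, a newer library reporting value 3 is skipped rather than
written past the end of the array. Whether applications can be expected to
carry such a check is exactly the point under discussion.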


> 
> 
>>>> If such a patch is not done by tomorrow, I will have to revert
>>>> Chacha-Poly commits before 20.02-rc2, because
>>>>
>>>> 1/ LIST_END, without any comment, means "size of range"
>>>> 2/ we do not blame users for undocumented ABI changes
>>>> 3/ we respect the ABI compatibility contract
> 
> 
>
  
David Marchand Feb. 4, 2020, 9:51 a.m. UTC | #33
On Mon, Feb 3, 2020 at 7:56 PM Ray Kinsella <mdr@ashroe.eu> wrote:
> On 03/02/2020 17:34, Thomas Monjalon wrote:
> > 03/02/2020 18:09, Thomas Monjalon:
> >> 03/02/2020 10:30, Ferruh Yigit:
> >>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >>>> 02/02/2020 14:05, Thomas Monjalon:
> >>>>> 31/01/2020 15:16, Trahe, Fiona:
> >>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>>>>> If library give higher value than expected by the application,
> >>>>>>> if the application uses this value as array index,
> >>>>>>> there can be an access out of bounds.
> >>>>>>
> >>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>>>>> sense and I don't see how there can be an API breakage.
> >>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>>>>> have been compiled against the lib so shouldn't have problems.
> >>>>>
> >>>>> You say there is no ABI issue because the application will be re-compiled
> >>>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >>>>> But this is not relevant for ABI compatibility.
> >>>>> ABI compatibility means we can upgrade the library without recompiling
> >>>>> the application and it must work.
> >>>>> You think it is a false positive because you assume the application
> >>>>> "picks" the new value. I think you miss the case where the new value
> >>>>> is returned by a function in the upgraded library.
> >>>>>
> >>>>>> There are also no structs on the API which contain arrays using this
> >>>>>> for sizing, so I don't see an opportunity for an appl to have a
> >>>>>> mismatch in memory addresses.
> >>>>>
> >>>>> Let me demonstrate where the API may "use" the new value
> >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>>>>
> >>>>> Once upon a time a DPDK application counting the number of devices
> >>>>> supporting each AEAD algo (in order to find the best supported algo).
> >>>>> It is done in an array indexed by algo id:
> >>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >>>>> The application is compiled with DPDK 19.11,
> >>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >>>>> So the size of the application array aead_dev_count is 3.
> >>>>> This binary is run with DPDK 20.02,
> >>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >>>>> The application uses this value:
> >>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >>>>> The application is crashing because of out of bound access.
> >>>>
> >>>> I'd say this is an example of bad written app.
> >>>> It probably should check that returned by library value doesn't
> >>>> exceed its internal array size.
> >>>
> >>> +1
> >>>
> >>> Application should ignore values >= MAX.
> >>
> >> Of course, blaming the API user is a lot easier than looking at the API.
> >> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> >> as the max value for the application.
> >> Value ranges are part of the ABI compatibility contract.
> >> It seems you expect the application developer to be aware that
> >> DPDK could return a higher value, so the application should
> >> check every enum values after calling an API. CRAZY.
> >>
> >> When we decide to announce an ABI compatibility and do some marketing,
> >> everyone is OK. But when we need to really make our ABI compatible,
> >> I see little or no effort. DISAPPOINTING.
> >>
> >>> Do you suggest we don't extend any enum or define between ABI breakage releases
> >>> to be sure bad written applications not affected?
> >>
> >> I suggest we must consider not breaking any assumption made on the API.
> >> Here we are breaking the enum range because nothing mentions _LIST_END
> >> is not really the absolute end of the enum.
> >> The solution is to make the change below in 20.02 + backport in 19.11.1:
> >
> > Thinking twice, merging such change before 20.11 is breaking the
> > ABI assumption based on the API 19.11.0.
> > I ask the release maintainers (Luca, Kevin, David and me) and
> > the ABI maintainers (Neil and Ray) to vote for a or b solution:
> >       a) add comment and LIST_MAX as below in 20.02 + 19.11.1
>
> That would still be an ABI breakage though right.

Yes.


>
> >       b) wait 20.11 and revert Chacha-Poly from 20.02
>
> Thanks for analysis above Fiona, Ferruh and all.
>
> That is a nasty one alright - there is no "good" answer here.
> I agree with Ferruh's sentiments overall, we should rethink this API for 20.11.
> Could do without an enumeration?
>
> There a c) though right.
> We could work around the issue by api versioning rte_cryptodev_info_get() and friends.

It has a lot of friends, but it sounds like the right approach.
Is someone looking into this?


> So they only support/acknowledge the existence of Chacha-Poly for applications build against > 20.02.
>
> It would be painful I know.
> It would also mean that Chacha-Poly would only be available to those building against >= 20.02.

Yes.


--
David Marchand
  
Ferruh Yigit Feb. 4, 2020, 9:56 a.m. UTC | #34
On 2/4/2020 9:45 AM, Thomas Monjalon wrote:
> 04/02/2020 10:19, Ferruh Yigit:
>> On 2/3/2020 6:40 PM, Thomas Monjalon wrote:
>>> 03/02/2020 18:40, Ferruh Yigit:
>>>> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
>>>>> 03/02/2020 10:30, Ferruh Yigit:
>>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>>>> 30/01/2020 17:09, Ferruh Yigit:
>>>>>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>>>>>>>>>>
>>>>>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
>>>>>>>>>>> this is a false positive.
>>>>>>>>>>>
>>>>>>>>>>> The ABI break can happen when a struct has been shared between the application
>>>>>>>>>>> and the library (DPDK) and the layout of that memory know differently by
>>>>>>>>>>> application and the library.
>>>>>>>>>>>
>>>>>>>>>>> Here in all cases, there is no layout/size change.
>>>>>>>>>>>
>>>>>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
>>>>>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
>>>>>>>>>>> won't send these values also it should ignore these values from library. Only
>>>>>>>>>>> consequence is old application won't able to use new features those new enums
>>>>>>>>>>> provide but that is expected/normal.
>>>>>>>>>>
>>>>>>>>>> If library give higher value than expected by the application,
>>>>>>>>>> if the application uses this value as array index,
>>>>>>>>>> there can be an access out of bounds.
>>>>>>>>>
>>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>>>>> sense and I don't see how there can be an API breakage.
>>>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>>>
>>>>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>>>>> But this is not relevant for ABI compatibility.
>>>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>>>> the application and it must work.
>>>>>>>> You think it is a false positive because you assume the application
>>>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>>>> is returned by a function in the upgraded library.
>>>>>>>>
>>>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>>>> mismatch in memory addresses.
>>>>>>>>
>>>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>>>
>>>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>>>> It is done in an array indexed by algo id:
>>>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>>>> The application is compiled with DPDK 19.11,
>>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>>>> This binary is run with DPDK 20.02,
>>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>>>> The application uses this value:
>>>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>>>> The application is crashing because of out of bound access.
>>>>>>>
>>>>>>> I'd say this is an example of bad written app.
>>>>>>> It probably should check that returned by library value doesn't
>>>>>>> exceed its internal array size.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Application should ignore values >= MAX.
>>>>>
>>>>> Of course, blaming the API user is a lot easier than looking at the API.
>>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>>>> as the max value for the application.
>>>>> Value ranges are part of the ABI compatibility contract.
>>>>> It seems you expect the application developer to be aware that
>>>>> DPDK could return a higher value, so the application should
>>>>> check every enum values after calling an API. CRAZY.
>>>>>
>>>>> When we decide to announce an ABI compatibility and do some marketing,
>>>>> everyone is OK. But when we need to really make our ABI compatible,
>>>>> I see little or no effort. DISAPPOINTING.
>>>>
>>>> This is not to blame the user or to do less work, this is more sane approach
>>>> that library provides the _END/_MAX value and application uses it as valid range
>>>> check.
>>>>
>>>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>>>>> to be sure bad written applications not affected?
>>>>>
>>>>> I suggest we must consider not breaking any assumption made on the API.
>>>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>>>> is not really the absolute end of the enum.
>>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>>>
>>>>> - _LIST_END
>>>>> + _LIST_END, /* an ABI-compatible version may increase this value */
>>>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>>>>> };
>>>>>
>>>>
>>>> What is the point of "_LIST_MAX" here?
>>>
>>> _LIST_MAX is range of value that DPDK can return in the ABI contract.
>>> So the appplication can rely on the range 0.._LIST_MAX.
>>>
>>>> Application should know the "_LIST_END" of when it has been compiled for the
>>>> valid range check. Next time it is compiled "_LIST_END" may be different value
>>>> but same logic applies.
>>>
>>> No, ABI compatibility contract means you can compile your application
>>> with DPDK 19.11.0 and run it with DPDK 20.02.
>>> So _LIST_END comes from 19.11 and does not include ChachaPoly.
>>
>> That is what I mean, let me try to give a sample.
>>
>> DPDK19.11 returns, A=1, B=2, END=3
>>
>> Application compiled with DPDK19.11, it will process A, B and ignore anything ">= 3"
> 
> No, the application will not ignore anything ">=3" as I explained above,
> and you blamed the application for it.
> Nothing in the API says the application must filter value higher than 3,
> because as of now, values higher than 3 are PMD bug.

When the application is compiled, that is the END value; anything bigger than this
value is not valid, and if an application uses the return value directly I think it
is doing something wrong.
But yes, there may be applications relying on the library always sending values in
range. We never communicated this, but we can add comments to clarify it.

> 
> 
>> DPDK20.02 returns A=1, B=2, C=3, D=4, END=5
>>
>> Old application will still only will know/use A, B and can ignore when library
>> sends C=3, D=4 etc...
>>
>>
>> In above, if you add another limit as you suggested, like MAX=10 and ask
>> application to use it,
>>
>> Application compiled with DPDK19.11 will be OK since library only sends A,B and
>> application uses them.
>>
>> But with DPDK20.02 application may have problem, since library will be sending
>> C=3, which is valid according to the check " <= MAX (10)", how application will
>> know to ignore it.
> 
> Why application should ignore value C=3 with DPDK 20.02?

This is the application compiled with DPDK19.11, and running with DPDK20.02.

So for the application this is a value >= MAX, something it doesn't know
what to do with.
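
A rough sketch of the sample above, with hypothetical names and the values
A=1, B=2, END=3, MAX=10 as seen by a binary built against 19.11. It only
illustrates that a check against MAX accepts C=3, while a check against the
build-time END does not.

```c
#include <stdbool.h>

/* Hypothetical values from the sample, as known to a 19.11-built binary. */
enum {
	ALGO_A = 1,
	ALGO_B = 2,
	ALGO_END_1911 = 3, /* END at the time the application was compiled */
	ALGO_MAX = 10      /* reserved upper bound suggested in the thread */
};

/* Range check against the reserved upper bound. */
static bool in_max_range(int v)
{
	return v >= 0 && v <= ALGO_MAX;
}

/* Range check against the END value the application was compiled with. */
static bool known_at_build(int v)
{
	return v >= 0 && v < ALGO_END_1911;
}
```

For C=3, in_max_range() is true and known_at_build() is false, so only the
second check tells the old application that it has no handling for the value.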

> 
> 
>> So application should use _END to know the valid ones according it, if so what
>> is the point of having _MAX.
>>
>>
>>>> When "_LIST_END" is missing, application can't protect itself, in that case
>>>> library should send only the values application knows when it is compiled, this
>>>> means either we can't extend our enum/defines until next ABI breakage, or we
>>>> need to do ABI versioning to the functions that returns an enum each time enum
>>>> value extended.
>>>
>>> If we define _LIST_MAX as a bigger value than current _LIST_END,
>>> we have some room to add values in between.
>>>
>>> If (as of now) we don't have _LIST_MAX room, then yes we must version
>>> the functions returning the enum.
>>> In this case, the proper solution is to implement
>>> rte_cryptodev_info_get_v1911() so it filters out
>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
>>> With this solution, an application compiled with DPDK 19.11 will keep
>>> seeing the same range as before, while a 20.02 application could
>>> see and use ChachaPoly.
>>> This is another proposal that I was expecting from the crypto team,
>>> instead of claiming there is no issue (and wasting precious time).
>>>
>>>
>>>> I believe it is saner to provide _END/_MAX values to the application to use. And
>>>> if required comment them to clarify the expected usage.
>>>>
>>>> But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
>>>> mean anything to application.
>>>
>>> I don't understand what you mean. I think you misunderstood what is ABI compat.
>>>
>>>
>>>>> Then *_LIST_END values could be ignored by libabigail with such a change.
>>>>>
>>>>> If such a patch is not done by tomorrow, I will have to revert
>>>>> Chacha-Poly commits before 20.02-rc2, because
>>>>>
>>>>> 1/ LIST_END, without any comment, means "size of range"
>>>>> 2/ we do not blame users for undocumented ABI changes
>>>>> 3/ we respect the ABI compatibility contract
> 
> 
>
  
Bruce Richardson Feb. 4, 2020, 10:08 a.m. UTC | #35
On Tue, Feb 04, 2020 at 09:56:31AM +0000, Ferruh Yigit wrote:
> On 2/4/2020 9:45 AM, Thomas Monjalon wrote:
> > 04/02/2020 10:19, Ferruh Yigit:
> >> On 2/3/2020 6:40 PM, Thomas Monjalon wrote:
> >>> 03/02/2020 18:40, Ferruh Yigit:
> >>>> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> >>>>> 03/02/2020 10:30, Ferruh Yigit:
> >>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >>>>>>> 02/02/2020 14:05, Thomas Monjalon:
> >>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
> >>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>>>>>>>> 30/01/2020 17:09, Ferruh Yigit:
> >>>>>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
> >>>>>>>>>>>
> >>>>>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
> >>>>>>>>>>> this is a false positive.
> >>>>>>>>>>>
> >>>>>>>>>>> The ABI break can happen when a struct has been shared between the application
> >>>>>>>>>>> and the library (DPDK) and the layout of that memory know differently by
> >>>>>>>>>>> application and the library.
> >>>>>>>>>>>
> >>>>>>>>>>> Here in all cases, there is no layout/size change.
> >>>>>>>>>>>
> >>>>>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
> >>>>>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
> >>>>>>>>>>> won't send these values also it should ignore these values from library. Only
> >>>>>>>>>>> consequence is old application won't able to use new features those new enums
> >>>>>>>>>>> provide but that is expected/normal.
> >>>>>>>>>>
> >>>>>>>>>> If library give higher value than expected by the application,
> >>>>>>>>>> if the application uses this value as array index,
> >>>>>>>>>> there can be an access out of bounds.
> >>>>>>>>>
> >>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >>>>>>>>> sense and I don't see how there can be an API breakage.
> >>>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> >>>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> >>>>>>>>> have been compiled against the lib so shouldn't have problems.
> >>>>>>>>
> >>>>>>>> You say there is no ABI issue because the application will be re-compiled
> >>>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >>>>>>>> But this is not relevant for ABI compatibility.
> >>>>>>>> ABI compatibility means we can upgrade the library without recompiling
> >>>>>>>> the application and it must work.
> >>>>>>>> You think it is a false positive because you assume the application
> >>>>>>>> "picks" the new value. I think you miss the case where the new value
> >>>>>>>> is returned by a function in the upgraded library.
> >>>>>>>>
> >>>>>>>>> There are also no structs on the API which contain arrays using this
> >>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
> >>>>>>>>> mismatch in memory addresses.
> >>>>>>>>
> >>>>>>>> Let me demonstrate where the API may "use" the new value
> >>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >>>>>>>>
> >>>>>>>> Once upon a time a DPDK application counting the number of devices
> >>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
> >>>>>>>> It is done in an array indexed by algo id:
> >>>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >>>>>>>> The application is compiled with DPDK 19.11,
> >>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >>>>>>>> So the size of the application array aead_dev_count is 3.
> >>>>>>>> This binary is run with DPDK 20.02,
> >>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >>>>>>>> The application uses this value:
> >>>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >>>>>>>> The application is crashing because of out of bound access.
> >>>>>>>
> >>>>>>> I'd say this is an example of bad written app.
> >>>>>>> It probably should check that returned by library value doesn't
> >>>>>>> exceed its internal array size.
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Application should ignore values >= MAX.
> >>>>>
> >>>>> Of course, blaming the API user is a lot easier than looking at the API.
> >>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> >>>>> as the max value for the application.
> >>>>> Value ranges are part of the ABI compatibility contract.
> >>>>> It seems you expect the application developer to be aware that
> >>>>> DPDK could return a higher value, so the application should
> >>>>> check every enum values after calling an API. CRAZY.
> >>>>>
> >>>>> When we decide to announce an ABI compatibility and do some marketing,
> >>>>> everyone is OK. But when we need to really make our ABI compatible,
> >>>>> I see little or no effort. DISAPPOINTING.
> >>>>
> >>>> This is not to blame the user or to do less work, this is more sane approach
> >>>> that library provides the _END/_MAX value and application uses it as valid range
> >>>> check.
> >>>>
> >>>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
> >>>>>> to be sure bad written applications not affected?
> >>>>>
> >>>>> I suggest we must consider not breaking any assumption made on the API.
> >>>>> Here we are breaking the enum range because nothing mentions _LIST_END
> >>>>> is not really the absolute end of the enum.
> >>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
> >>>>>
> >>>>> - _LIST_END
> >>>>> + _LIST_END, /* an ABI-compatible version may increase this value */
> >>>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> >>>>> };
> >>>>>
> >>>>
> >>>> What is the point of "_LIST_MAX" here?
> >>>
> >>> _LIST_MAX is range of value that DPDK can return in the ABI contract.
> >>> So the appplication can rely on the range 0.._LIST_MAX.
> >>>
> >>>> Application should know the "_LIST_END" of when it has been compiled for the
> >>>> valid range check. Next time it is compiled "_LIST_END" may be different value
> >>>> but same logic applies.
> >>>
> >>> No, ABI compatibility contract means you can compile your application
> >>> with DPDK 19.11.0 and run it with DPDK 20.02.
> >>> So _LIST_END comes from 19.11 and does not include ChachaPoly.
> >>
> >> That is what I mean, let me try to give a sample.
> >>
> >> DPDK19.11 returns, A=1, B=2, END=3
> >>
> >> Application compiled with DPDK19.11, it will process A, B and ignore anything ">= 3"
> > 
> > No, the application will not ignore anything ">=3" as I explained above,
> > and you blamed the application for it.
> > Nothing in the API says the application must filter value higher than 3,
> > because as of now, values higher than 3 are PMD bug.
> 
> When application compiled, that is the END value, anything bigger than this
> value is not valid, if any application use the return value directly I think it
> is doing something wrong.
> But yes there may be applications relying on library will always send in the
> range. We never communicated this. But we can add comments to clarify this.
> 
I don't think we should do so, since any function returning an enum should,
by definition, never return an out-of-range value. I strongly agree
with the suggestion of versioning the functions so that the ranges seen by
apps are clamped to the expected 19.11 compatible values.
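
As a sketch of what such clamping could look like: the types and functions
below are hypothetical and simplified (a flat list of algo ids instead of the
real capability structures), and the symbol versioning that would bind old
binaries to the _v1911 variant is not shown.

```c
#include <stddef.h>

/* Hypothetical stand-in for the AEAD algo enum in the newer release. */
enum aead_algo {
	AEAD_AES_CCM = 1,
	AEAD_AES_GCM,
	AEAD_CHACHA20_POLY1305, /* value added after 19.11 */
	AEAD_LIST_END
};

/* Last value that a 19.11 application knows about. */
#define AEAD_LAST_V1911 AEAD_AES_GCM

/* Current behaviour: report every algo the device supports. */
static size_t caps_get(const enum aead_algo *dev_caps, size_t n,
		       enum aead_algo *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = dev_caps[i];
	return n;
}

/*
 * 19.11-compatible variant: call the current function, then drop
 * every value outside the range [0..AES_GCM] known to 19.11.
 */
static size_t caps_get_v1911(const enum aead_algo *dev_caps, size_t n,
			     enum aead_algo *out)
{
	size_t total = caps_get(dev_caps, n, out);
	size_t kept = 0;

	for (size_t i = 0; i < total; i++)
		if (out[i] <= AEAD_LAST_V1911)
			out[kept++] = out[i];
	return kept;
}
```

An application built against 19.11 would then never see the Chacha-Poly value,
while one built against 20.02 gets the full list.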
  
Fiona Trahe Feb. 4, 2020, 10:10 a.m. UTC | #36
And not used for sizing
> >
> > There a c) though right.
> > We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
> 
> It has a lot of friends, but it sounds like the right approach.
> Is someone looking into this?
[Fiona] Yes. But it is not clear yet if it can be done by tomorrow.

But even if feasible, that only works around the current issue.
There is a bigger issue to be decided here:
Should we be removing LIST_END/MAX values from all enums in 20.11?
Or should we define, through an API comment, that they should only be used as a
range boundary and NOT to size an array, so that having a fixed value is not
part of the API contract?
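
To illustrate the second option, a minimal sketch with hypothetical names in
which LIST_END appears only as a range boundary and nothing is sized by it:

```c
/* Hypothetical stand-in enum; LIST_END may grow in a later release. */
enum aead_algo {
	AEAD_AES_CCM = 1,
	AEAD_AES_GCM,
	AEAD_LIST_END /* range boundary only, do not size arrays with it */
};

/* Map an algo id to a name without any table sized by LIST_END. */
static const char *aead_name(int algo)
{
	if (algo <= 0 || algo >= AEAD_LIST_END)
		return "unknown"; /* value not known at build time */

	switch (algo) {
	case AEAD_AES_CCM:
		return "aes-ccm";
	case AEAD_AES_GCM:
		return "aes-gcm";
	default:
		return "unknown";
	}
}
```

Since no storage depends on LIST_END here, a larger value returned by a newer
library simply falls into the "unknown" path.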
  
Akhil Goyal Feb. 4, 2020, 10:16 a.m. UTC | #37
Hi,
> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > 03/02/2020 10:30, Ferruh Yigit:
> >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >>> 02/02/2020 14:05, Thomas Monjalon:
> >>>> 31/01/2020 15:16, Trahe, Fiona:
> >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> >>>>>>>>
> >>>>>>>> I believe these enums will be used only in case of ASYM case which is
> experimental.
> >>>>>>>
> >>>>>>> Independent from being experiment and not, this shouldn't be a
> problem, I think
> >>>>>>> this is a false positive.
> >>>>>>>
> >>>>>>> The ABI break can happen when a struct has been shared between the
> application
> >>>>>>> and the library (DPDK) and the layout of that memory know differently
> by
> >>>>>>> application and the library.
> >>>>>>>
> >>>>>>> Here in all cases, there is no layout/size change.
> >>>>>>>
> >>>>>>> As to the value changes of the enums, since application compiled with
> old DPDK,
> >>>>>>> it will know only up to '6', 7 and more means invalid to the application.
> So it
> >>>>>>> won't send these values also it should ignore these values from library.
> Only
> >>>>>>> consequence is old application won't able to use new features those
> new enums
> >>>>>>> provide but that is expected/normal.
> >>>>>>
> >>>>>> If library give higher value than expected by the application,
> >>>>>> if the application uses this value as array index,
> >>>>>> there can be an access out of bounds.
> >>>>>
> >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a
> problem.
> >>>>> But for the same issue with sym crypto below, I believe Ferruh's
> explanation makes
> >>>>> sense and I don't see how there can be an API breakage.
> >>>>> So if an application hasn't compiled against the new lib it will be still using
> the old value
> >>>>> which will be within bounds. If it's picking up the higher new value from
> the lib it must
> >>>>> have been compiled against the lib so shouldn't have problems.
> >>>>
> >>>> You say there is no ABI issue because the application will be re-compiled
> >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >>>> But this is not relevant for ABI compatibility.
> >>>> ABI compatibility means we can upgrade the library without recompiling
> >>>> the application and it must work.
> >>>> You think it is a false positive because you assume the application
> >>>> "picks" the new value. I think you miss the case where the new value
> >>>> is returned by a function in the upgraded library.
> >>>>
> >>>>> There are also no structs on the API which contain arrays using this
> >>>>> for sizing, so I don't see an opportunity for an appl to have a
> >>>>> mismatch in memory addresses.
> >>>>
> >>>> Let me demonstrate where the API may "use" the new value
> >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the
> application.
> >>>>
> >>>> Once upon a time a DPDK application counting the number of devices
> >>>> supporting each AEAD algo (in order to find the best supported algo).
> >>>> It is done in an array indexed by algo id:
> >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >>>> The application is compiled with DPDK 19.11,
> >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >>>> So the size of the application array aead_dev_count is 3.
> >>>> This binary is run with DPDK 20.02,
> >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >>>> The application uses this value:
> >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >>>> The application is crashing because of out of bound access.
> >>>
> >>> I'd say this is an example of bad written app.
> >>> It probably should check that returned by library value doesn't
> >>> exceed its internal array size.
> >>
> >> +1
> >>
> >> Application should ignore values >= MAX.
> >
> > Of course, blaming the API user is a lot easier than looking at the API.
> > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > as the max value for the application.
> > Value ranges are part of the ABI compatibility contract.
> > It seems you expect the application developer to be aware that
> > DPDK could return a higher value, so the application should
> > check every enum values after calling an API. CRAZY.
> >
> > When we decide to announce an ABI compatibility and do some marketing,
> > everyone is OK. But when we need to really make our ABI compatible,
> > I see little or no effort. DISAPPOINTING.
> 
> This is not to blame the user or to do less work, this is more sane approach
> that library provides the _END/_MAX value and application uses it as valid range
> check.
> 
> >
> >> Do you suggest we don't extend any enum or define between ABI breakage
> releases
> >> to be sure bad written applications not affected?
> >
> > I suggest we must consider not breaking any assumption made on the API.
> > Here we are breaking the enum range because nothing mentions _LIST_END
> > is not really the absolute end of the enum.
> > The solution is to make the change below in 20.02 + backport in 19.11.1:
> >
> > - _LIST_END
> > + _LIST_END, /* an ABI-compatible version may increase this value */
> > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > };
> >
> 
> What is the point of "_LIST_MAX" here?
> 
> Application should know the "_LIST_END" of when it has been compiled for the
> valid range check. Next time it is compiled "_LIST_END" may be different value
> but same logic applies.
> 
> When "_LIST_END" is missing, application can't protect itself, in that case
> library should send only the values application knows when it is compiled, this
> means either we can't extend our enum/defines until next ABI breakage, or we
> need to do ABI versioning to the functions that returns an enum each time enum
> value extended.
> 
> I believe it is saner to provide _END/_MAX values to the application to use. And
> if required comment them to clarify the expected usage.
> 
> But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
> mean anything to application.
> 

Can we have something like this:
enum rte_crypto_aead_algorithm {
        RTE_CRYPTO_AEAD_AES_CCM = 1,
        /**< AES algorithm in CCM mode. */
        RTE_CRYPTO_AEAD_AES_GCM,
        /**< AES algorithm in GCM mode. */
        RTE_CRYPTO_AEAD_LIST_END,
        /**< List end for 19.11 ABI compatibility */
        RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
        /**< Chacha20 cipher with poly1305 authenticator */
        RTE_CRYPTO_AEAD_LIST_END_2011
        /**< List end for 20.11 ABI compatibility */
};

And in the 20.11 release we move RTE_CRYPTO_AEAD_LIST_END to the end and remove RTE_CRYPTO_AEAD_LIST_END_2011.

I believe this will be OK for any application which needs to use Chacha-Poly, as long as it assumes that this algo is
experimental and will move to the formal list in 20.11. This can be noted in the documentation.
I believe there is no way to add a new enum value as experimental so far. This way we can formalize this
requirement as well.

I believe this way the effect of the ABI breakage will be nullified.
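
For reference, a small sketch of the numeric values this layout would produce.
The enum below is a local copy with a SKETCH_ prefix (an assumption, so that it
does not clash with the real header), and sketch_in_1911_range() is a
hypothetical helper added only to show which ids fall inside the range a 19.11
binary knows about.

```c
#include <assert.h>

/* Local copy of the layout proposed above (names prefixed for the sketch). */
enum sketch_aead_algorithm {
	SKETCH_AEAD_AES_CCM = 1,
	SKETCH_AEAD_AES_GCM,
	SKETCH_AEAD_LIST_END,          /* stays at 3, as in 19.11 */
	SKETCH_AEAD_CHACHA20_POLY1305, /* becomes 4 */
	SKETCH_AEAD_LIST_END_2011      /* becomes 5 */
};

static_assert(SKETCH_AEAD_LIST_END == 3, "19.11 list end is unchanged");
static_assert(SKETCH_AEAD_LIST_END_2011 == 5, "new list end");

/* Hypothetical helper: is the id inside the range known to a 19.11 build? */
static int
sketch_in_1911_range(int algo)
{
	return algo >= SKETCH_AEAD_AES_CCM && algo < SKETCH_AEAD_LIST_END;
}
```

With this layout the value of LIST_END does not change, but the Chacha-Poly id
(4) is still outside the range a 19.11 binary was built with, so whether that
binary copes with it depends on how it handles unknown ids.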


-Akhil

> > Then *_LIST_END values could be ignored by libabigail with such a change.
> >
> > If such a patch is not done by tomorrow, I will have to revert
> > Chacha-Poly commits before 20.02-rc2, because
> >
> > 1/ LIST_END, without any comment, means "size of range"
> > 2/ we do not blame users for undocumented ABI changes
> > 3/ we respect the ABI compatibility contract
> >
> >
  
Kevin Traynor Feb. 4, 2020, 10:17 a.m. UTC | #38
On 04/02/2020 09:56, Ferruh Yigit wrote:
> On 2/4/2020 9:45 AM, Thomas Monjalon wrote:
>> 04/02/2020 10:19, Ferruh Yigit:
>>> On 2/3/2020 6:40 PM, Thomas Monjalon wrote:
>>>> 03/02/2020 18:40, Ferruh Yigit:
>>>>> On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
>>>>>> 03/02/2020 10:30, Ferruh Yigit:
>>>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>>>>> 30/01/2020 17:09, Ferruh Yigit:
>>>>>>>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
>>>>>>>>>>>>
>>>>>>>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
>>>>>>>>>>>> this is a false positive.
>>>>>>>>>>>>
>>>>>>>>>>>> The ABI break can happen when a struct has been shared between the application
>>>>>>>>>>>> and the library (DPDK) and the layout of that memory know differently by
>>>>>>>>>>>> application and the library.
>>>>>>>>>>>>
>>>>>>>>>>>> Here in all cases, there is no layout/size change.
>>>>>>>>>>>>
>>>>>>>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
>>>>>>>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
>>>>>>>>>>>> won't send these values also it should ignore these values from library. Only
>>>>>>>>>>>> consequence is old application won't able to use new features those new enums
>>>>>>>>>>>> provide but that is expected/normal.
>>>>>>>>>>>
>>>>>>>>>>> If library give higher value than expected by the application,
>>>>>>>>>>> if the application uses this value as array index,
>>>>>>>>>>> there can be an access out of bounds.
>>>>>>>>>>
>>>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>>>>>> sense and I don't see how there can be an API breakage.
>>>>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>>>>
>>>>>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>>>>>> But this is not relevant for ABI compatibility.
>>>>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>>>>> the application and it must work.
>>>>>>>>> You think it is a false positive because you assume the application
>>>>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>>>>> is returned by a function in the upgraded library.
>>>>>>>>>
>>>>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>>>>> mismatch in memory addresses.
>>>>>>>>>
>>>>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>>>>
>>>>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>>>>> It is done in an array indexed by algo id:
>>>>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>>>>> The application is compiled with DPDK 19.11,
>>>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>>>>> This binary is run with DPDK 20.02,
>>>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>>>>> The application uses this value:
>>>>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>>>>> The application is crashing because of out of bound access.
>>>>>>>>
>>>>>>>> I'd say this is an example of bad written app.
>>>>>>>> It probably should check that returned by library value doesn't
>>>>>>>> exceed its internal array size.
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Application should ignore values >= MAX.
>>>>>>
>>>>>> Of course, blaming the API user is a lot easier than looking at the API.
>>>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>>>>> as the max value for the application.
>>>>>> Value ranges are part of the ABI compatibility contract.
>>>>>> It seems you expect the application developer to be aware that
>>>>>> DPDK could return a higher value, so the application should
>>>>>> check every enum values after calling an API. CRAZY.
>>>>>>
>>>>>> When we decide to announce an ABI compatibility and do some marketing,
>>>>>> everyone is OK. But when we need to really make our ABI compatible,
>>>>>> I see little or no effort. DISAPPOINTING.
>>>>>
>>>>> This is not to blame the user or to do less work, this is more sane approach
>>>>> that library provides the _END/_MAX value and application uses it as valid range
>>>>> check.
>>>>>
>>>>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>>>>>> to be sure bad written applications not affected?
>>>>>>
>>>>>> I suggest we must consider not breaking any assumption made on the API.
>>>>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>>>>> is not really the absolute end of the enum.
>>>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>>>>
>>>>>> - _LIST_END
>>>>>> + _LIST_END, /* an ABI-compatible version may increase this value */
>>>>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>>>>>> };
>>>>>>
>>>>>
>>>>> What is the point of "_LIST_MAX" here?
>>>>
>>>> _LIST_MAX is range of value that DPDK can return in the ABI contract.
>>>> So the application can rely on the range 0.._LIST_MAX.
>>>>
>>>>> Application should know the "_LIST_END" of when it has been compiled for the
>>>>> valid range check. Next time it is compiled "_LIST_END" may be different value
>>>>> but same logic applies.
>>>>
>>>> No, ABI compatibility contract means you can compile your application
>>>> with DPDK 19.11.0 and run it with DPDK 20.02.
>>>> So _LIST_END comes from 19.11 and does not include ChachaPoly.
>>>
>>> That is what I mean, let me try to give a sample.
>>>
>>> DPDK19.11 returns, A=1, B=2, END=3
>>>
>>> Application compiled with DPDK19.11, it will process A, B and ignore anything ">= 3"
>>
>> No, the application will not ignore anything ">=3" as I explained above,
>> and you blamed the application for it.
>> Nothing in the API says the application must filter value higher than 3,
>> because as of now, values higher than 3 are PMD bug.
> 
> When application compiled, that is the END value, anything bigger than this
> value is not valid, if any application use the return value directly I think it
> is doing something wrong.

I don't think we can make an assumption that the application will/should
range check *and* silently ignore what it considers out of bounds values.

An application may not range check, but if it does, it may ignore these
new values (we got lucky), or it could print errors or even decide to
abort because it considers that DPDK is now returning values higher than
the (compiled) max so something must be corrupt.

Versioning sounds like the best solution to me too, but I'm not sure how
difficult the mechanics are in this case.

> But yes there may be applications relying on library will always send in the
> range. We never communicated this. But we can add comments to clarify this.
> 
>>
>>
>>> DPDK20.02 returns A=1, B=2, C=3, D=4, END=5
>>>
>>> Old application will still only will know/use A, B and can ignore when library
>>> sends C=3, D=4 etc...
>>>
>>>
>>> In above, if you add another limit as you suggested, like MAX=10 and ask
>>> application to use it,
>>>
>>> Application compiled with DPDK19.11 will be OK since library only sends A,B and
>>> application uses them.
>>>
>>> But with DPDK20.02 application may have problem, since library will be sending
>>> C=3, which is valid according to the check " <= MAX (10)", how application will
>>> know to ignore it.
>>
>> Why application should ignore value C=3 with DPDK 20.02?
> 
> This is the application compiled with DPDK19.11, and running with DPDK20.02.
> 
> So for the application this is the value >= MAX and something it doesn't know
> what to do.
> 
>>
>>
>>> So application should use _END to know the valid ones according it, if so what
>>> is the point of having _MAX.
>>>
>>>
>>>>> When "_LIST_END" is missing, application can't protect itself, in that case
>>>>> library should send only the values application knows when it is compiled, this
>>>>> means either we can't extend our enum/defines until next ABI breakage, or we
>>>>> need to do ABI versioning to the functions that returns an enum each time enum
>>>>> value extended.
>>>>
>>>> If we define _LIST_MAX as a bigger value than current _LIST_END,
>>>> we have some room to add values in between.
>>>>
>>>> If (as of now) we don't have _LIST_MAX room, then yes we must version
>>>> the functions returning the enum.
>>>> In this case, the proper solution is to implement
>>>> rte_cryptodev_info_get_v1911() so it filters out
>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
>>>> With this solution, an application compiled with DPDK 19.11 will keep
>>>> seeing the same range as before, while a 20.02 application could
>>>> see and use ChachaPoly.
>>>> This is another proposal that I was expecting from the crypto team,
>>>> instead of claiming there is no issue (and wasting precious time).
>>>>
>>>>
>>>>> I believe it is saner to provide _END/_MAX values to the application to use. And
>>>>> if required comment them to clarify the expected usage.
>>>>>
>>>>> But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
>>>>> mean anything to application.
>>>>
>>>> I don't understand what you mean. I think you misunderstood what is ABI compat.
>>>>
>>>>
>>>>>> Then *_LIST_END values could be ignored by libabigail with such a change.
>>>>>>
>>>>>> If such a patch is not done by tomorrow, I will have to revert
>>>>>> Chacha-Poly commits before 20.02-rc2, because
>>>>>>
>>>>>> 1/ LIST_END, without any comment, means "size of range"
>>>>>> 2/ we do not blame users for undocumented ABI changes
>>>>>> 3/ we respect the ABI compatibility contract
>>
>>
>>
>
  
Thomas Monjalon Feb. 4, 2020, 10:24 a.m. UTC | #39
RED FLAG

I don't see a lot of reactions, so let me summarize the issue.
We need action TODAY!

The API suggests that rte_cryptodev_info_get() cannot return
a value >= 3 (RTE_CRYPTO_AEAD_LIST_END in 19.11).
The current 20.02 returns 3 (RTE_CRYPTO_AEAD_CHACHA20_POLY1305).
The ABI compatibility contract is currently broken.

There are 3 possible outcomes:

a) Change the API comments and backport to 19.11.1
The details are discussed between Ferruh and me.
Either put responsibility on API user (with explicit comment),
or expose ABI extension allowance with a new API max value.
In both cases, this is breaking the implicit contract of 19.11.0.
This option can be chosen only if release and ABI maintainers
vote for it.

b) Revert Chacha-Poly from 20.02-rc2.

c) Add versioned function rte_cryptodev_info_get_v1911()
which calls rte_cryptodev_info_get() and filters out
RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
So Chacha-Poly capability would be seen and usable only
if compiling with DPDK 20.02.

I hope the actions are clear for everybody:
- ABI and release maintainers must say yes or no to the proposal (a)
- In the meantime, crypto team must send a patch for the proposal (c)
- If (a) and (c) are not possible at the end of today, I will take (b)

Note: do not say the time is too short for (c), as it has been possible to work
on such a solution since the issue was exposed last Wednesday.
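
A rough sketch of the filtering step in option (c), under stated assumptions:
the types below are simplified stand-ins and not the DPDK structures, the
capability list is assumed to be an array terminated by an "undefined" entry,
and sketch_filter_caps_v1911() is a hypothetical name. The symbol versioning
needed to bind such a function to 19.11 binaries is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions), not the DPDK types. */
enum sketch_op_type { SKETCH_OP_UNDEFINED = 0, SKETCH_OP_AEAD };

struct sketch_capability {
	enum sketch_op_type op; /* SKETCH_OP_UNDEFINED terminates the list */
	int aead_algo;
};

/* Value of RTE_CRYPTO_AEAD_LIST_END in 19.11, as quoted in this thread. */
#define SKETCH_AEAD_LIST_END_1911 3

/*
 * Copy the capabilities from src to dst, dropping AEAD entries whose algo id
 * is outside the 19.11 range. dst_len counts the terminator slot as well.
 * Returns the number of entries kept, or -1 if dst is too small.
 */
static int
sketch_filter_caps_v1911(const struct sketch_capability *src,
		struct sketch_capability *dst, size_t dst_len)
{
	size_t n = 0;

	if (dst_len == 0)
		return -1;
	for (; src->op != SKETCH_OP_UNDEFINED; src++) {
		if (src->op == SKETCH_OP_AEAD &&
				src->aead_algo >= SKETCH_AEAD_LIST_END_1911)
			continue; /* unknown to a 19.11 application */
		if (n + 1 >= dst_len)
			return -1; /* no room left for entry + terminator */
		dst[n++] = *src;
	}
	dst[n].op = SKETCH_OP_UNDEFINED;
	dst[n].aead_algo = 0;
	return (int)n;
}
```

An application built against 19.11 would then only ever see ids 1 and 2 in the
filtered list, while one built against 20.02 would use the unfiltered list.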


03/02/2020 22:07, Thomas Monjalon:
> 03/02/2020 19:55, Ray Kinsella:
> > On 03/02/2020 17:34, Thomas Monjalon wrote:
> > > 03/02/2020 18:09, Thomas Monjalon:
> > >> 03/02/2020 10:30, Ferruh Yigit:
> > >>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > >>>> 02/02/2020 14:05, Thomas Monjalon:
> > >>>>> 31/01/2020 15:16, Trahe, Fiona:
> > >>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > >>>>>>> If library give higher value than expected by the application,
> > >>>>>>> if the application uses this value as array index,
> > >>>>>>> there can be an access out of bounds.
> > >>>>>>
> > >>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> > >>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> > >>>>>> sense and I don't see how there can be an API breakage.
> > >>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> > >>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> > >>>>>> have been compiled against the lib so shouldn't have problems.
> > >>>>>
> > >>>>> You say there is no ABI issue because the application will be re-compiled
> > >>>>> for the updated library. Indeed, compilation fixes compatibility issues.
> > >>>>> But this is not relevant for ABI compatibility.
> > >>>>> ABI compatibility means we can upgrade the library without recompiling
> > >>>>> the application and it must work.
> > >>>>> You think it is a false positive because you assume the application
> > >>>>> "picks" the new value. I think you miss the case where the new value
> > >>>>> is returned by a function in the upgraded library.
> > >>>>>
> > >>>>>> There are also no structs on the API which contain arrays using this
> > >>>>>> for sizing, so I don't see an opportunity for an appl to have a
> > >>>>>> mismatch in memory addresses.
> > >>>>>
> > >>>>> Let me demonstrate where the API may "use" the new value
> > >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> > >>>>>
> > >>>>> Once upon a time a DPDK application counting the number of devices
> > >>>>> supporting each AEAD algo (in order to find the best supported algo).
> > >>>>> It is done in an array indexed by algo id:
> > >>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > >>>>> The application is compiled with DPDK 19.11,
> > >>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > >>>>> So the size of the application array aead_dev_count is 3.
> > >>>>> This binary is run with DPDK 20.02,
> > >>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > >>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > >>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > >>>>> The application uses this value:
> > >>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > >>>>> The application is crashing because of out of bound access.
> > >>>>
> > >>>> I'd say this is an example of bad written app.
> > >>>> It probably should check that returned by library value doesn't
> > >>>> exceed its internal array size.
> > >>>
> > >>> +1
> > >>>
> > >>> Application should ignore values >= MAX.
> > >>
> > >> Of course, blaming the API user is a lot easier than looking at the API.
> > >> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > >> as the max value for the application.
> > >> Value ranges are part of the ABI compatibility contract.
> > >> It seems you expect the application developer to be aware that
> > >> DPDK could return a higher value, so the application should
> > >> check every enum values after calling an API. CRAZY.
> > >>
> > >> When we decide to announce an ABI compatibility and do some marketing,
> > >> everyone is OK. But when we need to really make our ABI compatible,
> > >> I see little or no effort. DISAPPOINTING.
> > >>
> > >>> Do you suggest we don't extend any enum or define between ABI breakage releases
> > >>> to be sure bad written applications not affected?
> > >>
> > >> I suggest we must consider not breaking any assumption made on the API.
> > >> Here we are breaking the enum range because nothing mentions _LIST_END
> > >> is not really the absolute end of the enum.
> > >> The solution is to make the change below in 20.02 + backport in 19.11.1:
> > > 
> > > Thinking twice, merging such change before 20.11 is breaking the
> > > ABI assumption based on the API 19.11.0.
> > > I ask the release maintainers (Luca, Kevin, David and me) and
> > > the ABI maintainers (Neil and Ray) to vote for a or b solution:
> > > 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
> > 
> > That would still be an ABI breakage though right.
> > 
> > > 	b) wait 20.11 and revert Chacha-Poly from 20.02
> > 
> > Thanks for analysis above Fiona, Ferruh and all. 
> > 
> > That is a nasty one alright - there is no "good" answer here.
> > I agree with Ferruh's sentiments overall, we should rethink this API for 20.11. 
> > Could do without an enumeration?
> > 
> > There a c) though right.
> > We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
> > So they only support/acknowledge the existence of Chacha-Poly for applications build against > 20.02.
> 
> I agree there is a c) as I proposed in another email:
> http://mails.dpdk.org/archives/dev/2020-February/156919.html
> "
> In this case, the proper solution is to implement
> rte_cryptodev_info_get_v1911() so it filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> With this solution, an application compiled with DPDK 19.11 will keep
> seeing the same range as before, while a 20.02 application could
> see and use ChachaPoly.
> "
> 
> > It would be painful I know.
> 
> Not so painful in my opinion.
> Just need to call rte_cryptodev_info_get() from
> rte_cryptodev_info_get_v1911() and filter the value
> in the 19.11 range: [0..AES_GCM].
> 
> > It would also mean that Chacha-Poly would only be available to
> > those building against >= 20.02.
> 
> Yes exactly.
> 
> The addition of comments and LIST_MAX like below are still valid
> to avoid versioning after 20.11.
> 
> > >> - _LIST_END
> > >> + _LIST_END, /* an ABI-compatible version may increase this value */
> > >> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > >> };
> > >>
> > >> Then *_LIST_END values could be ignored by libabigail with such a change.
> 
> In order to avoid ABI check complaining, the best is to completely
> remove LIST_END in DPDK 20.11.
> 
> 
> > >> If such a patch is not done by tomorrow, I will have to revert
> > >> Chacha-Poly commits before 20.02-rc2, because
> > >>
> > >> 1/ LIST_END, without any comment, means "size of range"
> > >> 2/ we do not blame users for undocumented ABI changes
> > >> 3/ we respect the ABI compatibility contract
  
Thomas Monjalon Feb. 4, 2020, 10:28 a.m. UTC | #40
04/02/2020 11:16, Akhil Goyal:
> Hi,
> > On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > > 03/02/2020 10:30, Ferruh Yigit:
> > >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > >>> 02/02/2020 14:05, Thomas Monjalon:
> > >>>> 31/01/2020 15:16, Trahe, Fiona:
> > >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> > >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > >>>>>>>>
> > >>>>>>>> I believe these enums will be used only in case of ASYM case which is
> > experimental.
> > >>>>>>>
> > >>>>>>> Independent from being experiment and not, this shouldn't be a
> > problem, I think
> > >>>>>>> this is a false positive.
> > >>>>>>>
> > >>>>>>> The ABI break can happen when a struct has been shared between the
> > application
> > >>>>>>> and the library (DPDK) and the layout of that memory know differently
> > by
> > >>>>>>> application and the library.
> > >>>>>>>
> > >>>>>>> Here in all cases, there is no layout/size change.
> > >>>>>>>
> > >>>>>>> As to the value changes of the enums, since application compiled with
> > old DPDK,
> > >>>>>>> it will know only up to '6', 7 and more means invalid to the application.
> > So it
> > >>>>>>> won't send these values also it should ignore these values from library.
> > Only
> > >>>>>>> consequence is old application won't able to use new features those
> > new enums
> > >>>>>>> provide but that is expected/normal.
> > >>>>>>
> > >>>>>> If library give higher value than expected by the application,
> > >>>>>> if the application uses this value as array index,
> > >>>>>> there can be an access out of bounds.
> > >>>>>
> > >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a
> > problem.
> > >>>>> But for the same issue with sym crypto below, I believe Ferruh's
> > explanation makes
> > >>>>> sense and I don't see how there can be an API breakage.
> > >>>>> So if an application hasn't compiled against the new lib it will be still using
> > the old value
> > >>>>> which will be within bounds. If it's picking up the higher new value from
> > the lib it must
> > >>>>> have been compiled against the lib so shouldn't have problems.
> > >>>>
> > >>>> You say there is no ABI issue because the application will be re-compiled
> > >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> > >>>> But this is not relevant for ABI compatibility.
> > >>>> ABI compatibility means we can upgrade the library without recompiling
> > >>>> the application and it must work.
> > >>>> You think it is a false positive because you assume the application
> > >>>> "picks" the new value. I think you miss the case where the new value
> > >>>> is returned by a function in the upgraded library.
> > >>>>
> > >>>>> There are also no structs on the API which contain arrays using this
> > >>>>> for sizing, so I don't see an opportunity for an appl to have a
> > >>>>> mismatch in memory addresses.
> > >>>>
> > >>>> Let me demonstrate where the API may "use" the new value
> > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the
> > application.
> > >>>>
> > >>>> Once upon a time a DPDK application counting the number of devices
> > >>>> supporting each AEAD algo (in order to find the best supported algo).
> > >>>> It is done in an array indexed by algo id:
> > >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > >>>> The application is compiled with DPDK 19.11,
> > >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > >>>> So the size of the application array aead_dev_count is 3.
> > >>>> This binary is run with DPDK 20.02,
> > >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > >>>> The application uses this value:
> > >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > >>>> The application is crashing because of out of bound access.
> > >>>
> > >>> I'd say this is an example of bad written app.
> > >>> It probably should check that returned by library value doesn't
> > >>> exceed its internal array size.
> > >>
> > >> +1
> > >>
> > >> Application should ignore values >= MAX.
> > >
> > > Of course, blaming the API user is a lot easier than looking at the API.
> > > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > > as the max value for the application.
> > > Value ranges are part of the ABI compatibility contract.
> > > It seems you expect the application developer to be aware that
> > > DPDK could return a higher value, so the application should
> > > check every enum values after calling an API. CRAZY.
> > >
> > > When we decide to announce an ABI compatibility and do some marketing,
> > > everyone is OK. But when we need to really make our ABI compatible,
> > > I see little or no effort. DISAPPOINTING.
> > 
> > This is not to blame the user or to do less work, this is more sane approach
> > that library provides the _END/_MAX value and application uses it as valid range
> > check.
> > 
> > >
> > >> Do you suggest we don't extend any enum or define between ABI breakage
> > releases
> > >> to be sure bad written applications not affected?
> > >
> > > I suggest we must consider not breaking any assumption made on the API.
> > > Here we are breaking the enum range because nothing mentions _LIST_END
> > > is not really the absolute end of the enum.
> > > The solution is to make the change below in 20.02 + backport in 19.11.1:
> > >
> > > - _LIST_END
> > > + _LIST_END, /* an ABI-compatible version may increase this value */
> > > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > > };
> > >
> > 
> > What is the point of "_LIST_MAX" here?
> > 
> > Application should know the "_LIST_END" of when it has been compiled for the
> > valid range check. Next time it is compiled "_LIST_END" may be different value
> > but same logic applies.
> > 
> > When "_LIST_END" is missing, application can't protect itself, in that case
> > library should send only the values application knows when it is compiled, this
> > means either we can't extend our enum/defines until next ABI breakage, or we
> > need to do ABI versioning to the functions that returns an enum each time enum
> > value extended.
> > 
> > I believe it is saner to provide _END/_MAX values to the application to use. And
> > if required comment them to clarify the expected usage.
> > 
> > But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
> > mean anything to application.
> > 
> 
> Can we have something like 
> enum rte_crypto_aead_algorithm {
>         RTE_CRYPTO_AEAD_AES_CCM = 1,
>         /**< AES algorithm in CCM mode. */
>         RTE_CRYPTO_AEAD_AES_GCM,
>         /**< AES algorithm in GCM mode. */
>         RTE_CRYPTO_AEAD_LIST_END,
>         /**< List end for 19.11 ABI compatibility */
>         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
>         /**< Chacha20 cipher with poly1305 authenticator */
>         RTE_CRYPTO_AEAD_LIST_END_2011
>         /**< List end for 20.11 ABI compatibility */
> };
> 
> And in 20.11 release we alter the RTE_CRYPTO_AEAD_LIST_END to the end and remove RTE_CRYPTO_AEAD_LIST_END_2011
> 
> I believe it will be ok for any application which need to use the chacha poly assume that this algo is
> Experimental and will move to formal list in 20.11. This can be documented in the documentation.
> I believe there is no way to add a new enum as experimental so far. This way we can formalize this
> requirement as well.
> 
> I believe this way effect of ABI breakage will be nullified.

This is a possibility in the (a) proposal.
But it breaks API (and ABI) because a high value is returned
while not expected by the application.

I guess ABI and release maintainers will vote no to such breakage.
Note: I vote no.
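
The out-of-bounds case discussed in this thread can be sketched in a few
lines of C. This is only a minimal sketch, not DPDK code: the app_ and APP_
names are hypothetical stand-ins, and the values mirror the numbering quoted
above (LIST_END = 3 in 19.11, Chacha20-Poly1305 = 3 in 20.02). It shows the
range check an application would need if it sizes an array with the LIST_END
it was compiled against.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the enum as seen by an application compiled against 19.11. */
enum app_aead_algo_1911 {
	APP_AEAD_AES_CCM = 1,
	APP_AEAD_AES_GCM,
	APP_AEAD_LIST_END /* = 3 at compile time */
};

/* Value a 20.02 library may report for Chacha20-Poly1305 (per the thread). */
#define APP_AEAD_CHACHA20_POLY1305_2002 3

/*
 * Count one device for the given algo id.
 * Returns 0 if counted, -1 if the id is outside the range known at
 * compile time (the case that overflows the array without this check).
 */
static int
app_count_aead_dev(int counts[APP_AEAD_LIST_END], int algo)
{
	assert(counts != NULL);
	if (algo < 0 || algo >= APP_AEAD_LIST_END)
		return -1; /* unknown to this binary: ignore it */
	counts[algo]++;
	return 0;
}
```

Without the range check, counts[3] would be written past the end of a
three-element array, which is the crash described above.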


> > > Then *_LIST_END values could be ignored by libabigail with such a change.
> > >
> > > If such a patch is not done by tomorrow, I will have to revert
> > > Chacha-Poly commits before 20.02-rc2, because
> > >
> > > 1/ LIST_END, without any comment, means "size of range"
> > > 2/ we do not blame users for undocumented ABI changes
> > > 3/ we respect the ABI compatibility contract
  
Akhil Goyal Feb. 4, 2020, 10:32 a.m. UTC | #41
> 
> 04/02/2020 11:16, Akhil Goyal:
> > Hi,
> > > On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > > > 03/02/2020 10:30, Ferruh Yigit:
> > > >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > > >>> 02/02/2020 14:05, Thomas Monjalon:
> > > >>>> 31/01/2020 15:16, Trahe, Fiona:
> > > >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > > >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> > > >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > > >>>>>>>>
> > > >>>>>>>> I believe these enums will be used only in case of ASYM case which
> is
> > > experimental.
> > > >>>>>>>
> > > >>>>>>> Independent from being experiment and not, this shouldn't be a
> > > problem, I think
> > > >>>>>>> this is a false positive.
> > > >>>>>>>
> > > >>>>>>> The ABI break can happen when a struct has been shared between
> the
> > > application
> > > >>>>>>> and the library (DPDK) and the layout of that memory know
> differently
> > > by
> > > >>>>>>> application and the library.
> > > >>>>>>>
> > > >>>>>>> Here in all cases, there is no layout/size change.
> > > >>>>>>>
> > > >>>>>>> As to the value changes of the enums, since application compiled
> with
> > > old DPDK,
> > > >>>>>>> it will know only up to '6', 7 and more means invalid to the
> application.
> > > So it
> > > >>>>>>> won't send these values also it should ignore these values from
> library.
> > > Only
> > > >>>>>>> consequence is old application won't able to use new features
> those
> > > new enums
> > > >>>>>>> provide but that is expected/normal.
> > > >>>>>>
> > > >>>>>> If library give higher value than expected by the application,
> > > >>>>>> if the application uses this value as array index,
> > > >>>>>> there can be an access out of bounds.
> > > >>>>>
> > > >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a
> > > problem.
> > > >>>>> But for the same issue with sym crypto below, I believe Ferruh's
> > > explanation makes
> > > >>>>> sense and I don't see how there can be an API breakage.
> > > >>>>> So if an application hasn't compiled against the new lib it will be still
> using
> > > the old value
> > > >>>>> which will be within bounds. If it's picking up the higher new value
> from
> > > the lib it must
> > > >>>>> have been compiled against the lib so shouldn't have problems.
> > > >>>>
> > > >>>> You say there is no ABI issue because the application will be re-
> compiled
> > > >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> > > >>>> But this is not relevant for ABI compatibility.
> > > >>>> ABI compatibility means we can upgrade the library without
> recompiling
> > > >>>> the application and it must work.
> > > >>>> You think it is a false positive because you assume the application
> > > >>>> "picks" the new value. I think you miss the case where the new value
> > > >>>> is returned by a function in the upgraded library.
> > > >>>>
> > > >>>>> There are also no structs on the API which contain arrays using this
> > > >>>>> for sizing, so I don't see an opportunity for an appl to have a
> > > >>>>> mismatch in memory addresses.
> > > >>>>
> > > >>>> Let me demonstrate where the API may "use" the new value
> > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the
> > > application.
> > > >>>>
> > > >>>> Once upon a time a DPDK application counting the number of devices
> > > >>>> supporting each AEAD algo (in order to find the best supported algo).
> > > >>>> It is done in an array indexed by algo id:
> > > >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > > >>>> The application is compiled with DPDK 19.11,
> > > >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > > >>>> So the size of the application array aead_dev_count is 3.
> > > >>>> This binary is run with DPDK 20.02,
> > > >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > > >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > > >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > > >>>> The application uses this value:
> > > >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > > >>>> The application is crashing because of out of bound access.
> > > >>>
> > > >>> I'd say this is an example of bad written app.
> > > >>> It probably should check that returned by library value doesn't
> > > >>> exceed its internal array size.
> > > >>
> > > >> +1
> > > >>
> > > >> Application should ignore values >= MAX.
> > > >
> > > > Of course, blaming the API user is a lot easier than looking at the API.
> > > > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > > > as the max value for the application.
> > > > Value ranges are part of the ABI compatibility contract.
> > > > It seems you expect the application developer to be aware that
> > > > DPDK could return a higher value, so the application should
> > > > check every enum values after calling an API. CRAZY.
> > > >
> > > > When we decide to announce an ABI compatibility and do some marketing,
> > > > everyone is OK. But when we need to really make our ABI compatible,
> > > > I see little or no effort. DISAPPOINTING.
> > >
> > > This is not to blame the user or to do less work, this is more sane approach
> > > that library provides the _END/_MAX value and application uses it as valid
> range
> > > check.
> > >
> > > >
> > > >> Do you suggest we don't extend any enum or define between ABI
> breakage
> > > releases
> > > >> to be sure bad written applications not affected?
> > > >
> > > > I suggest we must consider not breaking any assumption made on the API.
> > > > Here we are breaking the enum range because nothing mentions
> _LIST_END
> > > > is not really the absolute end of the enum.
> > > > The solution is to make the change below in 20.02 + backport in 19.11.1:
> > > >
> > > > - _LIST_END
> > > > + _LIST_END, /* an ABI-compatible version may increase this value */
> > > > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > > > };
> > > >
> > >
> > > What is the point of "_LIST_MAX" here?
> > >
> > > Application should know the "_LIST_END" of when it has been compiled for
> the
> > > valid range check. Next time it is compiled "_LIST_END" may be different
> value
> > > but same logic applies.
> > >
> > > When "_LIST_END" is missing, application can't protect itself, in that case
> > > library should send only the values application knows when it is compiled,
> this
> > > means either we can't extend our enum/defines until next ABI breakage, or
> we
> > > need to do ABI versioning to the functions that returns an enum each time
> enum
> > > value extended.
> > >
> > > I believe it is saner to provide _END/_MAX values to the application to use.
> And
> > > if required comment them to clarify the expected usage.
> > >
> > > But in above suggestion application can't use or rely on "_LIST_MAX", it
> doesn't
> > > mean anything to application.
> > >
> >
> > Can we have something like
> > enum rte_crypto_aead_algorithm {
> >         RTE_CRYPTO_AEAD_AES_CCM = 1,
> >         /**< AES algorithm in CCM mode. */
> >         RTE_CRYPTO_AEAD_AES_GCM,
> >         /**< AES algorithm in GCM mode. */
> >         RTE_CRYPTO_AEAD_LIST_END,
> >         /**< List end for 19.11 ABI compatibility */
> >         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
> >         /**< Chacha20 cipher with poly1305 authenticator */
> >         RTE_CRYPTO_AEAD_LIST_END_2011
> >         /**< List end for 20.11 ABI compatibility */
> > };
> >
> > And in 20.11 release we alter the RTE_CRYPTO_AEAD_LIST_END to the end and remove RTE_CRYPTO_AEAD_LIST_END_2011
> >
> > I believe it will be ok for any application which need to use the chacha poly assume that this algo is
> > Experimental and will move to formal list in 20.11. This can be documented in the documentation.
> > I believe there is no way to add a new enum as experimental so far. This way we can formalize this
> > requirement as well.
> >
> > I believe this way effect of ABI breakage will be nullified.
> 
> This is a possibility in the (a) proposal.
> But it breaks API (and ABI) because a high value is returned
> while not expected by the application.
> 
> I guess ABI and release maintainers will vote no to such breakage.
> Note: I vote no.
> 

If that is the case, I would say we should go with b).

Versioned APIs do not look good and add more confusion.

> 
> > > > Then *_LIST_END values could be ignored by libabigail with such a change.
> > > >
> > > > If such a patch is not done by tomorrow, I will have to revert
> > > > Chacha-Poly commits before 20.02-rc2, because
> > > >
> > > > 1/ LIST_END, without any comment, means "size of range"
> > > > 2/ we do not blame users for undocumented ABI changes
> > > > 3/ we respect the ABI compatibility contract
> 
  
Thomas Monjalon Feb. 4, 2020, 10:38 a.m. UTC | #42
04/02/2020 11:10, Trahe, Fiona:
> And not used for sizing
> > > There a c) though right.
> > > We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
> > 
> > It has a lot of friends, but it sounds like the right approach.
> > Is someone looking into this?
> [Fiona] Yes. But not clear yet if can be done by tomorrow.

Should be done by today now.

> But even if feasible, that only works around the current issue.
> There is a bigger issue to be decided here - 
> Should we be removing LIST_END/MAX values from all enums in 20.11?
> Or defining through API comment that they should only be used as a range boundary and
> NOT to size an array. And so having a fixed value is not part of the API contract.

Please let's discuss 20.11 API later. It is not so urgent.
  
Bruce Richardson Feb. 4, 2020, 11:35 a.m. UTC | #43
On Tue, Feb 04, 2020 at 10:32:01AM +0000, Akhil Goyal wrote:
> 
> > 
> > 04/02/2020 11:16, Akhil Goyal:
> > > Hi,
> > > > On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > > > > 03/02/2020 10:30, Ferruh Yigit:
> > > > >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > > > >>> 02/02/2020 14:05, Thomas Monjalon:
> > > > >>>> 31/01/2020 15:16, Trahe, Fiona:
> > > > >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > > > >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> > > > >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> I believe these enums will be used only in case of ASYM case which
> > is
> > > > experimental.
> > > > >>>>>>>
> > > > >>>>>>> Independent from being experiment and not, this shouldn't be a
> > > > problem, I think
> > > > >>>>>>> this is a false positive.
> > > > >>>>>>>
> > > > >>>>>>> The ABI break can happen when a struct has been shared between
> > the
> > > > application
> > > > >>>>>>> and the library (DPDK) and the layout of that memory know
> > differently
> > > > by
> > > > >>>>>>> application and the library.
> > > > >>>>>>>
> > > > >>>>>>> Here in all cases, there is no layout/size change.
> > > > >>>>>>>
> > > > >>>>>>> As to the value changes of the enums, since application compiled
> > with
> > > > old DPDK,
> > > > >>>>>>> it will know only up to '6', 7 and more means invalid to the
> > application.
> > > > So it
> > > > >>>>>>> won't send these values also it should ignore these values from
> > library.
> > > > Only
> > > > >>>>>>> consequence is old application won't able to use new features
> > those
> > > > new enums
> > > > >>>>>>> provide but that is expected/normal.
> > > > >>>>>>
> > > > >>>>>> If library give higher value than expected by the application,
> > > > >>>>>> if the application uses this value as array index,
> > > > >>>>>> there can be an access out of bounds.
> > > > >>>>>
> > > > >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a
> > > > problem.
> > > > >>>>> But for the same issue with sym crypto below, I believe Ferruh's
> > > > explanation makes
> > > > >>>>> sense and I don't see how there can be an API breakage.
> > > > >>>>> So if an application hasn't compiled against the new lib it will be still
> > using
> > > > the old value
> > > > >>>>> which will be within bounds. If it's picking up the higher new value
> > from
> > > > the lib it must
> > > > >>>>> have been compiled against the lib so shouldn't have problems.
> > > > >>>>
> > > > >>>> You say there is no ABI issue because the application will be re-
> > compiled
> > > > >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> > > > >>>> But this is not relevant for ABI compatibility.
> > > > >>>> ABI compatibility means we can upgrade the library without
> > recompiling
> > > > >>>> the application and it must work.
> > > > >>>> You think it is a false positive because you assume the application
> > > > >>>> "picks" the new value. I think you miss the case where the new value
> > > > >>>> is returned by a function in the upgraded library.
> > > > >>>>
> > > > >>>>> There are also no structs on the API which contain arrays using this
> > > > >>>>> for sizing, so I don't see an opportunity for an appl to have a
> > > > >>>>> mismatch in memory addresses.
> > > > >>>>
> > > > >>>> Let me demonstrate where the API may "use" the new value
> > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the
> > > > application.
> > > > >>>>
> > > > >>>> Once upon a time a DPDK application counting the number of devices
> > > > >>>> supporting each AEAD algo (in order to find the best supported algo).
> > > > >>>> It is done in an array indexed by algo id:
> > > > >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > > > >>>> The application is compiled with DPDK 19.11,
> > > > >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > > > >>>> So the size of the application array aead_dev_count is 3.
> > > > >>>> This binary is run with DPDK 20.02,
> > > > >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > > > >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > > > >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > > > >>>> The application uses this value:
> > > > >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > > > >>>> The application is crashing because of out of bound access.
> > > > >>>
> > > > >>> I'd say this is an example of bad written app.
> > > > >>> It probably should check that returned by library value doesn't
> > > > >>> exceed its internal array size.
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> Application should ignore values >= MAX.
> > > > >
> > > > > Of course, blaming the API user is a lot easier than looking at the API.
> > > > > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > > > > as the max value for the application.
> > > > > Value ranges are part of the ABI compatibility contract.
> > > > > It seems you expect the application developer to be aware that
> > > > > DPDK could return a higher value, so the application should
> > > > > check every enum values after calling an API. CRAZY.
> > > > >
> > > > > When we decide to announce an ABI compatibility and do some marketing,
> > > > > everyone is OK. But when we need to really make our ABI compatible,
> > > > > I see little or no effort. DISAPPOINTING.
> > > >
> > > > This is not to blame the user or to do less work, this is more sane approach
> > > > that library provides the _END/_MAX value and application uses it as valid
> > range
> > > > check.
> > > >
> > > > >
> > > > >> Do you suggest we don't extend any enum or define between ABI
> > breakage
> > > > releases
> > > > >> to be sure bad written applications not affected?
> > > > >
> > > > > I suggest we must consider not breaking any assumption made on the API.
> > > > > Here we are breaking the enum range because nothing mentions
> > _LIST_END
> > > > > is not really the absolute end of the enum.
> > > > > The solution is to make the change below in 20.02 + backport in 19.11.1:
> > > > >
> > > > > - _LIST_END
> > > > > + _LIST_END, /* an ABI-compatible version may increase this value */
> > > > > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > > > > };
> > > > >
> > > >
> > > > What is the point of "_LIST_MAX" here?
> > > >
> > > > Application should know the "_LIST_END" of when it has been compiled for
> > the
> > > > valid range check. Next time it is compiled "_LIST_END" may be different
> > value
> > > > but same logic applies.
> > > >
> > > > When "_LIST_END" is missing, application can't protect itself, in that case
> > > > library should send only the values application knows when it is compiled,
> > this
> > > > means either we can't extend our enum/defines until next ABI breakage, or
> > we
> > > > need to do ABI versioning to the functions that returns an enum each time
> > enum
> > > > value extended.
> > > >
> > > > I believe it is saner to provide _END/_MAX values to the application to use.
> > And
> > > > if required comment them to clarify the expected usage.
> > > >
> > > > But in above suggestion application can't use or rely on "_LIST_MAX", it
> > doesn't
> > > > mean anything to application.
> > > >
> > >
> > > Can we have something like
> > > enum rte_crypto_aead_algorithm {
> > >         RTE_CRYPTO_AEAD_AES_CCM = 1,
> > >         /**< AES algorithm in CCM mode. */
> > >         RTE_CRYPTO_AEAD_AES_GCM,
> > >         /**< AES algorithm in GCM mode. */
> > >         RTE_CRYPTO_AEAD_LIST_END,
> > >         /**< List end for 19.11 ABI compatibility */
> > >         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
> > >         /**< Chacha20 cipher with poly1305 authenticator */
> > >         RTE_CRYPTO_AEAD_LIST_END_2011
> > >         /**< List end for 20.11 ABI compatibility */
> > > };
> > >
> > > And in 20.11 release we alter the RTE_CRYPTO_AEAD_LIST_END to the end
> > and remove RTE_CRYPTO_AEAD_LIST_END_2011
> > >
> > > I believe it will be ok for any application which need to use the chacha poly
> > assume that this algo is
> > > Experimental and will move to formal list in 20.11. This can be documented in
> > the documentation.
> > > I believe there is no way to add a new enum as experimental so far. This way
> > we can formalize this
> > > requirement as well.
> > >
> > > I believe this way effect of ABI breakage will be nullified.
> > 
> > This is a possibility in the (a) proposal.
> > But it breaks API (and ABI) because a high value is returned
> > while not expected by the application.
> > 
> > I guess ABI and release maintainers will vote no to such breakage.
> > Note: I vote no.
> > 
> 
> If that is the case, I would say we should go with b).
> 
> Versioned APIs does not look good and adds more confusion.
> 
How does it add confusion? It's the standard and recommended way to fix
things like this. To maintain stable ABIs in the medium and long term we
need to get used to using versioning and not be afraid of it. Developers
will soon get used to the added bit of complexity it involves.

Regards,
/Bruce
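
What the versioned function in option (c) would do can be sketched as
follows. This is a minimal sketch under assumptions: the struct, the sketch_
names and the flat capability list are hypothetical simplifications, not the
real cryptodev structures, and binding the legacy function to the 19.11
symbol version is a separate step that is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* AEAD ids with the values quoted in this thread. */
enum sketch_aead_algo {
	SKETCH_AEAD_AES_CCM = 1,
	SKETCH_AEAD_AES_GCM,
	SKETCH_AEAD_CHACHA20_POLY1305 /* new in 20.02, equals old LIST_END */
};

#define SKETCH_MAX_CAPS 8

/* Hypothetical, simplified device info: a flat list of AEAD algo ids. */
struct sketch_dev_info {
	size_t nb_aead;
	enum sketch_aead_algo aead[SKETCH_MAX_CAPS];
};

/* Current (20.02) behaviour: reports everything the device supports. */
static void
sketch_info_get(struct sketch_dev_info *info)
{
	assert(info != NULL);
	info->nb_aead = 3;
	info->aead[0] = SKETCH_AEAD_AES_CCM;
	info->aead[1] = SKETCH_AEAD_AES_GCM;
	info->aead[2] = SKETCH_AEAD_CHACHA20_POLY1305;
}

/* Legacy (19.11) view: same data, but only ids in the 19.11 range. */
static void
sketch_info_get_v1911(struct sketch_dev_info *info)
{
	struct sketch_dev_info full;
	size_t i;

	assert(info != NULL);
	sketch_info_get(&full);
	info->nb_aead = 0;
	for (i = 0; i < full.nb_aead; i++) {
		/* Keep only what a 19.11 binary knows about. */
		if (full.aead[i] <= SKETCH_AEAD_AES_GCM)
			info->aead[info->nb_aead++] = full.aead[i];
	}
}
```

A binary built against 19.11 would resolve to the filtered variant and keep
seeing the old range, while a 20.02 build would get the full list.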
  
Fiona Trahe Feb. 4, 2020, 12:44 p.m. UTC | #44
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, February 4, 2020 10:24 AM
> To: David Marchand <david.marchand@redhat.com>; nhorman@tuxdriver.com; bluca@debian.org;
> ktraynor@redhat.com; Ray Kinsella <mdr@ashroe.eu>; dev@dpdk.org; Akhil Goyal
> <akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph
> <anoobj@marvell.com>; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Mcnamara, John <john.mcnamara@intel.com>; dodji@seketeli.net;
> Andrew Rybchenko <arybchenko@solarflare.com>; aconole@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> RED FLAG
> 
> I don't see a lot of reactions, so I summarize the issue.
> We need action TODAY!
> 
> API makes think that rte_cryptodev_info_get() cannot return
> a value >= 3 (RTE_CRYPTO_AEAD_LIST_END in 19.11).
> Current 20.02 returns 3 (RTE_CRYPTO_AEAD_CHACHA20_POLY1305).
> The ABI compatibility contract is broken currently.
> 
> There are 3 possible outcomes:
> 
> a) Change the API comments and backport to 19.11.1
> The details are discussed between Ferruh and me.
> Either put responsibility on API user (with explicit comment),
> or expose ABI extension allowance with a new API max value.
> In both cases, this is breaking the implicit contract of 19.11.0.
> This option can be chosen only if release and ABI maintainers
> vote for it.
> 
> b) Revert Chacha-Poly from 20.02-rc2.
> 
> c) Add versioned function rte_cryptodev_info_get_v1911()
> which calls rte_cryptodev_info_get() and filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> So Chacha-Poly capability would be seen and usable only
> if compiling with DPDK 20.02.
> 
> I hope it is clear what are the actions for everybody:
> - ABI and release maintainers must say yes or no to the proposal (a)
> - In the meantime, crypto team must send a patch for the proposal (c)
> - If (a) and (c) are not possible at the end of today, I will take (b)
> 
> Note: do not say it is too short for (c), as it was possible to work
> on such solution since the issue was exposed on last Wednesday.
> 
[Fiona] Thanks for your understanding and help with our scheduling, Thomas.
We are working on a patch and will send it when it is ready.
If it's not ready by the end of your day, of course, go ahead with (b) and
we will work towards 20.05.
  
Kevin Traynor Feb. 4, 2020, 12:57 p.m. UTC | #45
On 04/02/2020 10:24, Thomas Monjalon wrote:
> RED FLAG
> 
> I don't see a lot of reactions, so I summarize the issue.
> We need action TODAY!
> 
> API makes think that rte_cryptodev_info_get() cannot return
> a value >= 3 (RTE_CRYPTO_AEAD_LIST_END in 19.11).
> Current 20.02 returns 3 (RTE_CRYPTO_AEAD_CHACHA20_POLY1305).
> The ABI compatibility contract is broken currently.
> 
> There are 3 possible outcomes:
> 
> a) Change the API comments and backport to 19.11.1
> The details are discussed between Ferruh and me.
> Either put responsibility on API user (with explicit comment),
> or expose ABI extension allowance with a new API max value.
> In both cases, this is breaking the implicit contract of 19.11.0.
> This option can be chosen only if release and ABI maintainers
> vote for it.
> 
> b) Revert Chacha-Poly from 20.02-rc2.
> 
> c) Add versioned function rte_cryptodev_info_get_v1911()
> which calls rte_cryptodev_info_get() and filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> So Chacha-Poly capability would be seen and usable only
> if compiling with DPDK 20.02.
> 

Maybe a separate version of rte_cryptodev_get_aead_algo_enum() is also
needed, to handle the chacha string differently.
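
A legacy variant of the string lookup could look roughly like the sketch
below. The str_ function and table names are hypothetical stand-ins and the
algorithm strings are assumed for illustration; only the idea of treating
the Chacha-Poly name as unknown for 19.11 callers comes from this
discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* AEAD ids with the values quoted in this thread. */
enum str_aead_algo {
	STR_AEAD_AES_CCM = 1,
	STR_AEAD_AES_GCM,
	STR_AEAD_CHACHA20_POLY1305
};

/* Names indexed by algo id; index 0 is unused. Strings are assumed. */
static const char *const str_aead_names[] = {
	NULL, "aes-ccm", "aes-gcm", "chacha20-poly1305"
};

/*
 * Look up an algo id by name, accepting ids up to max_id only.
 * Returns 0 and sets *algo on success, -1 if the name is unknown
 * or maps to an id above max_id.
 */
static int
str_aead_lookup(int *algo, const char *name, int max_id)
{
	int i;

	assert(algo != NULL && name != NULL);
	for (i = 1; i <= max_id && i <= STR_AEAD_CHACHA20_POLY1305; i++) {
		if (strcmp(name, str_aead_names[i]) == 0) {
			*algo = i;
			return 0;
		}
	}
	return -1;
}

/* 20.02 view: every name resolves. */
static int
str_aead_get(int *algo, const char *name)
{
	return str_aead_lookup(algo, name, STR_AEAD_CHACHA20_POLY1305);
}

/* 19.11 view: the Chacha-Poly name is treated as unknown. */
static int
str_aead_get_v1911(int *algo, const char *name)
{
	return str_aead_lookup(algo, name, STR_AEAD_AES_GCM);
}
```

With this split, an old binary asking for the Chacha-Poly name gets an error
instead of an id outside the range it was compiled with.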

> I hope it is clear what are the actions for everybody:
> - ABI and release maintainers must say yes or no to the proposal (a)

My 2c for a) is No.

> - In the meantime, crypto team must send a patch for the proposal (c)
> - If (a) and (c) are not possible at the end of today, I will take (b)
> 
> Note: do not say it is too short for (c), as it was possible to work
> on such solution since the issue was exposed on last Wednesday.
> 

Could it be reverted today if necessary and re-added later in the
release cycle? It seems like something modular that should not
invalidate earlier testing.

> 
> 03/02/2020 22:07, Thomas Monjalon:
>> 03/02/2020 19:55, Ray Kinsella:
>>> On 03/02/2020 17:34, Thomas Monjalon wrote:
>>>> 03/02/2020 18:09, Thomas Monjalon:
>>>>> 03/02/2020 10:30, Ferruh Yigit:
>>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>>>> If library give higher value than expected by the application,
>>>>>>>>>> if the application uses this value as array index,
>>>>>>>>>> there can be an access out of bounds.
>>>>>>>>>
>>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>>>>> sense and I don't see how there can be an API breakage.
>>>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>>>
>>>>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>>>>> But this is not relevant for ABI compatibility.
>>>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>>>> the application and it must work.
>>>>>>>> You think it is a false positive because you assume the application
>>>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>>>> is returned by a function in the upgraded library.
>>>>>>>>
>>>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>>>> mismatch in memory addresses.
>>>>>>>>
>>>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>>>
>>>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>>>> It is done in an array indexed by algo id:
>>>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>>>> The application is compiled with DPDK 19.11,
>>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>>>> This binary is run with DPDK 20.02,
>>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>>>> The application uses this value:
>>>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>>>> The application is crashing because of out of bound access.
>>>>>>>
>>>>>>> I'd say this is an example of bad written app.
>>>>>>> It probably should check that returned by library value doesn't
>>>>>>> exceed its internal array size.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Application should ignore values >= MAX.
>>>>>
>>>>> Of course, blaming the API user is a lot easier than looking at the API.
>>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>>>> as the max value for the application.
>>>>> Value ranges are part of the ABI compatibility contract.
>>>>> It seems you expect the application developer to be aware that
>>>>> DPDK could return a higher value, so the application should
>>>>> check every enum values after calling an API. CRAZY.
>>>>>
>>>>> When we decide to announce an ABI compatibility and do some marketing,
>>>>> everyone is OK. But when we need to really make our ABI compatible,
>>>>> I see little or no effort. DISAPPOINTING.
>>>>>
>>>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>>>>> to be sure bad written applications not affected?
>>>>>
>>>>> I suggest we must consider not breaking any assumption made on the API.
>>>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>>>> is not really the absolute end of the enum.
>>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>>
>>>> Thinking twice, merging such change before 20.11 is breaking the
>>>> ABI assumption based on the API 19.11.0.
>>>> I ask the release maintainers (Luca, Kevin, David and me) and
>>>> the ABI maintainers (Neil and Ray) to vote for a or b solution:
>>>> 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
>>>
>>> That would still be an ABI breakage though right.
>>>
>>>> 	b) wait 20.11 and revert Chacha-Poly from 20.02
>>>
>>> Thanks for analysis above Fiona, Ferruh and all. 
>>>
>>> That is a nasty one alright - there is no "good" answer here.
>>> I agree with Ferruh's sentiments overall, we should rethink this API for 20.11. 
>>> Could do without an enumeration?
>>>
>>> There a c) though right.
>>> We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
>>> So they only support/acknowledge the existence of Chacha-Poly for applications build against > 20.02.
>>
>> I agree there is a c) as I proposed in another email:
>> http://mails.dpdk.org/archives/dev/2020-February/156919.html
>> "
>> In this case, the proper solution is to implement
>> rte_cryptodev_info_get_v1911() so it filters out
>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
>> With this solution, an application compiled with DPDK 19.11 will keep
>> seeing the same range as before, while a 20.02 application could
>> see and use ChachaPoly.
>> "
>>
>>> It would be painful I know.
>>
>> Not so painful in my opinion.
>> Just need to call rte_cryptodev_info_get() from
>> rte_cryptodev_info_get_v1911() and filter the value
>> in the 19.11 range: [0..AES_GCM].
>>
>>> It would also mean that Chacha-Poly would only be available to
>>> those building against >= 20.02.
>>
>> Yes exactly.
>>
>> The addition of comments and LIST_MAX like below are still valid
>> to avoid versioning after 20.11.
>>
>>>>> - _LIST_END
>>>>> + _LIST_END, /* an ABI-compatible version may increase this value */
>>>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>>>>> };
>>>>>
>>>>> Then *_LIST_END values could be ignored by libabigail with such a change.
>>
>> In order to avoid ABI check complaining, the best is to completely
>> remove LIST_END in DPDK 20.11.
>>
>>
>>>>> If such a patch is not done by tomorrow, I will have to revert
>>>>> Chacha-Poly commits before 20.02-rc2, because
>>>>>
>>>>> 1/ LIST_END, without any comment, means "size of range"
>>>>> 2/ we do not blame users for undocumented ABI changes
>>>>> 3/ we respect the ABI compatibility contract
> 
> 
>
  
Aaron Conole Feb. 4, 2020, 2:44 p.m. UTC | #46
Thomas Monjalon <thomas@monjalon.net> writes:

> RED FLAG
>
> I don't see a lot of reactions, so I summarize the issue.
> We need action TODAY!
>
> API makes think that rte_cryptodev_info_get() cannot return
> a value >= 3 (RTE_CRYPTO_AEAD_LIST_END in 19.11).
> Current 20.02 returns 3 (RTE_CRYPTO_AEAD_CHACHA20_POLY1305).
> The ABI compatibility contract is broken currently.
>
> There are 3 possible outcomes:
>
> a) Change the API comments and backport to 19.11.1
> The details are discussed between Ferruh and me.
> Either put responsibility on API user (with explicit comment),
> or expose ABI extension allowance with a new API max value.
> In both cases, this is breaking the implicit contract of 19.11.0.
> This option can be chosen only if release and ABI maintainers
> vote for it.
>
> b) Revert Chacha-Poly from 20.02-rc2.
>
> c) Add versioned function rte_cryptodev_info_get_v1911()
> which calls rte_cryptodev_info_get() and filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> So Chacha-Poly capability would be seen and usable only
> if compiling with DPDK 20.02.
>
> I hope it is clear what are the actions for everybody:
> - ABI and release maintainers must say yes or no to the proposal (a)
> - In the meantime, crypto team must send a patch for the proposal (c)
> - If (a) and (c) are not possible at the end of today, I will take (b)
>
> Note: do not say it is too short for (c), as it was possible to work
> on such solution since the issue was exposed on last Wednesday.

While I'm not a maintainer, if my opinion counts for anything, I'd
choose option c or b.  Absolutely NACK to a.

>
> 03/02/2020 22:07, Thomas Monjalon:
>> 03/02/2020 19:55, Ray Kinsella:
>> > On 03/02/2020 17:34, Thomas Monjalon wrote:
>> > > 03/02/2020 18:09, Thomas Monjalon:
>> > >> 03/02/2020 10:30, Ferruh Yigit:
>> > >>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>> > >>>> 02/02/2020 14:05, Thomas Monjalon:
>> > >>>>> 31/01/2020 15:16, Trahe, Fiona:
>> > >>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>> > >>>>>>> If library give higher value than expected by the application,
>> > >>>>>>> if the application uses this value as array index,
>> > >>>>>>> there can be an access out of bounds.
>> > >>>>>>
>> > >>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>> > >>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>> > >>>>>> sense and I don't see how there can be an API breakage.
>> > >>>>>> So if an application hasn't compiled against the new lib it
>> > >>>>>> will be still using the old value
>> > >>>>>> which will be within bounds. If it's picking up the higher
>> > >>>>>> new value from the lib it must
>> > >>>>>> have been compiled against the lib so shouldn't have problems.
>> > >>>>>
>> > >>>>> You say there is no ABI issue because the application will be re-compiled
>> > >>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>> > >>>>> But this is not relevant for ABI compatibility.
>> > >>>>> ABI compatibility means we can upgrade the library without recompiling
>> > >>>>> the application and it must work.
>> > >>>>> You think it is a false positive because you assume the application
>> > >>>>> "picks" the new value. I think you miss the case where the new value
>> > >>>>> is returned by a function in the upgraded library.
>> > >>>>>
>> > >>>>>> There are also no structs on the API which contain arrays using this
>> > >>>>>> for sizing, so I don't see an opportunity for an appl to have a
>> > >>>>>> mismatch in memory addresses.
>> > >>>>>
>> > >>>>> Let me demonstrate where the API may "use" the new value
>> > >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>> > >>>>>
>> > >>>>> Once upon a time a DPDK application counting the number of devices
>> > >>>>> supporting each AEAD algo (in order to find the best supported algo).
>> > >>>>> It is done in an array indexed by algo id:
>> > >>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>> > >>>>> The application is compiled with DPDK 19.11,
>> > >>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>> > >>>>> So the size of the application array aead_dev_count is 3.
>> > >>>>> This binary is run with DPDK 20.02,
>> > >>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>> > >>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>> > >>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>> > >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>> > >>>>> The application uses this value:
>> > >>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>> > >>>>> The application is crashing because of out of bound access.
>> > >>>>
>> > >>>> I'd say this is an example of bad written app.
>> > >>>> It probably should check that returned by library value doesn't
>> > >>>> exceed its internal array size.
>> > >>>
>> > >>> +1
>> > >>>
>> > >>> Application should ignore values >= MAX.
>> > >>
>> > >> Of course, blaming the API user is a lot easier than looking at the API.
>> > >> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>> > >> as the max value for the application.
>> > >> Value ranges are part of the ABI compatibility contract.
>> > >> It seems you expect the application developer to be aware that
>> > >> DPDK could return a higher value, so the application should
>> > >> check every enum values after calling an API. CRAZY.
>> > >>
>> > >> When we decide to announce an ABI compatibility and do some marketing,
>> > >> everyone is OK. But when we need to really make our ABI compatible,
>> > >> I see little or no effort. DISAPPOINTING.
>> > >>
>> > >>> Do you suggest we don't extend any enum or define between ABI breakage releases
>> > >>> to be sure bad written applications not affected?
>> > >>
>> > >> I suggest we must consider not breaking any assumption made on the API.
>> > >> Here we are breaking the enum range because nothing mentions _LIST_END
>> > >> is not really the absolute end of the enum.
>> > >> The solution is to make the change below in 20.02 + backport in 19.11.1:
>> > > 
>> > > Thinking twice, merging such change before 20.11 is breaking the
>> > > ABI assumption based on the API 19.11.0.
>> > > I ask the release maintainers (Luca, Kevin, David and me) and
>> > > the ABI maintainers (Neil and Ray) to vote for a or b solution:
>> > > 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
>> > 
>> > That would still be an ABI breakage though right.
>> > 
>> > > 	b) wait 20.11 and revert Chacha-Poly from 20.02
>> > 
>> > Thanks for analysis above Fiona, Ferruh and all. 
>> > 
>> > That is a nasty one alright - there is no "good" answer here.
>> > I agree with Ferruh's sentiments overall, we should rethink this API for 20.11. 
>> > Could do without an enumeration?
>> > 
>> > There a c) though right.
>> > We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
>> > So they only support/acknowledge the existence of Chacha-Poly for
>> > applications build against > 20.02.
>> 
>> I agree there is a c) as I proposed in another email:
>> http://mails.dpdk.org/archives/dev/2020-February/156919.html
>> "
>> In this case, the proper solution is to implement
>> rte_cryptodev_info_get_v1911() so it filters out
>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
>> With this solution, an application compiled with DPDK 19.11 will keep
>> seeing the same range as before, while a 20.02 application could
>> see and use ChachaPoly.
>> "
>> 
>> > It would be painful I know.
>> 
>> Not so painful in my opinion.
>> Just need to call rte_cryptodev_info_get() from
>> rte_cryptodev_info_get_v1911() and filter the value
>> in the 19.11 range: [0..AES_GCM].
>> 
>> > It would also mean that Chacha-Poly would only be available to
>> > those building against >= 20.02.
>> 
>> Yes exactly.
>> 
>> The addition of comments and LIST_MAX like below are still valid
>> to avoid versioning after 20.11.
>> 
>> > >> - _LIST_END
>> > >> + _LIST_END, /* an ABI-compatible version may increase this value */
>> > >> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>> > >> };
>> > >>
>> > >> Then *_LIST_END values could be ignored by libabigail with such a change.
>> 
>> In order to avoid ABI check complaining, the best is to completely
>> remove LIST_END in DPDK 20.11.
>> 
>> 
>> > >> If such a patch is not done by tomorrow, I will have to revert
>> > >> Chacha-Poly commits before 20.02-rc2, because
>> > >>
>> > >> 1/ LIST_END, without any comment, means "size of range"
>> > >> 2/ we do not blame users for undocumented ABI changes
>> > >> 3/ we respect the ABI compatibility contract
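
The out-of-bounds scenario quoted above can be sketched as a small standalone C fragment. It does not use the real DPDK headers: the enum, the macro and count_aead_dev() are hypothetical stand-ins, with values taken from the thread (LIST_END = 3 in 19.11, Chacha20-Poly1305 = 3 in 20.02). The guard shown is the application-side check that some participants ask for. Whether applications should be required to have it is exactly what the thread disputes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the AEAD enum as seen by a 19.11 build. */
enum aead_algo_v1911 {
	AEAD_AES_CCM = 1,
	AEAD_AES_GCM,
	AEAD_LIST_END	/* == 3 when compiled against 19.11 */
};

/* Value a 20.02 library may report for Chacha20-Poly1305 (assumption
 * taken from the thread, not from a header). */
#define AEAD_CHACHA20_POLY1305_V2002 3

/*
 * Count one device for the reported algorithm.
 * Returns 0 on success, -1 if the value is outside the range this
 * binary knew about at compile time.
 */
static int
count_aead_dev(int counts[AEAD_LIST_END], int reported_algo)
{
	assert(counts != NULL);
	if (reported_algo < 0 || reported_algo >= AEAD_LIST_END)
		return -1;	/* unknown value: skip instead of indexing */
	counts[reported_algo]++;
	return 0;
}
```

Without the range check, counts[3] would be written on a 3-element array, which is the crash described in the quoted example.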
  
Fiona Trahe Feb. 4, 2020, 3:52 p.m. UTC | #47
> We are working on a patch, when it is ready we will send it.
> If it's not ready by end of your today, of course, go ahead with (b) and
> we will work towards 20.05.

We will not be sending a patch today.
The patch we're working on will provide two versions of rte_cryptodev_info_get();
both call the same PMD function through the dev_ops info_get fn ptr.
The default version operates as normal. The 19.11 version searches
through the list returned by the PMD for sym.aead.algo = ChaChaPoly and strips it from the list.
PMDs just pass a ptr to their capabilities list (it isn't a linked list, but an array
with an end marker = RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST), so if the API
layer detects Chacha it must allocate some space and store a local copy of the
trimmed list. This must be stored only once per device.

This versioning will apply to any PMD which wants to take advantage of the new API between now and 20.11.

Note, I expect the ABI checker tools will still complain of ABI breakage as the LIST_END value will still change.

We are also reviewing all other cryptodev APIs in case there is any other API which needs versioning.
 
Anyone see any problem with this approach?
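
A minimal sketch of the trimming step described above, assuming simplified types. struct cap, enum cap_op, cap_list_len() and caps_without_chacha() are hypothetical names for illustration and are not the cryptodev structures or the patch under discussion. The PMD's array is treated as read-only and a trimmed copy is allocated, as the message describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the capability types. */
enum cap_op { CAP_OP_END = 0, CAP_OP_SYM_AEAD, CAP_OP_OTHER };

struct cap {
	enum cap_op op;	/* CAP_OP_END marks the end of the array */
	int aead_algo;	/* meaningful only when op == CAP_OP_SYM_AEAD */
};

#define AEAD_CHACHA20_POLY1305 3	/* value quoted in the thread */

/* Number of entries before the end marker. */
static size_t
cap_list_len(const struct cap *caps)
{
	size_t n = 0;

	while (caps[n].op != CAP_OP_END)
		n++;
	return n;
}

/*
 * Return a heap-allocated copy of caps without any Chacha-Poly entry,
 * or NULL on allocation failure. The input array is never modified.
 * The caller owns the copy (a real implementation would presumably
 * cache it once per device).
 */
static struct cap *
caps_without_chacha(const struct cap *caps)
{
	size_t n = cap_list_len(caps);
	struct cap *out = malloc((n + 1) * sizeof(*out));
	size_t i, j = 0;

	if (out == NULL)
		return NULL;
	for (i = 0; i < n; i++) {
		if (caps[i].op == CAP_OP_SYM_AEAD &&
		    caps[i].aead_algo == AEAD_CHACHA20_POLY1305)
			continue;	/* hide the capability from 19.11 apps */
		out[j++] = caps[i];
	}
	out[j].op = CAP_OP_END;	/* re-terminate the trimmed list */
	out[j].aead_algo = 0;
	return out;
}
```

Copying rather than editing in place matters here because PMD capability arrays may be declared const.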
  
Thomas Monjalon Feb. 4, 2020, 3:59 p.m. UTC | #48
04/02/2020 16:52, Trahe, Fiona:
> 
> > We are working on a patch, when it is ready we will send it.
> > If it's not ready by end of your today, of course, go ahead with (b) and
> > we will work towards 20.05.
> 
> We will not be sending a patch today.
> The patch we're working on will provide two versions of rte_cryptodev_info_get(),
> both call the same PMD function from the dev_ops info_get fn ptr.
> The default version operates as normal, the 19.11 version searches
> through the list returned by the PMD, looking for sym.aead.algo = ChaChaPoly, it needs to strip it from the list.
> As PMDs just pass a ptr to their capabilities list ( it isn't a linked list, but an array
> with an end marker  = RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API
> layer detects Chacha it must allocate some space and store a local copy of the
> trimmed list. This must be stored only once per device.

I don't understand what you have to store.
Can't you just set the algo to 0 if it is ChaCha?

> This versioning will apply to any PMD which wants to take advantage of the new API between now and 20.11.
> 
> Note, I expect the ABI checker tools will still complain of ABI breakage as the LIST_END value will still change.

Right, you need to update the ignore list for the tool.

> We are also reviewing all other cryptodev APIs in case there is any other API which needs versioning.
>  
> Anyone see any problem with this approach?

The other issue is with all other functions accepting this enum as input.
The 19.11 versions of these functions should continue to return an error
when given Chacha as input.
But I tend to think this small ABI breakage can be ignored,
as it is in the error path.
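
To illustrate the input side, here is a rough sketch of per-version validation, assuming the enum values quoted elsewhere in the thread. All names are hypothetical; the real functions taking this enum are not shown, and symbol versioning itself is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; values follow the enum quoted in the thread. */
enum {
	AEAD_AES_CCM = 1,
	AEAD_AES_GCM,
	AEAD_CHACHA20_POLY1305,
	AEAD_LIST_END
};

/* Check that algo lies in [AEAD_AES_CCM, last_valid]. */
static int
aead_algo_validate(int algo, int last_valid)
{
	if (algo < AEAD_AES_CCM || algo > last_valid)
		return -EINVAL;
	return 0;
}

/* What a 19.11-versioned entry point would accept: up to AES-GCM. */
static int
aead_algo_validate_v1911(int algo)
{
	return aead_algo_validate(algo, AEAD_AES_GCM);
}

/* What the default (20.02) entry point would accept: Chacha included. */
static int
aead_algo_validate_v2002(int algo)
{
	return aead_algo_validate(algo, AEAD_CHACHA20_POLY1305);
}
```

With this split, a 19.11 binary passing the value 3 keeps getting an error, which is the behaviour the message says should be preserved.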
  
Fiona Trahe Feb. 4, 2020, 5:46 p.m. UTC | #49
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, February 4, 2020 4:00 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>
> Cc: David Marchand <david.marchand@redhat.com>; nhorman@tuxdriver.com; bluca@debian.org;
> ktraynor@redhat.com; Ray Kinsella <mdr@ashroe.eu>; dev@dpdk.org; Akhil Goyal
> <akhil.goyal@nxp.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph <anoobj@marvell.com>; Kusztal,
> ArkadiuszX <arkadiuszx.kusztal@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; dodji@seketeli.net; Andrew Rybchenko
> <arybchenko@solarflare.com>; aconole@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> 04/02/2020 16:52, Trahe, Fiona:
> >
> > > We are working on a patch, when it is ready we will send it.
> > > If it's not ready by end of your today, of course, go ahead with (b) and
> > > we will work towards 20.05.
> >
> > We will not be sending a patch today.
> > The patch we're working on will provide two versions of rte_cryptodev_info_get(),
> > both call the same PMD function from the dev_ops info_get fn ptr.
> > The default version operates as normal, the 19.11 version searches
> > through the list returned by the PMD, looking for sym.aead.algo = ChaChaPoly, it needs to strip it from the list.
> > As PMDs just pass a ptr to their capabilities list ( it isn't a linked list, but an array
> > with an end marker  = RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API
> > layer detects Chacha it must allocate some space and store a local copy of the
> > trimmed list. This must be stored only once per device.
> 
> I don't understand what you have to store.
> Can't you just set the algo to 0 if it is ChaCha?
[Fiona] it returns a pointer to data in the PMD domain, which the API couldn't and shouldn't overwrite, e.g.
static const struct rte_cryptodev_capabilities qat_gen3_sym_capabilities[]

> 
> > This versioning will apply to any PMD which wants to take advantage of the new API between now and 20.11.
> >
> > Note, I expect the ABI checker tools will still complain of ABI breakage as the LIST_END value will still change.
> 
> Right, you need to update the ignore list for the tool.
> 
> > We are also reviewing all other cryptodev APIs in case there is any other API which needs versioning.
> >
> > Anyone see any problem with this approach?
> 
> The other issue is with all other functions accepting this enum as input.
> We should continue returning an error if getting Chacha as input with
> 19.11 version of these functions.
> But I would tend to consider this small ABI breakage can be ignored
> as it is in the error path.
[Fiona] The QAT PMD tests for and handles this error. I expect other PMDs do too.
  
Neil Horman Feb. 4, 2020, 7:49 p.m. UTC | #50
On Tue, Feb 04, 2020 at 09:44:53AM -0500, Aaron Conole wrote:
> Thomas Monjalon <thomas@monjalon.net> writes:
> 
> > RED FLAG
> >
> > I don't see a lot of reactions, so I summarize the issue.
> > We need action TODAY!
> >
> > API makes think that rte_cryptodev_info_get() cannot return
> > a value >= 3 (RTE_CRYPTO_AEAD_LIST_END in 19.11).
> > Current 20.02 returns 3 (RTE_CRYPTO_AEAD_CHACHA20_POLY1305).
> > The ABI compatibility contract is broken currently.
> >
> > There are 3 possible outcomes:
> >
> > a) Change the API comments and backport to 19.11.1
> > The details are discussed between Ferruh and me.
> > Either put responsibility on API user (with explicit comment),
> > or expose ABI extension allowance with a new API max value.
> > In both cases, this is breaking the implicit contract of 19.11.0.
> > This option can be chosen only if release and ABI maintainers
> > vote for it.
> >
> > b) Revert Chacha-Poly from 20.02-rc2.
> >
> > c) Add versioned function rte_cryptodev_info_get_v1911()
> > which calls rte_cryptodev_info_get() and filters out
> > RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> > So Chacha-Poly capability would be seen and usable only
> > if compiling with DPDK 20.02.
> >
> > I hope it is clear what are the actions for everybody:
> > - ABI and release maintainers must say yes or no to the proposal (a)
> > - In the meantime, crypto team must send a patch for the proposal (c)
> > - If (a) and (c) are not possible at the end of today, I will take (b)
> >
> > Note: do not say it is too short for (c), as it was possible to work
> > on such solution since the issue was exposed on last Wednesday.
> 
> While I'm not a maintainer, if I my opinion counts for anything, I'd
> choose option c or b.  Absolutely NACK to a.
> 
Agreed, options c and b are reasonable; a isn't.  ABI commitments are ours, not
the users'.

Neil

> >
> > 03/02/2020 22:07, Thomas Monjalon:
> >> 03/02/2020 19:55, Ray Kinsella:
> >> > On 03/02/2020 17:34, Thomas Monjalon wrote:
> >> > > 03/02/2020 18:09, Thomas Monjalon:
> >> > >> 03/02/2020 10:30, Ferruh Yigit:
> >> > >>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> >> > >>>> 02/02/2020 14:05, Thomas Monjalon:
> >> > >>>>> 31/01/2020 15:16, Trahe, Fiona:
> >> > >>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> >> > >>>>>>> If library give higher value than expected by the application,
> >> > >>>>>>> if the application uses this value as array index,
> >> > >>>>>>> there can be an access out of bounds.
> >> > >>>>>>
> >> > >>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> >> > >>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> >> > >>>>>> sense and I don't see how there can be an API breakage.
> >> > >>>>>> So if an application hasn't compiled against the new lib it
> >> > >>>>>> will be still using the old value
> >> > >>>>>> which will be within bounds. If it's picking up the higher
> >> > >>>>>> new value from the lib it must
> >> > >>>>>> have been compiled against the lib so shouldn't have problems.
> >> > >>>>>
> >> > >>>>> You say there is no ABI issue because the application will be re-compiled
> >> > >>>>> for the updated library. Indeed, compilation fixes compatibility issues.
> >> > >>>>> But this is not relevant for ABI compatibility.
> >> > >>>>> ABI compatibility means we can upgrade the library without recompiling
> >> > >>>>> the application and it must work.
> >> > >>>>> You think it is a false positive because you assume the application
> >> > >>>>> "picks" the new value. I think you miss the case where the new value
> >> > >>>>> is returned by a function in the upgraded library.
> >> > >>>>>
> >> > >>>>>> There are also no structs on the API which contain arrays using this
> >> > >>>>>> for sizing, so I don't see an opportunity for an appl to have a
> >> > >>>>>> mismatch in memory addresses.
> >> > >>>>>
> >> > >>>>> Let me demonstrate where the API may "use" the new value
> >> > >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> >> > >>>>>
> >> > >>>>> Once upon a time a DPDK application counting the number of devices
> >> > >>>>> supporting each AEAD algo (in order to find the best supported algo).
> >> > >>>>> It is done in an array indexed by algo id:
> >> > >>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> >> > >>>>> The application is compiled with DPDK 19.11,
> >> > >>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> >> > >>>>> So the size of the application array aead_dev_count is 3.
> >> > >>>>> This binary is run with DPDK 20.02,
> >> > >>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> >> > >>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> >> > >>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> >> > >>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> >> > >>>>> The application uses this value:
> >> > >>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> >> > >>>>> The application is crashing because of out of bound access.
> >> > >>>>
> >> > >>>> I'd say this is an example of bad written app.
> >> > >>>> It probably should check that returned by library value doesn't
> >> > >>>> exceed its internal array size.
> >> > >>>
> >> > >>> +1
> >> > >>>
> >> > >>> Application should ignore values >= MAX.
> >> > >>
> >> > >> Of course, blaming the API user is a lot easier than looking at the API.
> >> > >> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> >> > >> as the max value for the application.
> >> > >> Value ranges are part of the ABI compatibility contract.
> >> > >> It seems you expect the application developer to be aware that
> >> > >> DPDK could return a higher value, so the application should
> >> > >> check every enum values after calling an API. CRAZY.
> >> > >>
> >> > >> When we decide to announce an ABI compatibility and do some marketing,
> >> > >> everyone is OK. But when we need to really make our ABI compatible,
> >> > >> I see little or no effort. DISAPPOINTING.
> >> > >>
> >> > >>> Do you suggest we don't extend any enum or define between ABI breakage releases
> >> > >>> to be sure bad written applications not affected?
> >> > >>
> >> > >> I suggest we must consider not breaking any assumption made on the API.
> >> > >> Here we are breaking the enum range because nothing mentions _LIST_END
> >> > >> is not really the absolute end of the enum.
> >> > >> The solution is to make the change below in 20.02 + backport in 19.11.1:
> >> > > 
> >> > > Thinking twice, merging such change before 20.11 is breaking the
> >> > > ABI assumption based on the API 19.11.0.
> >> > > I ask the release maintainers (Luca, Kevin, David and me) and
> >> > > the ABI maintainers (Neil and Ray) to vote for a or b solution:
> >> > > 	a) add comment and LIST_MAX as below in 20.02 + 19.11.1
> >> > 
> >> > That would still be an ABI breakage though right.
> >> > 
> >> > > 	b) wait 20.11 and revert Chacha-Poly from 20.02
> >> > 
> >> > Thanks for analysis above Fiona, Ferruh and all. 
> >> > 
> >> > That is a nasty one alright - there is no "good" answer here.
> >> > I agree with Ferruh's sentiments overall, we should rethink this API for 20.11. 
> >> > Could do without an enumeration?
> >> > 
> >> > There a c) though right.
> >> > We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
> >> > So they only support/acknowledge the existence of Chacha-Poly for
> >> > applications build against > 20.02.
> >> 
> >> I agree there is a c) as I proposed in another email:
> >> http://mails.dpdk.org/archives/dev/2020-February/156919.html
> >> "
> >> In this case, the proper solution is to implement
> >> rte_cryptodev_info_get_v1911() so it filters out
> >> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> >> With this solution, an application compiled with DPDK 19.11 will keep
> >> seeing the same range as before, while a 20.02 application could
> >> see and use ChachaPoly.
> >> "
> >> 
> >> > It would be painful I know.
> >> 
> >> Not so painful in my opinion.
> >> Just need to call rte_cryptodev_info_get() from
> >> rte_cryptodev_info_get_v1911() and filter the value
> >> in the 19.11 range: [0..AES_GCM].
> >> 
> >> > It would also mean that Chacha-Poly would only be available to
> >> > those building against >= 20.02.
> >> 
> >> Yes exactly.
> >> 
> >> The addition of comments and LIST_MAX like below are still valid
> >> to avoid versioning after 20.11.
> >> 
> >> > >> - _LIST_END
> >> > >> + _LIST_END, /* an ABI-compatible version may increase this value */
> >> > >> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> >> > >> };
> >> > >>
> >> > >> Then *_LIST_END values could be ignored by libabigail with such a change.
> >> 
> >> In order to avoid ABI check complaining, the best is to completely
> >> remove LIST_END in DPDK 20.11.
> >> 
> >> 
> >> > >> If such a patch is not done by tomorrow, I will have to revert
> >> > >> Chacha-Poly commits before 20.02-rc2, because
> >> > >>
> >> > >> 1/ LIST_END, without any comment, means "size of range"
> >> > >> 2/ we do not blame users for undocumented ABI changes
> >> > >> 3/ we respect the ABI compatibility contract
> 
>
  
Neil Horman Feb. 4, 2020, 9:59 p.m. UTC | #51
On Tue, Feb 04, 2020 at 10:16:56AM +0000, Akhil Goyal wrote:
> Hi,
> > On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > > 03/02/2020 10:30, Ferruh Yigit:
> > >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > >>> 02/02/2020 14:05, Thomas Monjalon:
> > >>>> 31/01/2020 15:16, Trahe, Fiona:
> > >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> > >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > >>>>>>>>
> > >>>>>>>> I believe these enums will be used only in case of ASYM case which is experimental.
> > >>>>>>>
> > >>>>>>> Independent from being experiment and not, this shouldn't be a problem, I think
> > >>>>>>> this is a false positive.
> > >>>>>>>
> > >>>>>>> The ABI break can happen when a struct has been shared between the application
> > >>>>>>> and the library (DPDK) and the layout of that memory know differently by
> > >>>>>>> application and the library.
> > >>>>>>>
> > >>>>>>> Here in all cases, there is no layout/size change.
> > >>>>>>>
> > >>>>>>> As to the value changes of the enums, since application compiled with old DPDK,
> > >>>>>>> it will know only up to '6', 7 and more means invalid to the application. So it
> > >>>>>>> won't send these values also it should ignore these values from library. Only
> > >>>>>>> consequence is old application won't able to use new features those new enums
> > >>>>>>> provide but that is expected/normal.
> > >>>>>>
> > >>>>>> If library give higher value than expected by the application,
> > >>>>>> if the application uses this value as array index,
> > >>>>>> there can be an access out of bounds.
> > >>>>>
> > >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
> > >>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
> > >>>>> sense and I don't see how there can be an API breakage.
> > >>>>> So if an application hasn't compiled against the new lib it will be still using the old value
> > >>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
> > >>>>> have been compiled against the lib so shouldn't have problems.
> > >>>>
> > >>>> You say there is no ABI issue because the application will be re-compiled
> > >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> > >>>> But this is not relevant for ABI compatibility.
> > >>>> ABI compatibility means we can upgrade the library without recompiling
> > >>>> the application and it must work.
> > >>>> You think it is a false positive because you assume the application
> > >>>> "picks" the new value. I think you miss the case where the new value
> > >>>> is returned by a function in the upgraded library.
> > >>>>
> > >>>>> There are also no structs on the API which contain arrays using this
> > >>>>> for sizing, so I don't see an opportunity for an appl to have a
> > >>>>> mismatch in memory addresses.
> > >>>>
> > >>>> Let me demonstrate where the API may "use" the new value
> > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
> > >>>>
> > >>>> Once upon a time a DPDK application counting the number of devices
> > >>>> supporting each AEAD algo (in order to find the best supported algo).
> > >>>> It is done in an array indexed by algo id:
> > >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > >>>> The application is compiled with DPDK 19.11,
> > >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > >>>> So the size of the application array aead_dev_count is 3.
> > >>>> This binary is run with DPDK 20.02,
> > >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > >>>> The application uses this value:
> > >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > >>>> The application is crashing because of out of bound access.
> > >>>
> > >>> I'd say this is an example of bad written app.
> > >>> It probably should check that returned by library value doesn't
> > >>> exceed its internal array size.
> > >>
> > >> +1
> > >>
> > >> Application should ignore values >= MAX.
> > >
> > > Of course, blaming the API user is a lot easier than looking at the API.
> > > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > > as the max value for the application.
> > > Value ranges are part of the ABI compatibility contract.
> > > It seems you expect the application developer to be aware that
> > > DPDK could return a higher value, so the application should
> > > check every enum values after calling an API. CRAZY.
> > >
> > > When we decide to announce an ABI compatibility and do some marketing,
> > > everyone is OK. But when we need to really make our ABI compatible,
> > > I see little or no effort. DISAPPOINTING.
> > 
> > This is not to blame the user or to do less work, this is more sane approach
> > that library provides the _END/_MAX value and application uses it as valid range
> > check.
> > 
> > >
> > >> Do you suggest we don't extend any enum or define between ABI breakage releases
> > >> to be sure bad written applications not affected?
> > >
> > > I suggest we must consider not breaking any assumption made on the API.
> > > Here we are breaking the enum range because nothing mentions _LIST_END
> > > is not really the absolute end of the enum.
> > > The solution is to make the change below in 20.02 + backport in 19.11.1:
> > >
> > > - _LIST_END
> > > + _LIST_END, /* an ABI-compatible version may increase this value */
> > > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > > };
> > >
> > 
> > What is the point of "_LIST_MAX" here?
> > 
> > Application should know the "_LIST_END" of when it has been compiled for the
> > valid range check. Next time it is compiled "_LIST_END" may be different value
> > but same logic applies.
> > 
> > When "_LIST_END" is missing, application can't protect itself, in that case
> > library should send only the values application knows when it is compiled, this
> > means either we can't extend our enum/defines until next ABI breakage, or we
> > need to do ABI versioning to the functions that returns an enum each time enum
> > value extended.
> > 
> > I believe it is saner to provide _END/_MAX values to the application to use. And
> > if required comment them to clarify the expected usage.
> > 
> > But in above suggestion application can't use or rely on "_LIST_MAX", it doesn't
> > mean anything to application.
> > 
> 
> Can we have something like 
> enum rte_crypto_aead_algorithm {
>         RTE_CRYPTO_AEAD_AES_CCM = 1,
>         /**< AES algorithm in CCM mode. */
>         RTE_CRYPTO_AEAD_AES_GCM,
>         /**< AES algorithm in GCM mode. */
>         RTE_CRYPTO_AEAD_LIST_END,
>         /**< List end for 19.11 ABI compatibility */
>         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
>         /**< Chacha20 cipher with poly1305 authenticator */
>         RTE_CRYPTO_AEAD_LIST_END_2011
>         /**< List end for 20.11 ABI compatibility */
> };
> 
> And in 20.11 release we alter the RTE_CRYPTO_AEAD_LIST_END to the end and remove RTE_CRYPTO_AEAD_LIST_END_2011
> 
> I believe it will be ok for any application which need to use the chacha poly assume that this algo is
> Experimental and will move to formal list in 20.11. This can be documented in the documentation.
> I believe there is no way to add a new enum as experimental so far. This way we can formalize this
> requirement as well.
> 
> I believe this way effect of ABI breakage will be nullified.
> 
That's not really helpful though, in that libabigail will then complain that
you've aliased an old enumeration name in the ABI to a new name.

A better solution would be to do one of the following:

a) add an API call - something like rte_crypto_get_max_alg(), which returns at
run time the maximum number of algorithms available, so that the application is
forced to learn that number at run time, rather than at compile time

b) Modify the API such that you pass in an algorithm name rather than an index
value defined by an enumeration.  You may also add an API call that dumps all
the available strings back to the user.

a) would be nice, but it would still require an ABI change, which is less than
optimal

b) is nice because it gives you flexibility in how you search for algs - if you
want to add a new alg, you just update your internal tables with a new name
string, and applications can get its information by querying based on that
string, rather than an index.

b) is also nice because it can just supersede the existing implementation (i.e.
the current implementation remains unchanged, but only supports the current,
existing algs).  If you want to get the new algs, you use the new API calls
(which can also find the already existing algs), and at some point in the future
we can just deprecate the old API.
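
To make (a) and (b) a bit more concrete, here is a rough sketch of a
name-based lookup. Every name in it (alg_entry, alg_table, alg_count,
alg_name_at, alg_lookup) is made up for illustration and is not existing
DPDK API; the algorithm strings are placeholders too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal table entry; these names are not DPDK API. */
struct alg_entry {
	const char *name; /* stable string that applications query by */
	int id;           /* internal id, free to change between releases */
};

/* Adding an algorithm means appending a row; no public enum changes. */
static const struct alg_entry alg_table[] = {
	{ "aes-ccm", 1 },
	{ "aes-gcm", 2 },
	{ "chacha20-poly1305", 3 },
};

/* Option (a): the count is learned at run time, not at compile time. */
static size_t
alg_count(void)
{
	return sizeof(alg_table) / sizeof(alg_table[0]);
}

/* Name of the idx-th algorithm, or NULL past the end of the table. */
static const char *
alg_name_at(size_t idx)
{
	return idx < alg_count() ? alg_table[idx].name : NULL;
}

/* Option (b): look up by name; NULL means this build does not know it. */
static const struct alg_entry *
alg_lookup(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < alg_count(); i++)
		if (strcmp(alg_table[i].name, name) == 0)
			return &alg_table[i];
	return NULL;
}
```

An application built against an older release would never ask for a name it
does not know, and nothing in its binary depends on the size of the table.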

Neil

> 
> -Akhil
> 
> > > Then *_LIST_END values could be ignored by libabigail with such a change.
> > >
> > > If such a patch is not done by tomorrow, I will have to revert
> > > Chacha-Poly commits before 20.02-rc2, because
> > >
> > > 1/ LIST_END, without any comment, means "size of range"
> > > 2/ we do not blame users for undocumented ABI changes
> > > 3/ we respect the ABI compatibility contract
> > >
> > >
>
  
Neil Horman Feb. 4, 2020, 10:10 p.m. UTC | #52
On Tue, Feb 04, 2020 at 10:32:01AM +0000, Akhil Goyal wrote:
> 
> > 
> > 04/02/2020 11:16, Akhil Goyal:
> > > Hi,
> > > > On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > > > > 03/02/2020 10:30, Ferruh Yigit:
> > > > >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > > > >>> 02/02/2020 14:05, Thomas Monjalon:
> > > > >>>> 31/01/2020 15:16, Trahe, Fiona:
> > > > >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > > > >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> > > > >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> I believe these enums will be used only in case of ASYM case which
> > is
> > > > experimental.
> > > > >>>>>>>
> > > > >>>>>>> Independent from being experiment and not, this shouldn't be a
> > > > problem, I think
> > > > >>>>>>> this is a false positive.
> > > > >>>>>>>
> > > > >>>>>>> The ABI break can happen when a struct has been shared between
> > the
> > > > application
> > > > >>>>>>> and the library (DPDK) and the layout of that memory know
> > differently
> > > > by
> > > > >>>>>>> application and the library.
> > > > >>>>>>>
> > > > >>>>>>> Here in all cases, there is no layout/size change.
> > > > >>>>>>>
> > > > >>>>>>> As to the value changes of the enums, since application compiled
> > with
> > > > old DPDK,
> > > > >>>>>>> it will know only up to '6', 7 and more means invalid to the
> > application.
> > > > So it
> > > > >>>>>>> won't send these values also it should ignore these values from
> > library.
> > > > Only
> > > > >>>>>>> consequence is old application won't able to use new features
> > those
> > > > new enums
> > > > >>>>>>> provide but that is expected/normal.
> > > > >>>>>>
> > > > >>>>>> If library give higher value than expected by the application,
> > > > >>>>>> if the application uses this value as array index,
> > > > >>>>>> there can be an access out of bounds.
> > > > >>>>>
> > > > >>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a
> > > > problem.
> > > > >>>>> But for the same issue with sym crypto below, I believe Ferruh's
> > > > explanation makes
> > > > >>>>> sense and I don't see how there can be an API breakage.
> > > > >>>>> So if an application hasn't compiled against the new lib it will be still
> > using
> > > > the old value
> > > > >>>>> which will be within bounds. If it's picking up the higher new value
> > from
> > > > the lib it must
> > > > >>>>> have been compiled against the lib so shouldn't have problems.
> > > > >>>>
> > > > >>>> You say there is no ABI issue because the application will be re-
> > compiled
> > > > >>>> for the updated library. Indeed, compilation fixes compatibility issues.
> > > > >>>> But this is not relevant for ABI compatibility.
> > > > >>>> ABI compatibility means we can upgrade the library without
> > recompiling
> > > > >>>> the application and it must work.
> > > > >>>> You think it is a false positive because you assume the application
> > > > >>>> "picks" the new value. I think you miss the case where the new value
> > > > >>>> is returned by a function in the upgraded library.
> > > > >>>>
> > > > >>>>> There are also no structs on the API which contain arrays using this
> > > > >>>>> for sizing, so I don't see an opportunity for an appl to have a
> > > > >>>>> mismatch in memory addresses.
> > > > >>>>
> > > > >>>> Let me demonstrate where the API may "use" the new value
> > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the
> > > > application.
> > > > >>>>
> > > > >>>> Once upon a time a DPDK application counting the number of devices
> > > > >>>> supporting each AEAD algo (in order to find the best supported algo).
> > > > >>>> It is done in an array indexed by algo id:
> > > > >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > > > >>>> The application is compiled with DPDK 19.11,
> > > > >>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
> > > > >>>> So the size of the application array aead_dev_count is 3.
> > > > >>>> This binary is run with DPDK 20.02,
> > > > >>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > > > >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > > > >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > > > >>>> The application uses this value:
> > > > >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > > > >>>> The application is crashing because of out of bound access.
> > > > >>>
> > > > >>> I'd say this is an example of bad written app.
> > > > >>> It probably should check that returned by library value doesn't
> > > > >>> exceed its internal array size.
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> Application should ignore values >= MAX.
> > > > >
> > > > > Of course, blaming the API user is a lot easier than looking at the API.
> > > > > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
> > > > > as the max value for the application.
> > > > > Value ranges are part of the ABI compatibility contract.
> > > > > It seems you expect the application developer to be aware that
> > > > > DPDK could return a higher value, so the application should
> > > > > check every enum values after calling an API. CRAZY.
> > > > >
> > > > > When we decide to announce an ABI compatibility and do some marketing,
> > > > > everyone is OK. But when we need to really make our ABI compatible,
> > > > > I see little or no effort. DISAPPOINTING.
> > > >
> > > > This is not to blame the user or to do less work, this is more sane approach
> > > > that library provides the _END/_MAX value and application uses it as valid
> > range
> > > > check.
> > > >
> > > > >
> > > > >> Do you suggest we don't extend any enum or define between ABI
> > breakage
> > > > releases
> > > > >> to be sure bad written applications not affected?
> > > > >
> > > > > I suggest we must consider not breaking any assumption made on the API.
> > > > > Here we are breaking the enum range because nothing mentions
> > _LIST_END
> > > > > is not really the absolute end of the enum.
> > > > > The solution is to make the change below in 20.02 + backport in 19.11.1:
> > > > >
> > > > > - _LIST_END
> > > > > + _LIST_END, /* an ABI-compatible version may increase this value */
> > > > > + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
> > > > > };
> > > > >
> > > >
> > > > What is the point of "_LIST_MAX" here?
> > > >
> > > > Application should know the "_LIST_END" of when it has been compiled for
> > the
> > > > valid range check. Next time it is compiled "_LIST_END" may be different
> > value
> > > > but same logic applies.
> > > >
> > > > When "_LIST_END" is missing, application can't protect itself, in that case
> > > > library should send only the values application knows when it is compiled,
> > this
> > > > means either we can't extend our enum/defines until next ABI breakage, or
> > we
> > > > need to do ABI versioning to the functions that returns an enum each time
> > enum
> > > > value extended.
> > > >
> > > > I believe it is saner to provide _END/_MAX values to the application to use.
> > And
> > > > if required comment them to clarify the expected usage.
> > > >
> > > > But in above suggestion application can't use or rely on "_LIST_MAX", it
> > doesn't
> > > > mean anything to application.
> > > >
> > >
> > > Can we have something like
> > > enum rte_crypto_aead_algorithm {
> > >         RTE_CRYPTO_AEAD_AES_CCM = 1,
> > >         /**< AES algorithm in CCM mode. */
> > >         RTE_CRYPTO_AEAD_AES_GCM,
> > >         /**< AES algorithm in GCM mode. */
> > >         RTE_CRYPTO_AEAD_LIST_END,
> > >         /**< List end for 19.11 ABI compatibility */
> > >         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
> > >         /**< Chacha20 cipher with poly1305 authenticator */
> > >         RTE_CRYPTO_AEAD_LIST_END_2011
> > >         /**< List end for 20.11 ABI compatibility */
> > > };
> > >
> > > And in 20.11 release we alter the RTE_CRYPTO_AEAD_LIST_END to the end
> > and remove RTE_CRYPTO_AEAD_LIST_END_2011
> > >
> > > I believe it will be ok for any application which need to use the chacha poly
> > assume that this algo is
> > > Experimental and will move to formal list in 20.11. This can be documented in
> > the documentation.
> > > I believe there is no way to add a new enum as experimental so far. This way
> > we can formalize this
> > > requirement as well.
> > >
> > > I believe this way effect of ABI breakage will be nullified.
> > 
> > This is a possibility in the (a) proposal.
> > But it breaks API (and ABI) because a high value is returned
> > while not expected by the application.
> > 
> > I guess ABI and release maintainers will vote no to such breakage.
> > Note: I vote no.
> > 
> 
> If that is the case, I would say we should go with b).
> 
> Versioned APIs does not look good and adds more confusion.
> 
What makes you say that?

Versioned APIs are the way you maintain backward compatibility.

If a library doesn't use versioned APIs, then it either:

1) breaks frequently, causing application headaches
2) has APIs that are so mature, strictly defined, and small, they never change anyway
3) goes to the trouble of creating compat libs for as far back as it needs to
support

DPDK doesn't yet have a mature, stable API, so (2) is out and we have to do (1)
or (3).  (1) has already been declared a bad idea, because application
developers and distros have declared a desire for backwards compatibility.  We
could go with (3) instead of ABI versioning, but between compat libs and
versioning, the latter is the much less difficult way to handle that.
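
As a rough illustration of what a versioned entry point could do for the enum
case discussed in this thread, the sketch below keeps two variants of one
call: the current one reports everything, the compat one hides values the old
ABI never defined. All names here (aead_algo, CAPA_END, pmd_capabilities,
info_get_new, info_get_old) are hypothetical, and the linker-level binding of
old binaries to the compat variant (symbol versioning) is assumed and not
shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the enum values discussed in the thread. */
enum aead_algo {
	AEAD_AES_CCM = 1,
	AEAD_AES_GCM,
	AEAD_CHACHA20_POLY1305, /* added in the newer release */
};

#define AEAD_LIST_END_OLD 3 /* list end as seen by binaries built on the old release */
#define CAPA_END 0          /* hypothetical end-of-list marker */

/* What a driver reports: an array terminated by CAPA_END. */
static const int pmd_capabilities[] = {
	AEAD_AES_CCM, AEAD_AES_GCM, AEAD_CHACHA20_POLY1305, CAPA_END
};

/* Current variant: expose the driver's list unchanged. */
static const int *
info_get_new(void)
{
	return pmd_capabilities;
}

/*
 * Compat variant: copy the list into the caller's buffer, skipping every
 * value the old ABI never defined. Returns the number of entries kept.
 */
static size_t
info_get_old(int *out, size_t out_len)
{
	size_t n = 0;
	const int *c;

	assert(out != NULL && out_len > 0);
	for (c = pmd_capabilities; *c != CAPA_END; c++) {
		if (*c >= AEAD_LIST_END_OLD)
			continue; /* unknown to old applications */
		if (n + 1 < out_len)
			out[n++] = *c;
	}
	out[n] = CAPA_END;
	return n;
}
```

An old binary only ever sees values below its own list end, while a rebuilt
application gets the full list.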

Neil
  
Anoob Joseph Feb. 5, 2020, 6:16 a.m. UTC | #53
Hi Akhil, Neil, Fiona

Sorry for the late response. I want to propose a new change in line with what you folks had proposed.

Maybe we can treat the new features as EXPERIMENTAL until a new stable release.

enum rte_crypto_aead_algorithm {
         RTE_CRYPTO_AEAD_AES_CCM = 1,
         /**< AES algorithm in CCM mode. */
         RTE_CRYPTO_AEAD_AES_GCM,
         /**< AES algorithm in GCM mode. */
         RTE_CRYPTO_AEAD_LIST_END,
         /**< List end for stable */
         /** EXPERIMENTAL */
         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
         /**< Chacha20 cipher with poly1305 authenticator */
         RTE_CRYPTO_AEAD_LIST_END_EXPERIMENTAL
         /**< List end */
 };

And then introduce an experimental API,

const struct rte_cryptodev_capabilities *
rte_cryptodev_get_exp_capabilites(uint8_t dev_id);

The PMD owner is expected to add new capabilities (only new ones) to this one until the new feature is deemed stable (i.e., in one of the next stable releases). We don't expect users to change their API/ABI. Applications using EXPERIMENTAL are allowed to use the above API to get the EXPERIMENTAL features.

This does involve moving around code in PMD when one feature is added, but that's the risk PMD owner is taking by upstreaming as EXPERIMENTAL and not in stable release.

Thanks,
Anoob

> -----Original Message-----
> From: Neil Horman <nhorman@tuxdriver.com>
> Sent: Wednesday, February 5, 2020 3:40 AM
> To: Akhil Goyal <akhil.goyal@nxp.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>;
> dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Anoob Joseph
> <anoobj@marvell.com>; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>; dodji@seketeli.net; Andrew Rybchenko
> <arybchenko@solarflare.com>; aconole@redhat.com; bluca@debian.org;
> ktraynor@redhat.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> On Tue, Feb 04, 2020 at 10:32:01AM +0000, Akhil Goyal wrote:
> >
> > >
> > > 04/02/2020 11:16, Akhil Goyal:
> > > > Hi,
> > > > > On 2/3/2020 5:09 PM, Thomas Monjalon wrote:
> > > > > > 03/02/2020 10:30, Ferruh Yigit:
> > > > > >> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
> > > > > >>> 02/02/2020 14:05, Thomas Monjalon:
> > > > > >>>> 31/01/2020 15:16, Trahe, Fiona:
> > > > > >>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
> > > > > >>>>>> 30/01/2020 17:09, Ferruh Yigit:
> > > > > >>>>>>> On 1/29/2020 8:13 PM, Akhil Goyal wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> I believe these enums will be used only in case of ASYM
> > > > > >>>>>>>> case which
> > > is
> > > > > experimental.
> > > > > >>>>>>>
> > > > > >>>>>>> Independent from being experiment and not, this
> > > > > >>>>>>> shouldn't be a
> > > > > problem, I think
> > > > > >>>>>>> this is a false positive.
> > > > > >>>>>>>
> > > > > >>>>>>> The ABI break can happen when a struct has been shared
> > > > > >>>>>>> between
> > > the
> > > > > application
> > > > > >>>>>>> and the library (DPDK) and the layout of that memory
> > > > > >>>>>>> know
> > > differently
> > > > > by
> > > > > >>>>>>> application and the library.
> > > > > >>>>>>>
> > > > > >>>>>>> Here in all cases, there is no layout/size change.
> > > > > >>>>>>>
> > > > > >>>>>>> As to the value changes of the enums, since application
> > > > > >>>>>>> compiled
> > > with
> > > > > old DPDK,
> > > > > >>>>>>> it will know only up to '6', 7 and more means invalid to
> > > > > >>>>>>> the
> > > application.
> > > > > So it
> > > > > >>>>>>> won't send these values also it should ignore these
> > > > > >>>>>>> values from
> > > library.
> > > > > Only
> > > > > >>>>>>> consequence is old application won't able to use new
> > > > > >>>>>>> features
> > > those
> > > > > new enums
> > > > > >>>>>>> provide but that is expected/normal.
> > > > > >>>>>>
> > > > > >>>>>> If library give higher value than expected by the
> > > > > >>>>>> application, if the application uses this value as array
> > > > > >>>>>> index, there can be an access out of bounds.
> > > > > >>>>>
> > > > > >>>>> [Fiona] All asymmetric APIs are experimental so above
> > > > > >>>>> shouldn't be a
> > > > > problem.
> > > > > >>>>> But for the same issue with sym crypto below, I believe
> > > > > >>>>> Ferruh's
> > > > > explanation makes
> > > > > >>>>> sense and I don't see how there can be an API breakage.
> > > > > >>>>> So if an application hasn't compiled against the new lib
> > > > > >>>>> it will be still
> > > using
> > > > > the old value
> > > > > >>>>> which will be within bounds. If it's picking up the higher
> > > > > >>>>> new value
> > > from
> > > > > the lib it must
> > > > > >>>>> have been compiled against the lib so shouldn't have problems.
> > > > > >>>>
> > > > > >>>> You say there is no ABI issue because the application will
> > > > > >>>> be re-
> > > compiled
> > > > > >>>> for the updated library. Indeed, compilation fixes compatibility
> issues.
> > > > > >>>> But this is not relevant for ABI compatibility.
> > > > > >>>> ABI compatibility means we can upgrade the library without
> > > recompiling
> > > > > >>>> the application and it must work.
> > > > > >>>> You think it is a false positive because you assume the
> > > > > >>>> application "picks" the new value. I think you miss the
> > > > > >>>> case where the new value is returned by a function in the upgraded
> library.
> > > > > >>>>
> > > > > >>>>> There are also no structs on the API which contain arrays
> > > > > >>>>> using this for sizing, so I don't see an opportunity for
> > > > > >>>>> an appl to have a mismatch in memory addresses.
> > > > > >>>>
> > > > > >>>> Let me demonstrate where the API may "use" the new value
> > > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the
> > > > > application.
> > > > > >>>>
> > > > > >>>> Once upon a time a DPDK application counting the number of
> > > > > >>>> devices supporting each AEAD algo (in order to find the best
> supported algo).
> > > > > >>>> It is done in an array indexed by algo id:
> > > > > >>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
> > > > > >>>> The application is compiled with DPDK 19.11, where
> > > > > >>>> RTE_CRYPTO_AEAD_LIST_END = 3.
> > > > > >>>> So the size of the application array aead_dev_count is 3.
> > > > > >>>> This binary is run with DPDK 20.02, where
> > > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
> > > > > >>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
> > > > > >>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
> > > > > >>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
> > > > > >>>> The application uses this value:
> > > > > >>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
> > > > > >>>> The application is crashing because of out of bound access.
> > > > > >>>
> > > > > >>> I'd say this is an example of bad written app.
> > > > > >>> It probably should check that returned by library value
> > > > > >>> doesn't exceed its internal array size.
> > > > > >>
> > > > > >> +1
> > > > > >>
> > > > > >> Application should ignore values >= MAX.
> > > > > >
> > > > > > Of course, blaming the API user is a lot easier than looking at the API.
> > > > > > Here the API has RTE_CRYPTO_AEAD_LIST_END which can be
> > > > > > understood as the max value for the application.
> > > > > > Value ranges are part of the ABI compatibility contract.
> > > > > > It seems you expect the application developer to be aware that
> > > > > > DPDK could return a higher value, so the application should
> > > > > > check every enum values after calling an API. CRAZY.
> > > > > >
> > > > > > When we decide to announce an ABI compatibility and do some
> > > > > > marketing, everyone is OK. But when we need to really make our
> > > > > > ABI compatible, I see little or no effort. DISAPPOINTING.
> > > > >
> > > > > This is not to blame the user or to do less work, this is more
> > > > > sane approach that library provides the _END/_MAX value and
> > > > > application uses it as valid
> > > range
> > > > > check.
> > > > >
> > > > > >
> > > > > >> Do you suggest we don't extend any enum or define between ABI
> > > breakage
> > > > > releases
> > > > > >> to be sure bad written applications not affected?
> > > > > >
> > > > > > I suggest we must consider not breaking any assumption made on the
> API.
> > > > > > Here we are breaking the enum range because nothing mentions
> > > _LIST_END
> > > > > > is not really the absolute end of the enum.
> > > > > > The solution is to make the change below in 20.02 + backport in
> 19.11.1:
> > > > > >
> > > > > > - _LIST_END
> > > > > > + _LIST_END, /* an ABI-compatible version may increase this
> > > > > > + value */ _LIST_MAX = _LIST_END + 42 /* room for
> > > > > > + ABI-compatible additions */
> > > > > > };
> > > > > >
> > > > >
> > > > > What is the point of "_LIST_MAX" here?
> > > > >
> > > > > Application should know the "_LIST_END" of when it has been
> > > > > compiled for
> > > the
> > > > > valid range check. Next time it is compiled "_LIST_END" may be
> > > > > different
> > > value
> > > > > but same logic applies.
> > > > >
> > > > > When "_LIST_END" is missing, application can't protect itself,
> > > > > in that case library should send only the values application
> > > > > knows when it is compiled,
> > > this
> > > > > means either we can't extend our enum/defines until next ABI
> > > > > breakage, or
> > > we
> > > > > need to do ABI versioning to the functions that returns an enum
> > > > > each time
> > > enum
> > > > > value extended.
> > > > >
> > > > > I believe it is saner to provide _END/_MAX values to the application to
> use.
> > > And
> > > > > if required comment them to clarify the expected usage.
> > > > >
> > > > > But in above suggestion application can't use or rely on
> > > > > "_LIST_MAX", it
> > > doesn't
> > > > > mean anything to application.
> > > > >
> > > >
> > > > Can we have something like
> > > > enum rte_crypto_aead_algorithm {
> > > >         RTE_CRYPTO_AEAD_AES_CCM = 1,
> > > >         /**< AES algorithm in CCM mode. */
> > > >         RTE_CRYPTO_AEAD_AES_GCM,
> > > >         /**< AES algorithm in GCM mode. */
> > > >         RTE_CRYPTO_AEAD_LIST_END,
> > > >         /**< List end for 19.11 ABI compatibility */
> > > >         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
> > > >         /**< Chacha20 cipher with poly1305 authenticator */
> > > >         RTE_CRYPTO_AEAD_LIST_END_2011
> > > >         /**< List end for 20.11 ABI compatibility */ };
> > > >
> > > > And in 20.11 release we alter the RTE_CRYPTO_AEAD_LIST_END to the
> > > > end
> > > and remove RTE_CRYPTO_AEAD_LIST_END_2011
> > > >
> > > > I believe it will be ok for any application which need to use the
> > > > chacha poly
> > > assume that this algo is
> > > > Experimental and will move to formal list in 20.11. This can be
> > > > documented in
> > > the documentation.
> > > > I believe there is no way to add a new enum as experimental so
> > > > far. This way
> > > we can formalize this
> > > > requirement as well.
> > > >
> > > > I believe this way effect of ABI breakage will be nullified.
> > >
> > > This is a possibility in the (a) proposal.
> > > But it breaks API (and ABI) because a high value is returned while
> > > not expected by the application.
> > >
> > > I guess ABI and release maintainers will vote no to such breakage.
> > > Note: I vote no.
> > >
> >
> > If that is the case, I would say we should go with b).
> >
> > Versioned APIs does not look good and adds more confusion.
> >
> What makes you say that?
> 
> Versioned APIs are the way you maintain backward compatibility.
> 
> If a library doesn't use versioned API's, then they either:
> 
> 1) break frequently, causing application headaches
> 2) have APIS that are so mature, strictly defined, and small, they never change
> anyway
> 3) go to the trouble of creating compat libs for as far back as they need to
> support
> 
> DPDK doesn't yet have a mature, stable API, so we have to do (1) or (2).  (1) has
> already been declared a bad idea, because application developers and distros
> have declared a desire for backwards compatibility.  We could go with (3)
> instead of ABI versioning, but between compat libs and versioning, the latter is
> the much less difficult way to handle that.
> 
> Neil
  
Ray Kinsella Feb. 5, 2020, 11:10 a.m. UTC | #54
On 04/02/2020 09:51, David Marchand wrote:
> On Mon, Feb 3, 2020 at 7:56 PM Ray Kinsella <mdr@ashroe.eu> wrote:
>> On 03/02/2020 17:34, Thomas Monjalon wrote:
>>> 03/02/2020 18:09, Thomas Monjalon:
>>>> 03/02/2020 10:30, Ferruh Yigit:
>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>>> If library give higher value than expected by the application,
>>>>>>>>> if the application uses this value as array index,
>>>>>>>>> there can be an access out of bounds.
>>>>>>>>
>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't be a problem.
>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's explanation makes
>>>>>>>> sense and I don't see how there can be an API breakage.
>>>>>>>> So if an application hasn't compiled against the new lib it will be still using the old value
>>>>>>>> which will be within bounds. If it's picking up the higher new value from the lib it must
>>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>>
>>>>>>> You say there is no ABI issue because the application will be re-compiled
>>>>>>> for the updated library. Indeed, compilation fixes compatibility issues.
>>>>>>> But this is not relevant for ABI compatibility.
>>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>>> the application and it must work.
>>>>>>> You think it is a false positive because you assume the application
>>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>>> is returned by a function in the upgraded library.
>>>>>>>
>>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>>> mismatch in memory addresses.
>>>>>>>
>>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>>
>>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>>> It is done in an array indexed by algo id:
>>>>>>> int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>>> The application is compiled with DPDK 19.11,
>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>>> This binary is run with DPDK 20.02,
>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>>> The application uses this value:
>>>>>>> ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>>> The application is crashing because of out of bound access.
>>>>>>
>>>>>> I'd say this is an example of bad written app.
>>>>>> It probably should check that returned by library value doesn't
>>>>>> exceed its internal array size.
>>>>>
>>>>> +1
>>>>>
>>>>> Application should ignore values >= MAX.
>>>>
>>>> Of course, blaming the API user is a lot easier than looking at the API.
>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>>> as the max value for the application.
>>>> Value ranges are part of the ABI compatibility contract.
>>>> It seems you expect the application developer to be aware that
>>>> DPDK could return a higher value, so the application should
>>>> check every enum values after calling an API. CRAZY.
>>>>
>>>> When we decide to announce an ABI compatibility and do some marketing,
>>>> everyone is OK. But when we need to really make our ABI compatible,
>>>> I see little or no effort. DISAPPOINTING.
>>>>
>>>>> Do you suggest we don't extend any enum or define between ABI breakage releases
>>>>> to be sure bad written applications not affected?
>>>>
>>>> I suggest we must consider not breaking any assumption made on the API.
>>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>>> is not really the absolute end of the enum.
>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>
>>> Thinking twice, merging such change before 20.11 is breaking the
>>> ABI assumption based on the API 19.11.0.
>>> I ask the release maintainers (Luca, Kevin, David and me) and
>>> the ABI maintainers (Neil and Ray) to vote for a or b solution:
>>>       a) add comment and LIST_MAX as below in 20.02 + 19.11.1
>>
>> That would still be an ABI breakage though right.
> 
> Yes.
> 
> 
>>
>>>       b) wait 20.11 and revert Chacha-Poly from 20.02
>>
>> Thanks for analysis above Fiona, Ferruh and all.
>>
>> That is a nasty one alright - there is no "good" answer here.
>> I agree with Ferruh's sentiments overall, we should rethink this API for 20.11.
>> Could do without an enumeration?
>>
>> There a c) though right.
>> We could work around the issue by api versioning rte_cryptodev_info_get() and friends.
> 
> It has a lot of friends, but it sounds like the right approach.

+1

> Is someone looking into this?

Looks to be in hand now.

> 
> 
>> So they only support/acknowledge the existence of Chacha-Poly for applications build against > 20.02.
>>
>> It would be painful I know.
>> It would also mean that Chacha-Poly would only be available to those building against >= 20.02.
> 
> Yes.
> 
> 
> --
> David Marchand
>
  
Fiona Trahe Feb. 5, 2020, 2:33 p.m. UTC | #55
Hi Anoob,

> -----Original Message-----
> From: Anoob Joseph <anoobj@marvell.com>
> Sent: Wednesday, February 5, 2020 6:16 AM
> To: Neil Horman <nhorman@tuxdriver.com>; Akhil Goyal <akhil.goyal@nxp.com>; Trahe, Fiona
> <fiona.trahe@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org;
> David Marchand <david.marchand@redhat.com>; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Mcnamara, John <john.mcnamara@intel.com>;
> dodji@seketeli.net; Andrew Rybchenko <arybchenko@solarflare.com>; aconole@redhat.com;
> bluca@debian.org; ktraynor@redhat.com
> Subject: RE: [EXT] Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> Hi Akhil, Neil, Fiona
> 
> Sorry for the late response. I want to propose a new change in line with what you folks had proposed.
> 
> May be we can treat the new features EXPERIMENTAL until a new stable release.
> 
> enum rte_crypto_aead_algorithm {
>          RTE_CRYPTO_AEAD_AES_CCM = 1,
>          /**< AES algorithm in CCM mode. */
>          RTE_CRYPTO_AEAD_AES_GCM,
>          /**< AES algorithm in GCM mode. */
>          RTE_CRYPTO_AEAD_LIST_END,
>          /**< List end for stable */
>          /** EXPERIMENTAL */
>          RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
>          /**< Chacha20 cipher with poly1305 authenticator */
>          RTE_CRYPTO_AEAD_LIST_END_EXPERIMENTAL
>          /**< List end */
>  };
> 
> And then introduce an experimental API,
> 
> const struct rte_cryptodev_capabilities *
> rte_cryptodev_get_exp_capabilites(uint8_t dev_id);
> 
> The PMD owner is expected to add new capabilities (only new ones) to this one until the new feature is
> deemed stable (ie, in one of the next stable releases). We don't expect users to change their API/ABI.
> For applications using EXPERIMENTAL is allowed to use the above capabilities to get the EXPERIMENTAL
> features.
> 
> This does involve moving around code in PMD when one feature is added, but that's the risk PMD
> owner is taking by upstreaming as EXPERIMENTAL and not in stable release.
> 
> Thanks,
> Anoob

[Fiona] Thanks for the suggestion Anoob.
I like the enum part of the idea - but not the new temporary API as the applications need to be aware of it and would have to change again when it's removed.
I explored an alternative way of using the current experimental infrastructure, i.e.:
enum rte_crypto_aead_algorithm {
         RTE_CRYPTO_AEAD_AES_CCM = 1,
         /**< AES algorithm in CCM mode. */
         RTE_CRYPTO_AEAD_AES_GCM,
         /**< AES algorithm in GCM mode. */
#ifdef ALLOW_EXPERIMENTAL_API
         RTE_CRYPTO_AEAD_CHACHA20_POLY1305,
         /**< Chacha20 cipher with poly1305 authenticator */
#endif
         RTE_CRYPTO_AEAD_LIST_END,
         /**< List end */
};

No new rte_cryptodev_get_exp_capabilities() needed.

Any PMD that implements the experimental API must do the same:
#ifdef ALLOW_EXPERIMENTAL_API
	<PMD code processing new enum>
#endif
Same with test code.
Any 19.11 production code that wants to run against shared objects from
20.02 can be expected to build DPDK with ALLOW_EXPERIMENTAL_API disabled,
so will not pick up the new feature.
However, it appears the flag is not globally consistent, i.e. most PMDs have it set, even if the application doesn't set it. So this probably wouldn't work.
We're testing the approach outlined yesterday and believe it satisfactorily resolves the issue, so will stick with that.
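
A minimal sketch of why the inconsistent flag matters, using hypothetical stand-in enums (the V1911_ and V2002_ names are made up for illustration and are not DPDK identifiers). A component built with the ChaCha enumerator sees LIST_END as 4, while one built without it sees 3, so the same integer means different things on each side of the ABI:

```c
#include <assert.h>

/* Hypothetical layout seen by a build WITHOUT the new enumerator. */
enum aead_algo_v1911 {
	V1911_AEAD_AES_CCM = 1,
	V1911_AEAD_AES_GCM,
	V1911_AEAD_LIST_END
};

/* Hypothetical layout seen by a build WITH the new enumerator. */
enum aead_algo_v2002 {
	V2002_AEAD_AES_CCM = 1,
	V2002_AEAD_AES_GCM,
	V2002_AEAD_CHACHA20_POLY1305,
	V2002_AEAD_LIST_END
};

/* The old end marker and the new algorithm share the same integer value. */
static_assert((int)V1911_AEAD_LIST_END == (int)V2002_AEAD_CHACHA20_POLY1305,
	      "old LIST_END collides with the new enumerator");

/*
 * Returns 1 if the value an old binary treats as LIST_END is accepted
 * as a real algorithm by code compiled against the new layout.
 */
static int old_list_end_is_valid_in_new_layout(void)
{
	int old_end = V1911_AEAD_LIST_END;

	return old_end >= V2002_AEAD_AES_CCM && old_end < V2002_AEAD_LIST_END;
}
```

If a PMD is compiled against the second layout while the application uses the first, the PMD can advertise a value the application cannot interpret, which is presumably why the #ifdef alone is not enough.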
  
Arkadiusz Kusztal Feb. 13, 2020, 2:51 p.m. UTC | #56
Hi,

Two comments from me,

> > > The patch we're working on will provide two versions of
> > > rte_cryptodev_info_get(), both call the same PMD function from the
> dev_ops info_get fn ptr.
> > > The default version operates s as normal, the 19.11 version searches
> > > through the list returned by the PMD, looking for sym.aead.algo =
> > > ChaChaPoly, it needs to strip it from
> > the list.
> > > As PMDs just pass a ptr to their capabilities list ( it isn't a
> > > linked list, but an array with an end marker  =
> > > RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API layer detects
> > > Chacha it must allocate some space and store a local copy of the trimmed
> list. This must be stored only once per device. 
[Arek] The problem with this solution is that we need to allocate memory.
So the question is how to handle the unlikely case of a malloc error when we operate inside the void function rte_cryptodev_info_get?
And even if we could somehow pass an error condition to the caller, what the caller should do with it is another question.

> >
> > I don't understand what you have to store.
> > Can't you just set the algo to 0 if it is ChaCha?
> [Fiona] it returns a pointer to data in the PMD domain, which the API couldn't
> and shouldn't overwrite, e.g.
> static const struct rte_cryptodev_capabilities qat_gen3_sym_capabilities[]
[Arek] Should we print some information for the user?
> 
> >
> > > This versioning will apply to any PMD which wants to take advantage
> > > of the new API between now and
> > 20.11.
> > >
> > > Note, I expect the ABI checker tools will still complain of ABI
> > > breakage as the LIST_END value will still
> > change.
> >
> > Right, you need to update the ignore list for the tool.
> >
> > > We are also reviewing all other cryptodev APIs in case there is any other
> API which needs versioning.
> > >
> > > Anyone see any problem with this approach?
> >
> > The other issue is with all other functions accepting this enum as input.
> > We should continue returning an error if getting Chacha as input with
> > 19.11 version of these functions.
> > But I would tend to consider this small ABI breakage can be ignored as
> > it is in the error path.
> [Fiona] The QAT PMD tests for and handles this error. I expect other PMDs
> do too.
[Arek] Yes, it is an error path, but on the other hand we explicitly specify which value is returned when calling
rte_cryptodev_sym_session_init, so the caller may expect EINVAL when a wrong algorithm value is selected (in practice it will probably be ENOTSUP).
In this case, a 19.11 app that sets 3 (LIST_END) and links against a 20.02 shared build (assuming it includes Chacha) would get success on return and a fully set up Chacha session,
which will probably result in undefined behavior.
So shouldn't this function be versioned as well?
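
To illustrate the concern, here is a rough sketch of what a versioned check could look like. All names and constants below are hypothetical stand-ins and not the real cryptodev API, and returning -EINVAL is an assumption (a PMD may well return -ENOTSUP instead):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical values mirroring the enum discussed in this thread. */
#define AEAD_FIRST          1
#define AEAD_LIST_END_V1911 3 /* LIST_END as seen by a 19.11 application */
#define AEAD_LIST_END_V2002 4 /* LIST_END once ChaCha20-Poly1305 is added */

static_assert(AEAD_LIST_END_V1911 < AEAD_LIST_END_V2002,
	      "the compat check only matters if the list grew");

/* Default (new) behaviour: any value below the new LIST_END is accepted. */
static int session_init_default(int aead_algo)
{
	if (aead_algo < AEAD_FIRST || aead_algo >= AEAD_LIST_END_V2002)
		return -EINVAL;
	return 0;
}

/* Hypothetical 19.11-compatible entry point: reject what 19.11 rejected. */
static int session_init_v1911(int aead_algo)
{
	if (aead_algo >= AEAD_LIST_END_V1911)
		return -EINVAL;
	return session_init_default(aead_algo);
}
```

With this split, a 19.11 application passing 3 still gets an error, while a new application passing 3 gets a ChaCha session.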

Regards,
Arek
  
Fiona Trahe March 16, 2020, 12:57 p.m. UTC | #57
Hi,

> -----Original Message-----
> From: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>
> Sent: Thursday, February 13, 2020 2:51 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; Thomas Monjalon <thomas@monjalon.net>
> Cc: David Marchand <david.marchand@redhat.com>; nhorman@tuxdriver.com; bluca@debian.org;
> ktraynor@redhat.com; Ray Kinsella <mdr@ashroe.eu>; dev@dpdk.org; Akhil Goyal
> <akhil.goyal@nxp.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph <anoobj@marvell.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Mcnamara, John <john.mcnamara@intel.com>;
> dodji@seketeli.net; Andrew Rybchenko <arybchenko@solarflare.com>; aconole@redhat.com
> Subject: RE: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> Hi,
> 
> Two comments from me,
> 
> > > > The patch we're working on will provide two versions of
> > > > rte_cryptodev_info_get(), both call the same PMD function from the
> > dev_ops info_get fn ptr.
> > > > The default version operates s as normal, the 19.11 version searches
> > > > through the list returned by the PMD, looking for sym.aead.algo =
> > > > ChaChaPoly, it needs to strip it from
> > > the list.
> > > > As PMDs just pass a ptr to their capabilities list ( it isn't a
> > > > linked list, but an array with an end marker  =
> > > > RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API layer detects
> > > > Chacha it must allocate some space and store a local copy of the trimmed
> > list. This must be stored only once per device.
> [Arek] The problem with this solution is that we need to allocate memory.
> So the question is how to handle unlikely case of malloc error when we operate inside void function
> rte_cryptodev_info_get?
> And even if we would pass somehow error condition to the caller then what to do is another question.
[Fiona] Quick recap: To avoid breaking ABI, we must return a set of capabilities with/without ChaChaPoly
depending on the application version. To resolve this, within the rte_cryptodev layer, we propose to
inspect the capabilities returned by the PMD and strip ChaCha if it exists.
In that case memory for the new trimmed capabilities array has to be allocated by the lib.
All good, except that reporting a malloc failure would be yet another API breakage, as rte_cryptodev_info_get() returns void.
We propose to return an empty capability list, i.e. a list with only the END element (which can be done without malloc),
in this corner case of a corner case.
Anyone see any issue with this?

> 
> > >
> > > I don't understand what you have to store.
> > > Can't you just set the algo to 0 if it is ChaCha?
> > [Fiona] it returns a pointer to data in the PMD domain, which the API couldn't
> > and shouldn't overwrite, e.g.
> > static const struct rte_cryptodev_capabilities qat_gen3_sym_capabilities[]
> Should we print user some information
> >
> > >
> > > > This versioning will apply to any PMD which wants to take advantage
> > > > of the new API between now and
> > > 20.11.
> > > >
> > > > Note, I expect the ABI checker tools will still complain of ABI
> > > > breakage as the LIST_END value will still
> > > change.
> > >
> > > Right, you need to update the ignore list for the tool.
> > >
> > > > We are also reviewing all other cryptodev APIs in case there is any other
> > API which needs versioning.
> > > >
> > > > Anyone see any problem with this approach?
> > >
> > > The other issue is with all other functions accepting this enum as input.
> > > We should continue returning an error if getting Chacha as input with
> > > 19.11 version of these functions.
> > > But I would tend to consider this small ABI breakage can be ignored as
> > > it is in the error path.
> > [Fiona] The QAT PMD tests for and handles this error. I expect other PMDs
> > do too.
> [Arek] - Yes, it is error path but on the other hand we explicitly specify what value we will return when
> calling
> rte_cryptodev_sym_session_init so caller may expect EINVAL when wrong algorithm value selected
> (usually it probably will be ENOTSUP).
> In this case when setting 3 (LIST_END) on 19.11 app and linking against 20.02 (assuming with Chacha)
> shared build, caller would get success on return and fully set chacha session,
> which will probably result in undefined behavior.
> So shouldn't this function be versioned as well then?
[Fiona] I would agree with Thomas to ignore this small ABI break, as it is already an error case
if an application is passing in a bad value for the algorithm. Even if it does return SUCCESS instead of ENOTSUP,
what behaviour could the application be expecting from a session using LIST_END as an algo?



> 
> Regards,
> Arek
> 
>
  
Thomas Monjalon March 16, 2020, 1:09 p.m. UTC | #58
16/03/2020 13:57, Trahe, Fiona:
> From: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>
> > > > > The patch we're working on will provide two versions of
> > > > > rte_cryptodev_info_get(), both call the same PMD function from the
> > > dev_ops info_get fn ptr.
> > > > > The default version operates s as normal, the 19.11 version searches
> > > > > through the list returned by the PMD, looking for sym.aead.algo =
> > > > > ChaChaPoly, it needs to strip it from
> > > > the list.
> > > > > As PMDs just pass a ptr to their capabilities list ( it isn't a
> > > > > linked list, but an array with an end marker  =
> > > > > RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API layer detects
> > > > > Chacha it must allocate some space and store a local copy of the trimmed
> > > list. This must be stored only once per device.
> > 
> > [Arek] The problem with this solution is that we need to allocate memory.
> > So the question is how to handle unlikely case of malloc error when we operate inside void function
> > rte_cryptodev_info_get?
> > And even if we would pass somehow error condition to the caller then what to do is another question.
> 
> [Fiona] Quick recap: To avoid breaking ABI, we must return a set of capabilities with/without ChaChaPoly
> depending on the appl version. To resolve this, within the rte_cryptodev layer, we propose to 
> inspect the capabilities returned by PMD and strip ChaCha if it exists.
> In that case memory for the new trimmed capabilities array has to be malloced by the lib.

What happens if the capability is removed from the original capabilities input?

> All good, except how to handle a malloc fail is yet another API breakage as rte_cryptodev_get_info() returns void.
> We propose to return an empty capability list, i.e. a list with only the END element (which can be done without malloc) 
> in this corner case of a corner case.
> Anyone see any issue with this?

How can we use the feature if it is not advertised in capabilities?
  
Arkadiusz Kusztal March 17, 2020, 1:27 p.m. UTC | #59
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, March 16, 2020 2:09 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>
> Cc: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; David Marchand
> <david.marchand@redhat.com>; nhorman@tuxdriver.com;
> bluca@debian.org; ktraynor@redhat.com; Ray Kinsella <mdr@ashroe.eu>;
> dev@dpdk.org; Akhil Goyal <akhil.goyal@nxp.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph
> <anoobj@marvell.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; dodji@seketeli.net; Andrew
> Rybchenko <arybchenko@solarflare.com>; aconole@redhat.com; Trahe,
> Fiona <fiona.trahe@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> 16/03/2020 13:57, Trahe, Fiona:
> > From: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>
> > > > > > The patch we're working on will provide two versions of
> > > > > > rte_cryptodev_info_get(), both call the same PMD function from
> > > > > > the
> > > > dev_ops info_get fn ptr.
> > > > > > The default version operates s as normal, the 19.11 version
> > > > > > searches through the list returned by the PMD, looking for
> > > > > > sym.aead.algo = ChaChaPoly, it needs to strip it from
> > > > > the list.
> > > > > > As PMDs just pass a ptr to their capabilities list ( it isn't
> > > > > > a linked list, but an array with an end marker  =
> > > > > > RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API layer
> > > > > > detects Chacha it must allocate some space and store a local
> > > > > > copy of the trimmed
> > > > list. This must be stored only once per device.
> > >
> > > [Arek] The problem with this solution is that we need to allocate memory.
> > > So the question is how to handle unlikely case of malloc error when
> > > we operate inside void function rte_cryptodev_info_get?
> > > And even if we would pass somehow error condition to the caller then
> what to do is another question.
> >
> > [Fiona] Quick recap: To avoid breaking ABI, we must return a set of
> > capabilities with/without ChaChaPoly depending on the appl version. To
> > resolve this, within the rte_cryptodev layer, we propose to inspect the
> capabilities returned by PMD and strip ChaCha if it exists.
> > In that case memory for the new trimmed capabilities array has to be
> malloced by the lib.
> 
> What happens if the capability is removed from the original capabilities
> input?
> 
> > All good, except how to handle a malloc fail is yet another API breakage as
> rte_cryptodev_get_info() returns void.
> > We propose to return an empty capability list, i.e. a list with only
> > the END element (which can be done without malloc) in this corner case of
> a corner case.
> > Anyone see any issue with this?
> 
> How can we use the feature if it is not advertised in capabilities?
What Fiona meant is that an empty capability list would indicate an error condition in this case. That's why she asked whether you are OK with this API breakage.
> 
>
  
Thomas Monjalon March 17, 2020, 3:10 p.m. UTC | #60
17/03/2020 14:27, Kusztal, ArkadiuszX:
> Hi Thomas,
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Monday, March 16, 2020 2:09 PM
> > To: Trahe, Fiona <fiona.trahe@intel.com>
> > Cc: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; David Marchand
> > <david.marchand@redhat.com>; nhorman@tuxdriver.com;
> > bluca@debian.org; ktraynor@redhat.com; Ray Kinsella <mdr@ashroe.eu>;
> > dev@dpdk.org; Akhil Goyal <akhil.goyal@nxp.com>; Yigit, Ferruh
> > <ferruh.yigit@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph
> > <anoobj@marvell.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > Mcnamara, John <john.mcnamara@intel.com>; dodji@seketeli.net; Andrew
> > Rybchenko <arybchenko@solarflare.com>; aconole@redhat.com; Trahe,
> > Fiona <fiona.trahe@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> > 
> > 16/03/2020 13:57, Trahe, Fiona:
> > > From: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>
> > > > > > > The patch we're working on will provide two versions of
> > > > > > > rte_cryptodev_info_get(), both call the same PMD function from
> > > > > > > the
> > > > > dev_ops info_get fn ptr.
> > > > > > > The default version operates s as normal, the 19.11 version
> > > > > > > searches through the list returned by the PMD, looking for
> > > > > > > sym.aead.algo = ChaChaPoly, it needs to strip it from
> > > > > > the list.
> > > > > > > As PMDs just pass a ptr to their capabilities list ( it isn't
> > > > > > > a linked list, but an array with an end marker  =
> > > > > > > RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API layer
> > > > > > > detects Chacha it must allocate some space and store a local
> > > > > > > copy of the trimmed
> > > > > list. This must be stored only once per device.
> > > >
> > > > [Arek] The problem with this solution is that we need to allocate memory.
> > > > So the question is how to handle unlikely case of malloc error when
> > > > we operate inside void function rte_cryptodev_info_get?
> > > > And even if we would pass somehow error condition to the caller then
> > what to do is another question.
> > >
> > > [Fiona] Quick recap: To avoid breaking ABI, we must return a set of
> > > capabilities with/without ChaChaPoly depending on the appl version. To
> > > resolve this, within the rte_cryptodev layer, we propose to inspect the
> > capabilities returned by PMD and strip ChaCha if it exists.
> > > In that case memory for the new trimmed capabilities array has to be
> > malloced by the lib.
> > 
> > What happens if the capability is removed from the original capabilities
> > input?
> > 
> > > All good, except how to handle a malloc fail is yet another API breakage as
> > rte_cryptodev_get_info() returns void.
> > > We propose to return an empty capability list, i.e. a list with only
> > > the END element (which can be done without malloc) in this corner case of
> > a corner case.
> > > Anyone see any issue with this?
> > 
> > How can we use the feature if it is not advertised in capabilities?
> What Fiona meant is that empty capability would indicate error condition in this case. That's why she asked if you ok with this API breakage.

Sorry I'm lost.
Please could you show what would be the usage?
  
Arkadiusz Kusztal March 17, 2020, 7:10 p.m. UTC | #61
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, March 17, 2020 4:10 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; Kusztal, ArkadiuszX
> <arkadiuszx.kusztal@intel.com>
> Cc: David Marchand <david.marchand@redhat.com>;
> nhorman@tuxdriver.com; bluca@debian.org; ktraynor@redhat.com; Ray
> Kinsella <mdr@ashroe.eu>; dev@dpdk.org; Akhil Goyal
> <akhil.goyal@nxp.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph
> <anoobj@marvell.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; dodji@seketeli.net; Andrew
> Rybchenko <arybchenko@solarflare.com>; aconole@redhat.com; Trahe,
> Fiona <fiona.trahe@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> 
> 17/03/2020 14:27, Kusztal, ArkadiuszX:
> > Hi Thomas,
> >
> > > -----Original Message-----
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > Sent: Monday, March 16, 2020 2:09 PM
> > > To: Trahe, Fiona <fiona.trahe@intel.com>
> > > Cc: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; David
> > > Marchand <david.marchand@redhat.com>; nhorman@tuxdriver.com;
> > > bluca@debian.org; ktraynor@redhat.com; Ray Kinsella
> <mdr@ashroe.eu>;
> > > dev@dpdk.org; Akhil Goyal <akhil.goyal@nxp.com>; Yigit, Ferruh
> > > <ferruh.yigit@intel.com>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>; dev@dpdk.org; Anoob Joseph
> > > <anoobj@marvell.com>; Richardson, Bruce
> > > <bruce.richardson@intel.com>; Mcnamara, John
> > > <john.mcnamara@intel.com>; dodji@seketeli.net; Andrew Rybchenko
> > > <arybchenko@solarflare.com>; aconole@redhat.com; Trahe, Fiona
> > > <fiona.trahe@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 4/4] add ABI checks
> > >
> > > 16/03/2020 13:57, Trahe, Fiona:
> > > > From: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>
> > > > > > > > The patch we're working on will provide two versions of
> > > > > > > > rte_cryptodev_info_get(), both call the same PMD function
> > > > > > > > from the
> > > > > > dev_ops info_get fn ptr.
> > > > > > > > The default version operates s as normal, the 19.11
> > > > > > > > version searches through the list returned by the PMD,
> > > > > > > > looking for sym.aead.algo = ChaChaPoly, it needs to strip
> > > > > > > > it from
> > > > > > > the list.
> > > > > > > > As PMDs just pass a ptr to their capabilities list ( it
> > > > > > > > isn't a linked list, but an array with an end marker  =
> > > > > > > > RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST) if the API layer
> > > > > > > > detects Chacha it must allocate some space and store a
> > > > > > > > local copy of the trimmed
> > > > > > list. This must be stored only once per device.
> > > > >
> > > > > [Arek] The problem with this solution is that we need to allocate
> memory.
> > > > > So the question is how to handle unlikely case of malloc error
> > > > > when we operate inside void function rte_cryptodev_info_get?
> > > > > And even if we would pass somehow error condition to the caller
> > > > > then
> > > what to do is another question.
> > > >
> > > > [Fiona] Quick recap: To avoid breaking ABI, we must return a set
> > > > of capabilities with/without ChaChaPoly depending on the appl
> > > > version. To resolve this, within the rte_cryptodev layer, we
> > > > propose to inspect the
> > > capabilities returned by PMD and strip ChaCha if it exists.
> > > > In that case memory for the new trimmed capabilities array has to
> > > > be
> > > malloced by the lib.
> > >
> > > What happens if the capability is removed from the original
> > > capabilities input?
> > >
> > > > All good, except how to handle a malloc fail is yet another API
> > > > breakage as
> > > rte_cryptodev_get_info() returns void.
> > > > We propose to return an empty capability list, i.e. a list with
> > > > only the END element (which can be done without malloc) in this
> > > > corner case of
> > > a corner case.
> > > > Anyone see any issue with this?
> > >
> > > How can we use the feature if it is not advertised in capabilities?
> > What Fiona meant is that empty capability would indicate error condition in
> this case. That's why she asked if you ok with this API breakage.
> 
> Sorry I'm lost.
> Please could you show what would be the usage?
> 
So this case looks more or less like this.
There are two versions of `rte_cryptodev_info_get`:

rte_cryptodev_info_get_v20 (versioned)
rte_cryptodev_info_get_v2005 (new default symbol)

The default version works normally, as it will be called only by apps built with the 20.05 version.

When an app built prior to 20.05 calls `rte_cryptodev_info_get`, version v20 is called. This function will remove Chacha Poly from the capabilities, but to achieve this we need some memory to store the `new` set of capabilities per device (without Chacha). So:
new_capability[dev_id] = malloc((num_of_capabilities - 1 /* chacha */) * sizeof(struct rte_cryptodev_capabilities));
The small problem is how to handle a malloc error:
if (new_capability[dev_id] == NULL) {
	/* What to do now, as rte_cryptodev_info_get is a void function and the API does not say anything about an error condition? */
	/* Fiona's suggestion above is to inform the user of the error by doing this: */
	dev_info->capabilities = cryptodev_undefined_capabilities;
}
where
static const struct rte_cryptodev_capabilities cryptodev_undefined_capabilities[] = {
	RTE_CRYPTODEV_END_OF_CAPABILITIES_LIST()
};

sizeof(struct rte_cryptodev_capabilities) is 38 bytes, padded to 40, so a properly constructed capabilities list can take at most 1920 bytes. No system should ever fail to allocate that, though I am not an expert. Another option would probably be to preallocate this memory. This is how I understand it.

Another question is whether something like that can be done if the API comments of `rte_cryptodev_info_get` do not say anything about any potential error.
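
For reference, the v20 path described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct capability`, `strip_chacha` and the other names are simplified hypothetical stand-ins and not the real rte_cryptodev types, and the allocator is passed in as a parameter purely so the failure path can be exercised:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct rte_cryptodev_capabilities. */
struct capability {
	int op;   /* CAP_OP_END marks the end of the array */
	int algo;
};

enum { CAP_OP_END = 0, CAP_OP_AEAD = 1 };
enum { ALGO_AES_GCM = 2, ALGO_CHACHA20_POLY1305 = 3 };

/* Static fallback: a list holding only the end marker, no allocation needed. */
static const struct capability undefined_capabilities[] = {
	{ CAP_OP_END, 0 }
};

static int is_chacha(const struct capability *c)
{
	return c->op == CAP_OP_AEAD && c->algo == ALGO_CHACHA20_POLY1305;
}

/* Number of entries before the end marker. */
static size_t cap_count(const struct capability *caps)
{
	size_t n = 0;

	while (caps[n].op != CAP_OP_END)
		n++;
	return n;
}

/*
 * Return a list without ChaCha entries: the original pointer if there is
 * nothing to strip, a newly allocated copy otherwise, or the static empty
 * list if the allocation fails.
 */
static const struct capability *
strip_chacha(const struct capability *caps, void *(*alloc)(size_t))
{
	size_t n, kept = 0, i;
	struct capability *copy;

	assert(caps != NULL && alloc != NULL);
	n = cap_count(caps);
	for (i = 0; i < n; i++)
		kept += !is_chacha(&caps[i]);
	if (kept == n)
		return caps; /* nothing to strip, PMD data is left untouched */

	copy = alloc((kept + 1) * sizeof(*copy)); /* +1 for the end marker */
	if (copy == NULL)
		return undefined_capabilities;

	kept = 0;
	for (i = 0; i < n; i++)
		if (!is_chacha(&caps[i]))
			copy[kept++] = caps[i];
	copy[kept] = caps[n]; /* copy the end marker */
	return copy;
}

/* Allocator that always fails, to exercise the fallback path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

A real implementation would presumably also cache the trimmed copy per device so that the allocation happens only once, as mentioned earlier in the thread.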

Regards,
Arek
  

Patch

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index e61aa2b0a..95bd869c3 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -30,6 +30,7 @@  fi
 
 OPTS="$OPTS --default-library=$DEF_LIB"
 OPTS="$OPTS --prefix=/usr -Dlibdir=lib"
+OPTS="$OPTS --buildtype=debugoptimized"
 meson build --werror -Dexamples=all $OPTS
 ninja -C build
 DESTDIR=$(pwd)/install ninja -C build install
@@ -40,6 +41,28 @@  if [ "$AARCH64" != "1" ]; then
     unset LD_LIBRARY_PATH
 fi
 
+if [ "$ABI_CHECKS" = "1" ]; then
+    REF_GIT_REPO=${REF_GIT_REPO:-https://dpdk.org/git/dpdk}
+    REF_GIT_TAG=${REF_GIT_TAG:-v19.11}
+
+    if [ "$(cat reference/VERSION 2>/dev/null)" != "$REF_GIT_TAG" ]; then
+        rm -rf reference
+    fi
+
+    if [ ! -d reference ]; then
+        refsrcdir=$(readlink -f $(pwd)/../dpdk-$REF_GIT_TAG)
+        git clone --single-branch -b $REF_GIT_TAG $REF_GIT_REPO $refsrcdir
+        meson --werror $OPTS $refsrcdir $refsrcdir/build
+        ninja -C $refsrcdir/build
+        DESTDIR=$(pwd)/reference ninja -C $refsrcdir/build install
+        devtools/gen-abi.sh reference
+        echo $REF_GIT_TAG > reference/VERSION
+    fi
+
+    devtools/gen-abi.sh install
+    devtools/check-abi.sh reference install ${ABI_CHECKS_WARN_ONLY:-}
+fi
+
 if [ "$RUN_TESTS" = "1" ]; then
     sudo meson test -C build --suite fast-tests -t 3
 fi
diff --git a/.travis.yml b/.travis.yml
index 8162f1c05..22539d823 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,5 +1,8 @@ 
 language: c
-cache: ccache
+cache:
+  ccache: true
+  directories:
+    - reference
 compiler:
   - gcc
   - clang
@@ -21,7 +24,7 @@  aarch64_packages: &aarch64_packages
 
 extra_packages: &extra_packages
   - *required_packages
-  - [libbsd-dev, libpcap-dev, libcrypto++-dev, libjansson4]
+  - [libbsd-dev, libpcap-dev, libcrypto++-dev, libjansson4, abigail-tools]
 
 build_32b_packages: &build_32b_packages
   - *required_packages
@@ -151,5 +154,18 @@  matrix:
         packages:
           - *required_packages
           - *doc_packages
+  - env: DEF_LIB="shared" EXTRA_PACKAGES=1 ABI_CHECKS=1
+    compiler: gcc
+    addons:
+      apt:
+        packages:
+          - *extra_packages
+  - env: DEF_LIB="shared" EXTRA_PACKAGES=1 ABI_CHECKS=1
+    arch: arm64
+    compiler: gcc
+    addons:
+      apt:
+        packages:
+          - *extra_packages
 
 script: ./.ci/${TRAVIS_OS_NAME}-build.sh
diff --git a/MAINTAINERS b/MAINTAINERS
index 94bccae6d..6dae4ee63 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -144,8 +144,10 @@  M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_eal/common/include/rte_compat.h
 F: lib/librte_eal/common/include/rte_function_versioning.h
 F: doc/guides/rel_notes/deprecation.rst
+F: devtools/check-abi.sh
 F: devtools/check-abi-version.sh
 F: devtools/check-symbol-change.sh
+F: devtools/gen-abi.sh
 F: devtools/update-abi.sh
 F: devtools/update_version_map_abi.py
 F: devtools/validate-abi.sh
diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
new file mode 100755
index 000000000..5872499ec
--- /dev/null
+++ b/devtools/check-abi.sh
@@ -0,0 +1,59 @@ 
+#!/bin/sh -e
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2019 Red Hat, Inc.
+
+if [ $# != 2 ] && [ $# != 3 ]; then
+	echo "Usage: $0 refdir newdir [warnonly]"
+	exit 1
+fi
+
+refdir=$1
+newdir=$2
+warnonly=${3:-}
+ABIDIFF_OPTIONS="--suppr $(dirname $0)/dpdk.abignore --no-added-syms"
+
+if [ ! -d $refdir ]; then
+	echo "Error: reference directory '$refdir' does not exist."
+	exit 1
+fi
+incdir=$(find $refdir -type d -a -name include)
+if [ -z "$incdir" ] || [ ! -e "$incdir" ]; then
+	echo "WARNING: could not identify a include directory for $refdir, expect false positives..."
+else
+	ABIDIFF_OPTIONS="$ABIDIFF_OPTIONS --headers-dir1 $incdir"
+fi
+
+if [ ! -d $newdir ]; then
+	echo "Error: directory to check '$newdir' does not exist."
+	exit 1
+fi
+incdir2=$(find $newdir -type d -a -name include)
+if [ -z "$incdir2" ] || [ ! -e "$incdir2" ]; then
+	echo "WARNING: could not identify a include directory for $newdir, expect false positives..."
+else
+	ABIDIFF_OPTIONS="$ABIDIFF_OPTIONS --headers-dir2 $incdir2"
+fi
+
+error=
+for dump in $(find $refdir -name "*.dump"); do
+	name=$(basename $dump)
+	# skip glue drivers, example librte_pmd_mlx5_glue.dump
+	# We can't rely on a suppression rule for now:
+	# https://sourceware.org/bugzilla/show_bug.cgi?id=25480
+	if [ "$name" != "${name%%_glue.dump}" ]; then
+		echo "Skipping ${dump}..."
+		continue
+	fi
+	dump2=$(find $newdir -name $name)
+	if [ -z "$dump2" ] || [ ! -e "$dump2" ]; then
+		echo "Error: can't find $name in $newdir"
+		error=1
+		continue
+	fi
+	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
+		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
+		error=1
+	fi
+done
+
+[ -z "$error" ] || [ -n "$warnonly" ]
diff --git a/devtools/dpdk.abignore b/devtools/dpdk.abignore
new file mode 100644
index 000000000..0c01eebea
--- /dev/null
+++ b/devtools/dpdk.abignore
@@ -0,0 +1,20 @@ 
+[suppress_function]
+        symbol_version = EXPERIMENTAL
+[suppress_variable]
+        symbol_version = EXPERIMENTAL
+
+; Explicit ignore for driver-only ABI
+[suppress_type]
+        name = rte_cryptodev_ops
+
+; FIXME
+[suppress_type]
+        type_kind = enum
+        name = rte_crypto_aead_algorithm
+        changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
+[suppress_type]
+        type_kind = enum
+        name = rte_crypto_asym_xform_type
+        changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+[suppress_variable]
+        name = rte_crypto_aead_algorithm_strings
diff --git a/devtools/gen-abi.sh b/devtools/gen-abi.sh
new file mode 100755
index 000000000..c44b0e228
--- /dev/null
+++ b/devtools/gen-abi.sh
@@ -0,0 +1,26 @@ 
+#!/bin/sh -e
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2019 Red Hat, Inc.
+
+if [ $# != 1 ]; then
+	echo "Usage: $0 installdir"
+	exit 1
+fi
+
+installdir=$1
+if [ ! -d $installdir ]; then
+	echo "Error: install directory '$installdir' does not exist."
+	exit 1
+fi
+
+dumpdir=$installdir/dump
+rm -rf $dumpdir
+mkdir -p $dumpdir
+for f in $(find $installdir -name "*.so.*"); do
+	if test -L $f; then
+		continue
+	fi
+
+	libname=$(basename $f)
+	abidw --out-file $dumpdir/${libname%.so*}.dump $f
+done
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 52305fbb8..fba30bac9 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -30,7 +30,8 @@  default_path=$PATH
 # - LIBSSO_SNOW3G_PATH
 # - LIBSSO_KASUMI_PATH
 # - LIBSSO_ZUC_PATH
-. $(dirname $(readlink -f $0))/load-devel-config
+devtools_dir=$(dirname $(readlink -f $0))
+. $devtools_dir/load-devel-config
 
 print_usage () {
 	echo "usage: $(basename $0) [-h] [-jX] [-s] [config1 [config2] ...]]"
@@ -64,6 +65,7 @@  print_help () {
 [ -z $MAKE ] && echo "Cannot find make or gmake" && exit 1
 
 J=$DPDK_MAKE_JOBS
+refsrcdir=$(mktemp -d -t dpdk-${DPDK_ABI_REF_VERSION:-}.XXX)
 builds_dir=${DPDK_BUILD_TEST_DIR:-.}
 short=false
 unset verbose
@@ -91,13 +93,14 @@  on_exit ()
 		[ "$DPDK_NOTIFY" != notify-send ] || \
 			notify-send -u low --icon=dialog-error 'DPDK build' 'failed'
 	fi
+	rm -rf $refsrcdir
 }
 # catch manual interrupt to ignore notification
 trap "signal=INT ; trap - INT ; kill -INT $$" INT
 # notify result on exit
 trap on_exit EXIT
 
-cd $(dirname $(readlink -f $0))/..
+cd $devtools_dir/..
 
 reset_env ()
 {
@@ -233,7 +236,7 @@  for conf in $configs ; do
 	# reload config with DPDK_TARGET set
 	DPDK_TARGET=$target
 	reset_env
-	. $(dirname $(readlink -f $0))/load-devel-config
+	. $devtools_dir/load-devel-config
 
 	options=$(echo $conf | sed 's,[^~+]*,,')
 	dir=$builds_dir/$conf
@@ -253,6 +256,42 @@  for conf in $configs ; do
 		EXTRA_LDFLAGS="$DPDK_DEP_LDFLAGS" $verbose \
 		O=$(readlink -f $dir)/examples
 	unset RTE_TARGET
+	if [ -n "$DPDK_ABI_REF_VERSION" ]; then
+		DPDK_ABI_REF_DIR=${DPDK_ABI_REF_DIR:-reference}
+		abirefdir=$DPDK_ABI_REF_DIR/$DPDK_ABI_REF_VERSION/$conf
+		if [ ! -d $abirefdir ]; then
+			# clone current sources
+			if [ ! -d $refsrcdir/.git ]; then
+				git clone --local --no-hardlinks \
+					--single-branch \
+					-b $DPDK_ABI_REF_VERSION \
+					$(pwd) $refsrcdir
+			fi
+
+			cd $refsrcdir
+
+			rm -rf build
+			config build $target $options
+
+			echo -n "================== Build $conf "
+			echo "($DPDK_ABI_REF_VERSION)"
+			${MAKE} -j$J EXTRA_CFLAGS="$maxerr $DPDK_DEP_CFLAGS" \
+				EXTRA_LDFLAGS="$DPDK_DEP_LDFLAGS" $verbose \
+				O=build
+			! $short || break
+			export RTE_TARGET=$target
+			${MAKE} install O=build DESTDIR=$abirefdir \
+				prefix=
+			$devtools_dir/gen-abi.sh $abirefdir
+
+			# back to current workdir
+			cd $devtools_dir/..
+		fi
+
+		echo "================== Check ABI $conf"
+		$devtools_dir/gen-abi.sh $dir/install
+		$devtools_dir/check-abi.sh $abirefdir $dir/install
+	fi
 	echo "################## $conf done."
 	unset dir
 done
diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index 254588ae6..1b410c784 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -16,8 +16,15 @@  srcdir=$(dirname $(readlink -f $0))/..
 
 MESON=${MESON:-meson}
 use_shared="--default-library=shared"
+refsrcdir=$(mktemp -d -t dpdk-${DPDK_ABI_REF_VERSION:-}.XXX)
 builds_dir=${DPDK_BUILD_TEST_DIR:-.}
 
+on_exit ()
+{
+	rm -rf $refsrcdir
+}
+trap on_exit EXIT
+
 if command -v gmake >/dev/null 2>&1 ; then
 	MAKE=gmake
 else
@@ -63,7 +70,9 @@  config () # <dir> <builddir> <meson options>
 	shift
 	builddir=$1
 	shift
-	options="--werror -Dexamples=all --prefix=/usr -Dlibdir=lib"
+	options=
+	options="$options --werror --buildtype=debugoptimized -Dexamples=all"
+	options="$options --prefix=/usr -Dlibdir=lib"
 	for option in $DPDK_MESON_OPTIONS ; do
 		options="$options -D$option"
 	done
@@ -96,7 +105,6 @@  compile () # <builddir> <installdir>
 		$ninja_cmd -C $builddir
 		$ninja_cmd -C $builddir install
 	fi
-	unset DESTDIR
 }
 
 build () # <directory> <target compiler> <meson options>
@@ -111,6 +119,29 @@  build () # <directory> <target compiler> <meson options>
 	config $srcdir $builds_dir/$targetdir $*
 	compile $builds_dir/$targetdir \
 		$(readlink -f $builds_dir/$targetdir/install)
+	if [ -n "$DPDK_ABI_REF_VERSION" ]; then
+		DPDK_ABI_REF_DIR=${DPDK_ABI_REF_DIR:-reference}
+		abirefdir=$DPDK_ABI_REF_DIR/$DPDK_ABI_REF_VERSION/$targetdir
+		if [ ! -d $abirefdir ]; then
+			# clone the reference version from the local repository
+			if [ ! -d $refsrcdir/.git ]; then
+				git clone --local --no-hardlinks \
+					--single-branch \
+					-b $DPDK_ABI_REF_VERSION \
+					$srcdir $refsrcdir
+			fi
+
+			rm -rf $refsrcdir/build
+			config $refsrcdir $refsrcdir/build $*
+			compile $refsrcdir/build $abirefdir
+			$srcdir/devtools/gen-abi.sh $abirefdir
+		fi
+
+		$srcdir/devtools/gen-abi.sh \
+			$(readlink -f $builds_dir/$targetdir/install)
+		$srcdir/devtools/check-abi.sh $abirefdir \
+			$(readlink -f $builds_dir/$targetdir/install)
+	fi
 }
 
 if [ "$1" = "-vv" ] ; then
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 0686450e4..2e16741ca 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -513,6 +513,19 @@  in a single subfolder called "__builds" created in the current directory.
 Setting ``DPDK_BUILD_TEST_DIR`` to an absolute directory path e.g. ``/tmp`` is also supported.
 
 
+Checking ABI compatibility
+--------------------------
+
+By default, ABI compatibility checks are disabled.
+
+To enable them, set the ``DPDK_ABI_REF_VERSION`` environment variable to the
+git branch or tag of the reference version to compare against.
+
+The ``devtools/test-build.sh`` and ``devtools/test-meson-builds.sh`` scripts
+then build this reference version in a temporary directory and store the
+results in ``DPDK_ABI_REF_DIR`` (``reference`` if unset), reusing them later.
+
+
 Sending Patches
 ---------------