mbox series

[v4,0/5] integrate librte_ipsec SAD into ipsec-secgw

Message ID 1579012036-326214-1-git-send-email-vladimir.medvedkin@intel.com (mailing list archive)
Headers
Series integrate librte_ipsec SAD into ipsec-secgw |

Message

Vladimir Medvedkin Jan. 14, 2020, 2:27 p.m. UTC
  This series integrates SA database (SAD) capabilities from ipsec library.
The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
Also patch series removes hardcoded limitation for maximum number of SA's
and SP's.

v4:
 - put tunnel SA's into SAD with SPI_ONLY type for performance reason

v3:
 - parse SA and SP into sorted array instead of linked list

v2:
 - get rid of maximum sp limitation

Vladimir Medvedkin (5):
  ipsec: move ipsec sad name length into .h
  examples/ipsec-secgw: implement inbound SAD
  examples/ipsec-secgw: integrate inbound SAD
  examples/ipsec-secgw: get rid of maximum sa limitation
  examples/ipsec-secgw: get rid of maximum sp limitation

 examples/ipsec-secgw/Makefile      |   1 +
 examples/ipsec-secgw/ipsec-secgw.c |   4 +-
 examples/ipsec-secgw/ipsec.h       |  11 +-
 examples/ipsec-secgw/meson.build   |   2 +-
 examples/ipsec-secgw/parser.c      |   4 +
 examples/ipsec-secgw/parser.h      |   9 ++
 examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++--------------
 examples/ipsec-secgw/sad.c         |  90 +++++++++++++
 examples/ipsec-secgw/sad.h         |  74 +++++++++++
 examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
 examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
 lib/librte_ipsec/ipsec_sad.c       |  20 +--
 lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
 13 files changed, 528 insertions(+), 171 deletions(-)
 create mode 100644 examples/ipsec-secgw/sad.c
 create mode 100644 examples/ipsec-secgw/sad.h
  

Comments

Akhil Goyal Jan. 15, 2020, 3:45 p.m. UTC | #1
Hi Vladimir,

There is more than 10% drop with this patchset on NXP hardware with both legacy mode and the ipsec lib mode. This would need some debugging.
Didn't you see any drop on intel?

Regards,
Akhil

> -----Original Message-----
> From: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Sent: Tuesday, January 14, 2020 7:57 PM
> To: dev@dpdk.org
> Cc: konstantin.ananyev@intel.com; Akhil Goyal <akhil.goyal@nxp.com>
> Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> 
> This series integrates SA database (SAD) capabilities from ipsec library.
> The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> Also patch series removes hardcoded limitation for maximum number of SA's
> and SP's.
> 
> v4:
>  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> 
> v3:
>  - parse SA and SP into sorted array instead of linked list
> 
> v2:
>  - get rid of maximum sp limitation
> 
> Vladimir Medvedkin (5):
>   ipsec: move ipsec sad name length into .h
>   examples/ipsec-secgw: implement inbound SAD
>   examples/ipsec-secgw: integrate inbound SAD
>   examples/ipsec-secgw: get rid of maximum sa limitation
>   examples/ipsec-secgw: get rid of maximum sp limitation
> 
>  examples/ipsec-secgw/Makefile      |   1 +
>  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
>  examples/ipsec-secgw/ipsec.h       |  11 +-
>  examples/ipsec-secgw/meson.build   |   2 +-
>  examples/ipsec-secgw/parser.c      |   4 +
>  examples/ipsec-secgw/parser.h      |   9 ++
>  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++--------------
>  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
>  examples/ipsec-secgw/sad.h         |  74 +++++++++++
>  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
>  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
>  lib/librte_ipsec/ipsec_sad.c       |  20 +--
>  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
>  13 files changed, 528 insertions(+), 171 deletions(-)
>  create mode 100644 examples/ipsec-secgw/sad.c
>  create mode 100644 examples/ipsec-secgw/sad.h
> 
> --
> 2.7.4
  
Akhil Goyal Jan. 17, 2020, 12:26 p.m. UTC | #2
Hi Vladimir,

The lookup logic for SAD has been brought more closer to real use case, but it looks very high on CPU and should be optimized. We cannot have 10-15% drop because of this change in SA lookup for small packet(82B) sizes where CPU is bottleneck. For large packet sizes it will not impact.

> 
> Hi Vladimir,
> 
> There is more than 10% drop with this patchset on NXP hardware with both
> legacy mode and the ipsec lib mode. This would need some debugging.
> Didn't you see any drop on intel?
> 
> Regards,
> Akhil
> 
> >
> > This series integrates SA database (SAD) capabilities from ipsec library.
> > The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> > Also patch series removes hardcoded limitation for maximum number of SA's
> > and SP's.
> >
> > v4:
> >  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> >
> > v3:
> >  - parse SA and SP into sorted array instead of linked list
> >
> > v2:
> >  - get rid of maximum sp limitation
> >
> > Vladimir Medvedkin (5):
> >   ipsec: move ipsec sad name length into .h
> >   examples/ipsec-secgw: implement inbound SAD
> >   examples/ipsec-secgw: integrate inbound SAD
> >   examples/ipsec-secgw: get rid of maximum sa limitation
> >   examples/ipsec-secgw: get rid of maximum sp limitation
> >
> >  examples/ipsec-secgw/Makefile      |   1 +
> >  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
> >  examples/ipsec-secgw/ipsec.h       |  11 +-
> >  examples/ipsec-secgw/meson.build   |   2 +-
> >  examples/ipsec-secgw/parser.c      |   4 +
> >  examples/ipsec-secgw/parser.h      |   9 ++
> >  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++-------------
> -
> >  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
> >  examples/ipsec-secgw/sad.h         |  74 +++++++++++
> >  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
> >  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
> >  lib/librte_ipsec/ipsec_sad.c       |  20 +--
> >  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
> >  13 files changed, 528 insertions(+), 171 deletions(-)
> >  create mode 100644 examples/ipsec-secgw/sad.c
> >  create mode 100644 examples/ipsec-secgw/sad.h
> >
> > --
> > 2.7.4
  
Vladimir Medvedkin Jan. 17, 2020, 5:05 p.m. UTC | #3
Hi Akhil,

Indeed with our tests we also seeing ~15% perf drop for small packets 
(~90B) and ~3-4% drop for 1KB packets. While I am looking on a ways to 
minimize the drop, I think it would be hard, if possible at all to 
eliminate it completely.
Reason for that: current SAD implementation is completely synthetic 
(using plain array structure indexed by SPI value). That provides a very 
low overhead, but doesn't provide expected functionality and can't be 
used in proper implementation.
To measure plain IPsec performance without SAD user can still use 
'--signle-sa' option.

On 15/01/2020 15:45, Akhil Goyal wrote:
> Hi Vladimir,
>
> There is more than 10% drop with this patchset on NXP hardware with both legacy mode and the ipsec lib mode. This would need some debugging.
> Didn't you see any drop on intel?
>
> Regards,
> Akhil
>
>> -----Original Message-----
>> From: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>> Sent: Tuesday, January 14, 2020 7:57 PM
>> To: dev@dpdk.org
>> Cc: konstantin.ananyev@intel.com; Akhil Goyal <akhil.goyal@nxp.com>
>> Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
>>
>> This series integrates SA database (SAD) capabilities from ipsec library.
>> The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
>> Also patch series removes hardcoded limitation for maximum number of SA's
>> and SP's.
>>
>> v4:
>>   - put tunnel SA's into SAD with SPI_ONLY type for performance reason
>>
>> v3:
>>   - parse SA and SP into sorted array instead of linked list
>>
>> v2:
>>   - get rid of maximum sp limitation
>>
>> Vladimir Medvedkin (5):
>>    ipsec: move ipsec sad name length into .h
>>    examples/ipsec-secgw: implement inbound SAD
>>    examples/ipsec-secgw: integrate inbound SAD
>>    examples/ipsec-secgw: get rid of maximum sa limitation
>>    examples/ipsec-secgw: get rid of maximum sp limitation
>>
>>   examples/ipsec-secgw/Makefile      |   1 +
>>   examples/ipsec-secgw/ipsec-secgw.c |   4 +-
>>   examples/ipsec-secgw/ipsec.h       |  11 +-
>>   examples/ipsec-secgw/meson.build   |   2 +-
>>   examples/ipsec-secgw/parser.c      |   4 +
>>   examples/ipsec-secgw/parser.h      |   9 ++
>>   examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++--------------
>>   examples/ipsec-secgw/sad.c         |  90 +++++++++++++
>>   examples/ipsec-secgw/sad.h         |  74 +++++++++++
>>   examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
>>   examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
>>   lib/librte_ipsec/ipsec_sad.c       |  20 +--
>>   lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
>>   13 files changed, 528 insertions(+), 171 deletions(-)
>>   create mode 100644 examples/ipsec-secgw/sad.c
>>   create mode 100644 examples/ipsec-secgw/sad.h
>>
>> --
>> 2.7.4
  
Akhil Goyal Jan. 20, 2020, 6:44 a.m. UTC | #4
Hi Vladimir,
The SA lookup logic and management is purely requirement based for the application. The application may only cater to <128 SAs which can be handled based on the current logic. –single-sa option cannot handle this.
Sample applications in DPDK are there to showcase the best a hardware can deliver. IMO, we cannot allow this logic on NXP hardwares. We give performance numbers based on IPSec app to customers and we cannot allow 15% degradation.
Other vendors(Marvell, ARM, AMD) please comment?
Regards,
Akhil
From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
Sent: Friday, January 17, 2020 10:35 PM
To: Akhil Goyal <akhil.goyal@nxp.com>; dev@dpdk.org
Cc: konstantin.ananyev@intel.com
Subject: Re: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw

Hi Akhil,
Indeed with our tests we also seeing ~15% perf drop for small packets (~90B) and ~3-4% drop for 1KB packets. While I am looking on a ways to minimize the drop, I think it would be hard, if possible at all to eliminate it completely.
Reason for that: current SAD implementation is completely synthetic (using plain array structure indexed by SPI value). That provides a very low overhead, but doesn't provide expected functionality and can't be used in proper implementation.
To measure plain IPsec performance without SAD user can still use '--signle-sa' option.
On 15/01/2020 15:45, Akhil Goyal wrote:

Hi Vladimir,



There is more than 10% drop with this patchset on NXP hardware with both legacy mode and the ipsec lib mode. This would need some debugging.

Didn't you see any drop on intel?



Regards,

Akhil



-----Original Message-----

From: Vladimir Medvedkin <vladimir.medvedkin@intel.com><mailto:vladimir.medvedkin@intel.com>

Sent: Tuesday, January 14, 2020 7:57 PM

To: dev@dpdk.org<mailto:dev@dpdk.org>

Cc: konstantin.ananyev@intel.com<mailto:konstantin.ananyev@intel.com>; Akhil Goyal <akhil.goyal@nxp.com><mailto:akhil.goyal@nxp.com>

Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw



This series integrates SA database (SAD) capabilities from ipsec library.

The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.

Also patch series removes hardcoded limitation for maximum number of SA's

and SP's.



v4:

 - put tunnel SA's into SAD with SPI_ONLY type for performance reason



v3:

 - parse SA and SP into sorted array instead of linked list



v2:

 - get rid of maximum sp limitation



Vladimir Medvedkin (5):

  ipsec: move ipsec sad name length into .h

  examples/ipsec-secgw: implement inbound SAD

  examples/ipsec-secgw: integrate inbound SAD

  examples/ipsec-secgw: get rid of maximum sa limitation

  examples/ipsec-secgw: get rid of maximum sp limitation



 examples/ipsec-secgw/Makefile      |   1 +

 examples/ipsec-secgw/ipsec-secgw.c |   4 +-

 examples/ipsec-secgw/ipsec.h       |  11 +-

 examples/ipsec-secgw/meson.build   |   2 +-

 examples/ipsec-secgw/parser.c      |   4 +

 examples/ipsec-secgw/parser.h      |   9 ++

 examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++--------------

 examples/ipsec-secgw/sad.c         |  90 +++++++++++++

 examples/ipsec-secgw/sad.h         |  74 +++++++++++

 examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----

 examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----

 lib/librte_ipsec/ipsec_sad.c       |  20 +--

 lib/librte_ipsec/rte_ipsec_sad.h   |   2 +

 13 files changed, 528 insertions(+), 171 deletions(-)

 create mode 100644 examples/ipsec-secgw/sad.c

 create mode 100644 examples/ipsec-secgw/sad.h



--

2.7.4



--

Regards,

Vladimir
-->
  
Anoob Joseph Jan. 20, 2020, 12:44 p.m. UTC | #5
Hi Vladimir, Akhil,

Marvell is also observing 10-15% drop with the SAD change. I agree with Akhil's opinion and we are not in favor of making this change.

Thanks,
Anoob

From: Akhil Goyal <akhil.goyal@nxp.com> 
Sent: Monday, January 20, 2020 12:14 PM
To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; dev@dpdk.org
Cc: konstantin.ananyev@intel.com; Anoob Joseph <anoobj@marvell.com>; Thomas Monjalon <thomas@monjalon.net>; Ravi Kumar <ravi1.kumar@amd.com>; Ruifeng Wang <ruifeng.wang@arm.com>
Subject: [EXT] RE: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw

External Email
  
Ananyev, Konstantin Jan. 20, 2020, 2:45 p.m. UTC | #6
Hi Akhil,
 
> Hi Vladimir,
> The SA lookup logic and management is purely requirement based for the application. 
>The application may only cater to <128 SAs which can
> be handled based on the current logic.

Not always, current implementation can handle < 128 SA, 
whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
Yes, what we have right now has nearly zero overhead,
and might be ok for some really simple show-cases.
But for majority of production IPsec implementations, 
I believe that definitely wouldn't be enough.  

> –single-sa option cannot handle this.
> Sample applications in DPDK are there to showcase the best a hardware can deliver. 

My thought was - that's the reason we have single-sa option -
demonstrate best possible HW perf without minimal SW intervention.
For something more serious than that, we use generic SAD implementation.

> IMO, we cannot allow this logic on NXP hardwares. We
> give performance numbers based on IPSec app to customers and we cannot allow 15% degradation.

As Vladimir said, we are looking how to improve current SAD numbers
and minimize the drop.
But with same equals - plain array will always be faster than hash table,
so not sure we will be able to match existing performance.
So two questions:
1. What exact case you use for perf testing
    (total number of SAs, packets per burst belong to the same/different SAs)?
    Might be there is a way to speedup it.
    Again if 10-15% is not an affordable drop, which one is: zero or ...?
2. I think there are 2 different directions for ipsec-secgw:
   From one-side there is a desire to use it as a show-case for best-possible HW IPsec performance
  (which is understandable).
   From other side - attempt to make it as close as real-world generic ipsec processing app as possible
   (support for ESN, replay window, fragmented packets, generic proper SAD, etc).
   Obviously these goals contradict and it makes really hard for the same app to fulfill both.
   Any thoughts how to deal with that?
   One obvious would be to split the app, anything else?
    
Konstantin

> Other vendors(Marvell, ARM, AMD) please comment?
> Regards,
> Akhil
> From: Medvedkin, Vladimir <mailto:vladimir.medvedkin@intel.com>
> Sent: Friday, January 17, 2020 10:35 PM
> To: Akhil Goyal <mailto:akhil.goyal@nxp.com>; mailto:dev@dpdk.org
> Cc: mailto:konstantin.ananyev@intel.com
> Subject: Re: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> 
> Hi Akhil,
> Indeed with our tests we also seeing ~15% perf drop for small packets (~90B) and ~3-4% drop for 1KB packets. While I am looking on a ways
> to minimize the drop, I think it would be hard, if possible at all to eliminate it completely.
> Reason for that: current SAD implementation is completely synthetic (using plain array structure indexed by SPI value). That provides a very
> low overhead, but doesn't provide expected functionality and can't be used in proper implementation.
> To measure plain IPsec performance without SAD user can still use '--signle-sa' option.
> On 15/01/2020 15:45, Akhil Goyal wrote:
> Hi Vladimir,
> 
> There is more than 10% drop with this patchset on NXP hardware with both legacy mode and the ipsec lib mode. This would need some
> debugging.
> Didn't you see any drop on intel?
> 
> Regards,
> Akhil
> 
> -----Original Message-----
> From: Vladimir Medvedkin mailto:vladimir.medvedkin@intel.com
> Sent: Tuesday, January 14, 2020 7:57 PM
> To: mailto:dev@dpdk.org
> Cc: mailto:konstantin.ananyev@intel.com; Akhil Goyal mailto:akhil.goyal@nxp.com
> Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> 
> This series integrates SA database (SAD) capabilities from ipsec library.
> The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> Also patch series removes hardcoded limitation for maximum number of SA's
> and SP's.
> 
> v4:
>  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> 
> v3:
>  - parse SA and SP into sorted array instead of linked list
> 
> v2:
>  - get rid of maximum sp limitation
> 
> Vladimir Medvedkin (5):
>   ipsec: move ipsec sad name length into .h
>   examples/ipsec-secgw: implement inbound SAD
>   examples/ipsec-secgw: integrate inbound SAD
>   examples/ipsec-secgw: get rid of maximum sa limitation
>   examples/ipsec-secgw: get rid of maximum sp limitation
> 
>  examples/ipsec-secgw/Makefile      |   1 +
>  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
>  examples/ipsec-secgw/ipsec.h       |  11 +-
>  examples/ipsec-secgw/meson.build   |   2 +-
>  examples/ipsec-secgw/parser.c      |   4 +
>  examples/ipsec-secgw/parser.h      |   9 ++
>  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++--------------
>  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
>  examples/ipsec-secgw/sad.h         |  74 +++++++++++
>  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
>  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
>  lib/librte_ipsec/ipsec_sad.c       |  20 +--
>  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
>  13 files changed, 528 insertions(+), 171 deletions(-)
>  create mode 100644 examples/ipsec-secgw/sad.c
>  create mode 100644 examples/ipsec-secgw/sad.h
> 
> --
> 2.7.4
> 
> --
> Regards,
> Vladimir
> -->
  
Akhil Goyal Jan. 21, 2020, 2:47 p.m. UTC | #7
Hi Konstantin,
> 
> Hi Akhil,
> 
> > Hi Vladimir,
> > The SA lookup logic and management is purely requirement based for the
> application.
> >The application may only cater to <128 SAs which can
> > be handled based on the current logic.
> 
> Not always, current implementation can handle < 128 SA,
> whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
> Yes, what we have right now has nearly zero overhead,
> and might be ok for some really simple show-cases.
> But for majority of production IPsec implementations,
> I believe that definitely wouldn't be enough.
> 
> > –single-sa option cannot handle this.
> > Sample applications in DPDK are there to showcase the best a hardware can
> deliver.
> 
> My thought was - that's the reason we have single-sa option -
> demonstrate best possible HW perf without minimal SW intervention.
> For something more serious than that, we use generic SAD implementation.
> 
> > IMO, we cannot allow this logic on NXP hardwares. We
> > give performance numbers based on IPSec app to customers and we cannot
> allow 15% degradation.
> 
> As Vladimir said, we are looking how to improve current SAD numbers
> and minimize the drop.
> But with same equals - plain array will always be faster than hash table,
> so not sure we will be able to match existing performance.
> So two questions:
> 1. What exact case you use for perf testing
>     (total number of SAs, packets per burst belong to the same/different SAs)?
>     Might be there is a way to speedup it.
>     Again if 10-15% is not an affordable drop, which one is: zero or ...?

We should add features judiciously, we cannot drop the performance of a benchmarking
Application in lieu of adding functionality. We should only add features which are not
Impacting the performance significantly.
Every vendor may have different cases. We cannot tune for everybody.
However, I see drop in 64 outbound 64 inbound SAs all with different SPI and IPs.
Packets per burst = 32 all with different SAs.

> 2. I think there are 2 different directions for ipsec-secgw:
>    From one-side there is a desire to use it as a show-case for best-possible HW
> IPsec performance
>   (which is understandable).
>    From other side - attempt to make it as close as real-world generic ipsec
> processing app as possible
>    (support for ESN, replay window, fragmented packets, generic proper SAD,
> etc).
>    Obviously these goals contradict and it makes really hard for the same app to
> fulfill both.
>    Any thoughts how to deal with that?
>    One obvious would be to split the app, anything else?

We can have a fallback mechanism back to original functionality for whatever feature
which has some perf drop.
Splitting an app can be thought of but that would be similar to a full fledged IPSec stack
like VPP-IPSec.

> 
> Konstantin
> 
> > Other vendors(Marvell, ARM, AMD) please comment?
> > Regards,
> > Akhil
> > From: Medvedkin, Vladimir <mailto:vladimir.medvedkin@intel.com>
> > Sent: Friday, January 17, 2020 10:35 PM
> > To: Akhil Goyal <mailto:akhil.goyal@nxp.com>; mailto:dev@dpdk.org
> > Cc: mailto:konstantin.ananyev@intel.com
> > Subject: Re: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> >
> > Hi Akhil,
> > Indeed with our tests we also seeing ~15% perf drop for small packets (~90B)
> and ~3-4% drop for 1KB packets. While I am looking on a ways
> > to minimize the drop, I think it would be hard, if possible at all to eliminate it
> completely.
> > Reason for that: current SAD implementation is completely synthetic (using
> plain array structure indexed by SPI value). That provides a very
> > low overhead, but doesn't provide expected functionality and can't be used in
> proper implementation.
> > To measure plain IPsec performance without SAD user can still use '--signle-sa'
> option.
> > On 15/01/2020 15:45, Akhil Goyal wrote:
> > Hi Vladimir,
> >
> > There is more than 10% drop with this patchset on NXP hardware with both
> legacy mode and the ipsec lib mode. This would need some
> > debugging.
> > Didn't you see any drop on intel?
> >
> > Regards,
> > Akhil
> >
> > -----Original Message-----
> > From: Vladimir Medvedkin mailto:vladimir.medvedkin@intel.com
> > Sent: Tuesday, January 14, 2020 7:57 PM
> > To: mailto:dev@dpdk.org
> > Cc: mailto:konstantin.ananyev@intel.com; Akhil Goyal
> mailto:akhil.goyal@nxp.com
> > Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> >
> > This series integrates SA database (SAD) capabilities from ipsec library.
> > The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> > Also patch series removes hardcoded limitation for maximum number of SA's
> > and SP's.
> >
> > v4:
> >  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> >
> > v3:
> >  - parse SA and SP into sorted array instead of linked list
> >
> > v2:
> >  - get rid of maximum sp limitation
> >
> > Vladimir Medvedkin (5):
> >   ipsec: move ipsec sad name length into .h
> >   examples/ipsec-secgw: implement inbound SAD
> >   examples/ipsec-secgw: integrate inbound SAD
> >   examples/ipsec-secgw: get rid of maximum sa limitation
> >   examples/ipsec-secgw: get rid of maximum sp limitation
> >
> >  examples/ipsec-secgw/Makefile      |   1 +
> >  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
> >  examples/ipsec-secgw/ipsec.h       |  11 +-
> >  examples/ipsec-secgw/meson.build   |   2 +-
> >  examples/ipsec-secgw/parser.c      |   4 +
> >  examples/ipsec-secgw/parser.h      |   9 ++
> >  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++-------------
> -
> >  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
> >  examples/ipsec-secgw/sad.h         |  74 +++++++++++
> >  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
> >  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
> >  lib/librte_ipsec/ipsec_sad.c       |  20 +--
> >  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
> >  13 files changed, 528 insertions(+), 171 deletions(-)
> >  create mode 100644 examples/ipsec-secgw/sad.c
> >  create mode 100644 examples/ipsec-secgw/sad.h
> >
> > --
> > 2.7.4
> >
> > --
> > Regards,
> > Vladimir
> > -->
  
Akhil Goyal Jan. 23, 2020, 11:11 a.m. UTC | #8
Hi All,
> 
> Hi Konstantin,
> >
> > Hi Akhil,
> >
> > > Hi Vladimir,
> > > The SA lookup logic and management is purely requirement based for the
> > application.
> > >The application may only cater to <128 SAs which can
> > > be handled based on the current logic.
> >
> > Not always, current implementation can handle < 128 SA,
> > whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
> > Yes, what we have right now has nearly zero overhead,
> > and might be ok for some really simple show-cases.
> > But for majority of production IPsec implementations,
> > I believe that definitely wouldn't be enough.
> >
> > > –single-sa option cannot handle this.
> > > Sample applications in DPDK are there to showcase the best a hardware can
> > deliver.
> >
> > My thought was - that's the reason we have single-sa option -
> > demonstrate best possible HW perf without minimal SW intervention.
> > For something more serious than that, we use generic SAD implementation.
> >
> > > IMO, we cannot allow this logic on NXP hardwares. We
> > > give performance numbers based on IPSec app to customers and we cannot
> > allow 15% degradation.
> >
> > As Vladimir said, we are looking how to improve current SAD numbers
> > and minimize the drop.
> > But with same equals - plain array will always be faster than hash table,
> > so not sure we will be able to match existing performance.
> > So two questions:
> > 1. What exact case you use for perf testing
> >     (total number of SAs, packets per burst belong to the same/different SAs)?
> >     Might be there is a way to speedup it.
> >     Again if 10-15% is not an affordable drop, which one is: zero or ...?
> 
> We should add features judiciously, we cannot drop the performance of a
> benchmarking
> Application in lieu of adding functionality. We should only add features which
> are not
> Impacting the performance significantly.
> Every vendor may have different cases. We cannot tune for everybody.
> However, I see drop in 64 outbound 64 inbound SAs all with different SPI and IPs.
> Packets per burst = 32 all with different SAs.
> 

We can have two modes of lookup similar to l3fwd - EM and LPM.
LPM is O(1) while EM is more realistic. Similar logic can be added here as well.
With L3fwd also we showcase performance for best case(lpm) and the worst case(em)
What Say?

As discussed in the DPDK-status meeting today, this patchset need to be discussed in
Techboard meeting. Please include this topic in the upcoming meeting on 29th Jan.

-Akhil

> > 2. I think there are 2 different directions for ipsec-secgw:
> >    From one-side there is a desire to use it as a show-case for best-possible HW
> > IPsec performance
> >   (which is understandable).
> >    From other side - attempt to make it as close as real-world generic ipsec
> > processing app as possible
> >    (support for ESN, replay window, fragmented packets, generic proper SAD,
> > etc).
> >    Obviously these goals contradict and it makes really hard for the same app to
> > fulfill both.
> >    Any thoughts how to deal with that?
> >    One obvious would be to split the app, anything else?
> 
> We can have a fallback mechanism back to original functionality for whatever
> feature
> which has some perf drop.
> Splitting an app can be thought of but that would be similar to a full fledged
> IPSec stack
> like VPP-IPSec.
> 
> >
> > Konstantin
> >
> > > Other vendors(Marvell, ARM, AMD) please comment?
> > > Regards,
> > > Akhil
> > > From: Medvedkin, Vladimir <mailto:vladimir.medvedkin@intel.com>
> > > Sent: Friday, January 17, 2020 10:35 PM
> > > To: Akhil Goyal <mailto:akhil.goyal@nxp.com>; mailto:dev@dpdk.org
> > > Cc: mailto:konstantin.ananyev@intel.com
> > > Subject: Re: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> > >
> > > Hi Akhil,
> > > Indeed with our tests we also seeing ~15% perf drop for small packets (~90B)
> > and ~3-4% drop for 1KB packets. While I am looking on a ways
> > > to minimize the drop, I think it would be hard, if possible at all to eliminate it
> > completely.
> > > Reason for that: current SAD implementation is completely synthetic (using
> > plain array structure indexed by SPI value). That provides a very
> > > low overhead, but doesn't provide expected functionality and can't be used
> in
> > proper implementation.
> > > To measure plain IPsec performance without SAD user can still use '--signle-
> sa'
> > option.
> > > On 15/01/2020 15:45, Akhil Goyal wrote:
> > > Hi Vladimir,
> > >
> > > There is more than 10% drop with this patchset on NXP hardware with both
> > legacy mode and the ipsec lib mode. This would need some
> > > debugging.
> > > Didn't you see any drop on intel?
> > >
> > > Regards,
> > > Akhil
> > >
> > > -----Original Message-----
> > > From: Vladimir Medvedkin mailto:vladimir.medvedkin@intel.com
> > > Sent: Tuesday, January 14, 2020 7:57 PM
> > > To: mailto:dev@dpdk.org
> > > Cc: mailto:konstantin.ananyev@intel.com; Akhil Goyal
> > mailto:akhil.goyal@nxp.com
> > > Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> > >
> > > This series integrates SA database (SAD) capabilities from ipsec library.
> > > The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> > > Also patch series removes hardcoded limitation for maximum number of SA's
> > > and SP's.
> > >
> > > v4:
> > >  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> > >
> > > v3:
> > >  - parse SA and SP into sorted array instead of linked list
> > >
> > > v2:
> > >  - get rid of maximum sp limitation
> > >
> > > Vladimir Medvedkin (5):
> > >   ipsec: move ipsec sad name length into .h
> > >   examples/ipsec-secgw: implement inbound SAD
> > >   examples/ipsec-secgw: integrate inbound SAD
> > >   examples/ipsec-secgw: get rid of maximum sa limitation
> > >   examples/ipsec-secgw: get rid of maximum sp limitation
> > >
> > >  examples/ipsec-secgw/Makefile      |   1 +
> > >  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
> > >  examples/ipsec-secgw/ipsec.h       |  11 +-
> > >  examples/ipsec-secgw/meson.build   |   2 +-
> > >  examples/ipsec-secgw/parser.c      |   4 +
> > >  examples/ipsec-secgw/parser.h      |   9 ++
> > >  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++-----------
> --
> > -
> > >  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
> > >  examples/ipsec-secgw/sad.h         |  74 +++++++++++
> > >  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
> > >  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
> > >  lib/librte_ipsec/ipsec_sad.c       |  20 +--
> > >  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
> > >  13 files changed, 528 insertions(+), 171 deletions(-)
> > >  create mode 100644 examples/ipsec-secgw/sad.c
> > >  create mode 100644 examples/ipsec-secgw/sad.h
> > >
> > > --
> > > 2.7.4
> > >
> > > --
> > > Regards,
> > > Vladimir
> > > -->
  
Ananyev, Konstantin Jan. 23, 2020, 12:52 p.m. UTC | #9
Hi Akhil,

> > > > Hi Vladimir,
> > > > The SA lookup logic and management is purely requirement based for the
> > > application.
> > > >The application may only cater to <128 SAs which can
> > > > be handled based on the current logic.
> > >
> > > Not always, current implementation can handle < 128 SA,
> > > whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
> > > Yes, what we have right now has nearly zero overhead,
> > > and might be ok for some really simple show-cases.
> > > But for majority of production IPsec implementations,
> > > I believe that definitely wouldn't be enough.
> > >
> > > > –single-sa option cannot handle this.
> > > > Sample applications in DPDK are there to showcase the best a hardware can
> > > deliver.
> > >
> > > My thought was - that's the reason we have single-sa option -
> > > demonstrate best possible HW perf without minimal SW intervention.
> > > For something more serious than that, we use generic SAD implementation.
> > >
> > > > IMO, we cannot allow this logic on NXP hardwares. We
> > > > give performance numbers based on IPSec app to customers and we cannot
> > > allow 15% degradation.
> > >
> > > As Vladimir said, we are looking how to improve current SAD numbers
> > > and minimize the drop.
> > > But with same equals - plain array will always be faster than hash table,
> > > so not sure we will be able to match existing performance.
> > > So two questions:
> > > 1. What exact case you use for perf testing
> > >     (total number of SAs, packets per burst belong to the same/different SAs)?
> > >     Might be there is a way to speedup it.
> > >     Again if 10-15% is not an affordable drop, which one is: zero or ...?
> >
> > We should add features judiciously, we cannot drop the performance of a
> > benchmarking
> > Application in lieu of adding functionality. We should only add features which
> > are not
> > Impacting the performance significantly.
> > Every vendor may have different cases. We cannot tune for everybody.
> > However, I see drop in 64 outbound 64 inbound SAs all with different SPI and IPs.
> > Packets per burst = 32 all with different SAs.
> >
> 
> We can have two modes of lookup similar to l3fwd - EM and LPM.
> LPM is O(1) while EM is more realistic. Similar logic can be added here as well.
> With L3fwd also we showcase performance for best case(lpm) and the worst case(em)
> What Say?

We discussed it off-line with Vladimir and came up with similar idea:
Have a proper/generic SAD implementation and add limited size plain-array
on top of it as 1xway associative cache.
So for the case when all active SAs fit into the cache and no SPI collisions,
we should have same performance as now (with plain array).
From other side, we'll still have generic/scalable/rfc compliant implementation.
Sort of best sides from two words.
Plans are to submit v4 with such approach in next few days.   

> 
> As discussed in the DPDK-status meeting today, this patchset need to be discussed in
> Techboard meeting. Please include this topic in the upcoming meeting on 29th Jan.

As I said above, I think we found a way to deal with it without any perf drop
for existing cases.
Though sure, if you feel some extra discussion is needed, let's request to
put it into agenda.

Konstantin 

> 
> -Akhil
> 
> > > 2. I think there are 2 different directions for ipsec-secgw:
> > >    From one-side there is a desire to use it as a show-case for best-possible HW
> > > IPsec performance
> > >   (which is understandable).
> > >    From other side - attempt to make it as close as real-world generic ipsec
> > > processing app as possible
> > >    (support for ESN, replay window, fragmented packets, generic proper SAD,
> > > etc).
> > >    Obviously these goals contradict and it makes really hard for the same app to
> > > fulfill both.
> > >    Any thoughts how to deal with that?
> > >    One obvious would be to split the app, anything else?
> >
> > We can have a fallback mechanism back to original functionality for whatever
> > feature
> > which has some perf drop.
> > Splitting an app can be thought of but that would be similar to a full fledged
> > IPSec stack
> > like VPP-IPSec.
> >
> > >
> > > Konstantin
> > >
> > > > Other vendors(Marvell, ARM, AMD) please comment?
> > > > Regards,
> > > > Akhil
> > > > From: Medvedkin, Vladimir <mailto:vladimir.medvedkin@intel.com>
> > > > Sent: Friday, January 17, 2020 10:35 PM
> > > > To: Akhil Goyal <mailto:akhil.goyal@nxp.com>; mailto:dev@dpdk.org
> > > > Cc: mailto:konstantin.ananyev@intel.com
> > > > Subject: Re: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> > > >
> > > > Hi Akhil,
> > > > Indeed with our tests we also seeing ~15% perf drop for small packets (~90B)
> > > and ~3-4% drop for 1KB packets. While I am looking on a ways
> > > > to minimize the drop, I think it would be hard, if possible at all to eliminate it
> > > completely.
> > > > Reason for that: current SAD implementation is completely synthetic (using
> > > plain array structure indexed by SPI value). That provides a very
> > > > low overhead, but doesn't provide expected functionality and can't be used
> > in
> > > proper implementation.
> > > > To measure plain IPsec performance without SAD user can still use '--signle-
> > sa'
> > > option.
> > > > On 15/01/2020 15:45, Akhil Goyal wrote:
> > > > Hi Vladimir,
> > > >
> > > > There is more than 10% drop with this patchset on NXP hardware with both
> > > legacy mode and the ipsec lib mode. This would need some
> > > > debugging.
> > > > Didn't you see any drop on intel?
> > > >
> > > > Regards,
> > > > Akhil
> > > >
> > > > -----Original Message-----
> > > > From: Vladimir Medvedkin mailto:vladimir.medvedkin@intel.com
> > > > Sent: Tuesday, January 14, 2020 7:57 PM
> > > > To: mailto:dev@dpdk.org
> > > > Cc: mailto:konstantin.ananyev@intel.com; Akhil Goyal
> > > mailto:akhil.goyal@nxp.com
> > > > Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> > > >
> > > > This series integrates SA database (SAD) capabilities from ipsec library.
> > > > The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> > > > Also patch series removes hardcoded limitation for maximum number of SA's
> > > > and SP's.
> > > >
> > > > v4:
> > > >  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> > > >
> > > > v3:
> > > >  - parse SA and SP into sorted array instead of linked list
> > > >
> > > > v2:
> > > >  - get rid of maximum sp limitation
> > > >
> > > > Vladimir Medvedkin (5):
> > > >   ipsec: move ipsec sad name length into .h
> > > >   examples/ipsec-secgw: implement inbound SAD
> > > >   examples/ipsec-secgw: integrate inbound SAD
> > > >   examples/ipsec-secgw: get rid of maximum sa limitation
> > > >   examples/ipsec-secgw: get rid of maximum sp limitation
> > > >
> > > >  examples/ipsec-secgw/Makefile      |   1 +
> > > >  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
> > > >  examples/ipsec-secgw/ipsec.h       |  11 +-
> > > >  examples/ipsec-secgw/meson.build   |   2 +-
> > > >  examples/ipsec-secgw/parser.c      |   4 +
> > > >  examples/ipsec-secgw/parser.h      |   9 ++
> > > >  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++-----------
> > --
> > > -
> > > >  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
> > > >  examples/ipsec-secgw/sad.h         |  74 +++++++++++
> > > >  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
> > > >  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
> > > >  lib/librte_ipsec/ipsec_sad.c       |  20 +--
> > > >  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
> > > >  13 files changed, 528 insertions(+), 171 deletions(-)
> > > >  create mode 100644 examples/ipsec-secgw/sad.c
> > > >  create mode 100644 examples/ipsec-secgw/sad.h
> > > >
> > > > --
> > > > 2.7.4
> > > >
> > > > --
> > > > Regards,
> > > > Vladimir
> > > > -->
  
Akhil Goyal Jan. 23, 2020, 12:56 p.m. UTC | #10
Hi Konstantin,
> 
> Hi Akhil,
> 
> > > > > Hi Vladimir,
> > > > > The SA lookup logic and management is purely requirement based for the
> > > > application.
> > > > >The application may only cater to <128 SAs which can
> > > > > be handled based on the current logic.
> > > >
> > > > Not always, current implementation can handle < 128 SA,
> > > > whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
> > > > Yes, what we have right now has nearly zero overhead,
> > > > and might be ok for some really simple show-cases.
> > > > But for majority of production IPsec implementations,
> > > > I believe that definitely wouldn't be enough.
> > > >
> > > > > –single-sa option cannot handle this.
> > > > > Sample applications in DPDK are there to showcase the best a hardware
> can
> > > > deliver.
> > > >
> > > > My thought was - that's the reason we have single-sa option -
> > > > demonstrate best possible HW perf without minimal SW intervention.
> > > > For something more serious than that, we use generic SAD implementation.
> > > >
> > > > > IMO, we cannot allow this logic on NXP hardwares. We
> > > > > give performance numbers based on IPSec app to customers and we
> cannot
> > > > allow 15% degradation.
> > > >
> > > > As Vladimir said, we are looking how to improve current SAD numbers
> > > > and minimize the drop.
> > > > But with same equals - plain array will always be faster than hash table,
> > > > so not sure we will be able to match existing performance.
> > > > So two questions:
> > > > 1. What exact case you use for perf testing
> > > >     (total number of SAs, packets per burst belong to the same/different SAs)?
> > > >     Might be there is a way to speedup it.
> > > >     Again if 10-15% is not an affordable drop, which one is: zero or ...?
> > >
> > > We should add features judiciously, we cannot drop the performance of a
> > > benchmarking
> > > Application in lieu of adding functionality. We should only add features which
> > > are not
> > > Impacting the performance significantly.
> > > Every vendor may have different cases. We cannot tune for everybody.
> > > However, I see drop in 64 outbound 64 inbound SAs all with different SPI and
> IPs.
> > > Packets per burst = 32 all with different SAs.
> > >
> >
> > We can have two modes of lookup similar to l3fwd - EM and LPM.
> > LPM is O(1) while EM is more realistic. Similar logic can be added here as well.
> > With L3fwd also we showcase performance for best case(lpm) and the worst
> case(em)
> > What Say?
> 
> We discussed it off-line with Vladimir and came up with similar idea:
> Have a proper/generic SAD implementation and add limited size plain-array
> on top of it as 1xway associative cache.
> So for the case when all active SAs fit into the cache and no SPI collisions,
> we should have same performance as now (with plain array).
> From other side, we'll still have generic/scalable/rfc compliant implementation.
> Sort of best sides from two words.
> Plans are to submit v4 with such approach in next few days.

OK lets check the v4 before moving the discussion to techboard. 
@Thomas: Do you have more thoughts on this? Should we get it added in the agenda
Or wait for the v4?

> 
> >
> > As discussed in the DPDK-status meeting today, this patchset need to be
> discussed in
> > Techboard meeting. Please include this topic in the upcoming meeting on 29th
> Jan.
> 
> As I said above, I think we found a way to deal with it without any perf drop
> for existing cases.
> Though sure, if you feel some extra discussion is needed, let's request to
> put it into agenda.
> 
> Konstantin
> 
> >
> > -Akhil
> >
> > > > 2. I think there are 2 different directions for ipsec-secgw:
> > > >    From one-side there is a desire to use it as a show-case for best-possible
> HW
> > > > IPsec performance
> > > >   (which is understandable).
> > > >    From other side - attempt to make it as close as real-world generic ipsec
> > > > processing app as possible
> > > >    (support for ESN, replay window, fragmented packets, generic proper
> SAD,
> > > > etc).
> > > >    Obviously these goals contradict and it makes really hard for the same
> app to
> > > > fulfill both.
> > > >    Any thoughts how to deal with that?
> > > >    One obvious would be to split the app, anything else?
> > >
> > > We can have a fallback mechanism back to original functionality for
> whatever
> > > feature
> > > which has some perf drop.
> > > Splitting an app can be thought of but that would be similar to a full fledged
> > > IPSec stack
> > > like VPP-IPSec.
> > >
> > > >
> > > > Konstantin
> > > >
> > > > > Other vendors(Marvell, ARM, AMD) please comment?
> > > > > Regards,
> > > > > Akhil
> > > > > From: Medvedkin, Vladimir <mailto:vladimir.medvedkin@intel.com>
> > > > > Sent: Friday, January 17, 2020 10:35 PM
> > > > > To: Akhil Goyal <mailto:akhil.goyal@nxp.com>; mailto:dev@dpdk.org
> > > > > Cc: mailto:konstantin.ananyev@intel.com
> > > > > Subject: Re: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> > > > >
> > > > > Hi Akhil,
> > > > > Indeed with our tests we also seeing ~15% perf drop for small packets
> (~90B)
> > > > and ~3-4% drop for 1KB packets. While I am looking on a ways
> > > > > to minimize the drop, I think it would be hard, if possible at all to
> eliminate it
> > > > completely.
> > > > > Reason for that: current SAD implementation is completely synthetic
> (using
> > > > plain array structure indexed by SPI value). That provides a very
> > > > > low overhead, but doesn't provide expected functionality and can't be
> used
> > > in
> > > > proper implementation.
> > > > > To measure plain IPsec performance without SAD user can still use '--
> signle-
> > > sa'
> > > > option.
> > > > > On 15/01/2020 15:45, Akhil Goyal wrote:
> > > > > Hi Vladimir,
> > > > >
> > > > > There is more than 10% drop with this patchset on NXP hardware with
> both
> > > > legacy mode and the ipsec lib mode. This would need some
> > > > > debugging.
> > > > > Didn't you see any drop on intel?
> > > > >
> > > > > Regards,
> > > > > Akhil
> > > > >
> > > > > -----Original Message-----
> > > > > From: Vladimir Medvedkin mailto:vladimir.medvedkin@intel.com
> > > > > Sent: Tuesday, January 14, 2020 7:57 PM
> > > > > To: mailto:dev@dpdk.org
> > > > > Cc: mailto:konstantin.ananyev@intel.com; Akhil Goyal
> > > > mailto:akhil.goyal@nxp.com
> > > > > Subject: [PATCH v4 0/5] integrate librte_ipsec SAD into ipsec-secgw
> > > > >
> > > > > This series integrates SA database (SAD) capabilities from ipsec library.
> > > > > The goal is to make ipsec-secgw RFC compliant regarding inbound SAD.
> > > > > Also patch series removes hardcoded limitation for maximum number of
> SA's
> > > > > and SP's.
> > > > >
> > > > > v4:
> > > > >  - put tunnel SA's into SAD with SPI_ONLY type for performance reason
> > > > >
> > > > > v3:
> > > > >  - parse SA and SP into sorted array instead of linked list
> > > > >
> > > > > v2:
> > > > >  - get rid of maximum sp limitation
> > > > >
> > > > > Vladimir Medvedkin (5):
> > > > >   ipsec: move ipsec sad name length into .h
> > > > >   examples/ipsec-secgw: implement inbound SAD
> > > > >   examples/ipsec-secgw: integrate inbound SAD
> > > > >   examples/ipsec-secgw: get rid of maximum sa limitation
> > > > >   examples/ipsec-secgw: get rid of maximum sp limitation
> > > > >
> > > > >  examples/ipsec-secgw/Makefile      |   1 +
> > > > >  examples/ipsec-secgw/ipsec-secgw.c |   4 +-
> > > > >  examples/ipsec-secgw/ipsec.h       |  11 +-
> > > > >  examples/ipsec-secgw/meson.build   |   2 +-
> > > > >  examples/ipsec-secgw/parser.c      |   4 +
> > > > >  examples/ipsec-secgw/parser.h      |   9 ++
> > > > >  examples/ipsec-secgw/sa.c          | 256 +++++++++++++++++++++++------
> -----
> > > --
> > > > -
> > > > >  examples/ipsec-secgw/sad.c         |  90 +++++++++++++
> > > > >  examples/ipsec-secgw/sad.h         |  74 +++++++++++
> > > > >  examples/ipsec-secgw/sp4.c         | 114 ++++++++++++-----
> > > > >  examples/ipsec-secgw/sp6.c         | 112 +++++++++++-----
> > > > >  lib/librte_ipsec/ipsec_sad.c       |  20 +--
> > > > >  lib/librte_ipsec/rte_ipsec_sad.h   |   2 +
> > > > >  13 files changed, 528 insertions(+), 171 deletions(-)
> > > > >  create mode 100644 examples/ipsec-secgw/sad.c
> > > > >  create mode 100644 examples/ipsec-secgw/sad.h
> > > > >
> > > > > --
> > > > > 2.7.4
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Vladimir
> > > > > -->
  
Thomas Monjalon Jan. 23, 2020, 1:33 p.m. UTC | #11
23/01/2020 13:56, Akhil Goyal:
> Hi Konstantin,
> > 
> > Hi Akhil,
> > 
> > > > > > Hi Vladimir,
> > > > > > The SA lookup logic and management is purely requirement based for the
> > > > > application.
> > > > > >The application may only cater to <128 SAs which can
> > > > > > be handled based on the current logic.
> > > > >
> > > > > Not always, current implementation can handle < 128 SA,
> > > > > whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
> > > > > Yes, what we have right now has nearly zero overhead,
> > > > > and might be ok for some really simple show-cases.
> > > > > But for majority of production IPsec implementations,
> > > > > I believe that definitely wouldn't be enough.
> > > > >
> > > > > > –single-sa option cannot handle this.
> > > > > > Sample applications in DPDK are there to showcase the best a hardware
> > can
> > > > > deliver.
> > > > >
> > > > > My thought was - that's the reason we have single-sa option -
> > > > > demonstrate best possible HW perf without minimal SW intervention.
> > > > > For something more serious than that, we use generic SAD implementation.
> > > > >
> > > > > > IMO, we cannot allow this logic on NXP hardwares. We
> > > > > > give performance numbers based on IPSec app to customers and we
> > cannot
> > > > > allow 15% degradation.
> > > > >
> > > > > As Vladimir said, we are looking how to improve current SAD numbers
> > > > > and minimize the drop.
> > > > > But with same equals - plain array will always be faster than hash table,
> > > > > so not sure we will be able to match existing performance.
> > > > > So two questions:
> > > > > 1. What exact case you use for perf testing
> > > > >     (total number of SAs, packets per burst belong to the same/different SAs)?
> > > > >     Might be there is a way to speedup it.
> > > > >     Again if 10-15% is not an affordable drop, which one is: zero or ...?
> > > >
> > > > We should add features judiciously, we cannot drop the performance of a
> > > > benchmarking
> > > > Application in lieu of adding functionality. We should only add features which
> > > > are not
> > > > Impacting the performance significantly.
> > > > Every vendor may have different cases. We cannot tune for everybody.
> > > > However, I see drop in 64 outbound 64 inbound SAs all with different SPI and
> > IPs.
> > > > Packets per burst = 32 all with different SAs.
> > > >
> > >
> > > We can have two modes of lookup similar to l3fwd - EM and LPM.
> > > LPM is O(1) while EM is more realistic. Similar logic can be added here as well.
> > > With L3fwd also we showcase performance for best case(lpm) and the worst
> > case(em)
> > > What Say?
> > 
> > We discussed it off-line with Vladimir and came up with similar idea:
> > Have a proper/generic SAD implementation and add limited size plain-array
> > on top of it as 1xway associative cache.
> > So for the case when all active SAs fit into the cache and no SPI collisions,
> > we should have same performance as now (with plain array).
> > From other side, we'll still have generic/scalable/rfc compliant implementation.
> > Sort of best sides from two words.
> > Plans are to submit v4 with such approach in next few days.
> 
> OK lets check the v4 before moving the discussion to techboard. 
> @Thomas: Do you have more thoughts on this? Should we get it added in the agenda
> Or wait for the v4?

If v4 is good for both cases, it lowers the priority of the discussion.

But still, it would be interesting to state the objectives of the examples:
	- show API usage?
	- show feature performance?
	- show best hardware performance?
	- what else?
  
Ananyev, Konstantin Jan. 23, 2020, 3:46 p.m. UTC | #12
> > > > > > > The SA lookup logic and management is purely requirement based for the
> > > > > > application.
> > > > > > >The application may only cater to <128 SAs which can
> > > > > > > be handled based on the current logic.
> > > > > >
> > > > > > Not always, current implementation can handle < 128 SA,
> > > > > > whose SPI%128 never match (let say it cant't handle SPI=1 and SPI=129).
> > > > > > Yes, what we have right now has nearly zero overhead,
> > > > > > and might be ok for some really simple show-cases.
> > > > > > But for majority of production IPsec implementations,
> > > > > > I believe that definitely wouldn't be enough.
> > > > > >
> > > > > > > –single-sa option cannot handle this.
> > > > > > > Sample applications in DPDK are there to showcase the best a hardware
> > > can
> > > > > > deliver.
> > > > > >
> > > > > > My thought was - that's the reason we have single-sa option -
> > > > > > demonstrate best possible HW perf without minimal SW intervention.
> > > > > > For something more serious than that, we use generic SAD implementation.
> > > > > >
> > > > > > > IMO, we cannot allow this logic on NXP hardwares. We
> > > > > > > give performance numbers based on IPSec app to customers and we
> > > cannot
> > > > > > allow 15% degradation.
> > > > > >
> > > > > > As Vladimir said, we are looking how to improve current SAD numbers
> > > > > > and minimize the drop.
> > > > > > But with same equals - plain array will always be faster than hash table,
> > > > > > so not sure we will be able to match existing performance.
> > > > > > So two questions:
> > > > > > 1. What exact case you use for perf testing
> > > > > >     (total number of SAs, packets per burst belong to the same/different SAs)?
> > > > > >     Might be there is a way to speedup it.
> > > > > >     Again if 10-15% is not an affordable drop, which one is: zero or ...?
> > > > >
> > > > > We should add features judiciously, we cannot drop the performance of a
> > > > > benchmarking
> > > > > Application in lieu of adding functionality. We should only add features which
> > > > > are not
> > > > > Impacting the performance significantly.
> > > > > Every vendor may have different cases. We cannot tune for everybody.
> > > > > However, I see drop in 64 outbound 64 inbound SAs all with different SPI and
> > > IPs.
> > > > > Packets per burst = 32 all with different SAs.
> > > > >
> > > >
> > > > We can have two modes of lookup similar to l3fwd - EM and LPM.
> > > > LPM is O(1) while EM is more realistic. Similar logic can be added here as well.
> > > > With L3fwd also we showcase performance for best case(lpm) and the worst
> > > case(em)
> > > > What Say?
> > >
> > > We discussed it off-line with Vladimir and came up with similar idea:
> > > Have a proper/generic SAD implementation and add limited size plain-array
> > > on top of it as 1xway associative cache.
> > > So for the case when all active SAs fit into the cache and no SPI collisions,
> > > we should have same performance as now (with plain array).
> > > From other side, we'll still have generic/scalable/rfc compliant implementation.
> > > Sort of best sides from two words.
> > > Plans are to submit v4 with such approach in next few days.
> >
> > OK lets check the v4 before moving the discussion to techboard.
> > @Thomas: Do you have more thoughts on this? Should we get it added in the agenda
> > Or wait for the v4?
> 
> If v4 is good for both cases, it lowers the priority of the discussion.
> 
> But still, it would be interesting to state the objectives of the examples:
> 	- show API usage?
> 	- show feature performance?
> 	- show best hardware performance?
> 	- what else?

Agree, that’s a good topic to discuss.