[RFC] ipsec: new library for IPsec data-path processing

Message ID 1535129598-27301-1-git-send-email-konstantin.ananyev@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [RFC] ipsec: new library for IPsec data-path processing |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Ananyev, Konstantin Aug. 24, 2018, 4:53 p.m. UTC
  This RFC introduces a new library within DPDK: librte_ipsec.
The aim is to provide DPDK native high performance library for IPsec
data-path processing.
The library is supposed to utilize existing DPDK crypto-dev and
security API to provide application with transparent IPsec processing API.
The library is concentrated on data-path protocols processing (ESP and AH),
IKE protocol(s) implementation is out of scope for that library.
Though hook/callback mechanisms will be defined to allow integrate it
with existing IKE implementations.
Due to quite complex nature of IPsec protocol suite and variety of user
requirements and usage scenarios a few API levels will be provided:
1) Security Association (SA-level) API
    Operates at SA level, provides functions to:
    - initialize/teardown SA object
    - process inbound/outbound ESP/AH packets associated with the given SA
      (decrypt/encrypt, authenticate, check integrity,
       add/remove ESP/AH related headers and data, etc.).   
2) Security Association Database (SAD) API
    API to create/manage/destroy IPsec SAD.
    While DPDK IPsec library plans to have its own implementation,
    the intention is to keep it as independent from the other parts
    of IPsec library as possible.
    That is supposed to give users the ability to provide their own
    implementation of the SAD compatible with the other parts of the
    IPsec library.
3) IPsec Context (CTX) API
    This is supposed to be a high-level API, where each IPsec CTX is an
    abstraction of 'independent copy of the IPsec stack'.
    CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
    and provides:
    - de-multiplexing stream of inbound packets to particular SAs and
      further IPsec related processing. 
    - IPsec related processing for the outbound packets.
    - SA add/delete/update functionality
  
Current RFC concentrates on SA-level API only (1), 
detailed discussion for 2) and 3) will be subjects for separate RFC(s). 

SA (low) level API
  

Comments

Anoob Joseph Sept. 3, 2018, 12:41 p.m. UTC | #1
Hi Konstantin,

Few comments. Please see inline.

Thanks,

Anoob

On 24-08-2018 22:23, Konstantin Ananyev wrote:
> External Email
>
> This RFC introduces a new library within DPDK: librte_ipsec.
> The aim is to provide DPDK native high performance library for IPsec
> data-path processing.
> The library is supposed to utilize existing DPDK crypto-dev and
> security API to provide application with transparent IPsec processing API.
> The library is concentrated on data-path protocols processing (ESP and AH),
> IKE protocol(s) implementation is out of scope for that library.
> Though hook/callback mechanisms will be defined to allow integrate it
> with existing IKE implementations.
> Due to quite complex nature of IPsec protocol suite and variety of user
> requirements and usage scenarios a few API levels will be provided:
> 1) Security Association (SA-level) API
>      Operates at SA level, provides functions to:
>      - initialize/teardown SA object
>      - process inbound/outbound ESP/AH packets associated with the given SA
>        (decrypt/encrypt, authenticate, check integrity,
>         add/remove ESP/AH related headers and data, etc.).
> 2) Security Association Database (SAD) API
>      API to create/manage/destroy IPsec SAD.
>      While DPDK IPsec library plans to have its own implementation,
>      the intention is to keep it as independent from the other parts
>      of IPsec library as possible.
>      That is supposed to give users the ability to provide their own
>      implementation of the SAD compatible with the other parts of the
>      IPsec library.
> 3) IPsec Context (CTX) API
>      This is supposed to be a high-level API, where each IPsec CTX is an
>      abstraction of 'independent copy of the IPsec stack'.
>      CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
>      and provides:
>      - de-multiplexing stream of inbound packets to particular SAs and
>        further IPsec related processing.
>      - IPsec related processing for the outbound packets.
>      - SA add/delete/update functionality
[Anoob]: Security Policy is an important aspect of IPsec. An IPsec 
library without Security Policy API would be incomplete. For inline 
protocol offload, the final SP-SA check(selector check) is the only 
IPsec part being done by ipsec-secgw now. Would make sense to add that 
also in the library.
> Current RFC concentrates on SA-level API only (1),
> detailed discussion for 2) and 3) will be subjects for separate RFC(s).
>
> SA (low) level API
> ==================
>
> API described below operates on SA level.
> It provides functionality that allows user for given SA to process
> inbound and outbound IPsec packets.
> To be more specific:
> - for inbound ESP/AH packets perform decryption, authentication,
>    integrity checking, remove ESP/AH related headers
[Anoob] Anti-replay check would also be required.
> - for outbound packets perform payload encryption, attach ICV,
>    update/add IP headers, add ESP/AH headers/trailers,
>    setup related mbuf felids (ol_flags, tx_offloads, etc.).
[Anoob] Do we have any plans to handle ESN expiry? Some means to 
initiate an IKE renegotiation? I'm assuming application won't be aware 
of the sequence numbers, in this case.
> - initialize/un-initialize given SA based on user provided parameters.
>
> Processed inbound/outbound packets could be grouped by user provided
> flow id (opaque 64-bit number associated by user with given SA).
>
> SA-level API is based on top of crypto-dev/security API and relies on them
> to perform actual cipher and integrity checking.
> Due to the nature of crypto-dev API (enqueue/deque model) we use
> asynchronous API for IPsec packets destined to be processed
> by crypto-device:
> rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
> rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
> Though for packets destined for inline processing no extra overhead
> is required and simple and synchronous API: rte_ipsec_inline_process()
> is introduced for that case.
[Anoob] The API should include event-delivery as a crypto-op completion 
mechanism as well. The application could configure the event crypto 
adapter and then enqueue and dequeue to crypto device using events (via 
event dev).
> The following functionality:
>    - match inbound/outbound packets to particular SA
>    - manage crypto/security devices
>    - provide SAD/SPD related functionality
>    - determine what crypto/security device has to be used
>      for given packet(s)
> is out of scope for SA-level API.
>
> Below is the brief (and simplified) overview of expected SA-level
> API usage.
>
> /* allocate and initialize SA */
> size_t sz = rte_ipsec_sa_size();
> struct rte_ipsec_sa *sa = rte_malloc(sz);
> struct rte_ipsec_sa_prm prm;
> /* fill prm */
> rc = rte_ipsec_sa_init(sa, &prm);
> if (rc != 0) { /*handle error */}
> .....
>
> /* process inbound/outbound IPsec packets that belongs to given SA */
>
> /* inline IPsec processing was done for these packets */
> if (use_inline_ipsec)
>         n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
> /* use crypto-device to process the packets */
> else {
>       struct rte_crypto_op *cop[nb_pkts];
>       struct rte_ipsec_group grp[nb_pkts];
>
>        ....
>       /* prepare crypto ops */
>       n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
>       /* enqueue crypto ops to related crypto-dev */
>       n =  rte_cryptodev_enqueue_burst(..., cops, n);
>       if (n != nb_pkts) { /*handle failed packets */}
>       /* dequeue finished crypto ops from related crypto-dev */
>       n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
>       /* finish IPsec processing for associated packets */
>       n = rte_ipsec_crypto_process(cop, pkts, grp, n);
[Anoob] Does the SA based grouping apply to both inbound and outbound?
>       /* now we have <n> group of packets grouped by SA flow id  */
>      ....
>   }
> ...
>
> /* uninit given SA */
> rte_ipsec_sa_fini(sa);
>
> Planned scope for 18.11:
> ========================
>
> - SA-level API definition
> - ESP tunnel mode support (both IPv4/IPv6)
> - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
> - UT
[Anoob] What is UT?
> Note: Still WIP, so not all planned for 18.11 functionality is in place.
>
> Post 18.11:
> ===========
> - ESP transport mode support (both IPv4/IPv6)
> - update examples/ipsec-secgw to use librte_ipsec
> - SAD and high-level API definition and implementation
>
>
> Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>   config/common_base                     |   5 +
>   lib/Makefile                           |   2 +
>   lib/librte_ipsec/Makefile              |  24 +
>   lib/librte_ipsec/meson.build           |  10 +
>   lib/librte_ipsec/pad.h                 |  45 ++
>   lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
>   lib/librte_ipsec/rte_ipsec_version.map |  13 +
>   lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
>   lib/librte_net/rte_esp.h               |  10 +-
>   lib/meson.build                        |   2 +
>   mk/rte.app.mk                          |   2 +
>   11 files changed, 1278 insertions(+), 1 deletion(-)
>   create mode 100644 lib/librte_ipsec/Makefile
>   create mode 100644 lib/librte_ipsec/meson.build
>   create mode 100644 lib/librte_ipsec/pad.h
>   create mode 100644 lib/librte_ipsec/rte_ipsec.h
>   create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
>   create mode 100644 lib/librte_ipsec/sa.c
<snip>
> +static inline uint16_t
> +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
> +       struct rte_crypto_op *cop[], uint16_t num)
> +{
> +       int32_t rc;
> +       uint32_t i, n;
> +       union sym_op_data icv;
> +
> +       n = esn_outb_check_sqn(sa, num);
> +
> +       for (i = 0; i != n; i++) {
> +
> +               sa->sqn++;
[Anoob] Shouldn't this be done atomically?
> +               sa->iv.v8 = rte_cpu_to_be_64(sa->sqn);
> +
> +               /* update the packet itself */
> +               rc = esp_outb_tun_pkt_prepare(sa, mb[i], &icv);
> +               if (rc < 0) {
> +                       rte_errno = -rc;
> +                       break;
> +               }
> +
> +               /* update crypto op */
> +               esp_outb_tun_cop_prepare(cop[i], sa, mb[i], &icv, rc);
> +       }
> +
> +       return i;
> +}
>
<snip>
  
Ananyev, Konstantin Sept. 3, 2018, 6:21 p.m. UTC | #2
Hi Anoob,

> Hi Konstantin,
> 
> Few comments. Please see inline.
> 
> Thanks,
> 
> Anoob
> 
> On 24-08-2018 22:23, Konstantin Ananyev wrote:
> > External Email
> >
> > This RFC introduces a new library within DPDK: librte_ipsec.
> > The aim is to provide DPDK native high performance library for IPsec
> > data-path processing.
> > The library is supposed to utilize existing DPDK crypto-dev and
> > security API to provide application with transparent IPsec processing API.
> > The library is concentrated on data-path protocols processing (ESP and AH),
> > IKE protocol(s) implementation is out of scope for that library.
> > Though hook/callback mechanisms will be defined to allow integrate it
> > with existing IKE implementations.
> > Due to quite complex nature of IPsec protocol suite and variety of user
> > requirements and usage scenarios a few API levels will be provided:
> > 1) Security Association (SA-level) API
> >      Operates at SA level, provides functions to:
> >      - initialize/teardown SA object
> >      - process inbound/outbound ESP/AH packets associated with the given SA
> >        (decrypt/encrypt, authenticate, check integrity,
> >         add/remove ESP/AH related headers and data, etc.).
> > 2) Security Association Database (SAD) API
> >      API to create/manage/destroy IPsec SAD.
> >      While DPDK IPsec library plans to have its own implementation,
> >      the intention is to keep it as independent from the other parts
> >      of IPsec library as possible.
> >      That is supposed to give users the ability to provide their own
> >      implementation of the SAD compatible with the other parts of the
> >      IPsec library.
> > 3) IPsec Context (CTX) API
> >      This is supposed to be a high-level API, where each IPsec CTX is an
> >      abstraction of 'independent copy of the IPsec stack'.
> >      CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
> >      and provides:
> >      - de-multiplexing stream of inbound packets to particular SAs and
> >        further IPsec related processing.
> >      - IPsec related processing for the outbound packets.
> >      - SA add/delete/update functionality
> [Anoob]: Security Policy is an important aspect of IPsec. An IPsec
> library without Security Policy API would be incomplete. For inline
> protocol offload, the final SP-SA check(selector check) is the only
> IPsec part being done by ipsec-secgw now. Would make sense to add that
> also in the library.

You mean here, that we need some sort of SPD implementation, correct?

> > Current RFC concentrates on SA-level API only (1),
> > detailed discussion for 2) and 3) will be subjects for separate RFC(s).
> >
> > SA (low) level API
> > ==================
> >
> > API described below operates on SA level.
> > It provides functionality that allows user for given SA to process
> > inbound and outbound IPsec packets.
> > To be more specific:
> > - for inbound ESP/AH packets perform decryption, authentication,
> >    integrity checking, remove ESP/AH related headers
> [Anoob] Anti-replay check would also be required.

Yep, anti-replay and ESN support is implied as part of "integrity checking".
Probably I have to be more specific here.

> > - for outbound packets perform payload encryption, attach ICV,
> >    update/add IP headers, add ESP/AH headers/trailers,
> >    setup related mbuf felids (ol_flags, tx_offloads, etc.).
> [Anoob] Do we have any plans to handle ESN expiry? Some means to
> initiate an IKE renegotiation? I'm assuming application won't be aware
> of the sequence numbers, in this case.
> > - initialize/un-initialize given SA based on user provided parameters.
> >
> > Processed inbound/outbound packets could be grouped by user provided
> > flow id (opaque 64-bit number associated by user with given SA).
> >
> > SA-level API is based on top of crypto-dev/security API and relies on them
> > to perform actual cipher and integrity checking.
> > Due to the nature of crypto-dev API (enqueue/deque model) we use
> > asynchronous API for IPsec packets destined to be processed
> > by crypto-device:
> > rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
> > rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
> > Though for packets destined for inline processing no extra overhead
> > is required and simple and synchronous API: rte_ipsec_inline_process()
> > is introduced for that case.
> [Anoob] The API should include event-delivery as a crypto-op completion
> mechanism as well. The application could configure the event crypto
> adapter and then enqueue and dequeue to crypto device using events (via
> event dev).

Not sure what particular extra API you think is required here?
As I understand in both cases (with or without event crypto-adapter) we still have to:
 1) fill crypto-op properly
 2) enqueue it to crypto-dev (via eventdev or directly)
3)  receive processed by crypto-dev crypto-op (either via eventdev or directly)
4) check crypto-op status, do further post-processing if any

So #1 and #4 (SA-level API respnibility) remain the same for both cases.

> > The following functionality:
> >    - match inbound/outbound packets to particular SA
> >    - manage crypto/security devices
> >    - provide SAD/SPD related functionality
> >    - determine what crypto/security device has to be used
> >      for given packet(s)
> > is out of scope for SA-level API.
> >
> > Below is the brief (and simplified) overview of expected SA-level
> > API usage.
> >
> > /* allocate and initialize SA */
> > size_t sz = rte_ipsec_sa_size();
> > struct rte_ipsec_sa *sa = rte_malloc(sz);
> > struct rte_ipsec_sa_prm prm;
> > /* fill prm */
> > rc = rte_ipsec_sa_init(sa, &prm);
> > if (rc != 0) { /*handle error */}
> > .....
> >
> > /* process inbound/outbound IPsec packets that belongs to given SA */
> >
> > /* inline IPsec processing was done for these packets */
> > if (use_inline_ipsec)
> >         n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
> > /* use crypto-device to process the packets */
> > else {
> >       struct rte_crypto_op *cop[nb_pkts];
> >       struct rte_ipsec_group grp[nb_pkts];
> >
> >        ....
> >       /* prepare crypto ops */
> >       n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
> >       /* enqueue crypto ops to related crypto-dev */
> >       n =  rte_cryptodev_enqueue_burst(..., cops, n);
> >       if (n != nb_pkts) { /*handle failed packets */}
> >       /* dequeue finished crypto ops from related crypto-dev */
> >       n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
> >       /* finish IPsec processing for associated packets */
> >       n = rte_ipsec_crypto_process(cop, pkts, grp, n);
> [Anoob] Does the SA based grouping apply to both inbound and outbound?

Yes, the plan is to have it available for both cases.

> >       /* now we have <n> group of packets grouped by SA flow id  */
> >      ....
> >   }
> > ...
> >
> > /* uninit given SA */
> > rte_ipsec_sa_fini(sa);
> >
> > Planned scope for 18.11:
> > ========================
> >
> > - SA-level API definition
> > - ESP tunnel mode support (both IPv4/IPv6)
> > - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
> > - UT
> [Anoob] What is UT?

Unit-Test


> > Note: Still WIP, so not all planned for 18.11 functionality is in place.
> >
> > Post 18.11:
> > ===========
> > - ESP transport mode support (both IPv4/IPv6)
> > - update examples/ipsec-secgw to use librte_ipsec
> > - SAD and high-level API definition and implementation
> >
> >
> > Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
> > Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >   config/common_base                     |   5 +
> >   lib/Makefile                           |   2 +
> >   lib/librte_ipsec/Makefile              |  24 +
> >   lib/librte_ipsec/meson.build           |  10 +
> >   lib/librte_ipsec/pad.h                 |  45 ++
> >   lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
> >   lib/librte_ipsec/rte_ipsec_version.map |  13 +
> >   lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
> >   lib/librte_net/rte_esp.h               |  10 +-
> >   lib/meson.build                        |   2 +
> >   mk/rte.app.mk                          |   2 +
> >   11 files changed, 1278 insertions(+), 1 deletion(-)
> >   create mode 100644 lib/librte_ipsec/Makefile
> >   create mode 100644 lib/librte_ipsec/meson.build
> >   create mode 100644 lib/librte_ipsec/pad.h
> >   create mode 100644 lib/librte_ipsec/rte_ipsec.h
> >   create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
> >   create mode 100644 lib/librte_ipsec/sa.c
> <snip>
> > +static inline uint16_t
> > +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
> > +       struct rte_crypto_op *cop[], uint16_t num)
> > +{
> > +       int32_t rc;
> > +       uint32_t i, n;
> > +       union sym_op_data icv;
> > +
> > +       n = esn_outb_check_sqn(sa, num);
> > +
> > +       for (i = 0; i != n; i++) {
> > +
> > +               sa->sqn++;
> [Anoob] Shouldn't this be done atomically?

If we want to have MT-safe API for SA-datapath API, then yes.
Though it would make things more complicated here, especially for inbound with anti-replay support.
I think it is doable (spin-lock?), but would cause extra overhead and complexity.
Right now I am not sure it really worth it - comments/suggestions are welcome.
What probably could be a good compromise - runtime decision per SA basis (at sa_init()),
do we need an ST or MT behavior for given SA.

> > +               sa->iv.v8 = rte_cpu_to_be_64(sa->sqn);
> > +
> > +               /* update the packet itself */
> > +               rc = esp_outb_tun_pkt_prepare(sa, mb[i], &icv);
> > +               if (rc < 0) {
> > +                       rte_errno = -rc;
> > +                       break;
> > +               }
> > +
> > +               /* update crypto op */
> > +               esp_outb_tun_cop_prepare(cop[i], sa, mb[i], &icv, rc);
> > +       }
> > +
> > +       return i;
> > +}
> >
> <snip>

Thanks
Konstantin
  
Anoob Joseph Sept. 5, 2018, 2:39 p.m. UTC | #3
Hi Konstantin,

Please see inline.


On 03-09-2018 23:51, Ananyev, Konstantin wrote:
> External Email
>
> Hi Anoob,
>
>> Hi Konstantin,
>>
>> Few comments. Please see inline.
>>
>> Thanks,
>>
>> Anoob
>>
>> On 24-08-2018 22:23, Konstantin Ananyev wrote:
>>> External Email
>>>
>>> This RFC introduces a new library within DPDK: librte_ipsec.
>>> The aim is to provide DPDK native high performance library for IPsec
>>> data-path processing.
>>> The library is supposed to utilize existing DPDK crypto-dev and
>>> security API to provide application with transparent IPsec processing API.
>>> The library is concentrated on data-path protocols processing (ESP and AH),
>>> IKE protocol(s) implementation is out of scope for that library.
>>> Though hook/callback mechanisms will be defined to allow integrate it
>>> with existing IKE implementations.
>>> Due to quite complex nature of IPsec protocol suite and variety of user
>>> requirements and usage scenarios a few API levels will be provided:
>>> 1) Security Association (SA-level) API
>>>       Operates at SA level, provides functions to:
>>>       - initialize/teardown SA object
>>>       - process inbound/outbound ESP/AH packets associated with the given SA
>>>         (decrypt/encrypt, authenticate, check integrity,
>>>          add/remove ESP/AH related headers and data, etc.).
>>> 2) Security Association Database (SAD) API
>>>       API to create/manage/destroy IPsec SAD.
>>>       While DPDK IPsec library plans to have its own implementation,
>>>       the intention is to keep it as independent from the other parts
>>>       of IPsec library as possible.
>>>       That is supposed to give users the ability to provide their own
>>>       implementation of the SAD compatible with the other parts of the
>>>       IPsec library.
>>> 3) IPsec Context (CTX) API
>>>       This is supposed to be a high-level API, where each IPsec CTX is an
>>>       abstraction of 'independent copy of the IPsec stack'.
>>>       CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
>>>       and provides:
>>>       - de-multiplexing stream of inbound packets to particular SAs and
>>>         further IPsec related processing.
>>>       - IPsec related processing for the outbound packets.
>>>       - SA add/delete/update functionality
>> [Anoob]: Security Policy is an important aspect of IPsec. An IPsec
>> library without Security Policy API would be incomplete. For inline
>> protocol offload, the final SP-SA check(selector check) is the only
>> IPsec part being done by ipsec-secgw now. Would make sense to add that
>> also in the library.
> You mean here, that we need some sort of SPD implementation, correct?
[Anoob] Yes.
>
>>> Current RFC concentrates on SA-level API only (1),
>>> detailed discussion for 2) and 3) will be subjects for separate RFC(s).
>>>
>>> SA (low) level API
>>> ==================
>>>
>>> API described below operates on SA level.
>>> It provides functionality that allows user for given SA to process
>>> inbound and outbound IPsec packets.
>>> To be more specific:
>>> - for inbound ESP/AH packets perform decryption, authentication,
>>>     integrity checking, remove ESP/AH related headers
>> [Anoob] Anti-replay check would also be required.
> Yep, anti-replay and ESN support is implied as part of "integrity checking".
> Probably I have to be more specific here.
[Anoob] This is fine.
>
>>> - for outbound packets perform payload encryption, attach ICV,
>>>     update/add IP headers, add ESP/AH headers/trailers,
>>>     setup related mbuf felids (ol_flags, tx_offloads, etc.).
>> [Anoob] Do we have any plans to handle ESN expiry? Some means to
>> initiate an IKE renegotiation? I'm assuming application won't be aware
>> of the sequence numbers, in this case.
[Anoob] What is your plan with events like ESN expiry? IPsec spec talks 
about byte and time expiry as well.
>>> - initialize/un-initialize given SA based on user provided parameters.
>>>
>>> Processed inbound/outbound packets could be grouped by user provided
>>> flow id (opaque 64-bit number associated by user with given SA).
>>>
>>> SA-level API is based on top of crypto-dev/security API and relies on them
>>> to perform actual cipher and integrity checking.
>>> Due to the nature of crypto-dev API (enqueue/deque model) we use
>>> asynchronous API for IPsec packets destined to be processed
>>> by crypto-device:
>>> rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
>>> rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
>>> Though for packets destined for inline processing no extra overhead
>>> is required and simple and synchronous API: rte_ipsec_inline_process()
>>> is introduced for that case.
>> [Anoob] The API should include event-delivery as a crypto-op completion
>> mechanism as well. The application could configure the event crypto
>> adapter and then enqueue and dequeue to crypto device using events (via
>> event dev).
> Not sure what particular extra API you think is required here?
> As I understand in both cases (with or without event crypto-adapter) we still have to:
>   1) fill crypto-op properly
>   2) enqueue it to crypto-dev (via eventdev or directly)
> 3)  receive processed by crypto-dev crypto-op (either via eventdev or directly)
> 4) check crypto-op status, do further post-processing if any
>
> So #1 and #4 (SA-level API respnibility) remain the same for both cases.
[Anoob] rte_ipsec_inline_process works on packets not events. We might 
need a similar API which processes events.
>
>>> The following functionality:
>>>     - match inbound/outbound packets to particular SA
>>>     - manage crypto/security devices
>>>     - provide SAD/SPD related functionality
>>>     - determine what crypto/security device has to be used
>>>       for given packet(s)
>>> is out of scope for SA-level API.
>>>
>>> Below is the brief (and simplified) overview of expected SA-level
>>> API usage.
>>>
>>> /* allocate and initialize SA */
>>> size_t sz = rte_ipsec_sa_size();
>>> struct rte_ipsec_sa *sa = rte_malloc(sz);
>>> struct rte_ipsec_sa_prm prm;
>>> /* fill prm */
>>> rc = rte_ipsec_sa_init(sa, &prm);
>>> if (rc != 0) { /*handle error */}
>>> .....
>>>
>>> /* process inbound/outbound IPsec packets that belongs to given SA */
>>>
>>> /* inline IPsec processing was done for these packets */
>>> if (use_inline_ipsec)
>>>          n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
>>> /* use crypto-device to process the packets */
>>> else {
>>>        struct rte_crypto_op *cop[nb_pkts];
>>>        struct rte_ipsec_group grp[nb_pkts];
>>>
>>>         ....
>>>        /* prepare crypto ops */
>>>        n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
>>>        /* enqueue crypto ops to related crypto-dev */
>>>        n =  rte_cryptodev_enqueue_burst(..., cops, n);
>>>        if (n != nb_pkts) { /*handle failed packets */}
>>>        /* dequeue finished crypto ops from related crypto-dev */
>>>        n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
>>>        /* finish IPsec processing for associated packets */
>>>        n = rte_ipsec_crypto_process(cop, pkts, grp, n);
>> [Anoob] Does the SA based grouping apply to both inbound and outbound?
> Yes, the plan is to have it available for both cases.
[Anoob] On the inbound, shouldn't the packets be grouped+ordered based 
on inner L3+inner L4?
>
>>>        /* now we have <n> group of packets grouped by SA flow id  */
>>>       ....
>>>    }
>>> ...
>>>
>>> /* uninit given SA */
>>> rte_ipsec_sa_fini(sa);
>>>
>>> Planned scope for 18.11:
>>> ========================
>>>
>>> - SA-level API definition
>>> - ESP tunnel mode support (both IPv4/IPv6)
>>> - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
>>> - UT
>> [Anoob] What is UT?
> Unit-Test
>
>
>>> Note: Still WIP, so not all planned for 18.11 functionality is in place.
>>>
>>> Post 18.11:
>>> ===========
>>> - ESP transport mode support (both IPv4/IPv6)
>>> - update examples/ipsec-secgw to use librte_ipsec
>>> - SAD and high-level API definition and implementation
>>>
>>>
>>> Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
>>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>>> ---
>>>    config/common_base                     |   5 +
>>>    lib/Makefile                           |   2 +
>>>    lib/librte_ipsec/Makefile              |  24 +
>>>    lib/librte_ipsec/meson.build           |  10 +
>>>    lib/librte_ipsec/pad.h                 |  45 ++
>>>    lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
>>>    lib/librte_ipsec/rte_ipsec_version.map |  13 +
>>>    lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
>>>    lib/librte_net/rte_esp.h               |  10 +-
>>>    lib/meson.build                        |   2 +
>>>    mk/rte.app.mk                          |   2 +
>>>    11 files changed, 1278 insertions(+), 1 deletion(-)
>>>    create mode 100644 lib/librte_ipsec/Makefile
>>>    create mode 100644 lib/librte_ipsec/meson.build
>>>    create mode 100644 lib/librte_ipsec/pad.h
>>>    create mode 100644 lib/librte_ipsec/rte_ipsec.h
>>>    create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
>>>    create mode 100644 lib/librte_ipsec/sa.c
>> <snip>
>>> +static inline uint16_t
>>> +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>>> +       struct rte_crypto_op *cop[], uint16_t num)
>>> +{
>>> +       int32_t rc;
>>> +       uint32_t i, n;
>>> +       union sym_op_data icv;
>>> +
>>> +       n = esn_outb_check_sqn(sa, num);
>>> +
>>> +       for (i = 0; i != n; i++) {
>>> +
>>> +               sa->sqn++;
>> [Anoob] Shouldn't this be done atomically?
> If we want to have MT-safe API for SA-datapath API, then yes.
> Though it would make things more complicated here, especially for inbound with anti-replay support.
> I think it is doable (spin-lock?), but would cause extra overhead and complexity.
> Right now I am not sure it really worth it - comments/suggestions are welcome.
> What probably could be a good compromise - runtime decision per SA basis (at sa_init()),
> do we need an ST or MT behavior for given SA.
[Anoob] Going with single thread approach would significantly limit the 
scope of this library. Single thread approach would mean one SA on one 
core. This would not work on practical cases.

Suppose we have two flows which are supposed to use the same SA. With 
RSS, these flows could end up on different cores. Now only one core 
would be able to process, as SA will not be shared. We have the same 
problem in ipsec-secgw too.

In case of ingress also, the same problem exists. We will not be able to 
use RSS and spread the traffic to multiple cores. Considering IPsec 
being CPU intensive, this would limit the net output of the chip.

Thanks,

Anoob
  
Ananyev, Konstantin Sept. 12, 2018, 6:09 p.m. UTC | #4
Hi Anoob,

> 
> Hi Konstantin,
> Please see inline.
> 
> 
> This RFC introduces a new library within DPDK: librte_ipsec.
> The aim is to provide DPDK native high performance library for IPsec
> data-path processing.
> The library is supposed to utilize existing DPDK crypto-dev and
> security API to provide application with transparent IPsec processing API.
> The library is concentrated on data-path protocols processing (ESP and AH),
> IKE protocol(s) implementation is out of scope for that library.
> Though hook/callback mechanisms will be defined to allow integrate it
> with existing IKE implementations.
> Due to quite complex nature of IPsec protocol suite and variety of user
> requirements and usage scenarios a few API levels will be provided:
> 1) Security Association (SA-level) API
>      Operates at SA level, provides functions to:
>      - initialize/teardown SA object
>      - process inbound/outbound ESP/AH packets associated with the given SA
>        (decrypt/encrypt, authenticate, check integrity,
>         add/remove ESP/AH related headers and data, etc.).
> 2) Security Association Database (SAD) API
>      API to create/manage/destroy IPsec SAD.
>      While DPDK IPsec library plans to have its own implementation,
>      the intention is to keep it as independent from the other parts
>      of IPsec library as possible.
>      That is supposed to give users the ability to provide their own
>      implementation of the SAD compatible with the other parts of the
>      IPsec library.
> 3) IPsec Context (CTX) API
>      This is supposed to be a high-level API, where each IPsec CTX is an
>      abstraction of 'independent copy of the IPsec stack'.
>      CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
>      and provides:
>      - de-multiplexing stream of inbound packets to particular SAs and
>        further IPsec related processing.
>      - IPsec related processing for the outbound packets.
>      - SA add/delete/update functionality
> [Anoob]: Security Policy is an important aspect of IPsec. An IPsec
> library without Security Policy API would be incomplete. For inline
> protocol offload, the final SP-SA check(selector check) is the only
> IPsec part being done by ipsec-secgw now. Would make sense to add that
> also in the library.
> 
> You mean here, that we need some sort of SPD implementation, correct?
> [Anoob] Yes.

Ok, I see.
Our thought was that just something based on librte_acl would be enough here...
But if you think that a special defined SPD API (and implementation) is needed -
we can probably discuss it along with SAD API (#2 above).
Though if you'd like to start to work on RFC for it right-away - please feel free to do so :)

> 
> 
> 
> Current RFC concentrates on SA-level API only (1),
> detailed discussion for 2) and 3) will be subjects for separate RFC(s).
> 
> SA (low) level API
> ==================
> 
> API described below operates on SA level.
> It provides functionality that allows user for given SA to process
> inbound and outbound IPsec packets.
> To be more specific:
> - for inbound ESP/AH packets perform decryption, authentication,
>    integrity checking, remove ESP/AH related headers
> [Anoob] Anti-replay check would also be required.
> 
> Yep, anti-replay and ESN support is implied as part of "integrity checking".
> Probably I have to be more specific here.
> [Anoob] This is fine.
> 
> 
> 
> - for outbound packets perform payload encryption, attach ICV,
>    update/add IP headers, add ESP/AH headers/trailers,
>    setup related mbuf felids (ol_flags, tx_offloads, etc.).
> [Anoob] Do we have any plans to handle ESN expiry? Some means to
> initiate an IKE renegotiation? I'm assuming application won't be aware
> of the sequence numbers, in this case.
> [Anoob] What is your plan with events like ESN expiry? IPsec spec talks about byte and time expiry as well.

At current moment, for SA level: rte_ipsec_crypto_prepare()/rte_ipsec_inline_process() will set rte_errno
to special value (EOVERFLOW) to signal upper layer that limit is reached.
Upper layer can decide to start re-negotiation, or just destroy an SA.
 
Future plans for IPsec Context (CTX) API (#3 above):
Introduce a special function, something like:
rte_ipsec_get_expired(rte_ipsec_ctx *ctx, rte_ipsec_sa *expired_sa[], uint32_t num);
It would return up-to *num* of SAs for given ipsec context, that are expired/limit reached.
Then upper layer again might decide for each SA should renegotiation be started,
or just wipe given SA.
It would be upper layer responsibility to call this function periodically.
 
> 
> 
> - initialize/un-initialize given SA based on user provided parameters.
> 
> Processed inbound/outbound packets could be grouped by user provided
> flow id (opaque 64-bit number associated by user with given SA).
> 
> SA-level API is based on top of crypto-dev/security API and relies on them
> to perform actual cipher and integrity checking.
> Due to the nature of crypto-dev API (enqueue/deque model) we use
> asynchronous API for IPsec packets destined to be processed
> by crypto-device:
> rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
> rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
> Though for packets destined for inline processing no extra overhead
> is required and simple and synchronous API: rte_ipsec_inline_process()
> is introduced for that case.
> [Anoob] The API should include event-delivery as a crypto-op completion
> mechanism as well. The application could configure the event crypto
> adapter and then enqueue and dequeue to crypto device using events (via
> event dev).
> 
> Not sure what particular extra API you think is required here?
> As I understand in both cases (with or without event crypto-adapter) we still have to:
>  1) fill crypto-op properly
>  2) enqueue it to crypto-dev (via eventdev or directly)
> 3)  receive processed by crypto-dev crypto-op (either via eventdev or directly)
> 4) check crypto-op status, do further post-processing if any
> 
> So #1 and #4 (SA-level API respnibility) remain the same for both cases.
> [Anoob] rte_ipsec_inline_process works on packets not events. We might need a similar API which processes events.

Ok, I still don't get you here.
Could you specify what exactly function you'd like to add to the API here with parameter list
and brief behavior description? 

> 
> The following functionality:
>    - match inbound/outbound packets to particular SA
>    - manage crypto/security devices
>    - provide SAD/SPD related functionality
>    - determine what crypto/security device has to be used
>      for given packet(s)
> is out of scope for SA-level API.
> 
> Below is the brief (and simplified) overview of expected SA-level
> API usage.
> 
> /* allocate and initialize SA */
> size_t sz = rte_ipsec_sa_size();
> struct rte_ipsec_sa *sa = rte_malloc(sz);
> struct rte_ipsec_sa_prm prm;
> /* fill prm */
> rc = rte_ipsec_sa_init(sa, &prm);
> if (rc != 0) { /*handle error */}
> .....
> 
> /* process inbound/outbound IPsec packets that belongs to given SA */
> 
> /* inline IPsec processing was done for these packets */
> if (use_inline_ipsec)
>         n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
> /* use crypto-device to process the packets */
> else {
>       struct rte_crypto_op *cop[nb_pkts];
>       struct rte_ipsec_group grp[nb_pkts];
> 
>        ....
>       /* prepare crypto ops */
>       n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
>       /* enqueue crypto ops to related crypto-dev */
>       n =  rte_cryptodev_enqueue_burst(..., cops, n);
>       if (n != nb_pkts) { /*handle failed packets */}
>       /* dequeue finished crypto ops from related crypto-dev */
>       n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
>       /* finish IPsec processing for associated packets */
>       n = rte_ipsec_crypto_process(cop, pkts, grp, n);
> [Anoob] Does the SA based grouping apply to both inbound and outbound?
> 
> Yes, the plan is to have it available for both cases.
> [Anoob] On the inbound, shouldn't the packets be grouped+ordered based on inner L3+inner L4?

I think that's up to the user decide based on what criteria wants to group it and does he wants
to do any grouping at all.
That's why flowid is user-defined and totally transparent to the lib.

> 
> 
> 
>       /* now we have <n> group of packets grouped by SA flow id  */
>      ....
>   }
> ...
> 
> /* uninit given SA */
> rte_ipsec_sa_fini(sa);
> 
> Planned scope for 18.11:
> ========================
> 
> - SA-level API definition
> - ESP tunnel mode support (both IPv4/IPv6)
> - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
> - UT
> [Anoob] What is UT?
> 
> Unit-Test
> 
> 
> Note: Still WIP, so not all planned for 18.11 functionality is in place.
> 
> Post 18.11:
> ===========
> - ESP transport mode support (both IPv4/IPv6)
> - update examples/ipsec-secgw to use librte_ipsec
> - SAD and high-level API definition and implementation
> 
> 
> Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>   config/common_base                     |   5 +
>   lib/Makefile                           |   2 +
>   lib/librte_ipsec/Makefile              |  24 +
>   lib/librte_ipsec/meson.build           |  10 +
>   lib/librte_ipsec/pad.h                 |  45 ++
>   lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
>   lib/librte_ipsec/rte_ipsec_version.map |  13 +
>   lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
>   lib/librte_net/rte_esp.h               |  10 +-
>   lib/meson.build                        |   2 +
>   mk/rte.app.mk                          |   2 +
>   11 files changed, 1278 insertions(+), 1 deletion(-)
>   create mode 100644 lib/librte_ipsec/Makefile
>   create mode 100644 lib/librte_ipsec/meson.build
>   create mode 100644 lib/librte_ipsec/pad.h
>   create mode 100644 lib/librte_ipsec/rte_ipsec.h
>   create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
>   create mode 100644 lib/librte_ipsec/sa.c
> <snip>
> +static inline uint16_t
> +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
> +       struct rte_crypto_op *cop[], uint16_t num)
> +{
> +       int32_t rc;
> +       uint32_t i, n;
> +       union sym_op_data icv;
> +
> +       n = esn_outb_check_sqn(sa, num);
> +
> +       for (i = 0; i != n; i++) {
> +
> +               sa->sqn++;
> [Anoob] Shouldn't this be done atomically?
> 
> If we want to have MT-safe API for SA-datapath API, then yes.
> Though it would make things more complicated here, especially for inbound with anti-replay support.
> I think it is doable (spin-lock?), but would cause extra overhead and complexity.
> Right now I am not sure it really worth it - comments/suggestions are welcome.
> What probably could be a good compromise - runtime decision per SA basis (at sa_init()),
> do we need an ST or MT behavior for given SA.
> [Anoob] Going with single thread approach would significantly limit the scope of this library. Single thread approach would mean
> one SA on one core. This would not work on practical cases.
> Suppose we have two flows which are supposed to use the same SA. With RSS, these flows could end up on different cores. Now
> only one core would be able to process, as SA will not be shared. We have the same problem in ipsec-secgw too.

Just for my curiosity - how do you plan to use RSS for ipsec packet distribution?
Do you foresee a common situation when there would be packets that belongs to the same SA
(same SPI) but with multiple source(destination) IP addresses?
If so, probably some examples would be helpful.
I think IPsec RFCs doesn't prevent such situation, but AFAIK the most common case - single source/destination IPs for the same SPI.

Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU cores.
To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
I think it would require some extra synchronization:  
make sure that we do final packet processing (seq check/update) at the same order as we received the packets
(packets entered ipsec processing).  
I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
Though we plan CTX level API to support such scenario.
What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks concurrently.

> In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores. Considering
> IPsec being CPU intensive, this would limit the net output of the chip.

That's true - but from other side implementation can offload heavy part
(encrypt/decrypt, auth) to special HW (cryptodev).
In that case single core might be enough for SA and extra synchronization would just slowdown things.
That's why I think it should be configurable  what behavior (ST or MT) to use. 

Konstantin
  
Anoob Joseph Sept. 15, 2018, 5:06 p.m. UTC | #5
Hi Konstantin

See inline.

Thanks,

Anoob


On 12-09-2018 23:39, Ananyev, Konstantin wrote:
> External Email
>
> Hi Anoob,
>
>> Hi Konstantin,
>> Please see inline.
>>
>>
>> This RFC introduces a new library within DPDK: librte_ipsec.
>> The aim is to provide DPDK native high performance library for IPsec
>> data-path processing.
>> The library is supposed to utilize existing DPDK crypto-dev and
>> security API to provide application with transparent IPsec processing API.
>> The library is concentrated on data-path protocols processing (ESP and AH),
>> IKE protocol(s) implementation is out of scope for that library.
>> Though hook/callback mechanisms will be defined to allow integrate it
>> with existing IKE implementations.
>> Due to quite complex nature of IPsec protocol suite and variety of user
>> requirements and usage scenarios a few API levels will be provided:
>> 1) Security Association (SA-level) API
>>       Operates at SA level, provides functions to:
>>       - initialize/teardown SA object
>>       - process inbound/outbound ESP/AH packets associated with the given SA
>>         (decrypt/encrypt, authenticate, check integrity,
>>          add/remove ESP/AH related headers and data, etc.).
>> 2) Security Association Database (SAD) API
>>       API to create/manage/destroy IPsec SAD.
>>       While DPDK IPsec library plans to have its own implementation,
>>       the intention is to keep it as independent from the other parts
>>       of IPsec library as possible.
>>       That is supposed to give users the ability to provide their own
>>       implementation of the SAD compatible with the other parts of the
>>       IPsec library.
>> 3) IPsec Context (CTX) API
>>       This is supposed to be a high-level API, where each IPsec CTX is an
>>       abstraction of 'independent copy of the IPsec stack'.
>>       CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
>>       and provides:
>>       - de-multiplexing stream of inbound packets to particular SAs and
>>         further IPsec related processing.
>>       - IPsec related processing for the outbound packets.
>>       - SA add/delete/update functionality
>> [Anoob]: Security Policy is an important aspect of IPsec. An IPsec
>> library without Security Policy API would be incomplete. For inline
>> protocol offload, the final SP-SA check(selector check) is the only
>> IPsec part being done by ipsec-secgw now. Would make sense to add that
>> also in the library.
>>
>> You mean here, that we need some sort of SPD implementation, correct?
>> [Anoob] Yes.
> Ok, I see.
> Our thought was that just something based on librte_acl would be enough here...
> But if you think that a special defined SPD API (and implementation) is needed -
> we can probably discuss it along with SAD API (#2 above).
> Though if you'd like to start to work on RFC for it right-away - please feel free to do so :)
>
>>
>>
>> Current RFC concentrates on SA-level API only (1),
>> detailed discussion for 2) and 3) will be subjects for separate RFC(s).
>>
>> SA (low) level API
>> ==================
>>
>> API described below operates on SA level.
>> It provides functionality that allows user for given SA to process
>> inbound and outbound IPsec packets.
>> To be more specific:
>> - for inbound ESP/AH packets perform decryption, authentication,
>>     integrity checking, remove ESP/AH related headers
>> [Anoob] Anti-replay check would also be required.
>>
>> Yep, anti-replay and ESN support is implied as part of "integrity checking".
>> Probably I have to be more specific here.
>> [Anoob] This is fine.
>>
>>
>>
>> - for outbound packets perform payload encryption, attach ICV,
>>     update/add IP headers, add ESP/AH headers/trailers,
>>     setup related mbuf felids (ol_flags, tx_offloads, etc.).
>> [Anoob] Do we have any plans to handle ESN expiry? Some means to
>> initiate an IKE renegotiation? I'm assuming application won't be aware
>> of the sequence numbers, in this case.
>> [Anoob] What is your plan with events like ESN expiry? IPsec spec talks about byte and time expiry as well.
> At current moment, for SA level: rte_ipsec_crypto_prepare()/rte_ipsec_inline_process() will set rte_errno
> to special value (EOVERFLOW) to signal upper layer that limit is reached.
> Upper layer can decide to start re-negotiation, or just destroy an SA.
>
> Future plans for IPsec Context (CTX) API (#3 above):
> Introduce a special function, something like:
> rte_ipsec_get_expired(rte_ipsec_ctx *ctx, rte_ipsec_sa *expired_sa[], uint32_t num);
> It would return up-to *num* of SAs for given ipsec context, that are expired/limit reached.
> Then upper layer again might decide for each SA should renegotiation be started,
> or just wipe given SA.
> It would be upper layer responsibility to call this function periodically.
>
>>
>> - initialize/un-initialize given SA based on user provided parameters.
>>
>> Processed inbound/outbound packets could be grouped by user provided
>> flow id (opaque 64-bit number associated by user with given SA).
>>
>> SA-level API is based on top of crypto-dev/security API and relies on them
>> to perform actual cipher and integrity checking.
>> Due to the nature of crypto-dev API (enqueue/deque model) we use
>> asynchronous API for IPsec packets destined to be processed
>> by crypto-device:
>> rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
>> rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
>> Though for packets destined for inline processing no extra overhead
>> is required and simple and synchronous API: rte_ipsec_inline_process()
>> is introduced for that case.
>> [Anoob] The API should include event-delivery as a crypto-op completion
>> mechanism as well. The application could configure the event crypto
>> adapter and then enqueue and dequeue to crypto device using events (via
>> event dev).
>>
>> Not sure what particular extra API you think is required here?
>> As I understand in both cases (with or without event crypto-adapter) we still have to:
>>   1) fill crypto-op properly
>>   2) enqueue it to crypto-dev (via eventdev or directly)
>> 3)  receive processed by crypto-dev crypto-op (either via eventdev or directly)
>> 4) check crypto-op status, do further post-processing if any
>>
>> So #1 and #4 (SA-level API respnibility) remain the same for both cases.
>> [Anoob] rte_ipsec_inline_process works on packets not events. We might need a similar API which processes events.
> Ok, I still don't get you here.
> Could you specify what exactly function you'd like to add to the API here with parameter list
> and brief behavior description?
>
>> The following functionality:
>>     - match inbound/outbound packets to particular SA
>>     - manage crypto/security devices
>>     - provide SAD/SPD related functionality
>>     - determine what crypto/security device has to be used
>>       for given packet(s)
>> is out of scope for SA-level API.
>>
>> Below is the brief (and simplified) overview of expected SA-level
>> API usage.
>>
>> /* allocate and initialize SA */
>> size_t sz = rte_ipsec_sa_size();
>> struct rte_ipsec_sa *sa = rte_malloc(sz);
>> struct rte_ipsec_sa_prm prm;
>> /* fill prm */
>> rc = rte_ipsec_sa_init(sa, &prm);
>> if (rc != 0) { /*handle error */}
>> .....
>>
>> /* process inbound/outbound IPsec packets that belongs to given SA */
>>
>> /* inline IPsec processing was done for these packets */
>> if (use_inline_ipsec)
>>          n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
>> /* use crypto-device to process the packets */
>> else {
>>        struct rte_crypto_op *cop[nb_pkts];
>>        struct rte_ipsec_group grp[nb_pkts];
>>
>>         ....
>>        /* prepare crypto ops */
>>        n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
>>        /* enqueue crypto ops to related crypto-dev */
>>        n =  rte_cryptodev_enqueue_burst(..., cops, n);
>>        if (n != nb_pkts) { /*handle failed packets */}
>>        /* dequeue finished crypto ops from related crypto-dev */
>>        n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
>>        /* finish IPsec processing for associated packets */
>>        n = rte_ipsec_crypto_process(cop, pkts, grp, n);
>> [Anoob] Does the SA based grouping apply to both inbound and outbound?
>>
>> Yes, the plan is to have it available for both cases.
>> [Anoob] On the inbound, shouldn't the packets be grouped+ordered based on inner L3+inner L4?
> I think that's up to the user decide based on what criteria wants to group it and does he wants
> to do any grouping at all.
> That's why flowid is user-defined and totally transparent to the lib.
>
>>
>>
>>        /* now we have <n> group of packets grouped by SA flow id  */
>>       ....
>>    }
>> ...
>>
>> /* uninit given SA */
>> rte_ipsec_sa_fini(sa);
>>
>> Planned scope for 18.11:
>> ========================
>>
>> - SA-level API definition
>> - ESP tunnel mode support (both IPv4/IPv6)
>> - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
>> - UT
>> [Anoob] What is UT?
>>
>> Unit-Test
>>
>>
>> Note: Still WIP, so not all planned for 18.11 functionality is in place.
>>
>> Post 18.11:
>> ===========
>> - ESP transport mode support (both IPv4/IPv6)
>> - update examples/ipsec-secgw to use librte_ipsec
>> - SAD and high-level API definition and implementation
>>
>>
>> Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
>>    config/common_base                     |   5 +
>>    lib/Makefile                           |   2 +
>>    lib/librte_ipsec/Makefile              |  24 +
>>    lib/librte_ipsec/meson.build           |  10 +
>>    lib/librte_ipsec/pad.h                 |  45 ++
>>    lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
>>    lib/librte_ipsec/rte_ipsec_version.map |  13 +
>>    lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
>>    lib/librte_net/rte_esp.h               |  10 +-
>>    lib/meson.build                        |   2 +
>>    mk/rte.app.mk                          |   2 +
>>    11 files changed, 1278 insertions(+), 1 deletion(-)
>>    create mode 100644 lib/librte_ipsec/Makefile
>>    create mode 100644 lib/librte_ipsec/meson.build
>>    create mode 100644 lib/librte_ipsec/pad.h
>>    create mode 100644 lib/librte_ipsec/rte_ipsec.h
>>    create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
>>    create mode 100644 lib/librte_ipsec/sa.c
>> <snip>
>> +static inline uint16_t
>> +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>> +       struct rte_crypto_op *cop[], uint16_t num)
>> +{
>> +       int32_t rc;
>> +       uint32_t i, n;
>> +       union sym_op_data icv;
>> +
>> +       n = esn_outb_check_sqn(sa, num);
>> +
>> +       for (i = 0; i != n; i++) {
>> +
>> +               sa->sqn++;
>> [Anoob] Shouldn't this be done atomically?
>>
>> If we want to have MT-safe API for SA-datapath API, then yes.
>> Though it would make things more complicated here, especially for inbound with anti-replay support.
>> I think it is doable (spin-lock?), but would cause extra overhead and complexity.
>> Right now I am not sure it really worth it - comments/suggestions are welcome.
>> What probably could be a good compromise - runtime decision per SA basis (at sa_init()),
>> do we need an ST or MT behavior for given SA.
>> [Anoob] Going with single thread approach would significantly limit the scope of this library. Single thread approach would mean
>> one SA on one core. This would not work on practical cases.
>> Suppose we have two flows which are supposed to use the same SA. With RSS, these flows could end up on different cores. Now
>> only one core would be able to process, as SA will not be shared. We have the same problem in ipsec-secgw too.
> Just for my curiosity - how do you plan to use RSS for ipsec packet distribution?
> Do you foresee a common situation when there would be packets that belongs to the same SA
> (same SPI) but with multiple source(destination) IP addresses?
> If so, probably some examples would be helpful.
> I think IPsec RFCs doesn't prevent such situation, but AFAIK the most common case - single source/destination IPs for the same SPI.

sp ipv4 out esp protect 6 pri 1 dst 192.168.1.0/24 sport 0:65535 dport 0:65535
sp ipv4 out esp protect 6 pri 1 dst 192.168.2.0/24 sport 0:65535 dport 0:65535

sa out 6 cipher_algo aes-128-cbc cipher_key 22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11 auth_algo sha1-hmac auth_key 22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55 mode ipv4-tunnel src 172.16.2.1 dst 172.16.1.1

Isn't this a valid configuration? Wouldn't this be a common use case when we have site-to-site tunneling?

https://tools.ietf.org/html/rfc4301#section-4.4.1.1

>
> Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU cores.
> To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> I think it would require some extra synchronization:
> make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> (packets entered ipsec processing).
> I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> Though we plan CTX level API to support such scenario.
> What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks concurrently.
>
>> In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores. Considering
>> IPsec being CPU intensive, this would limit the net output of the chip.
> That's true - but from other side implementation can offload heavy part
> (encrypt/decrypt, auth) to special HW (cryptodev).
> In that case single core might be enough for SA and extra synchronization would just slowdown things.
> That's why I think it should be configurable  what behavior (ST or MT) to use.
I do agree that these are the issues that we need to address to make the 
library MT safe. Whether the extra synchronization would slow down 
things is a very subjective question and will heavily depend on the 
platform. The library should have enough provisions to be able to 
support MT without causing overheads to ST. Right now, the library 
assumes ST.
>
> Konstantin
  
Jerin Jacob Sept. 16, 2018, 10:56 a.m. UTC | #6
-----Original Message-----
> Date: Sat, 15 Sep 2018 22:36:18 +0530
> From: "Joseph, Anoob" <Anoob.Joseph@caviumnetworks.com>
> To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "dev@dpdk.org"
>  <dev@dpdk.org>
> Cc: "Awal, Mohammad Abdul" <mohammad.abdul.awal@intel.com>, "Doherty,
>  Declan" <declan.doherty@intel.com>, "Jerin Jacob
>  (jerin.jacob@caviumnetworks.com)" <jerin.jacob@caviumnetworks.com>,
>  Narayana Prasad <narayanaprasad.athreya@caviumnetworks.com>
> Subject: Re: [dpdk-dev] [RFC] ipsec: new library for IPsec data-path
>  processing
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> 
> > 
> > Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU cores.
> > To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> > I think it would require some extra synchronization:
> > make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> > (packets entered ipsec processing).
> > I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> > Though we plan CTX level API to support such scenario.
> > What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks concurrently.
> > 
> > > In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores. Considering
> > > IPsec being CPU intensive, this would limit the net output of the chip.
> > That's true - but from other side implementation can offload heavy part
> > (encrypt/decrypt, auth) to special HW (cryptodev).
> > In that case single core might be enough for SA and extra synchronization would just slowdown things.
> > That's why I think it should be configurable  what behavior (ST or MT) to use.
> I do agree that these are the issues that we need to address to make the
> library MT safe. Whether the extra synchronization would slow down things is
> a very subjective question and will heavily depend on the platform. The
> library should have enough provisions to be able to support MT without
> causing overheads to ST. Right now, the library assumes ST.


I agree with Anoob here.

I have two concerns with librte_ipsec as a separate library

1) There is an overlap with rte_security and new proposed library.
For IPsec, If an application needs to use rte_security for HW
implementation and and application needs to use librte_ipsec for
 SW implementation then it is bad and a lot duplication of work on 
he slow path too.

The rte_security spec can support both inline and look-aside IPSec
protocol support.

2) This library is tuned for fat CPU core in mind like single SA on core
etc. Which is fine for x86 servers and arm64 server category of machines
but it does not work very well with NPU class of SoC or FPGA.

As there  are the different ways to implement the IPSec, For instance,
use of eventdev can help in situation for handling millions of SA and 
equence number of update and anti reply check can be done by leveraging 
some of the HW specific features like 
ORDERED, ATOMIC schedule type(mapped as eventdev feature)in HW with PIPELINE model.

# Issues with having one SA one core,
- In the outbound side, there could be multiple flows using the same SA.
  Multiple flows could be processed parallel on different lcores,
but tying one SA to one core would mean we won't be able to do that.

- In the inbound side, we will have a fat flow hitting one core. If
  IPsec library assumes single core, we will not be able to to spread
fat flow to multiple cores. And one SA-one core would mean all ports on
which we would expect IPsec traffic has to be handled by that core.

I have made a simple presentation. This presentation details ONE WAY to
implement the IPSec with HW support on NPU.

https://docs.google.com/presentation/d/1e3IDf9R7ZQB8FN16Nvu7KINuLSWMdyKEw8_0H05rjj4/edit?usp=sharing

I am not saying this should be the ONLY way to do as it does not work
very well with non NPU/FPGA class of SoC.

So how about making the proposed IPSec library as plugin/driver to
rte_security.
This would give flexibly for each vendor/platform choose to different
IPse implementation based on HW support WITHOUT CHANGING THE APPLICATION
INTERFACE.

IMO, rte_security IPsec look aside support can be simply added by
creating the virtual crypto device(i.e move the proposed code to the virtual crypto device)
likewise inline support
can be added by the virtual ethdev device. This would avoid the need for
updating ipsec-gw application as well i.e unified interface to application.

If you don't like the above idea, any scheme of plugin based
implementation would be fine so that vendor or platform can choose its own implementation.
It can be based on partial HW implement too. i.e SA look can be used in SW, remaining stuff in HW
(for example IPsec inline case)

# For protocols like UDP, it makes sense to create librte_udp as there
no much HW specific offload other than ethdev provides.

# PDCP could be another library to offload to HW, So talking
rte_security path makes more sense in that case too.

Jerin
  
Ananyev, Konstantin Sept. 17, 2018, 10:36 a.m. UTC | #7
Hi Anoob,

> 
> Hi Konstantin,
> Please see inline.
> 
> 
> This RFC introduces a new library within DPDK: librte_ipsec.
> The aim is to provide DPDK native high performance library for IPsec
> data-path processing.
> The library is supposed to utilize existing DPDK crypto-dev and
> security API to provide application with transparent IPsec processing API.
> The library is concentrated on data-path protocols processing (ESP and AH),
> IKE protocol(s) implementation is out of scope for that library.
> Though hook/callback mechanisms will be defined to allow integrate it
> with existing IKE implementations.
> Due to quite complex nature of IPsec protocol suite and variety of user
> requirements and usage scenarios a few API levels will be provided:
> 1) Security Association (SA-level) API
>      Operates at SA level, provides functions to:
>      - initialize/teardown SA object
>      - process inbound/outbound ESP/AH packets associated with the given SA
>        (decrypt/encrypt, authenticate, check integrity,
>         add/remove ESP/AH related headers and data, etc.).
> 2) Security Association Database (SAD) API
>      API to create/manage/destroy IPsec SAD.
>      While DPDK IPsec library plans to have its own implementation,
>      the intention is to keep it as independent from the other parts
>      of IPsec library as possible.
>      That is supposed to give users the ability to provide their own
>      implementation of the SAD compatible with the other parts of the
>      IPsec library.
> 3) IPsec Context (CTX) API
>      This is supposed to be a high-level API, where each IPsec CTX is an
>      abstraction of 'independent copy of the IPsec stack'.
>      CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
>      and provides:
>      - de-multiplexing stream of inbound packets to particular SAs and
>        further IPsec related processing.
>      - IPsec related processing for the outbound packets.
>      - SA add/delete/update functionality
> [Anoob]: Security Policy is an important aspect of IPsec. An IPsec
> library without Security Policy API would be incomplete. For inline
> protocol offload, the final SP-SA check(selector check) is the only
> IPsec part being done by ipsec-secgw now. Would make sense to add that
> also in the library.
> 
> You mean here, that we need some sort of SPD implementation, correct?
> [Anoob] Yes.
> 
> Ok, I see.
> Our thought was that just something based on librte_acl would be enough here...
> But if you think that a special defined SPD API (and implementation) is needed -
> we can probably discuss it along with SAD API (#2 above).
> Though if you'd like to start to work on RFC for it right-away - please feel free to do so :)
> 
> 
> 
> 
> Current RFC concentrates on SA-level API only (1),
> detailed discussion for 2) and 3) will be subjects for separate RFC(s).
> 
> SA (low) level API
> ==================
> 
> API described below operates on SA level.
> It provides functionality that allows user for given SA to process
> inbound and outbound IPsec packets.
> To be more specific:
> - for inbound ESP/AH packets perform decryption, authentication,
>    integrity checking, remove ESP/AH related headers
> [Anoob] Anti-replay check would also be required.
> 
> Yep, anti-replay and ESN support is implied as part of "integrity checking".
> Probably I have to be more specific here.
> [Anoob] This is fine.
> 
> 
> 
> - for outbound packets perform payload encryption, attach ICV,
>    update/add IP headers, add ESP/AH headers/trailers,
>    setup related mbuf felids (ol_flags, tx_offloads, etc.).
> [Anoob] Do we have any plans to handle ESN expiry? Some means to
> initiate an IKE renegotiation? I'm assuming application won't be aware
> of the sequence numbers, in this case.
> [Anoob] What is your plan with events like ESN expiry? IPsec spec talks about byte and time expiry as well.
> 
> At current moment, for SA level: rte_ipsec_crypto_prepare()/rte_ipsec_inline_process() will set rte_errno
> to special value (EOVERFLOW) to signal upper layer that limit is reached.
> Upper layer can decide to start re-negotiation, or just destroy an SA.
> 
> Future plans for IPsec Context (CTX) API (#3 above):
> Introduce a special function, something like:
> rte_ipsec_get_expired(rte_ipsec_ctx *ctx, rte_ipsec_sa *expired_sa[], uint32_t num);
> It would return up-to *num* of SAs for given ipsec context, that are expired/limit reached.
> Then upper layer again might decide for each SA should renegotiation be started,
> or just wipe given SA.
> It would be upper layer responsibility to call this function periodically.
> 
> 
> 
> - initialize/un-initialize given SA based on user provided parameters.
> 
> Processed inbound/outbound packets could be grouped by user provided
> flow id (opaque 64-bit number associated by user with given SA).
> 
> SA-level API is based on top of crypto-dev/security API and relies on them
> to perform actual cipher and integrity checking.
> Due to the nature of crypto-dev API (enqueue/deque model) we use
> asynchronous API for IPsec packets destined to be processed
> by crypto-device:
> rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
> rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
> Though for packets destined for inline processing no extra overhead
> is required and simple and synchronous API: rte_ipsec_inline_process()
> is introduced for that case.
> [Anoob] The API should include event-delivery as a crypto-op completion
> mechanism as well. The application could configure the event crypto
> adapter and then enqueue and dequeue to crypto device using events (via
> event dev).
> 
> Not sure what particular extra API you think is required here?
> As I understand in both cases (with or without event crypto-adapter) we still have to:
>  1) fill crypto-op properly
>  2) enqueue it to crypto-dev (via eventdev or directly)
> 3)  receive processed by crypto-dev crypto-op (either via eventdev or directly)
> 4) check crypto-op status, do further post-processing if any
> 
> So #1 and #4 (SA-level API respnibility) remain the same for both cases.
> [Anoob] rte_ipsec_inline_process works on packets not events. We might need a similar API which processes events.
> 
> Ok, I still don't get you here.
> Could you specify what exactly function you'd like to add to the API here with parameter list
> and brief behavior description?
> 
> 
> The following functionality:
>    - match inbound/outbound packets to particular SA
>    - manage crypto/security devices
>    - provide SAD/SPD related functionality
>    - determine what crypto/security device has to be used
>      for given packet(s)
> is out of scope for SA-level API.
> 
> Below is the brief (and simplified) overview of expected SA-level
> API usage.
> 
> /* allocate and initialize SA */
> size_t sz = rte_ipsec_sa_size();
> struct rte_ipsec_sa *sa = rte_malloc(sz);
> struct rte_ipsec_sa_prm prm;
> /* fill prm */
> rc = rte_ipsec_sa_init(sa, &prm);
> if (rc != 0) { /*handle error */}
> .....
> 
> /* process inbound/outbound IPsec packets that belongs to given SA */
> 
> /* inline IPsec processing was done for these packets */
> if (use_inline_ipsec)
>         n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
> /* use crypto-device to process the packets */
> else {
>       struct rte_crypto_op *cop[nb_pkts];
>       struct rte_ipsec_group grp[nb_pkts];
> 
>        ....
>       /* prepare crypto ops */
>       n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
>       /* enqueue crypto ops to related crypto-dev */
>       n =  rte_cryptodev_enqueue_burst(..., cops, n);
>       if (n != nb_pkts) { /*handle failed packets */}
>       /* dequeue finished crypto ops from related crypto-dev */
>       n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
>       /* finish IPsec processing for associated packets */
>       n = rte_ipsec_crypto_process(cop, pkts, grp, n);
> [Anoob] Does the SA based grouping apply to both inbound and outbound?
> 
> Yes, the plan is to have it available for both cases.
> [Anoob] On the inbound, shouldn't the packets be grouped+ordered based on inner L3+inner L4?
> 
> I think that's up to the user decide based on what criteria wants to group it and does he wants
> to do any grouping at all.
> That's why flowid is user-defined and totally transparent to the lib.
> 
> 
> 
> 
>       /* now we have <n> group of packets grouped by SA flow id  */
>      ....
>   }
> ...
> 
> /* uninit given SA */
> rte_ipsec_sa_fini(sa);
> 
> Planned scope for 18.11:
> ========================
> 
> - SA-level API definition
> - ESP tunnel mode support (both IPv4/IPv6)
> - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
> - UT
> [Anoob] What is UT?
> 
> Unit-Test
> 
> 
> Note: Still WIP, so not all planned for 18.11 functionality is in place.
> 
> Post 18.11:
> ===========
> - ESP transport mode support (both IPv4/IPv6)
> - update examples/ipsec-secgw to use librte_ipsec
> - SAD and high-level API definition and implementation
> 
> 
> Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>   config/common_base                     |   5 +
>   lib/Makefile                           |   2 +
>   lib/librte_ipsec/Makefile              |  24 +
>   lib/librte_ipsec/meson.build           |  10 +
>   lib/librte_ipsec/pad.h                 |  45 ++
>   lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
>   lib/librte_ipsec/rte_ipsec_version.map |  13 +
>   lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
>   lib/librte_net/rte_esp.h               |  10 +-
>   lib/meson.build                        |   2 +
>   mk/rte.app.mk                          |   2 +
>   11 files changed, 1278 insertions(+), 1 deletion(-)
>   create mode 100644 lib/librte_ipsec/Makefile
>   create mode 100644 lib/librte_ipsec/meson.build
>   create mode 100644 lib/librte_ipsec/pad.h
>   create mode 100644 lib/librte_ipsec/rte_ipsec.h
>   create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
>   create mode 100644 lib/librte_ipsec/sa.c
> <snip>
> +static inline uint16_t
> +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
> +       struct rte_crypto_op *cop[], uint16_t num)
> +{
> +       int32_t rc;
> +       uint32_t i, n;
> +       union sym_op_data icv;
> +
> +       n = esn_outb_check_sqn(sa, num);
> +
> +       for (i = 0; i != n; i++) {
> +
> +               sa->sqn++;
> [Anoob] Shouldn't this be done atomically?
> 
> If we want to have MT-safe API for SA-datapath API, then yes.
> Though it would make things more complicated here, especially for inbound with anti-replay support.
> I think it is doable (spin-lock?), but would cause extra overhead and complexity.
> Right now I am not sure it really worth it - comments/suggestions are welcome.
> What probably could be a good compromise - runtime decision per SA basis (at sa_init()),
> do we need an ST or MT behavior for given SA.
> [Anoob] Going with single thread approach would significantly limit the scope of this library. Single thread approach would mean
> one SA on one core. This would not work on practical cases.
> Suppose we have two flows which are supposed to use the same SA. With RSS, these flows could end up on different cores. Now
> only one core would be able to process, as SA will not be shared. We have the same problem in ipsec-secgw too.
> 
> Just for my curiosity - how do you plan to use RSS for ipsec packet distribution?
> Do you foresee a common situation when there would be packets that belongs to the same SA
> (same SPI) but with multiple source(destination) IP addresses?
> If so, probably some examples would be helpful.
> I think IPsec RFCs doesn't prevent such situation, but AFAIK the most common case - single source/destination IPs for the same SPI.
> 
> sp ipv4 out esp protect 6 pri 1 dst 192.168.1.0/24 sport 0:65535 dport 0:65535
> sp ipv4 out esp protect 6 pri 1 dst 192.168.2.0/24 sport 0:65535 dport 0:65535
> sa out 6 cipher_algo aes-128-cbc cipher_key 22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11 auth_algo sha1-hmac auth_key
> 22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55 mode ipv4-tunnel src 172.16.2.1 dst 172.16.1.1
> Isn't this a valid configuration? Wouldn't this be a common use case when we have site-to-site tunneling?
> https://tools.ietf.org/html/rfc4301#section-4.4.1.1

Ok, I think I understand what was my confusion here - above you talked about using RSS to distribute incoming *outbound* traffic, correct?
If so, then yes I think such scheme would work without problems.
My original thought was that we are talking about inbound traffic distribution here - in that case standard RSS wouldn't help much.

> 
> 
> Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU cores.
> To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> I think it would require some extra synchronization:
> make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> (packets entered ipsec processing).
> I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> Though we plan CTX level API to support such scenario.
> What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
> concurrently.
> 
> In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores. Considering
> IPsec being CPU intensive, this would limit the net output of the chip.
> 
> That's true - but from other side implementation can offload heavy part
> (encrypt/decrypt, auth) to special HW (cryptodev).
> In that case single core might be enough for SA and extra synchronization would just slowdown things.
> That's why I think it should be configurable  what behavior (ST or MT) to use.
> I do agree that these are the issues that we need to address to make the library MT safe. Whether the extra synchronization would
> slow down things is a very subjective question and will heavily depend on the platform. The library should have enough provisions
> to be able to support MT without causing overheads to ST. Right now, the library assumes ST.

Ok, I suppose we both agree that we need ST and MT case supported.
I didn't want to introduce MT related code right now (for 18.11), but as you guys seems very concerned about it,
we will try to add MT related stuff into v1, so you can review it at early stages. 
Konstantin
  
Anoob Joseph Sept. 17, 2018, 2:41 p.m. UTC | #8
Hi Konstantin,


On 17-09-2018 16:06, Ananyev, Konstantin wrote:
> External Email
>
> Hi Anoob,
>
>> Hi Konstantin,
>> Please see inline.
>>
>>
>> This RFC introduces a new library within DPDK: librte_ipsec.
>> The aim is to provide DPDK native high performance library for IPsec
>> data-path processing.
>> The library is supposed to utilize existing DPDK crypto-dev and
>> security API to provide application with transparent IPsec processing API.
>> The library is concentrated on data-path protocols processing (ESP and AH),
>> IKE protocol(s) implementation is out of scope for that library.
>> Though hook/callback mechanisms will be defined to allow integrate it
>> with existing IKE implementations.
>> Due to quite complex nature of IPsec protocol suite and variety of user
>> requirements and usage scenarios a few API levels will be provided:
>> 1) Security Association (SA-level) API
>>       Operates at SA level, provides functions to:
>>       - initialize/teardown SA object
>>       - process inbound/outbound ESP/AH packets associated with the given SA
>>         (decrypt/encrypt, authenticate, check integrity,
>>          add/remove ESP/AH related headers and data, etc.).
>> 2) Security Association Database (SAD) API
>>       API to create/manage/destroy IPsec SAD.
>>       While DPDK IPsec library plans to have its own implementation,
>>       the intention is to keep it as independent from the other parts
>>       of IPsec library as possible.
>>       That is supposed to give users the ability to provide their own
>>       implementation of the SAD compatible with the other parts of the
>>       IPsec library.
>> 3) IPsec Context (CTX) API
>>       This is supposed to be a high-level API, where each IPsec CTX is an
>>       abstraction of 'independent copy of the IPsec stack'.
>>       CTX owns set of SAs, SADs and assigned to it crypto-dev queues, etc.
>>       and provides:
>>       - de-multiplexing stream of inbound packets to particular SAs and
>>         further IPsec related processing.
>>       - IPsec related processing for the outbound packets.
>>       - SA add/delete/update functionality
>> [Anoob]: Security Policy is an important aspect of IPsec. An IPsec
>> library without Security Policy API would be incomplete. For inline
>> protocol offload, the final SP-SA check(selector check) is the only
>> IPsec part being done by ipsec-secgw now. Would make sense to add that
>> also in the library.
>>
>> You mean here, that we need some sort of SPD implementation, correct?
>> [Anoob] Yes.
>>
>> Ok, I see.
>> Our thought was that just something based on librte_acl would be enough here...
>> But if you think that a special defined SPD API (and implementation) is needed -
>> we can probably discuss it along with SAD API (#2 above).
>> Though if you'd like to start to work on RFC for it right-away - please feel free to do so :)
>>
>>
>>
>>
>> Current RFC concentrates on SA-level API only (1),
>> detailed discussion for 2) and 3) will be subjects for separate RFC(s).
>>
>> SA (low) level API
>> ==================
>>
>> API described below operates on SA level.
>> It provides functionality that allows user for given SA to process
>> inbound and outbound IPsec packets.
>> To be more specific:
>> - for inbound ESP/AH packets perform decryption, authentication,
>>     integrity checking, remove ESP/AH related headers
>> [Anoob] Anti-replay check would also be required.
>>
>> Yep, anti-replay and ESN support is implied as part of "integrity checking".
>> Probably I have to be more specific here.
>> [Anoob] This is fine.
>>
>>
>>
>> - for outbound packets perform payload encryption, attach ICV,
>>     update/add IP headers, add ESP/AH headers/trailers,
>>     setup related mbuf felids (ol_flags, tx_offloads, etc.).
>> [Anoob] Do we have any plans to handle ESN expiry? Some means to
>> initiate an IKE renegotiation? I'm assuming application won't be aware
>> of the sequence numbers, in this case.
>> [Anoob] What is your plan with events like ESN expiry? IPsec spec talks about byte and time expiry as well.
>>
>> At current moment, for SA level: rte_ipsec_crypto_prepare()/rte_ipsec_inline_process() will set rte_errno
>> to special value (EOVERFLOW) to signal upper layer that limit is reached.
>> Upper layer can decide to start re-negotiation, or just destroy an SA.
>>
>> Future plans for IPsec Context (CTX) API (#3 above):
>> Introduce a special function, something like:
>> rte_ipsec_get_expired(rte_ipsec_ctx *ctx, rte_ipsec_sa *expired_sa[], uint32_t num);
>> It would return up-to *num* of SAs for given ipsec context, that are expired/limit reached.
>> Then upper layer again might decide for each SA should renegotiation be started,
>> or just wipe given SA.
>> It would be upper layer responsibility to call this function periodically.
>>
>>
>>
>> - initialize/un-initialize given SA based on user provided parameters.
>>
>> Processed inbound/outbound packets could be grouped by user provided
>> flow id (opaque 64-bit number associated by user with given SA).
>>
>> SA-level API is based on top of crypto-dev/security API and relies on them
>> to perform actual cipher and integrity checking.
>> Due to the nature of crypto-dev API (enqueue/deque model) we use
>> asynchronous API for IPsec packets destined to be processed
>> by crypto-device:
>> rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
>> rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
>> Though for packets destined for inline processing no extra overhead
>> is required and simple and synchronous API: rte_ipsec_inline_process()
>> is introduced for that case.
>> [Anoob] The API should include event-delivery as a crypto-op completion
>> mechanism as well. The application could configure the event crypto
>> adapter and then enqueue and dequeue to crypto device using events (via
>> event dev).
>>
>> Not sure what particular extra API you think is required here?
>> As I understand in both cases (with or without event crypto-adapter) we still have to:
>>   1) fill crypto-op properly
>>   2) enqueue it to crypto-dev (via eventdev or directly)
>> 3)  receive processed by crypto-dev crypto-op (either via eventdev or directly)
>> 4) check crypto-op status, do further post-processing if any
>>
>> So #1 and #4 (SA-level API respnibility) remain the same for both cases.
>> [Anoob] rte_ipsec_inline_process works on packets not events. We might need a similar API which processes events.
>>
>> Ok, I still don't get you here.
>> Could you specify what exactly function you'd like to add to the API here with parameter list
>> and brief behavior description?
>>
>>
>> The following functionality:
>>     - match inbound/outbound packets to particular SA
>>     - manage crypto/security devices
>>     - provide SAD/SPD related functionality
>>     - determine what crypto/security device has to be used
>>       for given packet(s)
>> is out of scope for SA-level API.
>>
>> Below is the brief (and simplified) overview of expected SA-level
>> API usage.
>>
>> /* allocate and initialize SA */
>> size_t sz = rte_ipsec_sa_size();
>> struct rte_ipsec_sa *sa = rte_malloc(sz);
>> struct rte_ipsec_sa_prm prm;
>> /* fill prm */
>> rc = rte_ipsec_sa_init(sa, &prm);
>> if (rc != 0) { /*handle error */}
>> .....
>>
>> /* process inbound/outbound IPsec packets that belongs to given SA */
>>
>> /* inline IPsec processing was done for these packets */
>> if (use_inline_ipsec)
>>          n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
>> /* use crypto-device to process the packets */
>> else {
>>        struct rte_crypto_op *cop[nb_pkts];
>>        struct rte_ipsec_group grp[nb_pkts];
>>
>>         ....
>>        /* prepare crypto ops */
>>        n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
>>        /* enqueue crypto ops to related crypto-dev */
>>        n =  rte_cryptodev_enqueue_burst(..., cops, n);
>>        if (n != nb_pkts) { /*handle failed packets */}
>>        /* dequeue finished crypto ops from related crypto-dev */
>>        n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
>>        /* finish IPsec processing for associated packets */
>>        n = rte_ipsec_crypto_process(cop, pkts, grp, n);
>> [Anoob] Does the SA based grouping apply to both inbound and outbound?
>>
>> Yes, the plan is to have it available for both cases.
>> [Anoob] On the inbound, shouldn't the packets be grouped+ordered based on inner L3+inner L4?
>>
>> I think that's up to the user decide based on what criteria wants to group it and does he wants
>> to do any grouping at all.
>> That's why flowid is user-defined and totally transparent to the lib.
>>
>>
>>
>>
>>        /* now we have <n> group of packets grouped by SA flow id  */
>>       ....
>>    }
>> ...
>>
>> /* uninit given SA */
>> rte_ipsec_sa_fini(sa);
>>
>> Planned scope for 18.11:
>> ========================
>>
>> - SA-level API definition
>> - ESP tunnel mode support (both IPv4/IPv6)
>> - Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
>> - UT
>> [Anoob] What is UT?
>>
>> Unit-Test
>>
>>
>> Note: Still WIP, so not all planned for 18.11 functionality is in place.
>>
>> Post 18.11:
>> ===========
>> - ESP transport mode support (both IPv4/IPv6)
>> - update examples/ipsec-secgw to use librte_ipsec
>> - SAD and high-level API definition and implementation
>>
>>
>> Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
>>    config/common_base                     |   5 +
>>    lib/Makefile                           |   2 +
>>    lib/librte_ipsec/Makefile              |  24 +
>>    lib/librte_ipsec/meson.build           |  10 +
>>    lib/librte_ipsec/pad.h                 |  45 ++
>>    lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
>>    lib/librte_ipsec/rte_ipsec_version.map |  13 +
>>    lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
>>    lib/librte_net/rte_esp.h               |  10 +-
>>    lib/meson.build                        |   2 +
>>    mk/rte.app.mk                          |   2 +
>>    11 files changed, 1278 insertions(+), 1 deletion(-)
>>    create mode 100644 lib/librte_ipsec/Makefile
>>    create mode 100644 lib/librte_ipsec/meson.build
>>    create mode 100644 lib/librte_ipsec/pad.h
>>    create mode 100644 lib/librte_ipsec/rte_ipsec.h
>>    create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
>>    create mode 100644 lib/librte_ipsec/sa.c
>> <snip>
>> +static inline uint16_t
>> +esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>> +       struct rte_crypto_op *cop[], uint16_t num)
>> +{
>> +       int32_t rc;
>> +       uint32_t i, n;
>> +       union sym_op_data icv;
>> +
>> +       n = esn_outb_check_sqn(sa, num);
>> +
>> +       for (i = 0; i != n; i++) {
>> +
>> +               sa->sqn++;
>> [Anoob] Shouldn't this be done atomically?
>>
>> If we want to have MT-safe API for SA-datapath API, then yes.
>> Though it would make things more complicated here, especially for inbound with anti-replay support.
>> I think it is doable (spin-lock?), but would cause extra overhead and complexity.
>> Right now I am not sure it really worth it - comments/suggestions are welcome.
>> What probably could be a good compromise - runtime decision per SA basis (at sa_init()),
>> do we need an ST or MT behavior for given SA.
>> [Anoob] Going with single thread approach would significantly limit the scope of this library. Single thread approach would mean
>> one SA on one core. This would not work on practical cases.
>> Suppose we have two flows which are supposed to use the same SA. With RSS, these flows could end up on different cores. Now
>> only one core would be able to process, as SA will not be shared. We have the same problem in ipsec-secgw too.
>>
>> Just for my curiosity - how do you plan to use RSS for ipsec packet distribution?
>> Do you foresee a common situation when there would be packets that belongs to the same SA
>> (same SPI) but with multiple source(destination) IP addresses?
>> If so, probably some examples would be helpful.
>> I think IPsec RFCs doesn't prevent such situation, but AFAIK the most common case - single source/destination IPs for the same SPI.
>>
>> sp ipv4 out esp protect 6 pri 1 dst 192.168.1.0/24 sport 0:65535 dport 0:65535
>> sp ipv4 out esp protect 6 pri 1 dst 192.168.2.0/24 sport 0:65535 dport 0:65535
>> sa out 6 cipher_algo aes-128-cbc cipher_key 22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11 auth_algo sha1-hmac auth_key
>> 22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55 mode ipv4-tunnel src 172.16.2.1 dst 172.16.1.1
>> Isn't this a valid configuration? Wouldn't this be a common use case when we have site-to-site tunneling?
>> https://tools.ietf.org/html/rfc4301#section-4.4.1.1
> Ok, I think I understand what was my confusion here - above you talked about using RSS to distribute incoming *outbound* traffic, correct?
> If so, then yes I think such scheme would work without problems.
> My original thought was that we are talking about inbound traffic distribution here - in that case standard RSS wouldn't help much.
Agreed. RSS won't be of much use in that case (inbound). But fat flow 
hitting one core would be a problem which we should solve. RSS will help 
in solving the same problem with outbound, to an extent.
>> Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU cores.
>> To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
>> I think it would require some extra synchronization:
>> make sure that we do final packet processing (seq check/update) at the same order as we received the packets
>> (packets entered ipsec processing).
>> I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
>> Though we plan CTX level API to support such scenario.
>> What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
>> concurrently.
>>
>> In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores. Considering
>> IPsec being CPU intensive, this would limit the net output of the chip.
>>
>> That's true - but from other side implementation can offload heavy part
>> (encrypt/decrypt, auth) to special HW (cryptodev).
>> In that case single core might be enough for SA and extra synchronization would just slowdown things.
>> That's why I think it should be configurable  what behavior (ST or MT) to use.
>> I do agree that these are the issues that we need to address to make the library MT safe. Whether the extra synchronization would
>> slow down things is a very subjective question and will heavily depend on the platform. The library should have enough provisions
>> to be able to support MT without causing overheads to ST. Right now, the library assumes ST.
> Ok, I suppose we both agree that we need ST and MT case supported.
> I didn't want to introduce MT related code right now (for 18.11), but as you guys seems very concerned about it,
> we will try to add MT related stuff into v1, so you can review it at early stages.
> Konstantin
Glad that we are on the same page. As you had pointed out, MT is not 
just about adding some locks. It's more complicated than that. And again 
for solving it, there could be multiple ways. Using locks and software 
queues etc would definitely be one way. But that's not the only 
solution. As Jerin had pointed out, there are parts in IPsec flow which 
can be offloaded to hardware.

http://mails.dpdk.org/archives/dev/2018-September/111770.html

Can you share your thoughts on the above approach?

Anoob
  
Ananyev, Konstantin Sept. 17, 2018, 6:12 p.m. UTC | #9
Hi Jerin,


> >
> > >
> > > Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU
> cores.
> > > To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> > > I think it would require some extra synchronization:
> > > make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> > > (packets entered ipsec processing).
> > > I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> > > Though we plan CTX level API to support such scenario.
> > > What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
> concurrently.
> > >
> > > > In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores.
> Considering
> > > > IPsec being CPU intensive, this would limit the net output of the chip.
> > > That's true - but from other side implementation can offload heavy part
> > > (encrypt/decrypt, auth) to special HW (cryptodev).
> > > In that case single core might be enough for SA and extra synchronization would just slowdown things.
> > > That's why I think it should be configurable  what behavior (ST or MT) to use.
> > I do agree that these are the issues that we need to address to make the
> > library MT safe. Whether the extra synchronization would slow down things is
> > a very subjective question and will heavily depend on the platform. The
> > library should have enough provisions to be able to support MT without
> > causing overheads to ST. Right now, the library assumes ST.
> 
> 
> I agree with Anoob here.
> 
> I have two concerns with librte_ipsec as a separate library
> 
> 1) There is an overlap with rte_security and new proposed library.

I don't think there really is an overlap.
rte_security is a 'framework for management and provisioning of security protocol operations offloaded to hardware based devices'.
While rte_ipsec is aimed to be a library for IPsec data-path processing.
There is no plans for rte_ipsec to 'obsolete' rte_security.
Quite opposite rte_ipsec supposed to work with both rte_cryptodev and rte_security APIs (devices).
It is possible to have an SA that would use both crypto and  security devices.
Or to have an SA that would use multiple crypto devs
(though right now it is up the user level to do load-balancing logic). 

> For IPsec, If an application needs to use rte_security for HW
> implementation and and application needs to use librte_ipsec for
>  SW implementation then it is bad and a lot duplication of work on
> he slow path too.

The plan is that application would need to use just rte_ipsec API for all data-paths
(HW/SW, lookaside/inline). 
Let say right now there is rte_ipsec_inline_process() function if user
prefers to use inline security device to process given group packets,
and rte_ipsec_crypto_process(/prepare) if user decides to use
lookaside security or simple crypto device for it.

> 
> The rte_security spec can support both inline and look-aside IPSec
> protocol support.

AFAIK right now rte_security just provides API to create/free/manipulate security sessions.
I don't see how it can support all the functionality mentioned above,
plus SAD and SPD.

> 
> 2) This library is tuned for fat CPU core in mind like single SA on core
> etc. Which is fine for x86 servers and arm64 server category of machines
> but it does not work very well with NPU class of SoC or FPGA.
> 
> As there  are the different ways to implement the IPSec, For instance,
> use of eventdev can help in situation for handling millions of SA and
> equence number of update and anti reply check can be done by leveraging
> some of the HW specific features like
> ORDERED, ATOMIC schedule type(mapped as eventdev feature)in HW with PIPELINE model.
> 
> # Issues with having one SA one core,
> - In the outbound side, there could be multiple flows using the same SA.
>   Multiple flows could be processed parallel on different lcores,
> but tying one SA to one core would mean we won't be able to do that.
> 
> - In the inbound side, we will have a fat flow hitting one core. If
>   IPsec library assumes single core, we will not be able to to spread
> fat flow to multiple cores. And one SA-one core would mean all ports on
> which we would expect IPsec traffic has to be handled by that core.

I suppose that all refers to the discussion about MT safe API for rte_ipsec, right?
If so, then as I said in my reply to Anoob: 
We will try to make API usable in MT environment for v1,
so you can review and provide comments at early stages.

> 
> I have made a simple presentation. This presentation details ONE WAY to
> implement the IPSec with HW support on NPU.
> 
> https://docs.google.com/presentation/d/1e3IDf9R7ZQB8FN16Nvu7KINuLSWMdyKEw8_0H05rjj4/edit?usp=sharing
> 

Thanks, quite helpful.
Actually from page 3, it looks like your expectations don't contradict in general with proposed API:

...
} else if (ev.event_type == RTE_EVENT_TYPE_LCORE && ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
                        sa = ev.flow_queue_id;                                  
                        /* do critical section work per sa */               
                        do_critical_section_work(sa);  

[KA] that's the place where I expect either 
rte_ipsec_inline_process(sa, ...); OR rte_ipsec_crypto_prepare(sa, ...);      	
would be called.
                                                                               
                     /* Issue the crypto request and generate the following on crypto work completion */
[KA] that's the place where I expect rte_ipsec_crypto_process(...) be invoked.

                        ev.flow_queue_id = tx_port;                                  
                        ev.sub_event_id = tx_queue_id;            
                        ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;                 
                        rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);                
                }


> I am not saying this should be the ONLY way to do as it does not work
> very well with non NPU/FPGA class of SoC.
> 
> So how about making the proposed IPSec library as plugin/driver to
> rte_security.

As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
is the best possible approach.
Though I probably understand your concern:
In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
Though there are devices where most of prepare/process can be done in HW
(RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
Is that so?
To address that issue I suppose we can do:
1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
    security devices into ipsec.
    We planned to do it anyway, just don't have it done yet.
2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
    and add into rte_security_ops   new functions: 
    uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
    uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
    uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
    So for custom HW, PMD can overwrite normal prepare/process behavior.

> This would give flexibly for each vendor/platform choose to different
> IPse implementation based on HW support WITHOUT CHANGING THE APPLICATION
> INTERFACE.

Not sure what API changes you are referring to?
As I am aware we do introduce new API, but all existing APIs remain in place.

> 
> IMO, rte_security IPsec look aside support can be simply added by
> creating the virtual crypto device(i.e move the proposed code to the virtual crypto device)
> likewise inline support
> can be added by the virtual ethdev device.

That's probably possible and if someone would like to introduce such abstraction - NP in general
(though my suspicion - it might be too heavy to be really useful).
Though I don't think it should be the only possible way for the user to enable IPsec data-processing inside his app.
Again I guess such virtual-dev will still use rte_ipsec inside.

> This would avoid the need for
> updating ipsec-gw application as well i.e unified interface to application.

I think - it would  really good to simplify existing ipsec-secgw sample app.
Some parts of it seems unnecessary complex to me.
One of the reasons for it -  we don't really have an unified (and transparent) API for ipsec data-path.
Let's look at ipsec_enqueue() and related code (examples/ipsec-secgw/ipsec.c:365)
It is huge (and ugly) -  user has to handle dozen different cases just to enqueue packet for IPsec processing.
One of the aims of rte_ipsec library - hide all that complexities inside the library and provide to
the upper layer clean and transparent API.

> 
> If you don't like the above idea, any scheme of plugin based
> implementation would be fine so that vendor or platform can choose its own implementation.
> It can be based on partial HW implement too. i.e SA look can be used in SW, remaining stuff in HW
> (for example IPsec inline case)

I am surely ok with the idea to give vendors an ability to customize implementation
and enable their HW capabilities.
Do you think proposed additions to the rte_security would be  enough,
or something extra is needed?

Konstantin


> 
> # For protocols like UDP, it makes sense to create librte_udp as there
> no much HW specific offload other than ethdev provides.
> 
> # PDCP could be another library to offload to HW, So talking
> rte_security path makes more sense in that case too.
> 
> Jerin
  
Ananyev, Konstantin Sept. 18, 2018, 12:42 p.m. UTC | #10
> > I am not saying this should be the ONLY way to do as it does not work
> > very well with non NPU/FPGA class of SoC.
> >
> > So how about making the proposed IPSec library as plugin/driver to
> > rte_security.
> 
> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> is the best possible approach.
> Though I probably understand your concern:
> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
> Though there are devices where most of prepare/process can be done in HW
> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> Is that so?
> To address that issue I suppose we can do:
> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>     security devices into ipsec.
>     We planned to do it anyway, just don't have it done yet.
> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
>     and add into rte_security_ops   new functions:
>     uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>     uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>     uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>     So for custom HW, PMD can overwrite normal prepare/process behavior.
> 

Actually  after another thought: 
My previous assumption (probably wrong one) was that for both
RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now)
I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.  
Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
(HW/SW roses/responsibilities)?
About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that capability.
So my question is there any devices/drivers that do support it?
If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
Konstantin
  
Jerin Jacob Sept. 18, 2018, 5:54 p.m. UTC | #11
-----Original Message-----
> Date: Mon, 17 Sep 2018 18:12:48 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, "Joseph, Anoob"
>  <Anoob.Joseph@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>, "Awal, Mohammad Abdul"
>  <mohammad.abdul.awal@intel.com>, "Doherty, Declan"
>  <declan.doherty@intel.com>, Narayana Prasad
>  <narayanaprasad.athreya@caviumnetworks.com>
> Subject: RE: [dpdk-dev] [RFC] ipsec: new library for IPsec data-path
>  processing
> 
> 
> Hi Jerin,


Hi Konstantin,

> 
> 
> > >
> > > >
> > > > Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU
> > cores.
> > > > To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> > > > I think it would require some extra synchronization:
> > > > make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> > > > (packets entered ipsec processing).
> > > > I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> > > > Though we plan CTX level API to support such scenario.
> > > > What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
> > concurrently.
> > > >
> > > > > In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores.
> > Considering
> > > > > IPsec being CPU intensive, this would limit the net output of the chip.
> > > > That's true - but from other side implementation can offload heavy part
> > > > (encrypt/decrypt, auth) to special HW (cryptodev).
> > > > In that case single core might be enough for SA and extra synchronization would just slowdown things.
> > > > That's why I think it should be configurable  what behavior (ST or MT) to use.
> > > I do agree that these are the issues that we need to address to make the
> > > library MT safe. Whether the extra synchronization would slow down things is
> > > a very subjective question and will heavily depend on the platform. The
> > > library should have enough provisions to be able to support MT without
> > > causing overheads to ST. Right now, the library assumes ST.
> >
> >
> > I agree with Anoob here.
> >
> > I have two concerns with librte_ipsec as a separate library
> >
> > 1) There is an overlap with rte_security and new proposed library.
> 
> I don't think there really is an overlap.

As mentioned in your other email. IMO, There is an overlap as
RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL can support almost everything
in HW or HW + SW if some PMD wishes to do so. 

Answering some of the questions, you have asked in other thread based on
my understanding.

Regarding RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL support,
Marvell/Cavium CPT hardware on next generation HW(Planning to upstream
around v19.02) can support RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and
Anoob already pushed the application changes in ipsec-gw.

In our understanding of HW/SW roles/responsibilities for that type of
devices are:

INLINE_PROTOCOL
----------------
In control path, security session is created with the given SA and
rte_flow configuration etc.
 
For outbound traffic, the application will have to do SA lookup and
identify the security action (inline/look aside crypto/protocol). For
packets identified for inline protocol processing, the application would
submit as plain packets to the ethernet device and the security capable
ethernet device would perform IPSec and send out the packet. For PMDs
which would need extra metadata (capability flag), set_pkt_metadata
function pointer would be called (from application). This can be used to
set some per packet field to identify the security session to be used to
process the packet. Sequence number updation will be done by the PMD.
For inbound traffic, the packets for IPSec would be identified by using
rte_flow (hardware accelerated packet filtering). For the packets
identified for inline offload (SECURITY action), hardware would perform
the processing. For inline protocol processed IPSec packets, PMD would
set “user data” so that application can get the details of the security
processing done on the packet. Once the plain packet (after IPSec
processing) is received, a selector check need to be performed to make
sure we have a valid packet after IPSec processing. The user data is used
for that. Anti-replay check is handled by the PMD. The PMD would raise
an eth event in case of sequence number expiry or any SA expiry.


LOOKASIDE_PROTOCOL
------------------
In control path, security session is created with the given SA.
 
Enqueue/dequeue is similar to what is done for regular crypto
(RTE_SECURITY_ACTION_TYPE_NONE) but all the protocol related processing
would be offloaded. Application will need to do SA lookup and identify
the processing to be done (both in case of outbound & inbound), and
submit packet to crypto device. Application need not do any IPSec
related transformations other than the lookup. Anti-replay need to be
handled in the PMD (the spec says the device “may be handled” do anti-replay check,
but a complete protocol offload would need anti-replay check also).


> rte_security is a 'framework for management and provisioning of security protocol operations offloaded to hardware based devices'.
> While rte_ipsec is aimed to be a library for IPsec data-path processing.
> There is no plans for rte_ipsec to 'obsolete' rte_security.
> Quite opposite rte_ipsec supposed to work with both rte_cryptodev and rte_security APIs (devices).
> It is possible to have an SA that would use both crypto and  security devices.
> Or to have an SA that would use multiple crypto devs
> (though right now it is up the user level to do load-balancing logic).
> 
> > For IPsec, If an application needs to use rte_security for HW
> > implementation and and application needs to use librte_ipsec for
> >  SW implementation then it is bad and a lot duplication of work on
> > he slow path too.
> 
> The plan is that application would need to use just rte_ipsec API for all data-paths
> (HW/SW, lookaside/inline).
> Let say right now there is rte_ipsec_inline_process() function if user
> prefers to use inline security device to process given group packets,
> and rte_ipsec_crypto_process(/prepare) if user decides to use
> lookaside security or simple crypto device for it.
> 
> >
> > The rte_security spec can support both inline and look-aside IPSec
> > protocol support.
> 
> AFAIK right now rte_security just provides API to create/free/manipulate security sessions.
> I don't see how it can support all the functionality mentioned above,
> plus SAD and SPD.


At least for INLINE_PROTOCOL case SA lookup for inbound traffic does by
HW.

> 
> >
> > 2) This library is tuned for fat CPU core in mind like single SA on core
> > etc. Which is fine for x86 servers and arm64 server category of machines
> > but it does not work very well with NPU class of SoC or FPGA.
> >
> > As there  are the different ways to implement the IPSec, For instance,
> > use of eventdev can help in situation for handling millions of SA and
> > equence number of update and anti reply check can be done by leveraging
> > some of the HW specific features like
> > ORDERED, ATOMIC schedule type(mapped as eventdev feature)in HW with PIPELINE model.
> >
> > # Issues with having one SA one core,
> > - In the outbound side, there could be multiple flows using the same SA.
> >   Multiple flows could be processed parallel on different lcores,
> > but tying one SA to one core would mean we won't be able to do that.
> >
> > - In the inbound side, we will have a fat flow hitting one core. If
> >   IPsec library assumes single core, we will not be able to to spread
> > fat flow to multiple cores. And one SA-one core would mean all ports on
> > which we would expect IPsec traffic has to be handled by that core.
> 
> I suppose that all refers to the discussion about MT safe API for rte_ipsec, right?
> If so, then as I said in my reply to Anoob:
> We will try to make API usable in MT environment for v1,
> so you can review and provide comments at early stages.

OK

> 
> >
> > I have made a simple presentation. This presentation details ONE WAY to
> > implement the IPSec with HW support on NPU.
> >
> > https://docs.google.com/presentation/d/1e3IDf9R7ZQB8FN16Nvu7KINuLSWMdyKEw8_0H05rjj4/edit?usp=sharing
> >
> 
> Thanks, quite helpful.
> Actually from page 3, it looks like your expectations don't contradict in general with proposed API:
> 
> ...
> } else if (ev.event_type == RTE_EVENT_TYPE_LCORE && ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
>                         sa = ev.flow_queue_id;
>                         /* do critical section work per sa */
>                         do_critical_section_work(sa);
> 
> [KA] that's the place where I expect either
> rte_ipsec_inline_process(sa, ...); OR rte_ipsec_crypto_prepare(sa, ...);
> would be called.

Makes sense. But currently, the library defines what is
rte_ipsec_inline_process() and rte_ipsec_crypto_prepare(), but it should
be based on underneath security device or crypto device.

So, IMO for better control, these functions should be the function pointer
based and based on underlying device, library can fill the
implementation.

IMO, it is not possible to create "static inline function" with all "if"
checks. I think, we can have four ipsec functions with function pointer
scheme.

rte_ipsec_inbound_prepare()
rte_ipsec_inbound_process()
rte_ipsec_outbound_prepare()
rte_ipsec_outbound_process()

Some of the other concerns:
1) For HW implementation, rte_ipsec_sa needs to opaque like rte_security
as some of the structure defined by HW or Microcode. We can choose
absolute generic items as common and device/rte_security specific can be opaque.

2)I think, in order to accommodate the event drivern model. We need to pass
void ** in prepare() and process() function with an additional argument
of type(TYPE_EVENT/TYPE_MBUF) can be passed to detect packet object
type as some of the functions in prepare() and process() may need
rte_event to operate on.

> 
>                      /* Issue the crypto request and generate the following on crypto work completion */
> [KA] that's the place where I expect rte_ipsec_crypto_process(...) be invoked.
> 
>                         ev.flow_queue_id = tx_port;
>                         ev.sub_event_id = tx_queue_id;
>                         ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
>                         rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
>                 }
> 
> 
> > I am not saying this should be the ONLY way to do as it does not work
> > very well with non NPU/FPGA class of SoC.
> >
> > So how about making the proposed IPSec library as plugin/driver to
> > rte_security.
> 
> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> is the best possible approach.
> Though I probably understand your concern:
> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
> Though there are devices where most of prepare/process can be done in HW
> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> Is that so?
> To address that issue I suppose we can do:
> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>     security devices into ipsec.
>     We planned to do it anyway, just don't have it done yet.
> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM

The problem is, CUSTOM may have different variants and "if" conditions won't
scale if we choose to have non function pointer scheme. Otherwise, it
looks OK to create new SECURITY TYPE and associated plugin for prepare() and process()
function in librte_ipsec library.


>     and add into rte_security_ops   new functions:
>     uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>     uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>     uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>     So for custom HW, PMD can overwrite normal prepare/process behavior.
> 
> > This would give flexibly for each vendor/platform choose to different
> > IPse implementation based on HW support WITHOUT CHANGING THE APPLICATION
> > INTERFACE.
> 
> Not sure what API changes you are referring to?
> As I am aware we do introduce new API, but all existing APIs remain in place.


What I meant was, Single application programming interface to enable IPSec processing to
application.


> 
> >
> > IMO, rte_security IPsec look aside support can be simply added by
> > creating the virtual crypto device(i.e move the proposed code to the virtual crypto device)
> > likewise inline support
> > can be added by the virtual ethdev device.
> 
> That's probably possible and if someone would like to introduce such abstraction - NP in general
> (though my suspicion - it might be too heavy to be really useful).
> Though I don't think it should be the only possible way for the user to enable IPsec data-processing inside his app.
> Again I guess such virtual-dev will still use rte_ipsec inside.

I don't have strong opinion on virtual devices VS function pointer based
prepare() and process() function in librte_ipsec library.

> 
> > This would avoid the need for
> > updating ipsec-gw application as well i.e unified interface to application.
> 
> I think - it would  really good to simplify existing ipsec-secgw sample app.
> Some parts of it seems unnecessary complex to me.
> One of the reasons for it -  we don't really have an unified (and transparent) API for ipsec data-path.
> Let's look at ipsec_enqueue() and related code (examples/ipsec-secgw/ipsec.c:365)
> It is huge (and ugly) -  user has to handle dozen different cases just to enqueue packet for IPsec processing.
> One of the aims of rte_ipsec library - hide all that complexities inside the library and provide to
> the upper layer clean and transparent API.
> 
> >
> > If you don't like the above idea, any scheme of plugin based
> > implementation would be fine so that vendor or platform can choose its own implementation.
> > It can be based on partial HW implement too. i.e SA look can be used in SW, remaining stuff in HW
> > (for example IPsec inline case)
> 
> I am surely ok with the idea to give vendors an ability to customize implementation
> and enable their HW capabilities.

I think, We are on the same page, just that the fine details of "framework"
for customizing implementation based on their HW capabilities need to
iron out.

> Do you think proposed additions to the rte_security would be  enough,
> or something extra is needed?

See above.

Jerin

> 
> Konstantin
> 
> 
> >
> > # For protocols like UDP, it makes sense to create librte_udp as there
> > no much HW specific offload other than ethdev provides.
> >
> > # PDCP could be another library to offload to HW, So talking
> > rte_security path makes more sense in that case too.
> >
> > Jerin
  
Akhil Goyal Sept. 20, 2018, 2:26 p.m. UTC | #12
Hi Konstantin,

On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote:
>>> I am not saying this should be the ONLY way to do as it does not work
>>> very well with non NPU/FPGA class of SoC.
>>>
>>> So how about making the proposed IPSec library as plugin/driver to
>>> rte_security.
>> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
>> is the best possible approach.
>> Though I probably understand your concern:
>> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
>> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
>> Though there are devices where most of prepare/process can be done in HW
>> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
>> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
>> Is that so?
>> To address that issue I suppose we can do:
>> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>>      security devices into ipsec.
>>      We planned to do it anyway, just don't have it done yet.
>> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
>>      and add into rte_security_ops   new functions:
>>      uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>>      uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>>      uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>>      So for custom HW, PMD can overwrite normal prepare/process behavior.
>>
> Actually  after another thought:
> My previous assumption (probably wrong one) was that for both
> RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
> Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now)
> I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.
> Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
> (HW/SW roses/responsibilities)?
> About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that capability.
> So my question is there any devices/drivers that do support it?
> If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
> Konstantin
>
>
In case of LOOKASIDE, the protocol errors like antireplay and sequence 
number overflow shall be the responsibility of either PMD or the HW.
It should notify the application that the error has occurred and 
application need to decide what it needs to decide next.

As Jerin said in other email, the roles/responsibility of the PMD in 
case of inline proto and lookaside case, nothing much is required from 
the application to do any processing for ipsec.

As per my understanding, the proposed RFC is to make the application 
code cleaner for  the protocol processing.
1. For inline proto and lookaside there won't be any change in the data 
path. The main changes would be in the control path.

2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the 
protocol processing will be done in the library and there would be 
changes in both control and data path.

As the rte_security currently provide generic APIs for control path only 
and we may have it expanded for protocol specific datapath processing.
So for the application, working with inline crypto/ inline proto would 
be quite similar and it won't need to do some extra processing for 
inline crypto.
Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside.

We may have the protocol specific APIs reside inside the rte_security 
and we can use either the crypto/net PMD underneath it.

Moving the SPD lookup inside the ipsec library may not be beneficial in 
terms of performance as well as configurability for the application. It 
would just be based on the rss hash.

Please let me know if my understanding is not correct anywhere.

-Akhil
  
Ananyev, Konstantin Sept. 24, 2018, 8:45 a.m. UTC | #13
Hi Jerin,

> > > > >
> > > > > Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU
> > > cores.
> > > > > To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> > > > > I think it would require some extra synchronization:
> > > > > make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> > > > > (packets entered ipsec processing).
> > > > > I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> > > > > Though we plan CTX level API to support such scenario.
> > > > > What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
> > > concurrently.
> > > > >
> > > > > > In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores.
> > > Considering
> > > > > > IPsec being CPU intensive, this would limit the net output of the chip.
> > > > > That's true - but from other side implementation can offload heavy part
> > > > > (encrypt/decrypt, auth) to special HW (cryptodev).
> > > > > In that case single core might be enough for SA and extra synchronization would just slowdown things.
> > > > > That's why I think it should be configurable  what behavior (ST or MT) to use.
> > > > I do agree that these are the issues that we need to address to make the
> > > > library MT safe. Whether the extra synchronization would slow down things is
> > > > a very subjective question and will heavily depend on the platform. The
> > > > library should have enough provisions to be able to support MT without
> > > > causing overheads to ST. Right now, the library assumes ST.
> > >
> > >
> > > I agree with Anoob here.
> > >
> > > I have two concerns with librte_ipsec as a separate library
> > >
> > > 1) There is an overlap with rte_security and new proposed library.
> >
> > I don't think there really is an overlap.
> 
> As mentioned in your other email. IMO, There is an overlap as
> RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL can support almost everything
> in HW or HW + SW if some PMD wishes to do so.
> 
> Answering some of the questions, you have asked in other thread based on
> my understanding.
> 
> Regarding RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL support,
> Marvell/Cavium CPT hardware on next generation HW(Planning to upstream
> around v19.02) can support RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and
> Anoob already pushed the application changes in ipsec-gw.

Ok good to know.

> 
> In our understanding of HW/SW roles/responsibilities for that type of
> devices are:
> 
> INLINE_PROTOCOL
> ----------------
> In control path, security session is created with the given SA and
> rte_flow configuration etc.
> 
> For outbound traffic, the application will have to do SA lookup and
> identify the security action (inline/look aside crypto/protocol). For
> packets identified for inline protocol processing, the application would
> submit as plain packets to the ethernet device and the security capable
> ethernet device would perform IPSec and send out the packet. For PMDs
> which would need extra metadata (capability flag), set_pkt_metadata
> function pointer would be called (from application).
> This can be used to set some per packet field to identify the security session to be used to
> process the packet.

Yes, as I can see, that's what ipsec-gw is doing right now and it wouldn't be
a problem to do the same in ipsec lib.  

> Sequence number updation will be done by the PMD.

Ok, so for INLINE_PROTOCOL upper layer wouldn't need to keep track for SQN values at all?
You don’t' consider a possibility that by some reason that SA would need to
be moved from device that support INLINE_PROTOCOL to the device that doesn't?  

> For inbound traffic, the packets for IPSec would be identified by using
> rte_flow (hardware accelerated packet filtering). For the packets
> identified for inline offload (SECURITY action), hardware would perform
> the processing. For inline protocol processed IPSec packets, PMD would
> set “user data” so that application can get the details of the security
> processing done on the packet. Once the plain packet (after IPSec
> processing) is received, a selector check need to be performed to make
> sure we have a valid packet after IPSec processing. The user data is used
> for that. Anti-replay check is handled by the PMD. The PMD would raise
> an eth event in case of sequence number expiry or any SA expiry.

Few questions here:
1) if I understand things right - to specify that it was an IPsec packet -
PKT_RX_SEC_OFFLOAD will be set in mbuf ol_flags?
2) Basically 'userdata' will contain just a user provided at rte_security_session_create pointer
(most likely pointer to the SA, as it is done right now in ipsec-secgw), correct?
3) in current rte_security API si there a way to get/set replay window size, etc?
4)   Same question as for TX: you don't plan to support fallback to other type of devices/SW?
I.E. HW was not able to process ipsec packet by some reason (let say fragmented packet)
and now it is SW responsibility to do so?
The reason I am asking for that - it seems right now there is no defined way
to share SQN related information between HW/PMD and upper layer SW.
Is that ok, or would we need such capability?
If we would, and upper layer SW would need to keep track on SQN anyway,
then there is probably no point to do same thing in PMD itelf?
In that case PMD just need to provide SQN information to the upper layer 
(probably one easy way to do it - reuse rte_,buf.seqn for that purpose,
though for that will probably need make it 64-bit long).

> 
> 
> LOOKASIDE_PROTOCOL
> ------------------
> In control path, security session is created with the given SA.
> 
> Enqueue/dequeue is similar to what is done for regular crypto
> (RTE_SECURITY_ACTION_TYPE_NONE) but all the protocol related processing
> would be offloaded. Application will need to do SA lookup and identify
> the processing to be done (both in case of outbound & inbound), and
> submit packet to crypto device. Application need not do any IPSec
> related transformations other than the lookup. Anti-replay need to be
> handled in the PMD (the spec says the device “may be handled” do anti-replay check,
> but a complete protocol offload would need anti-replay check also).

Same question here - wouldn't there be a situations when HW/PMD would need to
share SQN information with upper layer?
Let say if upper layer SW would need to do load balancing between crypto-devices 
with LOOKASIDE_PROTOCOL and without?

> 
> 
> > rte_security is a 'framework for management and provisioning of security protocol operations offloaded to hardware based devices'.
> > While rte_ipsec is aimed to be a library for IPsec data-path processing.
> > There is no plans for rte_ipsec to 'obsolete' rte_security.
> > Quite opposite rte_ipsec supposed to work with both rte_cryptodev and rte_security APIs (devices).
> > It is possible to have an SA that would use both crypto and  security devices.
> > Or to have an SA that would use multiple crypto devs
> > (though right now it is up the user level to do load-balancing logic).
> >
> > > For IPsec, If an application needs to use rte_security for HW
> > > implementation and and application needs to use librte_ipsec for
> > >  SW implementation then it is bad and a lot duplication of work on
> > > he slow path too.
> >
> > The plan is that application would need to use just rte_ipsec API for all data-paths
> > (HW/SW, lookaside/inline).
> > Let say right now there is rte_ipsec_inline_process() function if user
> > prefers to use inline security device to process given group packets,
> > and rte_ipsec_crypto_process(/prepare) if user decides to use
> > lookaside security or simple crypto device for it.
> >
> > >
> > > The rte_security spec can support both inline and look-aside IPSec
> > > protocol support.
> >
> > AFAIK right now rte_security just provides API to create/free/manipulate security sessions.
> > I don't see how it can support all the functionality mentioned above,
> > plus SAD and SPD.
> 
> 
> At least for INLINE_PROTOCOL case SA lookup for inbound traffic does by
> HW.

For inbound yes, for outbound I suppose you still would need to do a lookup in SW.

> 
> >
> > >
> > > 2) This library is tuned for fat CPU core in mind like single SA on core
> > > etc. Which is fine for x86 servers and arm64 server category of machines
> > > but it does not work very well with NPU class of SoC or FPGA.
> > >
> > > As there  are the different ways to implement the IPSec, For instance,
> > > use of eventdev can help in situation for handling millions of SA and
> > > equence number of update and anti reply check can be done by leveraging
> > > some of the HW specific features like
> > > ORDERED, ATOMIC schedule type(mapped as eventdev feature)in HW with PIPELINE model.
> > >
> > > # Issues with having one SA one core,
> > > - In the outbound side, there could be multiple flows using the same SA.
> > >   Multiple flows could be processed parallel on different lcores,
> > > but tying one SA to one core would mean we won't be able to do that.
> > >
> > > - In the inbound side, we will have a fat flow hitting one core. If
> > >   IPsec library assumes single core, we will not be able to to spread
> > > fat flow to multiple cores. And one SA-one core would mean all ports on
> > > which we would expect IPsec traffic has to be handled by that core.
> >
> > I suppose that all refers to the discussion about MT safe API for rte_ipsec, right?
> > If so, then as I said in my reply to Anoob:
> > We will try to make API usable in MT environment for v1,
> > so you can review and provide comments at early stages.
> 
> OK
> 
> >
> > >
> > > I have made a simple presentation. This presentation details ONE WAY to
> > > implement the IPSec with HW support on NPU.
> > >
> > > https://docs.google.com/presentation/d/1e3IDf9R7ZQB8FN16Nvu7KINuLSWMdyKEw8_0H05rjj4/edit?usp=sharing
> > >
> >
> > Thanks, quite helpful.
> > Actually from page 3, it looks like your expectations don't contradict in general with proposed API:
> >
> > ...
> > } else if (ev.event_type == RTE_EVENT_TYPE_LCORE && ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
> >                         sa = ev.flow_queue_id;
> >                         /* do critical section work per sa */
> >                         do_critical_section_work(sa);
> >
> > [KA] that's the place where I expect either
> > rte_ipsec_inline_process(sa, ...); OR rte_ipsec_crypto_prepare(sa, ...);
> > would be called.
> 
> Makes sense. But currently, the library defines what is
> rte_ipsec_inline_process() and rte_ipsec_crypto_prepare(), but it should
> be based on underneath security device or crypto device.

Reason for that - their code-paths are quite different:
for inline devices we can do whole processing synchronously(within process() function),
while fro crypto it is sort of split into tw parts -
we first have to do prepare();enqueue() them to crypto-dev, and then dequeue();process().
Another good thing with that way - it allows the same SA to work with different devices.
 
> 
> So, IMO for better control, these functions should be the function pointer
> based and based on underlying device, library can fill the
> implementation.
> 
> IMO, it is not possible to create "static inline function" with all "if"
> checks. I think, we can have four ipsec functions with function pointer
> scheme.
> 
> rte_ipsec_inbound_prepare()
> rte_ipsec_inbound_process()
> rte_ipsec_outbound_prepare()
> rte_ipsec_outbound_process()
> 
> Some of the other concerns:
> 1) For HW implementation, rte_ipsec_sa needs to opaque like rte_security
> as some of the structure defined by HW or Microcode. We can choose
> absolute generic items as common and device/rte_security specific can be opaque.

I don't think it would be a problem, rte_ipsec_sa  does contain a pointer to
rte_security_session, so it can provide it as an argument to these functions.
 
> 
> 2)I think, in order to accommodate the event drivern model. We need to pass
> void ** in prepare() and process() function with an additional argument
> of type(TYPE_EVENT/TYPE_MBUF) can be passed to detect packet object
> type as some of the functions in prepare() and process() may need
> rte_event to operate on.

You are talking here about security device specific functions described below, correct?

> 
> >
> >                      /* Issue the crypto request and generate the following on crypto work completion */
> > [KA] that's the place where I expect rte_ipsec_crypto_process(...) be invoked.
> >
> >                         ev.flow_queue_id = tx_port;
> >                         ev.sub_event_id = tx_queue_id;
> >                         ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
> >                         rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
> >                 }
> >
> >
> > > I am not saying this should be the ONLY way to do as it does not work
> > > very well with non NPU/FPGA class of SoC.
> > >
> > > So how about making the proposed IPSec library as plugin/driver to
> > > rte_security.
> >
> > As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> > is the best possible approach.
> > Though I probably understand your concern:
> > In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> > i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
> > Though there are devices where most of prepare/process can be done in HW
> > (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> > plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> > Is that so?
> > To address that issue I suppose we can do:
> > 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >     security devices into ipsec.
> >     We planned to do it anyway, just don't have it done yet.
> > 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
> 
> The problem is, CUSTOM may have different variants and "if" conditions won't
> scale if we choose to have non function pointer scheme. Otherwise, it
> looks OK to create new SECURITY TYPE and associated plugin for prepare() and process()
> function in librte_ipsec library.

In principle, I don't mind to always use function pointers for prepare()/process(), but:
from your description above of INLINE_PROTOCOL and LOOKASIDE_PROTOCOL
the process()/prepare() for such devices looks well defined and
straightforward to implement.
Not sure we'll need a function pointer for such simple and lightweight case:
set/check ol_flags, set/read userdata value.
I think extra function call here is kind of overkill and will only slowdown things.
But if that would be majority preference - I wouldn't argue.
BTW if we'll agree to always use function pointers for process/prepare,
then there is no point to have that all existing action types -
all we need is an indication is it inline or lookaside device and
function pointers for prepare/process().

Konstantin

> 
> 
> >     and add into rte_security_ops   new functions:
> >     uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> >     uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> >     uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> >     So for custom HW, PMD can overwrite normal prepare/process behavior.
> >
> > > This would give flexibly for each vendor/platform choose to different
> > > IPse implementation based on HW support WITHOUT CHANGING THE APPLICATION
> > > INTERFACE.
> >
> > Not sure what API changes you are referring to?
> > As I am aware we do introduce new API, but all existing APIs remain in place.
> 
> 
> What I meant was, Single application programming interface to enable IPSec processing to
> application.
> 
> 
> >
> > >
> > > IMO, rte_security IPsec look aside support can be simply added by
> > > creating the virtual crypto device(i.e move the proposed code to the virtual crypto device)
> > > likewise inline support
> > > can be added by the virtual ethdev device.
> >
> > That's probably possible and if someone would like to introduce such abstraction - NP in general
> > (though my suspicion - it might be too heavy to be really useful).
> > Though I don't think it should be the only possible way for the user to enable IPsec data-processing inside his app.
> > Again I guess such virtual-dev will still use rte_ipsec inside.
> 
> I don't have strong opinion on virtual devices VS function pointer based
> prepare() and process() function in librte_ipsec library.
> 
> >
> > > This would avoid the need for
> > > updating ipsec-gw application as well i.e unified interface to application.
> >
> > I think - it would  really good to simplify existing ipsec-secgw sample app.
> > Some parts of it seems unnecessary complex to me.
> > One of the reasons for it -  we don't really have an unified (and transparent) API for ipsec data-path.
> > Let's look at ipsec_enqueue() and related code (examples/ipsec-secgw/ipsec.c:365)
> > It is huge (and ugly) -  user has to handle dozen different cases just to enqueue packet for IPsec processing.
> > One of the aims of rte_ipsec library - hide all that complexities inside the library and provide to
> > the upper layer clean and transparent API.
> >
> > >
> > > If you don't like the above idea, any scheme of plugin based
> > > implementation would be fine so that vendor or platform can choose its own implementation.
> > > It can be based on partial HW implement too. i.e SA look can be used in SW, remaining stuff in HW
> > > (for example IPsec inline case)
> >
> > I am surely ok with the idea to give vendors an ability to customize implementation
> > and enable their HW capabilities.
> 
> I think, We are on the same page, just that the fine details of "framework"
> for customizing implementation based on their HW capabilities need to
> iron out.
> 
> > Do you think proposed additions to the rte_security would be  enough,
> > or something extra is needed?
> 
> See above.
> 
> Jerin
> 
> >
> > Konstantin
> >
> >
> > >
> > > # For protocols like UDP, it makes sense to create librte_udp as there
> > > no much HW specific offload other than ethdev provides.
> > >
> > > # PDCP could be another library to offload to HW, So talking
> > > rte_security path makes more sense in that case too.
> > >
> > > Jerin
  
Ananyev, Konstantin Sept. 24, 2018, 10:51 a.m. UTC | #14
Hi Akhil,

> 
> Hi Konstantin,
> 
> On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote:
> >>> I am not saying this should be the ONLY way to do as it does not work
> >>> very well with non NPU/FPGA class of SoC.
> >>>
> >>> So how about making the proposed IPSec library as plugin/driver to
> >>> rte_security.
> >> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> >> is the best possible approach.
> >> Though I probably understand your concern:
> >> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> >> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
> >> Though there are devices where most of prepare/process can be done in HW
> >> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> >> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> >> Is that so?
> >> To address that issue I suppose we can do:
> >> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >>      security devices into ipsec.
> >>      We planned to do it anyway, just don't have it done yet.
> >> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and
> RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
> >>      and add into rte_security_ops   new functions:
> >>      uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> num);
> >>      uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> num);
> >>      uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> >>      So for custom HW, PMD can overwrite normal prepare/process behavior.
> >>
> > Actually  after another thought:
> > My previous assumption (probably wrong one) was that for both
> > RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> > devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
> > Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now)
> > I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.
> > Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
> > (HW/SW roses/responsibilities)?
> > About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that
> capability.
> > So my question is there any devices/drivers that do support it?
> > If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
> > Konstantin
> >
> >
> In case of LOOKASIDE, the protocol errors like antireplay and sequence
> number overflow shall be the responsibility of either PMD or the HW.
> It should notify the application that the error has occurred and
> application need to decide what it needs to decide next.

Ok, thanks for clarification.
Just to confirm -  do we have a defined way for it right now in rte_security?

> 
> As Jerin said in other email, the roles/responsibility of the PMD in
> case of inline proto and lookaside case, nothing much is required from
> the application to do any processing for ipsec.
> 
> As per my understanding, the proposed RFC is to make the application
> code cleaner for  the protocol processing.

Yes, unified data-path API is definitely one of the main goals. 

> 1. For inline proto and lookaside there won't be any change in the data
> path. The main changes would be in the control path.

Yes, from your and Jerin description data-path processing looks
really lightweight for these cases.
For control path - there is no much change, user would have to call 
rte_ipsec_sa_init() to start using given SA.

> 
> 2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the
> protocol processing will be done in the library and there would be
> changes in both control and data path.

Yes.

> 
> As the rte_security currently provide generic APIs for control path only
> and we may have it expanded for protocol specific datapath processing.
> So for the application, working with inline crypto/ inline proto would
> be quite similar and it won't need to do some extra processing for
> inline crypto.
> Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside.
> 
> We may have the protocol specific APIs reside inside the rte_security
> and we can use either the crypto/net PMD underneath it.

As I understand, you suggest instead of introducing new library,
introduce similar data-path functions inside rte_security.
Probably something like:

uint16_t rte_security_process(struct rte_security_session *s, struct rte_mbuf *mb[], uint16_t num);
uint16_t rte_security_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[], 
                                                                      struct rte_crypto_op *cop[], uint16_t num);
...
Is that correct?

I thought about that approach too, and indeed from one side it looks cleaner and easier
to customize - each of these functions would just call related function inside rte_security_ops.
The problem with that approach - it would mean that each SA would be able to work with one
device only.
So if someone needs an SA that could be processed by multiple cores and multiple crypto-devices
in parallel such approach wouldn’t fit.
That was the main reason to keep rte_security as it is right now and go ahead with new library.
One thing that worries me -  do we need a way to share SQN and replay window information
between rte_security and upper layer (rte_ipsec)?
If 'no', then ok, if 'yes' then probably we need to discuss how to do it now?

> 
> Moving the SPD lookup inside the ipsec library may not be beneficial in
> terms of performance as well as configurability for the application. It
> would just be based on the rss hash.

If SPD c be done completely in HW - that's fine.
I just don't think there are many devices these days that wouldn't require
any SW intervention for SPD lookup, and I think RSS would be enough here 
(though flow-director might be).
As I said before, my thought was that might be ACL library would be enough
here as SW fallback, but if people think we need a special API/implementation
for it - that's ok by me too.
Konstantin
  
Akhil Goyal Sept. 25, 2018, 7:48 a.m. UTC | #15
Hi Konstantin,

On 9/24/2018 4:21 PM, Ananyev, Konstantin wrote:
> Hi Akhil,
>
>> Hi Konstantin,
>>
>> On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote:
>>>>> I am not saying this should be the ONLY way to do as it does not work
>>>>> very well with non NPU/FPGA class of SoC.
>>>>>
>>>>> So how about making the proposed IPSec library as plugin/driver to
>>>>> rte_security.
>>>> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
>>>> is the best possible approach.
>>>> Though I probably understand your concern:
>>>> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
>>>> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
>>>> Though there are devices where most of prepare/process can be done in HW
>>>> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
>>>> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
>>>> Is that so?
>>>> To address that issue I suppose we can do:
>>>> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>>>>       security devices into ipsec.
>>>>       We planned to do it anyway, just don't have it done yet.
>>>> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and
>> RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
>>>>       and add into rte_security_ops   new functions:
>>>>       uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
>> num);
>>>>       uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
>> num);
>>>>       uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
>>>>       So for custom HW, PMD can overwrite normal prepare/process behavior.
>>>>
>>> Actually  after another thought:
>>> My previous assumption (probably wrong one) was that for both
>>> RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>>> devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
>>> Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now)
>>> I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.
>>> Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
>>> (HW/SW roses/responsibilities)?
>>> About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that
>> capability.
>>> So my question is there any devices/drivers that do support it?
>>> If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
>>> Konstantin
>>>
>>>
>> In case of LOOKASIDE, the protocol errors like antireplay and sequence
>> number overflow shall be the responsibility of either PMD or the HW.
>> It should notify the application that the error has occurred and
>> application need to decide what it needs to decide next.
> Ok, thanks for clarification.
> Just to confirm -  do we have a defined way for it right now in rte_security?
As of now, there are no macros defined for antireplay/seq. no. overflow 
errors in crypto errors(rte_crypto_op_status), but it will be added soon.
For inline cases, ipsec-secgw application gets error notification via 
rte_eth_event.
>
>> As Jerin said in other email, the roles/responsibility of the PMD in
>> case of inline proto and lookaside case, nothing much is required from
>> the application to do any processing for ipsec.
>>
>> As per my understanding, the proposed RFC is to make the application
>> code cleaner for  the protocol processing.
> Yes, unified data-path API is definitely one of the main goals.
>
>> 1. For inline proto and lookaside there won't be any change in the data
>> path. The main changes would be in the control path.
> Yes, from your and Jerin description data-path processing looks
> really lightweight for these cases.
> For control path - there is no much change, user would have to call
> rte_ipsec_sa_init() to start using given SA.
>
>> 2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the
>> protocol processing will be done in the library and there would be
>> changes in both control and data path.
> Yes.
>
>> As the rte_security currently provide generic APIs for control path only
>> and we may have it expanded for protocol specific datapath processing.
>> So for the application, working with inline crypto/ inline proto would
>> be quite similar and it won't need to do some extra processing for
>> inline crypto.
>> Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside.
>>
>> We may have the protocol specific APIs reside inside the rte_security
>> and we can use either the crypto/net PMD underneath it.
> As I understand, you suggest instead of introducing new library,
> introduce similar data-path functions inside rte_security.
> Probably something like:
>
> uint16_t rte_security_process(struct rte_security_session *s, struct rte_mbuf *mb[], uint16_t num);
> uint16_t rte_security_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>                                                                        struct rte_crypto_op *cop[], uint16_t num);
> ...
> Is that correct?

"rte_security_process_ipsec" and "rte_security_crypto_prepare_ipsec" will be better.
We can have such APIs for other protocols as well.
Also, we should leave the existing functionality as is and we should let the user decide whether
it needs to manage the ipsec on it's own or with the new APIs.

>
> I thought about that approach too, and indeed from one side it looks cleaner and easier
> to customize - each of these functions would just call related function inside rte_security_ops.
> The problem with that approach - it would mean that each SA would be able to work with one
> device only.
> So if someone needs an SA that could be processed by multiple cores and multiple crypto-devices
> in parallel such approach wouldn’t fit.
One SA should be processed by a single core or else we need to have an 
event based application which support ordered queues,
because if we process packets of single SA on multiple cores, then 
packets will get re-ordered and we will get the anti-replay late errors 
on decap side.
And if we have event based solution, then the scheduler will be able to 
handle the load balancing accordingly.

> That was the main reason to keep rte_security as it is right now and go ahead with new library.
> One thing that worries me -  do we need a way to share SQN and replay window information
> between rte_security and upper layer (rte_ipsec)?
> If 'no', then ok, if 'yes' then probably we need to discuss how to do it now?
anti-replay window size shall be a parameter in ipsec_xform, which shall 
be added.
And the error notification
  - in case of using crypto, then use rte_crypto_op_status
- in case of inline cases, then use rte_eth_event callbacks.
I don't see rte_ipsec needs to take care of that in your initial approach.
However, if you plan to include session reset inside rte_ipsec, then you 
may need that inside the rte_ipsec.
And yes that would be tricky.

-Akhil
  
Jerin Jacob Sept. 26, 2018, 6:02 p.m. UTC | #16
-----Original Message-----
> Date: Mon, 24 Sep 2018 08:45:48 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "Joseph, Anoob" <Anoob.Joseph@caviumnetworks.com>, "dev@dpdk.org"
>  <dev@dpdk.org>, "Awal, Mohammad Abdul" <mohammad.abdul.awal@intel.com>,
>  "Doherty, Declan" <declan.doherty@intel.com>, Narayana Prasad
>  <narayanaprasad.athreya@caviumnetworks.com>, "akhil.goyal@nxp.com"
>  <akhil.goyal@nxp.com>, "hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
>  "shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
> Subject: RE: [dpdk-dev] [RFC] ipsec: new library for IPsec data-path
>  processing
> 
> External Email
> 
> Hi Jerin,

Hi Konstantin,

> 
> > > > > >
> > > > > > Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU
> > > > cores.
> > > > > > To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> > > > > > I think it would require some extra synchronization:
> > > > > > make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> > > > > > (packets entered ipsec processing).
> > > > > > I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> > > > > > Though we plan CTX level API to support such scenario.
> > > > > > What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
> > > > concurrently.
> > > > > >
> > > > > > > In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores.
> > > > Considering
> > > > > > > IPsec being CPU intensive, this would limit the net output of the chip.
> > > > > > That's true - but from other side implementation can offload heavy part
> > > > > > (encrypt/decrypt, auth) to special HW (cryptodev).
> > > > > > In that case single core might be enough for SA and extra synchronization would just slowdown things.
> > > > > > That's why I think it should be configurable  what behavior (ST or MT) to use.
> > > > > I do agree that these are the issues that we need to address to make the
> > > > > library MT safe. Whether the extra synchronization would slow down things is
> > > > > a very subjective question and will heavily depend on the platform. The
> > > > > library should have enough provisions to be able to support MT without
> > > > > causing overheads to ST. Right now, the library assumes ST.
> > > >
> > > >
> > > > I agree with Anoob here.
> > > >
> > > > I have two concerns with librte_ipsec as a separate library
> > > >
> > > > 1) There is an overlap with rte_security and new proposed library.
> > >
> > > I don't think there really is an overlap.
> >
> > As mentioned in your other email. IMO, There is an overlap as
> > RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL can support almost everything
> > in HW or HW + SW if some PMD wishes to do so.
> >
> > Answering some of the questions, you have asked in other thread based on
> > my understanding.
> >
> > Regarding RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL support,
> > Marvell/Cavium CPT hardware on next generation HW(Planning to upstream
> > around v19.02) can support RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and
> > Anoob already pushed the application changes in ipsec-gw.
> 
> Ok good to know.
> 
> >
> > In our understanding of HW/SW roles/responsibilities for that type of
> > devices are:
> >
> > INLINE_PROTOCOL
> > ----------------
> > In control path, security session is created with the given SA and
> > rte_flow configuration etc.
> >
> > For outbound traffic, the application will have to do SA lookup and
> > identify the security action (inline/look aside crypto/protocol). For
> > packets identified for inline protocol processing, the application would
> > submit as plain packets to the ethernet device and the security capable
> > ethernet device would perform IPSec and send out the packet. For PMDs
> > which would need extra metadata (capability flag), set_pkt_metadata
> > function pointer would be called (from application).
> > This can be used to set some per packet field to identify the security session to be used to
> > process the packet.
> 
> Yes, as I can see, that's what ipsec-gw is doing right now and it wouldn't be
> a problem to do the same in ipsec lib.
> 
> > Sequence number updation will be done by the PMD.
> 
> Ok, so for INLINE_PROTOCOL upper layer wouldn't need to keep track for SQN values at all?
> You don’t' consider a possibility that by some reason that SA would need to
> be moved from device that support INLINE_PROTOCOL to the device that doesn't?

For INLINE_PROTOCOL, the application won't have any control over such 
per packet fields. As for moving the SA to a different device, right now 
rte_security spec doesn't allow that. May be we should fix the spec to 
allow multiple devices to share the same security session. That way, if 
there is error in the inline processing, application will be able to 
submit the packet to LOOKASIDE_PROTOCOL crypto device (sharing the 
session) and get the packet processed.


> 
> > For inbound traffic, the packets for IPSec would be identified by using
> > rte_flow (hardware accelerated packet filtering). For the packets
> > identified for inline offload (SECURITY action), hardware would perform
> > the processing. For inline protocol processed IPSec packets, PMD would
> > set “user data” so that application can get the details of the security
> > processing done on the packet. Once the plain packet (after IPSec
> > processing) is received, a selector check need to be performed to make
> > sure we have a valid packet after IPSec processing. The user data is used
> > for that. Anti-replay check is handled by the PMD. The PMD would raise
> > an eth event in case of sequence number expiry or any SA expiry.
> 
> Few questions here:
> 1) if I understand things right - to specify that it was an IPsec packet -
> PKT_RX_SEC_OFFLOAD will be set in mbuf ol_flags?
> 2) Basically 'userdata' will contain just a user provided at rte_security_session_create pointer
> (most likely pointer to the SA, as it is done right now in ipsec-secgw), correct?

Yes to 1 & 2.


> 3) in current rte_security API si there a way to get/set replay window size, etc?

Not right now. But Akhil mentioned that it will be added soon.


> 4)   Same question as for TX: you don't plan to support fallback to other type of devices/SW?
> I.E. HW was not able to process ipsec packet by some reason (let say fragmented packet)
> and now it is SW responsibility to do so?
> The reason I am asking for that - it seems right now there is no defined way
> to share SQN related information between HW/PMD and upper layer SW.
> Is that ok, or would we need such capability?
> If we would, and upper layer SW would need to keep track on SQN anyway,
> then there is probably no point to do same thing in PMD itelf?
> In that case PMD just need to provide SQN information to the upper layer
> (probably one easy way to do it - reuse rte_,buf.seqn for that purpose,
> though for that will probably need make it 64-bit long).

The spec doesn't allow doing IPsec partially on HW & SW. The way spec is 
written (and implemented in ipsec-secgw) to allow one kind of 
RTE_SECURITY_ACTION_TYPE for one SA. If HW is not able to process packet 
received on INLINE_PROTOCOL SA, then it is treated as error. Handling 
fragmentation is a very valid scenario. We will have to edit the spec if 
we need to handle this scenario.

> 
> >
> >
> > LOOKASIDE_PROTOCOL
> > ------------------
> > In control path, security session is created with the given SA.
> >
> > Enqueue/dequeue is similar to what is done for regular crypto
> > (RTE_SECURITY_ACTION_TYPE_NONE) but all the protocol related processing
> > would be offloaded. Application will need to do SA lookup and identify
> > the processing to be done (both in case of outbound & inbound), and
> > submit packet to crypto device. Application need not do any IPSec
> > related transformations other than the lookup. Anti-replay need to be
> > handled in the PMD (the spec says the device “may be handled” do anti-replay check,
> > but a complete protocol offload would need anti-replay check also).
> 
> Same question here - wouldn't there be a situations when HW/PMD would need to
> share SQN information with upper layer?
> Let say if upper layer SW would need to do load balancing between crypto-devices
> with LOOKASIDE_PROTOCOL and without?

Same answer as above. ACTION is tied to security session which is tied 
to SA. SQN etc is internal to the session and so load balancing between 
crypto-devices is not supported.

> 
> >
> >
> > > rte_security is a 'framework for management and provisioning of security protocol operations offloaded to hardware based devices'.
> > > While rte_ipsec is aimed to be a library for IPsec data-path processing.
> > > There is no plans for rte_ipsec to 'obsolete' rte_security.
> > > Quite opposite rte_ipsec supposed to work with both rte_cryptodev and rte_security APIs (devices).
> > > It is possible to have an SA that would use both crypto and  security devices.
> > > Or to have an SA that would use multiple crypto devs
> > > (though right now it is up the user level to do load-balancing logic).
> > >
> > > > For IPsec, If an application needs to use rte_security for HW
> > > > implementation and and application needs to use librte_ipsec for
> > > >  SW implementation then it is bad and a lot duplication of work on
> > > > he slow path too.
> > >
> > > The plan is that application would need to use just rte_ipsec API for all data-paths
> > > (HW/SW, lookaside/inline).
> > > Let say right now there is rte_ipsec_inline_process() function if user
> > > prefers to use inline security device to process given group packets,
> > > and rte_ipsec_crypto_process(/prepare) if user decides to use
> > > lookaside security or simple crypto device for it.
> > >
> > > >
> > > > The rte_security spec can support both inline and look-aside IPSec
> > > > protocol support.
> > >
> > > AFAIK right now rte_security just provides API to create/free/manipulate security sessions.
> > > I don't see how it can support all the functionality mentioned above,
> > > plus SAD and SPD.
> >
> >
> > At least for INLINE_PROTOCOL case SA lookup for inbound traffic does by
> > HW.
> 
> For inbound yes, for outbound I suppose you still would need to do a lookup in SW.

Yes

> 
> >
> > >
> > > >
> > > > 2) This library is tuned for fat CPU core in mind like single SA on core
> > > > etc. Which is fine for x86 servers and arm64 server category of machines
> > > > but it does not work very well with NPU class of SoC or FPGA.
> > > >
> > > > As there  are the different ways to implement the IPSec, For instance,
> > > > use of eventdev can help in situation for handling millions of SA and
> > > > equence number of update and anti reply check can be done by leveraging
> > > > some of the HW specific features like
> > > > ORDERED, ATOMIC schedule type(mapped as eventdev feature)in HW with PIPELINE model.
> > > >
> > > > # Issues with having one SA one core,
> > > > - In the outbound side, there could be multiple flows using the same SA.
> > > >   Multiple flows could be processed parallel on different lcores,
> > > > but tying one SA to one core would mean we won't be able to do that.
> > > >
> > > > - In the inbound side, we will have a fat flow hitting one core. If
> > > >   IPsec library assumes single core, we will not be able to to spread
> > > > fat flow to multiple cores. And one SA-one core would mean all ports on
> > > > which we would expect IPsec traffic has to be handled by that core.
> > >
> > > I suppose that all refers to the discussion about MT safe API for rte_ipsec, right?
> > > If so, then as I said in my reply to Anoob:
> > > We will try to make API usable in MT environment for v1,
> > > so you can review and provide comments at early stages.
> >
> > OK
> >
> > >
> > > >
> > > > I have made a simple presentation. This presentation details ONE WAY to
> > > > implement the IPSec with HW support on NPU.
> > > >
> > > > https://docs.google.com/presentation/d/1e3IDf9R7ZQB8FN16Nvu7KINuLSWMdyKEw8_0H05rjj4/edit?usp=sharing
> > > >
> > >
> > > Thanks, quite helpful.
> > > Actually from page 3, it looks like your expectations don't contradict in general with proposed API:
> > >
> > > ...
> > > } else if (ev.event_type == RTE_EVENT_TYPE_LCORE && ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
> > >                         sa = ev.flow_queue_id;
> > >                         /* do critical section work per sa */
> > >                         do_critical_section_work(sa);
> > >
> > > [KA] that's the place where I expect either
> > > rte_ipsec_inline_process(sa, ...); OR rte_ipsec_crypto_prepare(sa, ...);
> > > would be called.
> >
> > Makes sense. But currently, the library defines what is
> > rte_ipsec_inline_process() and rte_ipsec_crypto_prepare(), but it should
> > be based on underneath security device or crypto device.
> 
> Reason for that - their code-paths are quite different:
> for inline devices we can do whole processing synchronously(within process() function),
> while fro crypto it is sort of split into tw parts -
> we first have to do prepare();enqueue() them to crypto-dev, and then dequeue();process().
> Another good thing with that way - it allows the same SA to work with different devices.
> 
> >
> > So, IMO for better control, these functions should be the function pointer
> > based and based on underlying device, library can fill the
> > implementation.
> >
> > IMO, it is not possible to create "static inline function" with all "if"
> > checks. I think, we can have four ipsec functions with function pointer
> > scheme.
> >
> > rte_ipsec_inbound_prepare()
> > rte_ipsec_inbound_process()
> > rte_ipsec_outbound_prepare()
> > rte_ipsec_outbound_process()
> >
> > Some of the other concerns:
> > 1) For HW implementation, rte_ipsec_sa needs to opaque like rte_security
> > as some of the structure defined by HW or Microcode. We can choose
> > absolute generic items as common and device/rte_security specific can be opaque.
> 
> I don't think it would be a problem, rte_ipsec_sa  does contain a pointer to
> rte_security_session, so it can provide it as an argument to these functions.

The rte_ipsec_sa would need some private space for application to store 
it's metadata. There can be SA implementations with additional fields 
for faster lookups. To rephrase, the application should be given some 
provision to store some metadata it would need for faster lookups.
may sa_init API can give amount private size required.


> 
> >
> > 2)I think, in order to accommodate the event drivern model. We need to pass
> > void ** in prepare() and process() function with an additional argument
> > of type(TYPE_EVENT/TYPE_MBUF) can be passed to detect packet object
> > type as some of the functions in prepare() and process() may need
> > rte_event to operate on.
> 
> You are talking here about security device specific functions described below, correct?
> 
> >
> > >
> > >                      /* Issue the crypto request and generate the following on crypto work completion */
> > > [KA] that's the place where I expect rte_ipsec_crypto_process(...) be invoked.
> > >
> > >                         ev.flow_queue_id = tx_port;
> > >                         ev.sub_event_id = tx_queue_id;
> > >                         ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
> > >                         rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
> > >                 }
> > >
> > >
> > > > I am not saying this should be the ONLY way to do as it does not work
> > > > very well with non NPU/FPGA class of SoC.
> > > >
> > > > So how about making the proposed IPSec library as plugin/driver to
> > > > rte_security.
> > >
> > > As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> > > is the best possible approach.
> > > Though I probably understand your concern:
> > > In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> > > i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered.
> > > Though there are devices where most of prepare/process can be done in HW
> > > (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> > > plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> > > Is that so?
> > > To address that issue I suppose we can do:
> > > 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> > >     security devices into ipsec.
> > >     We planned to do it anyway, just don't have it done yet.
> > > 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
> >
> > The problem is, CUSTOM may have different variants and "if" conditions won't
> > scale if we choose to have non function pointer scheme. Otherwise, it
> > looks OK to create new SECURITY TYPE and associated plugin for prepare() and process()
> > function in librte_ipsec library.
> 
> In principle, I don't mind to always use function pointers for prepare()/process(), but:
> from your description above of INLINE_PROTOCOL and LOOKASIDE_PROTOCOL
> the process()/prepare() for such devices looks well defined and
> straightforward to implement.
> Not sure we'll need a function pointer for such simple and lightweight case:
> set/check ol_flags, set/read userdata value.
> I think extra function call here is kind of overkill and will only slowdown things.
> But if that would be majority preference - I wouldn't argue.
> BTW if we'll agree to always use function pointers for process/prepare,
> then there is no point to have that all existing action types -
> all we need is an indication is it inline or lookaside device and
> function pointers for prepare/process().

Me too not a fan of function pointer scheme. But options are limited.

Though the generic usage seems straightforward, the implementation of 
the above modes can be very different. Vendors could optimize various 
operations (SQN update for example) for better performance on their 
hardware. Sticking to one approach would negate that advantage.

Another option would be to use multiple-worker model that Anoob had 
proposed some time back.
https://mails.dpdk.org/archives/dev/2018-June/103808.html

Idea would be to make all lib_ipsec functions added as static inline 
functions.

static inline rte_ipsec_add_tunnel_hdr(struct rte_mbuf *mbuf);
static inline rte_ipsec_update_sqn(struct rte_mbuf *mbuf, &seq_no);
...

For the regular use case, a fat 
rte_ipsec_(inbound/outbound)_(prepare/process) can be provided. The 
worker implemented for that case can directly call the function and 
forget about the other modes. For other vendors with varying 
capabilities, there can be multiple workers taking advantage of the hw 
features. For such workers, the static inline functions can be used as 
required. This gives vendors opportunity to pick and choose what they 
want from the ipsec lib. The worker to be used for that case will be 
determined based on the capabilities exposed by the PMDs.

https://mails.dpdk.org/archives/dev/2018-June/103828.html

The above email explains how multiple workers can be used with l2fwd.

For this to work, the application & library code need to be modularised. 
Like what is being done in the following series,
https://mails.dpdk.org/archives/dev/2018-June/103786.html

This way one application can be made to run on multiple platforms, with 
the app being optimized for the platform on which it would run.

/* ST SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - NO EVENTDEV*/
worker1()
{
     while(true) {
         nb_pkts = rte_eth_rx_burst();

         if (nb_pkts != 0) {
             /* Do lookup */
             rte_ipsec_inbound_prepare();
             rte_cryptodev_enqueue_burst();
             /* Update in-flight */
         }

         if (in_flight) {
             rte_cryptodev_dequeue_burst();
             rte_ipsec_outbound_process();
         }
         /* route packet */
}

#include <rte_ipsec.h>   /* For IPsec lib static inlines */

static inline rte_event_enqueue(struct rte_event *ev)
{
     ...
}

/* MT safe SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - EVENTDEV)
worker2()
{
     while(true) {
         nb_pkts = rte_eth_rx_burst();

         if (nb_pkts != 0) {
             /* Do lookup */
            rte_ipsec_add tunnel(ev->mbuf);
            rte_event_enqueue(ev)
            rte_cryptodev_enqueue_burst(ev->mbuf);
             /* Update in-flight */
         }

         if (in_flight) {
             rte_cryptodev_dequeue_burst();
             rte_ipsec_outbound_process();
         }
         /* route packet */
}

In short,

1) Have separate small inline functions in library
2) If something can be grouped, it can be exposed a specific function
to address a specific usecases
3) Let remaining code, can go in application as different worker() to 
address all the usecases.

> 
> Konstantin
> 
> >
> >
> > >     and add into rte_security_ops   new functions:
> > >     uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> > >     uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> > >     uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> > >     So for custom HW, PMD can overwrite normal prepare/process behavior.
> > >
> > > > This would give flexibly for each vendor/platform choose to different
> > > > IPse implementation based on HW support WITHOUT CHANGING THE APPLICATION
> > > > INTERFACE.
> > >
> > > Not sure what API changes you are referring to?
> > > As I am aware we do introduce new API, but all existing APIs remain in place.
> >
> >
> > What I meant was, Single application programming interface to enable IPSec processing to
> > application.
> >
> >
> > >
> > > >
> > > > IMO, rte_security IPsec look aside support can be simply added by
> > > > creating the virtual crypto device(i.e move the proposed code to the virtual crypto device)
> > > > likewise inline support
> > > > can be added by the virtual ethdev device.
> > >
> > > That's probably possible and if someone would like to introduce such abstraction - NP in general
> > > (though my suspicion - it might be too heavy to be really useful).
> > > Though I don't think it should be the only possible way for the user to enable IPsec data-processing inside his app.
> > > Again I guess such virtual-dev will still use rte_ipsec inside.
> >
> > I don't have strong opinion on virtual devices VS function pointer based
> > prepare() and process() function in librte_ipsec library.
> >
> > >
> > > > This would avoid the need for
> > > > updating ipsec-gw application as well i.e unified interface to application.
> > >
> > > I think - it would  really good to simplify existing ipsec-secgw sample app.
> > > Some parts of it seems unnecessary complex to me.
> > > One of the reasons for it -  we don't really have an unified (and transparent) API for ipsec data-path.
> > > Let's look at ipsec_enqueue() and related code (examples/ipsec-secgw/ipsec.c:365)
> > > It is huge (and ugly) -  user has to handle dozen different cases just to enqueue packet for IPsec processing.
> > > One of the aims of rte_ipsec library - hide all that complexities inside the library and provide to
> > > the upper layer clean and transparent API.
> > >
> > > >
> > > > If you don't like the above idea, any scheme of plugin based
> > > > implementation would be fine so that vendor or platform can choose its own implementation.
> > > > It can be based on partial HW implement too. i.e SA look can be used in SW, remaining stuff in HW
> > > > (for example IPsec inline case)
> > >
> > > I am surely ok with the idea to give vendors an ability to customize implementation
> > > and enable their HW capabilities.
> >
> > I think, We are on the same page, just that the fine details of "framework"
> > for customizing implementation based on their HW capabilities need to
> > iron out.
> >
> > > Do you think proposed additions to the rte_security would be  enough,
> > > or something extra is needed?
> >
> > See above.
> >
> > Jerin
> >
> > >
> > > Konstantin
> > >
> > >
> > > >
> > > > # For protocols like UDP, it makes sense to create librte_udp as there
> > > > no much HW specific offload other than ethdev provides.
> > > >
> > > > # PDCP could be another library to offload to HW, So talking
> > > > rte_security path makes more sense in that case too.
> > > >
> > > > Jerin
  
Ananyev, Konstantin Sept. 30, 2018, 9 p.m. UTC | #17
Hi Akhil,

> 
> Hi Konstantin,
> 
> On 9/24/2018 4:21 PM, Ananyev, Konstantin wrote:
> > Hi Akhil,
> >
> >> Hi Konstantin,
> >>
> >> On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote:
> >>>>> I am not saying this should be the ONLY way to do as it does not work
> >>>>> very well with non NPU/FPGA class of SoC.
> >>>>>
> >>>>> So how about making the proposed IPSec library as plugin/driver to
> >>>>> rte_security.
> >>>> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> >>>> is the best possible approach.
> >>>> Though I probably understand your concern:
> >>>> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> >>>> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are
> covered.
> >>>> Though there are devices where most of prepare/process can be done in HW
> >>>> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> >>>> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> >>>> Is that so?
> >>>> To address that issue I suppose we can do:
> >>>> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >>>>       security devices into ipsec.
> >>>>       We planned to do it anyway, just don't have it done yet.
> >>>> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and
> >> RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
> >>>>       and add into rte_security_ops   new functions:
> >>>>       uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> >> num);
> >>>>       uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> >> num);
> >>>>       uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> num);
> >>>>       So for custom HW, PMD can overwrite normal prepare/process behavior.
> >>>>
> >>> Actually  after another thought:
> >>> My previous assumption (probably wrong one) was that for both
> >>> RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >>> devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
> >>> Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now)
> >>> I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.
> >>> Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
> >>> (HW/SW roses/responsibilities)?
> >>> About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that
> >> capability.
> >>> So my question is there any devices/drivers that do support it?
> >>> If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
> >>> Konstantin
> >>>
> >>>
> >> In case of LOOKASIDE, the protocol errors like antireplay and sequence
> >> number overflow shall be the responsibility of either PMD or the HW.
> >> It should notify the application that the error has occurred and
> >> application need to decide what it needs to decide next.
> > Ok, thanks for clarification.
> > Just to confirm -  do we have a defined way for it right now in rte_security?
> As of now, there are no macros defined for antireplay/seq. no. overflow
> errors in crypto errors(rte_crypto_op_status), but it will be added soon.
> For inline cases, ipsec-secgw application gets error notification via
> rte_eth_event.

Ok.


> >
> >> As Jerin said in other email, the roles/responsibility of the PMD in
> >> case of inline proto and lookaside case, nothing much is required from
> >> the application to do any processing for ipsec.
> >>
> >> As per my understanding, the proposed RFC is to make the application
> >> code cleaner for  the protocol processing.
> > Yes, unified data-path API is definitely one of the main goals.
> >
> >> 1. For inline proto and lookaside there won't be any change in the data
> >> path. The main changes would be in the control path.
> > Yes, from your and Jerin description data-path processing looks
> > really lightweight for these cases.
> > For control path - there is no much change, user would have to call
> > rte_ipsec_sa_init() to start using given SA.
> >
> >> 2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the
> >> protocol processing will be done in the library and there would be
> >> changes in both control and data path.
> > Yes.
> >
> >> As the rte_security currently provide generic APIs for control path only
> >> and we may have it expanded for protocol specific datapath processing.
> >> So for the application, working with inline crypto/ inline proto would
> >> be quite similar and it won't need to do some extra processing for
> >> inline crypto.
> >> Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside.
> >>
> >> We may have the protocol specific APIs reside inside the rte_security
> >> and we can use either the crypto/net PMD underneath it.
> > As I understand, you suggest instead of introducing new library,
> > introduce similar data-path functions inside rte_security.
> > Probably something like:
> >
> > uint16_t rte_security_process(struct rte_security_session *s, struct rte_mbuf *mb[], uint16_t num);
> > uint16_t rte_security_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
> >                                                                        struct rte_crypto_op *cop[], uint16_t num);
> > ...
> > Is that correct?
> 
> "rte_security_process_ipsec" and "rte_security_crypto_prepare_ipsec" will be better.
> We can have such APIs for other protocols as well.
> Also, we should leave the existing functionality as is and we should let the user decide whether
> it needs to manage the ipsec on it's own or with the new APIs.

There are no plans to void any existing API.
If the user has working code that uses rte_crytpodev/rte_security directly and wants to keep it,
that's fine.

> 
> >
> > I thought about that approach too, and indeed from one side it looks cleaner and easier
> > to customize - each of these functions would just call related function inside rte_security_ops.
> > The problem with that approach - it would mean that each SA would be able to work with one
> > device only.
> > So if someone needs an SA that could be processed by multiple cores and multiple crypto-devices
> > in parallel such approach wouldn’t fit.
> One SA should be processed by a single core or else we need to have an
> event based application which support ordered queues,
> because if we process packets of single SA on multiple cores, then
> packets will get re-ordered and we will get the anti-replay late errors
> on decap side.

I suppose in some cases one core would be enough to handle SA traffic,
for some not, as I said before, I think it should be configurable.
Of course for MT case some entity that would  guarantee proper ordering
for final packet processing would be needed.
It could be some eventdev, or SW FIFO queue, or something else.

> And if we have event based solution, then the scheduler will be able to
> handle the load balancing accordingly.

Didn't understand that sentence.

> 
> > That was the main reason to keep rte_security as it is right now and go ahead with new library.
> > One thing that worries me -  do we need a way to share SQN and replay window information
> > between rte_security and upper layer (rte_ipsec)?
> > If 'no', then ok, if 'yes' then probably we need to discuss how to do it now?
> anti-replay window size shall be a parameter in ipsec_xform, which shall
> be added.
> And the error notification
>   - in case of using crypto, then use rte_crypto_op_status
> - in case of inline cases, then use rte_eth_event callbacks.
> I don't see rte_ipsec needs to take care of that in your initial approach.
> However, if you plan to include session reset inside rte_ipsec, then you
> may need that inside the rte_ipsec.

I am not talking rte_ipsec, my concern here is rte_security.
Suppose you need to switch from device that can do inline_proto to the device that doesn't.
Right now the only way - renegotiate all SAs that were handled by inline_proto device
(because there is no way to retrieve from rte_security device SQN information).
Renegotiation should work, but it looks like quite expensive approach.
If rte_security would have a way to share its SQN status with SW, then I think it would
be possible to do such switch without SA termination.
Again with such info available - load-balancing for the same SA on multiple devices
might be possible.
Konstantin
  
Akhil Goyal Oct. 1, 2018, 12:49 p.m. UTC | #18
Hi Konstantin,

On 10/1/2018 2:30 AM, Ananyev, Konstantin wrote:
>
> Hi Akhil,
>
>> Hi Konstantin,
>>
>> On 9/24/2018 4:21 PM, Ananyev, Konstantin wrote:
>>> Hi Akhil,
>>>
>>>> Hi Konstantin,
>>>>
>>>> On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote:
>>>>>>> I am not saying this should be the ONLY way to do as it does not work
>>>>>>> very well with non NPU/FPGA class of SoC.
>>>>>>>
>>>>>>> So how about making the proposed IPSec library as plugin/driver to
>>>>>>> rte_security.
>>>>>> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
>>>>>> is the best possible approach.
>>>>>> Though I probably understand your concern:
>>>>>> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
>>>>>> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are
>> covered.
>>>>>> Though there are devices where most of prepare/process can be done in HW
>>>>>> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
>>>>>> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
>>>>>> Is that so?
>>>>>> To address that issue I suppose we can do:
>>>>>> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>>>>>>        security devices into ipsec.
>>>>>>        We planned to do it anyway, just don't have it done yet.
>>>>>> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and
>>>> RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
>>>>>>        and add into rte_security_ops   new functions:
>>>>>>        uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
>>>> num);
>>>>>>        uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
>>>> num);
>>>>>>        uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
>> num);
>>>>>>        So for custom HW, PMD can overwrite normal prepare/process behavior.
>>>>>>
>>>>> Actually  after another thought:
>>>>> My previous assumption (probably wrong one) was that for both
>>>>> RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>>>>> devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
>>>>> Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now)
>>>>> I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.
>>>>> Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
>>>>> (HW/SW roses/responsibilities)?
>>>>> About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that
>>>> capability.
>>>>> So my question is there any devices/drivers that do support it?
>>>>> If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
>>>>> Konstantin
>>>>>
>>>>>
>>>> In case of LOOKASIDE, the protocol errors like antireplay and sequence
>>>> number overflow shall be the responsibility of either PMD or the HW.
>>>> It should notify the application that the error has occurred and
>>>> application need to decide what it needs to decide next.
>>> Ok, thanks for clarification.
>>> Just to confirm -  do we have a defined way for it right now in rte_security?
>> As of now, there are no macros defined for antireplay/seq. no. overflow
>> errors in crypto errors(rte_crypto_op_status), but it will be added soon.
>> For inline cases, ipsec-secgw application gets error notification via
>> rte_eth_event.
> Ok.
>
>
>>>> As Jerin said in other email, the roles/responsibility of the PMD in
>>>> case of inline proto and lookaside case, nothing much is required from
>>>> the application to do any processing for ipsec.
>>>>
>>>> As per my understanding, the proposed RFC is to make the application
>>>> code cleaner for  the protocol processing.
>>> Yes, unified data-path API is definitely one of the main goals.
>>>
>>>> 1. For inline proto and lookaside there won't be any change in the data
>>>> path. The main changes would be in the control path.
>>> Yes, from your and Jerin description data-path processing looks
>>> really lightweight for these cases.
>>> For control path - there is no much change, user would have to call
>>> rte_ipsec_sa_init() to start using given SA.
>>>
>>>> 2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the
>>>> protocol processing will be done in the library and there would be
>>>> changes in both control and data path.
>>> Yes.
>>>
>>>> As the rte_security currently provide generic APIs for control path only
>>>> and we may have it expanded for protocol specific datapath processing.
>>>> So for the application, working with inline crypto/ inline proto would
>>>> be quite similar and it won't need to do some extra processing for
>>>> inline crypto.
>>>> Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside.
>>>>
>>>> We may have the protocol specific APIs reside inside the rte_security
>>>> and we can use either the crypto/net PMD underneath it.
>>> As I understand, you suggest instead of introducing new library,
>>> introduce similar data-path functions inside rte_security.
>>> Probably something like:
>>>
>>> uint16_t rte_security_process(struct rte_security_session *s, struct rte_mbuf *mb[], uint16_t num);
>>> uint16_t rte_security_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>>>                                                                         struct rte_crypto_op *cop[], uint16_t num);
>>> ...
>>> Is that correct?
>> "rte_security_process_ipsec" and "rte_security_crypto_prepare_ipsec" will be better.
>> We can have such APIs for other protocols as well.
>> Also, we should leave the existing functionality as is and we should let the user decide whether
>> it needs to manage the ipsec on it's own or with the new APIs.
> There are no plans to void any existing API.
> If the user has working code that uses rte_crytpodev/rte_security directly and wants to keep it,
> that's fine.
>
>>> I thought about that approach too, and indeed from one side it looks cleaner and easier
>>> to customize - each of these functions would just call related function inside rte_security_ops.
>>> The problem with that approach - it would mean that each SA would be able to work with one
>>> device only.
>>> So if someone needs an SA that could be processed by multiple cores and multiple crypto-devices
>>> in parallel such approach wouldn’t fit.
>> One SA should be processed by a single core or else we need to have an
>> event based application which support ordered queues,
>> because if we process packets of single SA on multiple cores, then
>> packets will get re-ordered and we will get the anti-replay late errors
>> on decap side.
> I suppose in some cases one core would be enough to handle SA traffic,
> for some not, as I said before, I think it should be configurable.
> Of course for MT case some entity that would  guarantee proper ordering
> for final packet processing would be needed.
> It could be some eventdev, or SW FIFO queue, or something else.
>
>> And if we have event based solution, then the scheduler will be able to
>> handle the load balancing accordingly.
> Didn't understand that sentence.
I mean the event device will be able to handle that which has an inbuilt 
scheduler in it for balancing the load of single SA,
and if the queues are ordered and it support order restoration, then it 
will be able to maintain the ordering. And for that
you would not have to bother about giving the same SA to different 
cryptodevs on multi cores.
>>> That was the main reason to keep rte_security as it is right now and go ahead with new library.
>>> One thing that worries me -  do we need a way to share SQN and replay window information
>>> between rte_security and upper layer (rte_ipsec)?
>>> If 'no', then ok, if 'yes' then probably we need to discuss how to do it now?
>> anti-replay window size shall be a parameter in ipsec_xform, which shall
>> be added.
>> And the error notification
>>    - in case of using crypto, then use rte_crypto_op_status
>> - in case of inline cases, then use rte_eth_event callbacks.
>> I don't see rte_ipsec needs to take care of that in your initial approach.
>> However, if you plan to include session reset inside rte_ipsec, then you
>> may need that inside the rte_ipsec.
> I am not talking rte_ipsec, my concern here is rte_security.
> Suppose you need to switch from device that can do inline_proto to the device that doesn't.
In what use case you would need such switching?
> Right now the only way - renegotiate all SAs that were handled by inline_proto device
> (because there is no way to retrieve from rte_security device SQN information).
> Renegotiation should work, but it looks like quite expensive approach.
This will be only for the first packet.
> If rte_security would have a way to share its SQN status with SW, then I think it would
> be possible to do such switch without SA termination.
what kind of SQN status you are looking for? overflow? If yes, 
application need to re-negotiate the session,
which will be done periodically anyways.
> Again with such info available - load-balancing for the same SA on multiple devices
> might be possible.
> Konstantin
>
  
Ananyev, Konstantin Oct. 2, 2018, 11:24 p.m. UTC | #19
Hi Akhil,

> 
> Hi Konstantin,
> 
> On 10/1/2018 2:30 AM, Ananyev, Konstantin wrote:
> >
> > Hi Akhil,
> >
> >> Hi Konstantin,
> >>
> >> On 9/24/2018 4:21 PM, Ananyev, Konstantin wrote:
> >>> Hi Akhil,
> >>>
> >>>> Hi Konstantin,
> >>>>
> >>>> On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote:
> >>>>>>> I am not saying this should be the ONLY way to do as it does not work
> >>>>>>> very well with non NPU/FPGA class of SoC.
> >>>>>>>
> >>>>>>> So how about making the proposed IPSec library as plugin/driver to
> >>>>>>> rte_security.
> >>>>>> As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> >>>>>> is the best possible approach.
> >>>>>> Though I probably understand your concern:
> >>>>>> In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> >>>>>> i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are
> >> covered.
> >>>>>> Though there are devices where most of prepare/process can be done in HW
> >>>>>> (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> >>>>>> plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> >>>>>> Is that so?
> >>>>>> To address that issue I suppose we can do:
> >>>>>> 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >>>>>>        security devices into ipsec.
> >>>>>>        We planned to do it anyway, just don't have it done yet.
> >>>>>> 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and
> >>>> RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
> >>>>>>        and add into rte_security_ops   new functions:
> >>>>>>        uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[],
> uint16_t
> >>>> num);
> >>>>>>        uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[],
> uint16_t
> >>>> num);
> >>>>>>        uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> >> num);
> >>>>>>        So for custom HW, PMD can overwrite normal prepare/process behavior.
> >>>>>>
> >>>>> Actually  after another thought:
> >>>>> My previous assumption (probably wrong one) was that for both
> >>>>> RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >>>>> devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config).
> >>>>> Now looking at dpaa and dpaa2 devices (the only ones that supports RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right
> now)
> >>>>> I am not so sure about that - looks like some SW help might be needed for replay window updates, etc.
> >>>>> Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices
> >>>>> (HW/SW roses/responsibilities)?
> >>>>> About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL  - I didn't find any driver inside DPDK source tree that does support that
> >>>> capability.
> >>>>> So my question is there any devices/drivers that do support it?
> >>>>> If so, where could source code could be found, and what are HW/SW roles/responsibilities for that type of devices?
> >>>>> Konstantin
> >>>>>
> >>>>>
> >>>> In case of LOOKASIDE, the protocol errors like antireplay and sequence
> >>>> number overflow shall be the responsibility of either PMD or the HW.
> >>>> It should notify the application that the error has occurred and
> >>>> application need to decide what it needs to decide next.
> >>> Ok, thanks for clarification.
> >>> Just to confirm -  do we have a defined way for it right now in rte_security?
> >> As of now, there are no macros defined for antireplay/seq. no. overflow
> >> errors in crypto errors(rte_crypto_op_status), but it will be added soon.
> >> For inline cases, ipsec-secgw application gets error notification via
> >> rte_eth_event.
> > Ok.

Actually looking at it a bit closer -you are talking about RTE_ETH_EVENT_IPSEC, right?
I do see struct/types definitions, and to see code in ipsec-secgw to handle it,
but I don't see any driver that supports it.
Is that what intended?

> >
> >
> >>>> As Jerin said in other email, the roles/responsibility of the PMD in
> >>>> case of inline proto and lookaside case, nothing much is required from
> >>>> the application to do any processing for ipsec.
> >>>>
> >>>> As per my understanding, the proposed RFC is to make the application
> >>>> code cleaner for  the protocol processing.
> >>> Yes, unified data-path API is definitely one of the main goals.
> >>>
> >>>> 1. For inline proto and lookaside there won't be any change in the data
> >>>> path. The main changes would be in the control path.
> >>> Yes, from your and Jerin description data-path processing looks
> >>> really lightweight for these cases.
> >>> For control path - there is no much change, user would have to call
> >>> rte_ipsec_sa_init() to start using given SA.
> >>>
> >>>> 2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the
> >>>> protocol processing will be done in the library and there would be
> >>>> changes in both control and data path.
> >>> Yes.
> >>>
> >>>> As the rte_security currently provide generic APIs for control path only
> >>>> and we may have it expanded for protocol specific datapath processing.
> >>>> So for the application, working with inline crypto/ inline proto would
> >>>> be quite similar and it won't need to do some extra processing for
> >>>> inline crypto.
> >>>> Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside.
> >>>>
> >>>> We may have the protocol specific APIs reside inside the rte_security
> >>>> and we can use either the crypto/net PMD underneath it.
> >>> As I understand, you suggest instead of introducing new library,
> >>> introduce similar data-path functions inside rte_security.
> >>> Probably something like:
> >>>
> >>> uint16_t rte_security_process(struct rte_security_session *s, struct rte_mbuf *mb[], uint16_t num);
> >>> uint16_t rte_security_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
> >>>                                                                         struct rte_crypto_op *cop[], uint16_t num);
> >>> ...
> >>> Is that correct?
> >> "rte_security_process_ipsec" and "rte_security_crypto_prepare_ipsec" will be better.
> >> We can have such APIs for other protocols as well.
> >> Also, we should leave the existing functionality as is and we should let the user decide whether
> >> it needs to manage the ipsec on it's own or with the new APIs.
> > There are no plans to void any existing API.
> > If the user has working code that uses rte_crytpodev/rte_security directly and wants to keep it,
> > that's fine.
> >
> >>> I thought about that approach too, and indeed from one side it looks cleaner and easier
> >>> to customize - each of these functions would just call related function inside rte_security_ops.
> >>> The problem with that approach - it would mean that each SA would be able to work with one
> >>> device only.
> >>> So if someone needs an SA that could be processed by multiple cores and multiple crypto-devices
> >>> in parallel such approach wouldn’t fit.
> >> One SA should be processed by a single core or else we need to have an
> >> event based application which support ordered queues,
> >> because if we process packets of single SA on multiple cores, then
> >> packets will get re-ordered and we will get the anti-replay late errors
> >> on decap side.
> > I suppose in some cases one core would be enough to handle SA traffic,
> > for some not, as I said before, I think it should be configurable.
> > Of course for MT case some entity that would  guarantee proper ordering
> > for final packet processing would be needed.
> > It could be some eventdev, or SW FIFO queue, or something else.
> >
> >> And if we have event based solution, then the scheduler will be able to
> >> handle the load balancing accordingly.
> > Didn't understand that sentence.
> I mean the event device will be able to handle that which has an inbuilt
> scheduler in it for balancing the load of single SA,
> and if the queues are ordered and it support order restoration, then it
> will be able to maintain the ordering. And for that
> you would not have to bother about giving the same SA to different
> cryptodevs on multi cores.

If such event device will be available for the user, and it would be a user preference to use it -
that's fine.
In such case there is no need for MT support  just ST version of SA code could be used.
But I suppose such scheduler shouldn't be the only option.

> >>> That was the main reason to keep rte_security as it is right now and go ahead with new library.
> >>> One thing that worries me -  do we need a way to share SQN and replay window information
> >>> between rte_security and upper layer (rte_ipsec)?
> >>> If 'no', then ok, if 'yes' then probably we need to discuss how to do it now?
> >> anti-replay window size shall be a parameter in ipsec_xform, which shall
> >> be added.
> >> And the error notification
> >>    - in case of using crypto, then use rte_crypto_op_status
> >> - in case of inline cases, then use rte_eth_event callbacks.
> >> I don't see rte_ipsec needs to take care of that in your initial approach.
> >> However, if you plan to include session reset inside rte_ipsec, then you
> >> may need that inside the rte_ipsec.
> > I am not talking rte_ipsec, my concern here is rte_security.
> > Suppose you need to switch from device that can do inline_proto to the device that doesn't.
> In what use case you would need such switching?

As an example - device detach,  VM live migration, in some cases even changes in routing table.
As another example - limitations in HW offload supported.
Let say ixgbe doesn't support ip reassemble.

> > Right now the only way - renegotiate all SAs that were handled by inline_proto device
> > (because there is no way to retrieve from rte_security device SQN information).
> > Renegotiation should work, but it looks like quite expensive approach.
> This will be only for the first packet.

Sure, and now imagine you have 1M SAs on inline-proto device and sysadmin wants
to detach that device.
How long it would take to re-negotiate all of them?

> > If rte_security would have a way to share its SQN status with SW, then I think it would
> > be possible to do such switch without SA termination.
> what kind of SQN status you are looking for? overflow?

Nope, I am talking about last-seq and replay-window state:
https://tools.ietf.org/html/rfc4303#section-3.4.3

Konstantin

> If yes,
> application need to re-negotiate the session,
> which will be done periodically anyways.
> > Again with such info available - load-balancing for the same SA on multiple devices
> > might be possible.
> > Konstantin
> >
  
Ananyev, Konstantin Oct. 2, 2018, 11:56 p.m. UTC | #20
Hi Jerin,

> > > > > > >
> > > > > > > Anyway, let's pretend we found some smart way to distribute inbound packets for the same SA to multiple HW queues/CPU
> > > > > cores.
> > > > > > > To make ipsec processing for such case to work correctly just atomicity on check/update segn/replay_window is not enough.
> > > > > > > I think it would require some extra synchronization:
> > > > > > > make sure that we do final packet processing (seq check/update) at the same order as we received the packets
> > > > > > > (packets entered ipsec processing).
> > > > > > > I don't really like to introduce such heavy mechanisms on SA level,  after all it supposed to be light and simple.
> > > > > > > Though we plan CTX level API to support such scenario.
> > > > > > > What I think would be useful addition for SA level API - have an ability to do one update seqn/replay_window and multiple checks
> > > > > concurrently.
> > > > > > >
> > > > > > > > In case of ingress also, the same problem exists. We will not be able to use RSS and spread the traffic to multiple cores.
> > > > > Considering
> > > > > > > > IPsec being CPU intensive, this would limit the net output of the chip.
> > > > > > > That's true - but from other side implementation can offload heavy part
> > > > > > > (encrypt/decrypt, auth) to special HW (cryptodev).
> > > > > > > In that case single core might be enough for SA and extra synchronization would just slowdown things.
> > > > > > > That's why I think it should be configurable  what behavior (ST or MT) to use.
> > > > > > I do agree that these are the issues that we need to address to make the
> > > > > > library MT safe. Whether the extra synchronization would slow down things is
> > > > > > a very subjective question and will heavily depend on the platform. The
> > > > > > library should have enough provisions to be able to support MT without
> > > > > > causing overheads to ST. Right now, the library assumes ST.
> > > > >
> > > > >
> > > > > I agree with Anoob here.
> > > > >
> > > > > I have two concerns with librte_ipsec as a separate library
> > > > >
> > > > > 1) There is an overlap with rte_security and new proposed library.
> > > >
> > > > I don't think there really is an overlap.
> > >
> > > As mentioned in your other email. IMO, There is an overlap as
> > > RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL can support almost everything
> > > in HW or HW + SW if some PMD wishes to do so.
> > >
> > > Answering some of the questions, you have asked in other thread based on
> > > my understanding.
> > >
> > > Regarding RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL support,
> > > Marvell/Cavium CPT hardware on next generation HW(Planning to upstream
> > > around v19.02) can support RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and
> > > Anoob already pushed the application changes in ipsec-gw.
> >
> > Ok good to know.
> >
> > >
> > > In our understanding of HW/SW roles/responsibilities for that type of
> > > devices are:
> > >
> > > INLINE_PROTOCOL
> > > ----------------
> > > In control path, security session is created with the given SA and
> > > rte_flow configuration etc.
> > >
> > > For outbound traffic, the application will have to do SA lookup and
> > > identify the security action (inline/look aside crypto/protocol). For
> > > packets identified for inline protocol processing, the application would
> > > submit as plain packets to the ethernet device and the security capable
> > > ethernet device would perform IPSec and send out the packet. For PMDs
> > > which would need extra metadata (capability flag), set_pkt_metadata
> > > function pointer would be called (from application).
> > > This can be used to set some per packet field to identify the security session to be used to
> > > process the packet.
> >
> > Yes, as I can see, that's what ipsec-gw is doing right now and it wouldn't be
> > a problem to do the same in ipsec lib.
> >
> > > Sequence number updation will be done by the PMD.
> >
> > Ok, so for INLINE_PROTOCOL upper layer wouldn't need to keep track for SQN values at all?
> > You don’t' consider a possibility that by some reason that SA would need to
> > be moved from device that support INLINE_PROTOCOL to the device that doesn't?
> 
> For INLINE_PROTOCOL, the application won't have any control over such
> per packet fields. As for moving the SA to a different device, right now
> rte_security spec doesn't allow that. May be we should fix the spec to
> allow multiple devices to share the same security session. That way, if
> there is error in the inline processing, application will be able to
> submit the packet to LOOKASIDE_PROTOCOL crypto device (sharing the
> session) and get the packet processed.
> 

Yep, that my thought too.
If we want to support such scenarios with lookaside-proto and inline-proto devices,
then rte_security need to be changed/extended.

> 
> >
> > > For inbound traffic, the packets for IPSec would be identified by using
> > > rte_flow (hardware accelerated packet filtering). For the packets
> > > identified for inline offload (SECURITY action), hardware would perform
> > > the processing. For inline protocol processed IPSec packets, PMD would
> > > set “user data” so that application can get the details of the security
> > > processing done on the packet. Once the plain packet (after IPSec
> > > processing) is received, a selector check need to be performed to make
> > > sure we have a valid packet after IPSec processing. The user data is used
> > > for that. Anti-replay check is handled by the PMD. The PMD would raise
> > > an eth event in case of sequence number expiry or any SA expiry.
> >
> > Few questions here:
> > 1) if I understand things right - to specify that it was an IPsec packet -
> > PKT_RX_SEC_OFFLOAD will be set in mbuf ol_flags?
> > 2) Basically 'userdata' will contain just a user provided at rte_security_session_create pointer
> > (most likely pointer to the SA, as it is done right now in ipsec-secgw), correct?
> 
> Yes to 1 & 2.
> 
> 
> > 3) in current rte_security API si there a way to get/set replay window size, etc?
> 
> Not right now. But Akhil mentioned that it will be added soon.
> 
> 
> > 4)   Same question as for TX: you don't plan to support fallback to other type of devices/SW?
> > I.E. HW was not able to process ipsec packet by some reason (let say fragmented packet)
> > and now it is SW responsibility to do so?
> > The reason I am asking for that - it seems right now there is no defined way
> > to share SQN related information between HW/PMD and upper layer SW.
> > Is that ok, or would we need such capability?
> > If we would, and upper layer SW would need to keep track on SQN anyway,
> > then there is probably no point to do same thing in PMD itelf?
> > In that case PMD just need to provide SQN information to the upper layer
> > (probably one easy way to do it - reuse rte_,buf.seqn for that purpose,
> > though for that will probably need make it 64-bit long).
> 
> The spec doesn't allow doing IPsec partially on HW & SW. The way spec is
> written (and implemented in ipsec-secgw) to allow one kind of
> RTE_SECURITY_ACTION_TYPE for one SA. If HW is not able to process packet
> received on INLINE_PROTOCOL SA, then it is treated as error. Handling
> fragmentation is a very valid scenario. We will have to edit the spec if
> we need to handle this scenario.
> 
> >
> > >
> > >
> > > LOOKASIDE_PROTOCOL
> > > ------------------
> > > In control path, security session is created with the given SA.
> > >
> > > Enqueue/dequeue is similar to what is done for regular crypto
> > > (RTE_SECURITY_ACTION_TYPE_NONE) but all the protocol related processing
> > > would be offloaded. Application will need to do SA lookup and identify
> > > the processing to be done (both in case of outbound & inbound), and
> > > submit packet to crypto device. Application need not do any IPSec
> > > related transformations other than the lookup. Anti-replay need to be
> > > handled in the PMD (the spec says the device “may be handled” do anti-replay check,
> > > but a complete protocol offload would need anti-replay check also).
> >
> > Same question here - wouldn't there be a situations when HW/PMD would need to
> > share SQN information with upper layer?
> > Let say if upper layer SW would need to do load balancing between crypto-devices
> > with LOOKASIDE_PROTOCOL and without?
> 
> Same answer as above. ACTION is tied to security session which is tied
> to SA. SQN etc is internal to the session and so load balancing between
> crypto-devices is not supported.
> 
> >
> > >
> > >
> > > > rte_security is a 'framework for management and provisioning of security protocol operations offloaded to hardware based devices'.
> > > > While rte_ipsec is aimed to be a library for IPsec data-path processing.
> > > > There is no plans for rte_ipsec to 'obsolete' rte_security.
> > > > Quite opposite rte_ipsec supposed to work with both rte_cryptodev and rte_security APIs (devices).
> > > > It is possible to have an SA that would use both crypto and  security devices.
> > > > Or to have an SA that would use multiple crypto devs
> > > > (though right now it is up the user level to do load-balancing logic).
> > > >
> > > > > For IPsec, If an application needs to use rte_security for HW
> > > > > implementation and and application needs to use librte_ipsec for
> > > > >  SW implementation then it is bad and a lot duplication of work on
> > > > > he slow path too.
> > > >
> > > > The plan is that application would need to use just rte_ipsec API for all data-paths
> > > > (HW/SW, lookaside/inline).
> > > > Let say right now there is rte_ipsec_inline_process() function if user
> > > > prefers to use inline security device to process given group packets,
> > > > and rte_ipsec_crypto_process(/prepare) if user decides to use
> > > > lookaside security or simple crypto device for it.
> > > >
> > > > >
> > > > > The rte_security spec can support both inline and look-aside IPSec
> > > > > protocol support.
> > > >
> > > > AFAIK right now rte_security just provides API to create/free/manipulate security sessions.
> > > > I don't see how it can support all the functionality mentioned above,
> > > > plus SAD and SPD.
> > >
> > >
> > > At least for INLINE_PROTOCOL case SA lookup for inbound traffic does by
> > > HW.
> >
> > For inbound yes, for outbound I suppose you still would need to do a lookup in SW.
> 
> Yes
> 
> >
> > >
> > > >
> > > > >
> > > > > 2) This library is tuned for fat CPU core in mind like single SA on core
> > > > > etc. Which is fine for x86 servers and arm64 server category of machines
> > > > > but it does not work very well with NPU class of SoC or FPGA.
> > > > >
> > > > > As there  are the different ways to implement the IPSec, For instance,
> > > > > use of eventdev can help in situation for handling millions of SA and
> > > > > equence number of update and anti reply check can be done by leveraging
> > > > > some of the HW specific features like
> > > > > ORDERED, ATOMIC schedule type(mapped as eventdev feature)in HW with PIPELINE model.
> > > > >
> > > > > # Issues with having one SA one core,
> > > > > - In the outbound side, there could be multiple flows using the same SA.
> > > > >   Multiple flows could be processed parallel on different lcores,
> > > > > but tying one SA to one core would mean we won't be able to do that.
> > > > >
> > > > > - In the inbound side, we will have a fat flow hitting one core. If
> > > > >   IPsec library assumes single core, we will not be able to to spread
> > > > > fat flow to multiple cores. And one SA-one core would mean all ports on
> > > > > which we would expect IPsec traffic has to be handled by that core.
> > > >
> > > > I suppose that all refers to the discussion about MT safe API for rte_ipsec, right?
> > > > If so, then as I said in my reply to Anoob:
> > > > We will try to make API usable in MT environment for v1,
> > > > so you can review and provide comments at early stages.
> > >
> > > OK
> > >
> > > >
> > > > >
> > > > > I have made a simple presentation. This presentation details ONE WAY to
> > > > > implement the IPSec with HW support on NPU.
> > > > >
> > > > > https://docs.google.com/presentation/d/1e3IDf9R7ZQB8FN16Nvu7KINuLSWMdyKEw8_0H05rjj4/edit?usp=sharing
> > > > >
> > > >
> > > > Thanks, quite helpful.
> > > > Actually from page 3, it looks like your expectations don't contradict in general with proposed API:
> > > >
> > > > ...
> > > > } else if (ev.event_type == RTE_EVENT_TYPE_LCORE && ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
> > > >                         sa = ev.flow_queue_id;
> > > >                         /* do critical section work per sa */
> > > >                         do_critical_section_work(sa);
> > > >
> > > > [KA] that's the place where I expect either
> > > > rte_ipsec_inline_process(sa, ...); OR rte_ipsec_crypto_prepare(sa, ...);
> > > > would be called.
> > >
> > > Makes sense. But currently, the library defines what is
> > > rte_ipsec_inline_process() and rte_ipsec_crypto_prepare(), but it should
> > > be based on underneath security device or crypto device.
> >
> > Reason for that - their code-paths are quite different:
> > for inline devices we can do whole processing synchronously(within process() function),
> > while fro crypto it is sort of split into tw parts -
> > we first have to do prepare();enqueue() them to crypto-dev, and then dequeue();process().
> > Another good thing with that way - it allows the same SA to work with different devices.
> >
> > >
> > > So, IMO for better control, these functions should be the function pointer
> > > based and based on underlying device, library can fill the
> > > implementation.
> > >
> > > IMO, it is not possible to create "static inline function" with all "if"
> > > checks. I think, we can have four ipsec functions with function pointer
> > > scheme.
> > >
> > > rte_ipsec_inbound_prepare()
> > > rte_ipsec_inbound_process()
> > > rte_ipsec_outbound_prepare()
> > > rte_ipsec_outbound_process()
> > >
> > > Some of the other concerns:
> > > 1) For HW implementation, rte_ipsec_sa needs to opaque like rte_security
> > > as some of the structure defined by HW or Microcode. We can choose
> > > absolute generic items as common and device/rte_security specific can be opaque.
> >
> > I don't think it would be a problem, rte_ipsec_sa  does contain a pointer to
> > rte_security_session, so it can provide it as an argument to these functions.
> 
> The rte_ipsec_sa would need some private space for application to store
> it's metadata. There can be SA implementations with additional fields
> for faster lookups. To rephrase, the application should be given some
> provision to store some metadata it would need for faster lookups.
> may sa_init API can give amount private size required.
> 
> 
> >
> > >
> > > 2)I think, in order to accommodate the event drivern model. We need to pass
> > > void ** in prepare() and process() function with an additional argument
> > > of type(TYPE_EVENT/TYPE_MBUF) can be passed to detect packet object
> > > type as some of the functions in prepare() and process() may need
> > > rte_event to operate on.
> >
> > You are talking here about security device specific functions described below, correct?
> >
> > >
> > > >
> > > >                      /* Issue the crypto request and generate the following on crypto work completion */
> > > > [KA] that's the place where I expect rte_ipsec_crypto_process(...) be invoked.
> > > >
> > > >                         ev.flow_queue_id = tx_port;
> > > >                         ev.sub_event_id = tx_queue_id;
> > > >                         ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
> > > >                         rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
> > > >                 }
> > > >
> > > >
> > > > > I am not saying this should be the ONLY way to do as it does not work
> > > > > very well with non NPU/FPGA class of SoC.
> > > > >
> > > > > So how about making the proposed IPSec library as plugin/driver to
> > > > > rte_security.
> > > >
> > > > As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security
> > > > is the best possible approach.
> > > > Though I probably understand your concern:
> > > > In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.),
> > > > i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are
> covered.
> > > > Though there are devices where most of prepare/process can be done in HW
> > > > (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL),
> > > > plus in future could be devices where prepare/process would be split between HW/SW in a custom way.
> > > > Is that so?
> > > > To address that issue I suppose we can do:
> > > > 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> > > >     security devices into ipsec.
> > > >     We planned to do it anyway, just don't have it done yet.
> > > > 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and
> RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM
> > >
> > > The problem is, CUSTOM may have different variants and "if" conditions won't
> > > scale if we choose to have non function pointer scheme. Otherwise, it
> > > looks OK to create new SECURITY TYPE and associated plugin for prepare() and process()
> > > function in librte_ipsec library.
> >
> > In principle, I don't mind to always use function pointers for prepare()/process(), but:
> > from your description above of INLINE_PROTOCOL and LOOKASIDE_PROTOCOL
> > the process()/prepare() for such devices looks well defined and
> > straightforward to implement.
> > Not sure we'll need a function pointer for such simple and lightweight case:
> > set/check ol_flags, set/read userdata value.
> > I think extra function call here is kind of overkill and will only slowdown things.
> > But if that would be majority preference - I wouldn't argue.
> > BTW if we'll agree to always use function pointers for process/prepare,
> > then there is no point to have that all existing action types -
> > all we need is an indication is it inline or lookaside device and
> > function pointers for prepare/process().
> 
> Me too not a fan of function pointer scheme. But options are limited.
> 
> Though the generic usage seems straightforward, the implementation of
> the above modes can be very different. Vendors could optimize various
> operations (SQN update for example) for better performance on their
> hardware. Sticking to one approach would negate that advantage.
> 
> Another option would be to use multiple-worker model that Anoob had
> proposed some time back.
> https://mails.dpdk.org/archives/dev/2018-June/103808.html
> 
> Idea would be to make all lib_ipsec functions added as static inline
> functions.
> 
> static inline rte_ipsec_add_tunnel_hdr(struct rte_mbuf *mbuf);
> static inline rte_ipsec_update_sqn(struct rte_mbuf *mbuf, &seq_no);
> ...
> 
> For the regular use case, a fat
> rte_ipsec_(inbound/outbound)_(prepare/process) can be provided. The
> worker implemented for that case can directly call the function and
> forget about the other modes. For other vendors with varying
> capabilities, there can be multiple workers taking advantage of the hw
> features. For such workers, the static inline functions can be used as
> required. This gives vendors opportunity to pick and choose what they
> want from the ipsec lib. The worker to be used for that case will be
> determined based on the capabilities exposed by the PMDs.
> 
> https://mails.dpdk.org/archives/dev/2018-June/103828.html
> 
> The above email explains how multiple workers can be used with l2fwd.
> 
> For this to work, the application & library code need to be modularised.
> Like what is being done in the following series,
> https://mails.dpdk.org/archives/dev/2018-June/103786.html
> 
> This way one application can be made to run on multiple platforms, with
> the app being optimized for the platform on which it would run.
> 
> /* ST SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - NO EVENTDEV*/
> worker1()
> {
>      while(true) {
>          nb_pkts = rte_eth_rx_burst();
> 
>          if (nb_pkts != 0) {
>              /* Do lookup */
>              rte_ipsec_inbound_prepare();
>              rte_cryptodev_enqueue_burst();
>              /* Update in-flight */
>          }
> 
>          if (in_flight) {
>              rte_cryptodev_dequeue_burst();
>              rte_ipsec_outbound_process();
>          }
>          /* route packet */
> }
> 
> #include <rte_ipsec.h>   /* For IPsec lib static inlines */
> 
> static inline rte_event_enqueue(struct rte_event *ev)
> {
>      ...
> }
> 
> /* MT safe SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - EVENTDEV)
> worker2()
> {
>      while(true) {
>          nb_pkts = rte_eth_rx_burst();
> 
>          if (nb_pkts != 0) {
>              /* Do lookup */
>             rte_ipsec_add tunnel(ev->mbuf);
>             rte_event_enqueue(ev)
>             rte_cryptodev_enqueue_burst(ev->mbuf);
>              /* Update in-flight */
>          }
> 
>          if (in_flight) {
>              rte_cryptodev_dequeue_burst();
>              rte_ipsec_outbound_process();
>          }
>          /* route packet */
> }

Hmm, not sure how these 2 cases really differs in terms of ipsec processing.
I do understand the in second one we use events to propagate packets through the system,
and that eventdev might be smart enough to preserve packet ordering, etc.
But in terms of ipsec processing we have to do exactly the same for both cases.
Let say for the example above (outbound, crytpodev):
a) lookup an SA
b) increment SA.SQN and check for overflow
d) generate IV
e) generate & fill ESP header/trailer, tunnel header
f) perform actual encrypt, generate digest

So crypto_prepare() - deals with b)-e).
f) is handled by cryptodev. 
Yes, step b) might need to be atomic, or might not -
depends on particular application design.
But in both cases (polling/eventdev) we do need all these steps to be performed.
Konstantin

> 
> In short,
> 
> 1) Have separate small inline functions in library
> 2) If something can be grouped, it can be exposed a specific function
> to address a specific usecases
> 3) Let remaining code, can go in application as different worker() to
> address all the usecases.
> 
> >
> > Konstantin
> >
> > >
> > >
> > > >     and add into rte_security_ops   new functions:
> > > >     uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> num);
> > > >     uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t
> num);
> > > >     uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct struct rte_crypto_op *cop[], uint16_t num);
> > > >     So for custom HW, PMD can overwrite normal prepare/process behavior.
> > > >
> > > > > This would give flexibly for each vendor/platform choose to different
> > > > > IPse implementation based on HW support WITHOUT CHANGING THE APPLICATION
> > > > > INTERFACE.
> > > >
> > > > Not sure what API changes you are referring to?
> > > > As I am aware we do introduce new API, but all existing APIs remain in place.
> > >
> > >
> > > What I meant was, Single application programming interface to enable IPSec processing to
> > > application.
> > >
> > >
> > > >
> > > > >
> > > > > IMO, rte_security IPsec look aside support can be simply added by
> > > > > creating the virtual crypto device(i.e move the proposed code to the virtual crypto device)
> > > > > likewise inline support
> > > > > can be added by the virtual ethdev device.
> > > >
> > > > That's probably possible and if someone would like to introduce such abstraction - NP in general
> > > > (though my suspicion - it might be too heavy to be really useful).
> > > > Though I don't think it should be the only possible way for the user to enable IPsec data-processing inside his app.
> > > > Again I guess such virtual-dev will still use rte_ipsec inside.
> > >
> > > I don't have strong opinion on virtual devices VS function pointer based
> > > prepare() and process() function in librte_ipsec library.
> > >
> > > >
> > > > > This would avoid the need for
> > > > > updating ipsec-gw application as well i.e unified interface to application.
> > > >
> > > > I think - it would  really good to simplify existing ipsec-secgw sample app.
> > > > Some parts of it seems unnecessary complex to me.
> > > > One of the reasons for it -  we don't really have an unified (and transparent) API for ipsec data-path.
> > > > Let's look at ipsec_enqueue() and related code (examples/ipsec-secgw/ipsec.c:365)
> > > > It is huge (and ugly) -  user has to handle dozen different cases just to enqueue packet for IPsec processing.
> > > > One of the aims of rte_ipsec library - hide all that complexities inside the library and provide to
> > > > the upper layer clean and transparent API.
> > > >
> > > > >
> > > > > If you don't like the above idea, any scheme of plugin based
> > > > > implementation would be fine so that vendor or platform can choose its own implementation.
> > > > > It can be based on partial HW implement too. i.e SA look can be used in SW, remaining stuff in HW
> > > > > (for example IPsec inline case)
> > > >
> > > > I am surely ok with the idea to give vendors an ability to customize implementation
> > > > and enable their HW capabilities.
> > >
> > > I think, We are on the same page, just that the fine details of "framework"
> > > for customizing implementation based on their HW capabilities need to
> > > iron out.
> > >
> > > > Do you think proposed additions to the rte_security would be  enough,
> > > > or something extra is needed?
> > >
> > > See above.
> > >
> > > Jerin
> > >
> > > >
> > > > Konstantin
> > > >
> > > >
> > > > >
> > > > > # For protocols like UDP, it makes sense to create librte_udp as there
> > > > > no much HW specific offload other than ethdev provides.
> > > > >
> > > > > # PDCP could be another library to offload to HW, So talking
> > > > > rte_security path makes more sense in that case too.
> > > > >
> > > > > Jerin
  
Jerin Jacob Oct. 3, 2018, 9:37 a.m. UTC | #21
-----Original Message-----
> Date: Tue, 2 Oct 2018 23:56:23 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "Joseph, Anoob" <Anoob.Joseph@caviumnetworks.com>, "dev@dpdk.org"
>  <dev@dpdk.org>, "Awal, Mohammad Abdul" <mohammad.abdul.awal@intel.com>,
>  "Doherty, Declan" <declan.doherty@intel.com>, Narayana Prasad
>  <narayanaprasad.athreya@caviumnetworks.com>, "akhil.goyal@nxp.com"
>  <akhil.goyal@nxp.com>, "hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
>  "shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
> Subject: RE: [dpdk-dev] [RFC] ipsec: new library for IPsec data-path
>  processing
> 
> External Email
> 
> Hi Jerin,

Hi Konstantin,

> 
> > static inline rte_ipsec_add_tunnel_hdr(struct rte_mbuf *mbuf);
> > static inline rte_ipsec_update_sqn(struct rte_mbuf *mbuf, &seq_no);
> > ...
> >
> > For the regular use case, a fat
> > rte_ipsec_(inbound/outbound)_(prepare/process) can be provided. The
> > worker implemented for that case can directly call the function and
> > forget about the other modes. For other vendors with varying
> > capabilities, there can be multiple workers taking advantage of the hw
> > features. For such workers, the static inline functions can be used as
> > required. This gives vendors opportunity to pick and choose what they
> > want from the ipsec lib. The worker to be used for that case will be
> > determined based on the capabilities exposed by the PMDs.
> >
> > https://mails.dpdk.org/archives/dev/2018-June/103828.html
> >
> > The above email explains how multiple workers can be used with l2fwd.
> >
> > For this to work, the application & library code need to be modularised.
> > Like what is being done in the following series,
> > https://mails.dpdk.org/archives/dev/2018-June/103786.html
> >
> > This way one application can be made to run on multiple platforms, with
> > the app being optimized for the platform on which it would run.
> >
> > /* ST SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - NO EVENTDEV*/
> > worker1()
> > {
> >      while(true) {
> >          nb_pkts = rte_eth_rx_burst();
> >
> >          if (nb_pkts != 0) {
> >              /* Do lookup */
> >              rte_ipsec_inbound_prepare();
> >              rte_cryptodev_enqueue_burst();
> >              /* Update in-flight */
> >          }
> >
> >          if (in_flight) {
> >              rte_cryptodev_dequeue_burst();
> >              rte_ipsec_outbound_process();
> >          }
> >          /* route packet */
> > }
> >
> > #include <rte_ipsec.h>   /* For IPsec lib static inlines */
> >
> > static inline rte_event_enqueue(struct rte_event *ev)
> > {
> >      ...
> > }
> >
> > /* MT safe SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - EVENTDEV)
> > worker2()
> > {
> >      while(true) {
> >          nb_pkts = rte_eth_rx_burst();
> >
> >          if (nb_pkts != 0) {
> >              /* Do lookup */
> >             rte_ipsec_add tunnel(ev->mbuf);
> >             rte_event_enqueue(ev)
> >             rte_cryptodev_enqueue_burst(ev->mbuf);
> >              /* Update in-flight */
> >          }
> >
> >          if (in_flight) {
> >              rte_cryptodev_dequeue_burst();
> >              rte_ipsec_outbound_process();
> >          }
> >          /* route packet */
> > }
> 
> Hmm, not sure how these 2 cases really differs in terms of ipsec processing.
> I do understand the in second one we use events to propagate packets through the system,
> and that eventdev might be smart enough to preserve packet ordering, etc.
> But in terms of ipsec processing we have to do exactly the same for both cases.
> Let say for the example above (outbound, crytpodev):
> a) lookup an SA
> b) increment SA.SQN and check for overflow
> d) generate IV
> e) generate & fill ESP header/trailer, tunnel header
> f) perform actual encrypt, generate digest
> 
> So crypto_prepare() - deals with b)-e).
> f) is handled by cryptodev.
> Yes, step b) might need to be atomic, or might not -
> depends on particular application design.
> But in both cases (polling/eventdev) we do need all these steps to be performed.

The real question, Is the new library should be aware of eventdev or
application decides it?

If it is former, in order to complete step (b), we need rte_event also passed to
_process() API and process() API needs to be function pointer in order to
accommodate all combinations of different HW/SW capabilities.



> Konstantin
> 
> >
> > In short,
> >
> > 1) Have separate small inline functions in library
> > 2) If something can be grouped, it can be exposed a specific function
> > to address a specific usecases
> > 3) Let remaining code, can go in application as different worker() to
> > address all the usecases.
> >
> > >
  
Ananyev, Konstantin Oct. 9, 2018, 6:24 p.m. UTC | #22
Hi Jerin,

> >
> > > static inline rte_ipsec_add_tunnel_hdr(struct rte_mbuf *mbuf);
> > > static inline rte_ipsec_update_sqn(struct rte_mbuf *mbuf, &seq_no);
> > > ...
> > >
> > > For the regular use case, a fat
> > > rte_ipsec_(inbound/outbound)_(prepare/process) can be provided. The
> > > worker implemented for that case can directly call the function and
> > > forget about the other modes. For other vendors with varying
> > > capabilities, there can be multiple workers taking advantage of the hw
> > > features. For such workers, the static inline functions can be used as
> > > required. This gives vendors opportunity to pick and choose what they
> > > want from the ipsec lib. The worker to be used for that case will be
> > > determined based on the capabilities exposed by the PMDs.
> > >
> > > https://mails.dpdk.org/archives/dev/2018-June/103828.html
> > >
> > > The above email explains how multiple workers can be used with l2fwd.
> > >
> > > For this to work, the application & library code need to be modularised.
> > > Like what is being done in the following series,
> > > https://mails.dpdk.org/archives/dev/2018-June/103786.html
> > >
> > > This way one application can be made to run on multiple platforms, with
> > > the app being optimized for the platform on which it would run.
> > >
> > > /* ST SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - NO EVENTDEV*/
> > > worker1()
> > > {
> > >      while(true) {
> > >          nb_pkts = rte_eth_rx_burst();
> > >
> > >          if (nb_pkts != 0) {
> > >              /* Do lookup */
> > >              rte_ipsec_inbound_prepare();
> > >              rte_cryptodev_enqueue_burst();
> > >              /* Update in-flight */
> > >          }
> > >
> > >          if (in_flight) {
> > >              rte_cryptodev_dequeue_burst();
> > >              rte_ipsec_outbound_process();
> > >          }
> > >          /* route packet */
> > > }
> > >
> > > #include <rte_ipsec.h>   /* For IPsec lib static inlines */
> > >
> > > static inline rte_event_enqueue(struct rte_event *ev)
> > > {
> > >      ...
> > > }
> > >
> > > /* MT safe SA - RTE_SECURITY_ACTION_TYPE_NONE - CRYPTODEV - EVENTDEV)
> > > worker2()
> > > {
> > >      while(true) {
> > >          nb_pkts = rte_eth_rx_burst();
> > >
> > >          if (nb_pkts != 0) {
> > >              /* Do lookup */
> > >             rte_ipsec_add tunnel(ev->mbuf);
> > >             rte_event_enqueue(ev)
> > >             rte_cryptodev_enqueue_burst(ev->mbuf);
> > >              /* Update in-flight */
> > >          }
> > >
> > >          if (in_flight) {
> > >              rte_cryptodev_dequeue_burst();
> > >              rte_ipsec_outbound_process();
> > >          }
> > >          /* route packet */
> > > }
> >
> > Hmm, not sure how these 2 cases really differs in terms of ipsec processing.
> > I do understand the in second one we use events to propagate packets through the system,
> > and that eventdev might be smart enough to preserve packet ordering, etc.
> > But in terms of ipsec processing we have to do exactly the same for both cases.
> > Let say for the example above (outbound, crytpodev):
> > a) lookup an SA
> > b) increment SA.SQN and check for overflow
> > d) generate IV
> > e) generate & fill ESP header/trailer, tunnel header
> > f) perform actual encrypt, generate digest
> >
> > So crypto_prepare() - deals with b)-e).
> > f) is handled by cryptodev.
> > Yes, step b) might need to be atomic, or might not -
> > depends on particular application design.
> > But in both cases (polling/eventdev) we do need all these steps to be performed.
> 
> The real question, Is the new library should be aware of eventdev or
> application decides it?

My thought right now - it shouldn't.
Looking at rte_event_crypto_adapter - right now it accepts crypto-ops as input
for both new and forward modes.
Which means that prepare() has to called by the app before doing enqueue
(either straight to cryptodev or to eventdev).
Anyway I just sumitted RFC v2 with process/prepare as function pointers
inside ipsec_session, please have a look.
Konstantin


> 
> If it is former, in order to complete step (b), we need rte_event also passed to
> _process() API and process() API needs to be function pointer in order to
> accommodate all combinations of different HW/SW capabilities.
> 
>
  

Patch

==================

API described below operates on SA level.
It provides functionality that allows user for given SA to process
inbound and outbound IPsec packets.
To be more specific:
- for inbound ESP/AH packets perform decryption, authentication,
  integrity checking, remove ESP/AH related headers
- for outbound packets perform payload encryption, attach ICV,
  update/add IP headers, add ESP/AH headers/trailers,
  setup related mbuf felids (ol_flags, tx_offloads, etc.).
- initialize/un-initialize given SA based on user provided parameters.

Processed inbound/outbound packets could be grouped by user provided
flow id (opaque 64-bit number associated by user with given SA).

SA-level API is based on top of crypto-dev/security API and relies on them
to perform actual cipher and integrity checking.
Due to the nature of crypto-dev API (enqueue/deque model) we use
asynchronous API for IPsec packets destined to be processed
by crypto-device:
rte_ipsec_crypto_prepare()->rte_cryptodev_enqueue_burst()->
rte_cryptodev_dequeue_burst()->rte_ipsec_crypto_process().
Though for packets destined for inline processing no extra overhead
is required and simple and synchronous API: rte_ipsec_inline_process()
is introduced for that case.

The following functionality:
  - match inbound/outbound packets to particular SA
  - manage crypto/security devices
  - provide SAD/SPD related functionality
  - determine what crypto/security device has to be used
    for given packet(s)
is out of scope for SA-level API.

Below is the brief (and simplified) overview of expected SA-level
API usage.

/* allocate and initialize SA */
size_t sz = rte_ipsec_sa_size();
struct rte_ipsec_sa *sa = rte_malloc(sz);
struct rte_ipsec_sa_prm prm;
/* fill prm */
rc = rte_ipsec_sa_init(sa, &prm);
if (rc != 0) { /*handle error */}
.....

/* process inbound/outbound IPsec packets that belongs to given SA */

/* inline IPsec processing was done for these packets */
if (use_inline_ipsec)
       n = rte_ipsec_inline_process(sa, pkts, nb_pkts);
/* use crypto-device to process the packets */
else {
     struct rte_crypto_op *cop[nb_pkts];
     struct rte_ipsec_group grp[nb_pkts];

      ....
     /* prepare crypto ops */
     n = rte_ipsec_crypto_prepare(sa, pkts, cops, nb_pkts);
     /* enqueue crypto ops to related crypto-dev */
     n =  rte_cryptodev_enqueue_burst(..., cops, n);
     if (n != nb_pkts) { /*handle failed packets */}
     /* dequeue finished crypto ops from related crypto-dev */
     n = rte_cryptodev_dequeue_burst(..., cops, nb_pkts);
     /* finish IPsec processing for associated packets */
     n = rte_ipsec_crypto_process(cop, pkts, grp, n);
     /* now we have <n> group of packets grouped by SA flow id  */
    ....
 }   
...

/* uninit given SA */
rte_ipsec_sa_fini(sa);

Planned scope for 18.11:
========================

- SA-level API definition
- ESP tunnel mode support (both IPv4/IPv6)
- Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL.
- UT
 
Note: Still WIP, so not all planned for 18.11 functionality is in place.

Post 18.11:
===========
- ESP transport mode support (both IPv4/IPv6)
- update examples/ipsec-secgw to use librte_ipsec
- SAD and high-level API definition and implementation


Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                     |   5 +
 lib/Makefile                           |   2 +
 lib/librte_ipsec/Makefile              |  24 +
 lib/librte_ipsec/meson.build           |  10 +
 lib/librte_ipsec/pad.h                 |  45 ++
 lib/librte_ipsec/rte_ipsec.h           | 245 +++++++++
 lib/librte_ipsec/rte_ipsec_version.map |  13 +
 lib/librte_ipsec/sa.c                  | 921 +++++++++++++++++++++++++++++++++
 lib/librte_net/rte_esp.h               |  10 +-
 lib/meson.build                        |   2 +
 mk/rte.app.mk                          |   2 +
 11 files changed, 1278 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ipsec/Makefile
 create mode 100644 lib/librte_ipsec/meson.build
 create mode 100644 lib/librte_ipsec/pad.h
 create mode 100644 lib/librte_ipsec/rte_ipsec.h
 create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
 create mode 100644 lib/librte_ipsec/sa.c

diff --git a/config/common_base b/config/common_base
index 4bcbaf923..c95602c05 100644
--- a/config/common_base
+++ b/config/common_base
@@ -879,6 +879,11 @@  CONFIG_RTE_LIBRTE_BPF=y
 CONFIG_RTE_LIBRTE_BPF_ELF=n
 
 #
+# Compile librte_ipsec
+#
+CONFIG_RTE_LIBRTE_IPSEC=y
+
+#
 # Compile the test application
 #
 CONFIG_RTE_APP_TEST=y
diff --git a/lib/Makefile b/lib/Makefile
index afa604e20..58998dedd 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -105,6 +105,8 @@  DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ethdev librte_net
 DEPDIRS-librte_gso += librte_mempool
 DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
 DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ethdev
+DIRS-$(CONFIG_RTE_LIBRTE_IPSEC) += librte_ipsec
+DEPDIRS-librte_ipsec := librte_eal librte_mbuf librte_cryptodev librte_security
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_ipsec/Makefile b/lib/librte_ipsec/Makefile
new file mode 100644
index 000000000..15441cf41
--- /dev/null
+++ b/lib/librte_ipsec/Makefile
@@ -0,0 +1,24 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_ipsec.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_cryptodev -lrte_security
+
+EXPORT_MAP := rte_ipsec_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_IPSEC) += sa.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_IPSEC)-include += rte_ipsec.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ipsec/meson.build b/lib/librte_ipsec/meson.build
new file mode 100644
index 000000000..79c55a8be
--- /dev/null
+++ b/lib/librte_ipsec/meson.build
@@ -0,0 +1,10 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+
+sources=files('sa.c')
+
+install_headers = files('rte_ipsec.h')
+
+deps += ['mbuf', 'net', 'cryptodev', 'security']
diff --git a/lib/librte_ipsec/pad.h b/lib/librte_ipsec/pad.h
new file mode 100644
index 000000000..2f5ccd00e
--- /dev/null
+++ b/lib/librte_ipsec/pad.h
@@ -0,0 +1,45 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _PAD_H_
+#define _PAD_H_
+
+#define IPSEC_MAX_PAD_SIZE	UINT8_MAX
+
+static const uint8_t esp_pad_bytes[IPSEC_MAX_PAD_SIZE] = {
+	1, 2, 3, 4, 5, 6, 7, 8,
+	9, 10, 11, 12, 13, 14, 15, 16,
+	17, 18, 19, 20, 21, 22, 23, 24,
+	25, 26, 27, 28, 29, 30, 31, 32,
+	33, 34, 35, 36, 37, 38, 39, 40,
+	41, 42, 43, 44, 45, 46, 47, 48,
+	49, 50, 51, 52, 53, 54, 55, 56,
+	57, 58, 59, 60, 61, 62, 63, 64,
+	65, 66, 67, 68, 69, 70, 71, 72,
+	73, 74, 75, 76, 77, 78, 79, 80,
+	81, 82, 83, 84, 85, 86, 87, 88,
+	89, 90, 91, 92, 93, 94, 95, 96,
+	97, 98, 99, 100, 101, 102, 103, 104,
+	105, 106, 107, 108, 109, 110, 111, 112,
+	113, 114, 115, 116, 117, 118, 119, 120,
+	121, 122, 123, 124, 125, 126, 127, 128,
+	129, 130, 131, 132, 133, 134, 135, 136,
+	137, 138, 139, 140, 141, 142, 143, 144,
+	145, 146, 147, 148, 149, 150, 151, 152,
+	153, 154, 155, 156, 157, 158, 159, 160,
+	161, 162, 163, 164, 165, 166, 167, 168,
+	169, 170, 171, 172, 173, 174, 175, 176,
+	177, 178, 179, 180, 181, 182, 183, 184,
+	185, 186, 187, 188, 189, 190, 191, 192,
+	193, 194, 195, 196, 197, 198, 199, 200,
+	201, 202, 203, 204, 205, 206, 207, 208,
+	209, 210, 211, 212, 213, 214, 215, 216,
+	217, 218, 219, 220, 221, 222, 223, 224,
+	225, 226, 227, 228, 229, 230, 231, 232,
+	233, 234, 235, 236, 237, 238, 239, 240,
+	241, 242, 243, 244, 245, 246, 247, 248,
+	249, 250, 251, 252, 253, 254, 255,
+};
+
+#endif /* _PAD_H_ */
diff --git a/lib/librte_ipsec/rte_ipsec.h b/lib/librte_ipsec/rte_ipsec.h
new file mode 100644
index 000000000..d1154eede
--- /dev/null
+++ b/lib/librte_ipsec/rte_ipsec.h
@@ -0,0 +1,245 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_IPSEC_H_
+#define _RTE_IPSEC_H_
+
+/**
+ * @file rte_ipsec.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * RTE IPsec support.
+ * librte_ipsec provides a framework for data-path IPsec protocol
+ * processing (ESP/AH).
+ * IKEv2 protocol support right now is out of scope of that draft.
+ * Though it tries to define related API in such way, that it could be adopted
+ * by IKEv2 implementation.
+ */
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_security.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * An opaque structure to represent Security Association (SA).
+ */
+struct rte_ipsec_sa;
+
+/**
+ * SA initialization parameters.
+ */
+struct rte_ipsec_sa_prm {
+
+	uint64_t flowid; /**< provided and interpreted by user */
+	struct rte_security_ipsec_xform ipsec_xform; /**< SA configuration */
+	union {
+		struct {
+			uint8_t hdr_len;     /**< tunnel header len */
+			uint8_t hdr_l3_off;  /**< offset for IPv4/IPv6 header */
+			uint8_t next_proto;  /**< next header protocol */
+			const void *hdr;     /**< tunnel header template */
+		} tun; /**< tunnel mode repated parameters */
+		struct {
+			uint8_t proto;  /**< next header protocol */
+		} trs; /**< transport mode repated parameters */
+	};
+
+	struct {
+		enum rte_security_session_action_type type;
+		struct rte_security_ctx *sctx;
+		struct rte_security_session *sses;
+		uint32_t ol_flags;
+	} sec; /**< rte_security related parameters */
+
+	struct {
+		struct rte_crypto_sym_xform *xform;
+		struct rte_mempool *pool;
+		/**<pool for rte_cryptodev_sym_session */
+		const uint8_t *devid;
+		/**<array of cryptodevs that can be used byt that SA */
+		uint32_t nb_dev; /**< number of elements in devid[] */
+	} crypto; /**< rte_cryptodev related parameters */
+};
+
+/**
+ * SA type is an 64-bit value that contain the following information:
+ * - IP version (IPv4/IPv6)
+ * - IPsec proto (ESP/AH)
+ * - inbound/outbound
+ * - mode (TRANSPORT/TUNNEL)
+ * - for TUNNEL outer IP version (IPv4/IPv6)
+ * - AUTH/CRYPT/AEAD algorithm
+ * ...
+ */
+
+enum {
+	RTE_SATP_LOG_IPV,
+	RTE_SATP_LOG_PROTO,
+	RTE_SATP_LOG_DIR,
+	RTE_SATP_LOG_MODE,
+	RTE_SATP_LOG_USE = RTE_SATP_LOG_MODE + 2,
+	RTE_SATP_LOG_NUM
+};
+
+#define RTE_IPSEC_SATP_IPV_MASK		(1ULL << RTE_SATP_LOG_IPV)
+#define RTE_IPSEC_SATP_IPV4		(0ULL << RTE_SATP_LOG_IPV)
+#define RTE_IPSEC_SATP_IPV6		(1ULL << RTE_SATP_LOG_IPV)
+
+#define RTE_IPSEC_SATP_PROTO_MASK	(1ULL << RTE_SATP_LOG_PROTO)
+#define RTE_IPSEC_SATP_PROTO_AH		(0ULL << RTE_SATP_LOG_PROTO)
+#define RTE_IPSEC_SATP_PROTO_ESP	(1ULL << RTE_SATP_LOG_PROTO)
+
+#define RTE_IPSEC_SATP_DIR_MASK		(1ULL << RTE_SATP_LOG_DIR)
+#define RTE_IPSEC_SATP_DIR_IB		(0ULL << RTE_SATP_LOG_DIR)
+#define RTE_IPSEC_SATP_DIR_OB		(1ULL << RTE_SATP_LOG_DIR)
+
+#define RTE_IPSEC_SATP_MODE_MASK	(3ULL << RTE_SATP_LOG_MODE)
+#define RTE_IPSEC_SATP_MODE_TRANS	(0ULL << RTE_SATP_LOG_MODE)
+#define RTE_IPSEC_SATP_MODE_TUNLV4	(1ULL << RTE_SATP_LOG_MODE)
+#define RTE_IPSEC_SATP_MODE_TUNLV6	(2ULL << RTE_SATP_LOG_MODE)
+
+#define RTE_IPSEC_SATP_USE_MASK		(1ULL << RTE_SATP_LOG_USE)
+#define RTE_IPSEC_SATP_USE_LKSD		(0ULL << RTE_SATP_LOG_USE)
+#define RTE_IPSEC_SATP_USE_INLN		(1ULL << RTE_SATP_LOG_USE)
+
+/**
+ * get type of given SA
+ * @return
+ *   SA type value.
+ */
+uint64_t __rte_experimental
+rte_ipsec_sa_type(const struct rte_ipsec_sa *sa);
+
+/**
+ * initialise SA base on provided input parameters.
+ * @param sa
+ *   SA object to initialise.
+ * @param prm
+ *   Parameters used to initialise given SA object.
+ * @return
+ *   - Zero if operation completed successfully.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int __rte_experimental
+rte_ipsec_sa_init(struct rte_ipsec_sa *sa, const struct rte_ipsec_sa_prm *prm);
+
+/**
+ * cleanup SA
+ * @param sa
+ *   Pointer to SA object to de-initialize.
+ */
+void __rte_experimental
+rte_ipsec_sa_fini(struct rte_ipsec_sa *sa);
+
+/**
+ * get SA size
+ * @return
+ *   size required for rte_ipsec_sa instance.
+ */
+size_t __rte_experimental
+rte_ipsec_sa_size(void);
+
+
+/**
+ * Used to group mbufs by flowid, sa, etc.
+ * See below for particular usages.
+ */
+struct rte_ipsec_group {
+	union {
+		uint64_t flowid;
+		struct rte_ipsec_sa *sa;
+	}; /**< grouped by value */
+	struct rte_mbuf **m;  /**< start of the group */
+	uint32_t cnt;         /**< number of entries in the group */
+	int32_t rc;           /**< status code associated with the group */
+};
+
+/*
+ * For input mbufs and given SA prepare crypto ops that can be enqueued
+ * into the cryptodev associated with given session.
+ * expects that for each input packet:
+ *      - l2_len, l3_len are setup correctly
+ * @param sa
+ *   Pointer to SA object the packets belong to.
+ * @param mb
+ *   The address of an array of *num* pointers to *rte_mbuf* structures
+ *   which contain the input packets.
+ * @param cop
+ *   The address of an array of *num* pointers to the output *rte_crypto_op*
+ *   structures.
+ * @param num
+ *   The maximum number of packets to process.
+ * @return
+ *   Number of successfully processed packets, with error code set in rte_errno.
+ */
+uint16_t __rte_experimental
+rte_ipsec_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	struct rte_crypto_op *cop[], uint16_t num);
+
+/*
+ * Process dequeued from crypto-dev crypto ops, apply necessary
+ * changes to related mbufs and group them by user-defined *flowid*.
+ * Output mbufs will be:
+ * inbound - decrypted & authenticated, ESP(AH) related headers removed,
+ * *l2_len* and *l3_len* fields updated.
+ * outbound - encrypted, ICV attached, IP headers updated,
+ * ESP/AH fields added, related mbuf fields (ol_flags, tx_offloads, etc.)
+ * properly setup.
+ * @param cop
+ *   The address of an array of *num* pointers to the input *rte_crypto_op*
+ *   structures.
+ * @param mb
+ *   The address of an array of *num* pointers to output *rte_mbuf* structures.
+ * @param grp
+ *   The address of an array of *num* to output *rte_ipsec_group* structures.
+ * @param num
+ *   The maximum number of crypto-ops to process.
+ * @return
+ *   Number of filled elements in *grp* array, or if *grp* is NULL, then
+ *   number of filled elements in *mb* array.
+ * Note: input crypto_ops can represent mbufs that belong to different SAs.
+ * So grp parameter allows to return mbufs grouped based on user defined
+ * *flowid*.
+ * If user doesn't want any grouping to be perfomed, he can set grp to NULL.
+ */
+uint16_t __rte_experimental
+rte_ipsec_crypto_process(const struct rte_crypto_op *cop[],
+	struct rte_mbuf *mb[], struct rte_ipsec_group grp[], uint16_t num);
+
+/*
+ * Process packets that are subjects to inline IPsec offload.
+ * It is up to the caller to figure out does given SA and input packets
+ * are eligible to perform inline IPsec.
+ * expects that for each input packet:
+ *      - l2_len, l3_len are setup correctly
+ * Output mbufs will be:
+ * inbound - decrypted & authenticated, ESP(AH) related headers removed,
+ * *l2_len* and *l3_len* fields are updated.
+ * outbound - appropriate mbuf fields (ol_flags, tx_offloads, etc.)
+ * properly setup, if necessary - IP headers updated, ESP(AH) fields added.
+ * @param sa
+ *   Pointer to SA object the packets belong to.
+ * @param mb
+ *   The address of an array of *num* pointers to *rte_mbuf* structures
+ *   which contain the input packets.
+ * @param num
+ *   The maximum number of packets to process.
+ * @return
+ *   Number of successfully processed packets, with error code set in rte_errno.
+ */
+uint16_t __rte_experimental
+rte_ipsec_inline_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	uint16_t num);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_IPSEC_H_ */
diff --git a/lib/librte_ipsec/rte_ipsec_version.map b/lib/librte_ipsec/rte_ipsec_version.map
new file mode 100644
index 000000000..9b79b3ad0
--- /dev/null
+++ b/lib/librte_ipsec/rte_ipsec_version.map
@@ -0,0 +1,13 @@ 
+EXPERIMENTAL {
+	global:
+
+	rte_ipsec_crypto_prepare;
+	rte_ipsec_crypto_process;
+	rte_ipsec_inline_process;
+	rte_ipsec_sa_fini;
+	rte_ipsec_sa_init;
+	rte_ipsec_sa_size;
+	rte_ipsec_sa_type;
+
+	local: *;
+};
diff --git a/lib/librte_ipsec/sa.c b/lib/librte_ipsec/sa.c
new file mode 100644
index 000000000..0c293f40f
--- /dev/null
+++ b/lib/librte_ipsec/sa.c
@@ -0,0 +1,921 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_ipsec.h>
+#include <rte_esp.h>
+#include <rte_ip.h>
+#include <rte_errno.h>
+#include <rte_cryptodev.h>
+#include "pad.h"
+
+#define IPSEC_MAX_HDR_SIZE	64
+#define IPSEC_MAX_IV_SIZE	(2 * sizeof(uint64_t))
+
+#define	IPSEC_MAX_CRYPTO_DEVS	(UINT8_MAX + 1)
+
+/* ??? these definitions probably has to be in rte_crypto_sym.h */
+union sym_op_ofslen {
+	uint64_t raw;
+	struct {
+		uint32_t offset;
+		uint32_t length;
+	};
+};
+
+union sym_op_data {
+	__uint128_t raw;
+	struct {
+		uint8_t *va;
+		rte_iova_t pa;
+	};
+};
+
+struct rte_ipsec_sa {
+	uint64_t type;   /* type of given SA */
+	uint64_t flowid; /* user defined */
+	uint32_t spi;
+	uint32_t salt;
+	uint64_t sqn;
+	uint64_t *iv_ptr;
+	uint8_t aad_len;
+	uint8_t hdr_len;
+	uint8_t hdr_l3_off;
+	uint8_t icv_len;
+	uint8_t iv_len;
+	uint8_t pad_align;
+	uint8_t proto;    /* next proto */
+	/* template for crypto op fields */
+	struct {
+		union sym_op_ofslen cipher;
+		union sym_op_ofslen auth;
+		uint8_t type;
+		uint8_t status;
+		uint8_t sess_type;
+	} ctp;
+	struct {
+		uint64_t v8;
+		uint64_t v[IPSEC_MAX_IV_SIZE / sizeof(uint64_t)];
+	} iv;
+	uint8_t hdr[IPSEC_MAX_HDR_SIZE];
+
+	struct {
+		struct rte_security_session *sec;
+		uint32_t ol_flags;
+		struct rte_security_ctx *sctx;
+
+		/*
+		 * !!! should be removed if we do crypto sym session properly
+		 * bitmap of crypto devs for which that session was initialised.
+		 */
+		rte_ymm_t cdev_bmap;
+
+		/*
+		 * !!! as alternative we need a space in cryptodev_sym_session
+		 * to store ptr to SA (uint64_t udata or so).
+		 */
+		struct rte_cryptodev_sym_session crypto;
+	} session __rte_cache_min_aligned;
+
+} __rte_cache_aligned;
+
+#define	CS_TO_SA(cs)	((cs) - offsetof(struct rte_ipsec_sa, session.crypto))
+
+/* some helper structures */
+struct crypto_xform {
+	struct rte_crypto_auth_xform *auth;
+	struct rte_crypto_cipher_xform *cipher;
+	struct rte_crypto_aead_xform *aead;
+};
+
+static inline struct rte_ipsec_sa *
+cses2sa(uintptr_t p)
+{
+	p -= offsetof(struct rte_ipsec_sa, session.crypto);
+	return (struct rte_ipsec_sa *)p;
+}
+
+static int
+check_crypto_xform(struct crypto_xform *xform)
+{
+	uintptr_t p;
+
+	p = (uintptr_t)xform->auth | (uintptr_t)xform->cipher;
+
+	/* either aead or both auth and cipher should be not NULLs */
+	if (xform->aead) {
+		if (p)
+			return -EINVAL;
+	} else if (p == (uintptr_t)xform->auth) {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+fill_crypto_xform(struct crypto_xform *xform,
+	const struct rte_ipsec_sa_prm *prm)
+{
+	struct rte_crypto_sym_xform *xf;
+
+	memset(xform, 0, sizeof(*xform));
+
+	for (xf = prm->crypto.xform; xf != NULL; xf = xf->next) {
+		if (xf->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+			if (xform->auth != NULL)
+				return -EINVAL;
+			xform->auth = &xf->auth;
+		} else if (xf->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+			if (xform->cipher != NULL)
+				return -EINVAL;
+			xform->cipher = &xf->cipher;
+		} else if (xf->type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+			if (xform->aead != NULL)
+				return -EINVAL;
+			xform->aead = &xf->aead;
+		} else
+			return -EINVAL;
+	}
+
+	return check_crypto_xform(xform);
+}
+
+/*
+ * !!! we might not need session fini - if cryptodev layer would have similar
+ * functionality.
+ */
+static void
+crypto_session_fini(struct rte_ipsec_sa *sa)
+{
+	uint64_t v;
+	size_t sz;
+	uint32_t i, j;
+
+	sz = sizeof(sa->session.cdev_bmap.u64[0]) * CHAR_BIT;
+
+	for (i = 0; i != RTE_DIM(sa->session.cdev_bmap.u64); i++) {
+
+		v = sa->session.cdev_bmap.u64[0];
+		for (j = 0; v != 0; v >>= 1, j++) {
+			if ((v & 1) != 0)
+				rte_cryptodev_sym_session_clear(i * sz + j,
+					&sa->session.crypto);
+		}
+		sa->session.cdev_bmap.u64[i] = 0;
+	}
+}
+
+static int
+crypto_session_init(struct rte_ipsec_sa *sa, const struct rte_ipsec_sa_prm *prm)
+{
+	size_t sz;
+	uint32_t i, k;
+	int32_t rc;
+
+	rc = 0;
+	sz = sizeof(sa->session.cdev_bmap.u64[0]) * CHAR_BIT;
+
+	for (i = 0; i != prm->crypto.nb_dev; i++) {
+
+		k = prm->crypto.devid[i];
+		rc = rte_cryptodev_sym_session_init(k, &sa->session.crypto,
+			prm->crypto.xform, prm->crypto.pool);
+		if (rc != 0)
+			break;
+		sa->session.cdev_bmap.u64[k / sz] |= 1 << (k % sz);
+	}
+
+	return rc;
+}
+
+uint64_t __rte_experimental
+rte_ipsec_sa_type(const struct rte_ipsec_sa *sa)
+{
+	return sa->type;
+}
+
+size_t __rte_experimental
+rte_ipsec_sa_size(void)
+{
+	size_t sz;
+
+	sz = sizeof(struct rte_ipsec_sa) +
+		rte_cryptodev_sym_get_header_session_size();
+	sz = RTE_ALIGN_CEIL(sz,  RTE_CACHE_LINE_SIZE);
+	return sz;
+}
+
+void __rte_experimental
+rte_ipsec_sa_fini(struct rte_ipsec_sa *sa)
+{
+	size_t sz;
+
+	sz = rte_ipsec_sa_size();
+	crypto_session_fini(sa);
+	memset(sa, 0, sz);
+}
+
+static uint64_t
+fill_sa_type(const struct rte_ipsec_sa_prm *prm)
+{
+	uint64_t tp;
+
+	tp = 0;
+
+	if (prm->ipsec_xform.proto == RTE_SECURITY_IPSEC_SA_PROTO_AH)
+		tp |= RTE_IPSEC_SATP_PROTO_AH;
+	else if (prm->ipsec_xform.proto == RTE_SECURITY_IPSEC_SA_PROTO_ESP)
+		tp |= RTE_IPSEC_SATP_PROTO_ESP;
+
+	if (prm->ipsec_xform.direction == RTE_SECURITY_IPSEC_SA_DIR_EGRESS)
+		tp |= RTE_IPSEC_SATP_DIR_OB;
+	else
+		tp |= RTE_IPSEC_SATP_DIR_IB;
+
+	if (prm->ipsec_xform.mode == RTE_SECURITY_IPSEC_SA_MODE_TUNNEL) {
+		if (prm->ipsec_xform.tunnel.type ==
+				RTE_SECURITY_IPSEC_TUNNEL_IPV4)
+			tp |= RTE_IPSEC_SATP_MODE_TUNLV4;
+		else
+			tp |= RTE_IPSEC_SATP_MODE_TUNLV6;
+
+		if (prm->tun.next_proto == IPPROTO_IPIP)
+			tp |= RTE_IPSEC_SATP_IPV4;
+		else if (prm->tun.next_proto == IPPROTO_IPV6)
+			tp |= RTE_IPSEC_SATP_IPV4;
+	} else {
+		tp |= RTE_IPSEC_SATP_MODE_TRANS;
+		if (prm->trs.proto == IPPROTO_IPIP)
+			tp |= RTE_IPSEC_SATP_IPV4;
+		else if (prm->trs.proto == IPPROTO_IPV6)
+			tp |= RTE_IPSEC_SATP_IPV4;
+	}
+
+	/* !!! some inline ipsec support right now */
+	if (prm->sec.type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO)
+		tp |= RTE_IPSEC_SATP_USE_INLN;
+	else
+		tp |= RTE_IPSEC_SATP_USE_LKSD;
+
+	return tp;
+}
+
+static void
+esp_inb_tun_init(struct rte_ipsec_sa *sa)
+{
+	/* these params may differ with new algorithms support */
+	sa->ctp.auth.offset = 0;
+	sa->ctp.auth.length = sa->icv_len;
+	sa->ctp.cipher.offset = sizeof(struct esp_hdr) + sa->iv_len;
+	sa->ctp.cipher.length = sa->ctp.auth.length + sa->ctp.cipher.offset;
+}
+
+static void
+esp_outb_tun_init(struct rte_ipsec_sa *sa, const struct rte_ipsec_sa_prm *prm)
+{
+	sa->hdr_len = prm->tun.hdr_len;
+	sa->hdr_l3_off = prm->tun.hdr_l3_off;
+	memcpy(sa->hdr, prm->tun.hdr, sa->hdr_len);
+
+	/* these params may differ with new algorithms support */
+	sa->ctp.auth.offset = sa->hdr_len;
+	sa->ctp.auth.length = sizeof(struct esp_hdr) + sa->iv_len;
+	sa->ctp.cipher.offset = sa->hdr_len + sizeof(struct esp_hdr);
+	sa->ctp.cipher.length = sa->iv_len;
+}
+
+static int
+esp_tun_init(struct rte_ipsec_sa *sa, const struct rte_ipsec_sa_prm *prm,
+	const struct crypto_xform *cxf)
+{
+	int32_t rc = 0;
+
+	if (cxf->aead != NULL) {
+		if (cxf->aead->algo != RTE_CRYPTO_AEAD_AES_GCM)
+			return -EINVAL;
+		sa->aad_len = cxf->aead->aad_length;
+		sa->icv_len = cxf->aead->digest_length;
+		sa->iv_len = cxf->aead->iv.length;
+		sa->iv_ptr = sa->iv.v;
+		sa->pad_align = 4;
+	} else {
+		sa->aad_len = 0;
+		sa->icv_len = cxf->auth->digest_length;
+		if (cxf->cipher->algo == RTE_CRYPTO_CIPHER_NULL) {
+			sa->pad_align = 4;
+			sa->iv_len = 0;
+			sa->iv_ptr = sa->iv.v;
+		} else if (cxf->cipher->algo == RTE_CRYPTO_CIPHER_AES_CBC) {
+			sa->pad_align = sizeof(sa->iv.v);
+			sa->iv_len = sizeof(sa->iv.v);
+			sa->iv_ptr = sa->iv.v;
+			memset(sa->iv.v, 0, sizeof(sa->iv.v));
+		} else
+			return -EINVAL;
+	}
+
+	sa->type = fill_sa_type(prm);
+	sa->flowid = prm->flowid;
+	sa->spi = rte_cpu_to_be_32(prm->ipsec_xform.spi);
+	sa->salt = prm->ipsec_xform.salt;
+	sa->sqn = 0;
+
+	sa->proto = prm->tun.next_proto;
+
+	if ((sa->type & RTE_IPSEC_SATP_DIR_MASK) == RTE_IPSEC_SATP_DIR_IB)
+		esp_inb_tun_init(sa);
+	else
+		esp_outb_tun_init(sa, prm);
+
+	sa->ctp.type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+	sa->ctp.status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED;
+	sa->ctp.sess_type = RTE_CRYPTO_OP_WITH_SESSION;
+
+	/* pass info required for inline outbound */
+	sa->session.sctx = prm->sec.sctx;
+	sa->session.sec = prm->sec.sses;
+	sa->session.ol_flags = prm->sec.ol_flags;
+
+	if ((sa->type & RTE_IPSEC_SATP_USE_MASK) != RTE_IPSEC_SATP_USE_INLN)
+		rc = crypto_session_init(sa, prm);
+	return rc;
+}
+
+int __rte_experimental
+rte_ipsec_sa_init(struct rte_ipsec_sa *sa, const struct rte_ipsec_sa_prm *prm)
+{
+	int32_t rc;
+	struct crypto_xform cxf;
+
+	if (sa == NULL || prm == NULL)
+		return -EINVAL;
+
+	/* only esp inbound and outbound tunnel is supported right now */
+	if (prm->ipsec_xform.proto != RTE_SECURITY_IPSEC_SA_PROTO_ESP ||
+			prm->ipsec_xform.mode !=
+			RTE_SECURITY_IPSEC_SA_MODE_TUNNEL)
+		return -EINVAL;
+
+	/* only inline crypto or none action type are supported */
+	if (!(prm->sec.type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
+			prm->sec.type == RTE_SECURITY_ACTION_TYPE_NONE))
+		return -EINVAL;
+
+	if (prm->tun.hdr_len > sizeof(sa->hdr))
+		return -EINVAL;
+
+	rc = fill_crypto_xform(&cxf, prm);
+	if (rc != 0)
+		return rc;
+
+	rc = esp_tun_init(sa, prm, &cxf);
+	if (rc != 0)
+		rte_ipsec_sa_fini(sa);
+
+	return rc;
+}
+
+static inline void
+esp_outb_tun_cop_prepare(struct rte_crypto_op *cop,
+	const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	const union sym_op_data *icv, uint32_t plen)
+{
+	struct rte_crypto_sym_op *sop;
+
+	cop->type = sa->ctp.type;
+	cop->status = sa->ctp.status;
+	cop->sess_type = sa->ctp.sess_type;
+
+	sop = cop->sym;
+
+	/* fill sym op fields */
+	sop->session = (void *)(uintptr_t)&sa->session.crypto;
+	sop->m_src = mb;
+
+	sop->cipher.data.offset = sa->ctp.cipher.offset;
+	sop->cipher.data.length = sa->ctp.cipher.length + plen;
+	sop->auth.data.offset = sa->ctp.auth.offset;
+	sop->auth.data.length = sa->ctp.auth.length + plen;
+	sop->auth.digest.data = icv->va;
+	sop->auth.digest.phys_addr = icv->pa;
+
+	/* !!! fill sym op aead fields */
+}
+
+static inline int32_t
+esp_outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	union sym_op_data *icv)
+{
+	uint32_t clen, hlen, pdlen, pdofs, tlen;
+	struct rte_mbuf *ml;
+	struct esp_hdr *esph;
+	struct esp_tail *espt;
+	char *ph, *pt;
+	uint64_t *iv;
+
+	hlen = sa->hdr_len + sa->iv_len + sizeof(*esph);
+	/* calculate padding and tail space required */
+
+	/* number of bytes to encrypt */
+	clen = mb->pkt_len + sizeof(*espt);
+	clen = RTE_ALIGN_CEIL(clen, sa->pad_align);
+
+	/* pad length + esp tail */
+	pdlen = clen - mb->pkt_len;
+	tlen = pdlen + sa->icv_len;
+
+	/* do append and prepend */
+	ml = rte_pktmbuf_lastseg(mb);
+	if (tlen + sa->aad_len > rte_pktmbuf_tailroom(ml))
+		return -ENOSPC;
+
+	/* prepend header */
+	ph = rte_pktmbuf_prepend(mb, hlen);
+	if (ph == NULL)
+		return -ENOSPC;
+
+	/* append tail */
+	pdofs = ml->data_len;
+	ml->data_len += tlen;
+	mb->pkt_len += tlen;
+	pt = rte_pktmbuf_mtod_offset(ml, typeof(pt), pdofs);
+
+	/* copy tunnel pkt header */
+	rte_memcpy(ph, sa->hdr, sa->hdr_len);
+
+	/* update original and new ip header fields */
+	if (sa->type & RTE_IPSEC_SATP_MODE_TUNLV4) {
+		struct ipv4_hdr *l3h;
+		l3h = (struct ipv4_hdr *)(ph + sa->hdr_l3_off);
+		l3h->packet_id = rte_cpu_to_be_16(sa->sqn);
+		l3h->total_length = rte_cpu_to_be_16(mb->pkt_len -
+			sa->hdr_l3_off);
+	} else {
+		struct ipv6_hdr *l3h;
+		l3h = (struct ipv6_hdr *)(ph + sa->hdr_l3_off);
+		l3h->payload_len = rte_cpu_to_be_16(mb->pkt_len -
+			sa->hdr_l3_off - sizeof(*l3h));
+	}
+
+	/* update spi, seqn and iv */
+	esph = (struct esp_hdr *)(ph + sa->hdr_len);
+	iv = (uint64_t *)(esph + 1);
+
+	esph->spi = sa->spi;
+	esph->seq = rte_cpu_to_be_32(sa->sqn);
+	rte_memcpy(iv, sa->iv_ptr, sa->iv_len);
+
+	/* offset for ICV */
+	pdofs += pdlen;
+
+	/* pad length */
+	pdlen -= sizeof(*espt);
+
+	/* copy padding data */
+	rte_memcpy(pt, esp_pad_bytes, pdlen);
+
+	/* update esp trailer */
+	espt = (struct esp_tail *)(pt + pdlen);
+	espt->pad_len = pdlen;
+	espt->next_proto = sa->proto;
+
+	/* !!! fill aad fields, if any (aad fields are placed after icv */
+
+	icv->va = rte_pktmbuf_mtod_offset(ml, void *, pdofs);
+	icv->pa = rte_pktmbuf_iova_offset(ml, pdofs);
+
+	return clen;
+}
+
+static inline uint32_t
+esn_outb_check_sqn(struct rte_ipsec_sa *sa, uint32_t num)
+{
+	RTE_SET_USED(sa);
+	return num;
+}
+
+static inline int
+esn_inb_check_sqn(struct rte_ipsec_sa *sa, uint32_t sqn)
+{
+	RTE_SET_USED(sa);
+	RTE_SET_USED(sqn);
+	return 0;
+}
+
+static inline uint16_t
+esp_outb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	struct rte_crypto_op *cop[], uint16_t num)
+{
+	int32_t rc;
+	uint32_t i, n;
+	union sym_op_data icv;
+
+	n = esn_outb_check_sqn(sa, num);
+
+	for (i = 0; i != n; i++) {
+
+		sa->sqn++;
+		sa->iv.v8 = rte_cpu_to_be_64(sa->sqn);
+
+		/* update the packet itself */
+		rc = esp_outb_tun_pkt_prepare(sa, mb[i], &icv);
+		if (rc < 0) {
+			rte_errno = -rc;
+			break;
+		}
+
+		/* update crypto op */
+		esp_outb_tun_cop_prepare(cop[i], sa, mb[i], &icv, rc);
+	}
+
+	return i;
+}
+
+static inline int32_t
+esp_inb_tun_cop_prepare(struct rte_crypto_op *cop,
+	const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	const union sym_op_data *icv, uint32_t pofs, uint32_t plen)
+{
+	struct rte_crypto_sym_op *sop;
+	uint64_t *ivc, *ivp;
+	uint32_t clen;
+
+	clen = plen - sa->ctp.cipher.length;
+	if ((int32_t)clen < 0 || (clen & (sa->pad_align - 1)) != 0)
+		return -EINVAL;
+
+	cop->type = sa->ctp.type;
+	cop->status = sa->ctp.status;
+	cop->sess_type = sa->ctp.sess_type;
+
+	sop = cop->sym;
+
+	/* fill sym op fields */
+	sop->session = (void *)(uintptr_t)&sa->session.crypto;
+	sop->m_src = mb;
+
+	sop->cipher.data.offset = pofs + sa->ctp.cipher.offset;
+	sop->cipher.data.length = clen;
+	sop->auth.data.offset = pofs + sa->ctp.auth.offset;
+	sop->auth.data.length = plen - sa->ctp.auth.length;
+	sop->auth.digest.data = icv->va;
+	sop->auth.digest.phys_addr = icv->pa;
+
+	/* !!! fill sym op aead fields */
+
+	/* copy iv from the input packet to the cop */
+	ivc = (uint64_t *)(sop + 1);
+	ivp = rte_pktmbuf_mtod_offset(mb, uint64_t *,
+			pofs + sizeof(struct esp_hdr));
+	rte_memcpy(ivc, ivp, sa->iv_len);
+	return 0;
+}
+
+static inline int32_t
+esp_inb_tun_pkt_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	uint32_t hlen, union sym_op_data *icv)
+{
+	struct rte_mbuf *ml;
+	uint32_t icv_ofs, plen;
+
+	plen = mb->pkt_len;
+	plen = plen - hlen;
+
+	ml = rte_pktmbuf_lastseg(mb);
+	icv_ofs = ml->data_len - sa->icv_len;
+
+	icv->va = rte_pktmbuf_mtod_offset(ml, void *, icv_ofs);
+	icv->pa = rte_pktmbuf_iova_offset(ml, icv_ofs);
+
+	return plen;
+}
+
+static inline uint16_t
+esp_inb_tun_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	struct rte_crypto_op *cop[], uint16_t num)
+{
+	int32_t rc;
+	uint32_t i, hl;
+	union sym_op_data icv;
+
+	for (i = 0; i != num; i++) {
+
+		hl = mb[i]->l2_len + mb[i]->l3_len;
+		rc = esp_inb_tun_pkt_prepare(sa, mb[i], hl, &icv);
+
+		if (rc >= 0)
+			rc = esp_inb_tun_cop_prepare(cop[i], sa, mb[i], &icv,
+				hl, rc);
+		if (rc < 0) {
+			rte_errno = -rc;
+			break;
+		}
+	}
+
+	return i;
+}
+
+uint16_t __rte_experimental
+rte_ipsec_crypto_prepare(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	struct rte_crypto_op *cop[], uint16_t num)
+{
+	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
+				RTE_IPSEC_SATP_MODE_MASK;
+
+	switch (sa->type & msk) {
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		return esp_inb_tun_prepare(sa, mb, cop, num);
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		return esp_outb_tun_prepare(sa, mb, cop, num);
+	default:
+		rte_errno = ENOTSUP;
+		return 0;
+	}
+}
+
+/*
+ * !!! create something more generic (and smarter)
+ * ideally in librte_mbuf
+ */
+static inline void
+free_mbuf_bulk(struct rte_mbuf *mb[], uint32_t num)
+{
+	uint32_t i;
+
+	for (i = 0; i != num; i++)
+		rte_pktmbuf_free(mb[i]);
+}
+
+/* exclude NULLs from the final list of packets. */
+static inline uint32_t
+compress_pkt_list(struct rte_mbuf *pkt[], uint32_t nb_pkt, uint32_t nb_zero)
+{
+	uint32_t i, j, k, l;
+
+	for (j = nb_pkt; nb_zero != 0 && j-- != 0; ) {
+
+		/* found a hole. */
+		if (pkt[j] == NULL) {
+
+			/* find how big is it. */
+			for (i = j; i-- != 0 && pkt[i] == NULL; )
+				;
+			/* fill the hole. */
+			for (k = j + 1, l = i + 1; k != nb_pkt; k++, l++)
+				pkt[l] = pkt[k];
+
+			nb_pkt -= j - i;
+			nb_zero -= j - i;
+			j = i + 1;
+		}
+	}
+
+	return nb_pkt;
+}
+
+static inline int
+esp_inb_tun_single_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	uint32_t icv_len)
+{
+	uint32_t hlen, tlen;
+	struct esp_hdr *esph;
+	struct esp_tail *espt;
+	struct rte_mbuf *ml;
+	char *pd;
+
+	ml = rte_pktmbuf_lastseg(mb);
+	espt = rte_pktmbuf_mtod_offset(ml, struct esp_tail *,
+		ml->data_len - icv_len - sizeof(*espt));
+
+	/* cut of ICV, ESP tail and padding bytes */
+	tlen = icv_len + sizeof(*espt) + espt->pad_len;
+	ml->data_len -= tlen;
+	mb->pkt_len -= tlen;
+
+	/* cut of L2/L3 headers, ESP header and IV */
+	hlen = mb->l2_len + mb->l3_len;
+	esph = rte_pktmbuf_mtod_offset(mb, struct esp_hdr *, hlen);
+	rte_pktmbuf_adj(mb, hlen + sa->ctp.cipher.offset);
+
+	/* reset mbuf metatdata: L2/L3 len, packet type */
+	mb->packet_type = RTE_PTYPE_UNKNOWN;
+	mb->l2_len = 0;
+	mb->l3_len = 0;
+
+	/* clear the PKT_RX_SEC_OFFLOAD flag if set */
+	mb->ol_flags &= ~(mb->ol_flags & PKT_RX_SEC_OFFLOAD);
+
+	/*
+	 * check spi, sqn, padding and next proto.
+	 * drop packet if something is wrong.
+	 * ??? consider move spi and sqn check to prepare.
+	 */
+
+	pd = (char *)espt - espt->pad_len;
+	if (esph->spi != sa->spi ||
+			esn_inb_check_sqn(sa, esph->seq) != 0 ||
+			 espt->next_proto != sa->proto ||
+			memcmp(pd, esp_pad_bytes, espt->pad_len))
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline uint16_t
+esp_inb_tun_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	struct rte_mbuf *dr[], uint16_t num)
+{
+	uint32_t i, k, icv_len;
+
+	icv_len = sa->icv_len;
+
+	k = 0;
+	for (i = 0; i != num; i++) {
+		if (esp_inb_tun_single_pkt_process(sa, mb[i], icv_len)) {
+			dr[k++] = mb[i];
+			mb[i] = NULL;
+		}
+	}
+
+	if (k != 0)
+		compress_pkt_list(mb, num, k);
+
+	return num - k;
+}
+
+static inline uint16_t
+esp_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	struct rte_mbuf *dr[], uint16_t num)
+{
+	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
+			RTE_IPSEC_SATP_MODE_MASK;
+
+	switch (sa->type & msk) {
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		return esp_inb_tun_pkt_process(sa, mb, dr, num);
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		return num;
+	default:
+		return 0;
+	}
+}
+
+static inline uint16_t
+esp_process(const struct rte_crypto_op *cop[], struct rte_mbuf *mb[],
+	struct rte_ipsec_group grp[], uint16_t num)
+{
+	uint32_t cnt, i, j, k, n;
+	uintptr_t ns, ps;
+	struct rte_ipsec_sa *sa;
+	struct rte_mbuf *m, *dr[num];
+
+	j = 0;
+	k = 0;
+	n = 0;
+	ps = 0;
+
+	for (i = 0; i != num; i++) {
+
+		m = cop[i]->sym[0].m_src;
+		ns = (uintptr_t)cop[i]->sym[0].session;
+
+		if (cop[i]->status != RTE_CRYPTO_OP_STATUS_SUCCESS) {
+			dr[k++] = m;
+			continue;
+		}
+
+		if (ps != ns) {
+
+			if (ps != 0) {
+				sa = cses2sa(ps);
+
+				/* setup count for current group */
+				grp[n].cnt = mb + j - grp[n].m;
+
+				/* do SA type specific processing */
+				cnt = esp_pkt_process(sa, grp[n].m, dr + k,
+					grp[n].cnt);
+
+				/* some packets were dropped */
+				cnt = grp[n].cnt - cnt;
+				if (cnt != 0) {
+					grp[n].cnt -= cnt;
+					j -= cnt;
+					k += cnt;
+				}
+
+				/* open new group */
+				n++;
+			}
+
+			grp[n].flowid = cses2sa(ns)->flowid;
+			grp[n].m = mb + j;
+			ps = ns;
+		}
+
+		mb[j++] = m;
+	}
+
+	if (ps != 0) {
+		sa = cses2sa(ps);
+		grp[n].cnt = mb + j - grp[n].m;
+		cnt = esp_pkt_process(sa, grp[n].m, dr + k, grp[n].cnt);
+		cnt = grp[n].cnt - cnt;
+		if (cnt != 0) {
+			grp[n].cnt -= cnt;
+			j -= cnt;
+			k += cnt;
+		}
+		n++;
+	}
+
+	if (k != 0)
+		free_mbuf_bulk(dr, k);
+
+	return n;
+}
+
+uint16_t __rte_experimental
+rte_ipsec_crypto_process(const struct rte_crypto_op *cop[],
+	struct rte_mbuf *mb[], struct rte_ipsec_group grp[], uint16_t num)
+{
+	return esp_process(cop, mb, grp, num);
+}
+
+static inline uint16_t
+inline_outb_tun_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	uint16_t num)
+{
+	uint32_t i;
+	struct rte_mbuf *m;
+	int rc;
+	union sym_op_data icv;
+
+	for (i = 0; i != num; i++) {
+		m = mb[i];
+
+		sa->sqn++;
+		sa->iv.v8 = rte_cpu_to_be_64(sa->sqn);
+
+		/* update the packet itself */
+		rc = esp_outb_tun_pkt_prepare(sa, m, &icv);
+		if (rc < 0) {
+			rte_errno = -rc;
+			break;
+		}
+
+		m->ol_flags |= PKT_TX_SEC_OFFLOAD;
+
+		if (sa->session.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA)
+			rte_security_set_pkt_metadata(sa->session.sctx,
+				sa->session.sec, m, NULL);
+	}
+
+	return i;
+}
+
+static inline uint16_t
+inline_inb_tun_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	uint16_t num)
+{
+	uint32_t i, icv_len;
+	int rc;
+
+	icv_len = sa->icv_len;
+
+	for (i = 0; i != num; i++) {
+		rc = esp_inb_tun_single_pkt_process(sa, mb[i], icv_len);
+		if (rc != 0) {
+			rte_errno = -rc;
+			break;
+		}
+	}
+
+	return i;
+}
+
+uint16_t __rte_experimental
+rte_ipsec_inline_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
+	uint16_t num)
+{
+	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
+			RTE_IPSEC_SATP_MODE_MASK;
+
+	switch (sa->type & msk) {
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		return inline_inb_tun_pkt_process(sa, mb, num);
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		return inline_outb_tun_pkt_process(sa, mb, num);
+	default:
+		rte_errno = ENOTSUP;
+	}
+
+	return 0;
+}
diff --git a/lib/librte_net/rte_esp.h b/lib/librte_net/rte_esp.h
index f77ec2eb2..8e1b3d2dd 100644
--- a/lib/librte_net/rte_esp.h
+++ b/lib/librte_net/rte_esp.h
@@ -11,7 +11,7 @@ 
  * ESP-related defines
  */
 
-#include <stdint.h>
+#include <rte_byteorder.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -25,6 +25,14 @@  struct esp_hdr {
 	rte_be32_t seq;  /**< packet sequence number */
 } __attribute__((__packed__));
 
+/**
+ * ESP Trailer
+ */
+struct esp_tail {
+	uint8_t pad_len;     /**< number of pad bytes (0-255) */
+	uint8_t next_proto;  /**< IPv4 or IPv6 or next layer header */
+} __attribute__((__packed__));
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/meson.build b/lib/meson.build
index eb91f100b..bb07e67bd 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -21,6 +21,8 @@  libraries = [ 'compat', # just a header, used for versioning
 	'kni', 'latencystats', 'lpm', 'member',
 	'meter', 'power', 'pdump', 'rawdev',
 	'reorder', 'sched', 'security', 'vhost',
+	# ipsec lib depends on crypto and security
+	'ipsec',
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index de33883be..7f4344ecd 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -62,6 +62,8 @@  ifeq ($(CONFIG_RTE_LIBRTE_BPF_ELF),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lelf
 endif
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_IPSEC)            += -lrte_ipsec
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile