[dpdk-dev,v7,06/10] mbuf_offload: library to support attaching offloads to a mbuf

Message ID 1447441090-8129-7-git-send-email-declan.doherty@intel.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Doherty, Declan Nov. 13, 2015, 6:58 p.m. UTC
  This library add support for adding a chain of offload operations to a
mbuf. It contains the definition of the rte_mbuf_offload structure as
well as helper functions for attaching  offloads to mbufs and a mempool
management functions.

This initial implementation supports attaching multiple offload
operations to a single mbuf, but only a single offload operation of a
specific type can be attach to that mbuf.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_bsdapp                               |   6 +
 config/common_linuxapp                             |   6 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 lib/Makefile                                       |   1 +
 lib/librte_mbuf/rte_mbuf.h                         |   6 +
 lib/librte_mbuf_offload/Makefile                   |  52 ++++
 lib/librte_mbuf_offload/rte_mbuf_offload.c         | 100 +++++++
 lib/librte_mbuf_offload/rte_mbuf_offload.h         | 302 +++++++++++++++++++++
 .../rte_mbuf_offload_version.map                   |   7 +
 mk/rte.app.mk                                      |   1 +
 12 files changed, 487 insertions(+)
 create mode 100644 lib/librte_mbuf_offload/Makefile
 create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload.c
 create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload.h
 create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload_version.map
  

Comments

Olivier Matz Nov. 20, 2015, 3:27 p.m. UTC | #1
Hi Declan,

Please find some comments below.

On 11/13/2015 07:58 PM, Declan Doherty wrote:
> This library add support for adding a chain of offload operations to a
> mbuf. It contains the definition of the rte_mbuf_offload structure as
> well as helper functions for attaching  offloads to mbufs and a mempool
> management functions.
> 
> This initial implementation supports attaching multiple offload
> operations to a single mbuf, but only a single offload operation of a
> specific type can be attach to that mbuf.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  MAINTAINERS                                        |   4 +
>  config/common_bsdapp                               |   6 +
>  config/common_linuxapp                             |   6 +
>  doc/api/doxy-api-index.md                          |   1 +
>  doc/api/doxy-api.conf                              |   1 +
>  lib/Makefile                                       |   1 +
>  lib/librte_mbuf/rte_mbuf.h                         |   6 +
>  lib/librte_mbuf_offload/Makefile                   |  52 ++++
>  lib/librte_mbuf_offload/rte_mbuf_offload.c         | 100 +++++++
>  lib/librte_mbuf_offload/rte_mbuf_offload.h         | 302 +++++++++++++++++++++
>  .../rte_mbuf_offload_version.map                   |   7 +
>  mk/rte.app.mk                                      |   1 +
>  12 files changed, 487 insertions(+)
>  create mode 100644 lib/librte_mbuf_offload/Makefile
>  create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload.c
>  create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload.h
>  create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload_version.map

The new files are called rte_mbuf_offload, but from what I understand,
it is more like a mbuf metadata api. What you call "offload operation"
is not called because an offload is attached, but because you call
rte_cryptodev_enqueue_burst() on it.

From what I see, it is not said in the help of
rte_cryptodev_enqueue_burst() that the offload structure has to be
set in the given mbufs.

Is my understanding correct?


> +/**
> + * @file
> + * RTE mbuf offload
> + *
> + * The rte_mbuf_offload library provides the ability to specify a device generic
> + * off-load operation independent of the current Rx/Tx Ethernet offloads
> + * supported within the rte_mbuf structure, and add supports for multiple
> + * off-load operations and offload device types.
> + *
> + * The rte_mbuf_offload specifies the particular off-load operation type, such
> + * as a crypto operation, and provides a container for the operations
> + * parameter's inside the op union. These parameters are then used by the
> + * device which supports that operation to perform the specified offload.
> + *
> + * This library provides an API to create pre-allocated mempool of offload
> + * operations, with supporting allocate and free functions. It also provides
> + * APIs for attaching an offload to a mbuf, as well as an API to retrieve a
> + * specified offload type from an mbuf offload chain.
> + */
> +
> +#include <rte_mbuf.h>
> +#include <rte_crypto.h>
> +
> +
> +/** packet mbuf offload operation types */
> +enum rte_mbuf_ol_op_type {
> +	RTE_PKTMBUF_OL_NOT_SPECIFIED = 0,
> +	/**< Off-load not specified */
> +	RTE_PKTMBUF_OL_CRYPTO
> +	/**< Crypto offload operation */
> +};

Here, the mbuf offload API is presented as a generic API for offload.
But it includes rte_crypto. Actually, it means that at the end this
API needs to be aware of any offload type.

Wouldn't it be possible to hide the knowledge of the metadata structure
to this API?


> +/**
> + * Attach a rte_mbuf_offload to a mbuf. We only support a single offload of any
> + * one type in our chain of offloads.
> + *
> + * @param	m	packet mbuf.
> + * @param	ol	rte_mbuf_offload strucutre to be attached
> + *
> + * @returns
> + * - On success returns the pointer to the offload we just added
> + * - On failure returns NULL
> + */
> +static inline struct rte_mbuf_offload *
> +rte_pktmbuf_offload_attach(struct rte_mbuf *m, struct rte_mbuf_offload *ol)
> +{
> +	struct rte_mbuf_offload **ol_last;
> +
> +	for (ol_last = &m->offload_ops;	ol_last[0] != NULL;
> +			ol_last = &ol_last[0]->next)
> +		if (ol_last[0]->type == ol->type)
> +			return NULL;
> +
> +	ol_last[0] = ol;
> +	ol_last[0]->m = m;
> +	ol_last[0]->next = NULL;
> +
> +	return ol_last[0];
> +}

*ol_last would be clearer than ol_last[0] as it's not a table.
Here we expect that m->offload_ops == NULL at mbuf initialization.
I cannot find where it is initialized.

> +
> +
> +/** Rearms rte_mbuf_offload default parameters */
> +static inline void
> +__rte_pktmbuf_offload_reset(struct rte_mbuf_offload *ol,
> +		enum rte_mbuf_ol_op_type type)
> +{
> +	ol->m = NULL;
> +	ol->type = type;
> +
> +	switch (type) {
> +	case RTE_PKTMBUF_OL_CRYPTO:
> +		__rte_crypto_op_reset(&ol->op.crypto); break;
> +	default:
> +		break;
> +	}
> +}

Would it work if several OL are registered?


Also, what is not clear to me is how the offload structure is freed.
For instance, I think that calling rte_pktmbuf_free(m) on a mbuf
that has a offload attached would result in a leak.

It would mean that it is not allowed to call any function that free or
reassemble a mbuf when an offload is attached.

I'm wondering if there is such a leak in l2fwd_crypto_send_burst():

	/* <<<<<<< here packets have the offload attached */
	ret = rte_cryptodev_enqueue_burst(cparams->dev_id,
		cparams->qp_id, pkt_buffer, (uint16_t) n);
	crypto_statistics[cparams->dev_id].enqueued += ret;
	if (unlikely(ret < n)) {
		crypto_statistics[cparams->dev_id].errors += (n - ret);
		do {
			/* <<<<<<<<< offload struct is lost? */
			rte_pktmbuf_free(pkt_buffer[ret]);
		} while (++ret < n);
	}


It seems that these offload structures are only used to pass crypto
info to the cryptodev. Would it be a problem to have an API like this?

  rx_burst(port, q, mbuf_tab, crypto_tab, n);

Or even:

  rx_burst(port, q, crypto_tab, n);

  with each *cryto_tab pointing to a mbuf


If we really want to use mbufs to store the crypto info (is it
something we want? for pipelining?), another idea would be to use
the usual metadata after the mbuf. It may require to enhance this
metadata framework to allow to better register room for metadata,
because today anyone using metadata expects that its metadata are
the only ones.

Pros:
 - no additional allocation for crypto offload struct
 - no risk of leak
Cons:
 - room is reserved for crypto in all mbufs even if no crypto
   is done on them


Regards,
Olivier
  
Doherty, Declan Nov. 20, 2015, 5:26 p.m. UTC | #2
Hey Oliver, see reply inline.


On 20/11/15 15:27, Olivier MATZ wrote:
> Hi Declan,
>
> Please find some comments below.
>
> On 11/13/2015 07:58 PM, Declan Doherty wrote:
>> This library add support for adding a chain of offload operations to a
>> mbuf. It contains the definition of the rte_mbuf_offload structure as
>> well as helper functions for attaching  offloads to mbufs and a mempool
>> management functions.
>>
>> This initial implementation supports attaching multiple offload
>> operations to a single mbuf, but only a single offload operation of a
>> specific type can be attach to that mbuf.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
>>   MAINTAINERS                                        |   4 +
>>   config/common_bsdapp                               |   6 +
>>   config/common_linuxapp                             |   6 +
>>   doc/api/doxy-api-index.md                          |   1 +
>>   doc/api/doxy-api.conf                              |   1 +
>>   lib/Makefile                                       |   1 +
>>   lib/librte_mbuf/rte_mbuf.h                         |   6 +
>>   lib/librte_mbuf_offload/Makefile                   |  52 ++++
>>   lib/librte_mbuf_offload/rte_mbuf_offload.c         | 100 +++++++
>>   lib/librte_mbuf_offload/rte_mbuf_offload.h         | 302 +++++++++++++++++++++
>>   .../rte_mbuf_offload_version.map                   |   7 +
>>   mk/rte.app.mk                                      |   1 +
>>   12 files changed, 487 insertions(+)
>>   create mode 100644 lib/librte_mbuf_offload/Makefile
>>   create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload.c
>>   create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload.h
>>   create mode 100644 lib/librte_mbuf_offload/rte_mbuf_offload_version.map
>
> The new files are called rte_mbuf_offload, but from what I understand,
> it is more like a mbuf metadata api. What you call "offload operation"
> is not called because an offload is attached, but because you call
> rte_cryptodev_enqueue_burst() on it.

Maybe rte_mbuf_offload_metadata would be a better name, if not a bit 
more long winded :) The idea of this API set is to give a generic 
framework for attaching the the offload operation meta data to a mbuf 
which will be retrieved at a later point, when the particular offload 
burst function is called. I guess as we only have a single offload 
device at the moment the API may look a little over the top!


>
>  From what I see, it is not said in the help of
> rte_cryptodev_enqueue_burst() that the offload structure has to be
> set in the given mbufs.
>
I need to update the API documentation to make that explicit.

> Is my understanding correct?
>
>
>> +/**
>> + * @file
>> + * RTE mbuf offload
>> + *
>> + * The rte_mbuf_offload library provides the ability to specify a device generic
>> + * off-load operation independent of the current Rx/Tx Ethernet offloads
>> + * supported within the rte_mbuf structure, and add supports for multiple
>> + * off-load operations and offload device types.
>> + *
>> + * The rte_mbuf_offload specifies the particular off-load operation type, such
>> + * as a crypto operation, and provides a container for the operations
>> + * parameter's inside the op union. These parameters are then used by the
>> + * device which supports that operation to perform the specified offload.
>> + *
>> + * This library provides an API to create pre-allocated mempool of offload
>> + * operations, with supporting allocate and free functions. It also provides
>> + * APIs for attaching an offload to a mbuf, as well as an API to retrieve a
>> + * specified offload type from an mbuf offload chain.
>> + */
>> +
>> +#include <rte_mbuf.h>
>> +#include <rte_crypto.h>
>> +
>> +
>> +/** packet mbuf offload operation types */
>> +enum rte_mbuf_ol_op_type {
>> +	RTE_PKTMBUF_OL_NOT_SPECIFIED = 0,
>> +	/**< Off-load not specified */
>> +	RTE_PKTMBUF_OL_CRYPTO
>> +	/**< Crypto offload operation */
>> +};
>
> Here, the mbuf offload API is presented as a generic API for offload.
> But it includes rte_crypto. Actually, it means that at the end this
> API needs to be aware of any offload type.
>
> Wouldn't it be possible to hide the knowledge of the metadata structure
> to this API?

The design makes the offload API aware of the underlying offload 
operations, but I don't see this as a problem, the main idea was to have 
a no dependencies other than the structure pointer added to the 
rte_mbuf, while also making the offload extensible in the future. Also 
this approach makes the management of the offload elements very straight 
forward, with very simplified pool management functions and also there 
isn't the need for another level of indirection to the actual offload 
operation.

>
>
>> +/**
>> + * Attach a rte_mbuf_offload to a mbuf. We only support a single offload of any
>> + * one type in our chain of offloads.
>> + *
>> + * @param	m	packet mbuf.
>> + * @param	ol	rte_mbuf_offload strucutre to be attached
>> + *
>> + * @returns
>> + * - On success returns the pointer to the offload we just added
>> + * - On failure returns NULL
>> + */
>> +static inline struct rte_mbuf_offload *
>> +rte_pktmbuf_offload_attach(struct rte_mbuf *m, struct rte_mbuf_offload *ol)
>> +{
>> +	struct rte_mbuf_offload **ol_last;
>> +
>> +	for (ol_last = &m->offload_ops;	ol_last[0] != NULL;
>> +			ol_last = &ol_last[0]->next)
>> +		if (ol_last[0]->type == ol->type)
>> +			return NULL;
>> +
>> +	ol_last[0] = ol;
>> +	ol_last[0]->m = m;
>> +	ol_last[0]->next = NULL;
>> +
>> +	return ol_last[0];
>> +}
>
> *ol_last would be clearer than ol_last[0] as it's not a table.

Just a personal preference, but I can submit a change later.

> Here we expect that m->offload_ops == NULL at mbuf initialization.
> I cannot find where it is initialized.
>

I'll need to add the initialization to rte_pktmbuf_reset()

>> +
>> +
>> +/** Rearms rte_mbuf_offload default parameters */
>> +static inline void
>> +__rte_pktmbuf_offload_reset(struct rte_mbuf_offload *ol,
>> +		enum rte_mbuf_ol_op_type type)
>> +{
>> +	ol->m = NULL;
>> +	ol->type = type;
>> +
>> +	switch (type) {
>> +	case RTE_PKTMBUF_OL_CRYPTO:
>> +		__rte_crypto_op_reset(&ol->op.crypto); break;
>> +	default:
>> +		break;
>> +	}
>> +}
>
> Would it work if several OL are registered?
>

I can't see any reason why it wouldn't

>
> Also, what is not clear to me is how the offload structure is freed.
> For instance, I think that calling rte_pktmbuf_free(m) on a mbuf
> that has a offload attached would result in a leak.
>
> It would mean that it is not allowed to call any function that free or
> reassemble a mbuf when an offload is attached.

We just need to walk the chain of offloads calling 
rte_pktmbuf_offload_free(), before freeing the mbuf, which will be an 
issue with any externally attached meta data. In the case of 
reassembling I don't see why we would just move the chain to the head mbuf.

>
> I'm wondering if there is such a leak in l2fwd_crypto_send_burst():
>
> 	/* <<<<<<< here packets have the offload attached */
> 	ret = rte_cryptodev_enqueue_burst(cparams->dev_id,
> 		cparams->qp_id, pkt_buffer, (uint16_t) n);
> 	crypto_statistics[cparams->dev_id].enqueued += ret;
> 	if (unlikely(ret < n)) {
> 		crypto_statistics[cparams->dev_id].errors += (n - ret);
> 		do {
> 			/* <<<<<<<<< offload struct is lost? */
> 			rte_pktmbuf_free(pkt_buffer[ret]);
> 		} while (++ret < n);
> 	}
>
>
I'll push a fix for this leak, I just noticed it myself this morning.


> It seems that these offload structures are only used to pass crypto
> info to the cryptodev. Would it be a problem to have an API like this?
>
>    rx_burst(port, q, mbuf_tab, crypto_tab, n);
>

I really dislike this option, there's no direct linkage between mbuf and 
offload operation.

> Or even:
>
>    rx_burst(port, q, crypto_tab, n);
>
>    with each *cryto_tab pointing to a mbuf
>

I looked at this but it would really hamstring any pipelining 
applications which might want to attach multiple offloads to a mbuf at a 
point in the pipeline for processing at later steps.
>
> If we really want to use mbufs to store the crypto info (is it
> something we want? for pipelining?), another idea would be to use
> the usual metadata after the mbuf. It may require to enhance this
> metadata framework to allow to better register room for metadata,
> because today anyone using metadata expects that its metadata are
> the only ones.
>
> Pros:
>   - no additional allocation for crypto offload struct
>   - no risk of leak
> Cons:
>   - room is reserved for crypto in all mbufs even if no crypto
>     is done on them
>
I have some further API to add in R2.3 which would allow the offload to 
allocated in the private data of the mbuf but they are not fully tested 
as yet. Also I don't think it's practical from a memory management point 
of view to make the assumption that there will always be enough space in 
the mbufs private data, especially if a number of offloads were to be 
attached to a single mbuf.

>
> Regards,
> Olivier
>
  
Olivier Matz Nov. 23, 2015, 9:10 a.m. UTC | #3
Hi Declan,

On 11/20/2015 06:26 PM, Declan Doherty wrote:
>> The new files are called rte_mbuf_offload, but from what I understand,
>> it is more like a mbuf metadata api. What you call "offload operation"
>> is not called because an offload is attached, but because you call
>> rte_cryptodev_enqueue_burst() on it.
> 
> Maybe rte_mbuf_offload_metadata would be a better name, if not a bit
> more long winded :) The idea of this API set is to give a generic
> framework for attaching the the offload operation meta data to a mbuf
> which will be retrieved at a later point, when the particular offload
> burst function is called. I guess as we only have a single offload
> device at the moment the API may look a little over the top!

Indeed, not sure rte_mbuf_offload_metadata is better.
I'm still not convinced that offload should appear in the name, it
is a bit confusing with hardware offloads (ol_flags). Also, it
suggests that a work is delegated to another entity, but for instance
in this case it is just used as a storage area:

	ol = rte_pktmbuf_offload_alloc(pool, RTE_PKTMBUF_OL_CRYPTO);
	rte_crypto_op_attach_session(&ol->op.crypto, session);
	ol->op.crypto.digest.data = rte_pktmbuf_append(m, digest_len);
	ol->op.crypto.digest.phys_addr = ...;
	/* ... */
	rte_pktmbuf_offload_attach(m, ol);
	ret = rte_cryptodev_enqueue_burst(dev_id, qp_id, &m, 1);

Do you have some other examples in mind that would use this API?

>>> +/** Rearms rte_mbuf_offload default parameters */
>>> +static inline void
>>> +__rte_pktmbuf_offload_reset(struct rte_mbuf_offload *ol,
>>> +        enum rte_mbuf_ol_op_type type)
>>> +{
>>> +    ol->m = NULL;
>>> +    ol->type = type;
>>> +
>>> +    switch (type) {
>>> +    case RTE_PKTMBUF_OL_CRYPTO:
>>> +        __rte_crypto_op_reset(&ol->op.crypto); break;
>>> +    default:
>>> +        break;
>>> +    }
>>> +}
>>
>> Would it work if several OL are registered?
> 
> I can't see any reason why it wouldn't

Sorry, I read it to quickly. I thought it was a
rte_pktmbuf_offload_detach() function. By the way there is no
such function?


>> Also, what is not clear to me is how the offload structure is freed.
>> For instance, I think that calling rte_pktmbuf_free(m) on a mbuf
>> that has a offload attached would result in a leak.
>>
>> It would mean that it is not allowed to call any function that free or
>> reassemble a mbuf when an offload is attached.
> 
> We just need to walk the chain of offloads calling
> rte_pktmbuf_offload_free(), before freeing the mbuf, which will be an
> issue with any externally attached meta data. In the case of
> reassembling I don't see why we would just move the chain to the head mbuf.

Walking the chain of offload + adding the initialization will probably
have a little cost that should be evaluated.

The freeing is probably not the only problem:
- packet duplication: are the offload infos copied? If no, it means that
  the copy is not exactly a copy
- if you chain 2 mbufs that both have offload info attached, how does it
  behave?
- if you prepend a segment to your mbuf, you need to copy the mbuf
  offload pointer, and also parse the list of offload to update the
  ol->m pointer of each element.

>> It seems that these offload structures are only used to pass crypto
>> info to the cryptodev. Would it be a problem to have an API like this?
>>
>>    rx_burst(port, q, mbuf_tab, crypto_tab, n);
>>
> 
> I really dislike this option, there's no direct linkage between mbuf and
> offload operation.
> 
>> Or even:
>>
>>    rx_burst(port, q, crypto_tab, n);
>>
>>    with each *cryto_tab pointing to a mbuf
>>
> 
> I looked at this but it would really hamstring any pipelining
> applications which might want to attach multiple offloads to a mbuf at a
> point in the pipeline for processing at later steps.

As far as I can see, there is already a way to chain several crypto
operations in the crypto structure.

Another option is to use the mbuf offload API (or the m->userdata
pointer which already works for that) only in the application:

- the field is completely ignored by the mbuf api and the dpdk driver
- it is up to the application to initialize it and free it

-> it can only be used when passing mbuf from a part of the app
   to another, so it perfectly matches the pipeline use case.

Example:

app_core1:

  /* receive a mbuf */
  crypto = alloc()
  crypto->xxx = yyy:
  /* ... */
  m->userdata = crypto;
  enqueue(m, app_core2);

app_core2:

  m = dequeue();
  crypto = m->userdata;
  m->userdata = NULL;
  /* api is tx_burst(port, q, mbuf_tab, crypto_tab, n) */
  tx_burst(port, q, &m, &crypto, 1);



Regards,
Olivier
  
Ananyev, Konstantin Nov. 23, 2015, 11:52 a.m. UTC | #4
Hi Olivier,

> On 11/20/2015 06:26 PM, Declan Doherty wrote:
> >> The new files are called rte_mbuf_offload, but from what I understand,
> >> it is more like a mbuf metadata api. What you call "offload operation"
> >> is not called because an offload is attached, but because you call
> >> rte_cryptodev_enqueue_burst() on it.
> >
> > Maybe rte_mbuf_offload_metadata would be a better name, if not a bit
> > more long winded :) The idea of this API set is to give a generic
> > framework for attaching the the offload operation meta data to a mbuf
> > which will be retrieved at a later point, when the particular offload
> > burst function is called. I guess as we only have a single offload
> > device at the moment the API may look a little over the top!
> 
> Indeed, not sure rte_mbuf_offload_metadata is better.
> I'm still not convinced that offload should appear in the name, it
> is a bit confusing with hardware offloads (ol_flags). Also, it
> suggests that a work is delegated to another entity, but for instance
> in this case it is just used as a storage area:
> 
> 	ol = rte_pktmbuf_offload_alloc(pool, RTE_PKTMBUF_OL_CRYPTO);
> 	rte_crypto_op_attach_session(&ol->op.crypto, session);
> 	ol->op.crypto.digest.data = rte_pktmbuf_append(m, digest_len);
> 	ol->op.crypto.digest.phys_addr = ...;
> 	/* ... */
> 	rte_pktmbuf_offload_attach(m, ol);
> 	ret = rte_cryptodev_enqueue_burst(dev_id, qp_id, &m, 1);
> 
> Do you have some other examples in mind that would use this API?
> 
> >>> +/** Rearms rte_mbuf_offload default parameters */
> >>> +static inline void
> >>> +__rte_pktmbuf_offload_reset(struct rte_mbuf_offload *ol,
> >>> +        enum rte_mbuf_ol_op_type type)
> >>> +{
> >>> +    ol->m = NULL;
> >>> +    ol->type = type;
> >>> +
> >>> +    switch (type) {
> >>> +    case RTE_PKTMBUF_OL_CRYPTO:
> >>> +        __rte_crypto_op_reset(&ol->op.crypto); break;
> >>> +    default:
> >>> +        break;
> >>> +    }
> >>> +}
> >>
> >> Would it work if several OL are registered?
> >
> > I can't see any reason why it wouldn't
> 
> Sorry, I read it to quickly. I thought it was a
> rte_pktmbuf_offload_detach() function. By the way there is no
> such function?
> 
> 
> >> Also, what is not clear to me is how the offload structure is freed.
> >> For instance, I think that calling rte_pktmbuf_free(m) on a mbuf
> >> that has a offload attached would result in a leak.
> >>
> >> It would mean that it is not allowed to call any function that free or
> >> reassemble a mbuf when an offload is attached.
> >
> > We just need to walk the chain of offloads calling
> > rte_pktmbuf_offload_free(), before freeing the mbuf, which will be an
> > issue with any externally attached meta data. In the case of
> > reassembling I don't see why we would just move the chain to the head mbuf.
> 
> Walking the chain of offload + adding the initialization will probably
> have a little cost that should be evaluated.
> 
> The freeing is probably not the only problem:
> - packet duplication: are the offload infos copied? If no, it means that
>   the copy is not exactly a copy
> - if you chain 2 mbufs that both have offload info attached, how does it
>   behave?
> - if you prepend a segment to your mbuf, you need to copy the mbuf
>   offload pointer, and also parse the list of offload to update the
>   ol->m pointer of each element.
> 
> >> It seems that these offload structures are only used to pass crypto
> >> info to the cryptodev. Would it be a problem to have an API like this?
> >>
> >>    rx_burst(port, q, mbuf_tab, crypto_tab, n);
> >>
> >
> > I really dislike this option, there's no direct linkage between mbuf and
> > offload operation.
> >
> >> Or even:
> >>
> >>    rx_burst(port, q, crypto_tab, n);
> >>
> >>    with each *cryto_tab pointing to a mbuf
> >>
> >
> > I looked at this but it would really hamstring any pipelining
> > applications which might want to attach multiple offloads to a mbuf at a
> > point in the pipeline for processing at later steps.
> 
> As far as I can see, there is already a way to chain several crypto
> operations in the crypto structure.
> 
> Another option is to use the mbuf offload API (or the m->userdata
> pointer which already works for that) only in the application:
> 
> - the field is completely ignored by the mbuf api and the dpdk driver
> - it is up to the application to initialize it and free it
> 
> -> it can only be used when passing mbuf from a part of the app
>    to another, so it perfectly matches the pipeline use case.

I don't think we should start to re-use userdata.
Userdata was intended for the upper layer app to pass/store it's
private data associated with mbuf, and we probably should keep it this way.
While mbuf_offload (or whatever we'll name it) supposed to keep data
necessary for crypto (and other future type of devices) to operate with mbuf.

All your comments above about that this new field is just ignored by most mbuf
operations (copy/attach/append/free, etc) are valid.
By I suppose for now we can just state that for mbuf_offload is not supported by
all these mbuf ops.
I understand that current API is probably not perfect and might need to be revised in future.
Again, as it is completely new feature, I suspect it would be a lot of change requests for it anyway. 
But we need some generic way to pass other (not ethdev) specific information within mbuf
between user-level and PMDs for non-ethernet devices (and probably between different PMDs too).

Konstantin
  
Doherty, Declan Nov. 23, 2015, 12:16 p.m. UTC | #5
On 23/11/15 11:52, Ananyev, Konstantin wrote:
> Hi Olivier,
>
>> On 11/20/2015 06:26 PM, Declan Doherty wrote:
>>>> The new files are called rte_mbuf_offload, but from what I understand,
>>>> it is more like a mbuf metadata api. What you call "offload operation"
>>>> is not called because an offload is attached, but because you call
>>>> rte_cryptodev_enqueue_burst() on it.
>>>
>>> Maybe rte_mbuf_offload_metadata would be a better name, if not a bit
>>> more long winded :) The idea of this API set is to give a generic
>>> framework for attaching the the offload operation meta data to a mbuf
>>> which will be retrieved at a later point, when the particular offload
>>> burst function is called. I guess as we only have a single offload
>>> device at the moment the API may look a little over the top!
>>
>> Indeed, not sure rte_mbuf_offload_metadata is better.
>> I'm still not convinced that offload should appear in the name, it
>> is a bit confusing with hardware offloads (ol_flags). Also, it
>> suggests that a work is delegated to another entity, but for instance
>> in this case it is just used as a storage area:
>>
>> 	ol = rte_pktmbuf_offload_alloc(pool, RTE_PKTMBUF_OL_CRYPTO);
>> 	rte_crypto_op_attach_session(&ol->op.crypto, session);
>> 	ol->op.crypto.digest.data = rte_pktmbuf_append(m, digest_len);
>> 	ol->op.crypto.digest.phys_addr = ...;
>> 	/* ... */
>> 	rte_pktmbuf_offload_attach(m, ol);
>> 	ret = rte_cryptodev_enqueue_burst(dev_id, qp_id, &m, 1);
>>
>> Do you have some other examples in mind that would use this API?
>>
>>>>> +/** Rearms rte_mbuf_offload default parameters */
>>>>> +static inline void
>>>>> +__rte_pktmbuf_offload_reset(struct rte_mbuf_offload *ol,
>>>>> +        enum rte_mbuf_ol_op_type type)
>>>>> +{
>>>>> +    ol->m = NULL;
>>>>> +    ol->type = type;
>>>>> +
>>>>> +    switch (type) {
>>>>> +    case RTE_PKTMBUF_OL_CRYPTO:
>>>>> +        __rte_crypto_op_reset(&ol->op.crypto); break;
>>>>> +    default:
>>>>> +        break;
>>>>> +    }
>>>>> +}
>>>>
>>>> Would it work if several OL are registered?
>>>
>>> I can't see any reason why it wouldn't
>>
>> Sorry, I read it to quickly. I thought it was a
>> rte_pktmbuf_offload_detach() function. By the way there is no
>> such function?
>>
>>
>>>> Also, what is not clear to me is how the offload structure is freed.
>>>> For instance, I think that calling rte_pktmbuf_free(m) on a mbuf
>>>> that has a offload attached would result in a leak.
>>>>
>>>> It would mean that it is not allowed to call any function that free or
>>>> reassemble a mbuf when an offload is attached.
>>>
>>> We just need to walk the chain of offloads calling
>>> rte_pktmbuf_offload_free(), before freeing the mbuf, which will be an
>>> issue with any externally attached meta data. In the case of
>>> reassembling I don't see why we would just move the chain to the head mbuf.
>>
>> Walking the chain of offload + adding the initialization will probably
>> have a little cost that should be evaluated.
>>
>> The freeing is probably not the only problem:
>> - packet duplication: are the offload infos copied? If no, it means that
>>    the copy is not exactly a copy
>> - if you chain 2 mbufs that both have offload info attached, how does it
>>    behave?
>> - if you prepend a segment to your mbuf, you need to copy the mbuf
>>    offload pointer, and also parse the list of offload to update the
>>    ol->m pointer of each element.
>>
>>>> It seems that these offload structures are only used to pass crypto
>>>> info to the cryptodev. Would it be a problem to have an API like this?
>>>>
>>>>     rx_burst(port, q, mbuf_tab, crypto_tab, n);
>>>>
>>>
>>> I really dislike this option, there's no direct linkage between mbuf and
>>> offload operation.
>>>
>>>> Or even:
>>>>
>>>>     rx_burst(port, q, crypto_tab, n);
>>>>
>>>>     with each *cryto_tab pointing to a mbuf
>>>>
>>>
>>> I looked at this but it would really hamstring any pipelining
>>> applications which might want to attach multiple offloads to a mbuf at a
>>> point in the pipeline for processing at later steps.
>>
>> As far as I can see, there is already a way to chain several crypto
>> operations in the crypto structure.
>>
>> Another option is to use the mbuf offload API (or the m->userdata
>> pointer which already works for that) only in the application:
>>
>> - the field is completely ignored by the mbuf api and the dpdk driver
>> - it is up to the application to initialize it and free it
>>
>> -> it can only be used when passing mbuf from a part of the app
>>     to another, so it perfectly matches the pipeline use case.
>
> I don't think we should start to re-use userdata.
> Userdata was intended for the upper layer app to pass/store it's
> private data associated with mbuf, and we probably should keep it this way.
> While mbuf_offload (or whatever we'll name it) supposed to keep data
> necessary for crypto (and other future type of devices) to operate with mbuf.
>
> All your comments above about that this new field is just ignored by most mbuf
> operations (copy/attach/append/free, etc) are valid.
> By I suppose for now we can just state that for mbuf_offload is not supported by
> all these mbuf ops.
> I understand that current API is probably not perfect and might need to be revised in future.
> Again, as it is completely new feature, I suspect it would be a lot of change requests for it anyway.
> But we need some generic way to pass other (not ethdev) specific information within mbuf
> between user-level and PMDs for non-ethernet devices (and probably between different PMDs too).
>
> Konstantin
>


Just to re-iterate Konstantin's point above. I'm aware that a number of 
mbuf operations currently are not supported and are not aware of the 
rte_mbuf_offload, but we will be continuing development throughout 2.3 
and beyond to address this. Also I hope we will get much more community 
feedback and interaction so we can get a more definite feature set for 
the library, and  by minimizing the features in this release, it will 
allow more flexibility on how we can develop this part of handling 
offloaded operations and it will limit the ABI issues we will face if it 
turns out we need to change this going forwards.

Thanks
Declan
  
Olivier Matz Nov. 23, 2015, 1:08 p.m. UTC | #6
Hi,

On 11/23/2015 01:16 PM, Declan Doherty wrote:
>> I don't think we should start to re-use userdata.
>> Userdata was intended for the upper layer app to pass/store it's
>> private data associated with mbuf, and we probably should keep it this
>> way.

If the crypto API PMD takes both mbuf and crypto instead of just the
mbuf, the mbuf_offload and the userdata wouldn't be very different,
as the metadata would stay inside the application.

So, if it's application private (the dpdk does not know how to handle
it in case of mbuf free, dup, ...), the content of it should be
app-specific.

>> While mbuf_offload (or whatever we'll name it) supposed to keep data
>> necessary for crypto (and other future type of devices) to operate
>> with mbuf.

Any idea of future devices?

>> All your comments above about that this new field is just ignored by
>> most mbuf
>> operations (copy/attach/append/free, etc) are valid.
>> By I suppose for now we can just state that for mbuf_offload is not
>> supported by
>> all these mbuf ops.

So, who is responsible of freeing the mbuf offload metadata?
The crypto pmd ?

It means that as soon as the mbuf offload is added to the mbuf, it is
not possible to call any other dpdk function that would process the
mbuf... if we cannot call anything else before, why not just passing
the crypto argument as a parameter?

Managing offload data would even be more complex in the future if there
are more than one mbuf_offload attached to the mbuf.

>> I understand that current API is probably not perfect and might need
>> to be revised in future.

The problem is that it's not easy to change the dpdk API now.

>> Again, as it is completely new feature, I suspect it would be a lot of
>> change requests for it anyway.
>> But we need some generic way to pass other (not ethdev) specific
>> information within mbuf
>> between user-level and PMDs for non-ethernet devices (and probably
>> between different PMDs too).

If a crypto PMD needs a crypto info, why not adding a crypto argument?
I feel it's clearer from a user point of view.

About PMD to PMD metadata, do you have any use case?


> Just to re-iterate Konstantin's point above. I'm aware that a number of
> mbuf operations currently are not supported and are not aware of the
> rte_mbuf_offload, but we will be continuing development throughout 2.3
> and beyond to address this. Also I hope we will get much more community
> feedback and interaction so we can get a more definite feature set for
> the library, and  by minimizing the features in this release, it will
> allow more flexibility on how we can develop this part of handling
> offloaded operations and it will limit the ABI issues we will face if it
> turns out we need to change this going forwards.

I can see the amount of work you've done for making the cryptodev
happen in dpdk. I also recognize that I didn't make comments very early.

If we really want to have this feature in the next release, maybe there
is a way to mark it as experimental, meaning that the API is subject to
change? What do you think?

Thomas, any comment?

Regards,
Olivier
  
Thomas Monjalon Nov. 23, 2015, 2:17 p.m. UTC | #7
2015-11-23 14:08, Olivier MATZ:
> 2015-11-23 12:16, Declan Doherty:
> > 2015-11-23 11:52, Ananyev, Konstantin:
> >> I understand that current API is probably not perfect and might need
> >> to be revised in future.
> 
> The problem is that it's not easy to change the dpdk API now.
[...]
> > Just to re-iterate Konstantin's point above. I'm aware that a number of
> > mbuf operations currently are not supported and are not aware of the
> > rte_mbuf_offload, but we will be continuing development throughout 2.3
> > and beyond to address this. Also I hope we will get much more community
> > feedback and interaction so we can get a more definite feature set for
> > the library, and  by minimizing the features in this release, it will
> > allow more flexibility on how we can develop this part of handling
> > offloaded operations and it will limit the ABI issues we will face if it
> > turns out we need to change this going forwards.
> 
> I can see the amount of work you've done for making the cryptodev
> happen in dpdk. I also recognize that I didn't make comments very early.
> 
> If we really want to have this feature in the next release, maybe there
> is a way to mark it as experimental, meaning that the API is subject to
> change? What do you think?
> 
> Thomas, any comment?

Yes, it is a totally new work and it probably needs more time to have a
design working well for most of use cases.
As I already discussed with Olivier, I think it should be considered as
experimental. It means we can try it but do not consider it as a stable
API. So the deprecation process would not apply until the experimental
flag is removed.

For the release 2.2, it would be better to remove the crypto dependency
in mbuf. Do you think it is possible?
  
Doherty, Declan Nov. 23, 2015, 2:33 p.m. UTC | #8
On 23/11/15 13:08, Olivier MATZ wrote:
> Hi,
>
> On 11/23/2015 01:16 PM, Declan Doherty wrote:
>>> I don't think we should start to re-use userdata.
>>> Userdata was intended for the upper layer app to pass/store it's
>>> private data associated with mbuf, and we probably should keep it this
>>> way.
>
> If the crypto API PMD takes both mbuf and crypto instead of just the
> mbuf, the mbuf_offload and the userdata wouldn't be very different,
> as the metadata would stay inside the application.
>
> So, if it's application private (the dpdk does not know how to handle
> it in case of mbuf free, dup, ...), the content of it should be
> app-specific.
>
>>> While mbuf_offload (or whatever we'll name it) supposed to keep data
>>> necessary for crypto (and other future type of devices) to operate
>>> with mbuf.
>
> Any idea of future devices?

Well the QAT hardware supports compression, so I guess that might be a 
likely candidate for future work.

>
>>> All your comments above about that this new field is just ignored by
>>> most mbuf
>>> operations (copy/attach/append/free, etc) are valid.
>>> By I suppose for now we can just state that for mbuf_offload is not
>>> supported by
>>> all these mbuf ops.
>
> So, who is responsible of freeing the mbuf offload metadata?
> The crypto pmd ?
>
> It means that as soon as the mbuf offload is added to the mbuf, it is
> not possible to call any other dpdk function that would process the
> mbuf... if we cannot call anything else before, why not just passing
> the crypto argument as a parameter?
>
> Managing offload data would even be more complex in the future if there
> are more than one mbuf_offload attached to the mbuf.
>
>>> I understand that current API is probably not perfect and might need
>>> to be revised in future.
>
> The problem is that it's not easy to change the dpdk API now.
>
>>> Again, as it is completely new feature, I suspect it would be a lot of
>>> change requests for it anyway.
>>> But we need some generic way to pass other (not ethdev) specific
>>> information within mbuf
>>> between user-level and PMDs for non-ethernet devices (and probably
>>> between different PMDs too).
>
> If a crypto PMD needs a crypto info, why not adding a crypto argument?
> I feel it's clearer from a user point of view.
>
> About PMD to PMD metadata, do you have any use case?

One use case that comes to mind would be the enablement of inline IPsec. 
To enable this you would need a mechanism to attach the IPsec metadata, 
ESP header offsets etc to an mbuf which the PMD can then use to 
correctly complete the tx descriptor.


>
>
>> Just to re-iterate Konstantin's point above. I'm aware that a number of
>> mbuf operations currently are not supported and are not aware of the
>> rte_mbuf_offload, but we will be continuing development throughout 2.3
>> and beyond to address this. Also I hope we will get much more community
>> feedback and interaction so we can get a more definite feature set for
>> the library, and  by minimizing the features in this release, it will
>> allow more flexibility on how we can develop this part of handling
>> offloaded operations and it will limit the ABI issues we will face if it
>> turns out we need to change this going forwards.
>
> I can see the amount of work you've done for making the cryptodev
> happen in dpdk. I also recognize that I didn't make comments very early.
>
> If we really want to have this feature in the next release, maybe there
> is a way to mark it as experimental, meaning that the API is subject to
> change? What do you think?

I wouldn't be against this idea. I think a mechanism for marking new 
libraries / APIs as experimental for their initial release could be very 
useful to allow new features to be introduced with the flexibility to 
make further API changes on user feedback without the overhead of ABI 
management in the subsequent release.

>
> Thomas, any comment?
>
> Regards,
> Olivier
>
  
Thomas Monjalon Nov. 23, 2015, 2:46 p.m. UTC | #9
2015-11-23 15:17, Thomas Monjalon:
> Yes, it is a totally new work and it probably needs more time to have a
> design working well for most of use cases.
> As I already discussed with Olivier, I think it should be considered as
> experimental. It means we can try it but do not consider it as a stable
> API. So the deprecation process would not apply until the experimental
> flag is removed.

If nobody complains, I'll apply this v7 and will send a patch to add some
experimental markers.

> For the release 2.2, it would be better to remove the crypto dependency
> in mbuf. Do you think it is possible?

Sorry, forget it, the dependency is in mbuf_offload.
  
Doherty, Declan Nov. 23, 2015, 3:47 p.m. UTC | #10
On 23/11/15 14:46, Thomas Monjalon wrote:
> 2015-11-23 15:17, Thomas Monjalon:
>> Yes, it is a totally new work and it probably needs more time to have a
>> design working well for most of use cases.
>> As I already discussed with Olivier, I think it should be considered as
>> experimental. It means we can try it but do not consider it as a stable
>> API. So the deprecation process would not apply until the experimental
>> flag is removed.
>
> If nobody complains, I'll apply this v7 and will send a patch to add some
> experimental markers.
>
>> For the release 2.2, it would be better to remove the crypto dependency
>> in mbuf. Do you think it is possible?
>
> Sorry, forget it, the dependency is in mbuf_offload.
>

That sounds good to me, thanks Thomas!
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 68c6d74..73d9578 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -191,6 +191,10 @@  F: lib/librte_mbuf/
 F: doc/guides/prog_guide/mbuf_lib.rst
 F: app/test/test_mbuf.c
 
+Packet buffer offload
+M: Declan Doherty <declan.doherty@intel.com>
+F: lib/librte_mbuf_offload/
+
 Ethernet API
 M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
diff --git a/config/common_bsdapp b/config/common_bsdapp
index 8803350..ba2533a 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -332,6 +332,12 @@  CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128
 
 #
+# Compile librte_mbuf_offload
+#
+CONFIG_RTE_LIBRTE_MBUF_OFFLOAD=y
+CONFIG_RTE_LIBRTE_MBUF_OFFLOAD_DEBUG=n
+
+#
 # Compile librte_timer
 #
 CONFIG_RTE_LIBRTE_TIMER=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 815bea3..4c52f78 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -340,6 +340,12 @@  CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128
 
 #
+# Compile librte_mbuf_offload
+#
+CONFIG_RTE_LIBRTE_MBUF_OFFLOAD=y
+CONFIG_RTE_LIBRTE_MBUF_OFFLOAD_DEBUG=n
+
+#
 # Compile librte_timer
 #
 CONFIG_RTE_LIBRTE_TIMER=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index bdb6130..199cc2c 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -104,6 +104,7 @@  There are many libraries, so their headers may be grouped by topics:
 
 - **containers**:
   [mbuf]               (@ref rte_mbuf.h),
+  [mbuf_offload]       (@ref rte_mbuf_offload.h),
   [ring]               (@ref rte_ring.h),
   [distributor]        (@ref rte_distributor.h),
   [reorder]            (@ref rte_reorder.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 7244b8f..15bba16 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -48,6 +48,7 @@  INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_kvargs \
                           lib/librte_lpm \
                           lib/librte_mbuf \
+                          lib/librte_mbuf_offload \
                           lib/librte_mempool \
                           lib/librte_meter \
                           lib/librte_net \
diff --git a/lib/Makefile b/lib/Makefile
index 4c5c1b4..ef172ea 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -36,6 +36,7 @@  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
 DIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += librte_mempool
 DIRS-$(CONFIG_RTE_LIBRTE_MBUF) += librte_mbuf
+DIRS-$(CONFIG_RTE_LIBRTE_MBUF_OFFLOAD) += librte_mbuf_offload
 DIRS-$(CONFIG_RTE_LIBRTE_TIMER) += librte_timer
 DIRS-$(CONFIG_RTE_LIBRTE_CFGFILE) += librte_cfgfile
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ef1ee26..0b6741a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -728,6 +728,9 @@  typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
 typedef uint64_t MARKER64[0]; /**< marker that allows us to overwrite 8 bytes
                                * with a single assignment */
 
+/** Opaque rte_mbuf_offload  structure declarations */
+struct rte_mbuf_offload;
+
 /**
  * The generic rte_mbuf, containing a packet mbuf.
  */
@@ -841,6 +844,9 @@  struct rte_mbuf {
 
 	/** Timesync flags for use with IEEE1588. */
 	uint16_t timesync;
+
+	/* Chain of off-load operations to perform on mbuf */
+	struct rte_mbuf_offload *offload_ops;
 } __rte_cache_aligned;
 
 static inline uint16_t rte_pktmbuf_priv_size(struct rte_mempool *mp);
diff --git a/lib/librte_mbuf_offload/Makefile b/lib/librte_mbuf_offload/Makefile
new file mode 100644
index 0000000..acdb449
--- /dev/null
+++ b/lib/librte_mbuf_offload/Makefile
@@ -0,0 +1,52 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2015 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_mbuf_offload.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+
+EXPORT_MAP := rte_mbuf_offload_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_MBUF_OFFLOAD) := rte_mbuf_offload.c
+
+# install includes
+SYMLINK-$(CONFIG_RTE_LIBRTE_MBUF_OFFLOAD)-include := rte_mbuf_offload.h
+
+# this lib needs eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MBUF_OFFLOAD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MBUF_OFFLOAD) += lib/librte_cryptodev
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mbuf_offload/rte_mbuf_offload.c b/lib/librte_mbuf_offload/rte_mbuf_offload.c
new file mode 100644
index 0000000..5c0c9dd
--- /dev/null
+++ b/lib/librte_mbuf_offload/rte_mbuf_offload.c
@@ -0,0 +1,100 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <rte_common.h>
+
+#include "rte_mbuf_offload.h"
+
+/** Initialize rte_mbuf_offload structure */
+static void
+rte_pktmbuf_offload_init(struct rte_mempool *mp,
+		__rte_unused void *opaque_arg,
+		void *_op_data,
+		__rte_unused unsigned i)
+{
+	struct rte_mbuf_offload *ol = _op_data;
+
+	memset(_op_data, 0, mp->elt_size);
+
+	ol->type = RTE_PKTMBUF_OL_NOT_SPECIFIED;
+	ol->mp = mp;
+}
+
+
+struct rte_mempool *
+rte_pktmbuf_offload_pool_create(const char *name, unsigned size,
+		unsigned cache_size, uint16_t priv_size, int socket_id)
+{
+	struct rte_pktmbuf_offload_pool_private *priv;
+	unsigned elt_size = sizeof(struct rte_mbuf_offload) + priv_size;
+
+
+	/* lookup mempool in case already allocated */
+	struct rte_mempool *mp = rte_mempool_lookup(name);
+
+	if (mp != NULL) {
+		priv = (struct rte_pktmbuf_offload_pool_private *)
+				rte_mempool_get_priv(mp);
+
+		if (priv->offload_priv_size <  priv_size ||
+				mp->elt_size != elt_size ||
+				mp->cache_size < cache_size ||
+				mp->size < size) {
+			mp = NULL;
+			return NULL;
+		}
+		return mp;
+	}
+
+	mp = rte_mempool_create(
+			name,
+			size,
+			elt_size,
+			cache_size,
+			sizeof(struct rte_pktmbuf_offload_pool_private),
+			NULL,
+			NULL,
+			rte_pktmbuf_offload_init,
+			NULL,
+			socket_id,
+			0);
+
+	if (mp == NULL)
+		return NULL;
+
+	priv = (struct rte_pktmbuf_offload_pool_private *)
+			rte_mempool_get_priv(mp);
+
+	priv->offload_priv_size = priv_size;
+	return mp;
+}
diff --git a/lib/librte_mbuf_offload/rte_mbuf_offload.h b/lib/librte_mbuf_offload/rte_mbuf_offload.h
new file mode 100644
index 0000000..1d9bb2b
--- /dev/null
+++ b/lib/librte_mbuf_offload/rte_mbuf_offload.h
@@ -0,0 +1,302 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MBUF_OFFLOAD_H_
+#define _RTE_MBUF_OFFLOAD_H_
+
+/**
+ * @file
+ * RTE mbuf offload
+ *
+ * The rte_mbuf_offload library provides the ability to specify a device generic
+ * off-load operation independent of the current Rx/Tx Ethernet offloads
+ * supported within the rte_mbuf structure, and add supports for multiple
+ * off-load operations and offload device types.
+ *
+ * The rte_mbuf_offload specifies the particular off-load operation type, such
+ * as a crypto operation, and provides a container for the operations
+ * parameter's inside the op union. These parameters are then used by the
+ * device which supports that operation to perform the specified offload.
+ *
+ * This library provides an API to create pre-allocated mempool of offload
+ * operations, with supporting allocate and free functions. It also provides
+ * APIs for attaching an offload to a mbuf, as well as an API to retrieve a
+ * specified offload type from an mbuf offload chain.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+
+
+/** packet mbuf offload operation types */
+enum rte_mbuf_ol_op_type {
+	RTE_PKTMBUF_OL_NOT_SPECIFIED = 0,
+	/**< Off-load not specified */
+	RTE_PKTMBUF_OL_CRYPTO
+	/**< Crypto offload operation */
+};
+
+/**
+ * Generic packet mbuf offload
+ * This is used to specify a offload operation to be performed on a rte_mbuf.
+ * Multiple offload operations can be chained to the same mbuf, but only a
+ * single offload operation of a particular type can be in the chain
+ */
+struct rte_mbuf_offload {
+	struct rte_mbuf_offload *next;	/**< next offload in chain */
+	struct rte_mbuf *m;		/**< mbuf offload is attached to */
+	struct rte_mempool *mp;		/**< mempool offload allocated from */
+
+	enum rte_mbuf_ol_op_type type;	/**< offload type */
+	union {
+		struct rte_crypto_op crypto;	/**< Crypto operation */
+	} op;
+};
+
+/**< private data structure belonging to packet mbug offload mempool */
+struct rte_pktmbuf_offload_pool_private {
+	uint16_t offload_priv_size;
+	/**< Size of private area in each mbuf_offload. */
+};
+
+
+/**
+ * Creates a mempool of rte_mbuf_offload objects
+ *
+ * @param	name		mempool name
+ * @param	size		number of objects in mempool
+ * @param	cache_size	cache size of objects for each core
+ * @param	priv_size	size of private data to be allocated with each
+ *				rte_mbuf_offload object
+ * @param	socket_id	Socket on which to allocate mempool objects
+ *
+ * @return
+ * - On success returns a valid mempool of rte_mbuf_offload objects
+ * - On failure return NULL
+ */
+extern struct rte_mempool *
+rte_pktmbuf_offload_pool_create(const char *name, unsigned size,
+		unsigned cache_size, uint16_t priv_size, int socket_id);
+
+
+/**
+ * Returns private data size allocated with each rte_mbuf_offload object by
+ * the mempool
+ *
+ * @param	mpool	rte_mbuf_offload mempool
+ *
+ * @return	private data size
+ */
+static inline uint16_t
+__rte_pktmbuf_offload_priv_size(struct rte_mempool *mpool)
+{
+	struct rte_pktmbuf_offload_pool_private *priv =
+			rte_mempool_get_priv(mpool);
+
+	return priv->offload_priv_size;
+}
+
+/**
+ * Get specified off-load operation type from mbuf.
+ *
+ * @param	m		packet mbuf.
+ * @param	type		offload operation type requested.
+ *
+ * @return
+ * - On success retruns rte_mbuf_offload pointer
+ * - On failure returns NULL
+ *
+ */
+static inline struct rte_mbuf_offload *
+rte_pktmbuf_offload_get(struct rte_mbuf *m, enum rte_mbuf_ol_op_type type)
+{
+	struct rte_mbuf_offload *ol;
+
+	for (ol = m->offload_ops; ol != NULL; ol = ol->next)
+		if (ol->type == type)
+			return ol;
+
+	return ol;
+}
+
+/**
+ * Attach a rte_mbuf_offload to a mbuf. We only support a single offload of any
+ * one type in our chain of offloads.
+ *
+ * @param	m	packet mbuf.
+ * @param	ol	rte_mbuf_offload strucutre to be attached
+ *
+ * @returns
+ * - On success returns the pointer to the offload we just added
+ * - On failure returns NULL
+ */
+static inline struct rte_mbuf_offload *
+rte_pktmbuf_offload_attach(struct rte_mbuf *m, struct rte_mbuf_offload *ol)
+{
+	struct rte_mbuf_offload **ol_last;
+
+	for (ol_last = &m->offload_ops;	ol_last[0] != NULL;
+			ol_last = &ol_last[0]->next)
+		if (ol_last[0]->type == ol->type)
+			return NULL;
+
+	ol_last[0] = ol;
+	ol_last[0]->m = m;
+	ol_last[0]->next = NULL;
+
+	return ol_last[0];
+}
+
+
+/** Rearms rte_mbuf_offload default parameters */
+static inline void
+__rte_pktmbuf_offload_reset(struct rte_mbuf_offload *ol,
+		enum rte_mbuf_ol_op_type type)
+{
+	ol->m = NULL;
+	ol->type = type;
+
+	switch (type) {
+	case RTE_PKTMBUF_OL_CRYPTO:
+		__rte_crypto_op_reset(&ol->op.crypto); break;
+	default:
+		break;
+	}
+}
+
+/** Allocate rte_mbuf_offload from mempool */
+static inline struct rte_mbuf_offload *
+__rte_pktmbuf_offload_raw_alloc(struct rte_mempool *mp)
+{
+	void *buf = NULL;
+
+	if (rte_mempool_get(mp, &buf) < 0)
+		return NULL;
+
+	return (struct rte_mbuf_offload *)buf;
+}
+
+/**
+ * Allocate a rte_mbuf_offload with a specified operation type from
+ * rte_mbuf_offload mempool
+ *
+ * @param	mpool		rte_mbuf_offload mempool
+ * @param	type		offload operation type
+ *
+ * @returns
+ * - On success returns a valid rte_mbuf_offload structure
+ * - On failure returns NULL
+ */
+static inline struct rte_mbuf_offload *
+rte_pktmbuf_offload_alloc(struct rte_mempool *mpool,
+		enum rte_mbuf_ol_op_type type)
+{
+	struct rte_mbuf_offload *ol = __rte_pktmbuf_offload_raw_alloc(mpool);
+
+	if (ol != NULL)
+		__rte_pktmbuf_offload_reset(ol, type);
+
+	return ol;
+}
+
+/**
+ * free rte_mbuf_offload structure
+ */
+static inline void
+rte_pktmbuf_offload_free(struct rte_mbuf_offload *ol)
+{
+	if (ol->mp != NULL)
+		rte_mempool_put(ol->mp, ol);
+}
+
+/**
+ * Checks if the private data of a rte_mbuf_offload has enough capacity for
+ * requested size
+ *
+ * @returns
+ * - if sufficient space available returns pointer to start of private data
+ * - if insufficient space returns NULL
+ */
+static inline void *
+__rte_pktmbuf_offload_check_priv_data_size(struct rte_mbuf_offload *ol,
+		uint16_t size)
+{
+	uint16_t priv_size;
+
+	if (likely(ol->mp != NULL)) {
+		priv_size = __rte_pktmbuf_offload_priv_size(ol->mp);
+
+		if (likely(priv_size >= size))
+			return (void *)(ol + 1);
+	}
+	return NULL;
+}
+
+/**
+ * Allocate space for crypto xforms in the private data space of the
+ * rte_mbuf_offload. This also defaults the crypto xform type and configures
+ * the chaining of the xform in the crypto operation
+ *
+ * @return
+ * - On success returns pointer to first crypto xform in crypto operations chain
+ * - On failure returns NULL
+ */
+static inline struct rte_crypto_xform *
+rte_pktmbuf_offload_alloc_crypto_xforms(struct rte_mbuf_offload *ol,
+		unsigned nb_xforms)
+{
+	struct rte_crypto_xform *xform;
+	void *priv_data;
+	uint16_t size;
+
+	size = sizeof(struct rte_crypto_xform) * nb_xforms;
+	priv_data = __rte_pktmbuf_offload_check_priv_data_size(ol, size);
+
+	if (priv_data == NULL)
+		return NULL;
+
+	ol->op.crypto.xform = xform = (struct rte_crypto_xform *)priv_data;
+
+	do {
+		xform->type = RTE_CRYPTO_XFORM_NOT_SPECIFIED;
+		xform = xform->next = --nb_xforms > 0 ? xform + 1 : NULL;
+	} while (xform);
+
+	return ol->op.crypto.xform;
+}
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MBUF_OFFLOAD_H_ */
diff --git a/lib/librte_mbuf_offload/rte_mbuf_offload_version.map b/lib/librte_mbuf_offload/rte_mbuf_offload_version.map
new file mode 100644
index 0000000..3d3b06a
--- /dev/null
+++ b/lib/librte_mbuf_offload/rte_mbuf_offload_version.map
@@ -0,0 +1,7 @@ 
+DPDK_2.2 {
+	global:
+
+	rte_pktmbuf_offload_pool_create;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 5d382bb..2b8ddce 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -116,6 +116,7 @@  ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KVARGS)         += -lrte_kvargs
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MBUF)           += -lrte_mbuf
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MBUF_OFFLOAD)   += -lrte_mbuf_offload
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)        += -lrte_ip_frag
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ETHER)          += -lethdev
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CRYPTODEV)      += -lrte_cryptodev