[2/2] ethdev: tunnel offload model
Message ID 20200625160348.26220-3-getelson@mellanox.com
State New
Delegated to: Ferruh Yigit
Series
  • ethdev: tunnel offload model

Checks

Context                Check    Description
ci/Intel-compilation   fail     Compilation issues
ci/travis-robot        warning  Travis build: failed
ci/checkpatch          success  coding style OK

Commit Message

Gregory Etelson June 25, 2020, 4:03 p.m. UTC
From: Eli Britstein <elibr@mellanox.com>

Hardware vendors implement tunneled traffic offload techniques
differently. Although the RTE flow API provides tools capable of
offloading all sorts of network stacks, a software application must
account for these hardware differences when compiling flow rules. As
a result, tunneled traffic flow rules that utilize hardware
capabilities can be different for the same traffic.

Tunnel port offload proposed in [1] provides software applications
with a unified rules model for tunneled traffic regardless of the
underlying hardware.
 - The model introduces a concept of a virtual tunnel port (VTP).
 - The model uses VTP to offload ingress tunneled network traffic
   with RTE flow rules.
 - The model is implemented as a set of helper functions. Each PMD
   implements VTP offload according to underlying hardware offload
   capabilities.  Applications must query PMD for VTP flow
   items / actions before using them to create a VTP flow rule.

The model components:
- Virtual Tunnel Port (VTP) is a stateless software object that
  describes tunneled network traffic.  VTP object usually contains
  descriptions of outer headers, tunnel headers and inner headers.
- Tunnel Steering flow Rule (TSR) detects tunneled packets and
  delegates them to tunnel processing infrastructure, implemented
  in PMD for optimal hardware utilization, for further processing.
- Tunnel Matching flow Rule (TMR) verifies packet configuration and
  runs offload actions in case of a match.

Application actions:
1 Initialize VTP object according to tunnel
  network parameters.
2 Create TSR flow rule:
2.1 Query PMD for VTP actions: application can query for VTP actions
    more than once
    int
    rte_flow_tunnel_decap_set(uint16_t port_id,
                              struct rte_flow_tunnel *tunnel,
                              struct rte_flow_action **pmd_actions,
                              uint32_t *num_of_pmd_actions,
                              struct rte_flow_error *error);

2.2 Integrate PMD actions into TSR actions list.
2.3 Create TSR flow rule:
    flow create <port> group 0
          match {tunnel items} / end
          actions {PMD actions} / {App actions} / end

3 Create TMR flow rule:
3.1 Query PMD for VTP items: application can query for VTP items
    more than once
    int
    rte_flow_tunnel_match(uint16_t port_id,
                          struct rte_flow_tunnel *tunnel,
                          struct rte_flow_item **pmd_items,
                          uint32_t *num_of_pmd_items,
                          struct rte_flow_error *error);

3.2 Integrate PMD items into TMR items list.
3.3 Create TMR flow rule:
    flow create <port> group 0
          match {PMD items} / {APP items} / end
          actions {offload actions} / end
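
Steps 2.2 and 3.2 both amount to joining a PMD-provided array with the
application's own array into one END-terminated list. A minimal sketch
of that step follows. It deliberately does not use the real rte_flow
types: struct demo_action, DEMO_ACTION_END and demo_merge_actions() are
hypothetical stand-ins so that the example builds without DPDK headers.
The PMD-first ordering mirrors the "{PMD actions} / {App actions}"
layout above; the same shape would apply to items.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_flow_action; not the DPDK type. */
struct demo_action {
	int type;         /* demo action type; 0 plays the role of END */
	const void *conf; /* action configuration, unused in this sketch */
};

#define DEMO_ACTION_END 0

/*
 * Build one list: PMD actions first, then application actions, then END.
 * Returns a heap array of n_pmd + n_app + 1 entries, or NULL on failure.
 * The caller owns the result and frees it with free().
 */
static struct demo_action *
demo_merge_actions(const struct demo_action *pmd, uint32_t n_pmd,
		   const struct demo_action *app, uint32_t n_app)
{
	struct demo_action *out;

	/* A NULL array is only acceptable together with a zero count. */
	assert(pmd != NULL || n_pmd == 0);
	assert(app != NULL || n_app == 0);

	out = calloc((size_t)n_pmd + n_app + 1, sizeof(*out));
	if (out == NULL)
		return NULL;
	if (n_pmd > 0)
		memcpy(out, pmd, n_pmd * sizeof(*out));
	if (n_app > 0)
		memcpy(out + n_pmd, app, n_app * sizeof(*out));
	/* Terminate the merged list explicitly. */
	out[n_pmd + n_app].type = DEMO_ACTION_END;
	out[n_pmd + n_app].conf = NULL;
	return out;
}
```

The merged array is a separate allocation, so in this sketch the PMD
array can be handed back to the PMD independently of the merged copy.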

The model provides a helper function to restore packets that miss
tunnel TMR rules to their original state:
int
rte_flow_tunnel_get_restore_info(uint16_t port_id,
                                 struct rte_mbuf *mbuf,
                                 struct rte_flow_restore_info *info,
                                 struct rte_flow_error *error);

The rte_flow_tunnel object that the call fills in inside the
rte_flow_restore_info *info parameter can be used by the application
to create a new TMR rule for that tunnel.
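
How an application might interpret the restore information can be
sketched as below. The flag values mirror the RTE_FLOW_RESTORE_INFO_*
definitions in the rte_flow.h hunk of this patch, but they are
redefined here with a DEMO_ prefix, and struct demo_restore_info is a
reduced, hypothetical stand-in (the tunnel member is omitted) so that
the example builds without DPDK. The helper names are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror RTE_FLOW_RESTORE_INFO_* from the patch. */
#define DEMO_RESTORE_INFO_TUNNEL       (1ULL << 0)
#define DEMO_RESTORE_INFO_ENCAPSULATED (1ULL << 1)
#define DEMO_RESTORE_INFO_GROUP_ID     (1ULL << 2)

/* Reduced stand-in for struct rte_flow_restore_info. */
struct demo_restore_info {
	uint64_t flags;    /* which of the other fields are valid */
	uint32_t group_id; /* valid only with DEMO_RESTORE_INFO_GROUP_ID */
};

/* True when the packet is reported as belonging to a tunnel. */
static bool
demo_has_tunnel(const struct demo_restore_info *info)
{
	return (info->flags & DEMO_RESTORE_INFO_TUNNEL) != 0;
}

/* True when a tunnel packet still carries its outer (tunnel) header. */
static bool
demo_needs_sw_decap(const struct demo_restore_info *info)
{
	const uint64_t both = DEMO_RESTORE_INFO_TUNNEL |
			      DEMO_RESTORE_INFO_ENCAPSULATED;

	return (info->flags & both) == both;
}

/* Copy the group ID out only when the flags mark it as valid. */
static bool
demo_get_group(const struct demo_restore_info *info, uint32_t *group)
{
	assert(group != NULL);
	if ((info->flags & DEMO_RESTORE_INFO_GROUP_ID) == 0)
		return false;
	*group = info->group_id;
	return true;
}
```

Checking the flags before reading a field follows the header comment
that the flags indicate which other fields are valid.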

The model requirements:
A software application must initialize the
rte_flow_tunnel object with tunnel parameters before calling
rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().

The PMD actions array obtained from rte_flow_tunnel_decap_set() must
be released by the application with a
rte_flow_tunnel_action_decap_release() call.
The application can release the actions after the TSR rule was created.

The PMD items array obtained with rte_flow_tunnel_match() must be
released by the application with a rte_flow_tunnel_item_release()
call.  The application can release the items after the rule was
created. However, if the application needs to create an additional TMR
rule for the same tunnel, it will need to obtain the PMD items again.

The application cannot destroy the rte_flow_tunnel object before it
releases all PMD actions & PMD items referencing that tunnel.
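
One way an application could enforce the last requirement is to count
the PMD arrays it currently holds for a tunnel and refuse to destroy
the tunnel object while that count is non-zero. The sketch below is
application-side bookkeeping only; struct demo_tunnel_guard and the
demo_guard_*() helpers are hypothetical and are not part of this patch
or of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tunnel counter kept by the application. */
struct demo_tunnel_guard {
	uint32_t outstanding; /* PMD action/item arrays not yet released */
};

/* Call after a successful query that returned a PMD array. */
static void
demo_guard_acquire(struct demo_tunnel_guard *g)
{
	assert(g != NULL);
	g->outstanding++;
}

/*
 * Call after the matching release call succeeded.
 * Returns 0 on success, -1 if nothing was outstanding.
 */
static int
demo_guard_release(struct demo_tunnel_guard *g)
{
	assert(g != NULL);
	if (g->outstanding == 0)
		return -1;
	g->outstanding--;
	return 0;
}

/* The tunnel object may be destroyed only when nothing references it. */
static bool
demo_guard_can_destroy(const struct demo_tunnel_guard *g)
{
	return g->outstanding == 0;
}
```

In this scheme every successful query is paired with one acquire and
every successful release with one release, so an unbalanced release is
reported instead of silently wrapping the counter.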

[1] https://mails.dpdk.org/archives/dev/2020-June/169656.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 5 files changed, 450 insertions(+)

Comments

Gregory Etelson July 1, 2020, 6:52 a.m. UTC | #1
> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Tuesday, June 30, 2020 14:30
> To: Gregory Etelson <getelson@mellanox.com>
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Neil Horman <nhorman@tuxdriver.com>;
> Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; sriharsha.basavapatna@broadcom.com;
> hemal.shah@broadcom.com; Eli Britstein <elibr@mellanox.com>; Oz Shlomo
> <ozsh@mellanox.com>
> Subject: Re: [PATCH 2/2] ethdev: tunnel offload model
> 
> 
> 
> On 30/06/2020 10:05, Gregory Etelson wrote:
> >
> > + maintainers
> >
> > -----Original Message-----
> > From: Gregory Etelson <getelson@mellanox.com>
> > Sent: Thursday, June 25, 2020 19:04
> > To: dev@dpdk.org
> > Cc: Gregory Etelson <getelson@mellanox.com>; Matan Azrad
> > <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>; Eli
> > Britstein <elibr@mellanox.com>; Ori Kam <orika@mellanox.com>
> > Subject: [PATCH 2/2] ethdev: tunnel offload model
> >
> > From: Eli Britstein <elibr@mellanox.com>
> >
> > Hardware vendors implement tunneled traffic offload techniques
> > differently. Although RTE flow API provides tools capable to offload
> > all sorts of network stacks, software application must reference this
> > hardware differences in flow rules compilation. As the result tunneled
> > traffic flow rules that utilize hardware capabilities can be different
> > for the same traffic.
> >
> > Tunnel port offload proposed in [1] provides software application with
> > unified rules model for tunneled traffic regardless underlying hardware.
> >  - The model introduces a concept of a virtual tunnel port (VTP).
> >  - The model uses VTP to offload ingress tunneled network traffic
> >    with RTE flow rules.
> >  - The model is implemented as set of helper functions. Each PMD
> >    implements VTP offload according to underlying hardware offload
> >    capabilities.  Applications must query PMD for VTP flow
> >    items / actions before using in creation of a VTP flow rule.
> >
> > The model components:
> > - Virtual Tunnel Port (VTP) is a stateless software object that
> >   describes tunneled network traffic.  VTP object usually contains
> >   descriptions of outer headers, tunnel headers and inner headers.
> > - Tunnel Steering flow Rule (TSR) detects tunneled packets and
> >   delegates them to tunnel processing infrastructure, implemented
> >   in PMD for optimal hardware utilization, for further processing.
> > - Tunnel Matching flow Rule (TMR) verifies packet configuration and
> >   runs offload actions in case of a match.
> >
> > Application actions:
> > 1 Initialize VTP object according to tunnel
> >   network parameters.
> > 2 Create TSR flow rule:
> > 2.1 Query PMD for VTP actions: application can query for VTP actions
> >     more than once
> >     int
> >     rte_flow_tunnel_decap_set(uint16_t port_id,
> >                               struct rte_flow_tunnel *tunnel,
> >                               struct rte_flow_action **pmd_actions,
> >                               uint32_t *num_of_pmd_actions,
> >                               struct rte_flow_error *error);
> >
> > 2.2 Integrate PMD actions into TSR actions list.
> > 2.3 Create TSR flow rule:
> >     flow create <port> group 0
> >           match {tunnel items} / end
> >           actions {PMD actions} / {App actions} / end
> >
> > 3 Create TMR flow rule:
> > 3.1 Query PMD for VTP items: application can query for VTP items
> >     more than once
> >     int
> >     rte_flow_tunnel_match(uint16_t port_id,
> >                           struct rte_flow_tunnel *tunnel,
> >                           struct rte_flow_item **pmd_items,
> >                           uint32_t *num_of_pmd_items,
> >                           struct rte_flow_error *error);
> >
> > 3.2 Integrate PMD items into TMR items list:
> > 3.3 Create TMR flow rule
> >     flow create <port> group 0
> >           match {PMD items} / {APP items} / end
> >           actions {offload actions} / end
> >
> > The model provides helper function call to restore packets that miss
> > tunnel TMR rules to its original state:
> > int
> > rte_flow_get_restore_info(uint16_t port_id,
> >                           struct rte_mbuf *mbuf,
> >                           struct rte_flow_restore_info *info,
> >                           struct rte_flow_error *error);
> >
> > rte_tunnel object filled by the call inside rte_flow_restore_info *info
> > parameter can be used by the application to create new TMR rule for that
> > tunnel.
> >
> > The model requirements:
> > Software application must initialize
> > rte_tunnel object with tunnel parameters before calling
> > rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> >
> > PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> > released by application with rte_flow_action_release() call.
> > Application can release the actions after TSR rule was created.
> >
> > PMD items array obtained with rte_flow_tunnel_match() must be released
> > by application with rte_flow_item_release() call.  Application can
> > release the items after rule was created. However, if the application
> > needs to create additional TMR rule for the same tunnel it will need
> > to obtain PMD items again.
> >
> > Application cannot destroy rte_tunnel object before it releases all
> > PMD actions & PMD items referencing that tunnel.
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
> >
> > Signed-off-by: Eli Britstein <elibr@mellanox.com>
> > Acked-by: Ori Kam <orika@mellanox.com>
> > ---
> >  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
> >  lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
> >  lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
> >  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
> >  5 files changed, 450 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> > index d5dd18ce99..cfd98c2e7d 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -3010,6 +3010,111 @@ operations include:
> >  - Duplication of a complete flow rule description.
> >  - Pattern item or action name retrieval.
> >
> > +Tunneled traffic offload
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Provide software application with unified rules model for tunneled
> > +traffic regardless underlying hardware.
> > +
> > + - The model introduces a concept of a virtual tunnel port (VTP).
> > + - The model uses VTP to offload ingress tunneled network traffic
> > +   with RTE flow rules.
> > + - The model is implemented as set of helper functions. Each PMD
> > +   implements VTP offload according to underlying hardware offload
> > +   capabilities.  Applications must query PMD for VTP flow
> > +   items / actions before using in creation of a VTP flow rule.
> > +
> > +The model components:
> > +
> > +- Virtual Tunnel Port (VTP) is a stateless software object that
> > +  describes tunneled network traffic.  VTP object usually contains
> > +  descriptions of outer headers, tunnel headers and inner headers.
> > +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> > +  delegates them to tunnel processing infrastructure, implemented
> > +  in PMD for optimal hardware utilization, for further processing.
> > +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> > +  runs offload actions in case of a match.
> > +
> > +Application actions:
> > +
> > +1 Initialize VTP object according to tunnel network parameters.
> > +
> > +2 Create TSR flow rule.
> > +
> > +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> > +
> > +  .. code-block:: c
> > +
> > +    int
> > +    rte_flow_tunnel_decap_set(uint16_t port_id,
> > +                              struct rte_flow_tunnel *tunnel,
> > +                              struct rte_flow_action **pmd_actions,
> > +                              uint32_t *num_of_pmd_actions,
> > +                              struct rte_flow_error *error);
> > +
> > +2.2 Integrate PMD actions into TSR actions list.
> > +
> > +2.3 Create TSR flow rule.
> > +
> > +    .. code-block:: console
> > +
> > +      flow create <port> group 0 match {tunnel items} / end actions
> > + {PMD actions} / {App actions} / end
> > +
> > +3 Create TMR flow rule.
> > +
> > +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> > +
> > +    .. code-block:: c
> > +
> > +      int
> > +      rte_flow_tunnel_match(uint16_t port_id,
> > +                            struct rte_flow_tunnel *tunnel,
> > +                            struct rte_flow_item **pmd_items,
> > +                            uint32_t *num_of_pmd_items,
> > +                            struct rte_flow_error *error);
> > +
> > +3.2 Integrate PMD items into TMR items list.
> > +
> > +3.3 Create TMR flow rule.
> > +
> > +    .. code-block:: console
> > +
> > +      flow create <port> group 0 match {PMD items} / {APP items} /
> > + end actions {offload actions} / end
> > +
> > +The model provides helper function call to restore packets that miss
> > +tunnel TMR rules to its original state:
> > +
> > +.. code-block:: c
> > +
> > +  int
> > +  rte_flow_get_restore_info(uint16_t port_id,
> > +                            struct rte_mbuf *mbuf,
> > +                            struct rte_flow_restore_info *info,
> > +                            struct rte_flow_error *error);
> > +
> > +rte_tunnel object filled by the call inside ``rte_flow_restore_info
> > +*info parameter`` can be used by the application to create new TMR
> > +rule for that tunnel.
> > +
> > +The model requirements:
> > +
> > +Software application must initialize
> > +rte_tunnel object with tunnel parameters before calling
> > +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> > +
> > +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> > +released by application with rte_flow_action_release() call.
> > +Application can release the actions after TSR rule was created.
> > +
> > +PMD items array obtained with rte_flow_tunnel_match() must be
> > +released by application with rte_flow_item_release() call.
> > +Application can release the items after rule was created. However, if
> > +the application needs to create additional TMR rule for the same
> > +tunnel it will need to obtain PMD items again.
> > +
> > +Application cannot destroy rte_tunnel object before it releases all
> > +PMD actions & PMD items referencing that tunnel.
> > +
> >  Caveats
> >  -------
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> > index 7155056045..63800811df 100644
> > --- a/lib/librte_ethdev/rte_ethdev_version.map
> > +++ b/lib/librte_ethdev/rte_ethdev_version.map
> > @@ -241,4 +241,9 @@ EXPERIMENTAL {
> >  	__rte_ethdev_trace_rx_burst;
> >  	__rte_ethdev_trace_tx_burst;
> >  	rte_flow_get_aged_flows;
> > +	rte_flow_tunnel_decap_set;
> > +	rte_flow_tunnel_match;
> > +	rte_flow_tunnel_get_restore_info;
> > +	rte_flow_tunnel_action_decap_release;
> > +	rte_flow_tunnel_item_release;
> >  };
> > diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> > index c19d25649f..2dc5bfbb3f 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -1268,3 +1268,115 @@ rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
> >  				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> >  				  NULL, rte_strerror(ENOTSUP));
> >  }
> > +
> > +int
> > +rte_flow_tunnel_decap_set(uint16_t port_id,
> > +			  struct rte_flow_tunnel *tunnel,
> > +			  struct rte_flow_action **actions,
> > +			  uint32_t *num_of_actions,
> > +			  struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->tunnel_decap_set)) {
> > +		return flow_err(port_id,
> > +				ops->tunnel_decap_set(dev, tunnel, actions,
> > +						      num_of_actions, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_match(uint16_t port_id,
> > +		      struct rte_flow_tunnel *tunnel,
> > +		      struct rte_flow_item **items,
> > +		      uint32_t *num_of_items,
> > +		      struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->tunnel_match)) {
> > +		return flow_err(port_id,
> > +				ops->tunnel_match(dev, tunnel, items,
> > +						  num_of_items, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> > +				 struct rte_mbuf *m,
> > +				 struct rte_flow_restore_info *restore_info,
> > +				 struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->get_restore_info)) {
> > +		return flow_err(port_id,
> > +				ops->get_restore_info(dev, m, restore_info,
> > +						      error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> > +				     struct rte_flow_action *actions,
> > +				     uint32_t num_of_actions,
> > +				     struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->action_release)) {
> > +		return flow_err(port_id,
> > +				ops->action_release(dev, actions,
> > +						    num_of_actions, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_item_release(uint16_t port_id,
> > +			     struct rte_flow_item *items,
> > +			     uint32_t num_of_items,
> > +			     struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->item_release)) {
> > +		return flow_err(port_id,
> > +				ops->item_release(dev, items,
> > +						  num_of_items, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> > index b0e4199192..1374b6e5a7 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -3324,6 +3324,202 @@ int
> >  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
> >  			uint32_t nb_contexts, struct rte_flow_error *error);
> >
> > +/* Tunnel information. */
> > +__rte_experimental
> 
> __rte_experimental is not required AFAIK on structure definitions, structure
> definitions are not symbols, just on exported functions and variables.
> 
> Did you get a specific warning, that made you add this?

[Gregory Etelson] The attribute is not required on structure definitions.
It is removed in the v2 patch.

> 
> > +struct rte_flow_ip_tunnel_key {
> > +	rte_be64_t tun_id; /**< Tunnel identification. */
> > +	union {
> > +		struct {
> > +			rte_be32_t src_addr; /**< IPv4 source address. */
> > +			rte_be32_t dst_addr; /**< IPv4 destination address. */
> > +		} ipv4;
> > +		struct {
> > +			uint8_t src_addr[16]; /**< IPv6 source address. */
> > +			uint8_t dst_addr[16]; /**< IPv6 destination address. */
> > +		} ipv6;
> > +	} u;
> > +	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> > +	rte_be16_t tun_flags; /**< Tunnel flags. */
> > +	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> > +	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
> > +	rte_be32_t label; /**< Flow Label for IPv6. */
> > +	rte_be16_t tp_src; /**< Tunnel port source. */
> > +	rte_be16_t tp_dst; /**< Tunnel port destination. */
> > +};
> > +
> > +
> > +/* Tunnel has a type and the key information. */
> > +__rte_experimental
> > +struct rte_flow_tunnel {
> > +	/**
> > +	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> > +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> > +	 */
> > +	enum rte_flow_item_type		type;
> > +	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */
> > +};
> > +
> > +/**
> > + * Indicate that the packet has a tunnel.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> > +
> > +/**
> > + * Indicate that the packet has a non decapsulated tunnel header.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> > +
> > +/**
> > + * Indicate that the packet has a group_id.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> > +
> > +/**
> > + * Restore information structure to communicate the current packet processing
> > + * state when some of the processing pipeline is done in hardware and should
> > + * continue in software.
> > + */
> > +__rte_experimental
> > +struct rte_flow_restore_info {
> > +	/**
> > +	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> > +	 * other fields in struct rte_flow_restore_info.
> > +	 */
> > +	uint64_t flags;
> > +	uint32_t group_id; /**< Group ID. */
> > +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> > +};
> > +
> > +/**
> > + * Allocate an array of actions to be used in rte_flow_create, to implement
> > + * tunnel-decap-set for the given tunnel.
> > + * Sample usage:
> > + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> > + *            jump group 0 / end
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] tunnel
> > + *   Tunnel properties.
> > + * @param[out] actions
> > + *   Array of actions to be allocated by the PMD. This array should be
> > + *   concatenated with the actions array provided to rte_flow_create.
> > + * @param[out] num_of_actions
> > + *   Number of actions allocated.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_decap_set(uint16_t port_id,
> > +			  struct rte_flow_tunnel *tunnel,
> > +			  struct rte_flow_action **actions,
> > +			  uint32_t *num_of_actions,
> > +			  struct rte_flow_error *error);
> > +
> > +/**
> > + * Allocate an array of items to be used in rte_flow_create, to implement
> > + * tunnel-match for the given tunnel.
> > + * Sample usage:
> > + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> > + *           inner-header-matches / end
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] tunnel
> > + *   Tunnel properties.
> > + * @param[out] items
> > + *   Array of items to be allocated by the PMD. This array should be
> > + *   concatenated with the items array provided to rte_flow_create.
> > + * @param[out] num_of_items
> > + *   Number of items allocated.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_match(uint16_t port_id,
> > +		      struct rte_flow_tunnel *tunnel,
> > +		      struct rte_flow_item **items,
> > +		      uint32_t *num_of_items,
> > +		      struct rte_flow_error *error);
> > +
> > +/**
> > + * Populate the current packet processing state, if exists, for the given mbuf.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] m
> > + *   Mbuf struct.
> > + * @param[out] info
> > + *   Restore information. Upon success contains the HW state.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> > +				 struct rte_mbuf *m,
> > +				 struct rte_flow_restore_info *info,
> > +				 struct rte_flow_error *error);
> > +
> > +/**
> > + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] actions
> > + *   Array of actions to be released.
> > + * @param[in] num_of_actions
> > + *   Number of elements in actions array.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> > +				     struct rte_flow_action *actions,
> > +				     uint32_t num_of_actions,
> > +				     struct rte_flow_error *error);
> > +
> > +/**
> > + * Release the item array as allocated by rte_flow_tunnel_match.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] items
> > + *   Array of items to be released.
> > + * @param[in] num_of_items
> > + *   Number of elements in item array.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_item_release(uint16_t port_id,
> > +			     struct rte_flow_item *items,
> > +			     uint32_t num_of_items,
> > +			     struct rte_flow_error *error);
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
> > index 881cc469b7..ad1d7a2cdc 100644
> > --- a/lib/librte_ethdev/rte_flow_driver.h
> > +++ b/lib/librte_ethdev/rte_flow_driver.h
> > @@ -107,6 +107,38 @@ struct rte_flow_ops {
> >  		 void **context,
> >  		 uint32_t nb_contexts,
> >  		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_decap_set() */
> > +	int (*tunnel_decap_set)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_tunnel *tunnel,
> > +		 struct rte_flow_action **pmd_actions,
> > +		 uint32_t *num_of_actions,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_match() */
> > +	int (*tunnel_match)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_tunnel *tunnel,
> > +		 struct rte_flow_item **pmd_items,
> > +		 uint32_t *num_of_items,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_get_restore_info() */
> > +	int (*get_restore_info)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_mbuf *m,
> > +		 struct rte_flow_restore_info *info,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_action_decap_release() */
> > +	int (*action_release)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_action *pmd_actions,
> > +		 uint32_t num_of_actions,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_item_release() */
> > +	int (*item_release)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_item *pmd_items,
> > +		 uint32_t num_of_items,
> > +		 struct rte_flow_error *err);
> >  };
> >
> >  /**
> > --
> > 2.25.1
> >
Andrew Rybchenko July 5, 2020, 2:50 p.m. UTC | #2
Hi Gregory,

I'm sorry for the review with toooo many questions without any
suggestions on how to answer it. Please, see below.

On 6/25/20 7:03 PM, Gregory Etelson wrote:
> From: Eli Britstein <elibr@mellanox.com>
> 
> Hardware vendors implement tunneled traffic offload techniques
> differently. Although RTE flow API provides tools capable to offload
> all sorts of network stacks, software application must reference this
> hardware differences in flow rules compilation. As the result tunneled
> traffic flow rules that utilize hardware capabilities can be different
> for the same traffic.
> 
> Tunnel port offload proposed in [1] provides software application with
> unified rules model for tunneled traffic regardless underlying
> hardware.
>  - The model introduces a concept of a virtual tunnel port (VTP).
>  - The model uses VTP to offload ingress tunneled network traffic 
>    with RTE flow rules.
>  - The model is implemented as set of helper functions. Each PMD
>    implements VTP offload according to underlying hardware offload
>    capabilities.  Applications must query PMD for VTP flow
>    items / actions before using in creation of a VTP flow rule.
> 
> The model components:
> - Virtual Tunnel Port (VTP) is a stateless software object that
>   describes tunneled network traffic.  VTP object usually contains
>   descriptions of outer headers, tunnel headers and inner headers.
> - Tunnel Steering flow Rule (TSR) detects tunneled packets and
>   delegates them to tunnel processing infrastructure, implemented
>   in PMD for optimal hardware utilization, for further processing.
> - Tunnel Matching flow Rule (TMR) verifies packet configuration and
>   runs offload actions in case of a match.
> 
> Application actions:
> 1 Initialize VTP object according to tunnel
>   network parameters.
> 2 Create TSR flow rule:
> 2.1 Query PMD for VTP actions: application can query for VTP actions
>     more than once
>     int
>     rte_flow_tunnel_decap_set(uint16_t port_id,
>                               struct rte_flow_tunnel *tunnel,
>                               struct rte_flow_action **pmd_actions,
>                               uint32_t *num_of_pmd_actions,
>                               struct rte_flow_error *error);
> 
> 2.2 Integrate PMD actions into TSR actions list.
> 2.3 Create TSR flow rule:
>     flow create <port> group 0
>           match {tunnel items} / end
>           actions {PMD actions} / {App actions} / end
> 
> 3 Create TMR flow rule:
> 3.1 Query PMD for VTP items: application can query for VTP items
>     more than once
>     int
>     rte_flow_tunnel_match(uint16_t port_id,
>                           struct rte_flow_tunnel *tunnel,
>                           struct rte_flow_item **pmd_items,
>                           uint32_t *num_of_pmd_items,
>                           struct rte_flow_error *error);
> 
> 3.2 Integrate PMD items into TMR items list:
> 3.3 Create TMR flow rule
>     flow create <port> group 0
>           match {PMD items} / {APP items} / end
>           actions {offload actions} / end
> 
> The model provides helper function call to restore packets that miss
> tunnel TMR rules to its original state:
> int
> rte_flow_get_restore_info(uint16_t port_id,
>                           struct rte_mbuf *mbuf,
>                           struct rte_flow_restore_info *info,
>                           struct rte_flow_error *error);
> 
> rte_tunnel object filled by the call inside
> rte_flow_restore_info *info parameter can be used by the application
> to create new TMR rule for that tunnel.
> 
> The model requirements:
> Software application must initialize
> rte_tunnel object with tunnel parameters before calling
> rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> 
> PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> released by application with rte_flow_action_release() call.
> Application can release the actionsfter TSR rule was created.
> 
> PMD items array obtained with rte_flow_tunnel_match() must be released
> by application with rte_flow_item_release() call.  Application can
> release the items after rule was created. However, if the application
> needs to create additional TMR rule for the same tunnel it will need
> to obtain PMD items again.
> 
> Application cannot destroy rte_tunnel object before it releases all
> PMD actions & PMD items referencing that tunnel.
> 
> [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
> 
> Signed-off-by: Eli Britstein <elibr@mellanox.com>
> Acked-by: Ori Kam <orika@mellanox.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
>  lib/librte_ethdev/rte_ethdev_version.map |   5 +
>  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
>  lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
>  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
>  5 files changed, 450 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index d5dd18ce99..cfd98c2e7d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3010,6 +3010,111 @@ operations include:
>  - Duplication of a complete flow rule description.
>  - Pattern item or action name retrieval.
>  
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).

It looks like a purely abstract concept for now, since it
is not mentioned or referenced in the header file. That makes
it hard to put the description and the API together.

> + - The model uses VTP to offload ingress tunneled network traffic 
> +   with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> +   implements VTP offload according to underlying hardware offload
> +   capabilities.  Applications must query PMD for VTP flow
> +   items / actions before using in creation of a VTP flow rule.

To me it looks like "creation of a VTP flow rule" is not
covered yet. The flow rule examples mention it in pattern and
actions, but there are no corresponding pattern items or
actions. Maybe I simply misunderstand the idea.

> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> +  describes tunneled network traffic.  VTP object usually contains
> +  descriptions of outer headers, tunnel headers and inner headers.

Are inner headers really a part of the tunnel description?

> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> +  delegates them to tunnel processing infrastructure, implemented
> +  in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> +  runs offload actions in case of a match.

Is it for fully offloaded tunnels with encap/decap or all
tunnels (detected, but partially offloaded, e.g. checksumming)?

> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.

I.e. fill in 'struct rte_flow_tunnel'. Is it correct?

> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> +
> +  .. code-block:: c
> +
> +    int
> +    rte_flow_tunnel_decap_set(uint16_t port_id,
> +                              struct rte_flow_tunnel *tunnel,
> +                              struct rte_flow_action **pmd_actions,
> +                              uint32_t *num_of_pmd_actions,
> +                              struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end

Are application actions strictly required?
If not, it is better to make that clear.
Do the tunnel items here correlate somehow with the tunnel
specification in 'struct rte_flow_tunnel'?
Are they obtained using rte_flow_tunnel_match()?

> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> +
> +    .. code-block:: c
> +
> +      int
> +      rte_flow_tunnel_match(uint16_t port_id,
> +                            struct rte_flow_tunnel *tunnel,
> +                            struct rte_flow_item **pmd_items,
> +                            uint32_t *num_of_pmd_items,
> +                            struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> +  int
> +  rte_flow_get_restore_info(uint16_t port_id,
> +                            struct rte_mbuf *mbuf,
> +                            struct rte_flow_restore_info *info,
> +                            struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.

I think an example, say for the VXLAN over IPv4 tunnel case
with some concrete parameters, would be very useful here
for understanding. Could it be annotated with a description
of the transformations applied to a packet at the different
stages of processing (including a restore example)?
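
To make the request concrete, here is a minimal sketch of step 1 for a
VXLAN over IPv4 tunnel with VNI 42 between 10.0.0.1 and 10.0.0.2. The
types are simplified local stand-ins for the proposed
'struct rte_flow_tunnel' and 'struct rte_flow_ip_tunnel_key', not the
real definitions. Putting the VNI into tun_id is an assumption, since
the patch does not say what the field holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Stand-in for the proposed struct rte_flow_ip_tunnel_key (IPv4 only). */
struct tunnel_key {
	uint64_t tun_id;   /* big-endian tunnel ID; VNI for VXLAN (assumed) */
	uint32_t src_addr; /* big-endian outer IPv4 source address */
	uint32_t dst_addr; /* big-endian outer IPv4 destination address */
	bool     is_ipv6;  /* false: the IPv4 fields are valid */
	uint8_t  ttl;
	uint16_t tp_dst;   /* big-endian outer UDP destination port */
};

/* Stand-in for the tunnel type carried in struct rte_flow_tunnel. */
enum tunnel_type { TUNNEL_TYPE_NONE = 0, TUNNEL_TYPE_VXLAN };

struct tunnel {
	enum tunnel_type type;
	struct tunnel_key key;
};

/* Convert a host-order 64-bit value to big-endian byte order. */
static uint64_t
to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;

	for (int i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Fill the tunnel description; addresses are given in host byte order. */
static void
vxlan_ipv4_tunnel_init(struct tunnel *t, uint32_t vni,
		       uint32_t src_ip, uint32_t dst_ip)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
	t->type = TUNNEL_TYPE_VXLAN;
	t->key.tun_id = to_be64(vni);
	t->key.src_addr = htonl(src_ip);
	t->key.dst_addr = htonl(dst_ip);
	t->key.is_ipv6 = false;
	t->key.ttl = 64;
	t->key.tp_dst = htons(4789); /* IANA-assigned VXLAN UDP port */
}
```

A call such as vxlan_ipv4_tunnel_init(&t, 42, 0x0a000001, 0x0a000002)
would then describe the tunnel before the decap_set / match queries.
The documentation could walk through what happens to a packet from
there.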

> +
> +The model requirements:
> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.

actionsfter ?

> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call.  Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
> +
>  Caveats
>  -------
>  

[snip]

> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b0e4199192..1374b6e5a7 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3324,6 +3324,202 @@ int
>  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>  			uint32_t nb_contexts, struct rte_flow_error *error);
>  
> +/* Tunnel information. */
> +__rte_experimental
> +struct rte_flow_ip_tunnel_key {
> +	rte_be64_t tun_id; /**< Tunnel identification. */

What is it? Why is it big-endian? Why is it in IP tunnel key?
I.e. why is it not in a generic structure?

> +	union {
> +		struct {
> +			rte_be32_t src_addr; /**< IPv4 source address. */
> +			rte_be32_t dst_addr; /**< IPv4 destination address. */
> +		} ipv4;
> +		struct {
> +			uint8_t src_addr[16]; /**< IPv6 source address. */
> +			uint8_t dst_addr[16]; /**< IPv6 destination address. */
> +		} ipv6;
> +	} u;
> +	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +	rte_be16_t tun_flags; /**< Tunnel flags. */

Which flags? Where are these flags defined?
Why is it big-endian?

> +	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> +	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */

If combined, I'd stick to IPv6 terminology since it is a bit
better (well thought out, especially given current tendencies
in (re)naming in software).

> +	rte_be32_t label; /**< Flow Label for IPv6. */

What about IPv6 tunnels with extension headers? How to extend?

> +	rte_be16_t tp_src; /**< Tunnel port source. */
> +	rte_be16_t tp_dst; /**< Tunnel port destination. */

What about IP-in-IP tunnels? Is it applicable?

> +};
> +
> +
> +/* Tunnel has a type and the key information. */
> +__rte_experimental
> +struct rte_flow_tunnel {
> +	/**
> +	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> +	 */
> +	enum rte_flow_item_type		type;
> +	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */

How to extend it for non-IP tunnels? MPLS?
Or tunnels with more protocols? E.g. MPLS-over-UDP?

> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +__rte_experimental
> +struct rte_flow_restore_info {
> +	/**
> +	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> +	 * other fields in struct rte_flow_restore_info.
> +	 */
> +	uint64_t flags;
> +	uint32_t group_id; /**< Group ID. */

What is the group ID here?

> +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-decap-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> + *            jump group 0 / end

Why is jump to group used in example above? Is it mandatory?

> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.

Please, specify concatenation order explicitly.
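
For instance, one possible reading (based on the
"actions {PMD actions} / {App actions} / end" line in the commit
message) is that PMD actions come first and the application appends
its own END-terminated list. A minimal sketch of that order, using
local stand-in types rather than the real 'struct rte_flow_action',
and assuming the returned count does not include an END entry:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for enum rte_flow_action_type / struct rte_flow_action. */
enum action_type { ACTION_END = 0, ACTION_PMD_PRIVATE, ACTION_JUMP };

struct action {
	enum action_type type;
	const void *conf;
};

/*
 * Build "PMD actions / application actions / END".
 * pmd is a counted array (assumed to carry no END entry),
 * app is an END-terminated array. The caller frees the result.
 */
static struct action *
concat_actions(const struct action *pmd, uint32_t num_pmd,
	       const struct action *app)
{
	uint32_t num_app = 0;
	struct action *out;

	assert(app != NULL);
	while (app[num_app].type != ACTION_END)
		num_app++;
	out = calloc((size_t)num_pmd + num_app + 1, sizeof(*out));
	if (out == NULL)
		return NULL;
	if (num_pmd != 0)
		memcpy(out, pmd, num_pmd * sizeof(*out));
	/* Copy the application actions including the terminating END. */
	memcpy(out + num_pmd, app, ((size_t)num_app + 1) * sizeof(*out));
	return out;
}
```

If the intended order is different, or if the PMD array is itself
END-terminated, the documentation should say so.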

> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +			  struct rte_flow_tunnel *tunnel,
> +			  struct rte_flow_action **actions,
> +			  uint32_t *num_of_actions,

Why does the approach to specifying actions differ here?
I.e. an array of specified size vs. an END-terminated array?
Must the actions array be END-terminated here?
There must be a strong reason to do it, and it should be
explained.

> +			  struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.

Concatenation order/rules should be described.
Since it is an output, which entity does the concatenation?
Is it allowed to refine PMD rules in the application
rule specification?

> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +		      struct rte_flow_tunnel *tunnel,
> +		      struct rte_flow_item **items,
> +		      uint32_t *num_of_items,

Same as above for actions.

> +		      struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if exists, for the given mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> +				 struct rte_mbuf *m,
> +				 struct rte_flow_restore_info *info,

Is it suggesting to make a copy of the restore info for each
mbuf? It sounds very expensive. Could you share your thoughts
about it?

> +				 struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[in] num_of_actions
> + *   Number of elements in actions array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +				     struct rte_flow_action *actions,
> +				     uint32_t num_of_actions,

Same question as above for actions and items specification
approach.

> +				     struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[in] num_of_items
> + *   Number of elements in item array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +			     struct rte_flow_item *items,
> +			     uint32_t num_of_items,

Same question as above for actions and items specification
approach.

> +			     struct rte_flow_error *error);
>  #ifdef __cplusplus
>  }
>  #endif

[snip]

Andrew.

(Right now it is hard to fully imagine how to deal with it.
And it looks like a shim to a vendor-specific API. Maybe I'm
wrong. Hopefully the next version will have a PMD implementation
example and it will shed a bit more light on it.)
Thomas Monjalon July 13, 2020, 8:21 a.m. UTC | #3
01/07/2020 08:52, Gregory Etelson:
> From: Kinsella, Ray <mdr@ashroe.eu>
> > __rte_experimental is not required AFAIK on structure definitions, structure
> > definitions are not symbols, just on exported functions and variables.
> > 
> > Did you get a specific warning, that made you add this?
> 
> [Gregory Etelson] The attribute is not required in structures definition.
> It's removed in v2 patch version

Is v2 sent? I don't find it.
Gregory Etelson July 13, 2020, 1:23 p.m. UTC | #4
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 13, 2020 11:21
> To: Gregory Etelson <getelson@mellanox.com>
> Cc: Kinsella, Ray <mdr@ashroe.eu>; dev@dpdk.org; Matan Azrad
> <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>; Ori
> Kam <orika@mellanox.com>; John McNamara <john.mcnamara@intel.com>;
> Marko Kovacevic <marko.kovacevic@intel.com>; Neil Horman
> <nhorman@tuxdriver.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; sriharsha.basavapatna@broadcom.com;
> hemal.shah@broadcom.com; Eli Britstein <elibr@mellanox.com>; Oz Shlomo
> <ozsh@mellanox.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
> 
> 01/07/2020 08:52, Gregory Etelson:
> > From: Kinsella, Ray <mdr@ashroe.eu>
> > > __rte_experimental is not required AFAIK on structure definitions,
> > > structure definitions are not symbols, just on exported functions and
> variables.
> > >
> > > Did you get a specific warning, that made you add this?
> >
> > [Gregory Etelson] The attribute is not required in structures definition.
> > It's removed in v2 patch version
> 
> Is v2 sent? I don't find it.
> 

V2 has not been posted yet. It's in progress.

Patch

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index d5dd18ce99..cfd98c2e7d 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3010,6 +3010,111 @@  operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide software application with unified rules model for tunneled traffic
+regardless underlying hardware.
+
+ - The model introduces a concept of a virtual tunnel port (VTP).
+ - The model uses VTP to offload ingress tunneled network traffic 
+   with RTE flow rules.
+ - The model is implemented as set of helper functions. Each PMD
+   implements VTP offload according to underlying hardware offload
+   capabilities.  Applications must query PMD for VTP flow
+   items / actions before using in creation of a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  VTP object usually contains
+  descriptions of outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them to tunnel processing infrastructure, implemented
+  in PMD for optimal hardware utilization, for further processing.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
+2 Create TSR flow rule.
+
+2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
+3 Create TMR flow rule.
+
+3.1 Query PMD for VTP items. Application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
+The model provides helper function call to restore packets that miss
+tunnel TMR rules to its original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+rte_tunnel object filled by the call inside
+``rte_flow_restore_info *info parameter`` can be used by the application
+to create new TMR rule for that tunnel.
+
+The model requirements:
+
+Software application must initialize
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by application with rte_flow_action_release() call.
+Application can release the actionsfter TSR rule was created.
+
+PMD items array obtained with rte_flow_tunnel_match() must be released
+by application with rte_flow_item_release() call.  Application can
+release the items after rule was created. However, if the application
+needs to create additional TMR rule for the same tunnel it will need
+to obtain PMD items again.
+
+Application cannot destroy rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 7155056045..63800811df 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -241,4 +241,9 @@  EXPERIMENTAL {
 	__rte_ethdev_trace_rx_burst;
 	__rte_ethdev_trace_tx_burst;
 	rte_flow_get_aged_flows;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_tunnel_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index c19d25649f..2dc5bfbb3f 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1268,3 +1268,115 @@  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_get_restore_info(uint16_t port_id,
+				 struct rte_mbuf *m,
+				 struct rte_flow_restore_info *restore_info,
+				 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index b0e4199192..1374b6e5a7 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3324,6 +3324,202 @@  int
 rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 
+/* Tunnel information. */
+__rte_experimental
+struct rte_flow_ip_tunnel_key {
+	rte_be64_t tun_id; /**< Tunnel identification. */
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	} u;
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+	rte_be16_t tun_flags; /**< Tunnel flags. */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	rte_be32_t label; /**< Flow Label for IPv6. */
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+};
+
+
+/* Tunnel has a type and the key information. */
+__rte_experimental
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type		type;
+	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+__rte_experimental
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_get_restore_info(uint16_t port_id,
+				 struct rte_mbuf *m,
+				 struct rte_flow_restore_info *info,
+				 struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 881cc469b7..ad1d7a2cdc 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -107,6 +107,38 @@  struct rte_flow_ops {
 		 void **context,
 		 uint32_t nb_contexts,
 		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**