[v4,1/5] ethdev: negotiate delivery of packet metadata from HW to PMD

Message ID 20211004235007.12293-2-ivan.malov@oktetlabs.ru (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series Negotiate the NIC's ability to deliver Rx metadata to the PMD

Checks

Context         Check     Description
ci/checkpatch   warning   coding style issues

Commit Message

Ivan Malov Oct. 4, 2021, 11:50 p.m. UTC
  Provide an API to let the application control the NIC's ability
to deliver specific kinds of per-packet metadata to the PMD.

Checks for the NIC's ability to set these kinds of metadata
in the first place (support for the flow actions) belong to
the flow API's domain of responsibility (flow validate mechanism).
This topic is out of scope for the new API in question.

The PMD's ability to deliver received metadata to the user
by means of mbuf fields should be covered by the mbuf library.
It is also out of scope for the new API in question.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jerin Jacob <jerinj@marvell.com>
---
 app/test-flow-perf/main.c              | 21 ++++++++++
 app/test-pmd/testpmd.c                 | 37 ++++++++++++++++++
 doc/guides/rel_notes/release_21_11.rst |  9 +++++
 lib/ethdev/ethdev_driver.h             | 22 +++++++++++
 lib/ethdev/rte_ethdev.c                | 25 ++++++++++++
 lib/ethdev/rte_ethdev.h                | 53 ++++++++++++++++++++++++++
 lib/ethdev/rte_flow.h                  | 12 ++++++
 lib/ethdev/version.map                 |  3 ++
 8 files changed, 182 insertions(+)
  

Comments

Ori Kam Oct. 5, 2021, 12:03 p.m. UTC | #1
Hi Ivan,

Just a nit below.

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Tuesday, October 5, 2021 2:50 AM
> Subject: [PATCH v4 1/5] ethdev: negotiate delivery of packet metadata from
> HW to PMD
> 
> Provide an API to let the application control the NIC's ability to deliver specific
> kinds of per-packet metadata to the PMD.
> 
> Checks for the NIC's ability to set these kinds of metadata in the first place
> (support for the flow actions) belong in flow API responsibility domain (flow
> validate mechanism).
> This topic is out of scope of the new API in question.
> 
> The PMD's ability to deliver received metadata to the user by virtue of mbuf
> fields should be covered by mbuf library.
> It is also out of scope of the new API in question.
> 

+1 very clear.

> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> ---

[Snip]

> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -4902,6 +4902,59 @@ __rte_experimental
>  int rte_eth_representor_info_get(uint16_t port_id,
>  				 struct rte_eth_representor_info *info);
> 
> +/** The NIC is able to deliver flag (if set) with packets to the PMD. */
> +#define RTE_ETH_RX_METADATA_USER_FLAG (UINT64_C(1) << 0)
> +
> +/** The NIC is able to deliver mark ID with packets to the PMD. */
> +#define RTE_ETH_RX_METADATA_USER_MARK (UINT64_C(1) << 1)
> +
> +/** The NIC is able to deliver tunnel ID with packets to the PMD. */
> +#define RTE_ETH_RX_METADATA_TUNNEL_ID (UINT64_C(1) << 2)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Negotiate the NIC's ability to deliver specific kinds of metadata to the PMD.
> + *
> + * Invoke this API before the first rte_eth_dev_configure() invocation
> + * to let the PMD make preparations that are inconvenient to do later.
> + *
> + * The negotiation process is as follows:
> + *
> + * - the application requests features intending to use at least some of them;
> + * - the PMD responds with the guaranteed subset of the requested feature set;
> + * - the application can retry negotiation with another set of features;
> + * - the application can pass zero to clear the negotiation result;
> + * - the last negotiated result takes effect upon the ethdev start.

Not upon ethdev configure?

> + *
> + * @note
> + *   The PMD is supposed to first consider enabling the requested feature set
> + *   in its entirety. Only if it fails to do so, does it have the right to
> + *   respond with a smaller set of the originally requested features.
> + *
> + * @note
> + *   Return code (-ENOTSUP) does not necessarily mean that the requested
> + *   features are unsupported. In this case, the application should just
> + *   assume that these features can be used without prior negotiations.
> + *
> + * @param port_id
> + *   Port (ethdev) identifier
> + *
> + * @param[inout] features
> + *   Feature selection buffer
> + *
> + * @return
> + *   - (-EBUSY) if the port can't handle this in its current state;
> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
> + *   - (-ENODEV) if *port_id* is invalid;
> + *   - (-EINVAL) if *features* is NULL;
> + *   - (-EIO) if the device is removed;
> + *   - (0) on success
> + */
> +__rte_experimental
> +int rte_eth_rx_metadata_negotiate(uint16_t port_id, uint64_t *features);
> +
>  #include <rte_ethdev_core.h>
> 
>  /**
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index 7b1ed7f110..75656ff9f8 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -1904,6 +1904,10 @@ enum rte_flow_action_type {
>  	 * PKT_RX_FDIR_ID mbuf flags.
>  	 *
>  	 * See struct rte_flow_action_mark.
> +	 *
> +	 * One should negotiate mark delivery from the NIC to the PMD.
> +	 * @see rte_eth_rx_metadata_negotiate()
> +	 * @see RTE_ETH_RX_METADATA_USER_MARK
>  	 */
>  	RTE_FLOW_ACTION_TYPE_MARK,
> 
> @@ -1912,6 +1916,10 @@ enum rte_flow_action_type {
>  	 * sets the PKT_RX_FDIR mbuf flag.
>  	 *
>  	 * No associated configuration structure.
> +	 *
> +	 * One should negotiate flag delivery from the NIC to the PMD.
> +	 * @see rte_eth_rx_metadata_negotiate()
> +	 * @see RTE_ETH_RX_METADATA_USER_FLAG
>  	 */
>  	RTE_FLOW_ACTION_TYPE_FLAG,
> 
> @@ -4223,6 +4231,10 @@ rte_flow_tunnel_match(uint16_t port_id,
>  /**
>   * Populate the current packet processing state, if exists, for the given mbuf.
>   *
> + * One should negotiate tunnel metadata delivery from the NIC to the PMD.
> + * @see rte_eth_rx_metadata_negotiate()
> + * @see RTE_ETH_RX_METADATA_TUNNEL_ID
> + *
>   * @param port_id
>   *   Port identifier of Ethernet device.
>   * @param[in] m
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 904bce6ea1..2e638c680e 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -247,6 +247,9 @@ EXPERIMENTAL {
>  	rte_mtr_meter_policy_delete;
>  	rte_mtr_meter_policy_update;
>  	rte_mtr_meter_policy_validate;
> +
> +	# added in 21.11
> +	rte_eth_rx_metadata_negotiate;
>  };
> 
>  INTERNAL {
> --
> 2.20.1
Best,
Ori
  
Ivan Malov Oct. 5, 2021, 12:50 p.m. UTC | #2
Hi Ori,

On 05/10/2021 15:03, Ori Kam wrote:
> Hi Ivan,
> 
> Just a nit below.
> 
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Tuesday, October 5, 2021 2:50 AM
>> Subject: [PATCH v4 1/5] ethdev: negotiate delivery of packet metadata from
>> HW to PMD
>>
>> Provide an API to let the application control the NIC's ability to deliver specific
>> kinds of per-packet metadata to the PMD.
>>
>> Checks for the NIC's ability to set these kinds of metadata in the first place
>> (support for the flow actions) belong in flow API responsibility domain (flow
>> validate mechanism).
>> This topic is out of scope of the new API in question.
>>
>> The PMD's ability to deliver received metadata to the user by virtue of mbuf
>> fields should be covered by mbuf library.
>> It is also out of scope of the new API in question.
>>
> 
> +1 very clear.
> 
>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>> ---
> 
> [Snip]
> 
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -4902,6 +4902,59 @@ __rte_experimental
>>  int rte_eth_representor_info_get(uint16_t port_id,
>>   				 struct rte_eth_representor_info *info);
>>
>> +/** The NIC is able to deliver flag (if set) with packets to the PMD. */
>> +#define RTE_ETH_RX_METADATA_USER_FLAG (UINT64_C(1) << 0)
>> +
>> +/** The NIC is able to deliver mark ID with packets to the PMD. */
>> +#define RTE_ETH_RX_METADATA_USER_MARK (UINT64_C(1) << 1)
>> +
>> +/** The NIC is able to deliver tunnel ID with packets to the PMD. */
>> +#define RTE_ETH_RX_METADATA_TUNNEL_ID (UINT64_C(1) << 2)
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Negotiate the NIC's ability to deliver specific kinds of metadata to the PMD.
>> + *
>> + * Invoke this API before the first rte_eth_dev_configure() invocation
>> + * to let the PMD make preparations that are inconvenient to do later.
>> + *
>> + * The negotiation process is as follows:
>> + *
>> + * - the application requests features intending to use at least some of them;
>> + * - the PMD responds with the guaranteed subset of the requested feature set;
>> + * - the application can retry negotiation with another set of features;
>> + * - the application can pass zero to clear the negotiation result;
>> + * - the last negotiated result takes effect upon the ethdev start.
> 
> Not upon ethdev configure?

Well, technically, doing "configure()" just closes the negotiation 
window. I guess, "to take effect" is "to be activated", and activation 
of Rx features typically happens on Rx subsystem start.

I know it might seem a bit inconsistent, but in any case the API 
contract says clearly that invocations of "metadata_negotiate()" should 
be done before "configure()".

Andrew?

> 
>> + *
>> + * @note
>> + *   The PMD is supposed to first consider enabling the requested feature set
>> + *   in its entirety. Only if it fails to do so, does it have the right to
>> + *   respond with a smaller set of the originally requested features.
>> + *
>> + * @note
>> + *   Return code (-ENOTSUP) does not necessarily mean that the requested
>> + *   features are unsupported. In this case, the application should just
>> + *   assume that these features can be used without prior negotiations.
>> + *
>> + * @param port_id
>> + *   Port (ethdev) identifier
>> + *
>> + * @param[inout] features
>> + *   Feature selection buffer
>> + *
>> + * @return
>> + *   - (-EBUSY) if the port can't handle this in its current state;
>> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
>> + *   - (-ENODEV) if *port_id* is invalid;
>> + *   - (-EINVAL) if *features* is NULL;
>> + *   - (-EIO) if the device is removed;
>> + *   - (0) on success
>> + */
>> +__rte_experimental
>> +int rte_eth_rx_metadata_negotiate(uint16_t port_id, uint64_t *features);
>> +
>>   #include <rte_ethdev_core.h>
>>
>>   /**
>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
>> index 7b1ed7f110..75656ff9f8 100644
>> --- a/lib/ethdev/rte_flow.h
>> +++ b/lib/ethdev/rte_flow.h
>> @@ -1904,6 +1904,10 @@ enum rte_flow_action_type {
>>   	 * PKT_RX_FDIR_ID mbuf flags.
>>   	 *
>>   	 * See struct rte_flow_action_mark.
>> +	 *
>> +	 * One should negotiate mark delivery from the NIC to the PMD.
>> +	 * @see rte_eth_rx_metadata_negotiate()
>> +	 * @see RTE_ETH_RX_METADATA_USER_MARK
>>   	 */
>>   	RTE_FLOW_ACTION_TYPE_MARK,
>>
>> @@ -1912,6 +1916,10 @@ enum rte_flow_action_type {
>>   	 * sets the PKT_RX_FDIR mbuf flag.
>>   	 *
>>   	 * No associated configuration structure.
>> +	 *
>> +	 * One should negotiate flag delivery from the NIC to the PMD.
>> +	 * @see rte_eth_rx_metadata_negotiate()
>> +	 * @see RTE_ETH_RX_METADATA_USER_FLAG
>>   	 */
>>   	RTE_FLOW_ACTION_TYPE_FLAG,
>>
>> @@ -4223,6 +4231,10 @@ rte_flow_tunnel_match(uint16_t port_id,
>>   /**
>>    * Populate the current packet processing state, if exists, for the given mbuf.
>>    *
>> + * One should negotiate tunnel metadata delivery from the NIC to the PMD.
>> + * @see rte_eth_rx_metadata_negotiate()
>> + * @see RTE_ETH_RX_METADATA_TUNNEL_ID
>> + *
>>    * @param port_id
>>    *   Port identifier of Ethernet device.
>>    * @param[in] m
>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>> index 904bce6ea1..2e638c680e 100644
>> --- a/lib/ethdev/version.map
>> +++ b/lib/ethdev/version.map
>> @@ -247,6 +247,9 @@ EXPERIMENTAL {
>>   	rte_mtr_meter_policy_delete;
>>   	rte_mtr_meter_policy_update;
>>   	rte_mtr_meter_policy_validate;
>> +
>> +	# added in 21.11
>> +	rte_eth_rx_metadata_negotiate;
>>   };
>>
>>   INTERNAL {
>> --
>> 2.20.1
> Best,
> Ori
>
  
Andrew Rybchenko Oct. 5, 2021, 1:17 p.m. UTC | #3
On 10/5/21 3:50 PM, Ivan Malov wrote:
> Hi Ori,
> 
> On 05/10/2021 15:03, Ori Kam wrote:
>> Hi Ivan,
>>
>> Just a nit below.
>>
>>> -----Original Message-----
>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>> Sent: Tuesday, October 5, 2021 2:50 AM
>>> Subject: [PATCH v4 1/5] ethdev: negotiate delivery of packet metadata
>>> from
>>> HW to PMD
>>>
>>> Provide an API to let the application control the NIC's ability to
>>> deliver specific
>>> kinds of per-packet metadata to the PMD.
>>>
>>> Checks for the NIC's ability to set these kinds of metadata in the
>>> first place
>>> (support for the flow actions) belong in flow API responsibility
>>> domain (flow
>>> validate mechanism).
>>> This topic is out of scope of the new API in question.
>>>
>>> The PMD's ability to deliver received metadata to the user by virtue
>>> of mbuf
>>> fields should be covered by mbuf library.
>>> It is also out of scope of the new API in question.
>>>
>>
>> +1 very clear.
>>
>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>> ---
>>
>> [Snip]
>>
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -4902,6 +4902,59 @@ __rte_experimental
>>>  int rte_eth_representor_info_get(uint16_t port_id,
>>>                    struct rte_eth_representor_info *info);
>>>
>>> +/** The NIC is able to deliver flag (if set) with packets to the PMD. */
>>> +#define RTE_ETH_RX_METADATA_USER_FLAG (UINT64_C(1) << 0)
>>> +
>>> +/** The NIC is able to deliver mark ID with packets to the PMD. */
>>> +#define RTE_ETH_RX_METADATA_USER_MARK (UINT64_C(1) << 1)
>>> +
>>> +/** The NIC is able to deliver tunnel ID with packets to the PMD. */
>>> +#define RTE_ETH_RX_METADATA_TUNNEL_ID (UINT64_C(1) << 2)
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Negotiate the NIC's ability to deliver specific kinds of metadata
>>> to the PMD.
>>> + *
>>> + * Invoke this API before the first rte_eth_dev_configure() invocation
>>> + * to let the PMD make preparations that are inconvenient to do later.
>>> + *
>>> + * The negotiation process is as follows:
>>> + *
>>> + * - the application requests features intending to use at least some of them;
>>> + * - the PMD responds with the guaranteed subset of the requested feature set;
>>> + * - the application can retry negotiation with another set of features;
>>> + * - the application can pass zero to clear the negotiation result;
>>> + * - the last negotiated result takes effect upon the ethdev start.
>>
>> Not upon ethdev configure?
> 
> Well, technically, doing "configure()" just closes the negotiation
> window. I guess, "to take effect" is "to be activated", and activation
> of Rx features typically happens on Rx subsystem start.

Yes, i.e. ethdev port start from the application's point of view.

> I know it might seem a bit inconsistent, but in any case the API
> contract says clearly that invocations of "metadata_negotiate()" should
> be done before "configure()".
> 
> Andrew?

Yes, the reason to define the order is to simplify implementation.
When configure is invoked, the PMD knows that Rx metadata has been
negotiated, and it should treat all other bits of the
configuration with respect to the Rx metadata configuration,
if applicable, of course.

So, I think the question is right, and the correct description
should say: ... upon the ethdev configure and start.

Andrew.
  

Patch

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index 9be8edc31d..4d01791f6f 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -1760,6 +1760,27 @@  init_port(void)
 		rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
 
 	for (port_id = 0; port_id < nr_ports; port_id++) {
+		uint64_t rx_metadata = 0;
+
+		rx_metadata |= RTE_ETH_RX_METADATA_USER_FLAG;
+		rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK;
+
+		ret = rte_eth_rx_metadata_negotiate(port_id, &rx_metadata);
+		if (ret == 0) {
+			if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_FLAG)) {
+				printf(":: flow action FLAG will not affect Rx mbufs on port=%u\n",
+				       port_id);
+			}
+
+			if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) {
+				printf(":: flow action MARK will not affect Rx mbufs on port=%u\n",
+				       port_id);
+			}
+		} else if (ret != -ENOTSUP) {
+			rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta features on port=%u: %s\n",
+				 port_id, rte_strerror(-ret));
+		}
+
 		ret = rte_eth_dev_info_get(port_id, &dev_info);
 		if (ret != 0)
 			rte_exit(EXIT_FAILURE,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 97ae52e17e..bf80de4e80 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -533,6 +533,41 @@  int proc_id;
  */
 unsigned int num_procs = 1;
 
+static void
+eth_rx_metadata_negotiate_mp(uint16_t port_id)
+{
+	uint64_t rx_meta_features = 0;
+	int ret;
+
+	if (!is_proc_primary())
+		return;
+
+	rx_meta_features |= RTE_ETH_RX_METADATA_USER_FLAG;
+	rx_meta_features |= RTE_ETH_RX_METADATA_USER_MARK;
+	rx_meta_features |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+
+	ret = rte_eth_rx_metadata_negotiate(port_id, &rx_meta_features);
+	if (ret == 0) {
+		if (!(rx_meta_features & RTE_ETH_RX_METADATA_USER_FLAG)) {
+			TESTPMD_LOG(DEBUG, "Flow action FLAG will not affect Rx mbufs on port %u\n",
+				    port_id);
+		}
+
+		if (!(rx_meta_features & RTE_ETH_RX_METADATA_USER_MARK)) {
+			TESTPMD_LOG(DEBUG, "Flow action MARK will not affect Rx mbufs on port %u\n",
+				    port_id);
+		}
+
+		if (!(rx_meta_features & RTE_ETH_RX_METADATA_TUNNEL_ID)) {
+			TESTPMD_LOG(DEBUG, "Flow tunnel offload support might be limited or unavailable on port %u\n",
+				    port_id);
+		}
+	} else if (ret != -ENOTSUP) {
+		rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta features on port %u: %s\n",
+			 port_id, rte_strerror(-ret));
+	}
+}
+
 static int
 eth_dev_configure_mp(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 		      const struct rte_eth_conf *dev_conf)
@@ -1489,6 +1524,8 @@  init_config_port_offloads(portid_t pid, uint32_t socket_id)
 	int ret;
 	int i;
 
+	eth_rx_metadata_negotiate_mp(pid);
+
 	port->dev_conf.txmode = tx_mode;
 	port->dev_conf.rxmode = rx_mode;
 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index f099b1cca2..48fd045db7 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -129,6 +129,15 @@  New Features
   * Added tests to validate packets hard expiry.
   * Added tests to verify tunnel header verification in IPsec inbound.
 
+* **Added an API to control delivery of Rx metadata from the HW to the PMD**
+
+  A new API, ``rte_eth_rx_metadata_negotiate()``, was added.
+  The following parts of Rx metadata were defined:
+
+  * ``RTE_ETH_RX_METADATA_USER_FLAG``
+  * ``RTE_ETH_RX_METADATA_USER_MARK``
+  * ``RTE_ETH_RX_METADATA_TUNNEL_ID``
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index cc2c75261c..d073d63ba8 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -785,6 +785,22 @@  typedef int (*eth_get_monitor_addr_t)(void *rxq,
 typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
 	struct rte_eth_representor_info *info);
 
+/**
+ * @internal
+ * Negotiate the NIC's ability to deliver specific kinds of metadata to the PMD.
+ *
+ * @param dev
+ *   Port (ethdev) handle
+ *
+ * @param[inout] features
+ *   Feature selection buffer
+ *
+ * @return
+ *   Negative errno value on error, zero otherwise
+ */
+typedef int (*eth_rx_metadata_negotiate_t)(struct rte_eth_dev *dev,
+				       uint64_t *features);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -945,6 +961,12 @@  struct eth_dev_ops {
 
 	eth_representor_info_get_t representor_info_get;
 	/**< Get representor info. */
+
+	/**
+	 * Negotiate the NIC's ability to deliver specific
+	 * kinds of metadata to the PMD.
+	 */
+	eth_rx_metadata_negotiate_t rx_metadata_negotiate;
 };
 
 /**
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index daf5ca9242..a41fb8a398 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6310,6 +6310,31 @@  rte_eth_representor_info_get(uint16_t port_id,
 	return eth_err(port_id, (*dev->dev_ops->representor_info_get)(dev, info));
 }
 
+int
+rte_eth_rx_metadata_negotiate(uint16_t port_id, uint64_t *features)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (dev->data->dev_configured != 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"The port (id=%"PRIu16") is already configured\n",
+			port_id);
+		return -EBUSY;
+	}
+
+	if (features == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_metadata_negotiate, -ENOTSUP);
+	return eth_err(port_id,
+		       (*dev->dev_ops->rx_metadata_negotiate)(dev, features));
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index afdc53b674..6b2da6de0a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4902,6 +4902,59 @@  __rte_experimental
 int rte_eth_representor_info_get(uint16_t port_id,
 				 struct rte_eth_representor_info *info);
 
+/** The NIC is able to deliver flag (if set) with packets to the PMD. */
+#define RTE_ETH_RX_METADATA_USER_FLAG (UINT64_C(1) << 0)
+
+/** The NIC is able to deliver mark ID with packets to the PMD. */
+#define RTE_ETH_RX_METADATA_USER_MARK (UINT64_C(1) << 1)
+
+/** The NIC is able to deliver tunnel ID with packets to the PMD. */
+#define RTE_ETH_RX_METADATA_TUNNEL_ID (UINT64_C(1) << 2)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Negotiate the NIC's ability to deliver specific kinds of metadata to the PMD.
+ *
+ * Invoke this API before the first rte_eth_dev_configure() invocation
+ * to let the PMD make preparations that are inconvenient to do later.
+ *
+ * The negotiation process is as follows:
+ *
+ * - the application requests features intending to use at least some of them;
+ * - the PMD responds with the guaranteed subset of the requested feature set;
+ * - the application can retry negotiation with another set of features;
+ * - the application can pass zero to clear the negotiation result;
+ * - the last negotiated result takes effect upon the ethdev start.
+ *
+ * @note
+ *   The PMD is supposed to first consider enabling the requested feature set
+ *   in its entirety. Only if it fails to do so, does it have the right to
+ *   respond with a smaller set of the originally requested features.
+ *
+ * @note
+ *   Return code (-ENOTSUP) does not necessarily mean that the requested
+ *   features are unsupported. In this case, the application should just
+ *   assume that these features can be used without prior negotiations.
+ *
+ * @param port_id
+ *   Port (ethdev) identifier
+ *
+ * @param[inout] features
+ *   Feature selection buffer
+ *
+ * @return
+ *   - (-EBUSY) if the port can't handle this in its current state;
+ *   - (-ENOTSUP) if the method itself is not supported by the PMD;
+ *   - (-ENODEV) if *port_id* is invalid;
+ *   - (-EINVAL) if *features* is NULL;
+ *   - (-EIO) if the device is removed;
+ *   - (0) on success
+ */
+__rte_experimental
+int rte_eth_rx_metadata_negotiate(uint16_t port_id, uint64_t *features);
+
 #include <rte_ethdev_core.h>
 
 /**
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 7b1ed7f110..75656ff9f8 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -1904,6 +1904,10 @@  enum rte_flow_action_type {
 	 * PKT_RX_FDIR_ID mbuf flags.
 	 *
 	 * See struct rte_flow_action_mark.
+	 *
+	 * One should negotiate mark delivery from the NIC to the PMD.
+	 * @see rte_eth_rx_metadata_negotiate()
+	 * @see RTE_ETH_RX_METADATA_USER_MARK
 	 */
 	RTE_FLOW_ACTION_TYPE_MARK,
 
@@ -1912,6 +1916,10 @@  enum rte_flow_action_type {
 	 * sets the PKT_RX_FDIR mbuf flag.
 	 *
 	 * No associated configuration structure.
+	 *
+	 * One should negotiate flag delivery from the NIC to the PMD.
+	 * @see rte_eth_rx_metadata_negotiate()
+	 * @see RTE_ETH_RX_METADATA_USER_FLAG
 	 */
 	RTE_FLOW_ACTION_TYPE_FLAG,
 
@@ -4223,6 +4231,10 @@  rte_flow_tunnel_match(uint16_t port_id,
 /**
  * Populate the current packet processing state, if exists, for the given mbuf.
  *
+ * One should negotiate tunnel metadata delivery from the NIC to the PMD.
+ * @see rte_eth_rx_metadata_negotiate()
+ * @see RTE_ETH_RX_METADATA_TUNNEL_ID
+ *
  * @param port_id
  *   Port identifier of Ethernet device.
  * @param[in] m
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 904bce6ea1..2e638c680e 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -247,6 +247,9 @@  EXPERIMENTAL {
 	rte_mtr_meter_policy_delete;
 	rte_mtr_meter_policy_update;
 	rte_mtr_meter_policy_validate;
+
+	# added in 21.11
+	rte_eth_rx_metadata_negotiate;
 };
 
 INTERNAL {
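
For the driver side, the following is a minimal sketch of what a callback of the `eth_rx_metadata_negotiate_t` kind might do. It does not come from any real PMD: `struct sketch_dev` and `sketch_rx_metadata_negotiate()` are hypothetical, the capability mask is an invented field, and the flag values are copied from the patch so that the snippet builds without DPDK headers.

```c
#include <stdint.h>

/* Copies of the bit values added to rte_ethdev.h by this patch. */
#define RTE_ETH_RX_METADATA_USER_FLAG (UINT64_C(1) << 0)
#define RTE_ETH_RX_METADATA_USER_MARK (UINT64_C(1) << 1)
#define RTE_ETH_RX_METADATA_TUNNEL_ID (UINT64_C(1) << 2)

/* Hypothetical driver-private state; stands in for a real ethdev handle. */
struct sketch_dev {
	uint64_t hw_rx_metadata_caps;    /* what the imaginary NIC can deliver */
	uint64_t negotiated_rx_metadata; /* remembered for configure and start */
};

/*
 * Hypothetical PMD callback. The ethdev layer in this patch rejects a
 * NULL 'features' pointer before calling the driver, so no check here.
 */
static int
sketch_rx_metadata_negotiate(struct sketch_dev *dev, uint64_t *features)
{
	/*
	 * Grant the full request when the hardware allows it, otherwise
	 * the subset of the request that the hardware can deliver.
	 */
	uint64_t granted = *features & dev->hw_rx_metadata_caps;

	/* The last call wins; a zero request clears the earlier result. */
	dev->negotiated_rx_metadata = granted;
	*features = granted;

	return 0;
}
```

Storing the granted set in driver-private state is one possible design: it lets later configure and start handlers consult the result, in line with the ordering described in the review thread.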