[v3,1/5] ethdev: add API to negotiate delivery of Rx meta data

Message ID 20210923112012.14595-2-ivan.malov@oktetlabs.ru (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series A means to negotiate delivery of Rx meta data |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Ivan Malov Sept. 23, 2021, 11:20 a.m. UTC
  Delivery of mark, flag and the likes might affect small packet performance.
If these features are disabled by default, enabling them in started state
without causing traffic disruption may not always be possible.

Let applications negotiate delivery of Rx meta data beforehand.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jerin Jacob <jerinj@marvell.com>
---
 app/test-flow-perf/main.c              | 21 ++++++++++++
 app/test-pmd/testpmd.c                 | 26 +++++++++++++++
 doc/guides/rel_notes/release_21_11.rst |  9 ++++++
 lib/ethdev/ethdev_driver.h             | 19 +++++++++++
 lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
 lib/ethdev/rte_ethdev.h                | 45 ++++++++++++++++++++++++++
 lib/ethdev/rte_flow.h                  | 12 +++++++
 lib/ethdev/version.map                 |  3 ++
 8 files changed, 160 insertions(+)
  

Comments

Ori Kam Sept. 30, 2021, 2:59 p.m. UTC | #1
Hi Ivan,
Sorry for jumping in late.

I have a concern that this patch breaks other PMDs.
From the rst file " One should negotiate flag delivery beforehand"
since you only added this function for your PMD all other PMD will fail.
I see that you added exception in the  examples, but it doesn't make sense
that applications will also need to add this exception which is not documented.

Please see more comments inline.

Thanks,
Ori

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Thursday, September 23, 2021 2:20 PM
> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> Delivery of mark, flag and the likes might affect small packet performance.
> If these features are disabled by default, enabling them in started state
> without causing traffic disruption may not always be possible.
> 
> Let applications negotiate delivery of Rx meta data beforehand.
> 
> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  app/test-flow-perf/main.c              | 21 ++++++++++++
>  app/test-pmd/testpmd.c                 | 26 +++++++++++++++
>  doc/guides/rel_notes/release_21_11.rst |  9 ++++++
>  lib/ethdev/ethdev_driver.h             | 19 +++++++++++
>  lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
>  lib/ethdev/rte_ethdev.h                | 45 ++++++++++++++++++++++++++
>  lib/ethdev/rte_flow.h                  | 12 +++++++
>  lib/ethdev/version.map                 |  3 ++
>  8 files changed, 160 insertions(+)
> 
> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c index
> 9be8edc31d..48eafffb1d 100644
> --- a/app/test-flow-perf/main.c
> +++ b/app/test-flow-perf/main.c
> @@ -1760,6 +1760,27 @@ init_port(void)
>  		rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
> 
>  	for (port_id = 0; port_id < nr_ports; port_id++) {
> +		uint64_t rx_meta_features = 0;
> +
> +		rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> +		rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> +
> +		ret = rte_eth_rx_meta_negotiate(port_id,
> &rx_meta_features);
> +		if (ret == 0) {
> +			if (!(rx_meta_features &
> RTE_ETH_RX_META_USER_FLAG)) {
> +				printf(":: flow action FLAG will not affect Rx
> mbufs on port=%u\n",
> +				       port_id);
> +			}
> +
> +			if (!(rx_meta_features &
> RTE_ETH_RX_META_USER_MARK)) {
> +				printf(":: flow action MARK will not affect Rx
> mbufs on port=%u\n",
> +				       port_id);
> +			}
> +		} else if (ret != -ENOTSUP) {
> +			rte_exit(EXIT_FAILURE, "Error when negotiating Rx
> meta features on port=%u: %s\n",
> +				 port_id, rte_strerror(-ret));
> +		}
> +
>  		ret = rte_eth_dev_info_get(port_id, &dev_info);
>  		if (ret != 0)
>  			rte_exit(EXIT_FAILURE,
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 97ae52e17e..7a8da3d7ab 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1485,10 +1485,36 @@ static void
>  init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
>  	struct rte_port *port = &ports[pid];
> +	uint64_t rx_meta_features = 0;
>  	uint16_t data_size;
>  	int ret;
>  	int i;
> 
> +	rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> +	rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> +	rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> +
> +	ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> +	if (ret == 0) {
> +		if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> +			TESTPMD_LOG(INFO, "Flow action FLAG will not
> affect Rx mbufs on port %u\n",
> +				    pid);
> +		}
> +
> +		if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
> {
> +			TESTPMD_LOG(INFO, "Flow action MARK will not
> affect Rx mbufs on port %u\n",
> +				    pid);
> +		}
> +
> +		if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> +			TESTPMD_LOG(INFO, "Flow tunnel offload support
> might be limited or unavailable on port %u\n",
> +				    pid);
> +		}
> +	} else if (ret != -ENOTSUP) {
> +		rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
> features on port %u: %s\n",
> +			 pid, rte_strerror(-ret));
> +	}
> +
>  	port->dev_conf.txmode = tx_mode;
>  	port->dev_conf.rxmode = rx_mode;
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst
> b/doc/guides/rel_notes/release_21_11.rst
> index 19356ac53c..6674d4474c 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -106,6 +106,15 @@ New Features
>    Added command-line options to specify total number of processes and
>    current process ID. Each process owns subset of Rx and Tx queues.
> 
> +* **Added an API to negotiate delivery of specific parts of Rx meta
> +data**
> +
> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
> +  The following parts of Rx meta data were defined:
> +
> +  * ``RTE_ETH_RX_META_USER_FLAG``
> +  * ``RTE_ETH_RX_META_USER_MARK``
> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
> 40e474aa7e..96e0c60cae 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void *rxq,
> typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
>  	struct rte_eth_representor_info *info);
> 
> +/**
> + * @internal
> + * Negotiate delivery of specific parts of Rx meta data.
> + *
> + * @param dev
> + *   Port (ethdev) handle
> + *
> + * @param[inout] features
> + *   Feature selection buffer
> + *
> + * @return
> + *   Negative errno value on error, zero otherwise
> + */
> +typedef int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
> +				       uint64_t *features);
> +
>  /**
>   * @internal A structure containing the functions exported by an Ethernet
> driver.
>   */
> @@ -949,6 +965,9 @@ struct eth_dev_ops {
> 
>  	eth_representor_info_get_t representor_info_get;
>  	/**< Get representor info. */
> +
> +	eth_rx_meta_negotiate_t rx_meta_negotiate;
> +	/**< Negotiate delivery of specific parts of Rx meta data. */
>  };
> 
>  /**
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> daf5ca9242..49cb84d64c 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t port_id,
>  	return eth_err(port_id, (*dev->dev_ops-
> >representor_info_get)(dev, info));  }
> 
> +int
> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (dev->data->dev_configured != 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"The port (id=%"PRIu16") is already configured\n",
> +			port_id);
> +		return -EBUSY;
> +	}
> +
> +	if (features == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_meta_negotiate,
> -ENOTSUP);
> +	return eth_err(port_id,
> +		       (*dev->dev_ops->rx_meta_negotiate)(dev, features)); }
> +
>  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> 
>  RTE_INIT(ethdev_init_telemetry)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> 1da37896d8..8467a7a362 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -4888,6 +4888,51 @@ __rte_experimental  int
> rte_eth_representor_info_get(uint16_t port_id,
>  				 struct rte_eth_representor_info *info);
> 
> +/** The ethdev sees flagged packets if there are flows with action
> +FLAG. */ #define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
> +
> +/** The ethdev sees mark IDs in packets if there are flows with action
> +MARK. */ #define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
> +
> +/** The ethdev detects missed packets if there are "tunnel_set" flows
> +in use. */ #define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Negotiate delivery of specific parts of Rx meta data.
> + *
> + * Invoke this API before the first rte_eth_dev_configure() invocation
> + * to let the PMD make preparations that are inconvenient to do later.
> + *
> + * The negotiation process is as follows:
> + *
> + * - the application requests features intending to use at least some
> +of them;
> + * - the PMD responds with the guaranteed subset of the requested
> +feature set;
> + * - the application can retry negotiation with another set of
> +features;
> + * - the application can pass zero to clear the negotiation result;
> + * - the last negotiated result takes effect upon the ethdev start.
> + *
> + * If this API is unsupported, the application should gracefully ignore that.
> + *
> + * @param port_id
> + *   Port (ethdev) identifier
> + *
> + * @param[inout] features
> + *   Feature selection buffer
> + *
> + * @return
> + *   - (-EBUSY) if the port can't handle this in its current state;
> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
> + *   - (-ENODEV) if *port_id* is invalid;
> + *   - (-EINVAL) if *features* is NULL;
> + *   - (-EIO) if the device is removed;
> + *   - (0) on success
> + */
> +__rte_experimental
> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features);

I don't think meta is the best name since we also have meta item and the word meta can be used
in other cases.

> +
>  #include <rte_ethdev_core.h>
> 
>  /**
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> 70f455d47d..6eb7ec0574 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -1904,6 +1904,10 @@ enum rte_flow_action_type {
>  	 * PKT_RX_FDIR_ID mbuf flags.
>  	 *
>  	 * See struct rte_flow_action_mark.
> +	 *
> +	 * One should negotiate delivery of mark IDs beforehand.
> +	 * @see rte_eth_rx_meta_negotiate()
> +	 * @see RTE_ETH_RX_META_USER_MARK
>  	 */
>  	RTE_FLOW_ACTION_TYPE_MARK,
> 
> @@ -1912,6 +1916,10 @@ enum rte_flow_action_type {
>  	 * sets the PKT_RX_FDIR mbuf flag.
>  	 *
>  	 * No associated configuration structure.
> +	 *
> +	 * One should negotiate flag delivery beforehand.
> +	 * @see rte_eth_rx_meta_negotiate()
> +	 * @see RTE_ETH_RX_META_USER_FLAG
>  	 */
>  	RTE_FLOW_ACTION_TYPE_FLAG,
> 
> @@ -4223,6 +4231,10 @@ rte_flow_tunnel_match(uint16_t port_id,
>  /**
>   * Populate the current packet processing state, if exists, for the given mbuf.
>   *
> + * One should negotiate the processing state information delivery
> beforehand.
> + * @see rte_eth_rx_meta_negotiate()
> + * @see RTE_ETH_RX_META_TUNNEL_ID
> + *
>   * @param port_id
>   *   Port identifier of Ethernet device.
>   * @param[in] m
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
> 904bce6ea1..a8928266a9 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -247,6 +247,9 @@ EXPERIMENTAL {
>  	rte_mtr_meter_policy_delete;
>  	rte_mtr_meter_policy_update;
>  	rte_mtr_meter_policy_validate;
> +
> +	# added in 21.11
> +	rte_eth_rx_meta_negotiate;
>  };
> 
>  INTERNAL {
> --
> 2.20.1
  
Andrew Rybchenko Sept. 30, 2021, 3:07 p.m. UTC | #2
Hi Ori,

On 9/30/21 5:59 PM, Ori Kam wrote:
> Hi Ivan,
> Sorry for jumping in late.
> 
> I have a concern that this patch breaks other PMDs.
>>From the rst file " One should negotiate flag delivery beforehand"
> since you only added this function for your PMD all other PMD will fail.
> I see that you added exception in the  examples, but it doesn't make sense
> that applications will also need to add this exception which is not documented.

It is a new API and the function description lists possible
return codes. An application can handle these return codes
gracefully. I'm not sure that it makes sense to highlight
it as a special case.

> 
> Please see more comments inline.

See below

> Thanks,
> Ori
> 
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Thursday, September 23, 2021 2:20 PM
>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> Delivery of mark, flag and the likes might affect small packet performance.
>> If these features are disabled by default, enabling them in started state
>> without causing traffic disruption may not always be possible.
>>
>> Let applications negotiate delivery of Rx meta data beforehand.
>>
>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>> Acked-by: Jerin Jacob <jerinj@marvell.com>

[snip]

>> +__rte_experimental
>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features);
> 
> I don't think meta is the best name since we also have meta item and the word meta can be used
> in other cases.

Do you have any idea what could be used instead of it?

Thanks,
Andrew.
  
Ivan Malov Sept. 30, 2021, 7:07 p.m. UTC | #3
Hi Ori,

On 30/09/2021 17:59, Ori Kam wrote:
> Hi Ivan,
> Sorry for jumping in late.

No worries. That's OK.

> I have a concern that this patch breaks other PMDs.

It does no such thing.

>>From the rst file " One should negotiate flag delivery beforehand"
> since you only added this function for your PMD all other PMD will fail.
> I see that you added exception in the  examples, but it doesn't make sense
> that applications will also need to add this exception which is not documented.

Say, you have an application, and you use it with some specific PMD. 
Say, that PMD doesn't run into the problem as ours does. In other words, 
the user can insert a flow with action MARK at any point and get mark 
delivery working starting from that moment without any problem. Say, 
this is exactly the way how it works for you at the moment.

Now. This new API kicks in. We update the application to invoke it as 
early as possible. But your PMD in question still doesn't support this 
API. The comment in the patch says that if the method returns ENOTSUP, 
the application should ignore that without batting an eyelid. It should 
just keep on working as it did before the introduction of this API.

More specific example:
Say, the application doesn't mind using either "RSS + MARK" or tunnel 
offload. What it does right now is attempt to insert tunnel flows first 
and, if this fails, fall back to "RSS + MARK". With this API, the 
application will try to invoke this API with "USER_MARK | TUNNEL_ID" in 
adapter initialised state. If the PMD says that it can only enable the 
tunnel offload, then the application will get the knowledge that it 
doesn't make sense to even try inserting "RSS + MARK" flows. It just can 
skip useless actions. But if the PMD doesn't support the method, the 
application will see ENOTSUP and handle this gracefully: it will make no 
assumptions about what's guaranteed to be supported and what's not and 
will just keep on its old behaviour: try to insert a flow, fail, fall 
back to another type of flow.

So no breakages with this API.

> 
> Please see more comments inline.
> 
> Thanks,
> Ori
> 
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Thursday, September 23, 2021 2:20 PM
>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> Delivery of mark, flag and the likes might affect small packet performance.
>> If these features are disabled by default, enabling them in started state
>> without causing traffic disruption may not always be possible.
>>
>> Let applications negotiate delivery of Rx meta data beforehand.
>>
>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>> ---
>>   app/test-flow-perf/main.c              | 21 ++++++++++++
>>   app/test-pmd/testpmd.c                 | 26 +++++++++++++++
>>   doc/guides/rel_notes/release_21_11.rst |  9 ++++++
>>   lib/ethdev/ethdev_driver.h             | 19 +++++++++++
>>   lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
>>   lib/ethdev/rte_ethdev.h                | 45 ++++++++++++++++++++++++++
>>   lib/ethdev/rte_flow.h                  | 12 +++++++
>>   lib/ethdev/version.map                 |  3 ++
>>   8 files changed, 160 insertions(+)
>>
>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c index
>> 9be8edc31d..48eafffb1d 100644
>> --- a/app/test-flow-perf/main.c
>> +++ b/app/test-flow-perf/main.c
>> @@ -1760,6 +1760,27 @@ init_port(void)
>>   		rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
>>
>>   	for (port_id = 0; port_id < nr_ports; port_id++) {
>> +		uint64_t rx_meta_features = 0;
>> +
>> +		rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>> +		rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>> +
>> +		ret = rte_eth_rx_meta_negotiate(port_id,
>> &rx_meta_features);
>> +		if (ret == 0) {
>> +			if (!(rx_meta_features &
>> RTE_ETH_RX_META_USER_FLAG)) {
>> +				printf(":: flow action FLAG will not affect Rx
>> mbufs on port=%u\n",
>> +				       port_id);
>> +			}
>> +
>> +			if (!(rx_meta_features &
>> RTE_ETH_RX_META_USER_MARK)) {
>> +				printf(":: flow action MARK will not affect Rx
>> mbufs on port=%u\n",
>> +				       port_id);
>> +			}
>> +		} else if (ret != -ENOTSUP) {
>> +			rte_exit(EXIT_FAILURE, "Error when negotiating Rx
>> meta features on port=%u: %s\n",
>> +				 port_id, rte_strerror(-ret));
>> +		}
>> +
>>   		ret = rte_eth_dev_info_get(port_id, &dev_info);
>>   		if (ret != 0)
>>   			rte_exit(EXIT_FAILURE,
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 97ae52e17e..7a8da3d7ab 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -1485,10 +1485,36 @@ static void
>>   init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
>>   	struct rte_port *port = &ports[pid];
>> +	uint64_t rx_meta_features = 0;
>>   	uint16_t data_size;
>>   	int ret;
>>   	int i;
>>
>> +	rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>> +	rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>> +	rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>> +
>> +	ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>> +	if (ret == 0) {
>> +		if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>> +			TESTPMD_LOG(INFO, "Flow action FLAG will not
>> affect Rx mbufs on port %u\n",
>> +				    pid);
>> +		}
>> +
>> +		if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
>> {
>> +			TESTPMD_LOG(INFO, "Flow action MARK will not
>> affect Rx mbufs on port %u\n",
>> +				    pid);
>> +		}
>> +
>> +		if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>> +			TESTPMD_LOG(INFO, "Flow tunnel offload support
>> might be limited or unavailable on port %u\n",
>> +				    pid);
>> +		}
>> +	} else if (ret != -ENOTSUP) {
>> +		rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
>> features on port %u: %s\n",
>> +			 pid, rte_strerror(-ret));
>> +	}
>> +
>>   	port->dev_conf.txmode = tx_mode;
>>   	port->dev_conf.rxmode = rx_mode;
>>
>> diff --git a/doc/guides/rel_notes/release_21_11.rst
>> b/doc/guides/rel_notes/release_21_11.rst
>> index 19356ac53c..6674d4474c 100644
>> --- a/doc/guides/rel_notes/release_21_11.rst
>> +++ b/doc/guides/rel_notes/release_21_11.rst
>> @@ -106,6 +106,15 @@ New Features
>>     Added command-line options to specify total number of processes and
>>     current process ID. Each process owns subset of Rx and Tx queues.
>>
>> +* **Added an API to negotiate delivery of specific parts of Rx meta
>> +data**
>> +
>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
>> +  The following parts of Rx meta data were defined:
>> +
>> +  * ``RTE_ETH_RX_META_USER_FLAG``
>> +  * ``RTE_ETH_RX_META_USER_MARK``
>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
>> +
>>
>>   Removed Items
>>   -------------
>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
>> 40e474aa7e..96e0c60cae 100644
>> --- a/lib/ethdev/ethdev_driver.h
>> +++ b/lib/ethdev/ethdev_driver.h
>> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void *rxq,
>> typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
>>   	struct rte_eth_representor_info *info);
>>
>> +/**
>> + * @internal
>> + * Negotiate delivery of specific parts of Rx meta data.
>> + *
>> + * @param dev
>> + *   Port (ethdev) handle
>> + *
>> + * @param[inout] features
>> + *   Feature selection buffer
>> + *
>> + * @return
>> + *   Negative errno value on error, zero otherwise
>> + */
>> +typedef int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
>> +				       uint64_t *features);
>> +
>>   /**
>>    * @internal A structure containing the functions exported by an Ethernet
>> driver.
>>    */
>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
>>
>>   	eth_representor_info_get_t representor_info_get;
>>   	/**< Get representor info. */
>> +
>> +	eth_rx_meta_negotiate_t rx_meta_negotiate;
>> +	/**< Negotiate delivery of specific parts of Rx meta data. */
>>   };
>>
>>   /**
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>> daf5ca9242..49cb84d64c 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t port_id,
>>   	return eth_err(port_id, (*dev->dev_ops-
>>> representor_info_get)(dev, info));  }
>>
>> +int
>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
>> +	struct rte_eth_dev *dev;
>> +
>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> +	dev = &rte_eth_devices[port_id];
>> +
>> +	if (dev->data->dev_configured != 0) {
>> +		RTE_ETHDEV_LOG(ERR,
>> +			"The port (id=%"PRIu16") is already configured\n",
>> +			port_id);
>> +		return -EBUSY;
>> +	}
>> +
>> +	if (features == NULL) {
>> +		RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_meta_negotiate,
>> -ENOTSUP);
>> +	return eth_err(port_id,
>> +		       (*dev->dev_ops->rx_meta_negotiate)(dev, features)); }
>> +
>>   RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>>
>>   RTE_INIT(ethdev_init_telemetry)
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>> 1da37896d8..8467a7a362 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
>> rte_eth_representor_info_get(uint16_t port_id,
>>   				 struct rte_eth_representor_info *info);
>>
>> +/** The ethdev sees flagged packets if there are flows with action
>> +FLAG. */ #define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
>> +
>> +/** The ethdev sees mark IDs in packets if there are flows with action
>> +MARK. */ #define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
>> +
>> +/** The ethdev detects missed packets if there are "tunnel_set" flows
>> +in use. */ #define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Negotiate delivery of specific parts of Rx meta data.
>> + *
>> + * Invoke this API before the first rte_eth_dev_configure() invocation
>> + * to let the PMD make preparations that are inconvenient to do later.
>> + *
>> + * The negotiation process is as follows:
>> + *
>> + * - the application requests features intending to use at least some
>> +of them;
>> + * - the PMD responds with the guaranteed subset of the requested
>> +feature set;
>> + * - the application can retry negotiation with another set of
>> +features;
>> + * - the application can pass zero to clear the negotiation result;
>> + * - the last negotiated result takes effect upon the ethdev start.
>> + *
>> + * If this API is unsupported, the application should gracefully ignore that.
>> + *
>> + * @param port_id
>> + *   Port (ethdev) identifier
>> + *
>> + * @param[inout] features
>> + *   Feature selection buffer
>> + *
>> + * @return
>> + *   - (-EBUSY) if the port can't handle this in its current state;
>> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
>> + *   - (-ENODEV) if *port_id* is invalid;
>> + *   - (-EINVAL) if *features* is NULL;
>> + *   - (-EIO) if the device is removed;
>> + *   - (0) on success
>> + */
>> +__rte_experimental
>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features);
> 
> I don't think meta is the best name since we also have meta item and the word meta can be used
> in other cases.

I'm no expert in naming. What could be a better term for this? 
Personally, I'd rather not perceive "meta" the way you describe. It's 
not just "meta". It's "rx_meta", and the flags supplied with this API 
provide enough context to explain what it's all about.

> 
>> +
>>   #include <rte_ethdev_core.h>
>>
>>   /**
>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
>> 70f455d47d..6eb7ec0574 100644
>> --- a/lib/ethdev/rte_flow.h
>> +++ b/lib/ethdev/rte_flow.h
>> @@ -1904,6 +1904,10 @@ enum rte_flow_action_type {
>>   	 * PKT_RX_FDIR_ID mbuf flags.
>>   	 *
>>   	 * See struct rte_flow_action_mark.
>> +	 *
>> +	 * One should negotiate delivery of mark IDs beforehand.
>> +	 * @see rte_eth_rx_meta_negotiate()
>> +	 * @see RTE_ETH_RX_META_USER_MARK
>>   	 */
>>   	RTE_FLOW_ACTION_TYPE_MARK,
>>
>> @@ -1912,6 +1916,10 @@ enum rte_flow_action_type {
>>   	 * sets the PKT_RX_FDIR mbuf flag.
>>   	 *
>>   	 * No associated configuration structure.
>> +	 *
>> +	 * One should negotiate flag delivery beforehand.
>> +	 * @see rte_eth_rx_meta_negotiate()
>> +	 * @see RTE_ETH_RX_META_USER_FLAG
>>   	 */
>>   	RTE_FLOW_ACTION_TYPE_FLAG,
>>
>> @@ -4223,6 +4231,10 @@ rte_flow_tunnel_match(uint16_t port_id,
>>   /**
>>    * Populate the current packet processing state, if exists, for the given mbuf.
>>    *
>> + * One should negotiate the processing state information delivery
>> beforehand.
>> + * @see rte_eth_rx_meta_negotiate()
>> + * @see RTE_ETH_RX_META_TUNNEL_ID
>> + *
>>    * @param port_id
>>    *   Port identifier of Ethernet device.
>>    * @param[in] m
>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
>> 904bce6ea1..a8928266a9 100644
>> --- a/lib/ethdev/version.map
>> +++ b/lib/ethdev/version.map
>> @@ -247,6 +247,9 @@ EXPERIMENTAL {
>>   	rte_mtr_meter_policy_delete;
>>   	rte_mtr_meter_policy_update;
>>   	rte_mtr_meter_policy_validate;
>> +
>> +	# added in 21.11
>> +	rte_eth_rx_meta_negotiate;
>>   };
>>
>>   INTERNAL {
>> --
>> 2.20.1
  
Ajit Khaparde Sept. 30, 2021, 9:48 p.m. UTC | #4
::::
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 97ae52e17e..7a8da3d7ab 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1485,10 +1485,36 @@ static void
>  init_config_port_offloads(portid_t pid, uint32_t socket_id)
>  {
>         struct rte_port *port = &ports[pid];
> +       uint64_t rx_meta_features = 0;
>         uint16_t data_size;
>         int ret;
>         int i;
>
> +       rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> +       rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> +       rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> +
> +       ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> +       if (ret == 0) {
> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> +                       TESTPMD_LOG(INFO, "Flow action FLAG will not affect Rx mbufs on port %u\n",
Log level info might be a little too noisy?

> +                                   pid);
> +               }
> +
> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
> +                       TESTPMD_LOG(INFO, "Flow action MARK will not affect Rx mbufs on port %u\n",
> +                                   pid);
> +               }
> +
> +               if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> +                       TESTPMD_LOG(INFO, "Flow tunnel offload support might be limited or unavailable on port %u\n",
> +                                   pid);
> +               }
:::
>
  
Ivan Malov Sept. 30, 2021, 10 p.m. UTC | #5
Hi Ajit,

On 01/10/2021 00:48, Ajit Khaparde wrote:
> ::::
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>> index 97ae52e17e..7a8da3d7ab 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -1485,10 +1485,36 @@ static void
>>   init_config_port_offloads(portid_t pid, uint32_t socket_id)
>>   {
>>          struct rte_port *port = &ports[pid];
>> +       uint64_t rx_meta_features = 0;
>>          uint16_t data_size;
>>          int ret;
>>          int i;
>>
>> +       rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>> +       rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>> +       rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>> +
>> +       ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>> +       if (ret == 0) {
>> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>> +                       TESTPMD_LOG(INFO, "Flow action FLAG will not affect Rx mbufs on port %u\n",
> Log level info might be a little too noisy?

Do you really think so? But main() sets default log level to DEBUG, quote:
     rte_log_set_level(testpmd_logtype, RTE_LOG_DEBUG);

If I go for DEBUG instead of INFO here, it won't get any quieter, will it?

> 
>> +                                   pid);
>> +               }
>> +
>> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
>> +                       TESTPMD_LOG(INFO, "Flow action MARK will not affect Rx mbufs on port %u\n",
>> +                                   pid);
>> +               }
>> +
>> +               if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>> +                       TESTPMD_LOG(INFO, "Flow tunnel offload support might be limited or unavailable on port %u\n",
>> +                                   pid);
>> +               }
> :::
>>
  
Ajit Khaparde Sept. 30, 2021, 10:12 p.m. UTC | #6
On Thu, Sep 30, 2021 at 3:01 PM Ivan Malov <Ivan.Malov@oktetlabs.ru> wrote:
>
> Hi Ajit,
>
> On 01/10/2021 00:48, Ajit Khaparde wrote:
> > ::::
> >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> >> index 97ae52e17e..7a8da3d7ab 100644
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -1485,10 +1485,36 @@ static void
> >>   init_config_port_offloads(portid_t pid, uint32_t socket_id)
> >>   {
> >>          struct rte_port *port = &ports[pid];
> >> +       uint64_t rx_meta_features = 0;
> >>          uint16_t data_size;
> >>          int ret;
> >>          int i;
> >>
> >> +       rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >> +       rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >> +       rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> >> +
> >> +       ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> >> +       if (ret == 0) {
> >> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> >> +                       TESTPMD_LOG(INFO, "Flow action FLAG will not affect Rx mbufs on port %u\n",
> > Log level info might be a little too noisy?
>
> Do you really think so? But main() sets default log level to DEBUG, quote:
>      rte_log_set_level(testpmd_logtype, RTE_LOG_DEBUG);
>
> If I go for DEBUG instead of INFO here, it won't get any quieter, will it?
You are right. It won't.
But then three extra messages per port will stand out. But that's my opinion.
Maybe you could log the message when a flow is created with any of the
meta features?

>
> >
> >> +                                   pid);
> >> +               }
> >> +
> >> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
> >> +                       TESTPMD_LOG(INFO, "Flow action MARK will not affect Rx mbufs on port %u\n",
> >> +                                   pid);
> >> +               }
> >> +
> >> +               if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> >> +                       TESTPMD_LOG(INFO, "Flow tunnel offload support might be limited or unavailable on port %u\n",
> >> +                                   pid);
> >> +               }
> > :::
> >>
>
> --
> Ivan M
  
Ivan Malov Sept. 30, 2021, 10:22 p.m. UTC | #7
On 01/10/2021 01:12, Ajit Khaparde wrote:
> On Thu, Sep 30, 2021 at 3:01 PM Ivan Malov <Ivan.Malov@oktetlabs.ru> wrote:
>>
>> Hi Ajit,
>>
>> On 01/10/2021 00:48, Ajit Khaparde wrote:
>>> ::::
>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>>>> index 97ae52e17e..7a8da3d7ab 100644
>>>> --- a/app/test-pmd/testpmd.c
>>>> +++ b/app/test-pmd/testpmd.c
>>>> @@ -1485,10 +1485,36 @@ static void
>>>>    init_config_port_offloads(portid_t pid, uint32_t socket_id)
>>>>    {
>>>>           struct rte_port *port = &ports[pid];
>>>> +       uint64_t rx_meta_features = 0;
>>>>           uint16_t data_size;
>>>>           int ret;
>>>>           int i;
>>>>
>>>> +       rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>> +       rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>> +       rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>>>> +
>>>> +       ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>>>> +       if (ret == 0) {
>>>> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>>>> +                       TESTPMD_LOG(INFO, "Flow action FLAG will not affect Rx mbufs on port %u\n",
>>> Log level info might be a little too noisy?
>>
>> Do you really think so? But main() sets default log level to DEBUG, quote:
>>       rte_log_set_level(testpmd_logtype, RTE_LOG_DEBUG);
>>
>> If I go for DEBUG instead of INFO here, it won't get any quieter, will it?
> You are right. It won't.
> But then three extra messages per port will stand out. But that's my opinion.
> Maybe you could log the message when a flow is created with any of the
> meta features?

The idea is to warn the user from the very beginning that certain flow 
primitives won't actually work. This way, the user can refrain from 
trying to use them in flow rules. This might save their time.

But I don't mind going for DEBUG here. More opinions are welcome.

> 
>>
>>>
>>>> +                                   pid);
>>>> +               }
>>>> +
>>>> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
>>>> +                       TESTPMD_LOG(INFO, "Flow action MARK will not affect Rx mbufs on port %u\n",
>>>> +                                   pid);
>>>> +               }
>>>> +
>>>> +               if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>>>> +                       TESTPMD_LOG(INFO, "Flow tunnel offload support might be limited or unavailable on port %u\n",
>>>> +                                   pid);
>>>> +               }
>>> :::
>>>>
>>
>> --
>> Ivan M
  
Andrew Rybchenko Oct. 1, 2021, 6:50 a.m. UTC | #8
On 9/30/21 10:07 PM, Ivan Malov wrote:
> Hi Ori,
> 
> On 30/09/2021 17:59, Ori Kam wrote:
>> Hi Ivan,
>> Sorry for jumping in late.
> 
> No worries. That's OK.
> 
>> I have a concern that this patch breaks other PMDs.
> 
> It does no such thing.
> 
>>> From the rst file " One should negotiate flag delivery beforehand"
>> since you only added this function for your PMD all other PMD will fail.
>> I see that you added exception in the  examples, but it doesn't make
>> sense
>> that applications will also need to add this exception which is not
>> documented.
> 
> Say, you have an application, and you use it with some specific PMD.
> Say, that PMD doesn't run into the problem as ours does. In other words,
> the user can insert a flow with action MARK at any point and get mark
> delivery working starting from that moment without any problem. Say,
> this is exactly the way how it works for you at the moment.
> 
> Now. This new API kicks in. We update the application to invoke it as
> early as possible. But your PMD in question still doesn't support this
> API. The comment in the patch says that if the method returns ENOTSUP,
> the application should ignore that without batting an eyelid. It should
> just keep on working as it did before the introduction of this API.
> 
> More specific example:
> Say, the application doesn't mind using either "RSS + MARK" or tunnel
> offload. What it does right now is attempt to insert tunnel flows first
> and, if this fails, fall back to "RSS + MARK". With this API, the
> application will try to invoke this API with "USER_MARK | TUNNEL_ID" in
> adapter initialised state. If the PMD says that it can only enable the
> tunnel offload, then the application will get the knowledge that it
> doesn't make sense to even try inserting "RSS + MARK" flows. It just can
> skip useless actions. But if the PMD doesn't support the method, the
> application will see ENOTSUP and handle this gracefully: it will make no
> assumptions about what's guaranteed to be supported and what's not and
> will just keep on its old behaviour: try to insert a flow, fail, fall
> back to another type of flow.
> 
> So no breakages with this API.
> 
>>
>> Please see more comments inline.
>>
>> Thanks,
>> Ori
>>
>>> -----Original Message-----
>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>> Sent: Thursday, September 23, 2021 2:20 PM
>>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>>> data
>>>
>>> Delivery of mark, flag and the likes might affect small packet
>>> performance.
>>> If these features are disabled by default, enabling them in started
>>> state
>>> without causing traffic disruption may not always be possible.
>>>
>>> Let applications negotiate delivery of Rx meta data beforehand.
>>>
>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>> ---
>>>   app/test-flow-perf/main.c              | 21 ++++++++++++
>>>   app/test-pmd/testpmd.c                 | 26 +++++++++++++++
>>>   doc/guides/rel_notes/release_21_11.rst |  9 ++++++
>>>   lib/ethdev/ethdev_driver.h             | 19 +++++++++++
>>>   lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
>>>   lib/ethdev/rte_ethdev.h                | 45 ++++++++++++++++++++++++++
>>>   lib/ethdev/rte_flow.h                  | 12 +++++++
>>>   lib/ethdev/version.map                 |  3 ++
>>>   8 files changed, 160 insertions(+)
>>>
>>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c index
>>> 9be8edc31d..48eafffb1d 100644
>>> --- a/app/test-flow-perf/main.c
>>> +++ b/app/test-flow-perf/main.c
>>> @@ -1760,6 +1760,27 @@ init_port(void)
>>>           rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
>>>
>>>       for (port_id = 0; port_id < nr_ports; port_id++) {
>>> +        uint64_t rx_meta_features = 0;
>>> +
>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>> +
>>> +        ret = rte_eth_rx_meta_negotiate(port_id,
>>> &rx_meta_features);
>>> +        if (ret == 0) {
>>> +            if (!(rx_meta_features &
>>> RTE_ETH_RX_META_USER_FLAG)) {
>>> +                printf(":: flow action FLAG will not affect Rx
>>> mbufs on port=%u\n",
>>> +                       port_id);
>>> +            }
>>> +
>>> +            if (!(rx_meta_features &
>>> RTE_ETH_RX_META_USER_MARK)) {
>>> +                printf(":: flow action MARK will not affect Rx
>>> mbufs on port=%u\n",
>>> +                       port_id);
>>> +            }
>>> +        } else if (ret != -ENOTSUP) {
>>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
>>> meta features on port=%u: %s\n",
>>> +                 port_id, rte_strerror(-ret));
>>> +        }
>>> +
>>>           ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>           if (ret != 0)
>>>               rte_exit(EXIT_FAILURE,
>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>> 97ae52e17e..7a8da3d7ab 100644
>>> --- a/app/test-pmd/testpmd.c
>>> +++ b/app/test-pmd/testpmd.c
>>> @@ -1485,10 +1485,36 @@ static void
>>>   init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
>>>       struct rte_port *port = &ports[pid];
>>> +    uint64_t rx_meta_features = 0;
>>>       uint16_t data_size;
>>>       int ret;
>>>       int i;
>>>
>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>>> +
>>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>>> +    if (ret == 0) {
>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
>>> affect Rx mbufs on port %u\n",
>>> +                    pid);
>>> +        }
>>> +
>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
>>> {
>>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
>>> affect Rx mbufs on port %u\n",
>>> +                    pid);
>>> +        }
>>> +
>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
>>> might be limited or unavailable on port %u\n",
>>> +                    pid);
>>> +        }
>>> +    } else if (ret != -ENOTSUP) {
>>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
>>> features on port %u: %s\n",
>>> +             pid, rte_strerror(-ret));
>>> +    }
>>> +
>>>       port->dev_conf.txmode = tx_mode;
>>>       port->dev_conf.rxmode = rx_mode;
>>>
>>> diff --git a/doc/guides/rel_notes/release_21_11.rst
>>> b/doc/guides/rel_notes/release_21_11.rst
>>> index 19356ac53c..6674d4474c 100644
>>> --- a/doc/guides/rel_notes/release_21_11.rst
>>> +++ b/doc/guides/rel_notes/release_21_11.rst
>>> @@ -106,6 +106,15 @@ New Features
>>>     Added command-line options to specify total number of processes and
>>>     current process ID. Each process owns subset of Rx and Tx queues.
>>>
>>> +* **Added an API to negotiate delivery of specific parts of Rx meta
>>> +data**
>>> +
>>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
>>> +  The following parts of Rx meta data were defined:
>>> +
>>> +  * ``RTE_ETH_RX_META_USER_FLAG``
>>> +  * ``RTE_ETH_RX_META_USER_MARK``
>>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
>>> +
>>>
>>>   Removed Items
>>>   -------------
>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>> index
>>> 40e474aa7e..96e0c60cae 100644
>>> --- a/lib/ethdev/ethdev_driver.h
>>> +++ b/lib/ethdev/ethdev_driver.h
>>> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void *rxq,
>>> typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
>>>       struct rte_eth_representor_info *info);
>>>
>>> +/**
>>> + * @internal
>>> + * Negotiate delivery of specific parts of Rx meta data.
>>> + *
>>> + * @param dev
>>> + *   Port (ethdev) handle
>>> + *
>>> + * @param[inout] features
>>> + *   Feature selection buffer
>>> + *
>>> + * @return
>>> + *   Negative errno value on error, zero otherwise
>>> + */
>>> +typedef int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
>>> +                       uint64_t *features);
>>> +
>>>   /**
>>>    * @internal A structure containing the functions exported by an
>>> Ethernet
>>> driver.
>>>    */
>>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
>>>
>>>       eth_representor_info_get_t representor_info_get;
>>>       /**< Get representor info. */
>>> +
>>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
>>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
>>>   };
>>>
>>>   /**
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> daf5ca9242..49cb84d64c 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t port_id,
>>>       return eth_err(port_id, (*dev->dev_ops-
>>>> representor_info_get)(dev, info));  }
>>>
>>> +int
>>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
>>> +    struct rte_eth_dev *dev;
>>> +
>>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>> +    dev = &rte_eth_devices[port_id];
>>> +
>>> +    if (dev->data->dev_configured != 0) {
>>> +        RTE_ETHDEV_LOG(ERR,
>>> +            "The port (id=%"PRIu16") is already configured\n",
>>> +            port_id);
>>> +        return -EBUSY;
>>> +    }
>>> +
>>> +    if (features == NULL) {
>>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_meta_negotiate,
>>> -ENOTSUP);
>>> +    return eth_err(port_id,
>>> +               (*dev->dev_ops->rx_meta_negotiate)(dev, features)); }
>>> +
>>>   RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>>>
>>>   RTE_INIT(ethdev_init_telemetry)
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> 1da37896d8..8467a7a362 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
>>> rte_eth_representor_info_get(uint16_t port_id,
>>>                    struct rte_eth_representor_info *info);
>>>
>>> +/** The ethdev sees flagged packets if there are flows with action
>>> +FLAG. */ #define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
>>> +
>>> +/** The ethdev sees mark IDs in packets if there are flows with action
>>> +MARK. */ #define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
>>> +
>>> +/** The ethdev detects missed packets if there are "tunnel_set" flows
>>> +in use. */ #define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Negotiate delivery of specific parts of Rx meta data.
>>> + *
>>> + * Invoke this API before the first rte_eth_dev_configure() invocation
>>> + * to let the PMD make preparations that are inconvenient to do later.
>>> + *
>>> + * The negotiation process is as follows:
>>> + *
>>> + * - the application requests features intending to use at least some
>>> +of them;
>>> + * - the PMD responds with the guaranteed subset of the requested
>>> +feature set;
>>> + * - the application can retry negotiation with another set of
>>> +features;
>>> + * - the application can pass zero to clear the negotiation result;
>>> + * - the last negotiated result takes effect upon the ethdev start.
>>> + *
>>> + * If this API is unsupported, the application should gracefully
>>> ignore that.
>>> + *
>>> + * @param port_id
>>> + *   Port (ethdev) identifier
>>> + *
>>> + * @param[inout] features
>>> + *   Feature selection buffer
>>> + *
>>> + * @return
>>> + *   - (-EBUSY) if the port can't handle this in its current state;
>>> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
>>> + *   - (-ENODEV) if *port_id* is invalid;
>>> + *   - (-EINVAL) if *features* is NULL;
>>> + *   - (-EIO) if the device is removed;
>>> + *   - (0) on success
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features);
>>
>> I don't think meta is the best name since we also have meta item and
>> the word meta can be used
>> in other cases.
> 
> I'm no expert in naming. What could be a better term for this?
> Personally, I'd rather not perceive "meta" the way you describe. It's
> not just "meta". It's "rx_meta", and the flags supplied with this API
> provide enough context to explain what it's all about.

Thinking overnight about it I'd suggest full "metadata".
Yes, it will name a bit longer, but less confusing versus
term META already used in flow API.

Andrew.
  
Ori Kam Oct. 3, 2021, 7:05 a.m. UTC | #9
Hi

> -----Original Message-----
> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> Subject: Re: [dpdk-dev] [PATCH v3 1/5] ethdev: add API to negotiate delivery
> of Rx meta data
> 
> 
> 
> On 01/10/2021 01:12, Ajit Khaparde wrote:
> > On Thu, Sep 30, 2021 at 3:01 PM Ivan Malov <Ivan.Malov@oktetlabs.ru>
> wrote:
> >>
> >> Hi Ajit,
> >>
> >> On 01/10/2021 00:48, Ajit Khaparde wrote:
> >>> ::::
> >>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>>> 97ae52e17e..7a8da3d7ab 100644
> >>>> --- a/app/test-pmd/testpmd.c
> >>>> +++ b/app/test-pmd/testpmd.c
> >>>> @@ -1485,10 +1485,36 @@ static void
> >>>>    init_config_port_offloads(portid_t pid, uint32_t socket_id)
> >>>>    {
> >>>>           struct rte_port *port = &ports[pid];
> >>>> +       uint64_t rx_meta_features = 0;
> >>>>           uint16_t data_size;
> >>>>           int ret;
> >>>>           int i;
> >>>>
> >>>> +       rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>>> +       rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>>> +       rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> >>>> +
> >>>> +       ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> >>>> +       if (ret == 0) {
> >>>> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> >>>> +                       TESTPMD_LOG(INFO, "Flow action FLAG will
> >>>> + not affect Rx mbufs on port %u\n",
> >>> Log level info might be a little too noisy?
> >>
> >> Do you really think so? But main() sets default log level to DEBUG, quote:
> >>       rte_log_set_level(testpmd_logtype, RTE_LOG_DEBUG);
> >>
> >> If I go for DEBUG instead of INFO here, it won't get any quieter, will it?
> > You are right. It won't.
> > But then three extra messages per port will stand out. But that's my
> opinion.
> > Maybe you could log the message when a flow is created with any of the
> > meta features?
> 
> The idea is to warn the user from the very beginning that certain flow
> primitives won't actually work. This way, the user can refrain from trying to
> use them in flow rules. This might save their time.
> 
> But I don't mind going for DEBUG here. More opinions are welcome.
> 

+1 for doing it only for configuration, and not per flow.

> >
> >>
> >>>
> >>>> +                                   pid);
> >>>> +               }
> >>>> +
> >>>> +               if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
> >>>> +                       TESTPMD_LOG(INFO, "Flow action MARK will not affect Rx
> mbufs on port %u\n",
> >>>> +                                   pid);
> >>>> +               }
> >>>> +
> >>>> +               if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> >>>> +                       TESTPMD_LOG(INFO, "Flow tunnel offload support might
> be limited or unavailable on port %u\n",
> >>>> +                                   pid);
> >>>> +               }
> >>> :::
> >>>>
> >>
> >> --
> >> Ivan M
> 
> --
> Ivan M
Best,
Ori
  
Ori Kam Oct. 3, 2021, 7:42 a.m. UTC | #10
Hi Andrew and Ivan,


> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Friday, October 1, 2021 9:50 AM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> On 9/30/21 10:07 PM, Ivan Malov wrote:
> > Hi Ori,
> >
> > On 30/09/2021 17:59, Ori Kam wrote:
> >> Hi Ivan,
> >> Sorry for jumping in late.
> >
> > No worries. That's OK.
> >
> >> I have a concern that this patch breaks other PMDs.
> >
> > It does no such thing.
> >
> >>> From the rst file " One should negotiate flag delivery beforehand"
> >> since you only added this function for your PMD all other PMD will fail.
> >> I see that you added exception in the  examples, but it doesn't make
> >> sense that applications will also need to add this exception which is
> >> not documented.
> >
> > Say, you have an application, and you use it with some specific PMD.
> > Say, that PMD doesn't run into the problem as ours does. In other
> > words, the user can insert a flow with action MARK at any point and
> > get mark delivery working starting from that moment without any
> > problem. Say, this is exactly the way how it works for you at the moment.
> >
> > Now. This new API kicks in. We update the application to invoke it as
> > early as possible. But your PMD in question still doesn't support this
> > API. The comment in the patch says that if the method returns ENOTSUP,
> > the application should ignore that without batting an eyelid. It
> > should just keep on working as it did before the introduction of this API.
> >

I understand that it is nice to write in the patch comment that application
should disregard this function in case of 
ENOTSUP but in a few month someone will read the official doc,
where it is stated that this function call is a must and then what do you
think the application will do?
I think that the correct way is to add this function to all PMDs.
Another option is to add to the doc that if the function is returning ENOTSUP
the application should assume that all is supported.
 
So from this point of view there is API break.

> > More specific example:
> > Say, the application doesn't mind using either "RSS + MARK" or tunnel
> > offload. What it does right now is attempt to insert tunnel flows
> > first and, if this fails, fall back to "RSS + MARK". With this API,
> > the application will try to invoke this API with "USER_MARK |
> > TUNNEL_ID" in adapter initialised state. If the PMD says that it can
> > only enable the tunnel offload, then the application will get the
> > knowledge that it doesn't make sense to even try inserting "RSS +
> > MARK" flows. It just can skip useless actions. But if the PMD doesn't
> > support the method, the application will see ENOTSUP and handle this
> > gracefully: it will make no assumptions about what's guaranteed to be
> > supported and what's not and will just keep on its old behavior: try
> > to insert a flow, fail, fall back to another type of flow.
> >

I fully agree with your example, and think that this is the way
to go, application should supply as much info as possible during startup.
My question/comment is the negotiated result means that all of the actions
are supported on the same rule?
for example if application wants to add mark and tag on the same rule.
(I know it doesn't make much sense) and the PMD can support both of them
but not on the same rule, what should it return?
Or for example if using the mark can only be supported if no decap action is set
on this rule what should be the result?
From my undstanding this function is only to let the PMD know that on some
rules the application will use those actions, the checking if the action combination
is valid only happens on validate function right?

In any case I think this is good idea and I will see how we can add a more generic approach of
this API to the new API that I'm going to present.


> > So no breakages with this API.
> >
> >>
> >> Please see more comments inline.
> >>
> >> Thanks,
> >> Ori
> >>
> >>> -----Original Message-----
> >>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>> Sent: Thursday, September 23, 2021 2:20 PM
> >>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx
> >>> meta data
> >>>
> >>> Delivery of mark, flag and the likes might affect small packet
> >>> performance.
> >>> If these features are disabled by default, enabling them in started
> >>> state without causing traffic disruption may not always be possible.
> >>>
> >>> Let applications negotiate delivery of Rx meta data beforehand.
> >>>
> >>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
> >>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> >>> Acked-by: Jerin Jacob <jerinj@marvell.com>
> >>> ---
> >>>   app/test-flow-perf/main.c              | 21 ++++++++++++
> >>>   app/test-pmd/testpmd.c                 | 26 +++++++++++++++
> >>>   doc/guides/rel_notes/release_21_11.rst |  9 ++++++
> >>>   lib/ethdev/ethdev_driver.h             | 19 +++++++++++
> >>>   lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
> >>>   lib/ethdev/rte_ethdev.h                | 45
> >>> ++++++++++++++++++++++++++
> >>>   lib/ethdev/rte_flow.h                  | 12 +++++++
> >>>   lib/ethdev/version.map                 |  3 ++
> >>>   8 files changed, 160 insertions(+)
> >>>
> >>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
> >>> index 9be8edc31d..48eafffb1d 100644
> >>> --- a/app/test-flow-perf/main.c
> >>> +++ b/app/test-flow-perf/main.c
> >>> @@ -1760,6 +1760,27 @@ init_port(void)
> >>>           rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
> >>>
> >>>       for (port_id = 0; port_id < nr_ports; port_id++) {
> >>> +        uint64_t rx_meta_features = 0;
> >>> +
> >>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>> +
> >>> +        ret = rte_eth_rx_meta_negotiate(port_id,
> >>> &rx_meta_features);
> >>> +        if (ret == 0) {
> >>> +            if (!(rx_meta_features &
> >>> RTE_ETH_RX_META_USER_FLAG)) {
> >>> +                printf(":: flow action FLAG will not affect Rx
> >>> mbufs on port=%u\n",
> >>> +                       port_id);
> >>> +            }
> >>> +
> >>> +            if (!(rx_meta_features &
> >>> RTE_ETH_RX_META_USER_MARK)) {
> >>> +                printf(":: flow action MARK will not affect Rx
> >>> mbufs on port=%u\n",
> >>> +                       port_id);
> >>> +            }
> >>> +        } else if (ret != -ENOTSUP) {
> >>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
> >>> meta features on port=%u: %s\n",
> >>> +                 port_id, rte_strerror(-ret));
> >>> +        }
> >>> +
> >>>           ret = rte_eth_dev_info_get(port_id, &dev_info);
> >>>           if (ret != 0)
> >>>               rte_exit(EXIT_FAILURE, diff --git
> >>> a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>> 97ae52e17e..7a8da3d7ab 100644
> >>> --- a/app/test-pmd/testpmd.c
> >>> +++ b/app/test-pmd/testpmd.c
> >>> @@ -1485,10 +1485,36 @@ static void
> >>>   init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
> >>>       struct rte_port *port = &ports[pid];
> >>> +    uint64_t rx_meta_features = 0;
> >>>       uint16_t data_size;
> >>>       int ret;
> >>>       int i;
> >>>
> >>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> >>> +
> >>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> >>> +    if (ret == 0) {
> >>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> >>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
> >>> affect Rx mbufs on port %u\n",
> >>> +                    pid);
> >>> +        }
> >>> +
> >>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
> >>> {
> >>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
> >>> affect Rx mbufs on port %u\n",
> >>> +                    pid);
> >>> +        }
> >>> +
> >>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> >>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
> >>> might be limited or unavailable on port %u\n",
> >>> +                    pid);
> >>> +        }
> >>> +    } else if (ret != -ENOTSUP) {
> >>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
> >>> features on port %u: %s\n",
> >>> +             pid, rte_strerror(-ret));
> >>> +    }
> >>> +
> >>>       port->dev_conf.txmode = tx_mode;
> >>>       port->dev_conf.rxmode = rx_mode;
> >>>
> >>> diff --git a/doc/guides/rel_notes/release_21_11.rst
> >>> b/doc/guides/rel_notes/release_21_11.rst
> >>> index 19356ac53c..6674d4474c 100644
> >>> --- a/doc/guides/rel_notes/release_21_11.rst
> >>> +++ b/doc/guides/rel_notes/release_21_11.rst
> >>> @@ -106,6 +106,15 @@ New Features
> >>>     Added command-line options to specify total number of processes
> >>> and
> >>>     current process ID. Each process owns subset of Rx and Tx queues.
> >>>
> >>> +* **Added an API to negotiate delivery of specific parts of Rx meta
> >>> +data**
> >>> +
> >>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
> >>> +  The following parts of Rx meta data were defined:
> >>> +
> >>> +  * ``RTE_ETH_RX_META_USER_FLAG``
> >>> +  * ``RTE_ETH_RX_META_USER_MARK``
> >>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
> >>> +
> >>>
> >>>   Removed Items
> >>>   -------------
> >>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> >>> index 40e474aa7e..96e0c60cae 100644
> >>> --- a/lib/ethdev/ethdev_driver.h
> >>> +++ b/lib/ethdev/ethdev_driver.h
> >>> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void
> >>> *rxq, typedef int (*eth_representor_info_get_t)(struct rte_eth_dev
> >>> *dev,
> >>>       struct rte_eth_representor_info *info);
> >>>
> >>> +/**
> >>> + * @internal
> >>> + * Negotiate delivery of specific parts of Rx meta data.
> >>> + *
> >>> + * @param dev
> >>> + *   Port (ethdev) handle
> >>> + *
> >>> + * @param[inout] features
> >>> + *   Feature selection buffer
> >>> + *
> >>> + * @return
> >>> + *   Negative errno value on error, zero otherwise  */ typedef int
> >>> +(*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
> >>> +                       uint64_t *features);
> >>> +
> >>>   /**
> >>>    * @internal A structure containing the functions exported by an
> >>> Ethernet driver.
> >>>    */
> >>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
> >>>
> >>>       eth_representor_info_get_t representor_info_get;
> >>>       /**< Get representor info. */
> >>> +
> >>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
> >>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
> >>>   };
> >>>
> >>>   /**
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >>> daf5ca9242..49cb84d64c 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t
> >>> port_id,
> >>>       return eth_err(port_id, (*dev->dev_ops-
> >>>> representor_info_get)(dev, info));  }
> >>>
> >>> +int
> >>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
> >>> +    struct rte_eth_dev *dev;
> >>> +
> >>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>> +    dev = &rte_eth_devices[port_id];
> >>> +
> >>> +    if (dev->data->dev_configured != 0) {
> >>> +        RTE_ETHDEV_LOG(ERR,
> >>> +            "The port (id=%"PRIu16") is already configured\n",
> >>> +            port_id);
> >>> +        return -EBUSY;
> >>> +    }
> >>> +
> >>> +    if (features == NULL) {
> >>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
> >>> +        return -EINVAL;
> >>> +    }
> >>> +
> >>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_meta_negotiate,
> >>> -ENOTSUP);
> >>> +    return eth_err(port_id,
> >>> +               (*dev->dev_ops->rx_meta_negotiate)(dev, features));
> >>> +}
> >>> +
> >>>   RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> >>>
> >>>   RTE_INIT(ethdev_init_telemetry)
> >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>> 1da37896d8..8467a7a362 100644
> >>> --- a/lib/ethdev/rte_ethdev.h
> >>> +++ b/lib/ethdev/rte_ethdev.h
> >>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
> >>> rte_eth_representor_info_get(uint16_t port_id,
> >>>                    struct rte_eth_representor_info *info);
> >>>
> >>> +/** The ethdev sees flagged packets if there are flows with action
> >>> +FLAG. */ #define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
> >>> +
> >>> +/** The ethdev sees mark IDs in packets if there are flows with
> >>> +action MARK. */ #define RTE_ETH_RX_META_USER_MARK
> (UINT64_C(1) <<
> >>> +1)
> >>> +
> >>> +/** The ethdev detects missed packets if there are "tunnel_set"
> >>> +flows in use. */ #define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1)
> <<
> >>> +2)
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Negotiate delivery of specific parts of Rx meta data.
> >>> + *
> >>> + * Invoke this API before the first rte_eth_dev_configure()
> >>> +invocation
> >>> + * to let the PMD make preparations that are inconvenient to do later.
> >>> + *
> >>> + * The negotiation process is as follows:
> >>> + *
> >>> + * - the application requests features intending to use at least
> >>> +some of them;
> >>> + * - the PMD responds with the guaranteed subset of the requested
> >>> +feature set;
> >>> + * - the application can retry negotiation with another set of
> >>> +features;
> >>> + * - the application can pass zero to clear the negotiation result;
> >>> + * - the last negotiated result takes effect upon the ethdev start.
> >>> + *
> >>> + * If this API is unsupported, the application should gracefully
> >>> ignore that.
> >>> + *
> >>> + * @param port_id
> >>> + *   Port (ethdev) identifier
> >>> + *
> >>> + * @param[inout] features
> >>> + *   Feature selection buffer
> >>> + *
> >>> + * @return
> >>> + *   - (-EBUSY) if the port can't handle this in its current state;
> >>> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
> >>> + *   - (-ENODEV) if *port_id* is invalid;
> >>> + *   - (-EINVAL) if *features* is NULL;
> >>> + *   - (-EIO) if the device is removed;
> >>> + *   - (0) on success
> >>> + */
> >>> +__rte_experimental
> >>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
> >>> +*features);
> >>
> >> I don't think meta is the best name since we also have meta item and
> >> the word meta can be used in other cases.
> >
> > I'm no expert in naming. What could be a better term for this?
> > Personally, I'd rather not perceive "meta" the way you describe. It's
> > not just "meta". It's "rx_meta", and the flags supplied with this API
> > provide enough context to explain what it's all about.
> 
> Thinking overnight about it I'd suggest full "metadata".
> Yes, it will name a bit longer, but less confusing versus term META already
> used in flow API.
> 
Following my above comments, I think it should be part of the new API
but in any case what about rx_flow_action_negotiate?

> Andrew.
Best,
Ori
  
Ivan Malov Oct. 3, 2021, 9:30 a.m. UTC | #11
Hi Ori,

Thanks for reviewing this.

On 03/10/2021 10:42, Ori Kam wrote:
> Hi Andrew and Ivan,
> 
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Friday, October 1, 2021 9:50 AM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> On 9/30/21 10:07 PM, Ivan Malov wrote:
>>> Hi Ori,
>>>
>>> On 30/09/2021 17:59, Ori Kam wrote:
>>>> Hi Ivan,
>>>> Sorry for jumping in late.
>>>
>>> No worries. That's OK.
>>>
>>>> I have a concern that this patch breaks other PMDs.
>>>
>>> It does no such thing.
>>>
>>>>>  From the rst file " One should negotiate flag delivery beforehand"
>>>> since you only added this function for your PMD all other PMD will fail.
>>>> I see that you added exception in the  examples, but it doesn't make
>>>> sense that applications will also need to add this exception which is
>>>> not documented.
>>>
>>> Say, you have an application, and you use it with some specific PMD.
>>> Say, that PMD doesn't run into the problem as ours does. In other
>>> words, the user can insert a flow with action MARK at any point and
>>> get mark delivery working starting from that moment without any
>>> problem. Say, this is exactly the way how it works for you at the moment.
>>>
>>> Now. This new API kicks in. We update the application to invoke it as
>>> early as possible. But your PMD in question still doesn't support this
>>> API. The comment in the patch says that if the method returns ENOTSUP,
>>> the application should ignore that without batting an eyelid. It
>>> should just keep on working as it did before the introduction of this API.
>>>
> 
> I understand that it is nice to write in the patch comment that application
> should disregard this function in case of
> ENOTSUP but in a few month someone will read the official doc,
> where it is stated that this function call is a must and then what do you
> think the application will do?
> I think that the correct way is to add this function to all PMDs.
> Another option is to add to the doc that if the function is returning ENOTSUP
> the application should assume that all is supported.
>   
> So from this point of view there is API break.

So, you mean an API breakage in some formal sense? If the doc is fixed 
in accordance with the second option you suggest, will it suffice to 
avoid this formal API breakage?

> 
>>> More specific example:
>>> Say, the application doesn't mind using either "RSS + MARK" or tunnel
>>> offload. What it does right now is attempt to insert tunnel flows
>>> first and, if this fails, fall back to "RSS + MARK". With this API,
>>> the application will try to invoke this API with "USER_MARK |
>>> TUNNEL_ID" in adapter initialised state. If the PMD says that it can
>>> only enable the tunnel offload, then the application will get the
>>> knowledge that it doesn't make sense to even try inserting "RSS +
>>> MARK" flows. It just can skip useless actions. But if the PMD doesn't
>>> support the method, the application will see ENOTSUP and handle this
>>> gracefully: it will make no assumptions about what's guaranteed to be
>>> supported and what's not and will just keep on its old behavior: try
>>> to insert a flow, fail, fall back to another type of flow.
>>>
> 
> I fully agree with your example, and think that this is the way
> to go, application should supply as much info as possible during startup.

Right.

> My question/comment is the negotiated result means that all of the actions
> are supported on the same rule?
> for example if application wants to add mark and tag on the same rule.
> (I know it doesn't make much sense) and the PMD can support both of them
> but not on the same rule, what should it return?
> Or for example if using the mark can only be supported if no decap action is set
> on this rule what should be the result?
>  From my undstanding this function is only to let the PMD know that on some
> rules the application will use those actions, the checking if the action combination
> is valid only happens on validate function right?

This API does not bind itself to flow API. It's *not* about enabling 
support for metadata *actions* (they are conducted entirely *inside* the 
NIC). It's about enabling *delivery* of metadata from the NIC to host.

Say, you insert a flow rule to mark some packets. The NIC, internally 
(in the e-switch) adds the mark to matching packets. Yes, in the 
boundaries of the NIC HW, the packets bear the mark on them. It has been 
set, yes. But when time comes to *deliver* the packets to the host, the 
NIC (at least, in net/sfc case) has two options: either provide only a 
small chunk of the metadata for each packet *to the host*, which doesn't 
include mark ID, flag and RSS hash, OR, alternatively, provide the full 
set of metadata. In the former option, the mark is simply not delivered. 
Once again: it *has been set*, but simply will not be *delivered to the 
host*.

So, this API is about negotiating *delivery* of metadata. In pure 
technical sense. And the set of flags that this API returns indicates 
which kinds of metadata the NIC will be able to deliver simultaneously.

For example, as I understand, in the case of tunnel offload, MLX5 claims 
Rx mark entirely for tunnel ID metadata, so, if an application requests 
"MARK | TUNNEL_ID" with this API, this PMD should probably want to 
respond with just "TUNNEL_ID". The application will see the response and 
realise that, even if it adds its *own* (user) action MARK to a flow and 
if the flow is not rejected by the PMD, it won't be able to see the mark 
in the received mbufs (or the mark will be incorrect).

But some other PMDs (net/sfc, for instance) claim only a small fraction 
of bits in Rx mark to deliver tunnel ID information. Remaining bits are 
still available for delivery of *user* mark ID. Please see an example at 
https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-ivan.malov@oktetlabs.ru/ 
. In this case, the PMD may want to return both flags in the response: 
"MARK | TUNNEL_ID". This way, the application knows that both features 
are enabled and available for use.

Now. I anticipate more questions asking why wouldn't we prefer flow API 
terminology or why wouldn't we add an API for negotiating support for 
metadata *actions* and not just metadata *delivery*. There's an answer. 
Always has been.

The thing is, the use of *actions* is very complicated. For example, the 
PMD may support action MARK for "transfer" flows but not for 
non-"transfer" ones. Also, simultaneous use of multiple different 
metadata actions may not be possible. And, last but not least, if we 
force the application to check support for *actions* on 
action-after-action basis, the order of checks will be very confusing to 
applications.

Previously, in this thread, Thomas suggested to go for exactly this type 
of API, to check support for actions one-by-one, without any context 
("transfer" / non-"transfer"). I'm afraid, this won't be OK.

> 
> In any case I think this is good idea and I will see how we can add a more generic approach of
> this API to the new API that I'm going to present.
> 
> 
>>> So no breakages with this API.
>>>
>>>>
>>>> Please see more comments inline.
>>>>
>>>> Thanks,
>>>> Ori
>>>>
>>>>> -----Original Message-----
>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>> Sent: Thursday, September 23, 2021 2:20 PM
>>>>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx
>>>>> meta data
>>>>>
>>>>> Delivery of mark, flag and the likes might affect small packet
>>>>> performance.
>>>>> If these features are disabled by default, enabling them in started
>>>>> state without causing traffic disruption may not always be possible.
>>>>>
>>>>> Let applications negotiate delivery of Rx meta data beforehand.
>>>>>
>>>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>>>> ---
>>>>>    app/test-flow-perf/main.c              | 21 ++++++++++++
>>>>>    app/test-pmd/testpmd.c                 | 26 +++++++++++++++
>>>>>    doc/guides/rel_notes/release_21_11.rst |  9 ++++++
>>>>>    lib/ethdev/ethdev_driver.h             | 19 +++++++++++
>>>>>    lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
>>>>>    lib/ethdev/rte_ethdev.h                | 45
>>>>> ++++++++++++++++++++++++++
>>>>>    lib/ethdev/rte_flow.h                  | 12 +++++++
>>>>>    lib/ethdev/version.map                 |  3 ++
>>>>>    8 files changed, 160 insertions(+)
>>>>>
>>>>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
>>>>> index 9be8edc31d..48eafffb1d 100644
>>>>> --- a/app/test-flow-perf/main.c
>>>>> +++ b/app/test-flow-perf/main.c
>>>>> @@ -1760,6 +1760,27 @@ init_port(void)
>>>>>            rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
>>>>>
>>>>>        for (port_id = 0; port_id < nr_ports; port_id++) {
>>>>> +        uint64_t rx_meta_features = 0;
>>>>> +
>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>>> +
>>>>> +        ret = rte_eth_rx_meta_negotiate(port_id,
>>>>> &rx_meta_features);
>>>>> +        if (ret == 0) {
>>>>> +            if (!(rx_meta_features &
>>>>> RTE_ETH_RX_META_USER_FLAG)) {
>>>>> +                printf(":: flow action FLAG will not affect Rx
>>>>> mbufs on port=%u\n",
>>>>> +                       port_id);
>>>>> +            }
>>>>> +
>>>>> +            if (!(rx_meta_features &
>>>>> RTE_ETH_RX_META_USER_MARK)) {
>>>>> +                printf(":: flow action MARK will not affect Rx
>>>>> mbufs on port=%u\n",
>>>>> +                       port_id);
>>>>> +            }
>>>>> +        } else if (ret != -ENOTSUP) {
>>>>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
>>>>> meta features on port=%u: %s\n",
>>>>> +                 port_id, rte_strerror(-ret));
>>>>> +        }
>>>>> +
>>>>>            ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>>            if (ret != 0)
>>>>>                rte_exit(EXIT_FAILURE, diff --git
>>>>> a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>>>> 97ae52e17e..7a8da3d7ab 100644
>>>>> --- a/app/test-pmd/testpmd.c
>>>>> +++ b/app/test-pmd/testpmd.c
>>>>> @@ -1485,10 +1485,36 @@ static void
>>>>>    init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
>>>>>        struct rte_port *port = &ports[pid];
>>>>> +    uint64_t rx_meta_features = 0;
>>>>>        uint16_t data_size;
>>>>>        int ret;
>>>>>        int i;
>>>>>
>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>>>>> +
>>>>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>>>>> +    if (ret == 0) {
>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>>>>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
>>>>> affect Rx mbufs on port %u\n",
>>>>> +                    pid);
>>>>> +        }
>>>>> +
>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
>>>>> {
>>>>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
>>>>> affect Rx mbufs on port %u\n",
>>>>> +                    pid);
>>>>> +        }
>>>>> +
>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>>>>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
>>>>> might be limited or unavailable on port %u\n",
>>>>> +                    pid);
>>>>> +        }
>>>>> +    } else if (ret != -ENOTSUP) {
>>>>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
>>>>> features on port %u: %s\n",
>>>>> +             pid, rte_strerror(-ret));
>>>>> +    }
>>>>> +
>>>>>        port->dev_conf.txmode = tx_mode;
>>>>>        port->dev_conf.rxmode = rx_mode;
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/release_21_11.rst
>>>>> b/doc/guides/rel_notes/release_21_11.rst
>>>>> index 19356ac53c..6674d4474c 100644
>>>>> --- a/doc/guides/rel_notes/release_21_11.rst
>>>>> +++ b/doc/guides/rel_notes/release_21_11.rst
>>>>> @@ -106,6 +106,15 @@ New Features
>>>>>      Added command-line options to specify total number of processes
>>>>> and
>>>>>      current process ID. Each process owns subset of Rx and Tx queues.
>>>>>
>>>>> +* **Added an API to negotiate delivery of specific parts of Rx meta
>>>>> +data**
>>>>> +
>>>>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
>>>>> +  The following parts of Rx meta data were defined:
>>>>> +
>>>>> +  * ``RTE_ETH_RX_META_USER_FLAG``
>>>>> +  * ``RTE_ETH_RX_META_USER_MARK``
>>>>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
>>>>> +
>>>>>
>>>>>    Removed Items
>>>>>    -------------
>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>> index 40e474aa7e..96e0c60cae 100644
>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void
>>>>> *rxq, typedef int (*eth_representor_info_get_t)(struct rte_eth_dev
>>>>> *dev,
>>>>>        struct rte_eth_representor_info *info);
>>>>>
>>>>> +/**
>>>>> + * @internal
>>>>> + * Negotiate delivery of specific parts of Rx meta data.
>>>>> + *
>>>>> + * @param dev
>>>>> + *   Port (ethdev) handle
>>>>> + *
>>>>> + * @param[inout] features
>>>>> + *   Feature selection buffer
>>>>> + *
>>>>> + * @return
>>>>> + *   Negative errno value on error, zero otherwise  */ typedef int
>>>>> +(*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
>>>>> +                       uint64_t *features);
>>>>> +
>>>>>    /**
>>>>>     * @internal A structure containing the functions exported by an
>>>>> Ethernet driver.
>>>>>     */
>>>>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
>>>>>
>>>>>        eth_representor_info_get_t representor_info_get;
>>>>>        /**< Get representor info. */
>>>>> +
>>>>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
>>>>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
>>>>>    };
>>>>>
>>>>>    /**
>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>>>> daf5ca9242..49cb84d64c 100644
>>>>> --- a/lib/ethdev/rte_ethdev.c
>>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t
>>>>> port_id,
>>>>>        return eth_err(port_id, (*dev->dev_ops-
>>>>>> representor_info_get)(dev, info));  }
>>>>>
>>>>> +int
>>>>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
>>>>> +    struct rte_eth_dev *dev;
>>>>> +
>>>>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>>> +    dev = &rte_eth_devices[port_id];
>>>>> +
>>>>> +    if (dev->data->dev_configured != 0) {
>>>>> +        RTE_ETHDEV_LOG(ERR,
>>>>> +            "The port (id=%"PRIu16") is already configured\n",
>>>>> +            port_id);
>>>>> +        return -EBUSY;
>>>>> +    }
>>>>> +
>>>>> +    if (features == NULL) {
>>>>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_meta_negotiate,
>>>>> -ENOTSUP);
>>>>> +    return eth_err(port_id,
>>>>> +               (*dev->dev_ops->rx_meta_negotiate)(dev, features));
>>>>> +}
>>>>> +
>>>>>    RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>>>>>
>>>>>    RTE_INIT(ethdev_init_telemetry)
>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>>> 1da37896d8..8467a7a362 100644
>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
>>>>> rte_eth_representor_info_get(uint16_t port_id,
>>>>>                     struct rte_eth_representor_info *info);
>>>>>
>>>>> +/** The ethdev sees flagged packets if there are flows with action
>>>>> +FLAG. */ #define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
>>>>> +
>>>>> +/** The ethdev sees mark IDs in packets if there are flows with
>>>>> +action MARK. */ #define RTE_ETH_RX_META_USER_MARK
>> (UINT64_C(1) <<
>>>>> +1)
>>>>> +
>>>>> +/** The ethdev detects missed packets if there are "tunnel_set"
>>>>> +flows in use. */ #define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1)
>> <<
>>>>> +2)
>>>>> +
>>>>> +/**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>> + *
>>>>> + * Negotiate delivery of specific parts of Rx meta data.
>>>>> + *
>>>>> + * Invoke this API before the first rte_eth_dev_configure()
>>>>> +invocation
>>>>> + * to let the PMD make preparations that are inconvenient to do later.
>>>>> + *
>>>>> + * The negotiation process is as follows:
>>>>> + *
>>>>> + * - the application requests features intending to use at least
>>>>> +some of them;
>>>>> + * - the PMD responds with the guaranteed subset of the requested
>>>>> +feature set;
>>>>> + * - the application can retry negotiation with another set of
>>>>> +features;
>>>>> + * - the application can pass zero to clear the negotiation result;
>>>>> + * - the last negotiated result takes effect upon the ethdev start.
>>>>> + *
>>>>> + * If this API is unsupported, the application should gracefully
>>>>> ignore that.
>>>>> + *
>>>>> + * @param port_id
>>>>> + *   Port (ethdev) identifier
>>>>> + *
>>>>> + * @param[inout] features
>>>>> + *   Feature selection buffer
>>>>> + *
>>>>> + * @return
>>>>> + *   - (-EBUSY) if the port can't handle this in its current state;
>>>>> + *   - (-ENOTSUP) if the method itself is not supported by the PMD;
>>>>> + *   - (-ENODEV) if *port_id* is invalid;
>>>>> + *   - (-EINVAL) if *features* is NULL;
>>>>> + *   - (-EIO) if the device is removed;
>>>>> + *   - (0) on success
>>>>> + */
>>>>> +__rte_experimental
>>>>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
>>>>> +*features);
>>>>
>>>> I don't think meta is the best name since we also have meta item and
>>>> the word meta can be used in other cases.
>>>
>>> I'm no expert in naming. What could be a better term for this?
>>> Personally, I'd rather not perceive "meta" the way you describe. It's
>>> not just "meta". It's "rx_meta", and the flags supplied with this API
>>> provide enough context to explain what it's all about.
>>
>> Thinking overnight about it I'd suggest full "metadata".
>> Yes, it will name a bit longer, but less confusing versus term META already
>> used in flow API.
>>
> Following my above comments, I think it should be part of the new API
> but in any case what about rx_flow_action_negotiate?

See my thoughts above. It makes no sense to negotiate *support for 
actions*. Existing "rte_flow_validate()" already does that job. The new 
"negotiate Rx metadata* API is all about *delivery* of metadata which is 
supposed to be *already* set for the packets *inside* the NIC. So, we 
negotiate *delivery from the NIC to the host*. Nothing more.

> 
>> Andrew.
> Best,
> Ori
>
  
Ori Kam Oct. 3, 2021, 11:01 a.m. UTC | #12
Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> Sent: Sunday, October 3, 2021 12:30 PM
> data
> 
> Hi Ori,
> 
> Thanks for reviewing this.
> 

No problem.

> On 03/10/2021 10:42, Ori Kam wrote:
> > Hi Andrew and Ivan,
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Friday, October 1, 2021 9:50 AM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >> Rx meta data
> >>
> >> On 9/30/21 10:07 PM, Ivan Malov wrote:
> >>> Hi Ori,
> >>>
> >>> On 30/09/2021 17:59, Ori Kam wrote:
> >>>> Hi Ivan,
> >>>> Sorry for jumping in late.
> >>>
> >>> No worries. That's OK.
> >>>
> >>>> I have a concern that this patch breaks other PMDs.
> >>>
> >>> It does no such thing.
> >>>
> >>>>>  From the rst file " One should negotiate flag delivery beforehand"
> >>>> since you only added this function for your PMD all other PMD will fail.
> >>>> I see that you added exception in the  examples, but it doesn't
> >>>> make sense that applications will also need to add this exception
> >>>> which is not documented.
> >>>
> >>> Say, you have an application, and you use it with some specific PMD.
> >>> Say, that PMD doesn't run into the problem as ours does. In other
> >>> words, the user can insert a flow with action MARK at any point and
> >>> get mark delivery working starting from that moment without any
> >>> problem. Say, this is exactly the way how it works for you at the
> moment.
> >>>
> >>> Now. This new API kicks in. We update the application to invoke it
> >>> as early as possible. But your PMD in question still doesn't support
> >>> this API. The comment in the patch says that if the method returns
> >>> ENOTSUP, the application should ignore that without batting an
> >>> eyelid. It should just keep on working as it did before the introduction of
> this API.
> >>>
> >
> > I understand that it is nice to write in the patch comment that
> > application should disregard this function in case of ENOTSUP but in a
> > few month someone will read the official doc, where it is stated that
> > this function call is a must and then what do you think the
> > application will do?
> > I think that the correct way is to add this function to all PMDs.
> > Another option is to add to the doc that if the function is returning
> > ENOTSUP the application should assume that all is supported.
> >
> > So from this point of view there is API break.
> 
> So, you mean an API breakage in some formal sense? If the doc is fixed in
> accordance with the second option you suggest, will it suffice to avoid this
> formal API breakage?
> 

Yes, but I think it will be better to add the missing function.

> >
> >>> More specific example:
> >>> Say, the application doesn't mind using either "RSS + MARK" or
> >>> tunnel offload. What it does right now is attempt to insert tunnel
> >>> flows first and, if this fails, fall back to "RSS + MARK". With this
> >>> API, the application will try to invoke this API with "USER_MARK |
> >>> TUNNEL_ID" in adapter initialised state. If the PMD says that it can
> >>> only enable the tunnel offload, then the application will get the
> >>> knowledge that it doesn't make sense to even try inserting "RSS +
> >>> MARK" flows. It just can skip useless actions. But if the PMD
> >>> doesn't support the method, the application will see ENOTSUP and
> >>> handle this
> >>> gracefully: it will make no assumptions about what's guaranteed to
> >>> be supported and what's not and will just keep on its old behavior:
> >>> try to insert a flow, fail, fall back to another type of flow.
> >>>
> >
> > I fully agree with your example, and think that this is the way to go,
> > application should supply as much info as possible during startup.
> 
> Right.
> 
> > My question/comment is the negotiated result means that all of the
> > actions are supported on the same rule?
> > for example if application wants to add mark and tag on the same rule.
> > (I know it doesn't make much sense) and the PMD can support both of
> > them but not on the same rule, what should it return?
> > Or for example if using the mark can only be supported if no decap
> > action is set on this rule what should be the result?
> >  From my undstanding this function is only to let the PMD know that on
> > some rules the application will use those actions, the checking if the
> > action combination is valid only happens on validate function right?
> 
> This API does not bind itself to flow API. It's *not* about enabling support for
> metadata *actions* (they are conducted entirely *inside* the NIC). It's
> about enabling *delivery* of metadata from the NIC to host.
>

Good point so why not use the same logic as the metadata and register it?
Since in any case, this is something in the mbuf so maybe this should be the answer?
 
> Say, you insert a flow rule to mark some packets. The NIC, internally (in the
> e-switch) adds the mark to matching packets. Yes, in the boundaries of the
> NIC HW, the packets bear the mark on them. It has been set, yes. But when
> time comes to *deliver* the packets to the host, the NIC (at least, in net/sfc
> case) has two options: either provide only a small chunk of the metadata for
> each packet *to the host*, which doesn't include mark ID, flag and RSS hash,
> OR, alternatively, provide the full set of metadata. In the former option, the
> mark is simply not delivered.
> Once again: it *has been set*, but simply will not be *delivered to the host*.
> 
> So, this API is about negotiating *delivery* of metadata. In pure technical
> sense. And the set of flags that this API returns indicates which kinds of
> metadata the NIC will be able to deliver simultaneously.
> 
> For example, as I understand, in the case of tunnel offload, MLX5 claims Rx
> mark entirely for tunnel ID metadata, so, if an application requests "MARK |
> TUNNEL_ID" with this API, this PMD should probably want to respond with
> just "TUNNEL_ID". The application will see the response and realise that,
> even if it adds its *own* (user) action MARK to a flow and if the flow is not
> rejected by the PMD, it won't be able to see the mark in the received mbufs
> (or the mark will be incorrect).
>
So what should the application do if on some flows it wants MARK and on other FLAG?
From DPDK viewpoint both of them can't be shared on the same rule
(they are using the same space in mbuf) so the application will never
ask for both of them in the same rule but he can on some rules ask for mark
while on other request for FLAG, even in your code you added both of them.

So what should the PMD return if it can support both of them just not at the same
rule?

One option is to document that the supported values are not per rule but for the entire
port. For example in the example you gave MLX5 will support mark + flag but will not
support mark + tunnel.

Also considering your example, the negotiation may result in subpar result.
taking your example the PMD returned  TUNNEL_ID maybe application would prefer
to have the mark and not the TUNNEL_ID. I understand that application can check
and try again with just the MARK. 
You are inserting logic to the PMD, maybe the function should just fail maybe
returning the conflicting items?


 
> But some other PMDs (net/sfc, for instance) claim only a small fraction of bits
> in Rx mark to deliver tunnel ID information. Remaining bits are still available
> for delivery of *user* mark ID. Please see an example at
> https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
> ivan.malov@oktetlabs.ru/
> . In this case, the PMD may want to return both flags in the response:
> "MARK | TUNNEL_ID". This way, the application knows that both features
> are enabled and available for use.
> 
> Now. I anticipate more questions asking why wouldn't we prefer flow API
> terminology or why wouldn't we add an API for negotiating support for
> metadata *actions* and not just metadata *delivery*. There's an answer.
> Always has been.
> 
> The thing is, the use of *actions* is very complicated. For example, the PMD
> may support action MARK for "transfer" flows but not for non-"transfer"
> ones. Also, simultaneous use of multiple different metadata actions may not
> be possible. And, last but not least, if we force the application to check
> support for *actions* on action-after-action basis, the order of checks will be
> very confusing to applications.
> 
> Previously, in this thread, Thomas suggested to go for exactly this type of
> API, to check support for actions one-by-one, without any context
> ("transfer" / non-"transfer"). I'm afraid, this won't be OK.
> 
+1 to keeping it as a separated API. (I agree actions limitation are very complex metrix)

> >
> > In any case I think this is good idea and I will see how we can add a
> > more generic approach of this API to the new API that I'm going to present.
> >
> >
> >>> So no breakages with this API.
> >>>
> >>>>
> >>>> Please see more comments inline.
> >>>>
> >>>> Thanks,
> >>>> Ori
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>>> Sent: Thursday, September 23, 2021 2:20 PM
> >>>>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >>>>> Rx meta data
> >>>>>
> >>>>> Delivery of mark, flag and the likes might affect small packet
> >>>>> performance.
> >>>>> If these features are disabled by default, enabling them in
> >>>>> started state without causing traffic disruption may not always be
> possible.
> >>>>>
> >>>>> Let applications negotiate delivery of Rx meta data beforehand.
> >>>>>
> >>>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>>> Reviewed-by: Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> >>>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
> >>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> >>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
> >>>>> ---
> >>>>>    app/test-flow-perf/main.c              | 21 ++++++++++++
> >>>>>    app/test-pmd/testpmd.c                 | 26 +++++++++++++++
> >>>>>    doc/guides/rel_notes/release_21_11.rst |  9 ++++++
> >>>>>    lib/ethdev/ethdev_driver.h             | 19 +++++++++++
> >>>>>    lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
> >>>>>    lib/ethdev/rte_ethdev.h                | 45
> >>>>> ++++++++++++++++++++++++++
> >>>>>    lib/ethdev/rte_flow.h                  | 12 +++++++
> >>>>>    lib/ethdev/version.map                 |  3 ++
> >>>>>    8 files changed, 160 insertions(+)
> >>>>>
> >>>>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
> >>>>> index 9be8edc31d..48eafffb1d 100644
> >>>>> --- a/app/test-flow-perf/main.c
> >>>>> +++ b/app/test-flow-perf/main.c
> >>>>> @@ -1760,6 +1760,27 @@ init_port(void)
> >>>>>            rte_exit(EXIT_FAILURE, "Error: can't init mbuf
> >>>>> pool\n");
> >>>>>
> >>>>>        for (port_id = 0; port_id < nr_ports; port_id++) {
> >>>>> +        uint64_t rx_meta_features = 0;
> >>>>> +
> >>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>>>> +
> >>>>> +        ret = rte_eth_rx_meta_negotiate(port_id,
> >>>>> &rx_meta_features);
> >>>>> +        if (ret == 0) {
> >>>>> +            if (!(rx_meta_features &
> >>>>> RTE_ETH_RX_META_USER_FLAG)) {
> >>>>> +                printf(":: flow action FLAG will not affect Rx
> >>>>> mbufs on port=%u\n",
> >>>>> +                       port_id);
> >>>>> +            }
> >>>>> +
> >>>>> +            if (!(rx_meta_features &
> >>>>> RTE_ETH_RX_META_USER_MARK)) {
> >>>>> +                printf(":: flow action MARK will not affect Rx
> >>>>> mbufs on port=%u\n",
> >>>>> +                       port_id);
> >>>>> +            }
> >>>>> +        } else if (ret != -ENOTSUP) {
> >>>>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
> >>>>> meta features on port=%u: %s\n",
> >>>>> +                 port_id, rte_strerror(-ret));
> >>>>> +        }
> >>>>> +
> >>>>>            ret = rte_eth_dev_info_get(port_id, &dev_info);
> >>>>>            if (ret != 0)
> >>>>>                rte_exit(EXIT_FAILURE, diff --git
> >>>>> a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>>>> 97ae52e17e..7a8da3d7ab 100644
> >>>>> --- a/app/test-pmd/testpmd.c
> >>>>> +++ b/app/test-pmd/testpmd.c
> >>>>> @@ -1485,10 +1485,36 @@ static void
> >>>>>    init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
> >>>>>        struct rte_port *port = &ports[pid];
> >>>>> +    uint64_t rx_meta_features = 0;
> >>>>>        uint16_t data_size;
> >>>>>        int ret;
> >>>>>        int i;
> >>>>>
> >>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>>>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> >>>>> +
> >>>>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> >>>>> +    if (ret == 0) {
> >>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> >>>>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
> >>>>> affect Rx mbufs on port %u\n",
> >>>>> +                    pid);
> >>>>> +        }
> >>>>> +
> >>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
> >>>>> {
> >>>>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
> >>>>> affect Rx mbufs on port %u\n",
> >>>>> +                    pid);
> >>>>> +        }
> >>>>> +
> >>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> >>>>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
> >>>>> might be limited or unavailable on port %u\n",
> >>>>> +                    pid);
> >>>>> +        }
> >>>>> +    } else if (ret != -ENOTSUP) {
> >>>>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
> >>>>> features on port %u: %s\n",
> >>>>> +             pid, rte_strerror(-ret));
> >>>>> +    }
> >>>>> +
> >>>>>        port->dev_conf.txmode = tx_mode;
> >>>>>        port->dev_conf.rxmode = rx_mode;
> >>>>>
> >>>>> diff --git a/doc/guides/rel_notes/release_21_11.rst
> >>>>> b/doc/guides/rel_notes/release_21_11.rst
> >>>>> index 19356ac53c..6674d4474c 100644
> >>>>> --- a/doc/guides/rel_notes/release_21_11.rst
> >>>>> +++ b/doc/guides/rel_notes/release_21_11.rst
> >>>>> @@ -106,6 +106,15 @@ New Features
> >>>>>      Added command-line options to specify total number of
> >>>>> processes and
> >>>>>      current process ID. Each process owns subset of Rx and Tx queues.
> >>>>>
> >>>>> +* **Added an API to negotiate delivery of specific parts of Rx
> >>>>> +meta
> >>>>> +data**
> >>>>> +
> >>>>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
> >>>>> +  The following parts of Rx meta data were defined:
> >>>>> +
> >>>>> +  * ``RTE_ETH_RX_META_USER_FLAG``
> >>>>> +  * ``RTE_ETH_RX_META_USER_MARK``
> >>>>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
> >>>>> +
> >>>>>
> >>>>>    Removed Items
> >>>>>    -------------
> >>>>> diff --git a/lib/ethdev/ethdev_driver.h
> >>>>> b/lib/ethdev/ethdev_driver.h index 40e474aa7e..96e0c60cae 100644
> >>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void
> >>>>> *rxq, typedef int (*eth_representor_info_get_t)(struct rte_eth_dev
> >>>>> *dev,
> >>>>>        struct rte_eth_representor_info *info);
> >>>>>
> >>>>> +/**
> >>>>> + * @internal
> >>>>> + * Negotiate delivery of specific parts of Rx meta data.
> >>>>> + *
> >>>>> + * @param dev
> >>>>> + *   Port (ethdev) handle
> >>>>> + *
> >>>>> + * @param[inout] features
> >>>>> + *   Feature selection buffer
> >>>>> + *
> >>>>> + * @return
> >>>>> + *   Negative errno value on error, zero otherwise  */ typedef
> >>>>> +int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
> >>>>> +                       uint64_t *features);
> >>>>> +
> >>>>>    /**
> >>>>>     * @internal A structure containing the functions exported by
> >>>>> an Ethernet driver.
> >>>>>     */
> >>>>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
> >>>>>
> >>>>>        eth_representor_info_get_t representor_info_get;
> >>>>>        /**< Get representor info. */
> >>>>> +
> >>>>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
> >>>>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
> >>>>>    };
> >>>>>
> >>>>>    /**
> >>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>>>> index daf5ca9242..49cb84d64c 100644
> >>>>> --- a/lib/ethdev/rte_ethdev.c
> >>>>> +++ b/lib/ethdev/rte_ethdev.c
> >>>>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t
> >>>>> port_id,
> >>>>>        return eth_err(port_id, (*dev->dev_ops-
> >>>>>> representor_info_get)(dev, info));  }
> >>>>>
> >>>>> +int
> >>>>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
> >>>>> +    struct rte_eth_dev *dev;
> >>>>> +
> >>>>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>>>> +    dev = &rte_eth_devices[port_id];
> >>>>> +
> >>>>> +    if (dev->data->dev_configured != 0) {
> >>>>> +        RTE_ETHDEV_LOG(ERR,
> >>>>> +            "The port (id=%"PRIu16") is already configured\n",
> >>>>> +            port_id);
> >>>>> +        return -EBUSY;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (features == NULL) {
> >>>>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
> >>>>> +        return -EINVAL;
> >>>>> +    }
> >>>>> +
> >>>>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_meta_negotiate,
> >>>>> -ENOTSUP);
> >>>>> +    return eth_err(port_id,
> >>>>> +               (*dev->dev_ops->rx_meta_negotiate)(dev,
> >>>>> +features)); }
> >>>>> +
> >>>>>    RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> >>>>>
> >>>>>    RTE_INIT(ethdev_init_telemetry) diff --git
> >>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>>>> 1da37896d8..8467a7a362 100644
> >>>>> --- a/lib/ethdev/rte_ethdev.h
> >>>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
> >>>>> rte_eth_representor_info_get(uint16_t port_id,
> >>>>>                     struct rte_eth_representor_info *info);
> >>>>>
> >>>>> +/** The ethdev sees flagged packets if there are flows with
> >>>>> +action FLAG. */ #define RTE_ETH_RX_META_USER_FLAG
> (UINT64_C(1) <<
> >>>>> +0)
> >>>>> +
> >>>>> +/** The ethdev sees mark IDs in packets if there are flows with
> >>>>> +action MARK. */ #define RTE_ETH_RX_META_USER_MARK
> >> (UINT64_C(1) <<
> >>>>> +1)
> >>>>> +
> >>>>> +/** The ethdev detects missed packets if there are "tunnel_set"
> >>>>> +flows in use. */ #define RTE_ETH_RX_META_TUNNEL_ID
> (UINT64_C(1)
> >> <<
> >>>>> +2)
> >>>>> +
> >>>>> +/**
> >>>>> + * @warning
> >>>>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>>>> + *
> >>>>> + * Negotiate delivery of specific parts of Rx meta data.
> >>>>> + *
> >>>>> + * Invoke this API before the first rte_eth_dev_configure()
> >>>>> +invocation
> >>>>> + * to let the PMD make preparations that are inconvenient to do
> later.
> >>>>> + *
> >>>>> + * The negotiation process is as follows:
> >>>>> + *
> >>>>> + * - the application requests features intending to use at least
> >>>>> +some of them;
> >>>>> + * - the PMD responds with the guaranteed subset of the requested
> >>>>> +feature set;
> >>>>> + * - the application can retry negotiation with another set of
> >>>>> +features;
> >>>>> + * - the application can pass zero to clear the negotiation
> >>>>> +result;
> >>>>> + * - the last negotiated result takes effect upon the ethdev start.
> >>>>> + *
> >>>>> + * If this API is unsupported, the application should gracefully
> >>>>> ignore that.
> >>>>> + *
> >>>>> + * @param port_id
> >>>>> + *   Port (ethdev) identifier
> >>>>> + *
> >>>>> + * @param[inout] features
> >>>>> + *   Feature selection buffer
> >>>>> + *
> >>>>> + * @return
> >>>>> + *   - (-EBUSY) if the port can't handle this in its current
> >>>>> +state;
> >>>>> + *   - (-ENOTSUP) if the method itself is not supported by the
> >>>>> +PMD;
> >>>>> + *   - (-ENODEV) if *port_id* is invalid;
> >>>>> + *   - (-EINVAL) if *features* is NULL;
> >>>>> + *   - (-EIO) if the device is removed;
> >>>>> + *   - (0) on success
> >>>>> + */
> >>>>> +__rte_experimental
> >>>>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
> >>>>> +*features);
> >>>>
> >>>> I don't think meta is the best name since we also have meta item
> >>>> and the word meta can be used in other cases.
> >>>
> >>> I'm no expert in naming. What could be a better term for this?
> >>> Personally, I'd rather not perceive "meta" the way you describe.
> >>> It's not just "meta". It's "rx_meta", and the flags supplied with
> >>> this API provide enough context to explain what it's all about.
> >>
> >> Thinking overnight about it I'd suggest full "metadata".
> >> Yes, it will name a bit longer, but less confusing versus term META
> >> already used in flow API.
> >>
> > Following my above comments, I think it should be part of the new API
> > but in any case what about rx_flow_action_negotiate?
> 
> See my thoughts above. It makes no sense to negotiate *support for
> actions*. Existing "rte_flow_validate()" already does that job. The new
> "negotiate Rx metadata* API is all about *delivery* of metadata which is
> supposed to be *already* set for the packets *inside* the NIC. So, we
> negotiate *delivery from the NIC to the host*. Nothing more.
> 
Agree with your comment but then maybe we should go to the register
approach just like metadata?

Best,
Ori
> >
> >> Andrew.
> > Best,
> > Ori
> >
> 
> --
> Ivan M
  
Ivan Malov Oct. 3, 2021, 5:30 p.m. UTC | #13
Hi Ori,

On 03/10/2021 14:01, Ori Kam wrote:
> Hi Ivan,
> 
>> -----Original Message-----
>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>> Sent: Sunday, October 3, 2021 12:30 PM
>> data
>>
>> Hi Ori,
>>
>> Thanks for reviewing this.
>>
> 
> No problem.
> 
>> On 03/10/2021 10:42, Ori Kam wrote:
>>> Hi Andrew and Ivan,
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Friday, October 1, 2021 9:50 AM
>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
>>>> Rx meta data
>>>>
>>>> On 9/30/21 10:07 PM, Ivan Malov wrote:
>>>>> Hi Ori,
>>>>>
>>>>> On 30/09/2021 17:59, Ori Kam wrote:
>>>>>> Hi Ivan,
>>>>>> Sorry for jumping in late.
>>>>>
>>>>> No worries. That's OK.
>>>>>
>>>>>> I have a concern that this patch breaks other PMDs.
>>>>>
>>>>> It does no such thing.
>>>>>
>>>>>>>   From the rst file " One should negotiate flag delivery beforehand"
>>>>>> since you only added this function for your PMD all other PMD will fail.
>>>>>> I see that you added exception in the  examples, but it doesn't
>>>>>> make sense that applications will also need to add this exception
>>>>>> which is not documented.
>>>>>
>>>>> Say, you have an application, and you use it with some specific PMD.
>>>>> Say, that PMD doesn't run into the problem as ours does. In other
>>>>> words, the user can insert a flow with action MARK at any point and
>>>>> get mark delivery working starting from that moment without any
>>>>> problem. Say, this is exactly the way how it works for you at the
>> moment.
>>>>>
>>>>> Now. This new API kicks in. We update the application to invoke it
>>>>> as early as possible. But your PMD in question still doesn't support
>>>>> this API. The comment in the patch says that if the method returns
>>>>> ENOTSUP, the application should ignore that without batting an
>>>>> eyelid. It should just keep on working as it did before the introduction of
>> this API.
>>>>>
>>>
>>> I understand that it is nice to write in the patch comment that
>>> application should disregard this function in case of ENOTSUP but in a
>>> few month someone will read the official doc, where it is stated that
>>> this function call is a must and then what do you think the
>>> application will do?
>>> I think that the correct way is to add this function to all PMDs.
>>> Another option is to add to the doc that if the function is returning
>>> ENOTSUP the application should assume that all is supported.
>>>
>>> So from this point of view there is API break.
>>
>> So, you mean an API breakage in some formal sense? If the doc is fixed in
>> accordance with the second option you suggest, will it suffice to avoid this
>> formal API breakage?
>>
> 
> Yes, but I think it will be better to add the missing function.
> 
>>>
>>>>> More specific example:
>>>>> Say, the application doesn't mind using either "RSS + MARK" or
>>>>> tunnel offload. What it does right now is attempt to insert tunnel
>>>>> flows first and, if this fails, fall back to "RSS + MARK". With this
>>>>> API, the application will try to invoke this API with "USER_MARK |
>>>>> TUNNEL_ID" in adapter initialised state. If the PMD says that it can
>>>>> only enable the tunnel offload, then the application will get the
>>>>> knowledge that it doesn't make sense to even try inserting "RSS +
>>>>> MARK" flows. It just can skip useless actions. But if the PMD
>>>>> doesn't support the method, the application will see ENOTSUP and
>>>>> handle this
>>>>> gracefully: it will make no assumptions about what's guaranteed to
>>>>> be supported and what's not and will just keep on its old behavior:
>>>>> try to insert a flow, fail, fall back to another type of flow.
>>>>>
>>>
>>> I fully agree with your example, and think that this is the way to go,
>>> application should supply as much info as possible during startup.
>>
>> Right.
>>
>>> My question/comment is the negotiated result means that all of the
>>> actions are supported on the same rule?
>>> for example if application wants to add mark and tag on the same rule.
>>> (I know it doesn't make much sense) and the PMD can support both of
>>> them but not on the same rule, what should it return?
>>> Or for example if using the mark can only be supported if no decap
>>> action is set on this rule what should be the result?
>>>   From my undstanding this function is only to let the PMD know that on
>>> some rules the application will use those actions, the checking if the
>>> action combination is valid only happens on validate function right?
>>
>> This API does not bind itself to flow API. It's *not* about enabling support for
>> metadata *actions* (they are conducted entirely *inside* the NIC). It's
>> about enabling *delivery* of metadata from the NIC to host.
>>
> 
> Good point so why not use the same logic as the metadata and register it?
> Since in any case, this is something in the mbuf so maybe this should be the answer?

I didn't catch your thought. Could you please elaborate on it?

>   
>> Say, you insert a flow rule to mark some packets. The NIC, internally (in the
>> e-switch) adds the mark to matching packets. Yes, in the boundaries of the
>> NIC HW, the packets bear the mark on them. It has been set, yes. But when
>> time comes to *deliver* the packets to the host, the NIC (at least, in net/sfc
>> case) has two options: either provide only a small chunk of the metadata for
>> each packet *to the host*, which doesn't include mark ID, flag and RSS hash,
>> OR, alternatively, provide the full set of metadata. In the former option, the
>> mark is simply not delivered.
>> Once again: it *has been set*, but simply will not be *delivered to the host*.
>>
>> So, this API is about negotiating *delivery* of metadata. In pure technical
>> sense. And the set of flags that this API returns indicates which kinds of
>> metadata the NIC will be able to deliver simultaneously.
>>
>> For example, as I understand, in the case of tunnel offload, MLX5 claims Rx
>> mark entirely for tunnel ID metadata, so, if an application requests "MARK |
>> TUNNEL_ID" with this API, this PMD should probably want to respond with
>> just "TUNNEL_ID". The application will see the response and realise that,
>> even if it adds its *own* (user) action MARK to a flow and if the flow is not
>> rejected by the PMD, it won't be able to see the mark in the received mbufs
>> (or the mark will be incorrect).
>>
> So what should the application do if on some flows it wants MARK and on other FLAG?

You mentioned flows, so I'd like to stress this out one more time: what 
this API cares about is solely the possibility to deliver metadata 
between the NIC and the host. The host == the PMD (*not* application).

>  From DPDK viewpoint both of them can't be shared on the same rule
> (they are using the same space in mbuf) so the application will never
> ask for both of them in the same rule but he can on some rules ask for mark
> while on other request for FLAG, even in your code you added both of them.
> 
> So what should the PMD return if it can support both of them just not at the same
> rule?

Please see above. This is not about rules. This is not about the way how 
flag and mark are presented *by* the PMD *to* the application in mbufs. 
Simultaneous use of actions FLAG and MARK in flows must be ruled out by 
rte_flow_validate() / rte_flow_create(). The way how flag and mark are 
*represented* in mbufs belongs in mbuf library responsibility domain.

Consider the following operational sequence:

1) The NIC has a packet, which has metadata associated with it;
2) The NIC transfers this packet to the host;
3) The PMD sees the packet and its metadata;
4) The PMD represents whatever available metadata in mbuf format.

Features negotiated by virtue of this API (for instance, FLAG and MARK) 
enable delivery of these kinds of metadata between points (2) and (3).

And the problem of flag / mark co-existence in mbufs sits at point (4).

-> Completely different problems, in fact.

> 
> One option is to document that the supported values are not per rule but for the entire
> port. For example in the example you gave MLX5 will support mark + flag but will not
> support mark + tunnel.

Yes, for the port. Flow rules are a don't care to this API.

> 
> Also considering your example, the negotiation may result in subpar result.
> taking your example the PMD returned  TUNNEL_ID maybe application would prefer
> to have the mark and not the TUNNEL_ID. I understand that application can check
> and try again with just the MARK.

Exactly. The Application can repeat negotiation with just MARK. Is there 
any problem with that?

> You are inserting logic to the PMD, maybe the function should just fail maybe
> returning the conflicting items?

Why return conflicting items? The optimal subset (from the PMD's 
perspective) should be returned. It's a silver lining. In the end, the 
application can learn which features can be enabled and in what 
combinations. And it can rely on the outcome of the negotiation process.

> 
> 
>   
>> But some other PMDs (net/sfc, for instance) claim only a small fraction of bits
>> in Rx mark to deliver tunnel ID information. Remaining bits are still available
>> for delivery of *user* mark ID. Please see an example at
>> https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
>> ivan.malov@oktetlabs.ru/
>> . In this case, the PMD may want to return both flags in the response:
>> "MARK | TUNNEL_ID". This way, the application knows that both features
>> are enabled and available for use.
>>
>> Now. I anticipate more questions asking why wouldn't we prefer flow API
>> terminology or why wouldn't we add an API for negotiating support for
>> metadata *actions* and not just metadata *delivery*. There's an answer.
>> Always has been.
>>
>> The thing is, the use of *actions* is very complicated. For example, the PMD
>> may support action MARK for "transfer" flows but not for non-"transfer"
>> ones. Also, simultaneous use of multiple different metadata actions may not
>> be possible. And, last but not least, if we force the application to check
>> support for *actions* on action-after-action basis, the order of checks will be
>> very confusing to applications.
>>
>> Previously, in this thread, Thomas suggested to go for exactly this type of
>> API, to check support for actions one-by-one, without any context
>> ("transfer" / non-"transfer"). I'm afraid, this won't be OK.
>>
> +1 to keeping it as a separated API. (I agree actions limitation are very complex metrix)
> 
>>>
>>> In any case I think this is good idea and I will see how we can add a
>>> more generic approach of this API to the new API that I'm going to present.
>>>
>>>
>>>>> So no breakages with this API.
>>>>>
>>>>>>
>>>>>> Please see more comments inline.
>>>>>>
>>>>>> Thanks,
>>>>>> Ori
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>>> Sent: Thursday, September 23, 2021 2:20 PM
>>>>>>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
>>>>>>> Rx meta data
>>>>>>>
>>>>>>> Delivery of mark, flag and the likes might affect small packet
>>>>>>> performance.
>>>>>>> If these features are disabled by default, enabling them in
>>>>>>> started state without causing traffic disruption may not always be
>> possible.
>>>>>>>
>>>>>>> Let applications negotiate delivery of Rx meta data beforehand.
>>>>>>>
>>>>>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>>> Reviewed-by: Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>
>>>>>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>>>>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>>>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>>>>>> ---
>>>>>>>     app/test-flow-perf/main.c              | 21 ++++++++++++
>>>>>>>     app/test-pmd/testpmd.c                 | 26 +++++++++++++++
>>>>>>>     doc/guides/rel_notes/release_21_11.rst |  9 ++++++
>>>>>>>     lib/ethdev/ethdev_driver.h             | 19 +++++++++++
>>>>>>>     lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
>>>>>>>     lib/ethdev/rte_ethdev.h                | 45
>>>>>>> ++++++++++++++++++++++++++
>>>>>>>     lib/ethdev/rte_flow.h                  | 12 +++++++
>>>>>>>     lib/ethdev/version.map                 |  3 ++
>>>>>>>     8 files changed, 160 insertions(+)
>>>>>>>
>>>>>>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
>>>>>>> index 9be8edc31d..48eafffb1d 100644
>>>>>>> --- a/app/test-flow-perf/main.c
>>>>>>> +++ b/app/test-flow-perf/main.c
>>>>>>> @@ -1760,6 +1760,27 @@ init_port(void)
>>>>>>>             rte_exit(EXIT_FAILURE, "Error: can't init mbuf
>>>>>>> pool\n");
>>>>>>>
>>>>>>>         for (port_id = 0; port_id < nr_ports; port_id++) {
>>>>>>> +        uint64_t rx_meta_features = 0;
>>>>>>> +
>>>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>>>>> +
>>>>>>> +        ret = rte_eth_rx_meta_negotiate(port_id,
>>>>>>> &rx_meta_features);
>>>>>>> +        if (ret == 0) {
>>>>>>> +            if (!(rx_meta_features &
>>>>>>> RTE_ETH_RX_META_USER_FLAG)) {
>>>>>>> +                printf(":: flow action FLAG will not affect Rx
>>>>>>> mbufs on port=%u\n",
>>>>>>> +                       port_id);
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            if (!(rx_meta_features &
>>>>>>> RTE_ETH_RX_META_USER_MARK)) {
>>>>>>> +                printf(":: flow action MARK will not affect Rx
>>>>>>> mbufs on port=%u\n",
>>>>>>> +                       port_id);
>>>>>>> +            }
>>>>>>> +        } else if (ret != -ENOTSUP) {
>>>>>>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
>>>>>>> meta features on port=%u: %s\n",
>>>>>>> +                 port_id, rte_strerror(-ret));
>>>>>>> +        }
>>>>>>> +
>>>>>>>             ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>>>>             if (ret != 0)
>>>>>>>                 rte_exit(EXIT_FAILURE, diff --git
>>>>>>> a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>>>>>> 97ae52e17e..7a8da3d7ab 100644
>>>>>>> --- a/app/test-pmd/testpmd.c
>>>>>>> +++ b/app/test-pmd/testpmd.c
>>>>>>> @@ -1485,10 +1485,36 @@ static void
>>>>>>>     init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
>>>>>>>         struct rte_port *port = &ports[pid];
>>>>>>> +    uint64_t rx_meta_features = 0;
>>>>>>>         uint16_t data_size;
>>>>>>>         int ret;
>>>>>>>         int i;
>>>>>>>
>>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>>>>>>> +
>>>>>>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>>>>>>> +    if (ret == 0) {
>>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>>>>>>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
>>>>>>> affect Rx mbufs on port %u\n",
>>>>>>> +                    pid);
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
>>>>>>> {
>>>>>>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
>>>>>>> affect Rx mbufs on port %u\n",
>>>>>>> +                    pid);
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>>>>>>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
>>>>>>> might be limited or unavailable on port %u\n",
>>>>>>> +                    pid);
>>>>>>> +        }
>>>>>>> +    } else if (ret != -ENOTSUP) {
>>>>>>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
>>>>>>> features on port %u: %s\n",
>>>>>>> +             pid, rte_strerror(-ret));
>>>>>>> +    }
>>>>>>> +
>>>>>>>         port->dev_conf.txmode = tx_mode;
>>>>>>>         port->dev_conf.rxmode = rx_mode;
>>>>>>>
>>>>>>> diff --git a/doc/guides/rel_notes/release_21_11.rst
>>>>>>> b/doc/guides/rel_notes/release_21_11.rst
>>>>>>> index 19356ac53c..6674d4474c 100644
>>>>>>> --- a/doc/guides/rel_notes/release_21_11.rst
>>>>>>> +++ b/doc/guides/rel_notes/release_21_11.rst
>>>>>>> @@ -106,6 +106,15 @@ New Features
>>>>>>>       Added command-line options to specify total number of
>>>>>>> processes and
>>>>>>>       current process ID. Each process owns subset of Rx and Tx queues.
>>>>>>>
>>>>>>> +* **Added an API to negotiate delivery of specific parts of Rx
>>>>>>> +meta
>>>>>>> +data**
>>>>>>> +
>>>>>>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
>>>>>>> +  The following parts of Rx meta data were defined:
>>>>>>> +
>>>>>>> +  * ``RTE_ETH_RX_META_USER_FLAG``
>>>>>>> +  * ``RTE_ETH_RX_META_USER_MARK``
>>>>>>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
>>>>>>> +
>>>>>>>
>>>>>>>     Removed Items
>>>>>>>     -------------
>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h
>>>>>>> b/lib/ethdev/ethdev_driver.h index 40e474aa7e..96e0c60cae 100644
>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>> @@ -789,6 +789,22 @@ typedef int (*eth_get_monitor_addr_t)(void
>>>>>>> *rxq, typedef int (*eth_representor_info_get_t)(struct rte_eth_dev
>>>>>>> *dev,
>>>>>>>         struct rte_eth_representor_info *info);
>>>>>>>
>>>>>>> +/**
>>>>>>> + * @internal
>>>>>>> + * Negotiate delivery of specific parts of Rx meta data.
>>>>>>> + *
>>>>>>> + * @param dev
>>>>>>> + *   Port (ethdev) handle
>>>>>>> + *
>>>>>>> + * @param[inout] features
>>>>>>> + *   Feature selection buffer
>>>>>>> + *
>>>>>>> + * @return
>>>>>>> + *   Negative errno value on error, zero otherwise  */ typedef
>>>>>>> +int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
>>>>>>> +                       uint64_t *features);
>>>>>>> +
>>>>>>>     /**
>>>>>>>      * @internal A structure containing the functions exported by
>>>>>>> an Ethernet driver.
>>>>>>>      */
>>>>>>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
>>>>>>>
>>>>>>>         eth_representor_info_get_t representor_info_get;
>>>>>>>         /**< Get representor info. */
>>>>>>> +
>>>>>>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
>>>>>>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
>>>>>>>     };
>>>>>>>
>>>>>>>     /**
>>>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>>>>>> index daf5ca9242..49cb84d64c 100644
>>>>>>> --- a/lib/ethdev/rte_ethdev.c
>>>>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>>>>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t
>>>>>>> port_id,
>>>>>>>         return eth_err(port_id, (*dev->dev_ops-
>>>>>>>> representor_info_get)(dev, info));  }
>>>>>>>
>>>>>>> +int
>>>>>>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features) {
>>>>>>> +    struct rte_eth_dev *dev;
>>>>>>> +
>>>>>>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>>>>> +    dev = &rte_eth_devices[port_id];
>>>>>>> +
>>>>>>> +    if (dev->data->dev_configured != 0) {
>>>>>>> +        RTE_ETHDEV_LOG(ERR,
>>>>>>> +            "The port (id=%"PRIu16") is already configured\n",
>>>>>>> +            port_id);
>>>>>>> +        return -EBUSY;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (features == NULL) {
>>>>>>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
>>>>>>> +        return -EINVAL;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> rx_meta_negotiate,
>>>>>>> -ENOTSUP);
>>>>>>> +    return eth_err(port_id,
>>>>>>> +               (*dev->dev_ops->rx_meta_negotiate)(dev,
>>>>>>> +features)); }
>>>>>>> +
>>>>>>>     RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>>>>>>>
>>>>>>>     RTE_INIT(ethdev_init_telemetry) diff --git
>>>>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>>>>> 1da37896d8..8467a7a362 100644
>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
>>>>>>> rte_eth_representor_info_get(uint16_t port_id,
>>>>>>>                      struct rte_eth_representor_info *info);
>>>>>>>
>>>>>>> +/** The ethdev sees flagged packets if there are flows with
>>>>>>> +action FLAG. */ #define RTE_ETH_RX_META_USER_FLAG
>> (UINT64_C(1) <<
>>>>>>> +0)
>>>>>>> +
>>>>>>> +/** The ethdev sees mark IDs in packets if there are flows with
>>>>>>> +action MARK. */ #define RTE_ETH_RX_META_USER_MARK
>>>> (UINT64_C(1) <<
>>>>>>> +1)
>>>>>>> +
>>>>>>> +/** The ethdev detects missed packets if there are "tunnel_set"
>>>>>>> +flows in use. */ #define RTE_ETH_RX_META_TUNNEL_ID
>> (UINT64_C(1)
>>>> <<
>>>>>>> +2)
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * @warning
>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>> + *
>>>>>>> + * Negotiate delivery of specific parts of Rx meta data.
>>>>>>> + *
>>>>>>> + * Invoke this API before the first rte_eth_dev_configure()
>>>>>>> +invocation
>>>>>>> + * to let the PMD make preparations that are inconvenient to do
>> later.
>>>>>>> + *
>>>>>>> + * The negotiation process is as follows:
>>>>>>> + *
>>>>>>> + * - the application requests features intending to use at least
>>>>>>> +some of them;
>>>>>>> + * - the PMD responds with the guaranteed subset of the requested
>>>>>>> +feature set;
>>>>>>> + * - the application can retry negotiation with another set of
>>>>>>> +features;
>>>>>>> + * - the application can pass zero to clear the negotiation
>>>>>>> +result;
>>>>>>> + * - the last negotiated result takes effect upon the ethdev start.
>>>>>>> + *
>>>>>>> + * If this API is unsupported, the application should gracefully
>>>>>>> ignore that.
>>>>>>> + *
>>>>>>> + * @param port_id
>>>>>>> + *   Port (ethdev) identifier
>>>>>>> + *
>>>>>>> + * @param[inout] features
>>>>>>> + *   Feature selection buffer
>>>>>>> + *
>>>>>>> + * @return
>>>>>>> + *   - (-EBUSY) if the port can't handle this in its current
>>>>>>> +state;
>>>>>>> + *   - (-ENOTSUP) if the method itself is not supported by the
>>>>>>> +PMD;
>>>>>>> + *   - (-ENODEV) if *port_id* is invalid;
>>>>>>> + *   - (-EINVAL) if *features* is NULL;
>>>>>>> + *   - (-EIO) if the device is removed;
>>>>>>> + *   - (0) on success
>>>>>>> + */
>>>>>>> +__rte_experimental
>>>>>>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
>>>>>>> +*features);
>>>>>>
>>>>>> I don't think meta is the best name since we also have meta item
>>>>>> and the word meta can be used in other cases.
>>>>>
>>>>> I'm no expert in naming. What could be a better term for this?
>>>>> Personally, I'd rather not perceive "meta" the way you describe.
>>>>> It's not just "meta". It's "rx_meta", and the flags supplied with
>>>>> this API provide enough context to explain what it's all about.
>>>>
>>>> Thinking overnight about it I'd suggest full "metadata".
>>>> Yes, it will name a bit longer, but less confusing versus term META
>>>> already used in flow API.
>>>>
>>> Following my above comments, I think it should be part of the new API
>>> but in any case what about rx_flow_action_negotiate?
>>
>> See my thoughts above. It makes no sense to negotiate *support for
>> actions*. Existing "rte_flow_validate()" already does that job. The new
>> "negotiate Rx metadata* API is all about *delivery* of metadata which is
>> supposed to be *already* set for the packets *inside* the NIC. So, we
>> negotiate *delivery from the NIC to the host*. Nothing more.
>>
> Agree with your comment but then maybe we should go to the register
> approach just like metadata?

Don't you mean "registering mbuf dynamic field / flag" by any chance? 
Even if it's technically possible, this may complicate the API contract 
because the key idea here is to demand that the application negotiate 
metadata delivery at the earliest step possible (before configuring the 
port), whilst a dynamic field / flag can be (theoretically) registered 
at any time. But, of course, feel free to elaborate on your idea.

We should make sure that we all reach an agreement.

> 
> Best,
> Ori
>>>
>>>> Andrew.
>>> Best,
>>> Ori
>>>
>>
>> --
>> Ivan M
  
Ori Kam Oct. 3, 2021, 9:04 p.m. UTC | #14
Hi Ivan,

Sorry for the long review.

> -----Original Message-----
> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> Sent: Sunday, October 3, 2021 8:30 PM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> Hi Ori,
> 
> On 03/10/2021 14:01, Ori Kam wrote:
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> >> Sent: Sunday, October 3, 2021 12:30 PM data
> >>
> >> Hi Ori,
> >>
> >> Thanks for reviewing this.
> >>
> >
> > No problem.
> >
> >> On 03/10/2021 10:42, Ori Kam wrote:
> >>> Hi Andrew and Ivan,
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Friday, October 1, 2021 9:50 AM
> >>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
> >>>> of Rx meta data
> >>>>
> >>>> On 9/30/21 10:07 PM, Ivan Malov wrote:
> >>>>> Hi Ori,
> >>>>>
> >>>>> On 30/09/2021 17:59, Ori Kam wrote:
> >>>>>> Hi Ivan,
> >>>>>> Sorry for jumping in late.
> >>>>>
> >>>>> No worries. That's OK.
> >>>>>
> >>>>>> I have a concern that this patch breaks other PMDs.
> >>>>>
> >>>>> It does no such thing.
> >>>>>
> >>>>>>>   From the rst file " One should negotiate flag delivery beforehand"
> >>>>>> since you only added this function for your PMD all other PMD will
> fail.
> >>>>>> I see that you added exception in the  examples, but it doesn't
> >>>>>> make sense that applications will also need to add this exception
> >>>>>> which is not documented.
> >>>>>
> >>>>> Say, you have an application, and you use it with some specific PMD.
> >>>>> Say, that PMD doesn't run into the problem as ours does. In other
> >>>>> words, the user can insert a flow with action MARK at any point
> >>>>> and get mark delivery working starting from that moment without
> >>>>> any problem. Say, this is exactly the way how it works for you at
> >>>>> the
> >> moment.
> >>>>>
> >>>>> Now. This new API kicks in. We update the application to invoke it
> >>>>> as early as possible. But your PMD in question still doesn't
> >>>>> support this API. The comment in the patch says that if the method
> >>>>> returns ENOTSUP, the application should ignore that without
> >>>>> batting an eyelid. It should just keep on working as it did before
> >>>>> the introduction of
> >> this API.
> >>>>>
> >>>
> >>> I understand that it is nice to write in the patch comment that
> >>> application should disregard this function in case of ENOTSUP but in
> >>> a few month someone will read the official doc, where it is stated
> >>> that this function call is a must and then what do you think the
> >>> application will do?
> >>> I think that the correct way is to add this function to all PMDs.
> >>> Another option is to add to the doc that if the function is
> >>> returning ENOTSUP the application should assume that all is supported.
> >>>
> >>> So from this point of view there is API break.
> >>
> >> So, you mean an API breakage in some formal sense? If the doc is
> >> fixed in accordance with the second option you suggest, will it
> >> suffice to avoid this formal API breakage?
> >>
> >
> > Yes, but I think it will be better to add the missing function.
> >
> >>>
> >>>>> More specific example:
> >>>>> Say, the application doesn't mind using either "RSS + MARK" or
> >>>>> tunnel offload. What it does right now is attempt to insert tunnel
> >>>>> flows first and, if this fails, fall back to "RSS + MARK". With
> >>>>> this API, the application will try to invoke this API with
> >>>>> "USER_MARK | TUNNEL_ID" in adapter initialised state. If the PMD
> >>>>> says that it can only enable the tunnel offload, then the
> >>>>> application will get the knowledge that it doesn't make sense to
> >>>>> even try inserting "RSS + MARK" flows. It just can skip useless
> >>>>> actions. But if the PMD doesn't support the method, the
> >>>>> application will see ENOTSUP and handle this
> >>>>> gracefully: it will make no assumptions about what's guaranteed to
> >>>>> be supported and what's not and will just keep on its old behavior:
> >>>>> try to insert a flow, fail, fall back to another type of flow.
> >>>>>
> >>>
> >>> I fully agree with your example, and think that this is the way to
> >>> go, application should supply as much info as possible during startup.
> >>
> >> Right.
> >>
> >>> My question/comment is the negotiated result means that all of the
> >>> actions are supported on the same rule?
> >>> for example if application wants to add mark and tag on the same rule.
> >>> (I know it doesn't make much sense) and the PMD can support both of
> >>> them but not on the same rule, what should it return?
> >>> Or for example if using the mark can only be supported if no decap
> >>> action is set on this rule what should be the result?
> >>>   From my undstanding this function is only to let the PMD know that
> >>> on some rules the application will use those actions, the checking
> >>> if the action combination is valid only happens on validate function right?
> >>
> >> This API does not bind itself to flow API. It's *not* about enabling
> >> support for metadata *actions* (they are conducted entirely *inside*
> >> the NIC). It's about enabling *delivery* of metadata from the NIC to host.
> >>
> >
> > Good point so why not use the same logic as the metadata and register it?
> > Since in any case, this is something in the mbuf so maybe this should be the
> answer?
> 
> I didn't catch your thought. Could you please elaborate on it?

The metadata action just like the mark or flag is used to give application
data that was set by a flow rule.
To enable the metadata the application must register the metadata field.
Since this happens during the creation of the mbuf it means that it must be
created before the device start.

I understand that the mark and flag don't need to be registered in the mbuf
since they have saved space but from application point of view there is no
difference between the metadata and mark, so why does negotiate function
doesn't handle the metadata?

I hope this is clearer.

> 
> >
> >> Say, you insert a flow rule to mark some packets. The NIC, internally
> >> (in the
> >> e-switch) adds the mark to matching packets. Yes, in the boundaries
> >> of the NIC HW, the packets bear the mark on them. It has been set,
> >> yes. But when time comes to *deliver* the packets to the host, the
> >> NIC (at least, in net/sfc
> >> case) has two options: either provide only a small chunk of the
> >> metadata for each packet *to the host*, which doesn't include mark
> >> ID, flag and RSS hash, OR, alternatively, provide the full set of
> >> metadata. In the former option, the mark is simply not delivered.
> >> Once again: it *has been set*, but simply will not be *delivered to the
> host*.
> >>
> >> So, this API is about negotiating *delivery* of metadata. In pure
> >> technical sense. And the set of flags that this API returns indicates
> >> which kinds of metadata the NIC will be able to deliver simultaneously.
> >>
> >> For example, as I understand, in the case of tunnel offload, MLX5
> >> claims Rx mark entirely for tunnel ID metadata, so, if an application
> >> requests "MARK | TUNNEL_ID" with this API, this PMD should probably
> >> want to respond with just "TUNNEL_ID". The application will see the
> >> response and realise that, even if it adds its *own* (user) action
> >> MARK to a flow and if the flow is not rejected by the PMD, it won't
> >> be able to see the mark in the received mbufs (or the mark will be
> incorrect).
> >>
> > So what should the application do if on some flows it wants MARK and on
> other FLAG?
> 
> You mentioned flows, so I'd like to stress this out one more time: what this
> API cares about is solely the possibility to deliver metadata between the NIC
> and the host. The host == the PMD (*not* application).
> 

I understand that you are only talking about enabling the action,
meaning to let the PMD know that at some point there will be a rule
that will use the mark action for example. 
Is my understanding correct?
I don't understand your last comment about host == PMD since at the end
this value should be given to the application.

> >  From DPDK viewpoint both of them can't be shared on the same rule
> > (they are using the same space in mbuf) so the application will never
> > ask for both of them in the same rule but he can on some rules ask for
> > mark while on other request for FLAG, even in your code you added both
> of them.
> >
> > So what should the PMD return if it can support both of them just not
> > at the same rule?
> 
> Please see above. This is not about rules. This is not about the way how flag
> and mark are presented *by* the PMD *to* the application in mbufs.
> Simultaneous use of actions FLAG and MARK in flows must be ruled out by
> rte_flow_validate() / rte_flow_create(). The way how flag and mark are
> *represented* in mbufs belongs in mbuf library responsibility domain.
> 
> Consider the following operational sequence:
> 
> 1) The NIC has a packet, which has metadata associated with it;
> 2) The NIC transfers this packet to the host;
> 3) The PMD sees the packet and its metadata;
> 4) The PMD represents whatever available metadata in mbuf format.
> 
> Features negotiated by virtue of this API (for instance, FLAG and MARK)
> enable delivery of these kinds of metadata between points (2) and (3).
> 
> And the problem of flag / mark co-existence in mbufs sits at point (4).
> 
> -> Completely different problems, in fact.
> 

Agree.

> >
> > One option is to document that the supported values are not per rule
> > but for the entire port. For example in the example you gave MLX5 will
> > support mark + flag but will not support mark + tunnel.
> 
> Yes, for the port. Flow rules are a don't care to this API.
> 
> >
> > Also considering your example, the negotiation may result in subpar result.
> > taking your example the PMD returned  TUNNEL_ID maybe application
> > would prefer to have the mark and not the TUNNEL_ID. I understand that
> > application can check and try again with just the MARK.
> 
> Exactly. The Application can repeat negotiation with just MARK. Is there any
> problem with that?
> 

I understand that the application can negotiate again and again.
I just don't like that the PMD has logic and selects what he thinks will be best.

I wanted to suggest that the PMD will just tell what are the conflicts and the application
will negotiate again based on its logic.

> > You are inserting logic to the PMD, maybe the function should just
> > fail maybe returning the conflicting items?
> 
> Why return conflicting items? The optimal subset (from the PMD's
> perspective) should be returned. It's a silver lining. In the end, the application
> can learn which features can be enabled and in what combinations. And it
> can rely on the outcome of the negotiation process.
> 
That is my point this is PMD perspective, not the application. 
how can a PMD define an optimal subset? How can it know what is more
important to the application?
Also, the PMD logic is internal so if for some reason
the PMD selected the best for the application by chance, so the application learns
that this is a good value for him. A release later the internal PMD logic changes
for example, a new feature was added, other customer requests.
since this is PMD the original app is not aware of this change and may fail.

We both agree that the application should check the result and renegotiate if needed
I only suggested that the PMD will only return error and not assume he knows best.


> >
> >
> >
> >> But some other PMDs (net/sfc, for instance) claim only a small fraction of
> bits
> >> in Rx mark to deliver tunnel ID information. Remaining bits are still
> available
> >> for delivery of *user* mark ID. Please see an example at
> >> https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
> >> ivan.malov@oktetlabs.ru/
> >> . In this case, the PMD may want to return both flags in the response:
> >> "MARK | TUNNEL_ID". This way, the application knows that both features
> >> are enabled and available for use.
> >>
> >> Now. I anticipate more questions asking why wouldn't we prefer flow API
> >> terminology or why wouldn't we add an API for negotiating support for
> >> metadata *actions* and not just metadata *delivery*. There's an answer.
> >> Always has been.
> >>
> >> The thing is, the use of *actions* is very complicated. For example, the
> PMD
> >> may support action MARK for "transfer" flows but not for non-"transfer"
> >> ones. Also, simultaneous use of multiple different metadata actions may
> not
> >> be possible. And, last but not least, if we force the application to check
> >> support for *actions* on action-after-action basis, the order of checks will
> be
> >> very confusing to applications.
> >>
> >> Previously, in this thread, Thomas suggested to go for exactly this type of
> >> API, to check support for actions one-by-one, without any context
> >> ("transfer" / non-"transfer"). I'm afraid, this won't be OK.
> >>
> > +1 to keeping it as a separated API. (I agree actions limitation are very
> complex metrix)
> >
> >>>
> >>> In any case I think this is good idea and I will see how we can add a
> >>> more generic approach of this API to the new API that I'm going to
> present.
> >>>
> >>>
> >>>>> So no breakages with this API.
> >>>>>
> >>>>>>
> >>>>>> Please see more comments inline.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ori
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>>>>> Sent: Thursday, September 23, 2021 2:20 PM
> >>>>>>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >>>>>>> Rx meta data
> >>>>>>>
> >>>>>>> Delivery of mark, flag and the likes might affect small packet
> >>>>>>> performance.
> >>>>>>> If these features are disabled by default, enabling them in
> >>>>>>> started state without causing traffic disruption may not always be
> >> possible.
> >>>>>>>
> >>>>>>> Let applications negotiate delivery of Rx meta data beforehand.
> >>>>>>>
> >>>>>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>>>>> Reviewed-by: Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >>>>>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
> >>>>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> >>>>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
> >>>>>>> ---
> >>>>>>>     app/test-flow-perf/main.c              | 21 ++++++++++++
> >>>>>>>     app/test-pmd/testpmd.c                 | 26 +++++++++++++++
> >>>>>>>     doc/guides/rel_notes/release_21_11.rst |  9 ++++++
> >>>>>>>     lib/ethdev/ethdev_driver.h             | 19 +++++++++++
> >>>>>>>     lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
> >>>>>>>     lib/ethdev/rte_ethdev.h                | 45
> >>>>>>> ++++++++++++++++++++++++++
> >>>>>>>     lib/ethdev/rte_flow.h                  | 12 +++++++
> >>>>>>>     lib/ethdev/version.map                 |  3 ++
> >>>>>>>     8 files changed, 160 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
> >>>>>>> index 9be8edc31d..48eafffb1d 100644
> >>>>>>> --- a/app/test-flow-perf/main.c
> >>>>>>> +++ b/app/test-flow-perf/main.c
> >>>>>>> @@ -1760,6 +1760,27 @@ init_port(void)
> >>>>>>>             rte_exit(EXIT_FAILURE, "Error: can't init mbuf
> >>>>>>> pool\n");
> >>>>>>>
> >>>>>>>         for (port_id = 0; port_id < nr_ports; port_id++) {
> >>>>>>> +        uint64_t rx_meta_features = 0;
> >>>>>>> +
> >>>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>>>>>> +
> >>>>>>> +        ret = rte_eth_rx_meta_negotiate(port_id,
> >>>>>>> &rx_meta_features);
> >>>>>>> +        if (ret == 0) {
> >>>>>>> +            if (!(rx_meta_features &
> >>>>>>> RTE_ETH_RX_META_USER_FLAG)) {
> >>>>>>> +                printf(":: flow action FLAG will not affect Rx
> >>>>>>> mbufs on port=%u\n",
> >>>>>>> +                       port_id);
> >>>>>>> +            }
> >>>>>>> +
> >>>>>>> +            if (!(rx_meta_features &
> >>>>>>> RTE_ETH_RX_META_USER_MARK)) {
> >>>>>>> +                printf(":: flow action MARK will not affect Rx
> >>>>>>> mbufs on port=%u\n",
> >>>>>>> +                       port_id);
> >>>>>>> +            }
> >>>>>>> +        } else if (ret != -ENOTSUP) {
> >>>>>>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
> >>>>>>> meta features on port=%u: %s\n",
> >>>>>>> +                 port_id, rte_strerror(-ret));
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>>             ret = rte_eth_dev_info_get(port_id, &dev_info);
> >>>>>>>             if (ret != 0)
> >>>>>>>                 rte_exit(EXIT_FAILURE, diff --git
> >>>>>>> a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>>>>>> 97ae52e17e..7a8da3d7ab 100644
> >>>>>>> --- a/app/test-pmd/testpmd.c
> >>>>>>> +++ b/app/test-pmd/testpmd.c
> >>>>>>> @@ -1485,10 +1485,36 @@ static void
> >>>>>>>     init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
> >>>>>>>         struct rte_port *port = &ports[pid];
> >>>>>>> +    uint64_t rx_meta_features = 0;
> >>>>>>>         uint16_t data_size;
> >>>>>>>         int ret;
> >>>>>>>         int i;
> >>>>>>>
> >>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
> >>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
> >>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
> >>>>>>> +
> >>>>>>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
> >>>>>>> +    if (ret == 0) {
> >>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
> >>>>>>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
> >>>>>>> affect Rx mbufs on port %u\n",
> >>>>>>> +                    pid);
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
> >>>>>>> {
> >>>>>>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
> >>>>>>> affect Rx mbufs on port %u\n",
> >>>>>>> +                    pid);
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
> >>>>>>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
> >>>>>>> might be limited or unavailable on port %u\n",
> >>>>>>> +                    pid);
> >>>>>>> +        }
> >>>>>>> +    } else if (ret != -ENOTSUP) {
> >>>>>>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
> >>>>>>> features on port %u: %s\n",
> >>>>>>> +             pid, rte_strerror(-ret));
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>>         port->dev_conf.txmode = tx_mode;
> >>>>>>>         port->dev_conf.rxmode = rx_mode;
> >>>>>>>
> >>>>>>> diff --git a/doc/guides/rel_notes/release_21_11.rst
> >>>>>>> b/doc/guides/rel_notes/release_21_11.rst
> >>>>>>> index 19356ac53c..6674d4474c 100644
> >>>>>>> --- a/doc/guides/rel_notes/release_21_11.rst
> >>>>>>> +++ b/doc/guides/rel_notes/release_21_11.rst
> >>>>>>> @@ -106,6 +106,15 @@ New Features
> >>>>>>>       Added command-line options to specify total number of
> >>>>>>> processes and
> >>>>>>>       current process ID. Each process owns subset of Rx and Tx
> queues.
> >>>>>>>
> >>>>>>> +* **Added an API to negotiate delivery of specific parts of Rx
> >>>>>>> +meta
> >>>>>>> +data**
> >>>>>>> +
> >>>>>>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
> >>>>>>> +  The following parts of Rx meta data were defined:
> >>>>>>> +
> >>>>>>> +  * ``RTE_ETH_RX_META_USER_FLAG``
> >>>>>>> +  * ``RTE_ETH_RX_META_USER_MARK``
> >>>>>>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
> >>>>>>> +
> >>>>>>>
> >>>>>>>     Removed Items
> >>>>>>>     -------------
> >>>>>>> diff --git a/lib/ethdev/ethdev_driver.h
> >>>>>>> b/lib/ethdev/ethdev_driver.h index 40e474aa7e..96e0c60cae
> 100644
> >>>>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>>>> @@ -789,6 +789,22 @@ typedef int
> (*eth_get_monitor_addr_t)(void
> >>>>>>> *rxq, typedef int (*eth_representor_info_get_t)(struct
> rte_eth_dev
> >>>>>>> *dev,
> >>>>>>>         struct rte_eth_representor_info *info);
> >>>>>>>
> >>>>>>> +/**
> >>>>>>> + * @internal
> >>>>>>> + * Negotiate delivery of specific parts of Rx meta data.
> >>>>>>> + *
> >>>>>>> + * @param dev
> >>>>>>> + *   Port (ethdev) handle
> >>>>>>> + *
> >>>>>>> + * @param[inout] features
> >>>>>>> + *   Feature selection buffer
> >>>>>>> + *
> >>>>>>> + * @return
> >>>>>>> + *   Negative errno value on error, zero otherwise  */ typedef
> >>>>>>> +int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
> >>>>>>> +                       uint64_t *features);
> >>>>>>> +
> >>>>>>>     /**
> >>>>>>>      * @internal A structure containing the functions exported by
> >>>>>>> an Ethernet driver.
> >>>>>>>      */
> >>>>>>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
> >>>>>>>
> >>>>>>>         eth_representor_info_get_t representor_info_get;
> >>>>>>>         /**< Get representor info. */
> >>>>>>> +
> >>>>>>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
> >>>>>>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
> >>>>>>>     };
> >>>>>>>
> >>>>>>>     /**
> >>>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>>>>>> index daf5ca9242..49cb84d64c 100644
> >>>>>>> --- a/lib/ethdev/rte_ethdev.c
> >>>>>>> +++ b/lib/ethdev/rte_ethdev.c
> >>>>>>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t
> >>>>>>> port_id,
> >>>>>>>         return eth_err(port_id, (*dev->dev_ops-
> >>>>>>>> representor_info_get)(dev, info));  }
> >>>>>>>
> >>>>>>> +int
> >>>>>>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
> *features) {
> >>>>>>> +    struct rte_eth_dev *dev;
> >>>>>>> +
> >>>>>>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>>>>>> +    dev = &rte_eth_devices[port_id];
> >>>>>>> +
> >>>>>>> +    if (dev->data->dev_configured != 0) {
> >>>>>>> +        RTE_ETHDEV_LOG(ERR,
> >>>>>>> +            "The port (id=%"PRIu16") is already configured\n",
> >>>>>>> +            port_id);
> >>>>>>> +        return -EBUSY;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    if (features == NULL) {
> >>>>>>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
> >>>>>>> +        return -EINVAL;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> rx_meta_negotiate,
> >>>>>>> -ENOTSUP);
> >>>>>>> +    return eth_err(port_id,
> >>>>>>> +               (*dev->dev_ops->rx_meta_negotiate)(dev,
> >>>>>>> +features)); }
> >>>>>>> +
> >>>>>>>     RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> >>>>>>>
> >>>>>>>     RTE_INIT(ethdev_init_telemetry) diff --git
> >>>>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>>>>>> 1da37896d8..8467a7a362 100644
> >>>>>>> --- a/lib/ethdev/rte_ethdev.h
> >>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>>>>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
> >>>>>>> rte_eth_representor_info_get(uint16_t port_id,
> >>>>>>>                      struct rte_eth_representor_info *info);
> >>>>>>>
> >>>>>>> +/** The ethdev sees flagged packets if there are flows with
> >>>>>>> +action FLAG. */ #define RTE_ETH_RX_META_USER_FLAG
> >> (UINT64_C(1) <<
> >>>>>>> +0)
> >>>>>>> +
> >>>>>>> +/** The ethdev sees mark IDs in packets if there are flows with
> >>>>>>> +action MARK. */ #define RTE_ETH_RX_META_USER_MARK
> >>>> (UINT64_C(1) <<
> >>>>>>> +1)
> >>>>>>> +
> >>>>>>> +/** The ethdev detects missed packets if there are "tunnel_set"
> >>>>>>> +flows in use. */ #define RTE_ETH_RX_META_TUNNEL_ID
> >> (UINT64_C(1)
> >>>> <<
> >>>>>>> +2)
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>> + * @warning
> >>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>>>>>> + *
> >>>>>>> + * Negotiate delivery of specific parts of Rx meta data.
> >>>>>>> + *
> >>>>>>> + * Invoke this API before the first rte_eth_dev_configure()
> >>>>>>> +invocation
> >>>>>>> + * to let the PMD make preparations that are inconvenient to do
> >> later.
> >>>>>>> + *
> >>>>>>> + * The negotiation process is as follows:
> >>>>>>> + *
> >>>>>>> + * - the application requests features intending to use at least
> >>>>>>> +some of them;
> >>>>>>> + * - the PMD responds with the guaranteed subset of the
> requested
> >>>>>>> +feature set;
> >>>>>>> + * - the application can retry negotiation with another set of
> >>>>>>> +features;
> >>>>>>> + * - the application can pass zero to clear the negotiation
> >>>>>>> +result;
> >>>>>>> + * - the last negotiated result takes effect upon the ethdev start.
> >>>>>>> + *
> >>>>>>> + * If this API is unsupported, the application should gracefully
> >>>>>>> ignore that.
> >>>>>>> + *
> >>>>>>> + * @param port_id
> >>>>>>> + *   Port (ethdev) identifier
> >>>>>>> + *
> >>>>>>> + * @param[inout] features
> >>>>>>> + *   Feature selection buffer
> >>>>>>> + *
> >>>>>>> + * @return
> >>>>>>> + *   - (-EBUSY) if the port can't handle this in its current
> >>>>>>> +state;
> >>>>>>> + *   - (-ENOTSUP) if the method itself is not supported by the
> >>>>>>> +PMD;
> >>>>>>> + *   - (-ENODEV) if *port_id* is invalid;
> >>>>>>> + *   - (-EINVAL) if *features* is NULL;
> >>>>>>> + *   - (-EIO) if the device is removed;
> >>>>>>> + *   - (0) on success
> >>>>>>> + */
> >>>>>>> +__rte_experimental
> >>>>>>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
> >>>>>>> +*features);
> >>>>>>
> >>>>>> I don't think meta is the best name since we also have meta item
> >>>>>> and the word meta can be used in other cases.
> >>>>>
> >>>>> I'm no expert in naming. What could be a better term for this?
> >>>>> Personally, I'd rather not perceive "meta" the way you describe.
> >>>>> It's not just "meta". It's "rx_meta", and the flags supplied with
> >>>>> this API provide enough context to explain what it's all about.
> >>>>
> >>>> Thinking overnight about it I'd suggest full "metadata".
> >>>> Yes, it will name a bit longer, but less confusing versus term META
> >>>> already used in flow API.
> >>>>
> >>> Following my above comments, I think it should be part of the new API
> >>> but in any case what about rx_flow_action_negotiate?
> >>
> >> See my thoughts above. It makes no sense to negotiate *support for
> >> actions*. Existing "rte_flow_validate()" already does that job. The new
> >> "negotiate Rx metadata* API is all about *delivery* of metadata which is
> >> supposed to be *already* set for the packets *inside* the NIC. So, we
> >> negotiate *delivery from the NIC to the host*. Nothing more.
> >>
> > Agree with your comment but then maybe we should go to the register
> > approach just like metadata?
> 
> Don't you mean "registering mbuf dynamic field / flag" by any chance?
> Even if it's technically possible, this may complicate the API contract
> because the key idea here is to demand that the application negotiate
> metadata delivery at the earliest step possible (before configuring the
> port), whilst a dynamic field / flag can be (theoretically) registered
> at any time. But, of course, feel free to elaborate on your idea.
> 

Yes, like I said above I don't see a difference between metadata
and mark, at least not from the application usage.
I assume you need this info at device start and by definition
the registration should happen before. (mbuf should be configured
before start)

> We should make sure that we all reach an agreement.
> 

+1 I think we can agree that there is a need for letting the PMD
know before the start that some action will be used.

And I'm sorry if I sound picky and hard, this is not my intention.
I'm also doing my best to review this as fast as I can.
My open issues and priorities:
1. the API breakage the possible solution adding support for the rest of the PMDs / update doc
to say that if the function is not supported the application should assume that
it can still use the mark / flag. -- blocker this must be resolved.
2. function name. my main issue is that metadata should be just like mark
maybe the solution can be adding metadata flag to this function.
the drawback is that the application needs to calls two functions to configure
metadata. -- high priority but if you can give me good reasoning not just
we don't need to register the mark I guess I will be O.K.
3. If PMD has internal logic in case of conflict or not.
Please think about it. -- low prio I will agree to what you decide.
but if you decide that PMD will have internal logic then this must be documented
so the application will know not to rely on the results.

Best,
Ori

> >
> > Best,
> > Ori
> >>>
> >>>> Andrew.
> >>> Best,
> >>> Ori
> >>>
> >>
> >> --
> >> Ivan M
> 
> --
> Ivan M
  
Ivan Malov Oct. 3, 2021, 11:50 p.m. UTC | #15
Hi Ori,

On 04/10/2021 00:04, Ori Kam wrote:
> Hi Ivan,
> 
> Sorry for the long review.
> 
>> -----Original Message-----
>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>> Sent: Sunday, October 3, 2021 8:30 PM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> Hi Ori,
>>
>> On 03/10/2021 14:01, Ori Kam wrote:
>>> Hi Ivan,
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>>>> Sent: Sunday, October 3, 2021 12:30 PM data
>>>>
>>>> Hi Ori,
>>>>
>>>> Thanks for reviewing this.
>>>>
>>>
>>> No problem.
>>>
>>>> On 03/10/2021 10:42, Ori Kam wrote:
>>>>> Hi Andrew and Ivan,
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> Sent: Friday, October 1, 2021 9:50 AM
>>>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
>>>>>> of Rx meta data
>>>>>>
>>>>>> On 9/30/21 10:07 PM, Ivan Malov wrote:
>>>>>>> Hi Ori,
>>>>>>>
>>>>>>> On 30/09/2021 17:59, Ori Kam wrote:
>>>>>>>> Hi Ivan,
>>>>>>>> Sorry for jumping in late.
>>>>>>>
>>>>>>> No worries. That's OK.
>>>>>>>
>>>>>>>> I have a concern that this patch breaks other PMDs.
>>>>>>>
>>>>>>> It does no such thing.
>>>>>>>
>>>>>>>>>    From the rst file " One should negotiate flag delivery beforehand"
>>>>>>>> since you only added this function for your PMD all other PMD will
>> fail.
>>>>>>>> I see that you added exception in the  examples, but it doesn't
>>>>>>>> make sense that applications will also need to add this exception
>>>>>>>> which is not documented.
>>>>>>>
>>>>>>> Say, you have an application, and you use it with some specific PMD.
>>>>>>> Say, that PMD doesn't run into the problem as ours does. In other
>>>>>>> words, the user can insert a flow with action MARK at any point
>>>>>>> and get mark delivery working starting from that moment without
>>>>>>> any problem. Say, this is exactly the way how it works for you at
>>>>>>> the
>>>> moment.
>>>>>>>
>>>>>>> Now. This new API kicks in. We update the application to invoke it
>>>>>>> as early as possible. But your PMD in question still doesn't
>>>>>>> support this API. The comment in the patch says that if the method
>>>>>>> returns ENOTSUP, the application should ignore that without
>>>>>>> batting an eyelid. It should just keep on working as it did before
>>>>>>> the introduction of
>>>> this API.
>>>>>>>
>>>>>
>>>>> I understand that it is nice to write in the patch comment that
>>>>> application should disregard this function in case of ENOTSUP but in
>>>>> a few month someone will read the official doc, where it is stated
>>>>> that this function call is a must and then what do you think the
>>>>> application will do?
>>>>> I think that the correct way is to add this function to all PMDs.
>>>>> Another option is to add to the doc that if the function is
>>>>> returning ENOTSUP the application should assume that all is supported.
>>>>>
>>>>> So from this point of view there is API break.
>>>>
>>>> So, you mean an API breakage in some formal sense? If the doc is
>>>> fixed in accordance with the second option you suggest, will it
>>>> suffice to avoid this formal API breakage?
>>>>
>>>
>>> Yes, but I think it will be better to add the missing function.
>>>
>>>>>
>>>>>>> More specific example:
>>>>>>> Say, the application doesn't mind using either "RSS + MARK" or
>>>>>>> tunnel offload. What it does right now is attempt to insert tunnel
>>>>>>> flows first and, if this fails, fall back to "RSS + MARK". With
>>>>>>> this API, the application will try to invoke this API with
>>>>>>> "USER_MARK | TUNNEL_ID" in adapter initialised state. If the PMD
>>>>>>> says that it can only enable the tunnel offload, then the
>>>>>>> application will get the knowledge that it doesn't make sense to
>>>>>>> even try inserting "RSS + MARK" flows. It just can skip useless
>>>>>>> actions. But if the PMD doesn't support the method, the
>>>>>>> application will see ENOTSUP and handle this
>>>>>>> gracefully: it will make no assumptions about what's guaranteed to
>>>>>>> be supported and what's not and will just keep on its old behavior:
>>>>>>> try to insert a flow, fail, fall back to another type of flow.
>>>>>>>
>>>>>
>>>>> I fully agree with your example, and think that this is the way to
>>>>> go, application should supply as much info as possible during startup.
>>>>
>>>> Right.
>>>>
>>>>> My question/comment is the negotiated result means that all of the
>>>>> actions are supported on the same rule?
>>>>> for example if application wants to add mark and tag on the same rule.
>>>>> (I know it doesn't make much sense) and the PMD can support both of
>>>>> them but not on the same rule, what should it return?
>>>>> Or for example if using the mark can only be supported if no decap
>>>>> action is set on this rule what should be the result?
>>>>>    From my undstanding this function is only to let the PMD know that
>>>>> on some rules the application will use those actions, the checking
>>>>> if the action combination is valid only happens on validate function right?
>>>>
>>>> This API does not bind itself to flow API. It's *not* about enabling
>>>> support for metadata *actions* (they are conducted entirely *inside*
>>>> the NIC). It's about enabling *delivery* of metadata from the NIC to host.
>>>>
>>>
>>> Good point so why not use the same logic as the metadata and register it?
>>> Since in any case, this is something in the mbuf so maybe this should be the
>> answer?
>>
>> I didn't catch your thought. Could you please elaborate on it?
> 
> The metadata action just like the mark or flag is used to give application
> data that was set by a flow rule.
> To enable the metadata the application must register the metadata field.
> Since this happens during the creation of the mbuf it means that it must be
> created before the device start.
> 
> I understand that the mark and flag don't need to be registered in the mbuf
> since they have saved space but from application point of view there is no
> difference between the metadata and mark, so why does negotiate function
> doesn't handle the metadata?
> 
> I hope this is clearer.

Thank you. That's a lot clearer.

I inspected struct rte_flow_action_set_meta as well as 
rte_flow_dynf_metadata_register(). The latter API doesn't require that 
applications invoke it precisely before adapter start. It says "must be 
called prior to use SET_META action", and the comment before the 
structure says just "in advance". So, at a bare minimum, the API 
contract could've been made more strict with this respect. However, far 
more important points are as follows:

1) This API enables delivery of this "custom" metadata between the PMD 
and the application, whilst the API under review, as I noted before, 
negotiates delivery of various kinds of metadata between the NIC and the 
PMD. These are two completely different (albeit adjacent) stages of 
packet delivery process.

2) This API doesn't negotiate anything with the PMD. It doesn't interact 
with the PMD at all. It just reserves extra room in mbufs for the 
metadata field and exits.

3) As a consequence of (3), the PMD can't immediately learn about this 
field being enabled. It's forced to face this fact at some deferred 
point. If the PMD, for instance, learns about that during adapter start 
and if it for some reason decides to deny the use of this field, it 
won't be able to convey its decision to the application. As a result, 
the application will live in the wrong assumption that it has 
successfully enabled the feature.

4) Even if we add similar APIs to "register" more kinds of metadata 
(flag, mark, tunnel ID, etc) and re-define the meaning of all these APIs 
to say that not only they enable delivery of the metadata between the 
PMD and the application but also enable the HW transport to get the 
metadata delivered from the NIC to the PMD itself, we won't be able to 
use this set of APIs to actually *negotiate* something. The order of 
invocations will be confusing to the application. If the PMD can't 
combine some of these features, it won't be able to communicate this 
clearly to the application. It will have to silently disregard some of 
the "registered" features. And this is something that we probably want 
to avoid. Right?

But I tend to agree that the API under review could have one more (4th) 
flag to negotiate delivery of this "custom" metadata from the NIC to the 
PMD. At the same time, enabling delivery of this data from the PMD to 
the application will remain in the responsibility domain of 
rte_flow_dynf_metadata_register().

> 
>>
>>>
>>>> Say, you insert a flow rule to mark some packets. The NIC, internally
>>>> (in the
>>>> e-switch) adds the mark to matching packets. Yes, in the boundaries
>>>> of the NIC HW, the packets bear the mark on them. It has been set,
>>>> yes. But when time comes to *deliver* the packets to the host, the
>>>> NIC (at least, in net/sfc
>>>> case) has two options: either provide only a small chunk of the
>>>> metadata for each packet *to the host*, which doesn't include mark
>>>> ID, flag and RSS hash, OR, alternatively, provide the full set of
>>>> metadata. In the former option, the mark is simply not delivered.
>>>> Once again: it *has been set*, but simply will not be *delivered to the
>> host*.
>>>>
>>>> So, this API is about negotiating *delivery* of metadata. In pure
>>>> technical sense. And the set of flags that this API returns indicates
>>>> which kinds of metadata the NIC will be able to deliver simultaneously.
>>>>
>>>> For example, as I understand, in the case of tunnel offload, MLX5
>>>> claims Rx mark entirely for tunnel ID metadata, so, if an application
>>>> requests "MARK | TUNNEL_ID" with this API, this PMD should probably
>>>> want to respond with just "TUNNEL_ID". The application will see the
>>>> response and realise that, even if it adds its *own* (user) action
>>>> MARK to a flow and if the flow is not rejected by the PMD, it won't
>>>> be able to see the mark in the received mbufs (or the mark will be
>> incorrect).
>>>>
>>> So what should the application do if on some flows it wants MARK and on
>> other FLAG?
>>
>> You mentioned flows, so I'd like to stress this out one more time: what this
>> API cares about is solely the possibility to deliver metadata between the NIC
>> and the host. The host == the PMD (*not* application).
>>
> 
> I understand that you are only talking about enabling the action,
> meaning to let the PMD know that at some point there will be a rule
> that will use the mark action for example.
> Is my understanding correct?

Not really. The causal relationships are as follows. The application 
comes to realise that it will need to use, say, action MARK in flows. 
This, in turn, means that, in order to be able to actually see the mark 
in received packets, the application needs to ensure that a) the NIC 
will be able to deliver the mark to the PMD and b) that the PMD will be 
able to deliver the mark to the application. In particular, in the case 
of Rx mark, (b) doesn't need to be negotiated = field "mark" is anyway 
provisioned in the mbuf structure, so no need to enable it. But (a) 
needs to be negotiated. Hence this API.

> I don't understand your last comment about host == PMD since at the end
> this value should be given to the application.

Two different steps, Ori, two different steps. The first one is to 
deliver the mark from the NIC to the PMD. And the second one is to 
deliver the mark from the PMD to the application. As you might 
understand, mbufs get filled out on the second step. That's it.

> 
>>>   From DPDK viewpoint both of them can't be shared on the same rule
>>> (they are using the same space in mbuf) so the application will never
>>> ask for both of them in the same rule but he can on some rules ask for
>>> mark while on other request for FLAG, even in your code you added both
>> of them.
>>>
>>> So what should the PMD return if it can support both of them just not
>>> at the same rule?
>>
>> Please see above. This is not about rules. This is not about the way how flag
>> and mark are presented *by* the PMD *to* the application in mbufs.
>> Simultaneous use of actions FLAG and MARK in flows must be ruled out by
>> rte_flow_validate() / rte_flow_create(). The way how flag and mark are
>> *represented* in mbufs belongs in mbuf library responsibility domain.
>>
>> Consider the following operational sequence:
>>
>> 1) The NIC has a packet, which has metadata associated with it;
>> 2) The NIC transfers this packet to the host;
>> 3) The PMD sees the packet and its metadata;
>> 4) The PMD represents whatever available metadata in mbuf format.
>>
>> Features negotiated by virtue of this API (for instance, FLAG and MARK)
>> enable delivery of these kinds of metadata between points (2) and (3).
>>
>> And the problem of flag / mark co-existence in mbufs sits at point (4).
>>
>> -> Completely different problems, in fact.
>>
> 
> Agree.
> 
>>>
>>> One option is to document that the supported values are not per rule
>>> but for the entire port. For example in the example you gave MLX5 will
>>> support mark + flag but will not support mark + tunnel.
>>
>> Yes, for the port. Flow rules are a don't care to this API.
>>
>>>
>>> Also considering your example, the negotiation may result in subpar result.
>>> taking your example the PMD returned  TUNNEL_ID maybe application
>>> would prefer to have the mark and not the TUNNEL_ID. I understand that
>>> application can check and try again with just the MARK.
>>
>> Exactly. The Application can repeat negotiation with just MARK. Is there any
>> problem with that?
>>
> 
> I understand that the application can negotiate again and again.
> I just don't like that the PMD has logic and selects what he thinks will be best.
> 
> I wanted to suggest that the PMD will just tell what are the conflicts and the application
> will negotiate again based on its logic.

Well, I'm not saying that letting the PMD decide on the optimal feature 
subset is the only reasonable MO. But it simplifies the negotiation 
procedure a lot. Conveying conflicts and feature inter-dependencies to 
the application might be rather complex and prone to errors.

At this point I believe it's important to clarify: the original intent 
is to assume that the PMD will first consider enabling all requested 
features. Only in the case when it fails to do so should it come up with 
the optimal subset.

> 
>>> You are inserting logic to the PMD, maybe the function should just
>>> fail maybe returning the conflicting items?
>>
>> Why return conflicting items? The optimal subset (from the PMD's
>> perspective) should be returned. It's a silver lining. In the end, the application
>> can learn which features can be enabled and in what combinations. And it
>> can rely on the outcome of the negotiation process.
>>
> That is my point this is PMD perspective, not the application.
> how can a PMD define an optimal subset? How can it know what is more
> important to the application?

How does "ls" command know the optimal sort mode? Why does it prefer 
sorting by name over sorting by date? Thanks to its options, it allows 
the user to express their own preference. So does the API in question. 
If the application knows that tunnel offload is more important to it 
(compared to MARK, for instance), it can request just TUNNEL_ID. Why 
complicate this?

> Also, the PMD logic is internal so if for some reason
> the PMD selected the best for the application by chance, so the application learns
> that this is a good value for him. A release later the internal PMD logic changes
> for example, a new feature was added, other customer requests.
> since this is PMD the original app is not aware of this change and may fail.

The same argumentation can equally apply to default RSS table, for 
example. What if an application gets accustomed to the round-robin table 
being default in some specific PMD (and the PMD maintainers change 
default RSS table out of a sudden)? Oops!

The truth is that the application shouldn't bind itself to some specific 
vendor / PMD. In any case. Hence the negotiation process. It's just a 
building block for some automation in the application.

> 
> We both agree that the application should check the result and renegotiate if needed
> I only suggested that the PMD will only return error and not assume he knows best.

I believe we should give this more thought. Maybe Andrew can join this 
conversation.

> 
> 
>>>
>>>
>>>
>>>> But some other PMDs (net/sfc, for instance) claim only a small fraction of
>> bits
>>>> in Rx mark to deliver tunnel ID information. Remaining bits are still
>> available
>>>> for delivery of *user* mark ID. Please see an example at
>>>> https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
>>>> ivan.malov@oktetlabs.ru/
>>>> . In this case, the PMD may want to return both flags in the response:
>>>> "MARK | TUNNEL_ID". This way, the application knows that both features
>>>> are enabled and available for use.
>>>>
>>>> Now. I anticipate more questions asking why wouldn't we prefer flow API
>>>> terminology or why wouldn't we add an API for negotiating support for
>>>> metadata *actions* and not just metadata *delivery*. There's an answer.
>>>> Always has been.
>>>>
>>>> The thing is, the use of *actions* is very complicated. For example, the
>> PMD
>>>> may support action MARK for "transfer" flows but not for non-"transfer"
>>>> ones. Also, simultaneous use of multiple different metadata actions may
>> not
>>>> be possible. And, last but not least, if we force the application to check
>>>> support for *actions* on action-after-action basis, the order of checks will
>> be
>>>> very confusing to applications.
>>>>
>>>> Previously, in this thread, Thomas suggested to go for exactly this type of
>>>> API, to check support for actions one-by-one, without any context
>>>> ("transfer" / non-"transfer"). I'm afraid, this won't be OK.
>>>>
>>> +1 to keeping it as a separated API. (I agree actions limitation are very
>> complex metrix)
>>>
>>>>>
>>>>> In any case I think this is good idea and I will see how we can add a
>>>>> more generic approach of this API to the new API that I'm going to
>> present.
>>>>>
>>>>>
>>>>>>> So no breakages with this API.
>>>>>>>
>>>>>>>>
>>>>>>>> Please see more comments inline.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ori
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>>>>> Sent: Thursday, September 23, 2021 2:20 PM
>>>>>>>>> Subject: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
>>>>>>>>> Rx meta data
>>>>>>>>>
>>>>>>>>> Delivery of mark, flag and the likes might affect small packet
>>>>>>>>> performance.
>>>>>>>>> If these features are disabled by default, enabling them in
>>>>>>>>> started state without causing traffic disruption may not always be
>>>> possible.
>>>>>>>>>
>>>>>>>>> Let applications negotiate delivery of Rx meta data beforehand.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>>>>> Reviewed-by: Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>
>>>>>>>>> Reviewed-by: Andy Moreton <amoreton@xilinx.com>
>>>>>>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>>>>>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>>>>>>>> ---
>>>>>>>>>      app/test-flow-perf/main.c              | 21 ++++++++++++
>>>>>>>>>      app/test-pmd/testpmd.c                 | 26 +++++++++++++++
>>>>>>>>>      doc/guides/rel_notes/release_21_11.rst |  9 ++++++
>>>>>>>>>      lib/ethdev/ethdev_driver.h             | 19 +++++++++++
>>>>>>>>>      lib/ethdev/rte_ethdev.c                | 25 ++++++++++++++
>>>>>>>>>      lib/ethdev/rte_ethdev.h                | 45
>>>>>>>>> ++++++++++++++++++++++++++
>>>>>>>>>      lib/ethdev/rte_flow.h                  | 12 +++++++
>>>>>>>>>      lib/ethdev/version.map                 |  3 ++
>>>>>>>>>      8 files changed, 160 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
>>>>>>>>> index 9be8edc31d..48eafffb1d 100644
>>>>>>>>> --- a/app/test-flow-perf/main.c
>>>>>>>>> +++ b/app/test-flow-perf/main.c
>>>>>>>>> @@ -1760,6 +1760,27 @@ init_port(void)
>>>>>>>>>              rte_exit(EXIT_FAILURE, "Error: can't init mbuf
>>>>>>>>> pool\n");
>>>>>>>>>
>>>>>>>>>          for (port_id = 0; port_id < nr_ports; port_id++) {
>>>>>>>>> +        uint64_t rx_meta_features = 0;
>>>>>>>>> +
>>>>>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>>>>>>> +        rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>>>>>>> +
>>>>>>>>> +        ret = rte_eth_rx_meta_negotiate(port_id,
>>>>>>>>> &rx_meta_features);
>>>>>>>>> +        if (ret == 0) {
>>>>>>>>> +            if (!(rx_meta_features &
>>>>>>>>> RTE_ETH_RX_META_USER_FLAG)) {
>>>>>>>>> +                printf(":: flow action FLAG will not affect Rx
>>>>>>>>> mbufs on port=%u\n",
>>>>>>>>> +                       port_id);
>>>>>>>>> +            }
>>>>>>>>> +
>>>>>>>>> +            if (!(rx_meta_features &
>>>>>>>>> RTE_ETH_RX_META_USER_MARK)) {
>>>>>>>>> +                printf(":: flow action MARK will not affect Rx
>>>>>>>>> mbufs on port=%u\n",
>>>>>>>>> +                       port_id);
>>>>>>>>> +            }
>>>>>>>>> +        } else if (ret != -ENOTSUP) {
>>>>>>>>> +            rte_exit(EXIT_FAILURE, "Error when negotiating Rx
>>>>>>>>> meta features on port=%u: %s\n",
>>>>>>>>> +                 port_id, rte_strerror(-ret));
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>>              ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>>>>>>              if (ret != 0)
>>>>>>>>>                  rte_exit(EXIT_FAILURE, diff --git
>>>>>>>>> a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>>>>>>>> 97ae52e17e..7a8da3d7ab 100644
>>>>>>>>> --- a/app/test-pmd/testpmd.c
>>>>>>>>> +++ b/app/test-pmd/testpmd.c
>>>>>>>>> @@ -1485,10 +1485,36 @@ static void
>>>>>>>>>      init_config_port_offloads(portid_t pid, uint32_t socket_id)  {
>>>>>>>>>          struct rte_port *port = &ports[pid];
>>>>>>>>> +    uint64_t rx_meta_features = 0;
>>>>>>>>>          uint16_t data_size;
>>>>>>>>>          int ret;
>>>>>>>>>          int i;
>>>>>>>>>
>>>>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
>>>>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
>>>>>>>>> +    rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
>>>>>>>>> +
>>>>>>>>> +    ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
>>>>>>>>> +    if (ret == 0) {
>>>>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
>>>>>>>>> +            TESTPMD_LOG(INFO, "Flow action FLAG will not
>>>>>>>>> affect Rx mbufs on port %u\n",
>>>>>>>>> +                    pid);
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK))
>>>>>>>>> {
>>>>>>>>> +            TESTPMD_LOG(INFO, "Flow action MARK will not
>>>>>>>>> affect Rx mbufs on port %u\n",
>>>>>>>>> +                    pid);
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>> +        if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
>>>>>>>>> +            TESTPMD_LOG(INFO, "Flow tunnel offload support
>>>>>>>>> might be limited or unavailable on port %u\n",
>>>>>>>>> +                    pid);
>>>>>>>>> +        }
>>>>>>>>> +    } else if (ret != -ENOTSUP) {
>>>>>>>>> +        rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta
>>>>>>>>> features on port %u: %s\n",
>>>>>>>>> +             pid, rte_strerror(-ret));
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>          port->dev_conf.txmode = tx_mode;
>>>>>>>>>          port->dev_conf.rxmode = rx_mode;
>>>>>>>>>
>>>>>>>>> diff --git a/doc/guides/rel_notes/release_21_11.rst
>>>>>>>>> b/doc/guides/rel_notes/release_21_11.rst
>>>>>>>>> index 19356ac53c..6674d4474c 100644
>>>>>>>>> --- a/doc/guides/rel_notes/release_21_11.rst
>>>>>>>>> +++ b/doc/guides/rel_notes/release_21_11.rst
>>>>>>>>> @@ -106,6 +106,15 @@ New Features
>>>>>>>>>        Added command-line options to specify total number of
>>>>>>>>> processes and
>>>>>>>>>        current process ID. Each process owns subset of Rx and Tx
>> queues.
>>>>>>>>>
>>>>>>>>> +* **Added an API to negotiate delivery of specific parts of Rx
>>>>>>>>> +meta
>>>>>>>>> +data**
>>>>>>>>> +
>>>>>>>>> +  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
>>>>>>>>> +  The following parts of Rx meta data were defined:
>>>>>>>>> +
>>>>>>>>> +  * ``RTE_ETH_RX_META_USER_FLAG``
>>>>>>>>> +  * ``RTE_ETH_RX_META_USER_MARK``
>>>>>>>>> +  * ``RTE_ETH_RX_META_TUNNEL_ID``
>>>>>>>>> +
>>>>>>>>>
>>>>>>>>>      Removed Items
>>>>>>>>>      -------------
>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h
>>>>>>>>> b/lib/ethdev/ethdev_driver.h index 40e474aa7e..96e0c60cae
>> 100644
>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>>>> @@ -789,6 +789,22 @@ typedef int
>> (*eth_get_monitor_addr_t)(void
>>>>>>>>> *rxq, typedef int (*eth_representor_info_get_t)(struct
>> rte_eth_dev
>>>>>>>>> *dev,
>>>>>>>>>          struct rte_eth_representor_info *info);
>>>>>>>>>
>>>>>>>>> +/**
>>>>>>>>> + * @internal
>>>>>>>>> + * Negotiate delivery of specific parts of Rx meta data.
>>>>>>>>> + *
>>>>>>>>> + * @param dev
>>>>>>>>> + *   Port (ethdev) handle
>>>>>>>>> + *
>>>>>>>>> + * @param[inout] features
>>>>>>>>> + *   Feature selection buffer
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   Negative errno value on error, zero otherwise  */ typedef
>>>>>>>>> +int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
>>>>>>>>> +                       uint64_t *features);
>>>>>>>>> +
>>>>>>>>>      /**
>>>>>>>>>       * @internal A structure containing the functions exported by
>>>>>>>>> an Ethernet driver.
>>>>>>>>>       */
>>>>>>>>> @@ -949,6 +965,9 @@ struct eth_dev_ops {
>>>>>>>>>
>>>>>>>>>          eth_representor_info_get_t representor_info_get;
>>>>>>>>>          /**< Get representor info. */
>>>>>>>>> +
>>>>>>>>> +    eth_rx_meta_negotiate_t rx_meta_negotiate;
>>>>>>>>> +    /**< Negotiate delivery of specific parts of Rx meta data. */
>>>>>>>>>      };
>>>>>>>>>
>>>>>>>>>      /**
>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>>>>>>>> index daf5ca9242..49cb84d64c 100644
>>>>>>>>> --- a/lib/ethdev/rte_ethdev.c
>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>>>>>>> @@ -6310,6 +6310,31 @@ rte_eth_representor_info_get(uint16_t
>>>>>>>>> port_id,
>>>>>>>>>          return eth_err(port_id, (*dev->dev_ops-
>>>>>>>>>> representor_info_get)(dev, info));  }
>>>>>>>>>
>>>>>>>>> +int
>>>>>>>>> +rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
>> *features) {
>>>>>>>>> +    struct rte_eth_dev *dev;
>>>>>>>>> +
>>>>>>>>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>>>>>>> +    dev = &rte_eth_devices[port_id];
>>>>>>>>> +
>>>>>>>>> +    if (dev->data->dev_configured != 0) {
>>>>>>>>> +        RTE_ETHDEV_LOG(ERR,
>>>>>>>>> +            "The port (id=%"PRIu16") is already configured\n",
>>>>>>>>> +            port_id);
>>>>>>>>> +        return -EBUSY;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    if (features == NULL) {
>>>>>>>>> +        RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
>>>>>>>>> +        return -EINVAL;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>>>> rx_meta_negotiate,
>>>>>>>>> -ENOTSUP);
>>>>>>>>> +    return eth_err(port_id,
>>>>>>>>> +               (*dev->dev_ops->rx_meta_negotiate)(dev,
>>>>>>>>> +features)); }
>>>>>>>>> +
>>>>>>>>>      RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>>>>>>>>>
>>>>>>>>>      RTE_INIT(ethdev_init_telemetry) diff --git
>>>>>>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>>>>>>> 1da37896d8..8467a7a362 100644
>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>> @@ -4888,6 +4888,51 @@ __rte_experimental  int
>>>>>>>>> rte_eth_representor_info_get(uint16_t port_id,
>>>>>>>>>                       struct rte_eth_representor_info *info);
>>>>>>>>>
>>>>>>>>> +/** The ethdev sees flagged packets if there are flows with
>>>>>>>>> +action FLAG. */ #define RTE_ETH_RX_META_USER_FLAG
>>>> (UINT64_C(1) <<
>>>>>>>>> +0)
>>>>>>>>> +
>>>>>>>>> +/** The ethdev sees mark IDs in packets if there are flows with
>>>>>>>>> +action MARK. */ #define RTE_ETH_RX_META_USER_MARK
>>>>>> (UINT64_C(1) <<
>>>>>>>>> +1)
>>>>>>>>> +
>>>>>>>>> +/** The ethdev detects missed packets if there are "tunnel_set"
>>>>>>>>> +flows in use. */ #define RTE_ETH_RX_META_TUNNEL_ID
>>>> (UINT64_C(1)
>>>>>> <<
>>>>>>>>> +2)
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Negotiate delivery of specific parts of Rx meta data.
>>>>>>>>> + *
>>>>>>>>> + * Invoke this API before the first rte_eth_dev_configure()
>>>>>>>>> +invocation
>>>>>>>>> + * to let the PMD make preparations that are inconvenient to do
>>>> later.
>>>>>>>>> + *
>>>>>>>>> + * The negotiation process is as follows:
>>>>>>>>> + *
>>>>>>>>> + * - the application requests features intending to use at least
>>>>>>>>> +some of them;
>>>>>>>>> + * - the PMD responds with the guaranteed subset of the
>> requested
>>>>>>>>> +feature set;
>>>>>>>>> + * - the application can retry negotiation with another set of
>>>>>>>>> +features;
>>>>>>>>> + * - the application can pass zero to clear the negotiation
>>>>>>>>> +result;
>>>>>>>>> + * - the last negotiated result takes effect upon the ethdev start.
>>>>>>>>> + *
>>>>>>>>> + * If this API is unsupported, the application should gracefully
>>>>>>>>> ignore that.
>>>>>>>>> + *
>>>>>>>>> + * @param port_id
>>>>>>>>> + *   Port (ethdev) identifier
>>>>>>>>> + *
>>>>>>>>> + * @param[inout] features
>>>>>>>>> + *   Feature selection buffer
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   - (-EBUSY) if the port can't handle this in its current
>>>>>>>>> +state;
>>>>>>>>> + *   - (-ENOTSUP) if the method itself is not supported by the
>>>>>>>>> +PMD;
>>>>>>>>> + *   - (-ENODEV) if *port_id* is invalid;
>>>>>>>>> + *   - (-EINVAL) if *features* is NULL;
>>>>>>>>> + *   - (-EIO) if the device is removed;
>>>>>>>>> + *   - (0) on success
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t
>>>>>>>>> +*features);
>>>>>>>>
>>>>>>>> I don't think meta is the best name since we also have meta item
>>>>>>>> and the word meta can be used in other cases.
>>>>>>>
>>>>>>> I'm no expert in naming. What could be a better term for this?
>>>>>>> Personally, I'd rather not perceive "meta" the way you describe.
>>>>>>> It's not just "meta". It's "rx_meta", and the flags supplied with
>>>>>>> this API provide enough context to explain what it's all about.
>>>>>>
>>>>>> Thinking overnight about it I'd suggest full "metadata".
>>>>>> Yes, it will name a bit longer, but less confusing versus term META
>>>>>> already used in flow API.
>>>>>>
>>>>> Following my above comments, I think it should be part of the new API
>>>>> but in any case what about rx_flow_action_negotiate?
>>>>
>>>> See my thoughts above. It makes no sense to negotiate *support for
>>>> actions*. Existing "rte_flow_validate()" already does that job. The new
>>>> "negotiate Rx metadata* API is all about *delivery* of metadata which is
>>>> supposed to be *already* set for the packets *inside* the NIC. So, we
>>>> negotiate *delivery from the NIC to the host*. Nothing more.
>>>>
>>> Agree with your comment but then maybe we should go to the register
>>> approach just like metadata?
>>
>> Don't you mean "registering mbuf dynamic field / flag" by any chance?
>> Even if it's technically possible, this may complicate the API contract
>> because the key idea here is to demand that the application negotiate
>> metadata delivery at the earliest step possible (before configuring the
>> port), whilst a dynamic field / flag can be (theoretically) registered
>> at any time. But, of course, feel free to elaborate on your idea.
>>
> 
> Yes, like I said above I don't see a difference between metadata
> and mark, at least not from the application usage.
> I assume you need this info at device start and by definition
> the registration should happen before. (mbuf should be configured
> before start)

Please see my thoughts about dynamic fields above.

> 
>> We should make sure that we all reach an agreement.
>>
> 
> +1 I think we can agree that there is a need for letting the PMD
> know before the start that some action will be used.
> 
> And I'm sorry if I sound picky and hard, this is not my intention.
> I'm also doing my best to review this as fast as I can.
> My open issues and priorities:
> 1. the API breakage the possible solution adding support for the rest of the PMDs / update doc
> to say that if the function is not supported the application should assume that
> it can still use the mark / flag. -- blocker this must be resolved.

I see.

> 2. function name. my main issue is that metadata should be just like mark
> maybe the solution can be adding metadata flag to this function.
> the drawback is that the application needs to calls two functions to configure
> metadata. -- high priority but if you can give me good reasoning not just
> we don't need to register the mark I guess I will be O.K.

Please see my thoughts above. This API negotiates metadata delivery on 
the path between the NIC and the PMD. The metadata mbuf register API 
does this on the path between the PMD and the application. So no 
contradiction here.

> 3. If PMD has internal logic in case of conflict or not.
> Please think about it. -- low prio I will agree to what you decide.
> but if you decide that PMD will have internal logic then this must be documented
> so the application will know not to rely on the results.

Please see my reply above. The application can rely on the outcome of 
the negotiation (the last negotiated subset of features), but it should 
know that if it disagrees with the suggested feature subset, it can 
re-negotiate. All fair and square.

> 
> Best,
> Ori
> 
>>>
>>> Best,
>>> Ori
>>>>>
>>>>>> Andrew.
>>>>> Best,
>>>>> Ori
>>>>>
>>>>
>>>> --
>>>> Ivan M
>>
>> --
>> Ivan M
  
Ori Kam Oct. 4, 2021, 6:56 a.m. UTC | #16
Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> Sent: Monday, October 4, 2021 2:50 AM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> Hi Ori,
> 
> On 04/10/2021 00:04, Ori Kam wrote:
> > Hi Ivan,
> >
> > Sorry for the long review.
> >
> >> -----Original Message-----
> >> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> >> Sent: Sunday, October 3, 2021 8:30 PM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >> Rx meta data
> >>
> >> Hi Ori,
> >>
> >> On 03/10/2021 14:01, Ori Kam wrote:
> >>> Hi Ivan,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
> >>>> Sent: Sunday, October 3, 2021 12:30 PM data
> >>>>
> >>>> Hi Ori,
> >>>>
> >>>> Thanks for reviewing this.
> >>>>
> >>>
> >>> No problem.
> >>>
> >>>> On 03/10/2021 10:42, Ori Kam wrote:
> >>>>> Hi Andrew and Ivan,
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> Sent: Friday, October 1, 2021 9:50 AM
> >>>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
> >>>>>> of Rx meta data
> >>>>>>
> >>>>>> On 9/30/21 10:07 PM, Ivan Malov wrote:
> >>>>>>> Hi Ori,
> >>>>>>>
> >>>>>>> On 30/09/2021 17:59, Ori Kam wrote:
> >>>>>>>> Hi Ivan,

[Snip]

> >>> Good point so why not use the same logic as the metadata and register
> it?
> >>> Since in any case, this is something in the mbuf so maybe this
> >>> should be the
> >> answer?
> >>
> >> I didn't catch your thought. Could you please elaborate on it?
> >
> > The metadata action just like the mark or flag is used to give
> > application data that was set by a flow rule.
> > To enable the metadata the application must register the metadata field.
> > Since this happens during the creation of the mbuf it means that it
> > must be created before the device start.
> >
> > I understand that the mark and flag don't need to be registered in the
> > mbuf since they have saved space but from application point of view
> > there is no difference between the metadata and mark, so why does
> > negotiate function doesn't handle the metadata?
> >
> > I hope this is clearer.
> 
> Thank you. That's a lot clearer.
> 
> I inspected struct rte_flow_action_set_meta as well as
> rte_flow_dynf_metadata_register(). The latter API doesn't require that
> applications invoke it precisely before adapter start. It says "must be called
> prior to use SET_META action", and the comment before the structure says
> just "in advance". So, at a bare minimum, the API contract could've been
> made more strict with this respect. However, far more important points are
> as follows:
> 

Agree, that doc should be updated but by definition this must be set before mbuf
creation this means before device start.

> 1) This API enables delivery of this "custom" metadata between the PMD
> and the application, whilst the API under review, as I noted before,
> negotiates delivery of various kinds of metadata between the NIC and the
> PMD. These are two completely different (albeit adjacent) stages of packet
> delivery process.
>
They are exactly alike also in the metadata case the registertion does two things:
Saves a place for the info in the mbuf and tells the PMD that it should configure the NIC
to supply this information upon request.
Even in your PMD assuming that it can support the metadata, you will need to configure
it otherwise when the application will request this data using a rule you will be at the
same spot you are now with the mark.

> 2) This API doesn't negotiate anything with the PMD. It doesn't interact with
> the PMD at all. It just reserves extra room in mbufs for the metadata field
> and exits.
> 
> 3) As a consequence of (3), the PMD can't immediately learn about this field
> being enabled. It's forced to face this fact at some deferred point. If the
> PMD, for instance, learns about that during adapter start and if it for some
> reason decides to deny the use of this field, it won't be able to convey its
> decision to the application. As a result, the application will live in the wrong
> assumption that it has successfully enabled the feature.
>
> 4) Even if we add similar APIs to "register" more kinds of metadata (flag,
> mark, tunnel ID, etc) and re-define the meaning of all these APIs to say that
> not only they enable delivery of the metadata between the PMD and the
> application but also enable the HW transport to get the metadata delivered
> from the NIC to the PMD itself, we won't be able to use this set of APIs to
> actually *negotiate* something. The order of invocations will be confusing to
> the application. If the PMD can't combine some of these features, it won't be
> able to communicate this clearly to the application. It will have to silently
> disregard some of the "registered" features. And this is something that we
> probably want to avoid. Right?
> 
> But I tend to agree that the API under review could have one more (4th) flag
> to negotiate delivery of this "custom" metadata from the NIC to the PMD. At
> the same time, enabling delivery of this data from the PMD to the application
> will remain in the responsibility domain of
> rte_flow_dynf_metadata_register().
> 

I agree and think this is the best solution.

> >
> >>
> >>>
> >>>> Say, you insert a flow rule to mark some packets. The NIC,
> >>>> internally (in the
> >>>> e-switch) adds the mark to matching packets. Yes, in the boundaries
> >>>> of the NIC HW, the packets bear the mark on them. It has been set,
> >>>> yes. But when time comes to *deliver* the packets to the host, the
> >>>> NIC (at least, in net/sfc
> >>>> case) has two options: either provide only a small chunk of the
> >>>> metadata for each packet *to the host*, which doesn't include mark
> >>>> ID, flag and RSS hash, OR, alternatively, provide the full set of
> >>>> metadata. In the former option, the mark is simply not delivered.
> >>>> Once again: it *has been set*, but simply will not be *delivered to
> >>>> the
> >> host*.
> >>>>
> >>>> So, this API is about negotiating *delivery* of metadata. In pure
> >>>> technical sense. And the set of flags that this API returns
> >>>> indicates which kinds of metadata the NIC will be able to deliver
> simultaneously.
> >>>>
> >>>> For example, as I understand, in the case of tunnel offload, MLX5
> >>>> claims Rx mark entirely for tunnel ID metadata, so, if an
> >>>> application requests "MARK | TUNNEL_ID" with this API, this PMD
> >>>> should probably want to respond with just "TUNNEL_ID". The
> >>>> application will see the response and realise that, even if it adds
> >>>> its *own* (user) action MARK to a flow and if the flow is not
> >>>> rejected by the PMD, it won't be able to see the mark in the
> >>>> received mbufs (or the mark will be
> >> incorrect).
> >>>>
> >>> So what should the application do if on some flows it wants MARK and
> >>> on
> >> other FLAG?
> >>
> >> You mentioned flows, so I'd like to stress this out one more time:
> >> what this API cares about is solely the possibility to deliver
> >> metadata between the NIC and the host. The host == the PMD (*not*
> application).
> >>
> >
> > I understand that you are only talking about enabling the action,
> > meaning to let the PMD know that at some point there will be a rule
> > that will use the mark action for example.
> > Is my understanding correct?
> 
> Not really. The causal relationships are as follows. The application comes to
> realise that it will need to use, say, action MARK in flows.
> This, in turn, means that, in order to be able to actually see the mark in
> received packets, the application needs to ensure that a) the NIC will be able
> to deliver the mark to the PMD and b) that the PMD will be able to deliver
> the mark to the application. In particular, in the case of Rx mark, (b) doesn't
> need to be negotiated = field "mark" is anyway provisioned in the mbuf
> structure, so no need to enable it. But (a) needs to be negotiated. Hence this
> API.
> 
Please see my above comment I think we both agree.

> > I don't understand your last comment about host == PMD since at the
> > end this value should be given to the application.
> 
> Two different steps, Ori, two different steps. The first one is to deliver the
> mark from the NIC to the PMD. And the second one is to deliver the mark
> from the PMD to the application. As you might understand, mbufs get filled
> out on the second step. That's it.
> 
> >
> >>>   From DPDK viewpoint both of them can't be shared on the same rule
> >>> (they are using the same space in mbuf) so the application will never
> >>> ask for both of them in the same rule but he can on some rules ask for
> >>> mark while on other request for FLAG, even in your code you added
> both
> >> of them.
> >>>
> >>> So what should the PMD return if it can support both of them just not
> >>> at the same rule?
> >>
> >> Please see above. This is not about rules. This is not about the way how
> flag
> >> and mark are presented *by* the PMD *to* the application in mbufs.
> >> Simultaneous use of actions FLAG and MARK in flows must be ruled out
> by
> >> rte_flow_validate() / rte_flow_create(). The way how flag and mark are
> >> *represented* in mbufs belongs in mbuf library responsibility domain.
> >>
> >> Consider the following operational sequence:
> >>
> >> 1) The NIC has a packet, which has metadata associated with it;
> >> 2) The NIC transfers this packet to the host;
> >> 3) The PMD sees the packet and its metadata;
> >> 4) The PMD represents whatever available metadata in mbuf format.
> >>
> >> Features negotiated by virtue of this API (for instance, FLAG and MARK)
> >> enable delivery of these kinds of metadata between points (2) and (3).
> >>
> >> And the problem of flag / mark co-existence in mbufs sits at point (4).
> >>
> >> -> Completely different problems, in fact.
> >>
> >
> > Agree.
> >
> >>>
> >>> One option is to document that the supported values are not per rule
> >>> but for the entire port. For example in the example you gave MLX5 will
> >>> support mark + flag but will not support mark + tunnel.
> >>
> >> Yes, for the port. Flow rules are a don't care to this API.
> >>
> >>>
> >>> Also considering your example, the negotiation may result in subpar
> result.
> >>> taking your example the PMD returned  TUNNEL_ID maybe application
> >>> would prefer to have the mark and not the TUNNEL_ID. I understand
> that
> >>> application can check and try again with just the MARK.
> >>
> >> Exactly. The Application can repeat negotiation with just MARK. Is there
> any
> >> problem with that?
> >>
> >
> > I understand that the application can negotiate again and again.
> > I just don't like that the PMD has logic and selects what he thinks will be
> best.
> >
> > I wanted to suggest that the PMD will just tell what are the conflicts and
> the application
> > will negotiate again based on its logic.
> 
> Well, I'm not saying that letting the PMD decide on the optimal feature
> subset is the only reasonable MO. But it simplifies the negotiation
> procedure a lot. Conveying conflicts and feature inter-dependencies to
> the application might be rather complex and prone to errors.
> 
> At this point I believe it's important to clarify: the original intent
> is to assume that the PMD will first consider enabling all requested
> features. Only in the case when it fails to do so should it come up with
> the optimal subset.
> 

I understand my issue is the the later case and how can PMD know what is
the optimal subset.

> >
> >>> You are inserting logic to the PMD, maybe the function should just
> >>> fail maybe returning the conflicting items?
> >>
> >> Why return conflicting items? The optimal subset (from the PMD's
> >> perspective) should be returned. It's a silver lining. In the end, the
> application
> >> can learn which features can be enabled and in what combinations. And it
> >> can rely on the outcome of the negotiation process.
> >>
> > That is my point this is PMD perspective, not the application.
> > how can a PMD define an optimal subset? How can it know what is more
> > important to the application?
> 
> How does "ls" command know the optimal sort mode? Why does it prefer
> sorting by name over sorting by date? Thanks to its options, it allows
> the user to express their own preference. So does the API in question.
> If the application knows that tunnel offload is more important to it
> (compared to MARK, for instance), it can request just TUNNEL_ID. Why
> complicate this?
> 
I don't agree with your example, the "ls"  is clearly defined and each
time you run it you get the same order. It doesn't change between versions.
While in this case there will be change between versions.
Think about it this way lets assume that PMD doesn't support the TUNNEL_ID
so the application request at startup both TUNNEL_ID and MARK.
PMD returnes only MARK, application checks and see that the PMD
didn't return the TUNNEL_ID so it negotiate again only to get that nothing
is supported, then application try only the mark and to this the PMD agree.

Again this is not critical to me. But keep it in mind.

> > Also, the PMD logic is internal so if for some reason
> > the PMD selected the best for the application by chance, so the application
> learns
> > that this is a good value for him. A release later the internal PMD logic
> changes
> > for example, a new feature was added, other customer requests.
> > since this is PMD the original app is not aware of this change and may fail.
> 
> The same argumentation can equally apply to default RSS table, for
> example. What if an application gets accustomed to the round-robin table
> being default in some specific PMD (and the PMD maintainers change
> default RSS table out of a sudden)? Oops!
> 
Yes but this is why the use has the option to select the mode,
in case of RSS if the requested mode isn't supported the PMD fails not
just select different algorithm right?

> The truth is that the application shouldn't bind itself to some specific
> vendor / PMD. In any case. Hence the negotiation process. It's just a
> building block for some automation in the application.
> 
> >
> > We both agree that the application should check the result and renegotiate
> if needed
> > I only suggested that the PMD will only return error and not assume he
> knows best.
> 
> I believe we should give this more thought. Maybe Andrew can join this
> conversation.
> 
I fully agree lets sleep on it, 
This will not be a blocker.

> >
> >
> >>>
> >>>
> >>>
> >>>> But some other PMDs (net/sfc, for instance) claim only a small fraction
> of
> >> bits
> >>>> in Rx mark to deliver tunnel ID information. Remaining bits are still
> >> available
> >>>> for delivery of *user* mark ID. Please see an example at
> >>>> https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
> >>>> ivan.malov@oktetlabs.ru/
> >>>> . In this case, the PMD may want to return both flags in the response:
> >>>> "MARK | TUNNEL_ID". This way, the application knows that both
> features
> >>>> are enabled and available for use.
> >>>>
> >>>> Now. I anticipate more questions asking why wouldn't we prefer flow
> API
> >>>> terminology or why wouldn't we add an API for negotiating support for
> >>>> metadata *actions* and not just metadata *delivery*. There's an
> answer.
> >>>> Always has been.
> >>>>
> >>>> The thing is, the use of *actions* is very complicated. For example, the
> >> PMD
> >>>> may support action MARK for "transfer" flows but not for non-
> "transfer"
> >>>> ones. Also, simultaneous use of multiple different metadata actions
> may
> >> not
> >>>> be possible. And, last but not least, if we force the application to check
> >>>> support for *actions* on action-after-action basis, the order of checks
> will
> >> be
> >>>> very confusing to applications.
> >>>>
> >>>> Previously, in this thread, Thomas suggested to go for exactly this type
> of
> >>>> API, to check support for actions one-by-one, without any context
> >>>> ("transfer" / non-"transfer"). I'm afraid, this won't be OK.
> >>>>
> >>> +1 to keeping it as a separated API. (I agree actions limitation are very
> >> complex metrix)
> >>>
> >>>>>
> >>>>> In any case I think this is good idea and I will see how we can add a
> >>>>> more generic approach of this API to the new API that I'm going to
> >> present.
> >>>>>
> >>>>>
> >>>>>>> So no breakages with this API.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Please see more comments inline.


[Snip]

> > Yes, like I said above I don't see a difference between metadata
> > and mark, at least not from the application usage.
> > I assume you need this info at device start and by definition
> > the registration should happen before. (mbuf should be configured
> > before start)
> 
> Please see my thoughts about dynamic fields above.
> 
> >
> >> We should make sure that we all reach an agreement.
> >>
> >
> > +1 I think we can agree that there is a need for letting the PMD
> > know before the start that some action will be used.
> >
> > And I'm sorry if I sound picky and hard, this is not my intention.
> > I'm also doing my best to review this as fast as I can.
> > My open issues and priorities:
> > 1. the API breakage the possible solution adding support for the rest of the
> PMDs / update doc
> > to say that if the function is not supported the application should assume
> that
> > it can still use the mark / flag. -- blocker this must be resolved.
> 
> I see.
> 
> > 2. function name. my main issue is that metadata should be just like mark
> > maybe the solution can be adding metadata flag to this function.
> > the drawback is that the application needs to calls two functions to
> configure
> > metadata. -- high priority but if you can give me good reasoning not just
> > we don't need to register the mark I guess I will be O.K.
> 
> Please see my thoughts above. This API negotiates metadata delivery on
> the path between the NIC and the PMD. The metadata mbuf register API
> does this on the path between the PMD and the application. So no
> contradiction here.
> 

See my comments above I think we have an agreed solution.

> > 3. If PMD has internal logic in case of conflict or not.
> > Please think about it. -- low prio I will agree to what you decide.
> > but if you decide that PMD will have internal logic then this must be
> documented
> > so the application will know not to rely on the results.
> 
> Please see my reply above. The application can rely on the outcome of
> the negotiation (the last negotiated subset of features), but it should
> know that if it disagrees with the suggested feature subset, it can
> re-negotiate. All fair and square.
> 

Like I said above think about it some more, I will also think in any
case this will not be a blocker.

Best,
Ori
> >
> > Best,
> > Ori
> >
> >>>
> >>> Best,
> >>> Ori
> >>>>>
> >>>>>> Andrew.
> >>>>> Best,
> >>>>> Ori
> >>>>>
> >>>>
> >>>> --
> >>>> Ivan M
> >>
> >> --
> >> Ivan M
> 
> --
> Ivan M
  
Ivan Malov Oct. 4, 2021, 11:39 a.m. UTC | #17
Hi Ori,

On 04/10/2021 09:56, Ori Kam wrote:
> Hi Ivan,
> 
>> -----Original Message-----
>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>> Sent: Monday, October 4, 2021 2:50 AM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> Hi Ori,
>>
>> On 04/10/2021 00:04, Ori Kam wrote:
>>> Hi Ivan,
>>>
>>> Sorry for the long review.
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>>>> Sent: Sunday, October 3, 2021 8:30 PM
>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
>>>> Rx meta data
>>>>
>>>> Hi Ori,
>>>>
>>>> On 03/10/2021 14:01, Ori Kam wrote:
>>>>> Hi Ivan,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>>>>>> Sent: Sunday, October 3, 2021 12:30 PM data
>>>>>>
>>>>>> Hi Ori,
>>>>>>
>>>>>> Thanks for reviewing this.
>>>>>>
>>>>>
>>>>> No problem.
>>>>>
>>>>>> On 03/10/2021 10:42, Ori Kam wrote:
>>>>>>> Hi Andrew and Ivan,
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>>>> Sent: Friday, October 1, 2021 9:50 AM
>>>>>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
>>>>>>>> of Rx meta data
>>>>>>>>
>>>>>>>> On 9/30/21 10:07 PM, Ivan Malov wrote:
>>>>>>>>> Hi Ori,
>>>>>>>>>
>>>>>>>>> On 30/09/2021 17:59, Ori Kam wrote:
>>>>>>>>>> Hi Ivan,
> 
> [Snip]
> 
>>>>> Good point so why not use the same logic as the metadata and register
>> it?
>>>>> Since in any case, this is something in the mbuf so maybe this
>>>>> should be the
>>>> answer?
>>>>
>>>> I didn't catch your thought. Could you please elaborate on it?
>>>
>>> The metadata action just like the mark or flag is used to give
>>> application data that was set by a flow rule.
>>> To enable the metadata the application must register the metadata field.
>>> Since this happens during the creation of the mbuf it means that it
>>> must be created before the device start.
>>>
>>> I understand that the mark and flag don't need to be registered in the
>>> mbuf since they have saved space but from application point of view
>>> there is no difference between the metadata and mark, so why does
>>> negotiate function doesn't handle the metadata?
>>>
>>> I hope this is clearer.
>>
>> Thank you. That's a lot clearer.
>>
>> I inspected struct rte_flow_action_set_meta as well as
>> rte_flow_dynf_metadata_register(). The latter API doesn't require that
>> applications invoke it precisely before adapter start. It says "must be called
>> prior to use SET_META action", and the comment before the structure says
>> just "in advance". So, at a bare minimum, the API contract could've been
>> made more strict with this respect. However, far more important points are
>> as follows:
>>
> 
> Agree, that doc should be updated but by definition this must be set before mbuf
> creation this means before device start.
> 
>> 1) This API enables delivery of this "custom" metadata between the PMD
>> and the application, whilst the API under review, as I noted before,
>> negotiates delivery of various kinds of metadata between the NIC and the
>> PMD. These are two completely different (albeit adjacent) stages of packet
>> delivery process.
>>
> They are exactly alike also in the metadata case the registertion does two things:
> Saves a place for the info in the mbuf and tells the PMD that it should configure the NIC
> to supply this information upon request.

Looking at rte_flow_dynf_metadata_register() implementation, it doesn't 
seem to notify the PMD of the new field directly. Yes, the PMD will 
finally know, but at that point it won't be able to reject the field. 
It's one-sided communication in fact.

> Even in your PMD assuming that it can support the metadata, you will need to configure
> it otherwise when the application will request this data using a rule you will be at the
> same spot you are now with the mark.

Right, but as I said, the primary concern is to configure delivery of 
metadata from the NIC HW to the PMD. It's not about mbuf dynfields.

> 
>> 2) This API doesn't negotiate anything with the PMD. It doesn't interact with
>> the PMD at all. It just reserves extra room in mbufs for the metadata field
>> and exits.
>>
>> 3) As a consequence of (3), the PMD can't immediately learn about this field
>> being enabled. It's forced to face this fact at some deferred point. If the
>> PMD, for instance, learns about that during adapter start and if it for some
>> reason decides to deny the use of this field, it won't be able to convey its
>> decision to the application. As a result, the application will live in the wrong
>> assumption that it has successfully enabled the feature.
>>
>> 4) Even if we add similar APIs to "register" more kinds of metadata (flag,
>> mark, tunnel ID, etc) and re-define the meaning of all these APIs to say that
>> not only they enable delivery of the metadata between the PMD and the
>> application but also enable the HW transport to get the metadata delivered
>> from the NIC to the PMD itself, we won't be able to use this set of APIs to
>> actually *negotiate* something. The order of invocations will be confusing to
>> the application. If the PMD can't combine some of these features, it won't be
>> able to communicate this clearly to the application. It will have to silently
>> disregard some of the "registered" features. And this is something that we
>> probably want to avoid. Right?
>>
>> But I tend to agree that the API under review could have one more (4th) flag
>> to negotiate delivery of this "custom" metadata from the NIC to the PMD. At
>> the same time, enabling delivery of this data from the PMD to the application
>> will remain in the responsibility domain of
>> rte_flow_dynf_metadata_register().
>>
> 
> I agree and think this is the best solution.

Thank you.

> 
>>>
>>>>
>>>>>
>>>>>> Say, you insert a flow rule to mark some packets. The NIC,
>>>>>> internally (in the
>>>>>> e-switch) adds the mark to matching packets. Yes, in the boundaries
>>>>>> of the NIC HW, the packets bear the mark on them. It has been set,
>>>>>> yes. But when time comes to *deliver* the packets to the host, the
>>>>>> NIC (at least, in net/sfc
>>>>>> case) has two options: either provide only a small chunk of the
>>>>>> metadata for each packet *to the host*, which doesn't include mark
>>>>>> ID, flag and RSS hash, OR, alternatively, provide the full set of
>>>>>> metadata. In the former option, the mark is simply not delivered.
>>>>>> Once again: it *has been set*, but simply will not be *delivered to
>>>>>> the
>>>> host*.
>>>>>>
>>>>>> So, this API is about negotiating *delivery* of metadata. In pure
>>>>>> technical sense. And the set of flags that this API returns
>>>>>> indicates which kinds of metadata the NIC will be able to deliver
>> simultaneously.
>>>>>>
>>>>>> For example, as I understand, in the case of tunnel offload, MLX5
>>>>>> claims Rx mark entirely for tunnel ID metadata, so, if an
>>>>>> application requests "MARK | TUNNEL_ID" with this API, this PMD
>>>>>> should probably want to respond with just "TUNNEL_ID". The
>>>>>> application will see the response and realise that, even if it adds
>>>>>> its *own* (user) action MARK to a flow and if the flow is not
>>>>>> rejected by the PMD, it won't be able to see the mark in the
>>>>>> received mbufs (or the mark will be
>>>> incorrect).
>>>>>>
>>>>> So what should the application do if on some flows it wants MARK and
>>>>> on
>>>> other FLAG?
>>>>
>>>> You mentioned flows, so I'd like to stress this out one more time:
>>>> what this API cares about is solely the possibility to deliver
>>>> metadata between the NIC and the host. The host == the PMD (*not*
>> application).
>>>>
>>>
>>> I understand that you are only talking about enabling the action,
>>> meaning to let the PMD know that at some point there will be a rule
>>> that will use the mark action for example.
>>> Is my understanding correct?
>>
>> Not really. The causal relationships are as follows. The application comes to
>> realise that it will need to use, say, action MARK in flows.
>> This, in turn, means that, in order to be able to actually see the mark in
>> received packets, the application needs to ensure that a) the NIC will be able
>> to deliver the mark to the PMD and b) that the PMD will be able to deliver
>> the mark to the application. In particular, in the case of Rx mark, (b) doesn't
>> need to be negotiated = field "mark" is anyway provisioned in the mbuf
>> structure, so no need to enable it. But (a) needs to be negotiated. Hence this
>> API.
>>
> Please see my above comment I think we both agree.

Agree to have the 4-th flag in the new API to cover this "custom / raw 
metdata" delivery? Personally, I tend to agree, but maybe Andrew can 
express his opinion, too.

> 
>>> I don't understand your last comment about host == PMD since at the
>>> end this value should be given to the application.
>>
>> Two different steps, Ori, two different steps. The first one is to deliver the
>> mark from the NIC to the PMD. And the second one is to deliver the mark
>> from the PMD to the application. As you might understand, mbufs get filled
>> out on the second step. That's it.
>>
>>>
>>>>>    From DPDK viewpoint both of them can't be shared on the same rule
>>>>> (they are using the same space in mbuf) so the application will never
>>>>> ask for both of them in the same rule but he can on some rules ask for
>>>>> mark while on other request for FLAG, even in your code you added
>> both
>>>> of them.
>>>>>
>>>>> So what should the PMD return if it can support both of them just not
>>>>> at the same rule?
>>>>
>>>> Please see above. This is not about rules. This is not about the way how
>> flag
>>>> and mark are presented *by* the PMD *to* the application in mbufs.
>>>> Simultaneous use of actions FLAG and MARK in flows must be ruled out
>> by
>>>> rte_flow_validate() / rte_flow_create(). The way how flag and mark are
>>>> *represented* in mbufs belongs in mbuf library responsibility domain.
>>>>
>>>> Consider the following operational sequence:
>>>>
>>>> 1) The NIC has a packet, which has metadata associated with it;
>>>> 2) The NIC transfers this packet to the host;
>>>> 3) The PMD sees the packet and its metadata;
>>>> 4) The PMD represents whatever available metadata in mbuf format.
>>>>
>>>> Features negotiated by virtue of this API (for instance, FLAG and MARK)
>>>> enable delivery of these kinds of metadata between points (2) and (3).
>>>>
>>>> And the problem of flag / mark co-existence in mbufs sits at point (4).
>>>>
>>>> -> Completely different problems, in fact.
>>>>
>>>
>>> Agree.
>>>
>>>>>
>>>>> One option is to document that the supported values are not per rule
>>>>> but for the entire port. For example in the example you gave MLX5 will
>>>>> support mark + flag but will not support mark + tunnel.
>>>>
>>>> Yes, for the port. Flow rules are a don't care to this API.
>>>>
>>>>>
>>>>> Also considering your example, the negotiation may result in subpar
>> result.
>>>>> taking your example the PMD returned  TUNNEL_ID maybe application
>>>>> would prefer to have the mark and not the TUNNEL_ID. I understand
>> that
>>>>> application can check and try again with just the MARK.
>>>>
>>>> Exactly. The Application can repeat negotiation with just MARK. Is there
>> any
>>>> problem with that?
>>>>
>>>
>>> I understand that the application can negotiate again and again.
>>> I just don't like that the PMD has logic and selects what he thinks will be
>> best.
>>>
>>> I wanted to suggest that the PMD will just tell what are the conflicts and
>> the application
>>> will negotiate again based on its logic.
>>
>> Well, I'm not saying that letting the PMD decide on the optimal feature
>> subset is the only reasonable MO. But it simplifies the negotiation
>> procedure a lot. Conveying conflicts and feature inter-dependencies to
>> the application might be rather complex and prone to errors.
>>
>> At this point I believe it's important to clarify: the original intent
>> is to assume that the PMD will first consider enabling all requested
>> features. Only in the case when it fails to do so should it come up with
>> the optimal subset.
>>
> 
> I understand my issue is the the later case and how can PMD know what is
> the optimal subset.
> 
>>>
>>>>> You are inserting logic to the PMD, maybe the function should just
>>>>> fail maybe returning the conflicting items?
>>>>
>>>> Why return conflicting items? The optimal subset (from the PMD's
>>>> perspective) should be returned. It's a silver lining. In the end, the
>> application
>>>> can learn which features can be enabled and in what combinations. And it
>>>> can rely on the outcome of the negotiation process.
>>>>
>>> That is my point this is PMD perspective, not the application.
>>> how can a PMD define an optimal subset? How can it know what is more
>>> important to the application?
>>
>> How does "ls" command know the optimal sort mode? Why does it prefer
>> sorting by name over sorting by date? Thanks to its options, it allows
>> the user to express their own preference. So does the API in question.
>> If the application knows that tunnel offload is more important to it
>> (compared to MARK, for instance), it can request just TUNNEL_ID. Why
>> complicate this?
>>
> I don't agree with your example, the "ls"  is clearly defined and each
> time you run it you get the same order. It doesn't change between versions.
> While in this case there will be change between versions.

Maybe not that good example, indeed. But the fact that it's clearly 
defined is true in this particular case. There are tons of programs 
which don't document their defaults clearly and never cease to surprise 
their users when new versions get released... It's so customary.

> Think about it this way lets assume that PMD doesn't support the TUNNEL_ID
> so the application request at startup both TUNNEL_ID and MARK.
> PMD returnes only MARK, application checks and see that the PMD
> didn't return the TUNNEL_ID so it negotiate again only to get that nothing
> is supported, then application try only the mark and to this the PMD agree.

So what's the problem? The key phrase here is that "application checks". 
Yes, it does check the output. And has the right to disagree, to 
re-negotiate.

> 
> Again this is not critical to me. But keep it in mind.

We never lost this from our view.

Frankly, we had internal discussions and of course we did realise that 
letting the PMD chose the optimal subset would raise concerns. But we 
also should keep in mind the fact that communicating conflicts might be 
difficult. I'll refrain from ranting about possible algorithms, though.

It's a trade off between avoiding PMDs push their vision of the optimal 
feature set and keeping the API simple and concise and thus user-friendly.

> 
>>> Also, the PMD logic is internal so if for some reason
>>> the PMD selected the best for the application by chance, so the application
>> learns
>>> that this is a good value for him. A release later the internal PMD logic
>> changes
>>> for example, a new feature was added, other customer requests.
>>> since this is PMD the original app is not aware of this change and may fail.
>>
>> The same argumentation can equally apply to default RSS table, for
>> example. What if an application gets accustomed to the round-robin table
>> being default in some specific PMD (and the PMD maintainers change
>> default RSS table out of a sudden)? Oops!
>>
> Yes but this is why the use has the option to select the mode,
> in case of RSS if the requested mode isn't supported the PMD fails not
> just select different algorithm right?

I don't refer to the MQ mode or hash algorithm. I refer to default 
fill-out of RETA. The application author may test its product once with 
some PMD and watch the RETA work in round-robin manner by default. They 
may then mistakenly assume that its guaranteed behaviour while it's not. 
Hence the existence of an API to let the application explicitly set RETA 
entries. And the applications are encouraged to use this API.

The same might apply to the API in question. Yes, it allows the PMD to 
suggest the optimal feature subset *if* it can't enable the full / 
originally requested set of features simultaneously. But nobody prevents 
the application from re-negotiating this. The application can narrow 
down the requested set of features or check them one-by one.

And *this* effectively enables the application to have its own logic and 
fully control it. It can do multiple invocations of the API and join the 
dots itself. Conflicts between some features can be very clear to the 
application this way.

> 
>> The truth is that the application shouldn't bind itself to some specific
>> vendor / PMD. In any case. Hence the negotiation process. It's just a
>> building block for some automation in the application.
>>
>>>
>>> We both agree that the application should check the result and renegotiate
>> if needed
>>> I only suggested that the PMD will only return error and not assume he
>> knows best.
>>
>> I believe we should give this more thought. Maybe Andrew can join this
>> conversation.
>>
> I fully agree lets sleep on it,
> This will not be a blocker.
> 
>>>
>>>
>>>>>
>>>>>
>>>>>
>>>>>> But some other PMDs (net/sfc, for instance) claim only a small fraction
>> of
>>>> bits
>>>>>> in Rx mark to deliver tunnel ID information. Remaining bits are still
>>>> available
>>>>>> for delivery of *user* mark ID. Please see an example at
>>>>>> https://patches.dpdk.org/project/dpdk/patch/20210929205730.775-2-
>>>>>> ivan.malov@oktetlabs.ru/
>>>>>> . In this case, the PMD may want to return both flags in the response:
>>>>>> "MARK | TUNNEL_ID". This way, the application knows that both
>> features
>>>>>> are enabled and available for use.
>>>>>>
>>>>>> Now. I anticipate more questions asking why wouldn't we prefer flow
>> API
>>>>>> terminology or why wouldn't we add an API for negotiating support for
>>>>>> metadata *actions* and not just metadata *delivery*. There's an
>> answer.
>>>>>> Always has been.
>>>>>>
>>>>>> The thing is, the use of *actions* is very complicated. For example, the
>>>> PMD
>>>>>> may support action MARK for "transfer" flows but not for non-
>> "transfer"
>>>>>> ones. Also, simultaneous use of multiple different metadata actions
>> may
>>>> not
>>>>>> be possible. And, last but not least, if we force the application to check
>>>>>> support for *actions* on action-after-action basis, the order of checks
>> will
>>>> be
>>>>>> very confusing to applications.
>>>>>>
>>>>>> Previously, in this thread, Thomas suggested to go for exactly this type
>> of
>>>>>> API, to check support for actions one-by-one, without any context
>>>>>> ("transfer" / non-"transfer"). I'm afraid, this won't be OK.
>>>>>>
>>>>> +1 to keeping it as a separated API. (I agree actions limitation are very
>>>> complex metrix)
>>>>>
>>>>>>>
>>>>>>> In any case I think this is good idea and I will see how we can add a
>>>>>>> more generic approach of this API to the new API that I'm going to
>>>> present.
>>>>>>>
>>>>>>>
>>>>>>>>> So no breakages with this API.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please see more comments inline.
> 
> 
> [Snip]
> 
>>> Yes, like I said above I don't see a difference between metadata
>>> and mark, at least not from the application usage.
>>> I assume you need this info at device start and by definition
>>> the registration should happen before. (mbuf should be configured
>>> before start)
>>
>> Please see my thoughts about dynamic fields above.
>>
>>>
>>>> We should make sure that we all reach an agreement.
>>>>
>>>
>>> +1 I think we can agree that there is a need for letting the PMD
>>> know before the start that some action will be used.
>>>
>>> And I'm sorry if I sound picky and hard, this is not my intention.
>>> I'm also doing my best to review this as fast as I can.
>>> My open issues and priorities:
>>> 1. the API breakage the possible solution adding support for the rest of the
>> PMDs / update doc
>>> to say that if the function is not supported the application should assume
>> that
>>> it can still use the mark / flag. -- blocker this must be resolved.
>>
>> I see.
>>
>>> 2. function name. my main issue is that metadata should be just like mark
>>> maybe the solution can be adding metadata flag to this function.
>>> the drawback is that the application needs to calls two functions to
>> configure
>>> metadata. -- high priority but if you can give me good reasoning not just
>>> we don't need to register the mark I guess I will be O.K.
>>
>> Please see my thoughts above. This API negotiates metadata delivery on
>> the path between the NIC and the PMD. The metadata mbuf register API
>> does this on the path between the PMD and the application. So no
>> contradiction here.
>>
> 
> See my comments above I think we have an agreed solution.
> 
>>> 3. If PMD has internal logic in case of conflict or not.
>>> Please think about it. -- low prio I will agree to what you decide.
>>> but if you decide that PMD will have internal logic then this must be
>> documented
>>> so the application will know not to rely on the results.
>>
>> Please see my reply above. The application can rely on the outcome of
>> the negotiation (the last negotiated subset of features), but it should
>> know that if it disagrees with the suggested feature subset, it can
>> re-negotiate. All fair and square.
>>
> 
> Like I said above think about it some more, I will also think in any
> case this will not be a blocker.
> 
> Best,
> Ori
>>>
>>> Best,
>>> Ori
>>>
>>>>>
>>>>> Best,
>>>>> Ori
>>>>>>>
>>>>>>>> Andrew.
>>>>>>> Best,
>>>>>>> Ori
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ivan M
>>>>
>>>> --
>>>> Ivan M
>>
>> --
>> Ivan M
  
Andrew Rybchenko Oct. 4, 2021, 1:53 p.m. UTC | #18
On 10/4/21 2:39 PM, Ivan Malov wrote:
> On 04/10/2021 09:56, Ori Kam wrote:
>>> On 04/10/2021 00:04, Ori Kam wrote:
>>>> I understand that you are only talking about enabling the action,
>>>> meaning to let the PMD know that at some point there will be a rule
>>>> that will use the mark action for example.
>>>> Is my understanding correct?
>>>
>>> Not really. The causal relationships are as follows. The application
>>> comes to
>>> realise that it will need to use, say, action MARK in flows.
>>> This, in turn, means that, in order to be able to actually see the
>>> mark in
>>> received packets, the application needs to ensure that a) the NIC
>>> will be able
>>> to deliver the mark to the PMD and b) that the PMD will be able to
>>> deliver
>>> the mark to the application. In particular, in the case of Rx mark,
>>> (b) doesn't
>>> need to be negotiated = field "mark" is anyway provisioned in the mbuf
>>> structure, so no need to enable it. But (a) needs to be negotiated.
>>> Hence this
>>> API.
>>>
>> Please see my above comment I think we both agree.
> 
> Agree to have the 4-th flag in the new API to cover this "custom / raw
> metdata" delivery? Personally, I tend to agree, but maybe Andrew can
> express his opinion, too.

Of course, it could be added, but we're not going to support it
in net/sfc. So, I think the flag should be added when a PMD
will going to support it (e.g. net/mlx5).
  
Ori Kam Oct. 5, 2021, 6:30 a.m. UTC | #19
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, October 4, 2021 4:53 PM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
>
> On 10/4/21 2:39 PM, Ivan Malov wrote:
> > On 04/10/2021 09:56, Ori Kam wrote:
> >>> On 04/10/2021 00:04, Ori Kam wrote:
> >>>> I understand that you are only talking about enabling the action,
> >>>> meaning to let the PMD know that at some point there will be a rule
> >>>> that will use the mark action for example.
> >>>> Is my understanding correct?
> >>>
> >>> Not really. The causal relationships are as follows. The application
> >>> comes to realise that it will need to use, say, action MARK in
> >>> flows.
> >>> This, in turn, means that, in order to be able to actually see the
> >>> mark in received packets, the application needs to ensure that a)
> >>> the NIC will be able to deliver the mark to the PMD and b) that the
> >>> PMD will be able to deliver the mark to the application. In
> >>> particular, in the case of Rx mark,
> >>> (b) doesn't
> >>> need to be negotiated = field "mark" is anyway provisioned in the
> >>> mbuf structure, so no need to enable it. But (a) needs to be negotiated.
> >>> Hence this
> >>> API.
> >>>
> >> Please see my above comment I think we both agree.
> >
> > Agree to have the 4-th flag in the new API to cover this "custom / raw
> > metdata" delivery? Personally, I tend to agree, but maybe Andrew can
> > express his opinion, too.
>
> Of course, it could be added, but we're not going to support it in net/sfc. So, I
> think the flag should be added when a PMD will going to support it (e.g.
> net/mlx5).

I think it should be added now, and more I think that this patch should add the missing function
to all PMDs 😊

Best,
Ori
  
Andrew Rybchenko Oct. 5, 2021, 7:27 a.m. UTC | #20
On 10/5/21 9:30 AM, Ori Kam wrote:
> Hi Andrew,
>
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, October 4, 2021 4:53 PM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> On 10/4/21 2:39 PM, Ivan Malov wrote:
>>> On 04/10/2021 09:56, Ori Kam wrote:
>>>>> On 04/10/2021 00:04, Ori Kam wrote:
>>>>>> I understand that you are only talking about enabling the action,
>>>>>> meaning to let the PMD know that at some point there will be a rule
>>>>>> that will use the mark action for example.
>>>>>> Is my understanding correct?
>>>>> Not really. The causal relationships are as follows. The application
>>>>> comes to realise that it will need to use, say, action MARK in
>>>>> flows.
>>>>> This, in turn, means that, in order to be able to actually see the
>>>>> mark in received packets, the application needs to ensure that a)
>>>>> the NIC will be able to deliver the mark to the PMD and b) that the
>>>>> PMD will be able to deliver the mark to the application. In
>>>>> particular, in the case of Rx mark,
>>>>> (b) doesn't
>>>>> need to be negotiated = field "mark" is anyway provisioned in the
>>>>> mbuf structure, so no need to enable it. But (a) needs to be negotiated.
>>>>> Hence this
>>>>> API.
>>>>>
>>>> Please see my above comment I think we both agree.
>>> Agree to have the 4-th flag in the new API to cover this "custom / raw
>>> metdata" delivery? Personally, I tend to agree, but maybe Andrew can
>>> express his opinion, too.
>> Of course, it could be added, but we're not going to support it in net/sfc. So, I
>> think the flag should be added when a PMD will going to support it (e.g.
>> net/mlx5).
> I think it should be added now, and more I think that this patch should add the missing function
> to all PMDs 😊

Sorry, but I disagree. Could you point out to DPDK documentation
where it is written? Should all new API be supported in all PMDs
by the API contributor?

Andrew.
  
Ori Kam Oct. 5, 2021, 8:17 a.m. UTC | #21
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, October 5, 2021 10:27 AM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> On 10/5/21 9:30 AM, Ori Kam wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Monday, October 4, 2021 4:53 PM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >> Rx meta data
> >>
> >> On 10/4/21 2:39 PM, Ivan Malov wrote:
> >>> On 04/10/2021 09:56, Ori Kam wrote:
> >>>>> On 04/10/2021 00:04, Ori Kam wrote:
> >>>>>> I understand that you are only talking about enabling the action,
> >>>>>> meaning to let the PMD know that at some point there will be a
> >>>>>> rule that will use the mark action for example.
> >>>>>> Is my understanding correct?
> >>>>> Not really. The causal relationships are as follows. The
> >>>>> application comes to realise that it will need to use, say, action
> >>>>> MARK in flows.
> >>>>> This, in turn, means that, in order to be able to actually see the
> >>>>> mark in received packets, the application needs to ensure that a)
> >>>>> the NIC will be able to deliver the mark to the PMD and b) that
> >>>>> the PMD will be able to deliver the mark to the application. In
> >>>>> particular, in the case of Rx mark,
> >>>>> (b) doesn't
> >>>>> need to be negotiated = field "mark" is anyway provisioned in the
> >>>>> mbuf structure, so no need to enable it. But (a) needs to be negotiated.
> >>>>> Hence this
> >>>>> API.
> >>>>>
> >>>> Please see my above comment I think we both agree.
> >>> Agree to have the 4-th flag in the new API to cover this "custom /
> >>> raw metdata" delivery? Personally, I tend to agree, but maybe Andrew
> >>> can express his opinion, too.
> >> Of course, it could be added, but we're not going to support it in
> >> net/sfc. So, I think the flag should be added when a PMD will going to
> support it (e.g.
> >> net/mlx5).
> > I think it should be added now, and more I think that this patch
> > should add the missing function to all PMDs 😊
> 
> Sorry, but I disagree. Could you point out to DPDK documentation where it is
> written? Should all new API be supported in all PMDs by the API contributor?
> 
This changes existing PMD beavior, until now there was no need to register the MARK
now you require it, it is just like change the shared counter you needed to fix different drivers.
This is not critical to me like I said in other thread as long is it is clear that if PMD doesn't support
the new function it doesn't mean the the PMD has issue with the request.

One more thing, I think this flag should be added now since you need it,
I think you should report that you don't support it.
since just like we talked there is no real difference between metadata and MARK.
What do you think?

Best,
Ori
> Andrew.
  
Andrew Rybchenko Oct. 5, 2021, 8:38 a.m. UTC | #22
Hi Ori,

On 10/5/21 11:17 AM, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, October 5, 2021 10:27 AM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> On 10/5/21 9:30 AM, Ori Kam wrote:
>>> Hi Andrew,
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Monday, October 4, 2021 4:53 PM
>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
>>>> Rx meta data
>>>>
>>>> On 10/4/21 2:39 PM, Ivan Malov wrote:
>>>>> On 04/10/2021 09:56, Ori Kam wrote:
>>>>>>> On 04/10/2021 00:04, Ori Kam wrote:
>>>>>>>> I understand that you are only talking about enabling the action,
>>>>>>>> meaning to let the PMD know that at some point there will be a
>>>>>>>> rule that will use the mark action for example.
>>>>>>>> Is my understanding correct?
>>>>>>> Not really. The causal relationships are as follows. The
>>>>>>> application comes to realise that it will need to use, say, action
>>>>>>> MARK in flows.
>>>>>>> This, in turn, means that, in order to be able to actually see the
>>>>>>> mark in received packets, the application needs to ensure that a)
>>>>>>> the NIC will be able to deliver the mark to the PMD and b) that
>>>>>>> the PMD will be able to deliver the mark to the application. In
>>>>>>> particular, in the case of Rx mark,
>>>>>>> (b) doesn't
>>>>>>> need to be negotiated = field "mark" is anyway provisioned in the
>>>>>>> mbuf structure, so no need to enable it. But (a) needs to be negotiated.
>>>>>>> Hence this
>>>>>>> API.
>>>>>>>
>>>>>> Please see my above comment I think we both agree.
>>>>> Agree to have the 4-th flag in the new API to cover this "custom /
>>>>> raw metdata" delivery? Personally, I tend to agree, but maybe Andrew
>>>>> can express his opinion, too.
>>>> Of course, it could be added, but we're not going to support it in
>>>> net/sfc. So, I think the flag should be added when a PMD will going to
>> support it (e.g.
>>>> net/mlx5).
>>> I think it should be added now, and more I think that this patch
>>> should add the missing function to all PMDs 😊
>>
>> Sorry, but I disagree. Could you point out to DPDK documentation where it is
>> written? Should all new API be supported in all PMDs by the API contributor?
>>
> This changes existing PMD beavior, until now there was no need to register the MARK
> now you require it, it is just like change the shared counter you needed to fix different drivers.
> This is not critical to me like I said in other thread as long is it is clear that if PMD doesn't support
> the new function it doesn't mean the the PMD has issue with the request.

I see your point. Hopefully the function description in v4 is
clear that it is not the case. If callback is not supported by
a driver, application should try to use all required metadata.
So, there is no breakage in accordance with defined API
contract.

Many thanks for your review notes. The review really
makes the API clearer and better documented.

> One more thing, I think this flag should be added now since you need it,
> I think you should report that you don't support it.
> since just like we talked there is no real difference between metadata and MARK.
> What do you think?

It sounds like a trick :) Negative support is *not* a support
in fact. DPDK policy requires support of a feature in a PMD
and in-tree application. Of course, it is not a problem to
add meta. It is really easy to do. I just don't want to add
it in v5 to be deleted in v6 because of my above concerns.

@Thomas, what do you think?

Andrew.
  
Ori Kam Oct. 5, 2021, 9:41 a.m. UTC | #23
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, October 5, 2021 11:39 AM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> Hi Ori,
> 
> On 10/5/21 11:17 AM, Ori Kam wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Tuesday, October 5, 2021 10:27 AM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >> Rx meta data
> >>
> >> On 10/5/21 9:30 AM, Ori Kam wrote:
> >>> Hi Andrew,
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Monday, October 4, 2021 4:53 PM
> >>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
> >>>> of Rx meta data
> >>>>
> >>>> On 10/4/21 2:39 PM, Ivan Malov wrote:
> >>>>> On 04/10/2021 09:56, Ori Kam wrote:
> >>>>>>> On 04/10/2021 00:04, Ori Kam wrote:
> >>>>>>>> I understand that you are only talking about enabling the
> >>>>>>>> action, meaning to let the PMD know that at some point there
> >>>>>>>> will be a rule that will use the mark action for example.
> >>>>>>>> Is my understanding correct?
> >>>>>>> Not really. The causal relationships are as follows. The
> >>>>>>> application comes to realise that it will need to use, say,
> >>>>>>> action MARK in flows.
> >>>>>>> This, in turn, means that, in order to be able to actually see
> >>>>>>> the mark in received packets, the application needs to ensure
> >>>>>>> that a) the NIC will be able to deliver the mark to the PMD and
> >>>>>>> b) that the PMD will be able to deliver the mark to the
> >>>>>>> application. In particular, in the case of Rx mark,
> >>>>>>> (b) doesn't
> >>>>>>> need to be negotiated = field "mark" is anyway provisioned in
> >>>>>>> the mbuf structure, so no need to enable it. But (a) needs to be
> negotiated.
> >>>>>>> Hence this
> >>>>>>> API.
> >>>>>>>
> >>>>>> Please see my above comment I think we both agree.
> >>>>> Agree to have the 4-th flag in the new API to cover this "custom /
> >>>>> raw metdata" delivery? Personally, I tend to agree, but maybe
> >>>>> Andrew can express his opinion, too.
> >>>> Of course, it could be added, but we're not going to support it in
> >>>> net/sfc. So, I think the flag should be added when a PMD will going
> >>>> to
> >> support it (e.g.
> >>>> net/mlx5).
> >>> I think it should be added now, and more I think that this patch
> >>> should add the missing function to all PMDs 😊
> >>
> >> Sorry, but I disagree. Could you point out to DPDK documentation
> >> where it is written? Should all new API be supported in all PMDs by the API
> contributor?
> >>
> > This changes existing PMD beavior, until now there was no need to
> > register the MARK now you require it, it is just like change the shared counter
> you needed to fix different drivers.
> > This is not critical to me like I said in other thread as long is it
> > is clear that if PMD doesn't support the new function it doesn't mean the the
> PMD has issue with the request.
> 
> I see your point. Hopefully the function description in v4 is clear that it is not
> the case. If callback is not supported by a driver, application should try to use
> all required metadata.
> So, there is no breakage in accordance with defined API contract.
> 

Agree les pretty but works.

> Many thanks for your review notes. The review really makes the API clearer
> and better documented.
> 

Trying to do my best.

> > One more thing, I think this flag should be added now since you need
> > it, I think you should report that you don't support it.
> > since just like we talked there is no real difference between metadata and
> MARK.
> > What do you think?
> 
> It sounds like a trick :) Negative support is *not* a support in fact. DPDK policy
> requires support of a feature in a PMD and in-tree application. Of course, it is
> not a problem to add meta. It is really easy to do. I just don't want to add it in
> v5 to be deleted in v6 because of my above concerns.
> 
This was not a trick. I understand what you are saying.
if we say that metadata is the same as mark, (I think we all agree on it) and that
application need to notify pmd about such operations, I assume it will try to see how to
request the metadata.

I'm O.K. with adding it later and in any case I promise you that if you add it
it will stay.

> @Thomas, what do you think?
> 
> Andrew.

Ori
  
Andrew Rybchenko Oct. 5, 2021, 10:01 a.m. UTC | #24
Hi Ori,

On 10/5/21 12:41 PM, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, October 5, 2021 11:39 AM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> Hi Ori,
>>
>> On 10/5/21 11:17 AM, Ori Kam wrote:
>>
>>> One more thing, I think this flag should be added now since you need
>>> it, I think you should report that you don't support it.
>>> since just like we talked there is no real difference between metadata and MARK.
>>> What do you think?
>>
>> It sounds like a trick :) Negative support is *not* a support in fact. DPDK policy
>> requires support of a feature in a PMD and in-tree application. Of course, it is
>> not a problem to add meta. It is really easy to do. I just don't want to add it in
>> v5 to be deleted in v6 because of my above concerns.
>>
> This was not a trick. I understand what you are saying.
> if we say that metadata is the same as mark, (I think we all agree on it) and that
> application need to notify pmd about such operations, I assume it will try to see how to
> request the metadata.

Frankly speaking I feel sick when I think about META and MARK
together. Do we really need both in DPDK?

> I'm O.K. with adding it later and in any case I promise you that if you add it
> it will stay.

Many thanks, I see.

>> @Thomas, what do you think?
>>
>> Andrew.
> 
> Ori
> 

Andrew.
  
Ori Kam Oct. 5, 2021, 10:10 a.m. UTC | #25
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, October 5, 2021 1:02 PM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> Hi Ori,
> 
> On 10/5/21 12:41 PM, Ori Kam wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Tuesday, October 5, 2021 11:39 AM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >> Rx meta data
> >>
> >> Hi Ori,
> >>
> >> On 10/5/21 11:17 AM, Ori Kam wrote:
> >>
> >>> One more thing, I think this flag should be added now since you need
> >>> it, I think you should report that you don't support it.
> >>> since just like we talked there is no real difference between metadata and
> MARK.
> >>> What do you think?
> >>
> >> It sounds like a trick :) Negative support is *not* a support in
> >> fact. DPDK policy requires support of a feature in a PMD and in-tree
> >> application. Of course, it is not a problem to add meta. It is really
> >> easy to do. I just don't want to add it in
> >> v5 to be deleted in v6 because of my above concerns.
> >>
> > This was not a trick. I understand what you are saying.
> > if we say that metadata is the same as mark, (I think we all agree on
> > it) and that application need to notify pmd about such operations, I
> > assume it will try to see how to request the metadata.
> 
> Frankly speaking I feel sick when I think about META and MARK together. Do
> we really need both in DPDK?
> 
I realy don't want you the be sick,
The resoun that we need both of them is that 32 in Nvidia it is only 24 bits of mark is not
enough, so there is a need for more bits.
I think that in the end we will go to something much more generic that the application
will just say how many bits it wants to get and this what he will get.
for example the application may say it needs 128 bits and it will register this size to the mbuf
or give in the mbuf pointer two where those values should be set.
In any case as you can see we have already to many changes in rte_flow in this release and the
next one, but I'm planning to push this feature in the future
what do you think of such a feature?

Ori
> > I'm O.K. with adding it later and in any case I promise you that if
> > you add it it will stay.
> 
> Many thanks, I see.
> 
> >> @Thomas, what do you think?
> >>
> >> Andrew.
> >
> > Ori
> >
> 
> Andrew.
  
Andrew Rybchenko Oct. 5, 2021, 11:11 a.m. UTC | #26
Hi Ori,

On 10/5/21 1:10 PM, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, October 5, 2021 1:02 PM
>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
>> data
>>
>> Hi Ori,
>>
>> On 10/5/21 12:41 PM, Ori Kam wrote:
>>> Hi Andrew,
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Tuesday, October 5, 2021 11:39 AM
>>>> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
>>>> Rx meta data
>>>>
>>>> Hi Ori,
>>>>
>>>> On 10/5/21 11:17 AM, Ori Kam wrote:
>>>>
>>>>> One more thing, I think this flag should be added now since you need
>>>>> it, I think you should report that you don't support it.
>>>>> since just like we talked there is no real difference between metadata and
>> MARK.
>>>>> What do you think?
>>>>
>>>> It sounds like a trick :) Negative support is *not* a support in
>>>> fact. DPDK policy requires support of a feature in a PMD and in-tree
>>>> application. Of course, it is not a problem to add meta. It is really
>>>> easy to do. I just don't want to add it in
>>>> v5 to be deleted in v6 because of my above concerns.
>>>>
>>> This was not a trick. I understand what you are saying.
>>> if we say that metadata is the same as mark, (I think we all agree on
>>> it) and that application need to notify pmd about such operations, I
>>> assume it will try to see how to request the metadata.
>>
>> Frankly speaking I feel sick when I think about META and MARK together. Do
>> we really need both in DPDK?
>>
> I realy don't want you the be sick,
> The resoun that we need both of them is that 32 in Nvidia it is only 24 bits of mark is not
> enough, so there is a need for more bits.
> I think that in the end we will go to something much more generic that the application
> will just say how many bits it wants to get and this what he will get.
> for example the application may say it needs 128 bits and it will register this size to the mbuf
> or give in the mbuf pointer two where those values should be set.
> In any case as you can see we have already to many changes in rte_flow in this release and the
> next one, but I'm planning to push this feature in the future
> what do you think of such a feature?

I agree that there are really many changes in flow API which
are on review in the release cycle.
I hope the above idea will allow to merge MARK and META.

Could you take a look at v4 sent by Ivan yesterday and
summarize current status of the review.
Which points are still unclear and must be improved?
What is desirable to improve from your point of view?

Andrew.
  
Thomas Monjalon Oct. 6, 2021, 8:30 a.m. UTC | #27
05/10/2021 13:11, Andrew Rybchenko:
> On 10/5/21 1:10 PM, Ori Kam wrote:
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> On 10/5/21 12:41 PM, Ori Kam wrote:
> >>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> On 10/5/21 11:17 AM, Ori Kam wrote:
> >>>>> One more thing, I think this flag should be added now since you need
> >>>>> it, I think you should report that you don't support it.
> >>>>> since just like we talked there is no real difference between metadata and
> >> MARK.
> >>>>> What do you think?
> >>>>
> >>>> It sounds like a trick :) Negative support is *not* a support in
> >>>> fact. DPDK policy requires support of a feature in a PMD and in-tree
> >>>> application. Of course, it is not a problem to add meta. It is really
> >>>> easy to do. I just don't want to add it in
> >>>> v5 to be deleted in v6 because of my above concerns.
> >>>>
> >>> This was not a trick. I understand what you are saying.
> >>> if we say that metadata is the same as mark, (I think we all agree on
> >>> it) and that application need to notify pmd about such operations, I
> >>> assume it will try to see how to request the metadata.
> >>
> >> Frankly speaking I feel sick when I think about META and MARK together. Do
> >> we really need both in DPDK?
> >>
> > I realy don't want you the be sick,
> > The resoun that we need both of them is that 32 in Nvidia it is only 24 bits of mark is not
> > enough, so there is a need for more bits.
> > I think that in the end we will go to something much more generic that the application
> > will just say how many bits it wants to get and this what he will get.
> > for example the application may say it needs 128 bits and it will register this size to the mbuf
> > or give in the mbuf pointer two where those values should be set.
> > In any case as you can see we have already to many changes in rte_flow in this release and the
> > next one, but I'm planning to push this feature in the future
> > what do you think of such a feature?
> 
> I agree that there are really many changes in flow API which
> are on review in the release cycle.
> I hope the above idea will allow to merge MARK and META.

I agree we should merge mark and meta in a common dynamic mbuf field.
What do we need in mark which is not in meta?
I think dynamic mbuf field of meta is the way to go but I prefer the name "mark" :)
  
Andrew Rybchenko Oct. 6, 2021, 8:38 a.m. UTC | #28
On 10/6/21 11:30 AM, Thomas Monjalon wrote:
> 05/10/2021 13:11, Andrew Rybchenko:
>> On 10/5/21 1:10 PM, Ori Kam wrote:
>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> On 10/5/21 12:41 PM, Ori Kam wrote:
>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> On 10/5/21 11:17 AM, Ori Kam wrote:
>>>>>>> One more thing, I think this flag should be added now since you need
>>>>>>> it, I think you should report that you don't support it.
>>>>>>> since just like we talked there is no real difference between metadata and
>>>> MARK.
>>>>>>> What do you think?
>>>>>>
>>>>>> It sounds like a trick :) Negative support is *not* a support in
>>>>>> fact. DPDK policy requires support of a feature in a PMD and in-tree
>>>>>> application. Of course, it is not a problem to add meta. It is really
>>>>>> easy to do. I just don't want to add it in
>>>>>> v5 to be deleted in v6 because of my above concerns.
>>>>>>
>>>>> This was not a trick. I understand what you are saying.
>>>>> if we say that metadata is the same as mark, (I think we all agree on
>>>>> it) and that application need to notify pmd about such operations, I
>>>>> assume it will try to see how to request the metadata.
>>>>
>>>> Frankly speaking I feel sick when I think about META and MARK together. Do
>>>> we really need both in DPDK?
>>>>
>>> I realy don't want you the be sick,
>>> The resoun that we need both of them is that 32 in Nvidia it is only 24 bits of mark is not
>>> enough, so there is a need for more bits.
>>> I think that in the end we will go to something much more generic that the application
>>> will just say how many bits it wants to get and this what he will get.
>>> for example the application may say it needs 128 bits and it will register this size to the mbuf
>>> or give in the mbuf pointer two where those values should be set.
>>> In any case as you can see we have already to many changes in rte_flow in this release and the
>>> next one, but I'm planning to push this feature in the future
>>> what do you think of such a feature?
>>
>> I agree that there are really many changes in flow API which
>> are on review in the release cycle.
>> I hope the above idea will allow to merge MARK and META.
> 
> I agree we should merge mark and meta in a common dynamic mbuf field.
> What do we need in mark which is not in meta?
> I think dynamic mbuf field of meta is the way to go but I prefer the name "mark" :)
> 

+1 but I don't have answer to the question
  
Ori Kam Oct. 6, 2021, 9:14 a.m. UTC | #29
Hi

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Wednesday, October 6, 2021 11:38 AM
> data
> 
> On 10/6/21 11:30 AM, Thomas Monjalon wrote:
> > 05/10/2021 13:11, Andrew Rybchenko:
> >> On 10/5/21 1:10 PM, Ori Kam wrote:
> >>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> On 10/5/21 12:41 PM, Ori Kam wrote:
> >>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> On 10/5/21 11:17 AM, Ori Kam wrote:
> >>>>>>> One more thing, I think this flag should be added now since you
> >>>>>>> need it, I think you should report that you don't support it.
> >>>>>>> since just like we talked there is no real difference between
> >>>>>>> metadata and
> >>>> MARK.
> >>>>>>> What do you think?
> >>>>>>
> >>>>>> It sounds like a trick :) Negative support is *not* a support in
> >>>>>> fact. DPDK policy requires support of a feature in a PMD and
> >>>>>> in-tree application. Of course, it is not a problem to add meta.
> >>>>>> It is really easy to do. I just don't want to add it in
> >>>>>> v5 to be deleted in v6 because of my above concerns.
> >>>>>>
> >>>>> This was not a trick. I understand what you are saying.
> >>>>> if we say that metadata is the same as mark, (I think we all agree
> >>>>> on
> >>>>> it) and that application need to notify pmd about such operations,
> >>>>> I assume it will try to see how to request the metadata.
> >>>>
> >>>> Frankly speaking I feel sick when I think about META and MARK
> >>>> together. Do we really need both in DPDK?
> >>>>
> >>> I realy don't want you the be sick,
> >>> The resoun that we need both of them is that 32 in Nvidia it is only
> >>> 24 bits of mark is not enough, so there is a need for more bits.
> >>> I think that in the end we will go to something much more generic
> >>> that the application will just say how many bits it wants to get and this
> what he will get.
> >>> for example the application may say it needs 128 bits and it will
> >>> register this size to the mbuf or give in the mbuf pointer two where those
> values should be set.
> >>> In any case as you can see we have already to many changes in
> >>> rte_flow in this release and the next one, but I'm planning to push
> >>> this feature in the future what do you think of such a feature?
> >>
> >> I agree that there are really many changes in flow API which are on
> >> review in the release cycle.
> >> I hope the above idea will allow to merge MARK and META.
> >
> > I agree we should merge mark and meta in a common dynamic mbuf field.
> > What do we need in mark which is not in meta?
> > I think dynamic mbuf field of meta is the way to go but I prefer the
> > name "mark" :)
> >
> 
> +1 but I don't have answer to the question

We have MARK, FLAG, and META.
MARK and FLAG are the same just one of them give predefined value.
we should merged those two for sure.
META allows the application to get more bits from the HW.
Like I said above I think we should merge everything.
but this is a talk for a different thread, and different time.

Ori
  

Patch

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index 9be8edc31d..48eafffb1d 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -1760,6 +1760,27 @@  init_port(void)
 		rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
 
 	for (port_id = 0; port_id < nr_ports; port_id++) {
+		uint64_t rx_meta_features = 0;
+
+		rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
+		rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
+
+		ret = rte_eth_rx_meta_negotiate(port_id, &rx_meta_features);
+		if (ret == 0) {
+			if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
+				printf(":: flow action FLAG will not affect Rx mbufs on port=%u\n",
+				       port_id);
+			}
+
+			if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
+				printf(":: flow action MARK will not affect Rx mbufs on port=%u\n",
+				       port_id);
+			}
+		} else if (ret != -ENOTSUP) {
+			rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta features on port=%u: %s\n",
+				 port_id, rte_strerror(-ret));
+		}
+
 		ret = rte_eth_dev_info_get(port_id, &dev_info);
 		if (ret != 0)
 			rte_exit(EXIT_FAILURE,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 97ae52e17e..7a8da3d7ab 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1485,10 +1485,36 @@  static void
 init_config_port_offloads(portid_t pid, uint32_t socket_id)
 {
 	struct rte_port *port = &ports[pid];
+	uint64_t rx_meta_features = 0;
 	uint16_t data_size;
 	int ret;
 	int i;
 
+	rx_meta_features |= RTE_ETH_RX_META_USER_FLAG;
+	rx_meta_features |= RTE_ETH_RX_META_USER_MARK;
+	rx_meta_features |= RTE_ETH_RX_META_TUNNEL_ID;
+
+	ret = rte_eth_rx_meta_negotiate(pid, &rx_meta_features);
+	if (ret == 0) {
+		if (!(rx_meta_features & RTE_ETH_RX_META_USER_FLAG)) {
+			TESTPMD_LOG(INFO, "Flow action FLAG will not affect Rx mbufs on port %u\n",
+				    pid);
+		}
+
+		if (!(rx_meta_features & RTE_ETH_RX_META_USER_MARK)) {
+			TESTPMD_LOG(INFO, "Flow action MARK will not affect Rx mbufs on port %u\n",
+				    pid);
+		}
+
+		if (!(rx_meta_features & RTE_ETH_RX_META_TUNNEL_ID)) {
+			TESTPMD_LOG(INFO, "Flow tunnel offload support might be limited or unavailable on port %u\n",
+				    pid);
+		}
+	} else if (ret != -ENOTSUP) {
+		rte_exit(EXIT_FAILURE, "Error when negotiating Rx meta features on port %u: %s\n",
+			 pid, rte_strerror(-ret));
+	}
+
 	port->dev_conf.txmode = tx_mode;
 	port->dev_conf.rxmode = rx_mode;
 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 19356ac53c..6674d4474c 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -106,6 +106,15 @@  New Features
   Added command-line options to specify total number of processes and
   current process ID. Each process owns subset of Rx and Tx queues.
 
+* **Added an API to negotiate delivery of specific parts of Rx meta data**
+
+  A new API, ``rte_eth_rx_meta_negotiate()``, was added.
+  The following parts of Rx meta data were defined:
+
+  * ``RTE_ETH_RX_META_USER_FLAG``
+  * ``RTE_ETH_RX_META_USER_MARK``
+  * ``RTE_ETH_RX_META_TUNNEL_ID``
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 40e474aa7e..96e0c60cae 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -789,6 +789,22 @@  typedef int (*eth_get_monitor_addr_t)(void *rxq,
 typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
 	struct rte_eth_representor_info *info);
 
+/**
+ * @internal
+ * Negotiate delivery of specific parts of Rx meta data.
+ *
+ * @param dev
+ *   Port (ethdev) handle
+ *
+ * @param[inout] features
+ *   Feature selection buffer
+ *
+ * @return
+ *   Negative errno value on error, zero otherwise
+ */
+typedef int (*eth_rx_meta_negotiate_t)(struct rte_eth_dev *dev,
+				       uint64_t *features);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -949,6 +965,9 @@  struct eth_dev_ops {
 
 	eth_representor_info_get_t representor_info_get;
 	/**< Get representor info. */
+
+	eth_rx_meta_negotiate_t rx_meta_negotiate;
+	/**< Negotiate delivery of specific parts of Rx meta data. */
 };
 
 /**
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index daf5ca9242..49cb84d64c 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6310,6 +6310,31 @@  rte_eth_representor_info_get(uint16_t port_id,
 	return eth_err(port_id, (*dev->dev_ops->representor_info_get)(dev, info));
 }
 
+int
+rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (dev->data->dev_configured != 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"The port (id=%"PRIu16") is already configured\n",
+			port_id);
+		return -EBUSY;
+	}
+
+	if (features == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid features (NULL)\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_meta_negotiate, -ENOTSUP);
+	return eth_err(port_id,
+		       (*dev->dev_ops->rx_meta_negotiate)(dev, features));
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 1da37896d8..8467a7a362 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4888,6 +4888,51 @@  __rte_experimental
 int rte_eth_representor_info_get(uint16_t port_id,
 				 struct rte_eth_representor_info *info);
 
+/** The ethdev sees flagged packets if there are flows with action FLAG. */
+#define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
+
+/** The ethdev sees mark IDs in packets if there are flows with action MARK. */
+#define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
+
+/** The ethdev detects missed packets if there are "tunnel_set" flows in use. */
+#define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Negotiate delivery of specific parts of Rx meta data.
+ *
+ * Invoke this API before the first rte_eth_dev_configure() invocation
+ * to let the PMD make preparations that are inconvenient to do later.
+ *
+ * The negotiation process is as follows:
+ *
+ * - the application requests features intending to use at least some of them;
+ * - the PMD responds with the guaranteed subset of the requested feature set;
+ * - the application can retry negotiation with another set of features;
+ * - the application can pass zero to clear the negotiation result;
+ * - the last negotiated result takes effect upon the ethdev start.
+ *
+ * If this API is unsupported, the application should gracefully ignore that.
+ *
+ * @param port_id
+ *   Port (ethdev) identifier
+ *
+ * @param[inout] features
+ *   Feature selection buffer
+ *
+ * @return
+ *   - (-EBUSY) if the port can't handle this in its current state;
+ *   - (-ENOTSUP) if the method itself is not supported by the PMD;
+ *   - (-ENODEV) if *port_id* is invalid;
+ *   - (-EINVAL) if *features* is NULL;
+ *   - (-EIO) if the device is removed;
+ *   - (0) on success
+ */
+__rte_experimental
+int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features);
+
 #include <rte_ethdev_core.h>
 
 /**
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 70f455d47d..6eb7ec0574 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -1904,6 +1904,10 @@  enum rte_flow_action_type {
 	 * PKT_RX_FDIR_ID mbuf flags.
 	 *
 	 * See struct rte_flow_action_mark.
+	 *
+	 * One should negotiate delivery of mark IDs beforehand.
+	 * @see rte_eth_rx_meta_negotiate()
+	 * @see RTE_ETH_RX_META_USER_MARK
 	 */
 	RTE_FLOW_ACTION_TYPE_MARK,
 
@@ -1912,6 +1916,10 @@  enum rte_flow_action_type {
 	 * sets the PKT_RX_FDIR mbuf flag.
 	 *
 	 * No associated configuration structure.
+	 *
+	 * One should negotiate flag delivery beforehand.
+	 * @see rte_eth_rx_meta_negotiate()
+	 * @see RTE_ETH_RX_META_USER_FLAG
 	 */
 	RTE_FLOW_ACTION_TYPE_FLAG,
 
@@ -4223,6 +4231,10 @@  rte_flow_tunnel_match(uint16_t port_id,
 /**
  * Populate the current packet processing state, if exists, for the given mbuf.
  *
+ * One should negotiate the processing state information delivery beforehand.
+ * @see rte_eth_rx_meta_negotiate()
+ * @see RTE_ETH_RX_META_TUNNEL_ID
+ *
  * @param port_id
  *   Port identifier of Ethernet device.
  * @param[in] m
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 904bce6ea1..a8928266a9 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -247,6 +247,9 @@  EXPERIMENTAL {
 	rte_mtr_meter_policy_delete;
 	rte_mtr_meter_policy_update;
 	rte_mtr_meter_policy_validate;
+
+	# added in 21.11
+	rte_eth_rx_meta_negotiate;
 };
 
 INTERNAL {