ethdev: support queue-based priority flow control
Commit Message
From: Jerin Jacob <jerinj@marvell.com>
Based on device support and use-case need, there are two different ways
to enable PFC. The first case is the port level PFC configuration. In
this case, the rte_eth_dev_priority_flow_ctrl_set() API shall be used to
configure the PFC, and PFC frames will be generated based on the VLAN
TC value.
The second case is the queue level PFC configuration. In this
case, any packet field content can be used to steer the packet to the
specific queue using rte_flow or RSS, and then
rte_eth_dev_priority_flow_ctrl_queue_set() is used to set the TC mapping
on each queue. Based on the congestion selected on the specific queue,
the configured TC shall be used to generate PFC frames.
These modes are mutually exclusive: when the driver sets a
non-zero value for rte_eth_dev_info::pfc_queue_tc_max, the
application must use queue level PFC configuration via the
rte_eth_dev_priority_flow_ctrl_queue_set() API instead of port level
PFC configuration via the rte_eth_dev_priority_flow_ctrl_set() API.
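As a rough illustration of the rule above, the following self-contained sketch models how an application might pick the configuration path from the advertised value. The struct and helper names (dev_info_model, pfc_select_model, pfc_queue_tc_valid) are hypothetical stand-ins and not part of ethdev; only the pfc_queue_tc_max semantics are taken from this patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the single rte_eth_dev_info field used here. */
struct dev_info_model {
	uint8_t pfc_queue_tc_max; /* 0: port level PFC, non-zero: queue level PFC */
};

enum pfc_model {
	PFC_PORT_LEVEL,  /* would use rte_eth_dev_priority_flow_ctrl_set() */
	PFC_QUEUE_LEVEL, /* would use rte_eth_dev_priority_flow_ctrl_queue_set() */
};

/* The two models are mutually exclusive, so one field decides the path. */
static enum pfc_model
pfc_select_model(const struct dev_info_model *info)
{
	return info->pfc_queue_tc_max != 0 ? PFC_QUEUE_LEVEL : PFC_PORT_LEVEL;
}

/* In the queue level model, a TC must be below the advertised maximum. */
static int
pfc_queue_tc_valid(const struct dev_info_model *info, uint8_t tc)
{
	return tc < info->pfc_queue_tc_max;
}
```

With a driver reporting pfc_queue_tc_max = 8, the sketch selects the queue level path and accepts TC values 0 to 7.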
This patch enables the configuration for the second case, a.k.a. queue
based PFC, and also updates the rte_eth_dev_priority_flow_ctrl_set()
implementation to adhere to rte_eth_dev_info::pfc_queue_tc_max
handling.
Also updated libabigail.abignore to ignore the update
to reserved fields in rte_eth_dev_info.
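The reserved-field change can be sanity-checked with a small layout model. This is only a sketch: tail_before and tail_after are hypothetical structs that mirror just the last members of rte_eth_dev_info before and after this patch, not the real structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the structure tail before the patch. */
struct tail_before {
	uint64_t reserved_64s[2];
	void *reserved_ptrs[2];
};

/* Simplified model of the structure tail after the patch. */
struct tail_after {
	uint8_t pfc_queue_tc_max;
	uint8_t reserved_8s[7];
	uint64_t reserved_64s[1];
	void *reserved_ptrs[2];
};

/* The new field consumes one reserved 64-bit slot, so nothing grows. */
_Static_assert(sizeof(struct tail_before) == sizeof(struct tail_after),
	       "tail size must not change");
_Static_assert(offsetof(struct tail_before, reserved_ptrs) ==
	       offsetof(struct tail_after, reserved_ptrs),
	       "members after the reserved area must not move");

/* Runtime view of the same two properties, returns 1 when both hold. */
static int
tail_layout_unchanged(void)
{
	return sizeof(struct tail_before) == sizeof(struct tail_after) &&
	       offsetof(struct tail_before, reserved_ptrs) ==
	       offsetof(struct tail_after, reserved_ptrs);
}
```

In this model the overall size and the position of reserved_ptrs stay the same, while the member named reserved_64s shifts by 8 bytes and shrinks, which is the kind of difference an ABI checker reports unless suppressed.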
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
devtools/libabigail.abignore | 6 ++
doc/guides/nics/features.rst | 5 +-
doc/guides/rel_notes/release_22_03.rst | 4 ++
lib/ethdev/ethdev_driver.h | 6 +-
lib/ethdev/rte_ethdev.c | 73 ++++++++++++++++++++++++
lib/ethdev/rte_ethdev.h | 77 +++++++++++++++++++++++++-
lib/ethdev/version.map | 3 +
7 files changed, 168 insertions(+), 6 deletions(-)
Comments
On Sat, 4 Dec 2021 22:54:58 +0530
<jerinj@marvell.com> wrote:
> + /**
> + * Maximum supported traffic class as per PFC (802.1Qbb) specification.
> + *
> + * Based on device support and use-case need, there are two different
> + * ways to enable PFC. The first case is the port level PFC
> + * configuration, in this case, rte_eth_dev_priority_flow_ctrl_set()
> + * API shall be used to configure the PFC, and PFC frames will be
> + * generated using based on VLAN TC value.
> + * The second case is the queue level PFC configuration, in this case,
> + * Any packet field content can be used to steer the packet to the
> + * specific queue using rte_flow or RSS and then use
> + * rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping
> + * on each queue. Based on congestion selected on the specific queue,
> + * configured TC shall be used to generate PFC frames.
> + *
> + * When set to non zero value, application must use queue level
> + * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
> + * instead of port level PFC configuration via
> + * rte_eth_dev_priority_flow_ctrl_set() API to realize
> + * PFC configuration.
> + */
> + uint8_t pfc_queue_tc_max;
> + uint8_t reserved_8s[7];
> + uint64_t reserved_64s[1]; /**< Reserved for future fields */
> void *reserved_ptrs[2]; /**< Reserved for future fields */
Not sure you can claim ABI compatibility because the previous versions of DPDK
did not enforce that reserved fields must be zero. The Linux kernel
learned this when adding flags for new system calls; reserved fields only
work if you enforce that application must set them to zero.
On Sat, Dec 4, 2021 at 11:08 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Sat, 4 Dec 2021 22:54:58 +0530
> <jerinj@marvell.com> wrote:
>
> > + /**
> > + * Maximum supported traffic class as per PFC (802.1Qbb) specification.
> > + *
> > + * Based on device support and use-case need, there are two different
> > + * ways to enable PFC. The first case is the port level PFC
> > + * configuration, in this case, rte_eth_dev_priority_flow_ctrl_set()
> > + * API shall be used to configure the PFC, and PFC frames will be
> > + * generated using based on VLAN TC value.
> > + * The second case is the queue level PFC configuration, in this case,
> > + * Any packet field content can be used to steer the packet to the
> > + * specific queue using rte_flow or RSS and then use
> > + * rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping
> > + * on each queue. Based on congestion selected on the specific queue,
> > + * configured TC shall be used to generate PFC frames.
> > + *
> > + * When set to non zero value, application must use queue level
> > + * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
> > + * instead of port level PFC configuration via
> > + * rte_eth_dev_priority_flow_ctrl_set() API to realize
> > + * PFC configuration.
> > + */
> > + uint8_t pfc_queue_tc_max;
> > + uint8_t reserved_8s[7];
> > + uint64_t reserved_64s[1]; /**< Reserved for future fields */
> > void *reserved_ptrs[2]; /**< Reserved for future fields */
>
> Not sure you can claim ABI compatibility because the previous versions of DPDK
> did not enforce that reserved fields must be zero. The Linux kernel
> learned this when adding flags for new system calls; reserved fields only
> work if you enforce that application must set them to zero.
In this case, rte_eth_dev_info is an out parameter, and the implementation of
rte_eth_dev_info_get() already memsets it to 0.
Do you still see any other ABI issue?
See rte_eth_dev_info_get()
/*
* Init dev_info before port_id check since caller does not have
* return status and does not know if get is successful or not.
*/
memset(dev_info, 0, sizeof(struct rte_eth_dev_info));
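To make the out-parameter argument concrete, here is a minimal model of that behaviour. All names (info_model, info_get_model, and the two driver callbacks) are hypothetical; the only thing taken from ethdev is the idea that the structure is zeroed before the driver fills it in.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced model of the device info structure. */
struct info_model {
	uint16_t nb_rx_queues;
	uint8_t pfc_queue_tc_max; /* carved out of a formerly reserved area */
	uint8_t reserved_8s[7];
};

typedef void (*infos_get_model_t)(struct info_model *info);

/* A driver written before the field existed never touches it. */
static void
old_driver_infos_get(struct info_model *info)
{
	info->nb_rx_queues = 4;
}

/* A driver aware of queue level PFC advertises its TC limit. */
static void
new_driver_infos_get(struct info_model *info)
{
	info->nb_rx_queues = 4;
	info->pfc_queue_tc_max = 8;
}

/* The library zeroes the caller's buffer, then lets the driver fill it. */
static void
info_get_model(infos_get_model_t drv, struct info_model *info)
{
	memset(info, 0, sizeof(*info));
	drv(info);
}
```

Whatever garbage the caller's buffer held, an old driver ends up reporting pfc_queue_tc_max == 0 in this model, which maps to the port level PFC path.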
On Sun, 5 Dec 2021 12:33:57 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:
> On Sat, Dec 4, 2021 at 11:08 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Sat, 4 Dec 2021 22:54:58 +0530
> > <jerinj@marvell.com> wrote:
> >
> > > + /**
> > > + * Maximum supported traffic class as per PFC (802.1Qbb) specification.
> > > + *
> > > + * Based on device support and use-case need, there are two different
> > > + * ways to enable PFC. The first case is the port level PFC
> > > + * configuration, in this case, rte_eth_dev_priority_flow_ctrl_set()
> > > + * API shall be used to configure the PFC, and PFC frames will be
> > > + * generated using based on VLAN TC value.
> > > + * The second case is the queue level PFC configuration, in this case,
> > > + * Any packet field content can be used to steer the packet to the
> > > + * specific queue using rte_flow or RSS and then use
> > > + * rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping
> > > + * on each queue. Based on congestion selected on the specific queue,
> > > + * configured TC shall be used to generate PFC frames.
> > > + *
> > > + * When set to non zero value, application must use queue level
> > > + * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
> > > + * instead of port level PFC configuration via
> > > + * rte_eth_dev_priority_flow_ctrl_set() API to realize
> > > + * PFC configuration.
> > > + */
> > > + uint8_t pfc_queue_tc_max;
> > > + uint8_t reserved_8s[7];
> > > + uint64_t reserved_64s[1]; /**< Reserved for future fields */
> > > void *reserved_ptrs[2]; /**< Reserved for future fields */
> >
> > Not sure you can claim ABI compatibility because the previous versions of DPDK
> > did not enforce that reserved fields must be zero. The Linux kernel
> > learned this when adding flags for new system calls; reserved fields only
> > work if you enforce that application must set them to zero.
>
> In this case it rte_eth_dev_info is an out parameter and implementation of
> rte_eth_dev_info_get() already memseting to 0.
> Do you still see any other ABI issue?
>
> See rte_eth_dev_info_get()
> /*
> * Init dev_info before port_id check since caller does not have
> * return status and does not know if get is successful or not.
> */
> memset(dev_info, 0, sizeof(struct rte_eth_dev_info));
The concern came from misreading the comment. It talks about what the application should do.
Could you reword the comment so that it describes what pfc_queue_tc_max is here,
and move the flow control set part of the comment to where the API for that is?
On Sun, Dec 5, 2021 at 11:30 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Sun, 5 Dec 2021 12:33:57 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> > On Sat, Dec 4, 2021 at 11:08 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Sat, 4 Dec 2021 22:54:58 +0530
> > > <jerinj@marvell.com> wrote:
> > >
> > > > + /**
> > > > + * Maximum supported traffic class as per PFC (802.1Qbb) specification.
> > > > + *
> > > > + * Based on device support and use-case need, there are two different
> > > > + * ways to enable PFC. The first case is the port level PFC
> > > > + * configuration, in this case, rte_eth_dev_priority_flow_ctrl_set()
> > > > + * API shall be used to configure the PFC, and PFC frames will be
> > > > + * generated using based on VLAN TC value.
> > > > + * The second case is the queue level PFC configuration, in this case,
> > > > + * Any packet field content can be used to steer the packet to the
> > > > + * specific queue using rte_flow or RSS and then use
> > > > + * rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping
> > > > + * on each queue. Based on congestion selected on the specific queue,
> > > > + * configured TC shall be used to generate PFC frames.
> > > > + *
> > > > + * When set to non zero value, application must use queue level
> > > > + * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
> > > > + * instead of port level PFC configuration via
> > > > + * rte_eth_dev_priority_flow_ctrl_set() API to realize
> > > > + * PFC configuration.
> > > > + */
> > > > + uint8_t pfc_queue_tc_max;
> > > > + uint8_t reserved_8s[7];
> > > > + uint64_t reserved_64s[1]; /**< Reserved for future fields */
> > > > void *reserved_ptrs[2]; /**< Reserved for future fields */
> > >
> > > Not sure you can claim ABI compatibility because the previous versions of DPDK
> > > did not enforce that reserved fields must be zero. The Linux kernel
> > > learned this when adding flags for new system calls; reserved fields only
> > > work if you enforce that application must set them to zero.
> >
> > In this case it rte_eth_dev_info is an out parameter and implementation of
> > rte_eth_dev_info_get() already memseting to 0.
> > Do you still see any other ABI issue?
> >
> > See rte_eth_dev_info_get()
> > /*
> > * Init dev_info before port_id check since caller does not have
> > * return status and does not know if get is successful or not.
> > */
> > memset(dev_info, 0, sizeof(struct rte_eth_dev_info));
>
> The concern was from the misreading comment. It talks about what application should do.
> Could you reword the comment so that it describes what pfc_queue_tc_max is here
The comment is at rte_eth_dev_info::pfc_queue_tc_max, so it is implied
that it is a get parameter.
current comment
---
+ * When set to non zero value, application must use queue level
+ * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
+ * instead of port level PFC configuration via
+ * rte_eth_dev_priority_flow_ctrl_set() API to realize
+ * PFC configuration.
---
Would updating it to the following help to clarify? If so, I will send v2; if
not, please suggest.
---
+ * When set to non zero value by the driver, application must use queue level
^^^^^^^^^^^
+ * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
+ * instead of port level PFC configuration via
+ * rte_eth_dev_priority_flow_ctrl_set() API to realize
+ * PFC configuration.
---
> and move the flow control set part of the comment to where the API for that is.
The comment is needed for rte_eth_dev_priority_flow_ctrl_set() and
rte_eth_dev_priority_flow_ctrl_queue_set().
Instead of duplicating the comments, I added the comment at
rte_eth_dev_info::pfc_queue_tc_max and
added "@see struct rte_eth_dev_info::pfc_queue_tc_max priority flow
control usage models"
in rte_eth_dev_priority_flow_ctrl_set() and
rte_eth_dev_priority_flow_ctrl_queue_set().
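For reference, the order of the parameter checks in the proposed rte_eth_dev_priority_flow_ctrl_queue_set() can be modelled without any DPDK dependency. The types and the function name below are hypothetical stand-ins; the check order and return codes follow the patch, and the enum values are assumed to mirror enum rte_eth_fc_mode.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the ordering of enum rte_eth_fc_mode. */
enum fc_mode_model { FC_NONE, FC_RX_PAUSE, FC_TX_PAUSE, FC_FULL };

/* Hypothetical stand-in for struct rte_eth_pfc_queue_conf. */
struct pfc_queue_conf_model {
	uint8_t tc;
	uint16_t pause_time;
	enum fc_mode_model mode;
};

/* Hypothetical stand-in for the dev_info fields the checks read. */
struct port_model {
	uint16_t nb_rx_queues;
	uint16_t nb_tx_queues;
	uint8_t pfc_queue_tc_max;
};

static int
pfc_queue_conf_check(const struct port_model *port, uint16_t queue_id,
		     const struct pfc_queue_conf_model *conf)
{
	if (conf == NULL)
		return -EINVAL;
	/* Zero means the driver only offers port level PFC. */
	if (port->pfc_queue_tc_max == 0)
		return -ENOTSUP;
	/* Any mode other than Tx-only pause needs a valid Rx queue. */
	if (conf->mode != FC_TX_PAUSE && queue_id >= port->nb_rx_queues)
		return -EINVAL;
	/* Any mode other than Rx-only pause needs a valid Tx queue. */
	if (conf->mode != FC_RX_PAUSE && queue_id >= port->nb_tx_queues)
		return -EINVAL;
	if (conf->tc >= port->pfc_queue_tc_max)
		return -EINVAL;
	return 0;
}
```

One consequence visible in the model: with a mode that is neither Rx-only nor Tx-only, queue_id has to be valid for both directions.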
@@ -11,3 +11,9 @@
; Ignore generated PMD information strings
[suppress_variable]
name_regexp = _pmd_info$
+
+; Ignore fields inserted in place of reserved fields of rte_eth_dev_info
+[suppress_type]
+ name = rte_eth_dev_info
+ has_data_member_inserted_between = {offset_after(switch_info), end}
+
@@ -379,9 +379,10 @@ Flow control
Supports configuring link flow control.
* **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
- ``priority_flow_ctrl_set``.
+ ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_set``.
* **[related] API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
- ``rte_eth_dev_priority_flow_ctrl_set()``.
+ ``rte_eth_dev_priority_flow_ctrl_set()``, ``rte_eth_dev_priority_flow_ctrl_queue_set()``.
+* **[provides] rte_eth_dev_info**: ``pfc_queue_tc_max``.
.. _nic_features_rate_limitation:
@@ -55,6 +55,10 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added an API to enable queue-based priority flow control (PFC).**
+
+ A new API, ``rte_eth_dev_priority_flow_ctrl_queue_set()``, was added.
+
Removed Items
-------------
@@ -532,6 +532,9 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
/** @internal Setup priority flow control parameter on an Ethernet device. */
typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
struct rte_eth_pfc_conf *pfc_conf);
+/** @internal Queue setup for priority flow control parameter on an Ethernet device. */
+typedef int (*priority_flow_ctrl_queue_set_t)(struct rte_eth_dev *dev,
+ uint16_t queue_id, struct rte_eth_pfc_queue_conf *pfc_queue_conf);
/** @internal Update RSS redirection table on an Ethernet device. */
typedef int (*reta_update_t)(struct rte_eth_dev *dev,
@@ -1080,7 +1083,8 @@ struct eth_dev_ops {
flow_ctrl_set_t flow_ctrl_set; /**< Setup flow control */
/** Setup priority flow control */
priority_flow_ctrl_set_t priority_flow_ctrl_set;
-
+ /** Priority flow control queue setup */
+ priority_flow_ctrl_queue_set_t priority_flow_ctrl_queue_set;
/** Set Unicast Table Array */
eth_uc_hash_table_set_t uc_hash_table_set;
/** Set Unicast hash bitmap */
@@ -3998,7 +3998,9 @@ int
rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
struct rte_eth_pfc_conf *pfc_conf)
{
+ struct rte_eth_dev_info dev_info;
struct rte_eth_dev *dev;
+ int ret;
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
dev = &rte_eth_devices[port_id];
@@ -4010,6 +4012,17 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
return -EINVAL;
}
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ if (ret != 0)
+ return ret;
+
+ if (dev_info.pfc_queue_tc_max != 0) {
+ RTE_ETHDEV_LOG(ERR,
+ "Ethdev port %u driver does not support port level PFC config\n",
+ port_id);
+ return -ENOTSUP;
+ }
+
if (pfc_conf->priority > (RTE_ETH_DCB_NUM_USER_PRIORITIES - 1)) {
RTE_ETHDEV_LOG(ERR, "Invalid priority, only 0-7 allowed\n");
return -EINVAL;
@@ -4022,6 +4035,66 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
return -ENOTSUP;
}
+int
+rte_eth_dev_priority_flow_ctrl_queue_set(
+ uint16_t port_id, uint16_t queue_id,
+ struct rte_eth_pfc_queue_conf *pfc_queue_conf)
+{
+ struct rte_eth_dev_info dev_info;
+ struct rte_eth_dev *dev;
+ int ret;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+
+ if (pfc_queue_conf == NULL) {
+ RTE_ETHDEV_LOG(ERR,
+ "Cannot set ethdev port %u queue %d PFC from NULL config\n",
+ port_id, queue_id);
+ return -EINVAL;
+ }
+
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ if (ret != 0)
+ return ret;
+
+ if (dev_info.pfc_queue_tc_max == 0) {
+ RTE_ETHDEV_LOG(ERR,
+ "Ethdev port %u driver does not support PFC TC values\n",
+ port_id);
+ return -ENOTSUP;
+ }
+
+ if (pfc_queue_conf->mode != RTE_ETH_FC_TX_PAUSE &&
+ queue_id >= dev_info.nb_rx_queues) {
+ RTE_ETHDEV_LOG(ERR,
+ "PFC Rx queue not in range(requested: %d configured: %d)\n",
+ queue_id, dev_info.nb_rx_queues);
+ return -EINVAL;
+ }
+
+ if (pfc_queue_conf->mode != RTE_ETH_FC_RX_PAUSE &&
+ queue_id >= dev_info.nb_tx_queues) {
+ RTE_ETHDEV_LOG(ERR,
+ "PFC Tx queue not in range(requested: %d configured: %d)\n",
+ queue_id, dev_info.nb_tx_queues);
+ return -EINVAL;
+ }
+
+ if (pfc_queue_conf->tc >= dev_info.pfc_queue_tc_max) {
+ RTE_ETHDEV_LOG(ERR,
+ "PFC TC not in range(requested: %d max: %d)\n",
+ pfc_queue_conf->tc, dev_info.pfc_queue_tc_max);
+ return -EINVAL;
+ }
+
+ if (*dev->dev_ops->priority_flow_ctrl_queue_set)
+ return eth_err(port_id,
+ (*dev->dev_ops->priority_flow_ctrl_queue_set)(
+ dev, queue_id, pfc_queue_conf));
+ return -ENOTSUP;
+}
+
static int
eth_check_reta_mask(struct rte_eth_rss_reta_entry64 *reta_conf,
uint16_t reta_size)
@@ -1395,6 +1395,19 @@ struct rte_eth_pfc_conf {
uint8_t priority; /**< VLAN User Priority. */
};
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure Ethernet priority flow control parameter for
+ * ethdev queues.
+ */
+struct rte_eth_pfc_queue_conf {
+ uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) specification */
+ uint16_t pause_time; /**< Pause quota in the Pause frame */
+ enum rte_eth_fc_mode mode; /**< Link flow control mode */
+};
+
/**
* Tunnel type for device-specific classifier configuration.
* @see rte_eth_udp_tunnel
@@ -1841,8 +1854,30 @@ struct rte_eth_dev_info {
* embedded managed interconnect/switch.
*/
struct rte_eth_switch_info switch_info;
-
- uint64_t reserved_64s[2]; /**< Reserved for future fields */
+ /**
+ * Maximum supported traffic class as per PFC (802.1Qbb) specification.
+ *
+ * Based on device support and use-case need, there are two different
+ * ways to enable PFC. The first case is the port level PFC
+ * configuration. In this case, the rte_eth_dev_priority_flow_ctrl_set()
+ * API shall be used to configure the PFC, and PFC frames will be
+ * generated based on the VLAN TC value.
+ * The second case is the queue level PFC configuration. In this case,
+ * any packet field content can be used to steer the packet to the
+ * specific queue using rte_flow or RSS and then use
+ * rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping
+ * on each queue. Based on congestion selected on the specific queue,
+ * configured TC shall be used to generate PFC frames.
+ *
+ * When set to non zero value, application must use queue level
+ * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
+ * instead of port level PFC configuration via
+ * rte_eth_dev_priority_flow_ctrl_set() API to realize
+ * PFC configuration.
+ */
+ uint8_t pfc_queue_tc_max;
+ uint8_t reserved_8s[7];
+ uint64_t reserved_64s[1]; /**< Reserved for future fields */
void *reserved_ptrs[2]; /**< Reserved for future fields */
};
@@ -4109,6 +4144,9 @@ int rte_eth_dev_flow_ctrl_set(uint16_t port_id,
* Configure the Ethernet priority flow control under DCB environment
* for Ethernet device.
*
+ * @see struct rte_eth_dev_info::pfc_queue_tc_max priority
+ * flow control usage models.
+ *
* @param port_id
* The port identifier of the Ethernet device.
* @param pfc_conf
@@ -4119,10 +4157,43 @@ int rte_eth_dev_flow_ctrl_set(uint16_t port_id,
* - (-ENODEV) if *port_id* invalid.
* - (-EINVAL) if bad parameter
* - (-EIO) if flow control setup failure or device is removed.
+ *
*/
int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
- struct rte_eth_pfc_conf *pfc_conf);
+ struct rte_eth_pfc_conf *pfc_conf);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Configure the Ethernet priority flow control for a given queue
+ * for Ethernet device.
+ *
+ * @see struct rte_eth_dev_info::pfc_queue_tc_max priority flow control
+ * usage models.
+ *
+ * @note When an ethdev port switches to PFC mode, the unconfigured
+ * queues shall be configured by the driver with default values such as
+ * lower priority value for TC etc.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param queue_id
+ * The Rx/Tx queue to apply the PFC configuration.
+ * @note pfc_queue_conf::mode depicts the queue direction(Rx and/or Tx)
+ * @param pfc_queue_conf
+ * The pointer to the structure of the priority flow control parameters
+ * for the queue.
+ * @return
+ * - (0) if successful.
+ * - (-ENOTSUP) if hardware doesn't support priority flow control mode.
+ * - (-ENODEV) if *port_id* invalid.
+ * - (-EINVAL) if bad parameter
+ * - (-EIO) if flow control setup queue failure
+ */
+__rte_experimental
+int rte_eth_dev_priority_flow_ctrl_queue_set(uint16_t port_id, uint16_t queue_id,
+ struct rte_eth_pfc_queue_conf *pfc_queue_conf);
/**
* Add a MAC address to the set used for filtering incoming packets.
*
@@ -256,6 +256,9 @@ EXPERIMENTAL {
rte_flow_flex_item_create;
rte_flow_flex_item_release;
rte_flow_pick_transfer_proxy;
+
+ # added in 22.03
+ rte_eth_dev_priority_flow_ctrl_queue_set;
};
INTERNAL {