[1/4] ethdev: add hairpin bind and unbind APIs

Message ID 1601511962-21532-2-git-send-email-bingz@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series introduce support for hairpin between two ports |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Bing Zhao Oct. 1, 2020, 12:25 a.m. UTC
  In single port hairpin mode, all the hairpin TX and RX queues belong
to the same device. After the queues are set up properly, there is
no other dependency between the TX queue and its RX peer queue. The
binding process that connected the TX and RX queues together from
hardware level will be done automatically during the device start
procedure. Everything required is configured and initialized already
for the binding process.

But in two ports hairpin mode, there will be some cross-dependences
between two different ports. Usually, the ports will be initialized
serially by the main thread but not in parallel. The earlier port
will not be able to enable the bind if the following peer port is
not yet configured with HW resources. What's more, if one port is
detached / attached dynamically, it would introduce more trouble
for the hairpin binding.

To overcome these, new APIs for binding and unbinding are added.
During startup, only the hairpin TX and RX peer queues will be set
up. Nothing will be done when starting the device if the queues are
without auto-bind attribute. Only after the required ports pair
started, the `rte_eth_hairpin_bind()` API can be called to bind the
all TX queues of the egress port to the RX queues of the peer port.
Then the connection between the egress and ingress ports pair will
be established.

The `rte_eth_hairpin_unbind()` API could be used to disconnect the
egress and the peer ingress ports. This should only be called before
the device is closed if needed. When doing the clean up, all the
egress and ingress pairs related to a single port should be taken
into consideration.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 lib/librte_ethdev/rte_ethdev.c           | 107 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           |  51 +++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h    |  52 +++++++++++++++
 lib/librte_ethdev/rte_ethdev_version.map |   2 +
 4 files changed, 212 insertions(+)
  

Comments

Ori Kam Oct. 4, 2020, 9:20 a.m. UTC | #1
Hi Bing,

PSB,

Thanks,
Ori
> -----Original Message-----
> From: Bing Zhao <bingz@nvidia.com>
> Sent: Thursday, October 1, 2020 3:26 AM
> Cc: dev@dpdk.org
> Subject: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs
> 
> In single port hairpin mode, all the hairpin TX and RX queues belong
> to the same device. After the queues are set up properly, there is
> no other dependency between the TX queue and its RX peer queue. The
> binding process that connected the TX and RX queues together from
> hardware level will be done automatically during the device start
> procedure. Everything required is configured and initialized already
> for the binding process.
> 
> But in two ports hairpin mode, there will be some cross-dependences
> between two different ports. Usually, the ports will be initialized
> serially by the main thread but not in parallel. The earlier port
> will not be able to enable the bind if the following peer port is
> not yet configured with HW resources. What's more, if one port is
> detached / attached dynamically, it would introduce more trouble
> for the hairpin binding.
> 
> To overcome these, new APIs for binding and unbinding are added.
> During startup, only the hairpin TX and RX peer queues will be set
> up. Nothing will be done when starting the device if the queues are
> without auto-bind attribute. Only after the required ports pair
> started, the `rte_eth_hairpin_bind()` API can be called to bind the
> all TX queues of the egress port to the RX queues of the peer port.
> Then the connection between the egress and ingress ports pair will
> be established.
> 
> The `rte_eth_hairpin_unbind()` API could be used to disconnect the
> egress and the peer ingress ports. This should only be called before
> the device is closed if needed. When doing the clean up, all the
> egress and ingress pairs related to a single port should be taken
> into consideration.
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c           | 107
> +++++++++++++++++++++++++++++++
>  lib/librte_ethdev/rte_ethdev.h           |  51 +++++++++++++++
>  lib/librte_ethdev/rte_ethdev_driver.h    |  52 +++++++++++++++
>  lib/librte_ethdev/rte_ethdev_version.map |   2 +
>  4 files changed, 212 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index dfe5c1b..72f567b 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -2175,6 +2175,113 @@ rte_eth_tx_hairpin_queue_setup(uint16_t port_id,
> uint16_t tx_queue_id,
>  	return eth_err(port_id, ret);
>  }
> 
> +int
> +rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev *rdev;
> +	uint16_t p;
> +	uint16_t rp;
> +	int ret = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
> +	dev = &rte_eth_devices[tx_port];
> +	if (!dev->data->dev_started) {
> +		RTE_ETHDEV_LOG(ERR, "TX port %d is not started", tx_port);
> +		return -EBUSY;
> +	}
> +
> +	/*
> +	 * If the all the ports probed belong to two or more separate NICs, it
> +	 * is recommended that each pair is bound independently but not in the
> +	 * loop to bind all ports.
> +	 */

I don't understand your comment. 

> +	if (rx_port == RTE_MAX_ETHPORTS) {

I think maybe this should be done in the tx queue. Since if the bind don't need some port why do
we care if it is started?
So either add a new function to get all peer ports from the tx port, or move this logic to the 
Target PMD.

> +		RTE_ETH_FOREACH_DEV(p) {
> +			rdev = &rte_eth_devices[p];
> +			if (!rdev->data->dev_started) {
> +				RTE_ETHDEV_LOG(ERR,
> +					       "RX port %d is not started", p);
> +				ret = -EBUSY;
> +				goto unbind;
> +			}
> +			ret = (*dev->dev_ops->hairpin_bind)(dev, p);
> +			if (ret) {
> +				RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin
> TX "
> +					       "%d to RX %d", tx_port, p);
> +				goto unbind;
> +			}
> +		}
> +	} else {
> +		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
> +		rdev = &rte_eth_devices[rx_port];
> +		if (!rdev->data->dev_started) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "RX port %d is not started", rx_port);
> +			return -EBUSY;
> +		}
> +		ret = (*dev->dev_ops->hairpin_bind)(dev, rx_port);
> +		if (ret)
> +			RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin TX %d "
> +				       "to RX %d", tx_port, rx_port);
> +	}
> +
> +	return ret;
> +
> +unbind:
> +	/* Roll back the previous binding process. */
> +	RTE_ETH_FOREACH_DEV(rp) {
> +		if (rp < p)
> +			(*dev->dev_ops->hairpin_unbind)(dev, rp);
> +		else
> +			break;
> +	}
> +	return ret;
> +}
> +
> +int
> +rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev *rdev;
> +	uint16_t p;
> +	int ret = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
> +	dev = &rte_eth_devices[tx_port];
> +	if (!dev->data->dev_started) {
> +		RTE_ETHDEV_LOG(ERR, "TX port %d is stopped", tx_port);
> +		return -EBUSY;
> +	}
> +
> +	if (rx_port == RTE_MAX_ETHPORTS) {
> +		RTE_ETH_FOREACH_DEV(p) {
> +			rdev = &rte_eth_devices[p];
> +			if (!rdev->data->dev_started) {

This condition should never be true.
First see my comment above about the list of devices, second port should fail to
stop if it is bounded.

> +				RTE_ETHDEV_LOG(ERR, "RX port %d is
> stopped", p);
> +				ret = -EBUSY;
> +				break;
> +			}
> +			ret = (*dev->dev_ops->hairpin_unbind)(dev, p);
> +			if (ret) {
> +				RTE_ETHDEV_LOG(ERR, "Failed to unbind
> hairpin "
> +					       "TX %d from RX %d", tx_port, p);
> +				break;
> +			}
> +		}
> +	} else {
> +		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
> +		rdev = &rte_eth_devices[rx_port];
> +		if (!rdev->data->dev_started) {
> +			RTE_ETHDEV_LOG(ERR, "RX port %d is stopped",
> rx_port);
> +			return -EBUSY;
> +		}
> +		ret = (*dev->dev_ops->hairpin_unbind)(dev, rx_port);
> +	}
> +
> +	return ret;
> +}
> +
>  void
>  rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
>  		void *userdata __rte_unused)
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 645a186..c3fb684 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -2133,6 +2133,57 @@ int rte_eth_tx_hairpin_queue_setup
>  	 const struct rte_eth_hairpin_conf *conf);
> 
>  /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> + *
> + * Bind all hairpin TX queues of one port to the RX queues of the peer port.
> + * It is only allowed to call this API after all hairpin queues are configured
> + * properly and the devices of TX and peer RX are in started state.
> + *
> + * @param tx_port
> + *   The TX port identifier of the Ethernet device.
> + * @param rx_port
> + *   The peer RX port identifier of the Ethernet device.
> + *   RTE_MAX_ETHPORTS is allowed for the traversal of all devices.
> + *   RX port ID could have the same value with TX port ID.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-EBUSY) if device is not in started state.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - Others detailed errors from PMD drivers.
> + */
> +__rte_experimental
> +int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> + *
> + * Unbind all hairpin TX queues of one port from the RX queues of the peer
> port.
> + * This should be called before closing the TX or RX devices (optional). After
> + * unbind the hairpin ports pair, it is allowed to bind them again.
> + * Changing queues configuration should be after stopping the device.
> + *
> + * @param tx_port
> + *   The TX port identifier of the Ethernet device.
> + * @param rx_port
> + *   The peer RX port identifier of the Ethernet device.
> + *   RTE_MAX_ETHPORTS is allowed for traversal of all devices.
> + *   RX port ID could have the same value with TX port ID.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-EBUSY) if device is in stopped state.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - Others detailed errors from PMD drivers.
> + */
> +__rte_experimental
> +int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port);
> +
> +/**
>   * Return the NUMA socket to which an Ethernet device is connected
>   *
>   * @param port_id
> diff --git a/lib/librte_ethdev/rte_ethdev_driver.h
> b/lib/librte_ethdev/rte_ethdev_driver.h
> index 04ac8e9..910433f 100644
> --- a/lib/librte_ethdev/rte_ethdev_driver.h
> +++ b/lib/librte_ethdev/rte_ethdev_driver.h
> @@ -575,6 +575,54 @@ typedef int (*eth_tx_hairpin_queue_setup_t)
>  	 const struct rte_eth_hairpin_conf *hairpin_conf);
> 
>  /**
> + * @internal
> + * Bind all hairpin TX queues of one port to the RX queues of the peer port.
> + *
> + * @param dev
> + *   ethdev handle of port.
> + * @param rx_port
> + *   the peer RX port.
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + *   Success, bind successfully.
> + * @retval -ENOTSUP
> + *   Bind API is not supported.
> + * @retval -EINVAL
> + *   One of the parameters is invalid.
> + * @retval -EBUSY
> + *   Device is not started.
> + */
> +typedef int (*eth_hairpin_bind_t)(struct rte_eth_dev *dev,
> +				uint16_t rx_port);
> +
> +/**
> + * @internal
> + * Unbind all hairpin TX queues of one port from the RX queues of the peer
> port.
> + *
> + * @param dev
> + *   ethdev handle of port.
> + * @param rx_port
> + *   the peer RX port.
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + *   Success, bind successfully.
> + * @retval -ENOTSUP
> + *   Bind API is not supported.
> + * @retval -EINVAL
> + *   One of the parameters is invalid.
> + * @retval -EBUSY
> + *   Device is already stopped.
> + */
> +typedef int (*eth_hairpin_unbind_t)(struct rte_eth_dev *dev,
> +				  uint16_t rx_port);
> +
> +/**
>   * @internal A structure containing the functions exported by an Ethernet
> driver.
>   */
>  struct eth_dev_ops {
> @@ -713,6 +761,10 @@ struct eth_dev_ops {
>  	/**< Set up device RX hairpin queue. */
>  	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
>  	/**< Set up device TX hairpin queue. */
> +	eth_hairpin_bind_t hairpin_bind;
> +	/**< Bind all hairpin TX queues of device to the peer port RX queues. */
> +	eth_hairpin_unbind_t hairpin_unbind;
> +	/**< Unbind all hairpin TX queues from the peer port RX queues. */
>  };
> 
>  /**
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map
> b/lib/librte_ethdev/rte_ethdev_version.map
> index c95ef51..18efe4e 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -227,6 +227,8 @@ EXPERIMENTAL {
>  	rte_tm_wred_profile_delete;
> 
>  	# added in 20.11
> +	rte_eth_hairpin_bind;
> +	rte_eth_hairpin_unbind;
>  	rte_eth_link_speed_to_str;
>  	rte_eth_link_to_str;
>  };
> --
> 2.5.5
  
Bing Zhao Oct. 7, 2020, 11:21 a.m. UTC | #2
Hi Ori,

> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Sunday, October 4, 2020 5:20 PM
> To: Bing Zhao <bingz@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com;
> arybchenko@solarflare.com; mdr@ashroe.eu; nhorman@tuxdriver.com;
> bernard.iremonger@intel.com; beilei.xing@intel.com;
> wenzhuo.lu@intel.com
> Cc: dev@dpdk.org
> Subject: RE: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs
> 
> Hi Bing,
> 
> PSB,
> 
> Thanks,
> Ori
> > -----Original Message-----
> > From: Bing Zhao <bingz@nvidia.com>
> > Sent: Thursday, October 1, 2020 3:26 AM
> > Cc: dev@dpdk.org
> > Subject: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs
> >
> > In single port hairpin mode, all the hairpin TX and RX queues
> belong
> > to the same device. After the queues are set up properly, there is
> no
> > other dependency between the TX queue and its RX peer queue. The
> > binding process that connected the TX and RX queues together from
> > hardware level will be done automatically during the device start
> > procedure. Everything required is configured and initialized
> already
> > for the binding process.
> >
> > But in two ports hairpin mode, there will be some cross-
> dependences
> > between two different ports. Usually, the ports will be
> initialized
> > serially by the main thread but not in parallel. The earlier port
> will
> > not be able to enable the bind if the following peer port is not
> yet
> > configured with HW resources. What's more, if one port is detached
> /
> > attached dynamically, it would introduce more trouble for the
> hairpin
> > binding.
> >
> > To overcome these, new APIs for binding and unbinding are added.
> > During startup, only the hairpin TX and RX peer queues will be set
> up.
> > Nothing will be done when starting the device if the queues are
> > without auto-bind attribute. Only after the required ports pair
> > started, the `rte_eth_hairpin_bind()` API can be called to bind
> the
> > all TX queues of the egress port to the RX queues of the peer port.
> > Then the connection between the egress and ingress ports pair will
> be
> > established.
> >
> > The `rte_eth_hairpin_unbind()` API could be used to disconnect the
> > egress and the peer ingress ports. This should only be called
> before
> > the device is closed if needed. When doing the clean up, all the
> > egress and ingress pairs related to a single port should be taken
> into
> > consideration.
> >
> > Signed-off-by: Bing Zhao <bingz@nvidia.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.c           | 107
> > +++++++++++++++++++++++++++++++
> >  lib/librte_ethdev/rte_ethdev.h           |  51 +++++++++++++++
> >  lib/librte_ethdev/rte_ethdev_driver.h    |  52 +++++++++++++++
> >  lib/librte_ethdev/rte_ethdev_version.map |   2 +
> >  4 files changed, 212 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index dfe5c1b..72f567b 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -2175,6 +2175,113 @@ rte_eth_tx_hairpin_queue_setup(uint16_t
> > port_id, uint16_t tx_queue_id,
> >  	return eth_err(port_id, ret);
> >  }
> >
> > +int
> > +rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port) {
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev *rdev;
> > +	uint16_t p;
> > +	uint16_t rp;
> > +	int ret = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
> > +	dev = &rte_eth_devices[tx_port];
> > +	if (!dev->data->dev_started) {
> > +		RTE_ETHDEV_LOG(ERR, "TX port %d is not started",
> tx_port);
> > +		return -EBUSY;
> > +	}
> > +
> > +	/*
> > +	 * If the all the ports probed belong to two or more separate
> NICs, it
> > +	 * is recommended that each pair is bound independently but
> not in the
> > +	 * loop to bind all ports.
> > +	 */
> 
> I don't understand your comment.

I mean, in the following loops, if the application wants to bind one TX port to all the other RX ports, it would not be supported between different PMDs or different NICs. Then the roll-back actions will be done and no hairpin port peers will work successfully.

> 
> > +	if (rx_port == RTE_MAX_ETHPORTS) {
> 
> I think maybe this should be done in the tx queue. Since if the bind
> don't need some port why do we care if it is started?

Do you mean the checking in the RX ports loop below? At first, I intended to make sure this API would be called only after all the ports are started.
But yes, there is no need if some port is not being used.

> So either add a new function to get all peer ports from the tx port,
> or move this logic to the Target PMD.

There would be one more API to be introduced. To my understanding, it would be more appropriate if we add a new API here.
But it would keep RTE simpler if we move such logic into the PMD.

> 
> > +		RTE_ETH_FOREACH_DEV(p) {
> > +			rdev = &rte_eth_devices[p];
> > +			if (!rdev->data->dev_started) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					       "RX port %d is not started", p);
> > +				ret = -EBUSY;
> > +				goto unbind;
> > +			}
> > +			ret = (*dev->dev_ops->hairpin_bind)(dev, p);
> > +			if (ret) {
> > +				RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin
> > TX "
> > +					       "%d to RX %d", tx_port, p);
> > +				goto unbind;
> > +			}
> > +		}
> > +	} else {
> > +		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
> > +		rdev = &rte_eth_devices[rx_port];
> > +		if (!rdev->data->dev_started) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "RX port %d is not started", rx_port);
> > +			return -EBUSY;
> > +		}
> > +		ret = (*dev->dev_ops->hairpin_bind)(dev, rx_port);
> > +		if (ret)
> > +			RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin TX %d
> "
> > +				       "to RX %d", tx_port, rx_port);
> > +	}
> > +
> > +	return ret;
> > +
> > +unbind:
> > +	/* Roll back the previous binding process. */
> > +	RTE_ETH_FOREACH_DEV(rp) {
> > +		if (rp < p)
> > +			(*dev->dev_ops->hairpin_unbind)(dev, rp);
> > +		else
> > +			break;
> > +	}
> > +	return ret;
> > +}
> > +
> > +int
> > +rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port) {
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev *rdev;
> > +	uint16_t p;
> > +	int ret = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
> > +	dev = &rte_eth_devices[tx_port];
> > +	if (!dev->data->dev_started) {
> > +		RTE_ETHDEV_LOG(ERR, "TX port %d is stopped", tx_port);
> > +		return -EBUSY;
> > +	}
> > +
> > +	if (rx_port == RTE_MAX_ETHPORTS) {
> > +		RTE_ETH_FOREACH_DEV(p) {
> > +			rdev = &rte_eth_devices[p];
> > +			if (!rdev->data->dev_started) {
> 
> This condition should never be true.
> First see my comment above about the list of devices, second port
> should fail to stop if it is bounded.

Thank, I will remove this check and make the logic the same as "bind" API.
When stopping the second port, PMD will check if any peer port is bound and return a failure then.

> 
> > +				RTE_ETHDEV_LOG(ERR, "RX port %d is
> > stopped", p);
> > +				ret = -EBUSY;
> > +				break;
> > +			}
> > +			ret = (*dev->dev_ops->hairpin_unbind)(dev, p);
> > +			if (ret) {
> > +				RTE_ETHDEV_LOG(ERR, "Failed to unbind
> > hairpin "
> > +					       "TX %d from RX %d", tx_port, p);
> > +				break;
> > +			}
> > +		}
> > +	} else {
> > +		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
> > +		rdev = &rte_eth_devices[rx_port];
> > +		if (!rdev->data->dev_started) {
> > +			RTE_ETHDEV_LOG(ERR, "RX port %d is stopped",
> > rx_port);
> > +			return -EBUSY;
> > +		}
> > +		ret = (*dev->dev_ops->hairpin_unbind)(dev, rx_port);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >  void
> >  rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
> unsent,
> >  		void *userdata __rte_unused)
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index 645a186..c3fb684 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -2133,6 +2133,57 @@ int rte_eth_tx_hairpin_queue_setup
> >  	 const struct rte_eth_hairpin_conf *conf);
> >
> >  /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without
> prior
> > notice
> > + *
> > + * Bind all hairpin TX queues of one port to the RX queues of the
> peer port.
> > + * It is only allowed to call this API after all hairpin queues
> are
> > +configured
> > + * properly and the devices of TX and peer RX are in started
> state.
> > + *
> > + * @param tx_port
> > + *   The TX port identifier of the Ethernet device.
> > + * @param rx_port
> > + *   The peer RX port identifier of the Ethernet device.
> > + *   RTE_MAX_ETHPORTS is allowed for the traversal of all devices.
> > + *   RX port ID could have the same value with TX port ID.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-EINVAL) if bad parameter.
> > + *   - (-EBUSY) if device is not in started state.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - Others detailed errors from PMD drivers.
> > + */
> > +__rte_experimental
> > +int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without
> prior
> > notice
> > + *
> > + * Unbind all hairpin TX queues of one port from the RX queues of
> the
> > + peer
> > port.
> > + * This should be called before closing the TX or RX devices
> > +(optional). After
> > + * unbind the hairpin ports pair, it is allowed to bind them
> again.
> > + * Changing queues configuration should be after stopping the
> device.
> > + *
> > + * @param tx_port
> > + *   The TX port identifier of the Ethernet device.
> > + * @param rx_port
> > + *   The peer RX port identifier of the Ethernet device.
> > + *   RTE_MAX_ETHPORTS is allowed for traversal of all devices.
> > + *   RX port ID could have the same value with TX port ID.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-EINVAL) if bad parameter.
> > + *   - (-EBUSY) if device is in stopped state.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - Others detailed errors from PMD drivers.
> > + */
> > +__rte_experimental
> > +int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port);
> > +
> > +/**
> >   * Return the NUMA socket to which an Ethernet device is
> connected
> >   *
> >   * @param port_id
> > diff --git a/lib/librte_ethdev/rte_ethdev_driver.h
> > b/lib/librte_ethdev/rte_ethdev_driver.h
> > index 04ac8e9..910433f 100644
> > --- a/lib/librte_ethdev/rte_ethdev_driver.h
> > +++ b/lib/librte_ethdev/rte_ethdev_driver.h
> > @@ -575,6 +575,54 @@ typedef int (*eth_tx_hairpin_queue_setup_t)
> >  	 const struct rte_eth_hairpin_conf *hairpin_conf);
> >
> >  /**
> > + * @internal
> > + * Bind all hairpin TX queues of one port to the RX queues of the
> peer port.
> > + *
> > + * @param dev
> > + *   ethdev handle of port.
> > + * @param rx_port
> > + *   the peer RX port.
> > + *
> > + * @return
> > + *   Negative errno value on error, 0 on success.
> > + *
> > + * @retval 0
> > + *   Success, bind successfully.
> > + * @retval -ENOTSUP
> > + *   Bind API is not supported.
> > + * @retval -EINVAL
> > + *   One of the parameters is invalid.
> > + * @retval -EBUSY
> > + *   Device is not started.
> > + */
> > +typedef int (*eth_hairpin_bind_t)(struct rte_eth_dev *dev,
> > +				uint16_t rx_port);
> > +
> > +/**
> > + * @internal
> > + * Unbind all hairpin TX queues of one port from the RX queues of
> the
> > +peer
> > port.
> > + *
> > + * @param dev
> > + *   ethdev handle of port.
> > + * @param rx_port
> > + *   the peer RX port.
> > + *
> > + * @return
> > + *   Negative errno value on error, 0 on success.
> > + *
> > + * @retval 0
> > + *   Success, bind successfully.
> > + * @retval -ENOTSUP
> > + *   Bind API is not supported.
> > + * @retval -EINVAL
> > + *   One of the parameters is invalid.
> > + * @retval -EBUSY
> > + *   Device is already stopped.
> > + */
> > +typedef int (*eth_hairpin_unbind_t)(struct rte_eth_dev *dev,
> > +				  uint16_t rx_port);
> > +
> > +/**
> >   * @internal A structure containing the functions exported by an
> > Ethernet driver.
> >   */
> >  struct eth_dev_ops {
> > @@ -713,6 +761,10 @@ struct eth_dev_ops {
> >  	/**< Set up device RX hairpin queue. */
> >  	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
> >  	/**< Set up device TX hairpin queue. */
> > +	eth_hairpin_bind_t hairpin_bind;
> > +	/**< Bind all hairpin TX queues of device to the peer port RX
> queues. */
> > +	eth_hairpin_unbind_t hairpin_unbind;
> > +	/**< Unbind all hairpin TX queues from the peer port RX queues.
> */
> >  };
> >
> >  /**
> > diff --git a/lib/librte_ethdev/rte_ethdev_version.map
> > b/lib/librte_ethdev/rte_ethdev_version.map
> > index c95ef51..18efe4e 100644
> > --- a/lib/librte_ethdev/rte_ethdev_version.map
> > +++ b/lib/librte_ethdev/rte_ethdev_version.map
> > @@ -227,6 +227,8 @@ EXPERIMENTAL {
> >  	rte_tm_wred_profile_delete;
> >
> >  	# added in 20.11
> > +	rte_eth_hairpin_bind;
> > +	rte_eth_hairpin_unbind;
> >  	rte_eth_link_speed_to_str;
> >  	rte_eth_link_to_str;
> >  };
> > --
> > 2.5.5

Thanks
  
Ori Kam Oct. 7, 2020, 11:42 a.m. UTC | #3
Hi Bing,

> -----Original Message-----
> From: Bing Zhao <bingz@nvidia.com>
> Sent: Wednesday, October 7, 2020 2:22 PM
> Subject: RE: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs
> 
> Hi Ori,
> 
> > -----Original Message-----
> > From: Ori Kam <orika@nvidia.com>
> > Sent: Sunday, October 4, 2020 5:20 PM
> > Subject: RE: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs
> >
> > Hi Bing,
> >
> > PSB,
> >
> > Thanks,
> > Ori
> > > -----Original Message-----
> > > From: Bing Zhao <bingz@nvidia.com>
> > > Sent: Thursday, October 1, 2020 3:26 AM
> > > Cc: dev@dpdk.org
> > > Subject: [PATCH 1/4] ethdev: add hairpin bind and unbind APIs
> > >
> > > In single port hairpin mode, all the hairpin TX and RX queues
> > belong
> > > to the same device. After the queues are set up properly, there is
> > no
> > > other dependency between the TX queue and its RX peer queue. The
> > > binding process that connected the TX and RX queues together from
> > > hardware level will be done automatically during the device start
> > > procedure. Everything required is configured and initialized
> > already
> > > for the binding process.
> > >
> > > But in two ports hairpin mode, there will be some cross-
> > dependences
> > > between two different ports. Usually, the ports will be
> > initialized
> > > serially by the main thread but not in parallel. The earlier port
> > will
> > > not be able to enable the bind if the following peer port is not
> > yet
> > > configured with HW resources. What's more, if one port is detached
> > /
> > > attached dynamically, it would introduce more trouble for the
> > hairpin
> > > binding.
> > >
> > > To overcome these, new APIs for binding and unbinding are added.
> > > During startup, only the hairpin TX and RX peer queues will be set
> > up.
> > > Nothing will be done when starting the device if the queues are
> > > without auto-bind attribute. Only after the required ports pair
> > > started, the `rte_eth_hairpin_bind()` API can be called to bind
> > the
> > > all TX queues of the egress port to the RX queues of the peer port.
> > > Then the connection between the egress and ingress ports pair will
> > be
> > > established.
> > >
> > > The `rte_eth_hairpin_unbind()` API could be used to disconnect the
> > > egress and the peer ingress ports. This should only be called
> > before
> > > the device is closed if needed. When doing the clean up, all the
> > > egress and ingress pairs related to a single port should be taken
> > into
> > > consideration.
> > >
> > > Signed-off-by: Bing Zhao <bingz@nvidia.com>
> > > ---
> > >  lib/librte_ethdev/rte_ethdev.c           | 107
> > > +++++++++++++++++++++++++++++++
> > >  lib/librte_ethdev/rte_ethdev.h           |  51 +++++++++++++++
> > >  lib/librte_ethdev/rte_ethdev_driver.h    |  52 +++++++++++++++
> > >  lib/librte_ethdev/rte_ethdev_version.map |   2 +
> > >  4 files changed, 212 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > b/lib/librte_ethdev/rte_ethdev.c index dfe5c1b..72f567b 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > @@ -2175,6 +2175,113 @@ rte_eth_tx_hairpin_queue_setup(uint16_t
> > > port_id, uint16_t tx_queue_id,
> > >  	return eth_err(port_id, ret);
> > >  }
> > >
> > > +int
> > > +rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port) {
> > > +	struct rte_eth_dev *dev;
> > > +	struct rte_eth_dev *rdev;
> > > +	uint16_t p;
> > > +	uint16_t rp;
> > > +	int ret = 0;
> > > +
> > > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
> > > +	dev = &rte_eth_devices[tx_port];
> > > +	if (!dev->data->dev_started) {
> > > +		RTE_ETHDEV_LOG(ERR, "TX port %d is not started",
> > tx_port);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	/*
> > > +	 * If the all the ports probed belong to two or more separate
> > NICs, it
> > > +	 * is recommended that each pair is bound independently but
> > not in the
> > > +	 * loop to bind all ports.
> > > +	 */
> >
> > I don't understand your comment.
> 
> I mean, in the following loops, if the application wants to bind one TX port to all
> the other RX ports, it would not be supported between different PMDs or
> different NICs. Then the roll-back actions will be done and no hairpin port peers
> will work successfully.
> 

O.K.

> >
> > > +	if (rx_port == RTE_MAX_ETHPORTS) {
> >
> > I think maybe this should be done in the tx queue. Since if the bind
> > don't need some port why do we care if it is started?
> 
> Do you mean the checking in the RX ports loop below? At first, I intended to
> make sure this API would be called only after all the ports are started.
> But yes, there is no need if some port is not being used.
> 
I mean move the check to the tx pmd, if for some reason (peer device not started)
then it should fail.

> > So either add a new function to get all peer ports from the tx port,
> > or move this logic to the Target PMD.
> 
> There would be one more API to be introduced. To my understanding, it would
> be more appropriate if we add a new API here.
> But it would keep RTE simpler if we move such logic into the PMD.
> 
I'm fine with moving the logic to the PMD as long as it can check the status.
which he can using the internal function right?

> >
> > > +		RTE_ETH_FOREACH_DEV(p) {
> > > +			rdev = &rte_eth_devices[p];
> > > +			if (!rdev->data->dev_started) {
> > > +				RTE_ETHDEV_LOG(ERR,
> > > +					       "RX port %d is not started", p);
> > > +				ret = -EBUSY;
> > > +				goto unbind;
> > > +			}
> > > +			ret = (*dev->dev_ops->hairpin_bind)(dev, p);
> > > +			if (ret) {
> > > +				RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin
> > > TX "
> > > +					       "%d to RX %d", tx_port, p);
> > > +				goto unbind;
> > > +			}
> > > +		}
> > > +	} else {
> > > +		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
> > > +		rdev = &rte_eth_devices[rx_port];
> > > +		if (!rdev->data->dev_started) {
> > > +			RTE_ETHDEV_LOG(ERR,
> > > +				       "RX port %d is not started", rx_port);
> > > +			return -EBUSY;
> > > +		}
> > > +		ret = (*dev->dev_ops->hairpin_bind)(dev, rx_port);
> > > +		if (ret)
> > > +			RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin TX %d
> > "
> > > +				       "to RX %d", tx_port, rx_port);
> > > +	}
> > > +
> > > +	return ret;
> > > +
> > > +unbind:
> > > +	/* Roll back the previous binding process. */
> > > +	RTE_ETH_FOREACH_DEV(rp) {
> > > +		if (rp < p)
> > > +			(*dev->dev_ops->hairpin_unbind)(dev, rp);
> > > +		else
> > > +			break;
> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +int
> > > +rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port) {
> > > +	struct rte_eth_dev *dev;
> > > +	struct rte_eth_dev *rdev;
> > > +	uint16_t p;
> > > +	int ret = 0;
> > > +
> > > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
> > > +	dev = &rte_eth_devices[tx_port];
> > > +	if (!dev->data->dev_started) {
> > > +		RTE_ETHDEV_LOG(ERR, "TX port %d is stopped", tx_port);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	if (rx_port == RTE_MAX_ETHPORTS) {
> > > +		RTE_ETH_FOREACH_DEV(p) {
> > > +			rdev = &rte_eth_devices[p];
> > > +			if (!rdev->data->dev_started) {
> >
> > This condition should never be true.
> > First see my comment above about the list of devices, second port
> > should fail to stop if it is bounded.
> 
> Thank, I will remove this check and make the logic the same as "bind" API.
> When stopping the second port, PMD will check if any peer port is bound and
> return a failure then.
> 

Great we should also consider what happens in case of hot plug,
In this case the hot plug port should be able to unbind himself from
all ports.

> >
> > > +				RTE_ETHDEV_LOG(ERR, "RX port %d is
> > > stopped", p);
> > > +				ret = -EBUSY;
> > > +				break;
> > > +			}
> > > +			ret = (*dev->dev_ops->hairpin_unbind)(dev, p);
> > > +			if (ret) {
> > > +				RTE_ETHDEV_LOG(ERR, "Failed to unbind
> > > hairpin "
> > > +					       "TX %d from RX %d", tx_port, p);
> > > +				break;
> > > +			}
> > > +		}
> > > +	} else {
> > > +		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
> > > +		rdev = &rte_eth_devices[rx_port];
> > > +		if (!rdev->data->dev_started) {
> > > +			RTE_ETHDEV_LOG(ERR, "RX port %d is stopped",
> > > rx_port);
> > > +			return -EBUSY;
> > > +		}
> > > +		ret = (*dev->dev_ops->hairpin_unbind)(dev, rx_port);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >  void
> > >  rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
> > unsent,
> > >  		void *userdata __rte_unused)
> > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > b/lib/librte_ethdev/rte_ethdev.h index 645a186..c3fb684 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > @@ -2133,6 +2133,57 @@ int rte_eth_tx_hairpin_queue_setup
> > >  	 const struct rte_eth_hairpin_conf *conf);
> > >
> > >  /**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change, or be removed, without
> > prior
> > > notice
> > > + *
> > > + * Bind all hairpin TX queues of one port to the RX queues of the
> > peer port.
> > > + * It is only allowed to call this API after all hairpin queues
> > are
> > > +configured
> > > + * properly and the devices of TX and peer RX are in started
> > state.
> > > + *
> > > + * @param tx_port
> > > + *   The TX port identifier of the Ethernet device.
> > > + * @param rx_port
> > > + *   The peer RX port identifier of the Ethernet device.
> > > + *   RTE_MAX_ETHPORTS is allowed for the traversal of all devices.
> > > + *   RX port ID could have the same value with TX port ID.
> > > + *
> > > + * @return
> > > + *   - (0) if successful.
> > > + *   - (-EINVAL) if bad parameter.
> > > + *   - (-EBUSY) if device is not in started state.
> > > + *   - (-ENOTSUP) if hardware doesn't support.
> > > + *   - Others detailed errors from PMD drivers.
> > > + */
> > > +__rte_experimental
> > > +int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change, or be removed, without
> > prior
> > > notice
> > > + *
> > > + * Unbind all hairpin TX queues of one port from the RX queues of
> > the
> > > + peer
> > > port.
> > > + * This should be called before closing the TX or RX devices
> > > +(optional). After
> > > + * unbind the hairpin ports pair, it is allowed to bind them
> > again.
> > > + * Changing queues configuration should be after stopping the
> > device.
> > > + *
> > > + * @param tx_port
> > > + *   The TX port identifier of the Ethernet device.
> > > + * @param rx_port
> > > + *   The peer RX port identifier of the Ethernet device.
> > > + *   RTE_MAX_ETHPORTS is allowed for traversal of all devices.
> > > + *   RX port ID could have the same value with TX port ID.
> > > + *
> > > + * @return
> > > + *   - (0) if successful.
> > > + *   - (-EINVAL) if bad parameter.
> > > + *   - (-EBUSY) if device is in stopped state.
> > > + *   - (-ENOTSUP) if hardware doesn't support.
> > > + *   - Others detailed errors from PMD drivers.
> > > + */
> > > +__rte_experimental
> > > +int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port);
> > > +
> > > +/**
> > >   * Return the NUMA socket to which an Ethernet device is
> > connected
> > >   *
> > >   * @param port_id
> > > diff --git a/lib/librte_ethdev/rte_ethdev_driver.h
> > > b/lib/librte_ethdev/rte_ethdev_driver.h
> > > index 04ac8e9..910433f 100644
> > > --- a/lib/librte_ethdev/rte_ethdev_driver.h
> > > +++ b/lib/librte_ethdev/rte_ethdev_driver.h
> > > @@ -575,6 +575,54 @@ typedef int (*eth_tx_hairpin_queue_setup_t)
> > >  	 const struct rte_eth_hairpin_conf *hairpin_conf);
> > >
> > >  /**
> > > + * @internal
> > > + * Bind all hairpin TX queues of one port to the RX queues of the
> > peer port.
> > > + *
> > > + * @param dev
> > > + *   ethdev handle of port.
> > > + * @param rx_port
> > > + *   the peer RX port.
> > > + *
> > > + * @return
> > > + *   Negative errno value on error, 0 on success.
> > > + *
> > > + * @retval 0
> > > + *   Success, bind successfully.
> > > + * @retval -ENOTSUP
> > > + *   Bind API is not supported.
> > > + * @retval -EINVAL
> > > + *   One of the parameters is invalid.
> > > + * @retval -EBUSY
> > > + *   Device is not started.
> > > + */
> > > +typedef int (*eth_hairpin_bind_t)(struct rte_eth_dev *dev,
> > > +				uint16_t rx_port);
> > > +
> > > +/**
> > > + * @internal
> > > + * Unbind all hairpin TX queues of one port from the RX queues of
> > the
> > > +peer
> > > port.
> > > + *
> > > + * @param dev
> > > + *   ethdev handle of port.
> > > + * @param rx_port
> > > + *   the peer RX port.
> > > + *
> > > + * @return
> > > + *   Negative errno value on error, 0 on success.
> > > + *
> > > + * @retval 0
> > > + *   Success, bind successfully.
> > > + * @retval -ENOTSUP
> > > + *   Bind API is not supported.
> > > + * @retval -EINVAL
> > > + *   One of the parameters is invalid.
> > > + * @retval -EBUSY
> > > + *   Device is already stopped.
> > > + */
> > > +typedef int (*eth_hairpin_unbind_t)(struct rte_eth_dev *dev,
> > > +				  uint16_t rx_port);
> > > +
> > > +/**
> > >   * @internal A structure containing the functions exported by an
> > > Ethernet driver.
> > >   */
> > >  struct eth_dev_ops {
> > > @@ -713,6 +761,10 @@ struct eth_dev_ops {
> > >  	/**< Set up device RX hairpin queue. */
> > >  	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
> > >  	/**< Set up device TX hairpin queue. */
> > > +	eth_hairpin_bind_t hairpin_bind;
> > > +	/**< Bind all hairpin TX queues of device to the peer port RX
> > queues. */
> > > +	eth_hairpin_unbind_t hairpin_unbind;
> > > +	/**< Unbind all hairpin TX queues from the peer port RX queues.
> > */
> > >  };
> > >
> > >  /**
> > > diff --git a/lib/librte_ethdev/rte_ethdev_version.map
> > > b/lib/librte_ethdev/rte_ethdev_version.map
> > > index c95ef51..18efe4e 100644
> > > --- a/lib/librte_ethdev/rte_ethdev_version.map
> > > +++ b/lib/librte_ethdev/rte_ethdev_version.map
> > > @@ -227,6 +227,8 @@ EXPERIMENTAL {
> > >  	rte_tm_wred_profile_delete;
> > >
> > >  	# added in 20.11
> > > +	rte_eth_hairpin_bind;
> > > +	rte_eth_hairpin_unbind;
> > >  	rte_eth_link_speed_to_str;
> > >  	rte_eth_link_to_str;
> > >  };
> > > --
> > > 2.5.5
> 
> Thanks
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index dfe5c1b..72f567b 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -2175,6 +2175,113 @@  rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 	return eth_err(port_id, ret);
 }
 
+int
+rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev *rdev;
+	uint16_t p;
+	uint16_t rp;
+	int ret = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
+	dev = &rte_eth_devices[tx_port];
+	if (!dev->data->dev_started) {
+		RTE_ETHDEV_LOG(ERR, "TX port %d is not started", tx_port);
+		return -EBUSY;
+	}
+
+	/*
+	 * If the all the ports probed belong to two or more separate NICs, it
+	 * is recommended that each pair is bound independently but not in the
+	 * loop to bind all ports.
+	 */
+	if (rx_port == RTE_MAX_ETHPORTS) {
+		RTE_ETH_FOREACH_DEV(p) {
+			rdev = &rte_eth_devices[p];
+			if (!rdev->data->dev_started) {
+				RTE_ETHDEV_LOG(ERR,
+					       "RX port %d is not started", p);
+				ret = -EBUSY;
+				goto unbind;
+			}
+			ret = (*dev->dev_ops->hairpin_bind)(dev, p);
+			if (ret) {
+				RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin TX "
+					       "%d to RX %d", tx_port, p);
+				goto unbind;
+			}
+		}
+	} else {
+		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
+		rdev = &rte_eth_devices[rx_port];
+		if (!rdev->data->dev_started) {
+			RTE_ETHDEV_LOG(ERR,
+				       "RX port %d is not started", rx_port);
+			return -EBUSY;
+		}
+		ret = (*dev->dev_ops->hairpin_bind)(dev, rx_port);
+		if (ret)
+			RTE_ETHDEV_LOG(ERR, "Failed to bind hairpin TX %d "
+				       "to RX %d", tx_port, rx_port);
+	}
+
+	return ret;
+
+unbind:
+	/* Roll back the previous binding process. */
+	RTE_ETH_FOREACH_DEV(rp) {
+		if (rp < p)
+			(*dev->dev_ops->hairpin_unbind)(dev, rp);
+		else
+			break;
+	}
+	return ret;
+}
+
+int
+rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev *rdev;
+	uint16_t p;
+	int ret = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(tx_port, -EINVAL);
+	dev = &rte_eth_devices[tx_port];
+	if (!dev->data->dev_started) {
+		RTE_ETHDEV_LOG(ERR, "TX port %d is stopped", tx_port);
+		return -EBUSY;
+	}
+
+	if (rx_port == RTE_MAX_ETHPORTS) {
+		RTE_ETH_FOREACH_DEV(p) {
+			rdev = &rte_eth_devices[p];
+			if (!rdev->data->dev_started) {
+				RTE_ETHDEV_LOG(ERR, "RX port %d is stopped", p);
+				ret = -EBUSY;
+				break;
+			}
+			ret = (*dev->dev_ops->hairpin_unbind)(dev, p);
+			if (ret) {
+				RTE_ETHDEV_LOG(ERR, "Failed to unbind hairpin "
+					       "TX %d from RX %d", tx_port, p);
+				break;
+			}
+		}
+	} else {
+		RTE_ETH_VALID_PORTID_OR_ERR_RET(rx_port, -EINVAL);
+		rdev = &rte_eth_devices[rx_port];
+		if (!rdev->data->dev_started) {
+			RTE_ETHDEV_LOG(ERR, "RX port %d is stopped", rx_port);
+			return -EBUSY;
+		}
+		ret = (*dev->dev_ops->hairpin_unbind)(dev, rx_port);
+	}
+
+	return ret;
+}
+
 void
 rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata __rte_unused)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 645a186..c3fb684 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -2133,6 +2133,57 @@  int rte_eth_tx_hairpin_queue_setup
 	 const struct rte_eth_hairpin_conf *conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Bind all hairpin TX queues of one port to the RX queues of the peer port.
+ * It is only allowed to call this API after all hairpin queues are configured
+ * properly and the devices of TX and peer RX are in started state.
+ *
+ * @param tx_port
+ *   The TX port identifier of the Ethernet device.
+ * @param rx_port
+ *   The peer RX port identifier of the Ethernet device.
+ *   RTE_MAX_ETHPORTS is allowed for the traversal of all devices.
+ *   RX port ID could have the same value with TX port ID.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-EBUSY) if device is not in started state.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - Others detailed errors from PMD drivers.
+ */
+__rte_experimental
+int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Unbind all hairpin TX queues of one port from the RX queues of the peer port.
+ * This should be called before closing the TX or RX devices (optional). After
+ * unbind the hairpin ports pair, it is allowed to bind them again.
+ * Changing queues configuration should be after stopping the device.
+ *
+ * @param tx_port
+ *   The TX port identifier of the Ethernet device.
+ * @param rx_port
+ *   The peer RX port identifier of the Ethernet device.
+ *   RTE_MAX_ETHPORTS is allowed for traversal of all devices.
+ *   RX port ID could have the same value with TX port ID.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-EBUSY) if device is in stopped state.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - Others detailed errors from PMD drivers.
+ */
+__rte_experimental
+int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 04ac8e9..910433f 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -575,6 +575,54 @@  typedef int (*eth_tx_hairpin_queue_setup_t)
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
 
 /**
+ * @internal
+ * Bind all hairpin TX queues of one port to the RX queues of the peer port.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param rx_port
+ *   the peer RX port.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, bind successfully.
+ * @retval -ENOTSUP
+ *   Bind API is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -EBUSY
+ *   Device is not started.
+ */
+typedef int (*eth_hairpin_bind_t)(struct rte_eth_dev *dev,
+				uint16_t rx_port);
+
+/**
+ * @internal
+ * Unbind all hairpin TX queues of one port from the RX queues of the peer port.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param rx_port
+ *   the peer RX port.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, bind successfully.
+ * @retval -ENOTSUP
+ *   Bind API is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -EBUSY
+ *   Device is already stopped.
+ */
+typedef int (*eth_hairpin_unbind_t)(struct rte_eth_dev *dev,
+				  uint16_t rx_port);
+
+/**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
 struct eth_dev_ops {
@@ -713,6 +761,10 @@  struct eth_dev_ops {
 	/**< Set up device RX hairpin queue. */
 	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
 	/**< Set up device TX hairpin queue. */
+	eth_hairpin_bind_t hairpin_bind;
+	/**< Bind all hairpin TX queues of device to the peer port RX queues. */
+	eth_hairpin_unbind_t hairpin_unbind;
+	/**< Unbind all hairpin TX queues from the peer port RX queues. */
 };
 
 /**
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index c95ef51..18efe4e 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -227,6 +227,8 @@  EXPERIMENTAL {
 	rte_tm_wred_profile_delete;
 
 	# added in 20.11
+	rte_eth_hairpin_bind;
+	rte_eth_hairpin_unbind;
 	rte_eth_link_speed_to_str;
 	rte_eth_link_to_str;
 };