[v6,3/4] net/ixgbe: cleanup Tx buffers

Message ID 20191230093840.17701-4-chenxux.di@intel.com (mailing list archive)
State Superseded, archived
Delegated to: xiaolong ye
Headers
Series drivers/net: cleanup Tx buffers |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Chenxu Di Dec. 30, 2019, 9:38 a.m. UTC
  Add support to the ixgbe driver for the API rte_eth_tx_done_cleanup,
to force the freeing of consumed buffers on the Tx ring.

Signed-off-by: Chenxu Di <chenxux.di@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |   2 +
 drivers/net/ixgbe/ixgbe_rxtx.c   | 116 +++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
 3 files changed, 120 insertions(+)
  

Comments

Ananyev, Konstantin Dec. 30, 2019, 12:53 p.m. UTC | #1
Hi,

> Add support to the ixgbe driver for the API rte_eth_tx_done_cleanup
> to force free consumed buffers on Tx ring.
> 
> Signed-off-by: Chenxu Di <chenxux.di@intel.com>
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c |   2 +
>  drivers/net/ixgbe/ixgbe_rxtx.c   | 116 +++++++++++++++++++++++++++++++
>  drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
>  3 files changed, 120 insertions(+)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 2c6fd0f13..0091405db 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -601,6 +601,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
>  	.udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
>  	.udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
>  	.tm_ops_get           = ixgbe_tm_ops_get,
> +	.tx_done_cleanup      = ixgbe_tx_done_cleanup,

I don't see how we can have one tx_done_cleanup() for the different TX functions.
The vector and scalar TX paths use different formats for sw_ring[] entries.
Also, the offload and simple TX paths use different methods to track used/free descriptors,
and use different functions to free them:
the offload path uses the tx_entry next_id and last_id fields plus txq->last_desc_cleaned, while the
simple TX path uses tx_next_dd.
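To illustrate the split (a hypothetical sketch, not ixgbe code — every name below is invented): since the simple and offload paths keep different bookkeeping, a single .tx_done_cleanup op would have to dispatch on whichever TX path was selected at queue setup, analogous to how the driver selects tx_pkt_burst.

```c
#include <stddef.h>

/* Hypothetical per-queue state; the real driver keeps this in ixgbe_tx_queue. */
enum tx_path { TX_PATH_SIMPLE, TX_PATH_OFFLOAD };

struct demo_txq {
	enum tx_path path; /* chosen at queue setup time */
	int (*done_cleanup)(struct demo_txq *txq, unsigned int free_cnt);
};

/* Simple path: would free in bursts using tx_next_dd (placeholder body). */
static int cleanup_simple(struct demo_txq *txq, unsigned int free_cnt)
{
	(void)txq; (void)free_cnt;
	return 1; /* stands in for an ixgbe_tx_free_bufs()-style helper */
}

/* Offload path: would walk next_id/last_id chains per packet (placeholder body). */
static int cleanup_offload(struct demo_txq *txq, unsigned int free_cnt)
{
	(void)txq; (void)free_cnt;
	return 2; /* stands in for an ixgbe_xmit_cleanup()-style helper */
}

/* Bind the cleanup callback at the same point the TX function is selected. */
void demo_set_tx_function(struct demo_txq *txq, enum tx_path path)
{
	txq->path = path;
	txq->done_cleanup = (path == TX_PATH_SIMPLE) ?
			    cleanup_simple : cleanup_offload;
}

/* The single dev_ops entry point just forwards to the bound callback. */
int demo_tx_done_cleanup(void *q, unsigned int free_cnt)
{
	struct demo_txq *txq = q;

	if (txq == NULL)
		return -1;
	return txq->done_cleanup(txq, free_cnt);
}
```

The helper bodies are placeholders only; the point is the per-path dispatch shape.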


>  };
> 
>  /*
> @@ -649,6 +650,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
>  	.reta_query           = ixgbe_dev_rss_reta_query,
>  	.rss_hash_update      = ixgbe_dev_rss_hash_update,
>  	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
> +	.tx_done_cleanup      = ixgbe_tx_done_cleanup,
>  };
> 
>  /* store statistics names and its offset in stats structure */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index fa572d184..520b9c756 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -2306,6 +2306,122 @@ ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue *txq)
>  	}
>  }
> 
> +int ixgbe_tx_done_cleanup(void *q, uint32_t free_cnt)

That seems to work only for offload(full) TX path (ixgbe_xmit_pkts).
Simple(fast) path seems not covered by this function.

> +{
> +	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)q;
> +	struct ixgbe_tx_entry *sw_ring;
> +	volatile union ixgbe_adv_tx_desc *txr;
> +	uint16_t tx_first; /* First segment analyzed. */
> +	uint16_t tx_id;    /* Current segment being processed. */
> +	uint16_t tx_last;  /* Last segment in the current packet. */
> +	uint16_t tx_next;  /* First segment of the next packet. */
> +	int count;
> +
> +	if (txq == NULL)
> +		return -ENODEV;
> +
> +	count = 0;
> +	sw_ring = txq->sw_ring;
> +	txr = txq->tx_ring;
> +
> +	/*
> +	 * tx_tail is the last sent packet on the sw_ring. Goto the end
> +	 * of that packet (the last segment in the packet chain) and
> +	 * then the next segment will be the start of the oldest segment
> +	 * in the sw_ring. 

Not sure I understand the sentence above.
tx_tail is the value of the TDT HW register (most recently armed by the SW TD).
last_id is the index of the last descriptor of a multi-seg packet.
next_id is just the index of the next descriptor in the HW TD ring.
How do you conclude that it will be the 'oldest segment in the sw_ring'?

Another question: why do you need to write your own functions?
Why can't you reuse the existing ixgbe_xmit_cleanup() for the full (offload) path
and ixgbe_tx_free_bufs() for the simple path?
Yes, ixgbe_xmit_cleanup() doesn't free mbufs, but at least it could be used
to determine the finished TX descriptors.
Based on that you can free the appropriate sw_ring[] entries.
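The reuse suggested here can be sketched with a standalone toy model (nothing below is driver code; all names are invented): a cleanup pass advances last_desc_cleaned over descriptors HW has finished without freeing anything, and tx_done_cleanup() then frees exactly the sw_ring entries that pass covered.

```c
#include <stdbool.h>

#define RING_SZ 8

/* Toy TX ring: hw_done[i] models the DD write-back for descriptor i,
 * has_mbuf[i] models sw_ring[i].mbuf != NULL. */
struct toy_txq {
	bool hw_done[RING_SZ];
	bool has_mbuf[RING_SZ];
	int last_desc_cleaned; /* as in the offload path */
	int nb_tx_free;
};

/* Stand-in for ixgbe_xmit_cleanup(): advance last_desc_cleaned over
 * finished descriptors and clear their status, freeing no mbufs. */
static int toy_xmit_cleanup(struct toy_txq *txq)
{
	int next = (txq->last_desc_cleaned + 1) % RING_SZ;
	int cleaned = 0;

	while (txq->hw_done[next]) {
		txq->hw_done[next] = false; /* clear wb.status, as the driver does */
		txq->last_desc_cleaned = next;
		txq->nb_tx_free++;
		cleaned++;
		next = (next + 1) % RING_SZ;
	}
	return cleaned;
}

/* tx_done_cleanup built on top of it: free only the entries the
 * cleanup pass just declared finished. */
int toy_tx_done_cleanup(struct toy_txq *txq, int free_cnt)
{
	int first = (txq->last_desc_cleaned + 1) % RING_SZ;
	int n = toy_xmit_cleanup(txq);
	int freed = 0, i;

	for (i = 0; i < n && (free_cnt == 0 || freed < free_cnt); i++) {
		int idx = (first + i) % RING_SZ;

		if (txq->has_mbuf[idx]) {
			txq->has_mbuf[idx] = false; /* rte_pktmbuf_free_seg() here */
			freed++;
		}
	}
	return freed;
}
```

This keeps the descriptor-completion logic in one place and leaves tx_done_cleanup() with only the freeing and counting.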

> +	 * This is the first packet that will be
> +	 * attempted to be freed.
> +	 */
> +
> +	/* Get last segment in most recently added packet. */
> +	tx_last = sw_ring[txq->tx_tail].last_id;
> +
> +	/* Get the next segment, which is the oldest segment in ring. */
> +	tx_first = sw_ring[tx_last].next_id;
> +
> +	/* Set the current index to the first. */
> +	tx_id = tx_first;
> +
> +	/*
> +	 * Loop through each packet. For each packet, verify that an
> +	 * mbuf exists and that the last segment is free. If so, free
> +	 * it and move on.
> +	 */
> +	while (1) {
> +		tx_last = sw_ring[tx_id].last_id;
> +
> +		if (sw_ring[tx_last].mbuf) {
> +			if (!(txr[tx_last].wb.status &
> +				IXGBE_TXD_STAT_DD))
> +				break;
> +
> +			/* Get the start of the next packet. */
> +			tx_next = sw_ring[tx_last].next_id;
> +
> +			/*
> +			 * Loop through all segments in a
> +			 * packet.
> +			 */
> +			do {
> +				rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> +				sw_ring[tx_id].mbuf = NULL;
> +				sw_ring[tx_id].last_id = tx_id;
> +
> +				/* Move to next segment. */
> +				tx_id = sw_ring[tx_id].next_id;
> +
> +			} while (tx_id != tx_next);
> +
> +			/*
> +			 * Increment the number of packets
> +			 * freed.
> +			 */
> +			count++;
> +
> +			if (unlikely(count == (int)free_cnt))
> +				break;
> +		} else {
> +			/*
> +			 * There are multiple reasons to be here:
> +			 * 1) All the packets on the ring have been
> +			 *    freed - tx_id is equal to tx_first
> +			 *    and some packets have been freed.
> +			 *    - Done, exit
> +			 * 2) Interfaces has not sent a rings worth of
> +			 *    packets yet, so the segment after tail is
> +			 *    still empty. Or a previous call to this
> +			 *    function freed some of the segments but
> +			 *    not all so there is a hole in the list.
> +			 *    Hopefully this is a rare case.
> +			 *    - Walk the list and find the next mbuf. If
> +			 *      there isn't one, then done.
> +			 */
> +			if (likely(tx_id == tx_first && count != 0))
> +				break;
> +
> +			/*
> +			 * Walk the list and find the next mbuf, if any.
> +			 */
> +			do {
> +				/* Move to next segment. */
> +				tx_id = sw_ring[tx_id].next_id;
> +
> +				if (sw_ring[tx_id].mbuf)
> +					break;
> +
> +			} while (tx_id != tx_first);
> +
> +			/*
> +			 * Determine why previous loop bailed. If there
> +			 * is not an mbuf, done.
> +			 */
> +			if (sw_ring[tx_id].mbuf == NULL)
> +				break;
> +		}
> +	}
> +
> +	return count;
> +}
> +
>  static void __attribute__((cold))
>  ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)
>  {
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
> index 505d344b9..2c3770af6 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> @@ -285,6 +285,8 @@ int ixgbe_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
>  int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq);
>  void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);
> 
> +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> +
>  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
>  extern const uint32_t ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> 
> --
> 2.17.1
  
Chenxu Di Jan. 3, 2020, 9:01 a.m. UTC | #2
Hi,


> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, December 30, 2019 8:54 PM
> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>; Di, ChenxuX
> <chenxux.di@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
> 
> Hi,
> 
> > Add support to the ixgbe driver for the API rte_eth_tx_done_cleanup to
> > force free consumed buffers on Tx ring.
> >
> > Signed-off-by: Chenxu Di <chenxux.di@intel.com>
> > ---
> >  drivers/net/ixgbe/ixgbe_ethdev.c |   2 +
> >  drivers/net/ixgbe/ixgbe_rxtx.c   | 116 +++++++++++++++++++++++++++++++
> >  drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
> >  3 files changed, 120 insertions(+)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index 2c6fd0f13..0091405db 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -601,6 +601,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops
> > = {  .udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
> > .udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
> >  .tm_ops_get           = ixgbe_tm_ops_get,
> > +.tx_done_cleanup      = ixgbe_tx_done_cleanup,
> 
> Don't see how we can have one tx_done_cleanup() for different tx functions?
> Vector and scalar TX path use different  format for sw_ring[] entries.
> Also offload and simile TX paths use different method to track used/free
> descriptors, and use different functions to free them:
> offload uses tx_entry next_id, last_id plus txq. last_desc_cleaned, while simple
> TX paths use tx_next_dd.
> 

This patch will not include a function for the vector path, and I will update my code to
make it work for both the offload and simple paths.
> 
> >  };
> >
> >  /*
> > @@ -649,6 +650,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops
> = {
> >  .reta_query           = ixgbe_dev_rss_reta_query,
> >  .rss_hash_update      = ixgbe_dev_rss_hash_update,
> >  .rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
> > +.tx_done_cleanup      = ixgbe_tx_done_cleanup,
> >  };
> >
> >  /* store statistics names and its offset in stats structure */ diff
> > --git a/drivers/net/ixgbe/ixgbe_rxtx.c
> > b/drivers/net/ixgbe/ixgbe_rxtx.c index fa572d184..520b9c756 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > @@ -2306,6 +2306,122 @@ ixgbe_tx_queue_release_mbufs(struct
> > ixgbe_tx_queue *txq)  }  }
> >
> > +int ixgbe_tx_done_cleanup(void *q, uint32_t free_cnt)
> 
> That seems to work only for offload(full) TX path (ixgbe_xmit_pkts).
> Simple(fast) path seems not covered by this function.
> 

Same as above

> > +{
> > +struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)q; struct
> > +ixgbe_tx_entry *sw_ring; volatile union ixgbe_adv_tx_desc *txr;
> > +uint16_t tx_first; /* First segment analyzed. */
> > +uint16_t tx_id;    /* Current segment being processed. */
> > +uint16_t tx_last;  /* Last segment in the current packet. */ uint16_t
> > +tx_next;  /* First segment of the next packet. */ int count;
> > +
> > +if (txq == NULL)
> > +return -ENODEV;
> > +
> > +count = 0;
> > +sw_ring = txq->sw_ring;
> > +txr = txq->tx_ring;
> > +
> > +/*
> > + * tx_tail is the last sent packet on the sw_ring. Goto the end
> > + * of that packet (the last segment in the packet chain) and
> > + * then the next segment will be the start of the oldest segment
> > + * in the sw_ring.
> 
> Not sure I understand the sentence above.
> tx_tail is the value of TDT HW register (most recently armed by SW TD).
> last_id  is the index of last descriptor for multi-seg packet.
> next_id is just the index of next descriptor in HW TD ring.
> How do you conclude that it will be the ' oldest segment in the sw_ring'?
> 

tx_tail is the last sent packet on the sw_ring, and xmit_cleanup or
tx_free_bufs will be called when nb_tx_free < tx_free_thresh.
So sw_ring[tx_tail].next_id must be the beginning of the mbufs that are unused or
already freed; the loop then walks forward until it reaches an in-use mbuf and starts freeing from there.



> Another question why do you need to write your own functions?
> Why can't you reuse existing ixgbe_xmit_cleanup() for full(offload) path and
> ixgbe_tx_free_bufs() for simple path?
> Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it could be used to
> determine finished TX descriptors.
> Based on that you can you can free appropriate sw_ring[] entries.
> 

The reason I don't reuse the existing functions is that they all free a fixed batch of mbufs,
while the free_cnt parameter of the rte_eth_tx_done_cleanup() API is a number of packets.
It is also necessary to check which mbufs belong to the same packet.


> >This is the first packet that will be
> > + * attempted to be freed.
> > + */
> > +
> > +/* Get last segment in most recently added packet. */ tx_last =
> > +sw_ring[txq->tx_tail].last_id;
> > +
> > +/* Get the next segment, which is the oldest segment in ring. */
> > +tx_first = sw_ring[tx_last].next_id;
> > +
> > +/* Set the current index to the first. */ tx_id = tx_first;
> > +
> > +/*
> > + * Loop through each packet. For each packet, verify that an
> > + * mbuf exists and that the last segment is free. If so, free
> > + * it and move on.
> > + */
> > +while (1) {
> > +tx_last = sw_ring[tx_id].last_id;
> > +
> > +if (sw_ring[tx_last].mbuf) {
> > +if (!(txr[tx_last].wb.status &
> > +IXGBE_TXD_STAT_DD))
> > +break;
> > +
> > +/* Get the start of the next packet. */ tx_next =
> > +sw_ring[tx_last].next_id;
> > +
> > +/*
> > + * Loop through all segments in a
> > + * packet.
> > + */
> > +do {
> > +rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > +sw_ring[tx_id].mbuf = NULL;
> > +sw_ring[tx_id].last_id = tx_id;
> > +
> > +/* Move to next segment. */
> > +tx_id = sw_ring[tx_id].next_id;
> > +
> > +} while (tx_id != tx_next);
> > +
> > +/*
> > + * Increment the number of packets
> > + * freed.
> > + */
> > +count++;
> > +
> > +if (unlikely(count == (int)free_cnt)) break; } else {
> > +/*
> > + * There are multiple reasons to be here:
> > + * 1) All the packets on the ring have been
> > + *    freed - tx_id is equal to tx_first
> > + *    and some packets have been freed.
> > + *    - Done, exit
> > + * 2) Interfaces has not sent a rings worth of
> > + *    packets yet, so the segment after tail is
> > + *    still empty. Or a previous call to this
> > + *    function freed some of the segments but
> > + *    not all so there is a hole in the list.
> > + *    Hopefully this is a rare case.
> > + *    - Walk the list and find the next mbuf. If
> > + *      there isn't one, then done.
> > + */
> > +if (likely(tx_id == tx_first && count != 0)) break;
> > +
> > +/*
> > + * Walk the list and find the next mbuf, if any.
> > + */
> > +do {
> > +/* Move to next segment. */
> > +tx_id = sw_ring[tx_id].next_id;
> > +
> > +if (sw_ring[tx_id].mbuf)
> > +break;
> > +
> > +} while (tx_id != tx_first);
> > +
> > +/*
> > + * Determine why previous loop bailed. If there
> > + * is not an mbuf, done.
> > + */
> > +if (sw_ring[tx_id].mbuf == NULL)
> > +break;
> > +}
> > +}
> > +
> > +return count;
> > +}
> > +
> >  static void __attribute__((cold))
> >  ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff --git
> > a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
> > index 505d344b9..2c3770af6 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > @@ -285,6 +285,8 @@ int ixgbe_rx_vec_dev_conf_condition_check(struct
> > rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue
> > *rxq);  void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue
> > *rxq);
> >
> > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > +
> >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> >  extern const uint32_t ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> >
> > --
> > 2.17.1
>
  
Ananyev, Konstantin Jan. 5, 2020, 11:36 p.m. UTC | #3
> > > Add support to the ixgbe driver for the API rte_eth_tx_done_cleanup to
> > > force free consumed buffers on Tx ring.
> > >
> > > Signed-off-by: Chenxu Di <chenxux.di@intel.com>
> > > ---
> > >  drivers/net/ixgbe/ixgbe_ethdev.c |   2 +
> > >  drivers/net/ixgbe/ixgbe_rxtx.c   | 116 +++++++++++++++++++++++++++++++
> > >  drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
> > >  3 files changed, 120 insertions(+)
> > >
> > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > index 2c6fd0f13..0091405db 100644
> > > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > @@ -601,6 +601,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops
> > > = {  .udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
> > > .udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
> > >  .tm_ops_get           = ixgbe_tm_ops_get,
> > > +.tx_done_cleanup      = ixgbe_tx_done_cleanup,
> >
> > Don't see how we can have one tx_done_cleanup() for different tx functions?
> > Vector and scalar TX path use different  format for sw_ring[] entries.
> > Also offload and simile TX paths use different method to track used/free
> > descriptors, and use different functions to free them:
> > offload uses tx_entry next_id, last_id plus txq. last_desc_cleaned, while simple
> > TX paths use tx_next_dd.
> >
> 
> This patches will be not include function for Vector, and I will update my code to
> Make it work for offload and simple .
> >
> > >  };
> > >
> > >  /*
> > > @@ -649,6 +650,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops
> > = {
> > >  .reta_query           = ixgbe_dev_rss_reta_query,
> > >  .rss_hash_update      = ixgbe_dev_rss_hash_update,
> > >  .rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
> > > +.tx_done_cleanup      = ixgbe_tx_done_cleanup,
> > >  };
> > >
> > >  /* store statistics names and its offset in stats structure */ diff
> > > --git a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > b/drivers/net/ixgbe/ixgbe_rxtx.c index fa572d184..520b9c756 100644
> > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > > @@ -2306,6 +2306,122 @@ ixgbe_tx_queue_release_mbufs(struct
> > > ixgbe_tx_queue *txq)  }  }
> > >
> > > +int ixgbe_tx_done_cleanup(void *q, uint32_t free_cnt)
> >
> > That seems to work only for offload(full) TX path (ixgbe_xmit_pkts).
> > Simple(fast) path seems not covered by this function.
> >
> 
> Same as above
> 
> > > +{
> > > +struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)q; struct
> > > +ixgbe_tx_entry *sw_ring; volatile union ixgbe_adv_tx_desc *txr;
> > > +uint16_t tx_first; /* First segment analyzed. */
> > > +uint16_t tx_id;    /* Current segment being processed. */
> > > +uint16_t tx_last;  /* Last segment in the current packet. */ uint16_t
> > > +tx_next;  /* First segment of the next packet. */ int count;
> > > +
> > > +if (txq == NULL)
> > > +return -ENODEV;
> > > +
> > > +count = 0;
> > > +sw_ring = txq->sw_ring;
> > > +txr = txq->tx_ring;
> > > +
> > > +/*
> > > + * tx_tail is the last sent packet on the sw_ring. Goto the end
> > > + * of that packet (the last segment in the packet chain) and
> > > + * then the next segment will be the start of the oldest segment
> > > + * in the sw_ring.
> >
> > Not sure I understand the sentence above.
> > tx_tail is the value of TDT HW register (most recently armed by SW TD).
> > last_id  is the index of last descriptor for multi-seg packet.
> > next_id is just the index of next descriptor in HW TD ring.
> > How do you conclude that it will be the ' oldest segment in the sw_ring'?
> >
> 
> The tx_tail is the last sent packet on the sw_ring. While the xmit_cleanup or
> Tx_free_bufs will be call when the nb_tx_free < tx_free_thresh .
> So the sw_ring[tx_tail].next_id must be the begin of mbufs which are not used or
>  Already freed . then begin the loop until the mbuf is used and begin to free them.
> 
> 
> 
> > Another question why do you need to write your own functions?
> > Why can't you reuse existing ixgbe_xmit_cleanup() for full(offload) path and
> > ixgbe_tx_free_bufs() for simple path?
> > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it could be used to
> > determine finished TX descriptors.
> > Based on that you can you can free appropriate sw_ring[] entries.
> >
> 
> The reason why I don't reuse existing function is that they all free several mbufs
> While the free_cnt of the API rte_eth_tx_done_cleanup() is the number of packets.
> It also need to be done that check which mbuffs are from the same packet.

First, I don't see anything wrong with tx_done_cleanup() freeing only some segments of
a packet. As long as it is safe, there is no problem with that.
I think rte_eth_tx_done_cleanup() operates on mbuf, not packet, quantities.
But in our case I think it doesn't matter, as ixgbe_xmit_cleanup()
marks TXDs as free only when HW is done with all the TXDs for that packet.
As long as there is a way to reuse existing code and avoid duplication
(without introducing any degradation), we should use it.
And I think there is a very good opportunity here to reuse the existing
ixgbe_xmit_cleanup() for the tx_done_cleanup() implementation.
Moreover, because your code doesn't follow the ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
logic and infrastructure, it introduces unnecessary scans over the TXD ring,
and in some cases doesn't work as expected:

+	while (1) {
+		tx_last = sw_ring[tx_id].last_id;
+
+		if (sw_ring[tx_last].mbuf) {
+			if (txr[tx_last].wb.status &
+					IXGBE_TXD_STAT_DD) {
...
+			} else {
+				/*
+				 * mbuf still in use, nothing left to
+				 * free.
+				 */
+				break;

It is not correct to expect that IXGBE_TXD_STAT_DD will be set on the last TXD of *every* packet.
We set the IXGBE_TXD_CMD_RS bit only on the last descriptor of each threshold packet.
Plus, ixgbe_xmit_cleanup() can clear the TXD wb.status.

So I strongly recommend reusing ixgbe_xmit_cleanup() here.
It would be much less error prone and would help to avoid code duplication.
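The failure mode described here can be seen in a self-contained toy model (hypothetical code, not the driver; one descriptor per packet for simplicity): DD is written back only on descriptors where the RS bit was set, so a scan that checks DD on every packet's last descriptor stalls immediately, while a threshold-aware scan correctly infers completion from the last reported descriptor.

```c
#include <stdbool.h>

#define NPKT 8
#define RS_THRESH 4

/* HW sets DD only on descriptors where SW requested a report (RS bit). */
struct desc { bool rs; bool dd; };

/* SW arms RS on every RS_THRESH-th packet; HW has finished `hw_done` packets. */
void fill_ring(struct desc *ring, int hw_done)
{
	for (int i = 0; i < NPKT; i++) {
		ring[i].rs = ((i + 1) % RS_THRESH) == 0;
		ring[i].dd = ring[i].rs && i < hw_done;
	}
}

/* Naive scan (the patch's approach): stop at the first packet whose own
 * descriptor lacks DD -- undercounts, since most descriptors never get DD. */
int naive_count_done(const struct desc *ring)
{
	int n = 0;

	while (n < NPKT && ring[n].dd)
		n++;
	return n;
}

/* Threshold-aware scan: a DD on an RS descriptor proves everything up to
 * and including it is finished. */
int rs_aware_count_done(const struct desc *ring)
{
	int done = 0;

	for (int i = 0; i < NPKT; i++)
		if (ring[i].rs && ring[i].dd)
			done = i + 1;
	return done;
}
```

With 6 of 8 packets finished by HW, the naive scan reports 0 done while the RS-aware scan reports 4 (through the last reported descriptor), which is why building on the existing cleanup logic matters.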

Konstantin 

> 
> 
> > >This is the first packet that will be
> > > + * attempted to be freed.
> > > + */
> > > +
> > > +/* Get last segment in most recently added packet. */ tx_last =
> > > +sw_ring[txq->tx_tail].last_id;
> > > +
> > > +/* Get the next segment, which is the oldest segment in ring. */
> > > +tx_first = sw_ring[tx_last].next_id;
> > > +
> > > +/* Set the current index to the first. */ tx_id = tx_first;
> > > +
> > > +/*
> > > + * Loop through each packet. For each packet, verify that an
> > > + * mbuf exists and that the last segment is free. If so, free
> > > + * it and move on.
> > > + */
> > > +while (1) {
> > > +tx_last = sw_ring[tx_id].last_id;
> > > +
> > > +if (sw_ring[tx_last].mbuf) {
> > > +if (!(txr[tx_last].wb.status &
> > > +IXGBE_TXD_STAT_DD))
> > > +break;
> > > +
> > > +/* Get the start of the next packet. */ tx_next =
> > > +sw_ring[tx_last].next_id;
> > > +
> > > +/*
> > > + * Loop through all segments in a
> > > + * packet.
> > > + */
> > > +do {
> > > +rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > > +sw_ring[tx_id].mbuf = NULL;
> > > +sw_ring[tx_id].last_id = tx_id;
> > > +
> > > +/* Move to next segment. */
> > > +tx_id = sw_ring[tx_id].next_id;
> > > +
> > > +} while (tx_id != tx_next);
> > > +
> > > +/*
> > > + * Increment the number of packets
> > > + * freed.
> > > + */
> > > +count++;
> > > +
> > > +if (unlikely(count == (int)free_cnt)) break; } else {
> > > +/*
> > > + * There are multiple reasons to be here:
> > > + * 1) All the packets on the ring have been
> > > + *    freed - tx_id is equal to tx_first
> > > + *    and some packets have been freed.
> > > + *    - Done, exit
> > > + * 2) Interfaces has not sent a rings worth of
> > > + *    packets yet, so the segment after tail is
> > > + *    still empty. Or a previous call to this
> > > + *    function freed some of the segments but
> > > + *    not all so there is a hole in the list.
> > > + *    Hopefully this is a rare case.
> > > + *    - Walk the list and find the next mbuf. If
> > > + *      there isn't one, then done.
> > > + */
> > > +if (likely(tx_id == tx_first && count != 0)) break;
> > > +
> > > +/*
> > > + * Walk the list and find the next mbuf, if any.
> > > + */
> > > +do {
> > > +/* Move to next segment. */
> > > +tx_id = sw_ring[tx_id].next_id;
> > > +
> > > +if (sw_ring[tx_id].mbuf)
> > > +break;
> > > +
> > > +} while (tx_id != tx_first);
> > > +
> > > +/*
> > > + * Determine why previous loop bailed. If there
> > > + * is not an mbuf, done.
> > > + */
> > > +if (sw_ring[tx_id].mbuf == NULL)
> > > +break;
> > > +}
> > > +}
> > > +
> > > +return count;
> > > +}
> > > +
> > >  static void __attribute__((cold))
> > >  ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff --git
> > > a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > index 505d344b9..2c3770af6 100644
> > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > @@ -285,6 +285,8 @@ int ixgbe_rx_vec_dev_conf_condition_check(struct
> > > rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue
> > > *rxq);  void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue
> > > *rxq);
> > >
> > > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > > +
> > >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> > >  extern const uint32_t ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> > >
> > > --
> > > 2.17.1
> >
>
  
Chenxu Di Jan. 6, 2020, 9:03 a.m. UTC | #4
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, January 6, 2020 7:36 AM
> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
> 
> 
> > > > Add support to the ixgbe driver for the API
> > > > rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring.

[snip]

> > > > + * tx_tail is the last sent packet on the sw_ring. Goto the end
> > > > + * of that packet (the last segment in the packet chain) and
> > > > + * then the next segment will be the start of the oldest segment
> > > > + * in the sw_ring.
> > >
> > > Not sure I understand the sentence above.
> > > tx_tail is the value of TDT HW register (most recently armed by SW TD).
> > > last_id  is the index of last descriptor for multi-seg packet.
> > > next_id is just the index of next descriptor in HW TD ring.
> > > How do you conclude that it will be the ' oldest segment in the sw_ring'?
> > >
> >
> > The tx_tail is the last sent packet on the sw_ring. While the
> > xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free <
> tx_free_thresh .
> > So the sw_ring[tx_tail].next_id must be the begin of mbufs which are
> > not used or  Already freed . then begin the loop until the mbuf is used and
> begin to free them.
> >
> >
> >
> > > Another question why do you need to write your own functions?
> > > Why can't you reuse existing ixgbe_xmit_cleanup() for full(offload)
> > > path and
> > > ixgbe_tx_free_bufs() for simple path?
> > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it could
> > > be used to determine finished TX descriptors.
> > > Based on that you can you can free appropriate sw_ring[] entries.
> > >
> >
> > The reason why I don't reuse existing function is that they all free
> > several mbufs While the free_cnt of the API rte_eth_tx_done_cleanup() is the
> number of packets.
> > It also need to be done that check which mbuffs are from the same packet.
> 
> At first, I don't see anything bad if tx_done_cleanup() will free only some
> segments from the packet. As long as it is safe - there is no problem with that.
> I think rte_eth_tx_done_cleanup() operates on mbuf, not packet quantities.
> But in our case I think it doesn't matter, as ixgbe_xmit_cleanup() mark TXDs as
> free only when HW is done with all TXDs for that packet.
> As long as there is a way to reuse existing code and avoid duplication (without
> introducing any degradation) - we should use it.
> And I think there is a very good opportunity here to reuse existing
> ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> Moreover because your code doesn't follow
> ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> logic and infrastructure, it introduces unnecessary scans over TXD ring, and in
> some cases doesn't work as expected:
> 
> +while (1) {
> +tx_last = sw_ring[tx_id].last_id;
> +
> +if (sw_ring[tx_last].mbuf) {
> +if (txr[tx_last].wb.status &
> +IXGBE_TXD_STAT_DD) {
> ...
> +} else {
> +/*
> + * mbuf still in use, nothing left to
> + * free.
> + */
> +break;
> 
> It is not correct to expect that IXGBE_TXD_STAT_DD will be set on last TXD for
> *every* packet.
> We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
> Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> 
> So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> It would be much less error prone and will help to avoid code duplication.
> 
> Konstantin
> 

First:
The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) does clear TXD wb.status,
but the number of descriptors cleaned is always txq->tx_rs_thresh.

The API rte_eth_tx_done_cleanup() in rte_eth_dev.h says:
	@param free_cnt
 	*   Maximum number of packets to free. Use 0 to indicate all possible packets
 	*   should be freed. Note that a packet may be using multiple mbufs.
So a count must be honored, while ixgbe_xmit_cleanup() and ixgbe_tx_free_bufs() take only the single parameter txq.
What needs to be done is not only freeing buffers and clearing status, but also checking which
mbufs belong to one packet and counting the packets freed.
So I don't think it can be implemented by reusing xmit_cleanup() without changing it,
and creating a new function from the code of xmit_cleanup() would cause a lot of duplication.

All in all, reusing ixgbe_xmit_cleanup() does not seem like a good idea.

Second:
The function in this patch is copied from code in igb_rxtx.c, which was already updated in 2017
(commit 8d907d2b79f7a54c809f1c44970ff455fa2865e1),
so I trust that its logic is right.
Admittedly it is not complete for ixgbe, i40e and ice, since it does not update the values of
last_desc_cleaned and tx_next_dd, and its starting point should prefer last_desc_cleaned or
tx_next_dd (for the offload or simple path) over tx_tail.

So I suggest keeping this function and fixing those issues.
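The point that free_cnt counts packets while the ring holds segments can be sketched in a standalone toy (invented names, not the patch code): segments are grouped into packets via last_id/next_id, a packet is freed only when its last descriptor is done, and the packet counter, not the segment counter, is compared against free_cnt.

```c
#include <stdbool.h>

#define RING_SZ 8

/* Toy sw_ring entry: the same bookkeeping the offload path keeps. */
struct entry {
	bool mbuf;   /* stands in for sw_ring[i].mbuf != NULL */
	int next_id; /* next descriptor in the ring */
	int last_id; /* last descriptor of this entry's packet */
};

/* Free finished entries starting at `first`, counting *packets*, not
 * mbufs: a packet completes when we pass its last_id.  `done[i]` marks
 * descriptors HW has finished. */
int free_packets(struct entry *ring, const bool *done,
		 int first, unsigned int free_cnt)
{
	int pkts = 0;
	int id = first;

	while (ring[id].mbuf && done[ring[id].last_id]) {
		int last = ring[id].last_id;
		int stop = ring[last].next_id;

		/* free every segment of this one packet */
		do {
			ring[id].mbuf = false; /* rte_pktmbuf_free_seg() here */
			id = ring[id].next_id;
		} while (id != stop);

		pkts++;
		if (free_cnt != 0 && (unsigned int)pkts == free_cnt)
			break;
	}
	return pkts;
}
```

A multi-segment packet thus costs several iterations of the inner loop but increments the returned count only once, matching the packet semantics of free_cnt.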

> >
> >
> > > >This is the first packet that will be
> > > > + * attempted to be freed.
> > > > + */
> > > > +
> > > > +/* Get last segment in most recently added packet. */ tx_last =
> > > > +sw_ring[txq->tx_tail].last_id;
> > > > +
> > > > +	/* Get the next segment, which is the oldest segment in ring. */
> > > > +	tx_first = sw_ring[tx_last].next_id;
> > > > +
> > > > +	/* Set the current index to the first. */
> > > > +	tx_id = tx_first;
> > > > +
> > > > +	/*
> > > > +	 * Loop through each packet. For each packet, verify that an
> > > > +	 * mbuf exists and that the last segment is free. If so, free
> > > > +	 * it and move on.
> > > > +	 */
> > > > +	while (1) {
> > > > +		tx_last = sw_ring[tx_id].last_id;
> > > > +
> > > > +		if (sw_ring[tx_last].mbuf) {
> > > > +			if (!(txr[tx_last].wb.status &
> > > > +					IXGBE_TXD_STAT_DD))
> > > > +				break;
> > > > +
> > > > +			/* Get the start of the next packet. */
> > > > +			tx_next = sw_ring[tx_last].next_id;
> > > > +
> > > > +			/*
> > > > +			 * Loop through all segments in a
> > > > +			 * packet.
> > > > +			 */
> > > > +			do {
> > > > +				rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > > > +				sw_ring[tx_id].mbuf = NULL;
> > > > +				sw_ring[tx_id].last_id = tx_id;
> > > > +
> > > > +				/* Move to next segment. */
> > > > +				tx_id = sw_ring[tx_id].next_id;
> > > > +
> > > > +			} while (tx_id != tx_next);
> > > > +
> > > > +			/*
> > > > +			 * Increment the number of packets
> > > > +			 * freed.
> > > > +			 */
> > > > +			count++;
> > > > +
> > > > +			if (unlikely(count == (int)free_cnt))
> > > > +				break;
> > > > +		} else {
> > > > +			/*
> > > > +			 * There are multiple reasons to be here:
> > > > +			 * 1) All the packets on the ring have been
> > > > +			 *    freed - tx_id is equal to tx_first
> > > > +			 *    and some packets have been freed.
> > > > +			 *    - Done, exit
> > > > +			 * 2) Interfaces has not sent a rings worth of
> > > > +			 *    packets yet, so the segment after tail is
> > > > +			 *    still empty. Or a previous call to this
> > > > +			 *    function freed some of the segments but
> > > > +			 *    not all so there is a hole in the list.
> > > > +			 *    Hopefully this is a rare case.
> > > > +			 *    - Walk the list and find the next mbuf. If
> > > > +			 *      there isn't one, then done.
> > > > +			 */
> > > > +			if (likely(tx_id == tx_first && count != 0))
> > > > +				break;
> > > > +
> > > > +			/*
> > > > +			 * Walk the list and find the next mbuf, if any.
> > > > +			 */
> > > > +			do {
> > > > +				/* Move to next segment. */
> > > > +				tx_id = sw_ring[tx_id].next_id;
> > > > +
> > > > +				if (sw_ring[tx_id].mbuf)
> > > > +					break;
> > > > +
> > > > +			} while (tx_id != tx_first);
> > > > +
> > > > +			/*
> > > > +			 * Determine why previous loop bailed. If there
> > > > +			 * is not an mbuf, done.
> > > > +			 */
> > > > +			if (sw_ring[tx_id].mbuf == NULL)
> > > > +				break;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return count;
> > > > +}
> > > > +
> > > >  static void __attribute__((cold))
> > > >  ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)
> > > >  {
> > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > index 505d344b9..2c3770af6 100644
> > > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > @@ -285,6 +285,8 @@ int ixgbe_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
> > > >  int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq);
> > > >  void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);
> > > >
> > > > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > > > +
> > > >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> > > >  extern const uint32_t ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> > > >
> > > > --
> > > > 2.17.1
> > >
> >
>
  
Ananyev, Konstantin Jan. 6, 2020, 1:26 p.m. UTC | #5
> > > > > + * tx_tail is the last sent packet on the sw_ring. Goto the end
> > > > > + * of that packet (the last segment in the packet chain) and
> > > > > + * then the next segment will be the start of the oldest segment
> > > > > + * in the sw_ring.
> > > >
> > > > Not sure I understand the sentence above.
> > > > tx_tail is the value of TDT HW register (most recently armed by SW TD).
> > > > last_id  is the index of last descriptor for multi-seg packet.
> > > > next_id is just the index of next descriptor in HW TD ring.
> > > > How do you conclude that it will be the ' oldest segment in the sw_ring'?
> > > >
> > >
> > > The tx_tail is the last sent packet on the sw_ring. While the
> > > xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free <
> > tx_free_thresh .
> > > So the sw_ring[tx_tail].next_id must be the begin of mbufs which are
> > > not used or  Already freed . then begin the loop until the mbuf is used and
> > begin to free them.
> > >
> > >
> > >
> > > > Another question why do you need to write your own functions?
> > > > Why can't you reuse existing ixgbe_xmit_cleanup() for full(offload)
> > > > path and
> > > > ixgbe_tx_free_bufs() for simple path?
> > > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it could
> > > > be used to determine finished TX descriptors.
> > > > Based on that you can you can free appropriate sw_ring[] entries.
> > > >
> > >
> > > The reason why I don't reuse existing function is that they all free
> > > several mbufs While the free_cnt of the API rte_eth_tx_done_cleanup() is the
> > number of packets.
> > > It also need to be done that check which mbuffs are from the same packet.
> >
> > At first, I don't see anything bad if tx_done_cleanup() will free only some
> > segments from the packet. As long as it is safe - there is no problem with that.
> > I think rte_eth_tx_done_cleanup() operates on mbuf, not packet quantities.
> > But in our case I think it doesn't matter, as ixgbe_xmit_cleanup() mark TXDs as
> > free only when HW is done with all TXDs for that packet.
> > As long as there is a way to reuse existing code and avoid duplication (without
> > introducing any degradation) - we should use it.
> > And I think there is a very good opportunity here to reuse existing
> > ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> > Moreover because your code doesn’t follow
> > ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> > logic and infrastructure, it introduces unnecessary scans over TXD ring, and in
> > some cases doesn't work as expected:
> >
> > +	while (1) {
> > +		tx_last = sw_ring[tx_id].last_id;
> > +
> > +		if (sw_ring[tx_last].mbuf) {
> > +			if (txr[tx_last].wb.status &
> > +					IXGBE_TXD_STAT_DD) {
> > ...
> > +			} else {
> > +				/*
> > +				 * mbuf still in use, nothing left to
> > +				 * free.
> > +				 */
> > +				break;
> >
> > It is not correct to expect that IXGBE_TXD_STAT_DD will be set on last TXD for
> > *every* packet.
> > We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
> > Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> >
> > So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> > It would be much less error prone and will help to avoid code duplication.
> >
> > Konstantin
> >
> 
> At first.
> The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will cleanup  TXD wb.status.
> the number of status cleanuped is always txq->tx_rs_thresh.

Yes, and what's wrong with it?

> 
> The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that
> 	@param free_cnt
>  	*   Maximum number of packets to free. Use 0 to indicate all possible packets
>  	*   should be freed. Note that a packet may be using multiple mbufs.

I don't think it is a good approach; it would be better to report the number of freed mbufs.
But OK, as it is a public API, we probably need to keep it as it is.

> a number must be set while ixgbe_xmit_cleanup and ixgbe_tx_free_bufs only have one parameter txq.

Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
So if user requested more packets to be freed we can call ixgbe_xmit_cleanup()
in a loop. 

> And what should do is not only free buffers and status but also check which bufs are from
>  One packet and count the packet freed.

ixgbe_xmit_cleanup() doesn't free mbufs itself.
It only cleans up TXDs.
So in tx_done_cleanup(), after calling ixgbe_xmit_cleanup(),
you'll still need to go through the sw_ring[]
entries that correspond to free TXDs and call rte_pktmbuf_free_seg().
You can count the number of full packets there.

> So I think it can't be implemented that reuse function xmit_cleanup without change it.
> And create a new function with the code of xmit_cleanup will cause many duplication.

I don't think it would.
I think all we need here is something like this
(note it is a schematic, and doesn't take into account that the TXD ring is circular):

tx_done_cleanup(..., uint32_t cnt)
{
      /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
           Scan them first and free as many mbufs as we can.
           If we need more mbufs to free call  ixgbe_xmit_cleanup()
           to free more TXDs. */ 

       swr = txq->sw_ring;
       txr     = txq->tx_ring;
       id   = txq->tx_tail;
       free =  txq->nb_tx_free;       
  
       for (n = 0; n < cnt && free != 0; ) {

          for (j = 0; j != free && n < cnt; j++) {
             swe = &swr[id + j];
             if (swe->mbuf != NULL) {
                   rte_pktmbuf_free_seg(swe->mbuf);
                   swe->mbuf = NULL;
             }
             n += (swe->last_id == id + j);
          } 

          if (n < cnt) { 
               ixgbe_xmit_cleanup(txq);
               free =   txq->nb_tx_free - free;
          }
     }
     return n;
}

> 
> Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().

Totally disagree, see above.

> 
> Second.
> The function in patch is copy from code in igb_rxtx.c. it already updated in 2017,
> The commit id is 8d907d2b79f7a54c809f1c44970ff455fa2865e1.

I realized that.
But I see it as a problem, not a positive thing.
While they do have some similarities, igb and ixgbe are PMDs for different devices,
and their TX code differs quite a lot. Let's say igb doesn't use tx_rs_thresh,
but instead sets the RS bit on each packet's last TXD.
So just blindly copying tx_done_cleanup() from igb to ixgbe doesn't look like a good
idea to me.

> I trust the logic of code is right.
> Actually it don't complete for ixgbe, i40e and ice, while it don't change the value of
> last_desc_cleaned and tx_next_dd. And it's beginning prefer last_desc_cleaned or
>  tx_next_dd(for offload or simple) to tx_tail.
> 
> So, I suggest to use the old function and fix the issue.
> 
  
Chenxu Di Jan. 7, 2020, 10:46 a.m. UTC | #6
Hi

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, January 6, 2020 9:26 PM
> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
> 
> 
> 
> > > > > > + * tx_tail is the last sent packet on the sw_ring. Goto the
> > > > > > + end
> > > > > > + * of that packet (the last segment in the packet chain) and
> > > > > > + * then the next segment will be the start of the oldest
> > > > > > + segment
> > > > > > + * in the sw_ring.
> > > > >
> > > > > Not sure I understand the sentence above.
> > > > > tx_tail is the value of TDT HW register (most recently armed by SW TD).
> > > > > last_id  is the index of last descriptor for multi-seg packet.
> > > > > next_id is just the index of next descriptor in HW TD ring.
> > > > > How do you conclude that it will be the ' oldest segment in the sw_ring'?
> > > > >
> > > >
> > > > The tx_tail is the last sent packet on the sw_ring. While the
> > > > xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free <
> > > tx_free_thresh .
> > > > So the sw_ring[tx_tail].next_id must be the begin of mbufs which
> > > > are not used or  Already freed . then begin the loop until the
> > > > mbuf is used and
> > > begin to free them.
> > > >
> > > >
> > > >
> > > > > Another question why do you need to write your own functions?
> > > > > Why can't you reuse existing ixgbe_xmit_cleanup() for
> > > > > full(offload) path and
> > > > > ixgbe_tx_free_bufs() for simple path?
> > > > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it
> > > > > could be used to determine finished TX descriptors.
> > > > > Based on that you can you can free appropriate sw_ring[] entries.
> > > > >
> > > >
> > > > The reason why I don't reuse existing function is that they all
> > > > free several mbufs While the free_cnt of the API
> > > > rte_eth_tx_done_cleanup() is the
> > > number of packets.
> > > > It also need to be done that check which mbuffs are from the same packet.
> > >
> > > At first, I don't see anything bad if tx_done_cleanup() will free
> > > only some segments from the packet. As long as it is safe - there is no
> problem with that.
> > > I think rte_eth_tx_done_cleanup() operates on mbuf, not packet quantities.
> > > But in our case I think it doesn't matter, as ixgbe_xmit_cleanup()
> > > mark TXDs as free only when HW is done with all TXDs for that packet.
> > > As long as there is a way to reuse existing code and avoid
> > > duplication (without introducing any degradation) - we should use it.
> > > And I think there is a very good opportunity here to reuse existing
> > > ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> > > Moreover because your code doesn’t follow
> > > ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> > > logic and infrastructure, it introduces unnecessary scans over TXD
> > > ring, and in some cases doesn't work as expected:
> > >
> > > +	while (1) {
> > > +		tx_last = sw_ring[tx_id].last_id;
> > > +
> > > +		if (sw_ring[tx_last].mbuf) {
> > > +			if (txr[tx_last].wb.status &
> > > +					IXGBE_TXD_STAT_DD) {
> > > ...
> > > +			} else {
> > > +				/*
> > > +				 * mbuf still in use, nothing left to
> > > +				 * free.
> > > +				 */
> > > +				break;
> > >
> > > It is not correct to expect that IXGBE_TXD_STAT_DD will be set on
> > > last TXD for
> > > *every* packet.
> > > We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
> > > Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> > >
> > > So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> > > It would be much less error prone and will help to avoid code duplication.
> > >
> > > Konstantin
> > >
> >
> > At first.
> > The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will cleanup
> TXD wb.status.
> > the number of status cleanuped is always txq->tx_rs_thresh.
> 
> Yes, and what's wrong with it?
> 
> >
> > The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that @param
> > free_cnt
> >  *   Maximum number of packets to free. Use 0 to indicate all possible packets
> >  *   should be freed. Note that a packet may be using multiple mbufs.
> 
> I don't think it is a good approach, better would be to report number of freed
> mbufs, but ok, as it is a public API, we probably need to keep it as it is.
> 
> > a number must be set while ixgbe_xmit_cleanup and ixgbe_tx_free_bufs only
> have one parameter txq.
> 
> Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
> So if user requested more packets to be freed we can call ixgbe_xmit_cleanup()
> in a loop.
> 

That is a great idea, and I discussed it with my workmate today. There are still some
questions we are not sure about.
We can indeed call ixgbe_xmit_cleanup() in a loop if the user requests more packets,
but how should we handle the remainder? For example:
	The default tx_rs_thresh is 32, and the count of mbufs we need to free is 50.
	We can call ixgbe_xmit_cleanup() once to free 32 mbufs.
	Then what about the other 18 mbufs?
	1. If we do nothing, it does not look good.
	2. If we call ixgbe_xmit_cleanup() again and it succeeds, we free 14 mbufs more than requested.
	3. If the call fails because the status of mbuf No. 32 is not DD while No. 18 is DD,
	   we do not free the 18 mbufs that we can and should free.

We have tried some approaches: adding a parameter to ixgbe_xmit_cleanup(), changing
tx_rs_thresh, or copying the code of ixgbe_xmit_cleanup() into a new function with more
parameters. But none of them seem perfect.

So can you give some comments on this? Reusing the function does not seem as easy as we thought.


> > And what should do is not only free buffers and status but also check
> > which bufs are from  One packet and count the packet freed.
> 
> ixgbe_xmit_cleanup() itself doesn't free mbufs itself.
> It only cleans up TXDs.
> So in tx_done_cleanup() after calling ixgbe_xmit_cleanup() you'll still need to go
> through sw_ring[] entries that correspond to free TXDs and call mbuf_seg_free().
> You can count number of full packets here.
> 
> > So I think it can't be implemented that reuse function xmit_cleanup without
> change it.
> > And create a new function with the code of xmit_cleanup will cause many
> duplication.
> 
> I don't think it would.
> I think all we need here is something like that (note it is schematic one, doesn't
> take into accounf that TXD ring is circular):
> 
> tx_done_cleanup(..., uint32_t cnt)
> {
>       /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
>            Scan them first and free as many mbufs as we can.
>            If we need more mbufs to free call  ixgbe_xmit_cleanup()
>            to free more TXDs. */
> 
>        swr = txq->sw_ring;
>        txr     = txq->tx_ring;
>        id   = txq->tx_tail;
>        free =  txq->nb_tx_free;
> 
>        for (n = 0; n < cnt && free != 0; ) {
> 
>           for (j = 0; j != free && n < cnt; j++) {
>              swe = &swr[id + j];
>              if (swe->mbuf != NULL) {
>                    rte_pktmbuf_free_seg(swe->mbuf);
>                    swe->mbuf = NULL;
>              }
>              n += (swe->last_id == id + j)
>           }
> 
>           if (n < cnt) {
>                ixgbe_xmit_cleanup(txq);
>                free =   txq->nb_tx_free - free;
>           }
>      }
>      return n;
> }
> 
> >
> > Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().
> 
> Totally disagree, see above.
> 
> >
> > Second.
> > The function in patch is copy from code in igb_rxtx.c. it already
> > updated in 2017, The commit id is
> 8d907d2b79f7a54c809f1c44970ff455fa2865e1.
> 
> I realized that.
> But I think it as a problem, not a positive thing.
> While they do have some similarities, igb abd ixgbe are PMDs for different
> devices, and their TX code differs quite a lot. Let say igb doesn't use
> tx_rs_threshold, but instead set RS bit for each last TXD.
> So, just blindly copying tx_done_cleanup() from igb to ixgbe doesn't look like a
> good idea to me.
> 
> > I trust the logic of code is right.
> > Actually it don't complete for ixgbe, i40e and ice, while it don't
> > change the value of last_desc_cleaned and tx_next_dd. And it's
> > beginning prefer last_desc_cleaned or  tx_next_dd(for offload or simple) to
> tx_tail.
> >
> > So, I suggest to use the old function and fix the issue.
> >
  
Ananyev, Konstantin Jan. 7, 2020, 2:09 p.m. UTC | #7
Hi Chenxu,

> > > > > > > + * tx_tail is the last sent packet on the sw_ring. Goto the
> > > > > > > + end
> > > > > > > + * of that packet (the last segment in the packet chain) and
> > > > > > > + * then the next segment will be the start of the oldest
> > > > > > > + segment
> > > > > > > + * in the sw_ring.
> > > > > >
> > > > > > Not sure I understand the sentence above.
> > > > > > tx_tail is the value of TDT HW register (most recently armed by SW TD).
> > > > > > last_id  is the index of last descriptor for multi-seg packet.
> > > > > > next_id is just the index of next descriptor in HW TD ring.
> > > > > > How do you conclude that it will be the ' oldest segment in the sw_ring'?
> > > > > >
> > > > >
> > > > > The tx_tail is the last sent packet on the sw_ring. While the
> > > > > xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free <
> > > > tx_free_thresh .
> > > > > So the sw_ring[tx_tail].next_id must be the begin of mbufs which
> > > > > are not used or  Already freed . then begin the loop until the
> > > > > mbuf is used and
> > > > begin to free them.
> > > > >
> > > > >
> > > > >
> > > > > > Another question why do you need to write your own functions?
> > > > > > Why can't you reuse existing ixgbe_xmit_cleanup() for
> > > > > > full(offload) path and
> > > > > > ixgbe_tx_free_bufs() for simple path?
> > > > > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it
> > > > > > could be used to determine finished TX descriptors.
> > > > > > Based on that you can you can free appropriate sw_ring[] entries.
> > > > > >
> > > > >
> > > > > The reason why I don't reuse existing function is that they all
> > > > > free several mbufs While the free_cnt of the API
> > > > > rte_eth_tx_done_cleanup() is the
> > > > number of packets.
> > > > > It also need to be done that check which mbuffs are from the same packet.
> > > >
> > > > At first, I don't see anything bad if tx_done_cleanup() will free
> > > > only some segments from the packet. As long as it is safe - there is no
> > problem with that.
> > > > I think rte_eth_tx_done_cleanup() operates on mbuf, not packet quantities.
> > > > But in our case I think it doesn't matter, as ixgbe_xmit_cleanup()
> > > > mark TXDs as free only when HW is done with all TXDs for that packet.
> > > > As long as there is a way to reuse existing code and avoid
> > > > duplication (without introducing any degradation) - we should use it.
> > > > And I think there is a very good opportunity here to reuse existing
> > > > ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> > > > Moreover because your code doesn’t follow
> > > > ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> > > > logic and infrastructure, it introduces unnecessary scans over TXD
> > > > ring, and in some cases doesn't work as expected:
> > > >
> > > > +	while (1) {
> > > > +		tx_last = sw_ring[tx_id].last_id;
> > > > +
> > > > +		if (sw_ring[tx_last].mbuf) {
> > > > +			if (txr[tx_last].wb.status &
> > > > +					IXGBE_TXD_STAT_DD) {
> > > > ...
> > > > +			} else {
> > > > +				/*
> > > > +				 * mbuf still in use, nothing left to
> > > > +				 * free.
> > > > +				 */
> > > > +				break;
> > > >
> > > > It is not correct to expect that IXGBE_TXD_STAT_DD will be set on
> > > > last TXD for
> > > > *every* packet.
> > > > We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
> > > > Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> > > >
> > > > So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> > > > It would be much less error prone and will help to avoid code duplication.
> > > >
> > > > Konstantin
> > > >
> > >
> > > At first.
> > > The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will cleanup
> > TXD wb.status.
> > > the number of status cleanuped is always txq->tx_rs_thresh.
> >
> > Yes, and what's wrong with it?
> >
> > >
> > > The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that @param
> > > free_cnt
> > >  *   Maximum number of packets to free. Use 0 to indicate all possible packets
> > >  *   should be freed. Note that a packet may be using multiple mbufs.
> >
> > I don't think it is a good approach, better would be to report number of freed
> > mbufs, but ok, as it is a public API, we probably need to keep it as it is.
> >
> > > a number must be set while ixgbe_xmit_cleanup and ixgbe_tx_free_bufs only
> > have one parameter txq.
> >
> > Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
> > So if user requested more packets to be freed we can call ixgbe_xmit_cleanup()
> > in a loop.
> >
> 
> That is a great idea and I discuss with my workmate about it today. there is also some
> Question that we don’t confirm.
> Actually it can call ixgbe_xmit_cleanup() in a loop if user requested more packets,
> How to deal with the MOD. For example:
> 	The default tx_rs_thresh is 32.if the count of mbufs we need free is 50.
> 	We can call ixgbe_xmit_cleanup() one time to free 32 mbufs.
> 	Then how about other 18 mbufs.
> 	1.If we do nothing, it looks not good.
> 	2.if we call ixgbe_xmit_cleanup() successfully, we free 14 mbufs more.
> 	3.if we call ixgbe_xmit_cleanup() fail, the status of No.32 mbufs is not DD
> 	  While No .18 is DD. So we do not free 18 mbufs what we can and should free.
> 
> We have try some plans about it, likes add parameter for ixgbe_xmit_cleanup(), change
>  Tx_rs_thresh or copy the code or ixgbe_xmit_cleanup() as a new function with more
> Parameter. But all of them seem not perfect.
> 
> So  can you give some comment about it? It seems not easy as we think by reuse function.

My thought about it:
for situations when cnt % rs_thresh != 0,
we'll still call ixgbe_xmit_cleanup() cnt / rs_thresh + 1 times.
But then, on the last pass, we can free just cnt % rs_thresh mbufs, while keeping
the rest of them intact.
  
Let say at some moment we have txq->nb_tx_free==0,
and user called tx_done_cleanup(txq, ..., cnt==50)
So we call ixgbe_xmit_cleanup(txq) first time.
Suppose it frees 32 TXDs; then we walk through the corresponding
sw_ring[] entries and, let's say, free 32 packets (one mbuf per packet).
Then we call ixgbe_xmit_cleanup(txq) a second time.
Suppose it frees another 32 TXDs; we again walk through sw_ring[],
but free only the first 18 mbufs and return.
Suppose we call tx_done_cleanup(txq, cnt=50) immediately again.
Now txq->nb_tx_free == 64, so we can start to scan sw_ring[] entries
from tx_tail straight away. We'll skip the first 50 entries as they are
already empty, then free the remaining 14 mbufs, then
call ixgbe_xmit_cleanup(txq) again, and if it is successful,
scan and free another 32 sw_ring[] entries.
Then we call ixgbe_xmit_cleanup(txq) once more, but free
only the first 4 available sw_ring[] mbufs (14 + 32 + 4 == 50).
Probably a bit easier with the code:

tx_done_cleanup(..., uint32_t cnt)
{
       swr = txq->sw_ring;
       txr     = txq->tx_ring;
       id   = txq->tx_tail;
  
      if (txq->nb_tx_free == 0)
         ixgbe_xmit_cleanup(txq);

      free =  txq->nb_tx_free;       
      for (n = 0; n < cnt && free != 0; ) {

          for (j = 0; j != free && n < cnt; j++) {
             swe = &swr[id + j];
             if (swe->mbuf != NULL) {
                   rte_pktmbuf_free_seg(swe->mbuf);
                   swe->mbuf = NULL;
             
                  /* last segment in the packet, increment packet count */
                  n += (swe->last_id == id + j);
             }
          } 

          if (n < cnt) { 
               ixgbe_xmit_cleanup(txq);
               free =   txq->nb_tx_free - free;
          }
     }
     return n;
}

For the situation when there are fewer than tx_rs_thresh free TXDs
((txq->tx_ring[desc_to_clean_to].wb.status & IXGBE_TXD_STAT_DD) == 0)
we do nothing - in that case we consider there are no more mbufs to free.

> 
> 
> > > And what should do is not only free buffers and status but also check
> > > which bufs are from  One packet and count the packet freed.
> >
> > ixgbe_xmit_cleanup() itself doesn't free mbufs itself.
> > It only cleans up TXDs.
> > So in tx_done_cleanup() after calling ixgbe_xmit_cleanup() you'll still need to go
> > through sw_ring[] entries that correspond to free TXDs and call mbuf_seg_free().
> > You can count number of full packets here.
> >
> > > So I think it can't be implemented that reuse function xmit_cleanup without
> > change it.
> > > And create a new function with the code of xmit_cleanup will cause many
> > duplication.
> >
> > I don't think it would.
> > I think all we need here is something like that (note it is schematic one, doesn't
> > take into accounf that TXD ring is circular):
> >
> > tx_done_cleanup(..., uint32_t cnt)
> > {
> >       /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
> >            Scan them first and free as many mbufs as we can.
> >            If we need more mbufs to free call  ixgbe_xmit_cleanup()
> >            to free more TXDs. */
> >
> >        swr = txq->sw_ring;
> >        txr     = txq->tx_ring;
> >        id   = txq->tx_tail;
> >        free =  txq->nb_tx_free;
> >
> >        for (n = 0; n < cnt && free != 0; ) {
> >
> >           for (j = 0; j != free && n < cnt; j++) {
> >              swe = &swr[id + j];
> >              if (swe->mbuf != NULL) {
> >                    rte_pktmbuf_free_seg(swe->mbuf);
> >                    swe->mbuf = NULL;
> >              }
> >              n += (swe->last_id == id + j)
> >           }
> >
> >           if (n < cnt) {
> >                ixgbe_xmit_cleanup(txq);
> >                free =   txq->nb_tx_free - free;
> >           }
> >      }
> >      return n;
> > }
> >
> > >
> > > Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().
> >
> > Totally disagree, see above.
> >
> > >
> > > Second.
> > > The function in patch is copy from code in igb_rxtx.c. it already
> > > updated in 2017, The commit id is
> > 8d907d2b79f7a54c809f1c44970ff455fa2865e1.
> >
> > I realized that.
> > But I think it as a problem, not a positive thing.
> > While they do have some similarities, igb and ixgbe are PMDs for different
> > devices, and their TX code differs quite a lot. Let say igb doesn't use
> > tx_rs_threshold, but instead set RS bit for each last TXD.
> > So, just blindly copying tx_done_cleanup() from igb to ixgbe doesn't look like a
> > good idea to me.
> >
> > > I trust the logic of code is right.
> > > Actually it don't complete for ixgbe, i40e and ice, while it don't
> > > change the value of last_desc_cleaned and tx_next_dd. And it's
> > > beginning prefer last_desc_cleaned or  tx_next_dd(for offload or simple) to
> > tx_tail.
> > >
> > > So, I suggest to use the old function and fix the issue.
> > >
> > > > >
> > > > >
> > > > > > >This is the first packet that will be
> > > > > > > + * attempted to be freed.
> > > > > > > + */
> > > > > > > +
> > > > > > > +/* Get last segment in most recently added packet. */ tx_last
> > > > > > > += sw_ring[txq->tx_tail].last_id;
> > > > > > > +
> > > > > > > +/* Get the next segment, which is the oldest segment in ring.
> > > > > > > +*/ tx_first = sw_ring[tx_last].next_id;
> > > > > > > +
> > > > > > > +/* Set the current index to the first. */ tx_id = tx_first;
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Loop through each packet. For each packet, verify that an
> > > > > > > + * mbuf exists and that the last segment is free. If so, free
> > > > > > > + * it and move on.
> > > > > > > + */
> > > > > > > +while (1) {
> > > > > > > +tx_last = sw_ring[tx_id].last_id;
> > > > > > > +
> > > > > > > +if (sw_ring[tx_last].mbuf) {
> > > > > > > +if (!(txr[tx_last].wb.status &
> > > > > > > +IXGBE_TXD_STAT_DD))
> > > > > > > +break;
> > > > > > > +
> > > > > > > +/* Get the start of the next packet. */ tx_next =
> > > > > > > +sw_ring[tx_last].next_id;
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Loop through all segments in a
> > > > > > > + * packet.
> > > > > > > + */
> > > > > > > +do {
> > > > > > > +rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > > > > > > +sw_ring[tx_id].mbuf = NULL;
> > > > > > > +sw_ring[tx_id].last_id = tx_id;
> > > > > > > +
> > > > > > > +/* Move to next segment. */
> > > > > > > +tx_id = sw_ring[tx_id].next_id;
> > > > > > > +
> > > > > > > +} while (tx_id != tx_next);
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Increment the number of packets
> > > > > > > + * freed.
> > > > > > > + */
> > > > > > > +count++;
> > > > > > > +
> > > > > > > +if (unlikely(count == (int)free_cnt)) break; } else {
> > > > > > > +/*
> > > > > > > + * There are multiple reasons to be here:
> > > > > > > + * 1) All the packets on the ring have been
> > > > > > > + *    freed - tx_id is equal to tx_first
> > > > > > > + *    and some packets have been freed.
> > > > > > > + *    - Done, exit
> > > > > > > + * 2) Interfaces has not sent a rings worth of
> > > > > > > + *    packets yet, so the segment after tail is
> > > > > > > + *    still empty. Or a previous call to this
> > > > > > > + *    function freed some of the segments but
> > > > > > > + *    not all so there is a hole in the list.
> > > > > > > + *    Hopefully this is a rare case.
> > > > > > > + *    - Walk the list and find the next mbuf. If
> > > > > > > + *      there isn't one, then done.
> > > > > > > + */
> > > > > > > +if (likely(tx_id == tx_first && count != 0)) break;
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Walk the list and find the next mbuf, if any.
> > > > > > > + */
> > > > > > > +do {
> > > > > > > +/* Move to next segment. */
> > > > > > > +tx_id = sw_ring[tx_id].next_id;
> > > > > > > +
> > > > > > > +if (sw_ring[tx_id].mbuf)
> > > > > > > +break;
> > > > > > > +
> > > > > > > +} while (tx_id != tx_first);
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Determine why previous loop bailed. If there
> > > > > > > + * is not an mbuf, done.
> > > > > > > + */
> > > > > > > +if (sw_ring[tx_id].mbuf == NULL) break; } }
> > > > > > > +
> > > > > > > +return count;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void __attribute__((cold))
> > > > > > > ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff --git
> > > > > > > a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > b/drivers/net/ixgbe/ixgbe_rxtx.h index 505d344b9..2c3770af6
> > > > > > > 100644
> > > > > > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > @@ -285,6 +285,8 @@ int
> > > > > > > ixgbe_rx_vec_dev_conf_condition_check(struct
> > > > > > > rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct
> > > > > > > ixgbe_rx_queue *rxq);  void
> > > > > > > ixgbe_rx_queue_release_mbufs_vec(struct
> > > > > > > ixgbe_rx_queue *rxq);
> > > > > > >
> > > > > > > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > > > > > > +
> > > > > > >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> > > > > > >  extern const uint32_t
> > > > > > > ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> > > > > > >
> > > > > > > --
> > > > > > > 2.17.1
> > > > > >
> > > > >
> > > >
> > >
> >
>
  
Chenxu Di Jan. 8, 2020, 10:15 a.m. UTC | #8
Hi, Konstantin

Thanks for reading.

Based on our research, we don't think it is a good plan to reuse ixgbe_xmit_cleanup() or ixgbe_tx_free_bufs().
The following two points show the reason.


First, even with your plan it doesn't solve the situations when cnt % rs_thresh != 0.

For example, suppose the API rte_eth_tx_done_cleanup() is called with free_cnt=0, which means we want to free all possible mbufs (one mbuf per packet). After checking, we find that
the status of mbuf No.18 is DD, while the mbufs from No.19 onward are still in use.

If the plan is to call ixgbe_xmit_cleanup() cnt / rs_thresh + 1 times (1 in this example),
then after ixgbe_xmit_cleanup() the value of last_desc_cleaned no longer matches the actual state.
The function ixgbe_xmit_pkts() calls ixgbe_xmit_cleanup() automatically when nb_tx_free < tx_free_thresh, so the mbufs before last_desc_cleaned (No.19~No.31 in this example) will not be freed.

However, all of the above concerns ixgbe_xmit_pkts() and ixgbe_xmit_cleanup(); for ixgbe_xmit_pkts_simple() and ixgbe_tx_free_bufs() it may be worse,
because the buffer-freeing action is inside ixgbe_tx_free_bufs(), so we cannot do the freeing in outside code, and no mbufs will be freed at all.

If you free the remainder (cnt % rs_thresh) mbufs in outside code and update the values of tx_next_dd and nb_tx_free, there is no need to call ixgbe_tx_free_bufs() at all.

Here are two cases about tx_done_cleanup() and ixgbe_tx_free_bufs() that show this in more detail.

Case 1
Status:
	Suppose that before calling tx_done_cleanup(..., uint32_t cnt), the txq status is as follows.
	
	one mbuf per packet
	txq->tx_rs_thresh = 32;
	clean mbuf from index id
	txr[id+9].wb.status & IXGBE_TXD_STAT_DD != 0.   // means we could free 10 packets
	txr[id+31].wb.status & IXGBE_TXD_STAT_DD == 0.  // means we could not clean 32 packets

Process Flow:
	tx_done_cleanup(..., 10) is invoked to free 10 packets.
	First, tx_done_cleanup() invokes ixgbe_xmit_cleanup() to count how many mbufs can be freed.
	But ixgbe_xmit_cleanup() will return -1 (meaning there are no mbufs to free); see the code below:

	status = txr[desc_to_clean_to].wb.status;
	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
		PMD_TX_FREE_LOG(DEBUG,
				"TX descriptor %4u is not done"
				"(port=%d queue=%d)",
				desc_to_clean_to,
				txq->port_id, txq->queue_id);
		/* Failed to clean any descriptors, better luck next time */
		return -(1);
	}

Result:
	We do nothing

Expect:
	Free 10 packets.

Thoughts:
	If we try to check the status from txr[id].wb.status to txr[id+31].wb.status one by one, we find that txr[id].wb.status, txr[id+1].wb.status, ……, txr[id+9].wb.status all have IXGBE_TXD_STAT_DD set, so we actually do have the ability to free 10 packets.



Case 2:
Status:
	Suppose that before calling tx_done_cleanup(..., uint32_t cnt), the txq status is as follows.

	one mbuf per packet
	txq->tx_rs_thresh = 32;
	clean mbuf from index id
	txr[id+31].wb.status & IXGBE_TXD_STAT_DD != 0.   // means we could free 32 packets

Process Flow:
	tx_done_cleanup(..., 10) is invoked to free 10 packets.
	When tx_done_cleanup() invokes ixgbe_tx_free_bufs() to free buffers, it will free 32 packets.

Result:
	Free 32 packets.

Expect:
	Free 10 packets.

Thoughts:
	If we try to check the status from txr[id].wb.status to txr[id+31].wb.status one by one, we find that txr[id].wb.status, txr[id+1].wb.status, ……, txr[id+9].wb.status all have IXGBE_TXD_STAT_DD set, so we have the ability to free only the 10 packets that were requested.




Second, we would have a lot of code that is the same as in ixgbe_xmit_cleanup().

We analyzed ixgbe_tx_free_bufs() and ixgbe_xmit_cleanup() and broke down their actions.
The actual actions of the code are the following:
1. Determine the start position of the free action.
2. Check whether the status of the end position is DD.
3. Free (set status = 0 in ixgbe_xmit_cleanup(); call rte_mempool_put_bulk() in ixgbe_tx_free_bufs()).
4. Update position variables (last_desc_cleaned in ixgbe_xmit_cleanup(), tx_next_dd in ixgbe_tx_free_bufs(), and nb_tx_free in both).
 
If we reuse ixgbe_xmit_cleanup(), we need to get the number of mbufs before calling it.
We also need to implement the following actions:
1. Determine the starting position.
2. Find the last mbuf of each packet and confirm whether its status is DD.
3. Free (not done for the ixgbe_tx_free_bufs() path).
4. Call ixgbe_xmit_cleanup() in a loop.

By comparison, 3 of the 4 steps of ixgbe_xmit_cleanup() would already have been performed beforehand,
which means ixgbe_xmit_cleanup() is not highly reusable and the amount of duplicated code would be large.
 
 
The following is the main code of ixgbe_xmit_cleanup(), grouped by the four steps above:
 
 1.
	last_desc_cleaned = txq->last_desc_cleaned;
 
 2.
	/* Determine the last descriptor needing to be cleaned */
	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
	if (desc_to_clean_to >= nb_tx_desc)
		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);

	/* Check to make sure the last descriptor to clean is done */
	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
	status = txr[desc_to_clean_to].wb.status;
	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
		PMD_TX_FREE_LOG(DEBUG, ...);
		return -(1);
	}
3.
	/* Figure out how many descriptors will be cleaned */
	if (last_desc_cleaned > desc_to_clean_to)
		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
							desc_to_clean_to);
	else
		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
						last_desc_cleaned);
	txr[desc_to_clean_to].wb.status = 0;
4.
	/* Update the txq to reflect the last descriptor that was cleaned */
	txq->last_desc_cleaned = desc_to_clean_to;
	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
 

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, January 7, 2020 10:10 PM
> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
> 
> 
> Hi Chenxu,
> 
> > > > > > > > + * tx_tail is the last sent packet on the sw_ring. Goto
> > > > > > > > + the end
> > > > > > > > + * of that packet (the last segment in the packet chain)
> > > > > > > > + and
> > > > > > > > + * then the next segment will be the start of the oldest
> > > > > > > > + segment
> > > > > > > > + * in the sw_ring.
> > > > > > >
> > > > > > > Not sure I understand the sentence above.
> > > > > > > tx_tail is the value of TDT HW register (most recently armed by SW
> TD).
> > > > > > > last_id  is the index of last descriptor for multi-seg packet.
> > > > > > > next_id is just the index of next descriptor in HW TD ring.
> > > > > > > How do you conclude that it will be the ' oldest segment in the
> sw_ring'?
> > > > > > >
> > > > > >
> > > > > > The tx_tail is the last sent packet on the sw_ring. While the
> > > > > > xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free
> > > > > > <
> > > > > tx_free_thresh .
> > > > > > So the sw_ring[tx_tail].next_id must be the begin of mbufs
> > > > > > which are not used or  Already freed . then begin the loop
> > > > > > until the mbuf is used and
> > > > > begin to free them.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Another question why do you need to write your own functions?
> > > > > > > Why can't you reuse existing ixgbe_xmit_cleanup() for
> > > > > > > full(offload) path and
> > > > > > > ixgbe_tx_free_bufs() for simple path?
> > > > > > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least
> > > > > > > it could be used to determine finished TX descriptors.
> > > > > > > Based on that you can you can free appropriate sw_ring[] entries.
> > > > > > >
> > > > > >
> > > > > > The reason why I don't reuse existing function is that they
> > > > > > all free several mbufs While the free_cnt of the API
> > > > > > rte_eth_tx_done_cleanup() is the
> > > > > number of packets.
> > > > > > It also need to be done that check which mbuffs are from the same
> packet.
> > > > >
> > > > > At first, I don't see anything bad if tx_done_cleanup() will
> > > > > free only some segments from the packet. As long as it is safe -
> > > > > there is no
> > > problem with that.
> > > > > I think rte_eth_tx_done_cleanup() operates on mbuf, not packet
> quantities.
> > > > > But in our case I think it doesn't matter, as
> > > > > ixgbe_xmit_cleanup() mark TXDs as free only when HW is done with all
> TXDs for that packet.
> > > > > As long as there is a way to reuse existing code and avoid
> > > > > duplication (without introducing any degradation) - we should use it.
> > > > > And I think there is a very good opportunity here to reuse
> > > > > existing
> > > > > ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> > > > > Moreover because your code doesn’t follow
> > > > > ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> > > > > logic and infrastructure, it introduces unnecessary scans over
> > > > > TXD ring, and in some cases doesn't work as expected:
> > > > >
> > > > > +while (1) {
> > > > > +tx_last = sw_ring[tx_id].last_id;
> > > > > +
> > > > > +if (sw_ring[tx_last].mbuf) {
> > > > > +if (txr[tx_last].wb.status &
> > > > > +IXGBE_TXD_STAT_DD) {
> > > > > ...
> > > > > +} else {
> > > > > +/*
> > > > > + * mbuf still in use, nothing left to
> > > > > + * free.
> > > > > + */
> > > > > +break;
> > > > >
> > > > > It is not correct to expect that IXGBE_TXD_STAT_DD will be set
> > > > > on last TXD for
> > > > > *every* packet.
> > > > > We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
> > > > > Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> > > > >
> > > > > So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> > > > > It would be much less error prone and will help to avoid code duplication.
> > > > >
> > > > > Konstantin
> > > > >
> > > >
> > > > At first.
> > > > The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will
> > > > cleanup
> > > TXD wb.status.
> > > > the number of status cleanuped is always txq->tx_rs_thresh.
> > >
> > > Yes, and what's wrong with it?
> > >
> > > >
> > > > The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that
> > > > @param free_cnt
> > > >  *   Maximum number of packets to free. Use 0 to indicate all possible
> packets
> > > >  *   should be freed. Note that a packet may be using multiple mbufs.
> > >
> > > I don't think it is a good approach, better would be to report
> > > number of freed mbufs, but ok, as it is a public API, we probably need to
> keep it as it is.
> > >
> > > > a number must be set while ixgbe_xmit_cleanup and
> > > > ixgbe_tx_free_bufs only
> > > have one parameter txq.
> > >
> > > Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
> > > So if user requested more packets to be freed we can call
> > > ixgbe_xmit_cleanup() in a loop.
> > >
> >
> > That is a great idea and I discuss with my workmate about it today.
> > there is also some Question that we don’t confirm.
> > Actually it can call ixgbe_xmit_cleanup() in a loop if user requested
> > more packets, How to deal with the MOD. For example:
> > The default tx_rs_thresh is 32.if the count of mbufs we need free is 50.
> > We can call ixgbe_xmit_cleanup() one time to free 32 mbufs.
> > Then how about other 18 mbufs.
> > 1.If we do nothing, it looks not good.
> > 2.if we call ixgbe_xmit_cleanup() successfully, we free 14 mbufs more.
> > 3.if we call ixgbe_xmit_cleanup() fail, the status of No.32 mbufs is not DD
> >   While No .18 is DD. So we do not free 18 mbufs what we can and should free.
> >
> > We have try some plans about it, likes add parameter for
> > ixgbe_xmit_cleanup(), change  Tx_rs_thresh or copy the code or
> > ixgbe_xmit_cleanup() as a new function with more Parameter. But all of them
> seem not perfect.
> >
> > So  can you give some comment about it? It seems not easy as we think by
> reuse function.
> 
> My thought about it:
> for situations when cnt % rs_thresh != 0 we'll still call ixgbe_xmit_cleanup() cnt /
> rs_thresh + 1 times.
> But then, we can free just cnt % rs_thresh mbufs, while keeping rest of them
> intact.
> 
> Let say at some moment we have txq->nb_tx_free==0, and user called
> tx_done_cleanup(txq, ..., cnt==50) So we call ixgbe_xmit_cleanup(txq) first time.
> Suppose it frees 32 TXDs, then we walk through corresponding sw_ring[] entries
> and let say free 32 packets (one mbuf per packet).
> Then we call ixgbe_xmit_cleanup(txq)  second time.
> Suppose it will free another 32 TXDs, we again walk thour sw_ring[], but free
> only first 18 mbufs and return.
> Suppose we call tx_done_cleanup(txq, cnt=50) immediately again.
> Now txq->nb_tx_free==64, so we can start to scan sw_entries[] from tx_tail
> straightway. We'll skip first 50 entries as they are already empty, then free
> remaining 14 mbufs, then will call ixgbe_xmit_cleanup(txq) again, and if it would
> be successful, will scan and free another 32 sw_ring[] entries.
> Then again will call ixgbe_xmit_cleanup(txq), but will free
> only first 8 available sw_ring[].mbuf.
> Probably a bit easier with the code:
> 
> tx_done_cleanup(..., uint32_t cnt)
> {
>        swr = txq->sw_ring;
>        txr     = txq->tx_ring;
>        id   = txq->tx_tail;
> 
>       if (txq->nb_tx_free == 0)
>          ixgbe_xmit_cleanup(txq);
> 
>       free =  txq->nb_tx_free;
>       for (n = 0; n < cnt && free != 0; ) {
> 
>           for (j = 0; j != free && n < cnt; j++) {
>              swe = &swr[id + j];
>              if (swe->mbuf != NULL) {
>                    rte_pktmbuf_free_seg(swe->mbuf);
>                    swe->mbuf = NULL;
> 
>                   /* last segment in the packet, increment packet count */
>                   n += (swe->last_id == id + j);
>              }
>           }
> 
>           if (n < cnt) {
>                ixgbe_xmit_cleanup(txq);
>                free =   txq->nb_tx_free - free;
>           }
>      }
>      return n;
> }
> 
> For the situation when there are less then rx_thresh free TXDs ((txq-
> >tx_ring[desc_to_clean_to].wb.status & IXGBE_TXD_STAT_DD) == 0) we do
> nothing - in that case we consider there are no more mbufs to free.
> 
> >
> >
> > > > And what should do is not only free buffers and status but also
> > > > check which bufs are from  One packet and count the packet freed.
> > >
> > > ixgbe_xmit_cleanup() itself doesn't free mbufs itself.
> > > It only cleans up TXDs.
> > > So in tx_done_cleanup() after calling ixgbe_xmit_cleanup() you'll
> > > still need to go through sw_ring[] entries that correspond to free TXDs and
> call mbuf_seg_free().
> > > You can count number of full packets here.
> > >
> > > > So I think it can't be implemented that reuse function
> > > > xmit_cleanup without
> > > change it.
> > > > And create a new function with the code of xmit_cleanup will cause
> > > > many
> > > duplication.
> > >
> > > I don't think it would.
> > > I think all we need here is something like that (note it is
> > > schematic one, doesn't take into accounf that TXD ring is circular):
> > >
> > > tx_done_cleanup(..., uint32_t cnt)
> > > {
> > >       /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
> > >            Scan them first and free as many mbufs as we can.
> > >            If we need more mbufs to free call  ixgbe_xmit_cleanup()
> > >            to free more TXDs. */
> > >
> > >        swr = txq->sw_ring;
> > >        txr     = txq->tx_ring;
> > >        id   = txq->tx_tail;
> > >        free =  txq->nb_tx_free;
> > >
> > >        for (n = 0; n < cnt && free != 0; ) {
> > >
> > >           for (j = 0; j != free && n < cnt; j++) {
> > >              swe = &swr[id + j];
> > >              if (swe->mbuf != NULL) {
> > >                    rte_pktmbuf_free_seg(swe->mbuf);
> > >                    swe->mbuf = NULL;
> > >              }
> > >              n += (swe->last_id == id + j)
> > >           }
> > >
> > >           if (n < cnt) {
> > >                ixgbe_xmit_cleanup(txq);
> > >                free =   txq->nb_tx_free - free;
> > >           }
> > >      }
> > >      return n;
> > > }
> > >
> > > >
> > > > Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().
> > >
> > > Totally disagree, see above.
> > >
> > > >
> > > > Second.
> > > > The function in patch is copy from code in igb_rxtx.c. it already
> > > > updated in 2017, The commit id is
> > > 8d907d2b79f7a54c809f1c44970ff455fa2865e1.
> > >
> > > I realized that.
> > > But I think it as a problem, not a positive thing.
> > > While they do have some similarities, igb and ixgbe are PMDs for
> > > different devices, and their TX code differs quite a lot. Let say
> > > igb doesn't use tx_rs_threshold, but instead set RS bit for each last TXD.
> > > So, just blindly copying tx_done_cleanup() from igb to ixgbe doesn't
> > > look like a good idea to me.
> > >
> > > > I trust the logic of code is right.
> > > > Actually it don't complete for ixgbe, i40e and ice, while it don't
> > > > change the value of last_desc_cleaned and tx_next_dd. And it's
> > > > beginning prefer last_desc_cleaned or  tx_next_dd(for offload or
> > > > simple) to
> > > tx_tail.
> > > >
> > > > So, I suggest to use the old function and fix the issue.
> > > >
> > > > > >
> > > > > >
> > > > > > > >This is the first packet that will be
> > > > > > > > + * attempted to be freed.
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +/* Get last segment in most recently added packet. */
> > > > > > > > +tx_last = sw_ring[txq->tx_tail].last_id;
> > > > > > > > +
> > > > > > > > +/* Get the next segment, which is the oldest segment in ring.
> > > > > > > > +*/ tx_first = sw_ring[tx_last].next_id;
> > > > > > > > +
> > > > > > > > +/* Set the current index to the first. */ tx_id =
> > > > > > > > +tx_first;
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Loop through each packet. For each packet, verify that
> > > > > > > > +an
> > > > > > > > + * mbuf exists and that the last segment is free. If so,
> > > > > > > > +free
> > > > > > > > + * it and move on.
> > > > > > > > + */
> > > > > > > > +while (1) {
> > > > > > > > +tx_last = sw_ring[tx_id].last_id;
> > > > > > > > +
> > > > > > > > +if (sw_ring[tx_last].mbuf) { if (!(txr[tx_last].wb.status
> > > > > > > > +&
> > > > > > > > +IXGBE_TXD_STAT_DD))
> > > > > > > > +break;
> > > > > > > > +
> > > > > > > > +/* Get the start of the next packet. */ tx_next =
> > > > > > > > +sw_ring[tx_last].next_id;
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Loop through all segments in a
> > > > > > > > + * packet.
> > > > > > > > + */
> > > > > > > > +do {
> > > > > > > > +rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > > > > > > > +sw_ring[tx_id].mbuf = NULL; sw_ring[tx_id].last_id =
> > > > > > > > +tx_id;
> > > > > > > > +
> > > > > > > > +/* Move to next segment. */ tx_id =
> > > > > > > > +sw_ring[tx_id].next_id;
> > > > > > > > +
> > > > > > > > +} while (tx_id != tx_next);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Increment the number of packets
> > > > > > > > + * freed.
> > > > > > > > + */
> > > > > > > > +count++;
> > > > > > > > +
> > > > > > > > +if (unlikely(count == (int)free_cnt)) break; } else {
> > > > > > > > +/*
> > > > > > > > + * There are multiple reasons to be here:
> > > > > > > > + * 1) All the packets on the ring have been
> > > > > > > > + *    freed - tx_id is equal to tx_first
> > > > > > > > + *    and some packets have been freed.
> > > > > > > > + *    - Done, exit
> > > > > > > > + * 2) Interfaces has not sent a rings worth of
> > > > > > > > + *    packets yet, so the segment after tail is
> > > > > > > > + *    still empty. Or a previous call to this
> > > > > > > > + *    function freed some of the segments but
> > > > > > > > + *    not all so there is a hole in the list.
> > > > > > > > + *    Hopefully this is a rare case.
> > > > > > > > + *    - Walk the list and find the next mbuf. If
> > > > > > > > + *      there isn't one, then done.
> > > > > > > > + */
> > > > > > > > +if (likely(tx_id == tx_first && count != 0)) break;
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Walk the list and find the next mbuf, if any.
> > > > > > > > + */
> > > > > > > > +do {
> > > > > > > > +/* Move to next segment. */ tx_id =
> > > > > > > > +sw_ring[tx_id].next_id;
> > > > > > > > +
> > > > > > > > +if (sw_ring[tx_id].mbuf)
> > > > > > > > +break;
> > > > > > > > +
> > > > > > > > +} while (tx_id != tx_first);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Determine why previous loop bailed. If there
> > > > > > > > + * is not an mbuf, done.
> > > > > > > > + */
> > > > > > > > +if (sw_ring[tx_id].mbuf == NULL) break; } }
> > > > > > > > +
> > > > > > > > +return count;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static void __attribute__((cold))
> > > > > > > > ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff
> > > > > > > > --git a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > > b/drivers/net/ixgbe/ixgbe_rxtx.h index
> > > > > > > > 505d344b9..2c3770af6
> > > > > > > > 100644
> > > > > > > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > > @@ -285,6 +285,8 @@ int
> > > > > > > > ixgbe_rx_vec_dev_conf_condition_check(struct
> > > > > > > > rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct
> > > > > > > > ixgbe_rx_queue *rxq);  void
> > > > > > > > ixgbe_rx_queue_release_mbufs_vec(struct
> > > > > > > > ixgbe_rx_queue *rxq);
> > > > > > > >
> > > > > > > > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > > > > > > > +
> > > > > > > >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> > > > > > > >  extern const uint32_t
> > > > > > > > ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.17.1
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
  
Ananyev, Konstantin Jan. 8, 2020, 3:12 p.m. UTC | #9
Hi Chenxu,
 
> Thanks for your read.
> 
> for our research, we don't think it is a good plan that reuse ixgbe_xmit_cleanup() or ixgbe_tx_free_bufs.
> following two opinion will show the reason.

I think there is one main misunderstanding between us:

TXD.DD bit setting.
You expect that for every completed packet the HW will set the DD bit.
For ixgbe that is not the case.
For ixgbe (and AFAIK for i40e) we don't set RS bit for every packet.
Instead we set it for a bunch of packets: tx_rs_thresh value.
That's one of the optimizations ixgbe PMD does.
So valid assumption is to check DD bit only for TXDs where RS bit is set.
That's how both ixgbe_xmit_cleanup() and ixgbe_tx_free_bufs() work.
All TXDs between last_desc_cleaned and desc_to_clean_to
(or between tx_next_dd and tx_next_dd + tx_rs_thresh) are considered
by the PMD as 'still busy', even though in reality some of them might already be done.

Yes, it means that in some cases some mbufs will be considered
by the PMD as still in use, while they really are not.
But I don't think it is a critical thing.
Keeping the TXD and related mbuf management logic within the PMD
consistent is much more important.
 
In other words, I still think reusing the existing functions
(ixgbe_xmit_cleanup() and ixgbe_tx_free_bufs()) for tx_done_cleanup()
is a better option than trying to create new ones.

BTW, as a side question, do you guys have any vehicle to test these functions?
AFAIK, right now this function is not called from any app:
find  app examples/ -type f -name '*.[h,c]' | xargs grep -l rte_eth_tx_done_cleanup
<empty>

> 
> first, it doesn't solve the isituations when cnt % rs_thresh != 0 even by using your plan.
> 
> For example, the parameters of API rte_eth_tx_done_cleanup free_cnt=0, that means we want to free all possible mbufs(one mbuf per
> packet). we find it after checking that
> No.18 mbuf status is DD, and the status of mbuf after No.19 is in use.
> 
> If the plan is that call ixgbe_xmit_cleanup() cnt /rs_thresh + 1 times( 1 in this example).
> After ixgbe_xmit_cleanup()  the value of last_desc_cleaned is not same as the actual.
> the function ixgbe_xmit_pkts will call ixgbe_xmit_cleanup automatically when nb_tx_free<tx_free_thresh, mbufs before last_desc_cleaned
> (No.19~No.31 for this example)will not be freed
> 
> However, all of the above concerns ixgbe_xmit_pkts() and ixgbe_xmit_cleanup(). For ixgbe_xmit_pkts_simple() and ixgbe_tx_free_bufs() it
> may be worse,
> because the free action happens inside ixgbe_tx_free_bufs(); we don't do the free in outside code, so no mbufs will be freed at all.
> 
> And if you do free the remainder (MOD) mbufs in outside code and update the values of tx_next_dd and nb_tx_free, there is no need to call
> ixgbe_tx_free_bufs() at all.
> 
> Here are two cases about tx_done_cleanup() and the existing cleanup functions, in more detail:
> 
> Case 1
> Status:
> 	Suppose that before calling tx_done_cleanup(..., uint32_t cnt), the txq status is as follows:
> 
> 	one mbuf per packet
> 	txq->tx_rs_thresh = 32;
> 	clean mbufs starting from index id
> 	txr[id+9].wb.status & IXGBE_TXD_STAT_DD != 0.   // means we could free 10 packets
> 	txr[id+31].wb.status & IXGBE_TXD_STAT_DD == 0.  // means we could not clean 32 packets
> 
> Process Flow:
> 	tx_done_cleanup(..., 10) is invoked to free 10 packets.
> 	First, tx_done_cleanup() invokes ixgbe_xmit_cleanup() to count how many mbufs it could free.
> 	But ixgbe_xmit_cleanup() will return -1 (meaning there are no mbufs to free); see the code below:
> 
> 	status = txr[desc_to_clean_to].wb.status;
> 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
> 		PMD_TX_FREE_LOG(DEBUG,
> 				"TX descriptor %4u is not done"
> 				"(port=%d queue=%d)",
> 				desc_to_clean_to,
> 				txq->port_id, txq->queue_id);
> 		/* Failed to clean any descriptors, better luck next time */
> 		return -(1);
> 	}
> 
> Result:
> 	We free nothing.
> 
> Expected:
> 	Free 10 packets.
> 
> Thoughts:
> 	If we checked the status from txr[id].wb.status to txr[id+31].wb.status one by one, we would find that txr[id].wb.status,
> txr[id+1].wb.status, txr[id+2].wb.status ... txr[id+9].wb.status all have IXGBE_TXD_STAT_DD set, so we actually do have the ability to free
> 10 packets.
> 
> 
> 
> Case 2:
> Status:
> 	Suppose that before calling tx_done_cleanup(..., uint32_t cnt), the txq status is as follows:
> 
> 	one mbuf per packet
> 	txq->tx_rs_thresh = 32;
> 	clean mbufs starting from index id
> 	txr[id+31].wb.status & IXGBE_TXD_STAT_DD != 0.   // means we could free 32 packets
> 
> Process Flow:
> 	tx_done_cleanup(..., 10) is invoked to free 10 packets.
> 	When tx_done_cleanup() invokes ixgbe_tx_free_bufs() to free the buffers, it frees 32 packets.
> 
> Result:
> 	Free 32 packets.
> 
> Expected:
> 	Free 10 packets.
> 
> Thoughts:
> 	If we checked the status from txr[id].wb.status to txr[id+31].wb.status one by one, we would find that txr[id].wb.status,
> txr[id+1].wb.status, txr[id+2].wb.status ... txr[id+10].wb.status all have IXGBE_TXD_STAT_DD set, but we should free only
> 10 packets.
> 
> 
> 
> 
>  Second, we would have a lot of code that is the same as in the function ixgbe_xmit_cleanup().
> 
> 
>  We analyzed the functions ixgbe_tx_free_bufs() and ixgbe_xmit_cleanup() and broke their actions down into steps.
>  The actual actions of the code are the following:
> 1. Determine the start position of the free action.
> 2. Check whether the status of the end position is DD.
> 3. Free (set status = 0 in xmit_cleanup(), call rte_mempool_put_bulk() in tx_free_bufs()).
> 4. Update the location variables (last_desc_cleaned in xmit_cleanup(), tx_next_dd in tx_free_bufs(), and nb_tx_free in both).
> 
> If we reuse ixgbe_xmit_cleanup(), we need to get the number of mbufs before calling ixgbe_xmit_cleanup().
> We also need to implement the following actions:
> 1. Determine the starting position.
> 2. Find the last mbuf of each packet and confirm whether its status is DD.
> 3. Free (not done for tx_free_bufs()).
> 4. Call the xmit_cleanup function in a loop.
> 
> By comparison, you can see that 3 of the 4 steps of ixgbe_xmit_cleanup() already have to be performed before it is called,
> which means ixgbe_xmit_cleanup() is not highly reusable; there is a lot of repeated code.
> 
> 
> The following is the main code of ixgbe_xmit_cleanup(), split by the action each part performs:
> 
>  1.
> 	last_desc_cleaned = txq->last_desc_cleaned;
> 
>  2.
> 	/* Determine the last descriptor needing to be cleaned */
> 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> 	if (desc_to_clean_to >= nb_tx_desc)
> 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> 
> 	/* Check to make sure the last descriptor to clean is done */
> 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> 	status = txr[desc_to_clean_to].wb.status;
> 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
> 		PMD_TX_FREE_LOG(DEBUG,
> 		return -(1);
> 	}
> 3.
> 	/* Figure out how many descriptors will be cleaned */
> 	if (last_desc_cleaned > desc_to_clean_to)
> 		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
> 							desc_to_clean_to);
> 	else
> 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> 						last_desc_cleaned);
> 	txr[desc_to_clean_to].wb.status = 0;
> 4.
> 	/* Update the txq to reflect the last descriptor that was cleaned */
> 	txq->last_desc_cleaned = desc_to_clean_to;
> 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> 
> 
> 
> 
> 
> 
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, January 7, 2020 10:10 PM
> > To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> > Cc: Yang, Qiming <qiming.yang@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
> >
> >
> > Hi Chenxu,
> >
> > > > > > > > > + * tx_tail is the last sent packet on the sw_ring. Goto
> > > > > > > > > + the end
> > > > > > > > > + * of that packet (the last segment in the packet chain)
> > > > > > > > > + and
> > > > > > > > > + * then the next segment will be the start of the oldest
> > > > > > > > > + segment
> > > > > > > > > + * in the sw_ring.
> > > > > > > >
> > > > > > > > Not sure I understand the sentence above.
> > > > > > > > tx_tail is the value of TDT HW register (most recently armed by SW
> > TD).
> > > > > > > > last_id  is the index of last descriptor for multi-seg packet.
> > > > > > > > next_id is just the index of next descriptor in HW TD ring.
> > > > > > > > How do you conclude that it will be the ' oldest segment in the
> > sw_ring'?
> > > > > > > >
> > > > > > >
> > > > > > > The tx_tail is the last sent packet on the sw_ring, while
> > > > > > > xmit_cleanup or tx_free_bufs will be called when nb_tx_free <
> > > > > > tx_free_thresh.
> > > > > > > So sw_ring[tx_tail].next_id must be the beginning of the mbufs
> > > > > > > which are not used or already freed; then we loop from there,
> > > > > > > freeing mbufs, until an mbuf still in use is found.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Another question why do you need to write your own functions?
> > > > > > > > Why can't you reuse existing ixgbe_xmit_cleanup() for
> > > > > > > > full(offload) path and
> > > > > > > > ixgbe_tx_free_bufs() for simple path?
> > > > > > > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least
> > > > > > > > it could be used to determine finished TX descriptors.
> > > > > > > > Based on that you can you can free appropriate sw_ring[] entries.
> > > > > > > >
> > > > > > >
> > > > > > > The reason why I don't reuse the existing functions is that they
> > > > > > > all free several mbufs, while the free_cnt of the API
> > > > > > > rte_eth_tx_done_cleanup() is the
> > > > > > number of packets.
> > > > > > > We also need to check which mbufs are from the same
> > packet.
> > > > > >
> > > > > > At first, I don't see anything bad if tx_done_cleanup() will
> > > > > > free only some segments from the packet. As long as it is safe -
> > > > > > there is no
> > > > problem with that.
> > > > > > I think rte_eth_tx_done_cleanup() operates on mbuf, not packet
> > quantities.
> > > > > > But in our case I think it doesn't matter, as
> > > > > > ixgbe_xmit_cleanup() mark TXDs as free only when HW is done with all
> > TXDs for that packet.
> > > > > > As long as there is a way to reuse existing code and avoid
> > > > > > duplication (without introducing any degradation) - we should use it.
> > > > > > And I think there is a very good opportunity here to reuse
> > > > > > existing
> > > > > > ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> > > > > > Moreover because your code doesn’t follow
> > > > > > ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> > > > > > logic and infrastructure, it introduces unnecessary scans over
> > > > > > TXD ring, and in some cases doesn't work as expected:
> > > > > >
> > > > > > +while (1) {
> > > > > > +tx_last = sw_ring[tx_id].last_id;
> > > > > > +
> > > > > > +if (sw_ring[tx_last].mbuf) {
> > > > > > +if (txr[tx_last].wb.status &
> > > > > > +IXGBE_TXD_STAT_DD) {
> > > > > > ...
> > > > > > +} else {
> > > > > > +/*
> > > > > > + * mbuf still in use, nothing left to
> > > > > > + * free.
> > > > > > + */
> > > > > > +break;
> > > > > >
> > > > > > It is not correct to expect that IXGBE_TXD_STAT_DD will be set
> > > > > > on last TXD for
> > > > > > *every* packet.
> > > > > > We set the IXGBE_TXD_CMD_RS bit only on the last descriptor of every tx_rs_thresh-th packet.
> > > > > > Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> > > > > >
> > > > > > So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> > > > > > It would be much less error prone and will help to avoid code duplication.
> > > > > >
> > > > > > Konstantin
> > > > > >
> > > > >
> > > > > At first:
> > > > > The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will clean up
> > > > TXD wb.status;
> > > > > the number of statuses cleaned up is always txq->tx_rs_thresh.
> > > >
> > > > Yes, and what's wrong with it?
> > > >
> > > > >
> > > > > The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that
> > > > > @param free_cnt
> > > > >  *   Maximum number of packets to free. Use 0 to indicate all possible
> > packets
> > > > >  *   should be freed. Note that a packet may be using multiple mbufs.
> > > >
> > > > I don't think it is a good approach, better would be to report
> > > > number of freed mbufs, but ok, as it is a public API, we probably need to
> > keep it as it is.
> > > >
> > > > > but a count must be passed somehow, while ixgbe_xmit_cleanup() and
> > > > > ixgbe_tx_free_bufs() only
> > > > have one parameter, txq.
> > > >
> > > > Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
> > > > So if user requested more packets to be freed we can call
> > > > ixgbe_xmit_cleanup() in a loop.
> > > >
> > >
> > > That is a great idea, and I discussed it with my workmate today.
> > > There are still some questions we are not sure about.
> > > We can indeed call ixgbe_xmit_cleanup() in a loop if the user requested
> > > more packets, but how do we deal with the remainder (the MOD)? For example:
> > > The default tx_rs_thresh is 32. Suppose the count of mbufs we need to free is 50.
> > > We can call ixgbe_xmit_cleanup() once to free 32 mbufs.
> > > Then what about the other 18 mbufs?
> > > 1. If we do nothing, it doesn't look good.
> > > 2. If the second call to ixgbe_xmit_cleanup() succeeds, we free 14 mbufs more than requested.
> > > 3. If the second call to ixgbe_xmit_cleanup() fails because the status of mbuf No. 32 is not DD
> > >    while No. 18 is DD, we do not free the 18 mbufs that we could and should free.
> > >
> > > We have tried some plans, such as adding a parameter to
> > > ixgbe_xmit_cleanup(), changing tx_rs_thresh, or copying the code of
> > > ixgbe_xmit_cleanup() into a new function with more parameters, but none of them
> > > seems perfect.
> > >
> > > So can you give some comments on it? Reusing the function seems not as easy as we thought.
> >
> > My thought about it:
> > for situations when cnt % rs_thresh != 0 we'll still call ixgbe_xmit_cleanup() cnt /
> > rs_thresh + 1 times.
> > But then, we can free just cnt % rs_thresh mbufs, while keeping the rest of them
> > intact.
> >
> > Let's say at some moment we have txq->nb_tx_free==0, and the user called
> > tx_done_cleanup(txq, ..., cnt==50). So we call ixgbe_xmit_cleanup(txq) a first time.
> > Suppose it frees 32 TXDs; then we walk through the corresponding sw_ring[] entries
> > and, let's say, free 32 packets (one mbuf per packet).
> > Then we call ixgbe_xmit_cleanup(txq) a second time.
> > Suppose it frees another 32 TXDs; we again walk through sw_ring[], but free
> > only the first 18 mbufs and return.
> > Suppose we call tx_done_cleanup(txq, cnt=50) immediately again.
> > Now txq->nb_tx_free==64, so we can start to scan sw_ring[] from tx_tail
> > straight away. We'll skip the first 50 entries as they are already empty, then free
> > remaining 14 mbufs, then will call ixgbe_xmit_cleanup(txq) again, and if it would
> > be successful, will scan and free another 32 sw_ring[] entries.
> > Then we will call ixgbe_xmit_cleanup(txq) again, but will free
> > only the first 8 available sw_ring[].mbuf entries.
> > Probably a bit easier with the code:
> >
> > tx_done_cleanup(..., uint32_t cnt)
> > {
> >        swr = txq->sw_ring;
> >        txr     = txq->tx_ring;
> >        id   = txq->tx_tail;
> >
> >       if (txq->nb_tx_free == 0)
> >          ixgbe_xmit_cleanup(txq);
> >
> >       free =  txq->nb_tx_free;
> >       for (n = 0; n < cnt && free != 0; ) {
> >
> >           for (j = 0; j != free && n < cnt; j++) {
> >              swe = &swr[id + j];
> >              if (swe->mbuf != NULL) {
> >                    rte_pktmbuf_free_seg(swe->mbuf);
> >                    swe->mbuf = NULL;
> >
> >                   /* last segment in the packet, increment packet count */
> >                   n += (swe->last_id == id + j);
> >              }
> >           }
> >
> >           if (n < cnt) {
> >                ixgbe_xmit_cleanup(txq);
> >                free =   txq->nb_tx_free - free;
> >           }
> >      }
> >      return n;
> > }
> >
> > For the situation when there are fewer than rs_thresh free TXDs
> > ((txq->tx_ring[desc_to_clean_to].wb.status & IXGBE_TXD_STAT_DD) == 0) we do
> > nothing - in that case we consider there are no more mbufs to free.
> >
> > >
> > >
> > > > > And what it should do is not only free buffers and statuses but also
> > > > > check which bufs are from one packet and count the packets freed.
> > > >
> > > > ixgbe_xmit_cleanup() itself doesn't free mbufs itself.
> > > > It only cleans up TXDs.
> > > > So in tx_done_cleanup() after calling ixgbe_xmit_cleanup() you'll
> > > > still need to go through sw_ring[] entries that correspond to free TXDs and
> > call mbuf_seg_free().
> > > > You can count number of full packets here.
> > > >
> > > > > So I think it can't be implemented by reusing the function
> > > > > xmit_cleanup() without
> > > > changing it.
> > > > > And creating a new function from the code of xmit_cleanup() will cause
> > > > > much
> > > > duplication.
> > > >
> > > > I don't think it would.
> > > > I think all we need here is something like that (note it is
> > > > schematic one, doesn't take into accounf that TXD ring is circular):
> > > >
> > > > tx_done_cleanup(..., uint32_t cnt)
> > > > {
> > > >       /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
> > > >            Scan them first and free as many mbufs as we can.
> > > >            If we need more mbufs to free call  ixgbe_xmit_cleanup()
> > > >            to free more TXDs. */
> > > >
> > > >        swr = txq->sw_ring;
> > > >        txr     = txq->tx_ring;
> > > >        id   = txq->tx_tail;
> > > >        free =  txq->nb_tx_free;
> > > >
> > > >        for (n = 0; n < cnt && free != 0; ) {
> > > >
> > > >           for (j = 0; j != free && n < cnt; j++) {
> > > >              swe = &swr[id + j];
> > > >              if (swe->mbuf != NULL) {
> > > >                    rte_pktmbuf_free_seg(swe->mbuf);
> > > >                    swe->mbuf = NULL;
> > > >              }
> > > >              n += (swe->last_id == id + j)
> > > >           }
> > > >
> > > >           if (n < cnt) {
> > > >                ixgbe_xmit_cleanup(txq);
> > > >                free =   txq->nb_tx_free - free;
> > > >           }
> > > >      }
> > > >      return n;
> > > > }
> > > >
> > > > >
> > > > > Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().
> > > >
> > > > Totally disagree, see above.
> > > >
> > > > >
> > > > > Second.
> > > > > The function in the patch is copied from code in igb_rxtx.c. It was already
> > > > > updated in 2017; the commit id is
> > > > 8d907d2b79f7a54c809f1c44970ff455fa2865e1.
> > > >
> > > > I realized that.
> > > > But I think it as a problem, not a positive thing.
> > > > While they do have some similarities, igb and ixgbe are PMDs for
> > > > different devices, and their TX code differs quite a lot. For instance,
> > > > igb doesn't use tx_rs_thresh, but instead sets the RS bit on each packet's last TXD.
> > > > So, just blindly copying tx_done_cleanup() from igb to ixgbe doesn't
> > > > look like a good idea to me.
> > > >
> > > > > I trust the logic of the code is right.
> > > > > Actually it isn't complete for ixgbe, i40e and ice, since it doesn't
> > > > > change the values of last_desc_cleaned and tx_next_dd. And for its
> > > > > starting point it should prefer last_desc_cleaned or tx_next_dd (for offload or
> > > > > simple) to
> > > > tx_tail.
> > > > >
> > > > > So, I suggest using the old function and fixing the issue.
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >This is the first packet that will be
> > > > > > > > > + * attempted to be freed.
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +/* Get last segment in most recently added packet. */
> > > > > > > > > +tx_last = sw_ring[txq->tx_tail].last_id;
> > > > > > > > > +
> > > > > > > > > +/* Get the next segment, which is the oldest segment in ring.
> > > > > > > > > +*/ tx_first = sw_ring[tx_last].next_id;
> > > > > > > > > +
> > > > > > > > > +/* Set the current index to the first. */ tx_id =
> > > > > > > > > +tx_first;
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Loop through each packet. For each packet, verify that
> > > > > > > > > +an
> > > > > > > > > + * mbuf exists and that the last segment is free. If so,
> > > > > > > > > +free
> > > > > > > > > + * it and move on.
> > > > > > > > > + */
> > > > > > > > > +while (1) {
> > > > > > > > > +tx_last = sw_ring[tx_id].last_id;
> > > > > > > > > +
> > > > > > > > > +if (sw_ring[tx_last].mbuf) { if (!(txr[tx_last].wb.status
> > > > > > > > > +&
> > > > > > > > > +IXGBE_TXD_STAT_DD))
> > > > > > > > > +break;
> > > > > > > > > +
> > > > > > > > > +/* Get the start of the next packet. */ tx_next =
> > > > > > > > > +sw_ring[tx_last].next_id;
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Loop through all segments in a
> > > > > > > > > + * packet.
> > > > > > > > > + */
> > > > > > > > > +do {
> > > > > > > > > +rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > > > > > > > > +sw_ring[tx_id].mbuf = NULL; sw_ring[tx_id].last_id =
> > > > > > > > > +tx_id;
> > > > > > > > > +
> > > > > > > > > +/* Move to next segment. */ tx_id =
> > > > > > > > > +sw_ring[tx_id].next_id;
> > > > > > > > > +
> > > > > > > > > +} while (tx_id != tx_next);
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Increment the number of packets
> > > > > > > > > + * freed.
> > > > > > > > > + */
> > > > > > > > > +count++;
> > > > > > > > > +
> > > > > > > > > +if (unlikely(count == (int)free_cnt)) break; } else {
> > > > > > > > > +/*
> > > > > > > > > + * There are multiple reasons to be here:
> > > > > > > > > + * 1) All the packets on the ring have been
> > > > > > > > > + *    freed - tx_id is equal to tx_first
> > > > > > > > > + *    and some packets have been freed.
> > > > > > > > > + *    - Done, exit
> > > > > > > > > + * 2) Interfaces has not sent a rings worth of
> > > > > > > > > + *    packets yet, so the segment after tail is
> > > > > > > > > + *    still empty. Or a previous call to this
> > > > > > > > > + *    function freed some of the segments but
> > > > > > > > > + *    not all so there is a hole in the list.
> > > > > > > > > + *    Hopefully this is a rare case.
> > > > > > > > > + *    - Walk the list and find the next mbuf. If
> > > > > > > > > + *      there isn't one, then done.
> > > > > > > > > + */
> > > > > > > > > +if (likely(tx_id == tx_first && count != 0)) break;
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Walk the list and find the next mbuf, if any.
> > > > > > > > > + */
> > > > > > > > > +do {
> > > > > > > > > +/* Move to next segment. */ tx_id =
> > > > > > > > > +sw_ring[tx_id].next_id;
> > > > > > > > > +
> > > > > > > > > +if (sw_ring[tx_id].mbuf)
> > > > > > > > > +break;
> > > > > > > > > +
> > > > > > > > > +} while (tx_id != tx_first);
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Determine why previous loop bailed. If there
> > > > > > > > > + * is not an mbuf, done.
> > > > > > > > > + */
> > > > > > > > > +if (sw_ring[tx_id].mbuf == NULL) break; } }
> > > > > > > > > +
> > > > > > > > > +return count;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static void __attribute__((cold))
> > > > > > > > > ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff
> > > > > > > > > --git a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > > > b/drivers/net/ixgbe/ixgbe_rxtx.h index
> > > > > > > > > 505d344b9..2c3770af6
> > > > > > > > > 100644
> > > > > > > > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > > > > @@ -285,6 +285,8 @@ int
> > > > > > > > > ixgbe_rx_vec_dev_conf_condition_check(struct
> > > > > > > > > rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct
> > > > > > > > > ixgbe_rx_queue *rxq);  void
> > > > > > > > > ixgbe_rx_queue_release_mbufs_vec(struct
> > > > > > > > > ixgbe_rx_queue *rxq);
> > > > > > > > >
> > > > > > > > > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > > > > > > > > +
> > > > > > > > >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> > > > > > > > >  extern const uint32_t
> > > > > > > > > ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.17.1
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 2c6fd0f13..0091405db 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -601,6 +601,7 @@  static const struct eth_dev_ops ixgbe_eth_dev_ops = {
 	.udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
 	.udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
 	.tm_ops_get           = ixgbe_tm_ops_get,
+	.tx_done_cleanup      = ixgbe_tx_done_cleanup,
 };
 
 /*
@@ -649,6 +650,7 @@  static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
 	.reta_query           = ixgbe_dev_rss_reta_query,
 	.rss_hash_update      = ixgbe_dev_rss_hash_update,
 	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
+	.tx_done_cleanup      = ixgbe_tx_done_cleanup,
 };
 
 /* store statistics names and its offset in stats structure */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index fa572d184..520b9c756 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2306,6 +2306,122 @@  ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue *txq)
 	}
 }
 
+int ixgbe_tx_done_cleanup(void *q, uint32_t free_cnt)
+{
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)q;
+	struct ixgbe_tx_entry *sw_ring;
+	volatile union ixgbe_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq == NULL)
+		return -ENODEV;
+
+	count = 0;
+	sw_ring = txq->sw_ring;
+	txr = txq->tx_ring;
+
+	/*
+	 * tx_tail is the last sent packet on the sw_ring. Goto the end
+	 * of that packet (the last segment in the packet chain) and
+	 * then the next segment will be the start of the oldest segment
+	 * in the sw_ring. This is the first packet that will be
+	 * attempted to be freed.
+	 */
+
+	/* Get last segment in most recently added packet. */
+	tx_last = sw_ring[txq->tx_tail].last_id;
+
+	/* Get the next segment, which is the oldest segment in ring. */
+	tx_first = sw_ring[tx_last].next_id;
+
+	/* Set the current index to the first. */
+	tx_id = tx_first;
+
+	/*
+	 * Loop through each packet. For each packet, verify that an
+	 * mbuf exists and that the last segment is free. If so, free
+	 * it and move on.
+	 */
+	while (1) {
+		tx_last = sw_ring[tx_id].last_id;
+
+		if (sw_ring[tx_last].mbuf) {
+			if (!(txr[tx_last].wb.status &
+				IXGBE_TXD_STAT_DD))
+				break;
+
+			/* Get the start of the next packet. */
+			tx_next = sw_ring[tx_last].next_id;
+
+			/*
+			 * Loop through all segments in a
+			 * packet.
+			 */
+			do {
+				rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+				sw_ring[tx_id].mbuf = NULL;
+				sw_ring[tx_id].last_id = tx_id;
+
+				/* Move to next segment. */
+				tx_id = sw_ring[tx_id].next_id;
+
+			} while (tx_id != tx_next);
+
+			/*
+			 * Increment the number of packets
+			 * freed.
+			 */
+			count++;
+
+			if (unlikely(count == (int)free_cnt))
+				break;
+		} else {
+			/*
+			 * There are multiple reasons to be here:
+			 * 1) All the packets on the ring have been
+			 *    freed - tx_id is equal to tx_first
+			 *    and some packets have been freed.
+			 *    - Done, exit
+			 * 2) Interfaces has not sent a rings worth of
+			 *    packets yet, so the segment after tail is
+			 *    still empty. Or a previous call to this
+			 *    function freed some of the segments but
+			 *    not all so there is a hole in the list.
+			 *    Hopefully this is a rare case.
+			 *    - Walk the list and find the next mbuf. If
+			 *      there isn't one, then done.
+			 */
+			if (likely(tx_id == tx_first && count != 0))
+				break;
+
+			/*
+			 * Walk the list and find the next mbuf, if any.
+			 */
+			do {
+				/* Move to next segment. */
+				tx_id = sw_ring[tx_id].next_id;
+
+				if (sw_ring[tx_id].mbuf)
+					break;
+
+			} while (tx_id != tx_first);
+
+			/*
+			 * Determine why previous loop bailed. If there
+			 * is not an mbuf, done.
+			 */
+			if (sw_ring[tx_id].mbuf == NULL)
+				break;
+		}
+	}
+
+	return count;
+}
+
 static void __attribute__((cold))
 ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)
 {
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 505d344b9..2c3770af6 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -285,6 +285,8 @@  int ixgbe_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
 int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq);
 void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);
 
+int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
 extern const uint32_t ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];