[v2,1/3] ethdev: introduce pool sort capability

Message ID 20220812172451.1208933-1-hpothula@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v2,1/3] ethdev: introduce pool sort capability |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Hanumanth Pothula Aug. 12, 2022, 5:24 p.m. UTC
  Presently, the 'Buffer Split' feature supports sending multiple
segments of the received packet to PMD, which programs the HW
to receive the packet in segments from different pools.

This patch extends the feature to support the pool sort capability.
Some of the HW has support for choosing memory pools based on the
packet's size. The pool sort capability allows PMD to choose a
memory pool based on the packet's length.

This is often useful for saving the memory where the application
can create a different pool to steer the specific size of the
packet, thus enabling effective use of memory.

For example, let's say HW has a capability of three pools,
 - pool-1 size is 2K
 - pool-2 size is > 2K and < 4K
 - pool-3 size is > 4K
Here,
        pool-1 can accommodate packets with sizes < 2K
        pool-2 can accommodate packets with sizes > 2K and < 4K
        pool-3 can accommodate packets with sizes > 4K

With pool sort capability enabled in SW, an application may create
three pools of different sizes and send them to PMD. Allowing PMD
to program HW based on packet lengths. So that packets with less
than 2K are received on pool-1, packets with lengths between 2K
and 4K are received on pool-2 and finally packets greater than 4K
are received on pool-3.

The following two capabilities are added to the rte_eth_rxseg_capa
structure,
1. pool_sort --> tells pool sort capability is supported by HW.
2. max_npool --> max number of pools supported by HW.

Defined new structure rte_eth_rxseg_sort, to be used only when pool
sort capability is present. If required this may be extended further
to support more configurations.

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

v2:
 - Along with spec changes, uploading testpmd and driver changes.
---
 lib/ethdev/rte_ethdev.c | 87 +++++++++++++++++++++++++++++++++++------
 lib/ethdev/rte_ethdev.h | 45 +++++++++++++++++++--
 2 files changed, 118 insertions(+), 14 deletions(-)
  

Comments

Ding, Xuan Aug. 23, 2022, 3:26 a.m. UTC | #1
Hi Hanumanth,

> -----Original Message-----
> From: Hanumanth Pothula <hpothula@marvell.com>
> Sent: Saturday, August 13, 2022 1:25 AM
> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; viacheslavo@nvidia.com; jerinj@marvell.com;
> ndabilpuram@marvell.com; Hanumanth Pothula <hpothula@marvell.com>
> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
> 
> Presently, the 'Buffer Split' feature supports sending multiple segments of
> the received packet to PMD, which programs the HW to receive the packet in
> segments from different pools.
> 
> This patch extends the feature to support the pool sort capability.
> Some of the HW has support for choosing memory pools based on the
> packet's size. The pool sort capability allows PMD to choose a memory pool
> based on the packet's length.
> 
> This is often useful for saving the memory where the application can create a
> different pool to steer the specific size of the packet, thus enabling effective
> use of memory.
> 
> For example, let's say HW has a capability of three pools,
>  - pool-1 size is 2K
>  - pool-2 size is > 2K and < 4K
>  - pool-3 size is > 4K
> Here,
>         pool-1 can accommodate packets with sizes < 2K
>         pool-2 can accommodate packets with sizes > 2K and < 4K
>         pool-3 can accommodate packets with sizes > 4K
> 
> With pool sort capability enabled in SW, an application may create three
> pools of different sizes and send them to PMD. Allowing PMD to program
> HW based on packet lengths. So that packets with less than 2K are received
> on pool-1, packets with lengths between 2K and 4K are received on pool-2
> and finally packets greater than 4K are received on pool-3.
> 
> The following two capabilities are added to the rte_eth_rxseg_capa structure,
> 1. pool_sort --> tells pool sort capability is supported by HW.
> 2. max_npool --> max number of pools supported by HW.
> 
> Defined new structure rte_eth_rxseg_sort, to be used only when pool sort
> capability is present. If required this may be extended further to support
> more configurations.
> 
> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
> 
> v2:
>  - Along with spec changes, uploading testpmd and driver changes.

Thanks for CCing. It's an interesting feature.

But I have one question here:
Buffer split is for split receiving packets into multiple segments, while pool sort supports
PMD to put the receiving packets into different pools according to packet size.
Every packet is still intact.

So, at this level, pool sort does not belong to buffer split.
And you already use a different function to check pool sort rather than check buffer split.

Should a new RX offload be introduced? like "RTE_ETH_RX_OFFLOAD_POOL_SORT".

> ---
>  lib/ethdev/rte_ethdev.c | 87 +++++++++++++++++++++++++++++++++++------
>  lib/ethdev/rte_ethdev.h | 45 +++++++++++++++++++--
>  2 files changed, 118 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> 1979dc0850..7fd5443eb8 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1635,7 +1635,55 @@ rte_eth_dev_is_removed(uint16_t port_id)  }
> 
>  static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> +rte_eth_rx_queue_check_sort(const struct rte_eth_rxseg *rx_seg,
> +			     uint16_t n_seg, uint32_t *mbp_buf_size,
> +			     const struct rte_eth_dev_info *dev_info) {
> +	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> +	uint16_t seg_idx;
> +
> +	if (!seg_capa->multi_pools || n_seg > seg_capa->max_npool) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid capabilities, multi_pools:%d differnt
> length segments %u exceed supported %u\n",
> +			       seg_capa->multi_pools, n_seg, seg_capa-
> >max_nseg);
> +		return -EINVAL;
> +	}
> +
> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +		struct rte_mempool *mpl = rx_seg[seg_idx].sort.mp;
> +		uint32_t length = rx_seg[seg_idx].sort.length;
> +
> +		if (mpl == NULL) {
> +			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> +			return -EINVAL;
> +		}
> +
> +		if (mpl->private_data_size <
> +			sizeof(struct rte_pktmbuf_pool_private)) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "%s private_data_size %u < %u\n",
> +				       mpl->name, mpl->private_data_size,
> +				       (unsigned int)sizeof
> +					(struct rte_pktmbuf_pool_private));
> +			return -ENOSPC;
> +		}
> +
> +		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> +		length = length != 0 ? length : (*mbp_buf_size -
> RTE_PKTMBUF_HEADROOM);
> +		if (*mbp_buf_size < length + RTE_PKTMBUF_HEADROOM) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "%s mbuf_data_room_size %u < %u))\n",
> +				       mpl->name, *mbp_buf_size,
> +				       length);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +rte_eth_rx_queue_check_split(const struct rte_eth_rxseg *rx_seg,
>  			     uint16_t n_seg, uint32_t *mbp_buf_size,
>  			     const struct rte_eth_dev_info *dev_info)  { @@ -
> 1654,12 +1702,12 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  	 * Check the sizes and offsets against buffer sizes
>  	 * for each segment specified in extended configuration.
>  	 */
> -	mp_first = rx_seg[0].mp;
> +	mp_first = rx_seg[0].split.mp;
>  	offset_mask = RTE_BIT32(seg_capa->offset_align_log2) - 1;
>  	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> -		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> -		uint32_t length = rx_seg[seg_idx].length;
> -		uint32_t offset = rx_seg[seg_idx].offset;
> +		struct rte_mempool *mpl = rx_seg[seg_idx].split.mp;
> +		uint32_t length = rx_seg[seg_idx].split.length;
> +		uint32_t offset = rx_seg[seg_idx].split.offset;
> 
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1693,7 +1741,11 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> +		/* On segment length == 0, update segment's length with
> +		 * the pool's length - headeroom space, to make sure enough
> +		 * space is accomidate for header.
> +		 **/
> +		length = length != 0 ? length : (*mbp_buf_size -
> +RTE_PKTMBUF_HEADROOM);
>  		if (*mbp_buf_size < length + offset) {
>  			RTE_ETHDEV_LOG(ERR,
>  				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n", @@ -1764,7 +1816,6 @@
> rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  			return -EINVAL;
>  		}
>  	} else {
> -		const struct rte_eth_rxseg_split *rx_seg;
>  		uint16_t n_seg;
> 
>  		/* Extended multi-segment configuration check. */ @@ -
> 1774,13 +1825,27 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t
> rx_queue_id,
>  			return -EINVAL;
>  		}
> 
> -		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
>  		n_seg = rx_conf->rx_nseg;
> 
>  		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> -							   &mbp_buf_size,
> -							   &dev_info);
> +			ret = -1; /* To make sure at least one of below
> conditions becomes
> +true */
> +
> +			/* Check both NIX and application supports buffer-
> split capability */
> +			if (dev_info.rx_seg_capa.mode_split &&
> +			    rx_conf->mode_flag ==
> RTE_ETH_RXSEG_MODE_SPLIT) {
> +				ret = rte_eth_rx_queue_check_split(rx_conf-
> >rx_seg, n_seg,
> +
> &mbp_buf_size,
> +								   &dev_info);
> +			}
> +
> +			/* Check both NIX and application supports pool-sort
> capability */
> +			if (dev_info.rx_seg_capa.mode_sort &&
> +			    rx_conf->mode_flag ==
> RTE_ETH_RXSEG_MODE_SORT) {
> +				ret = rte_eth_rx_queue_check_sort(rx_conf-
> >rx_seg, n_seg,
> +
> &mbp_buf_size,
> +								  &dev_info);
> +			}
> +
>  			if (ret != 0)
>  				return ret;
>  		} else {
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> de9e970d4d..9f6787d7ad 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1204,16 +1204,46 @@ struct rte_eth_rxseg_split {
>  	uint32_t reserved; /**< Reserved field. */  };
> 
> +/**
> + * The pool sort capability allows PMD to choose a memory pool based on
> +the
> + * packet's length. So, basically, PMD programs HW for receiving
> +packets from
> + * different pools, based on the packet's length.
> + *
> + * This is often useful for saving the memory where the application can
> +create
> + * a different pool to steer the specific size of the packet, thus
> +enabling
> + * effective use of memory.
> + */
> +struct rte_eth_rxseg_sort {
> +	struct rte_mempool *mp; /**< Memory pool to allocate packets
> from. */
> +	uint16_t length; /**< Packet data length. */
> +	uint32_t reserved; /**< Reserved field. */ };
> +
> +enum rte_eth_rxseg_mode {
> +	/**
> +	 * Buffer split mode: PMD split the received packets into multiple
> segments.
> +	 * @see struct rte_eth_rxseg_split
> +	 */
> +	RTE_ETH_RXSEG_MODE_SPLIT = RTE_BIT64(0),
> +	/**
> +	 * Pool sort mode: PMD to chooses a memory pool based on the
> packet's length.
> +	 * @see struct rte_eth_rxseg_sort
> +	 */
> +	RTE_ETH_RXSEG_MODE_SORT  = RTE_BIT64(1), };
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this structure may change without prior notice.
>   *
>   * A common structure used to describe Rx packet segment properties.
>   */
> -union rte_eth_rxseg {
> +struct rte_eth_rxseg {
>  	/* The settings for buffer split offload. */
>  	struct rte_eth_rxseg_split split;
> -	/* The other features settings should be added here. */
> +
> +	/*The settings for packet sort offload. */
> +	struct rte_eth_rxseg_sort sort;
>  };
> 
>  /**
> @@ -1239,6 +1269,11 @@ struct rte_eth_rxconf {
>  	 * fields on rte_eth_dev_info structure are allowed to be set.
>  	 */
>  	uint64_t offloads;
> +	/**
> +	 * PMD may support more than one rxseg mode. This allows
> application
> +	 * to chose which mode to enable.
> +	 */
> +	enum rte_eth_rxseg_mode mode_flag;
>  	/**
>  	 * Points to the array of segment descriptions for an entire packet.
>  	 * Array elements are properties for consecutive Rx segments.
> @@ -1246,7 +1281,7 @@ struct rte_eth_rxconf {
>  	 * The supported capabilities of receiving segmentation is reported
>  	 * in rte_eth_dev_info.rx_seg_capa field.
>  	 */
> -	union rte_eth_rxseg *rx_seg;
> +	struct rte_eth_rxseg *rx_seg;
> 
>  	uint64_t reserved_64s[2]; /**< Reserved for future fields */
>  	void *reserved_ptrs[2];   /**< Reserved for future fields */
> @@ -1827,10 +1862,14 @@ struct rte_eth_switch_info {
>   */
>  struct rte_eth_rxseg_capa {
>  	__extension__
> +	uint32_t mode_split : 1; /**< Supports buffer split capability @see
> struct rte_eth_rxseg_split */
> +	uint32_t mode_sort : 1; /**< Supports pool sort capability @see
> struct

The same doubt here. As I know, the 'rte_eth_rxseg_capa' structure is used for buffer split.

Thanks,
Xuan

> +rte_eth_rxseg_sort */
>  	uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
>  	uint32_t offset_allowed:1; /**< Supports buffer offsets. */
>  	uint32_t offset_align_log2:4; /**< Required offset alignment. */
>  	uint16_t max_nseg; /**< Maximum amount of segments to split. */
> +	/* < Maximum amount of pools that PMD can sort based on
> packet/segment lengths */
> +	uint16_t max_npool;
>  	uint16_t reserved; /**< Reserved field. */  };
> 
> --
> 2.25.1
  
Ferruh Yigit Aug. 24, 2022, 3:33 p.m. UTC | #2
On 8/23/2022 4:26 AM, Ding, Xuan wrote:
> Hi Hanumanth,
> 
>> -----Original Message-----
>> From: Hanumanth Pothula <hpothula@marvell.com>
>> Sent: Saturday, August 13, 2022 1:25 AM
>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
>> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; viacheslavo@nvidia.com; jerinj@marvell.com;
>> ndabilpuram@marvell.com; Hanumanth Pothula <hpothula@marvell.com>
>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>
>> Presently, the 'Buffer Split' feature supports sending multiple segments of
>> the received packet to PMD, which programs the HW to receive the packet in
>> segments from different pools.
>>
>> This patch extends the feature to support the pool sort capability.
>> Some of the HW has support for choosing memory pools based on the
>> packet's size. The pool sort capability allows PMD to choose a memory pool
>> based on the packet's length.
>>
>> This is often useful for saving the memory where the application can create a
>> different pool to steer the specific size of the packet, thus enabling effective
>> use of memory.
>>
>> For example, let's say HW has a capability of three pools,
>>   - pool-1 size is 2K
>>   - pool-2 size is > 2K and < 4K
>>   - pool-3 size is > 4K
>> Here,
>>          pool-1 can accommodate packets with sizes < 2K
>>          pool-2 can accommodate packets with sizes > 2K and < 4K
>>          pool-3 can accommodate packets with sizes > 4K
>>
>> With pool sort capability enabled in SW, an application may create three
>> pools of different sizes and send them to PMD. Allowing PMD to program
>> HW based on packet lengths. So that packets with less than 2K are received
>> on pool-1, packets with lengths between 2K and 4K are received on pool-2
>> and finally packets greater than 4K are received on pool-3.
>>
>> The following two capabilities are added to the rte_eth_rxseg_capa structure,
>> 1. pool_sort --> tells pool sort capability is supported by HW.
>> 2. max_npool --> max number of pools supported by HW.
>>
>> Defined new structure rte_eth_rxseg_sort, to be used only when pool sort
>> capability is present. If required this may be extended further to support
>> more configurations.
>>
>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
>>
>> v2:
>>   - Along with spec changes, uploading testpmd and driver changes.
> 
> Thanks for CCing. It's an interesting feature.
> 
> But I have one question here:
> Buffer split is for split receiving packets into multiple segments, while pool sort supports
> PMD to put the receiving packets into different pools according to packet size.
> Every packet is still intact.
> 
> So, at this level, pool sort does not belong to buffer split.
> And you already use a different function to check pool sort rather than check buffer split.
> 
> Should a new RX offload be introduced? like "RTE_ETH_RX_OFFLOAD_POOL_SORT".
> 

Hi Hanumanth,

I had the similar concern with the feature. I assume you want to benefit 
from exiting config structure that gets multiple mempool as argument, 
since this feature also needs multiple mempools, but the feature is 
different.

It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to decide 
if to receive into multiple mempool or not, which doesn't have anything 
related split. Also not sure about using the 'sort' keyword.
What do you think to introduce new fetaure, instead of extending 
existing split one?
This is optimisation, right? To enable us to use less memory for the 
packet buffer, does it qualify to a device offload?


Also, what is the relation with segmented Rx, how a PMD decide to use 
segmented Rx or bigger mempool? How can application can configure this?

Need to clarify the rules, based on your sample, if a 512 bytes packet 
received, does it have to go pool-1, or can it go to any of three pools?


And I don't see any change in the 'net/cnxk' Rx burst code, when 
multiple mempool used, while filling the mbufs shouldn't it check which 
mempool is filled. How this works without update in the Rx burst code, 
or am I missing some implementation detail?


>> ---
>>   lib/ethdev/rte_ethdev.c | 87 +++++++++++++++++++++++++++++++++++------
>>   lib/ethdev/rte_ethdev.h | 45 +++++++++++++++++++--
>>   2 files changed, 118 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>> 1979dc0850..7fd5443eb8 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -1635,7 +1635,55 @@ rte_eth_dev_is_removed(uint16_t port_id)  }
>>
>>   static int
>> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>> +rte_eth_rx_queue_check_sort(const struct rte_eth_rxseg *rx_seg,
>> +			     uint16_t n_seg, uint32_t *mbp_buf_size,
>> +			     const struct rte_eth_dev_info *dev_info) {
>> +	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
>>> rx_seg_capa;
>> +	uint16_t seg_idx;
>> +
>> +	if (!seg_capa->multi_pools || n_seg > seg_capa->max_npool) {
>> +		RTE_ETHDEV_LOG(ERR,
>> +			       "Invalid capabilities, multi_pools:%d differnt
>> length segments %u exceed supported %u\n",
>> +			       seg_capa->multi_pools, n_seg, seg_capa-
>>> max_nseg);
>> +		return -EINVAL;
>> +	}
>> +
>> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
>> +		struct rte_mempool *mpl = rx_seg[seg_idx].sort.mp;
>> +		uint32_t length = rx_seg[seg_idx].sort.length;
>> +
>> +		if (mpl == NULL) {
>> +			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>> +			return -EINVAL;
>> +		}
>> +
>> +		if (mpl->private_data_size <
>> +			sizeof(struct rte_pktmbuf_pool_private)) {
>> +			RTE_ETHDEV_LOG(ERR,
>> +				       "%s private_data_size %u < %u\n",
>> +				       mpl->name, mpl->private_data_size,
>> +				       (unsigned int)sizeof
>> +					(struct rte_pktmbuf_pool_private));
>> +			return -ENOSPC;
>> +		}
>> +
>> +		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>> +		length = length != 0 ? length : (*mbp_buf_size -
>> RTE_PKTMBUF_HEADROOM);
>> +		if (*mbp_buf_size < length + RTE_PKTMBUF_HEADROOM) {
>> +			RTE_ETHDEV_LOG(ERR,
>> +				       "%s mbuf_data_room_size %u < %u))\n",
>> +				       mpl->name, *mbp_buf_size,
>> +				       length);
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +rte_eth_rx_queue_check_split(const struct rte_eth_rxseg *rx_seg,
>>   			     uint16_t n_seg, uint32_t *mbp_buf_size,
>>   			     const struct rte_eth_dev_info *dev_info)  { @@ -
>> 1654,12 +1702,12 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>   	 * Check the sizes and offsets against buffer sizes
>>   	 * for each segment specified in extended configuration.
>>   	 */
>> -	mp_first = rx_seg[0].mp;
>> +	mp_first = rx_seg[0].split.mp;
>>   	offset_mask = RTE_BIT32(seg_capa->offset_align_log2) - 1;
>>   	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
>> -		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>> -		uint32_t length = rx_seg[seg_idx].length;
>> -		uint32_t offset = rx_seg[seg_idx].offset;
>> +		struct rte_mempool *mpl = rx_seg[seg_idx].split.mp;
>> +		uint32_t length = rx_seg[seg_idx].split.length;
>> +		uint32_t offset = rx_seg[seg_idx].split.offset;
>>
>>   		if (mpl == NULL) {
>>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>> @@ -1693,7 +1741,11 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>   		}
>>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>> -		length = length != 0 ? length : *mbp_buf_size;
>> +		/* On segment length == 0, update segment's length with
>> +		 * the pool's length - headeroom space, to make sure enough
>> +		 * space is accomidate for header.
>> +		 **/
>> +		length = length != 0 ? length : (*mbp_buf_size -
>> +RTE_PKTMBUF_HEADROOM);
>>   		if (*mbp_buf_size < length + offset) {
>>   			RTE_ETHDEV_LOG(ERR,
>>   				       "%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n", @@ -1764,7 +1816,6 @@
>> rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>>   			return -EINVAL;
>>   		}
>>   	} else {
>> -		const struct rte_eth_rxseg_split *rx_seg;
>>   		uint16_t n_seg;
>>
>>   		/* Extended multi-segment configuration check. */ @@ -
>> 1774,13 +1825,27 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t
>> rx_queue_id,
>>   			return -EINVAL;
>>   		}
>>
>> -		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
>>   		n_seg = rx_conf->rx_nseg;
>>
>>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
>> {
>> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>> -							   &mbp_buf_size,
>> -							   &dev_info);
>> +			ret = -1; /* To make sure at least one of below
>> conditions becomes
>> +true */
>> +
>> +			/* Check both NIX and application supports buffer-
>> split capability */
>> +			if (dev_info.rx_seg_capa.mode_split &&
>> +			    rx_conf->mode_flag ==
>> RTE_ETH_RXSEG_MODE_SPLIT) {
>> +				ret = rte_eth_rx_queue_check_split(rx_conf-
>>> rx_seg, n_seg,
>> +
>> &mbp_buf_size,
>> +								   &dev_info);
>> +			}
>> +
>> +			/* Check both NIX and application supports pool-sort
>> capability */
>> +			if (dev_info.rx_seg_capa.mode_sort &&
>> +			    rx_conf->mode_flag ==
>> RTE_ETH_RXSEG_MODE_SORT) {
>> +				ret = rte_eth_rx_queue_check_sort(rx_conf-
>>> rx_seg, n_seg,
>> +
>> &mbp_buf_size,
>> +								  &dev_info);
>> +			}
>> +
>>   			if (ret != 0)
>>   				return ret;
>>   		} else {
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>> de9e970d4d..9f6787d7ad 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -1204,16 +1204,46 @@ struct rte_eth_rxseg_split {
>>   	uint32_t reserved; /**< Reserved field. */  };
>>
>> +/**
>> + * The pool sort capability allows PMD to choose a memory pool based on
>> +the
>> + * packet's length. So, basically, PMD programs HW for receiving
>> +packets from
>> + * different pools, based on the packet's length.
>> + *
>> + * This is often useful for saving the memory where the application can
>> +create
>> + * a different pool to steer the specific size of the packet, thus
>> +enabling
>> + * effective use of memory.
>> + */
>> +struct rte_eth_rxseg_sort {
>> +	struct rte_mempool *mp; /**< Memory pool to allocate packets
>> from. */
>> +	uint16_t length; /**< Packet data length. */
>> +	uint32_t reserved; /**< Reserved field. */ };
>> +
>> +enum rte_eth_rxseg_mode {
>> +	/**
>> +	 * Buffer split mode: PMD split the received packets into multiple
>> segments.
>> +	 * @see struct rte_eth_rxseg_split
>> +	 */
>> +	RTE_ETH_RXSEG_MODE_SPLIT = RTE_BIT64(0),
>> +	/**
>> +	 * Pool sort mode: PMD to chooses a memory pool based on the
>> packet's length.
>> +	 * @see struct rte_eth_rxseg_sort
>> +	 */
>> +	RTE_ETH_RXSEG_MODE_SORT  = RTE_BIT64(1), };
>> +
>>   /**
>>    * @warning
>>    * @b EXPERIMENTAL: this structure may change without prior notice.
>>    *
>>    * A common structure used to describe Rx packet segment properties.
>>    */
>> -union rte_eth_rxseg {
>> +struct rte_eth_rxseg {
>>   	/* The settings for buffer split offload. */
>>   	struct rte_eth_rxseg_split split;
>> -	/* The other features settings should be added here. */
>> +
>> +	/*The settings for packet sort offload. */
>> +	struct rte_eth_rxseg_sort sort;
>>   };
>>
>>   /**
>> @@ -1239,6 +1269,11 @@ struct rte_eth_rxconf {
>>   	 * fields on rte_eth_dev_info structure are allowed to be set.
>>   	 */
>>   	uint64_t offloads;
>> +	/**
>> +	 * PMD may support more than one rxseg mode. This allows
>> application
>> +	 * to chose which mode to enable.
>> +	 */
>> +	enum rte_eth_rxseg_mode mode_flag;
>>   	/**
>>   	 * Points to the array of segment descriptions for an entire packet.
>>   	 * Array elements are properties for consecutive Rx segments.
>> @@ -1246,7 +1281,7 @@ struct rte_eth_rxconf {
>>   	 * The supported capabilities of receiving segmentation is reported
>>   	 * in rte_eth_dev_info.rx_seg_capa field.
>>   	 */
>> -	union rte_eth_rxseg *rx_seg;
>> +	struct rte_eth_rxseg *rx_seg;
>>
>>   	uint64_t reserved_64s[2]; /**< Reserved for future fields */
>>   	void *reserved_ptrs[2];   /**< Reserved for future fields */
>> @@ -1827,10 +1862,14 @@ struct rte_eth_switch_info {
>>    */
>>   struct rte_eth_rxseg_capa {
>>   	__extension__
>> +	uint32_t mode_split : 1; /**< Supports buffer split capability @see
>> struct rte_eth_rxseg_split */
>> +	uint32_t mode_sort : 1; /**< Supports pool sort capability @see
>> struct
> 
> The same doubt here. As I know, the 'rte_eth_rxseg_capa' structure is used for buffer split.
> 
> Thanks,
> Xuan
> 
>> +rte_eth_rxseg_sort */
>>   	uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
>>   	uint32_t offset_allowed:1; /**< Supports buffer offsets. */
>>   	uint32_t offset_align_log2:4; /**< Required offset alignment. */
>>   	uint16_t max_nseg; /**< Maximum amount of segments to split. */
>> +	/* < Maximum amount of pools that PMD can sort based on
>> packet/segment lengths */
>> +	uint16_t max_npool;
>>   	uint16_t reserved; /**< Reserved field. */  };
>>
>> --
>> 2.25.1
>
  
Hanumanth Pothula Aug. 30, 2022, 12:08 p.m. UTC | #3
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Wednesday, August 24, 2022 9:04 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
> 
> External Email
> 
> ----------------------------------------------------------------------


Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for providing your valuable feedback.
Please find responses inline.

> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
> > Hi Hanumanth,
> >
> >> -----Original Message-----
> >> From: Hanumanth Pothula <hpothula@marvell.com>
> >> Sent: Saturday, August 13, 2022 1:25 AM
> >> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> >> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
> >> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; viacheslavo@nvidia.com; jerinj@marvell.com;
> >> ndabilpuram@marvell.com; Hanumanth Pothula <hpothula@marvell.com>
> >> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
> >>
> >> Presently, the 'Buffer Split' feature supports sending multiple
> >> segments of the received packet to PMD, which programs the HW to
> >> receive the packet in segments from different pools.
> >>
> >> This patch extends the feature to support the pool sort capability.
> >> Some of the HW has support for choosing memory pools based on the
> >> packet's size. The pool sort capability allows PMD to choose a memory
> >> pool based on the packet's length.
> >>
> >> This is often useful for saving the memory where the application can
> >> create a different pool to steer the specific size of the packet,
> >> thus enabling effective use of memory.
> >>
> >> For example, let's say HW has a capability of three pools,
> >>   - pool-1 size is 2K
> >>   - pool-2 size is > 2K and < 4K
> >>   - pool-3 size is > 4K
> >> Here,
> >>          pool-1 can accommodate packets with sizes < 2K
> >>          pool-2 can accommodate packets with sizes > 2K and < 4K
> >>          pool-3 can accommodate packets with sizes > 4K
> >>
> >> With pool sort capability enabled in SW, an application may create
> >> three pools of different sizes and send them to PMD. Allowing PMD to
> >> program HW based on packet lengths. So that packets with less than 2K
> >> are received on pool-1, packets with lengths between 2K and 4K are
> >> received on pool-2 and finally packets greater than 4K are received on pool-
> 3.
> >>
> >> The following two capabilities are added to the rte_eth_rxseg_capa
> >> structure, 1. pool_sort --> tells pool sort capability is supported by HW.
> >> 2. max_npool --> max number of pools supported by HW.
> >>
> >> Defined new structure rte_eth_rxseg_sort, to be used only when pool
> >> sort capability is present. If required this may be extended further
> >> to support more configurations.
> >>
> >> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
> >>
> >> v2:
> >>   - Along with spec changes, uploading testpmd and driver changes.
> >
> > Thanks for CCing. It's an interesting feature.
> >
> > But I have one question here:
> > Buffer split is for split receiving packets into multiple segments,
> > while pool sort supports PMD to put the receiving packets into different pools
> according to packet size.
> > Every packet is still intact.
> >
> > So, at this level, pool sort does not belong to buffer split.
> > And you already use a different function to check pool sort rather than check
> buffer split.
> >
> > Should a new RX offload be introduced? like
> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
> >
Please find my response below. 
> 
> Hi Hanumanth,
> 
> I had the similar concern with the feature. I assume you want to benefit from
> exiting config structure that gets multiple mempool as argument, since this
> feature also needs multiple mempools, but the feature is different.
> 
> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to decide if to
> receive into multiple mempool or not, which doesn't have anything related split.
> Also not sure about using the 'sort' keyword.
> What do you think to introduce new fetaure, instead of extending existing split
> one?

Actually we thought both BUFFER_SPLIT and POOL_SORT are similar features where RX
pools are configured in certain way and thought not use up one more RX offload capability, 
as the existing software architecture can be extended to support pool_sort capability.
Yes, as part of pool sort, there is no buffer split but pools are picked based on the buffer length.

Since you think it's better to use new RX offload for POOL_SORT, will go ahead and implement the same.

> This is optimisation, right? To enable us to use less memory for the packet
> buffer, does it qualify to a device offload?
> 
Yes, its qualify as a device offload and saves memory.
Marvel NIC has a capability to receive packets on  two different pools based on its length.
Below explained more on the same.
> 
> Also, what is the relation with segmented Rx, how a PMD decide to use
> segmented Rx or bigger mempool? How can application can configure this?
> 
> Need to clarify the rules, based on your sample, if a 512 bytes packet received,
> does it have to go pool-1, or can it go to any of three pools?
> 
Here, Marvell NIC supports two HW pools, SPB(small packet buffer) pool and LPB(large packet buffer) pool.
SPB pool can hold up to 4KB
LPB pool can hold anything more than 4KB
Smaller packets are received on SPB pool and larger packets on LPB pool, based on the RQ configuration.
Here, in our case HW pools holds whole packet. So if a packet is divided into segments, lower layer
HW going to receive all segments of the packet and then going to place the whole packet in SPB/LPB pool,
based on the packet length.

As pools are picked based on the packets length we used SORT term. In case you have any better term(word), please suggest. 

> 
> And I don't see any change in the 'net/cnxk' Rx burst code, when
> multiple mempool used, while filling the mbufs shouldn't it check which
> mempool is filled. How this works without update in the Rx burst code,
> or am I missing some implementation detail?
> 
Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool sort capability
Here, in control path, HW pools are programmed based on the inputs it received from the application.
Once the HW is programmed, packets are received on HW pools based the packets sizes.
> 

I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD, unless If you have any other suggestion/thoughts.

> >> ---
> >>   lib/ethdev/rte_ethdev.c | 87 +++++++++++++++++++++++++++++++++++-----
> -
> >>   lib/ethdev/rte_ethdev.h | 45 +++++++++++++++++++--
> >>   2 files changed, 118 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >> 1979dc0850..7fd5443eb8 100644
> >> --- a/lib/ethdev/rte_ethdev.c
> >> +++ b/lib/ethdev/rte_ethdev.c
> >> @@ -1635,7 +1635,55 @@ rte_eth_dev_is_removed(uint16_t port_id)  }
> >>
> >>   static int
> >> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >> +rte_eth_rx_queue_check_sort(const struct rte_eth_rxseg *rx_seg,
> >> +			     uint16_t n_seg, uint32_t *mbp_buf_size,
> >> +			     const struct rte_eth_dev_info *dev_info) {
> >> +	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >>> rx_seg_capa;
> >> +	uint16_t seg_idx;
> >> +
> >> +	if (!seg_capa->multi_pools || n_seg > seg_capa->max_npool) {
> >> +		RTE_ETHDEV_LOG(ERR,
> >> +			       "Invalid capabilities, multi_pools:%d differnt
> >> length segments %u exceed supported %u\n",
> >> +			       seg_capa->multi_pools, n_seg, seg_capa-
> >>> max_nseg);
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> >> +		struct rte_mempool *mpl = rx_seg[seg_idx].sort.mp;
> >> +		uint32_t length = rx_seg[seg_idx].sort.length;
> >> +
> >> +		if (mpl == NULL) {
> >> +			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >> +			return -EINVAL;
> >> +		}
> >> +
> >> +		if (mpl->private_data_size <
> >> +			sizeof(struct rte_pktmbuf_pool_private)) {
> >> +			RTE_ETHDEV_LOG(ERR,
> >> +				       "%s private_data_size %u < %u\n",
> >> +				       mpl->name, mpl->private_data_size,
> >> +				       (unsigned int)sizeof
> >> +					(struct rte_pktmbuf_pool_private));
> >> +			return -ENOSPC;
> >> +		}
> >> +
> >> +		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >> +		length = length != 0 ? length : (*mbp_buf_size -
> >> RTE_PKTMBUF_HEADROOM);
> >> +		if (*mbp_buf_size < length + RTE_PKTMBUF_HEADROOM) {
> >> +			RTE_ETHDEV_LOG(ERR,
> >> +				       "%s mbuf_data_room_size %u < %u))\n",
> >> +				       mpl->name, *mbp_buf_size,
> >> +				       length);
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int
> >> +rte_eth_rx_queue_check_split(const struct rte_eth_rxseg *rx_seg,
> >>   			     uint16_t n_seg, uint32_t *mbp_buf_size,
> >>   			     const struct rte_eth_dev_info *dev_info)  { @@ -
> >> 1654,12 +1702,12 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>   	 * Check the sizes and offsets against buffer sizes
> >>   	 * for each segment specified in extended configuration.
> >>   	 */
> >> -	mp_first = rx_seg[0].mp;
> >> +	mp_first = rx_seg[0].split.mp;
> >>   	offset_mask = RTE_BIT32(seg_capa->offset_align_log2) - 1;
> >>   	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> >> -		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >> -		uint32_t length = rx_seg[seg_idx].length;
> >> -		uint32_t offset = rx_seg[seg_idx].offset;
> >> +		struct rte_mempool *mpl = rx_seg[seg_idx].split.mp;
> >> +		uint32_t length = rx_seg[seg_idx].split.length;
> >> +		uint32_t offset = rx_seg[seg_idx].split.offset;
> >>
> >>   		if (mpl == NULL) {
> >>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >> @@ -1693,7 +1741,11 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>   		}
> >>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >> -		length = length != 0 ? length : *mbp_buf_size;
> >> +		/* On segment length == 0, update segment's length with
> >> +		 * the pool's length - headeroom space, to make sure enough
> >> +		 * space is accomidate for header.
> >> +		 **/
> >> +		length = length != 0 ? length : (*mbp_buf_size -
> >> +RTE_PKTMBUF_HEADROOM);
> >>   		if (*mbp_buf_size < length + offset) {
> >>   			RTE_ETHDEV_LOG(ERR,
> >>   				       "%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n", @@ -1764,7 +1816,6 @@
> >> rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >>   			return -EINVAL;
> >>   		}
> >>   	} else {
> >> -		const struct rte_eth_rxseg_split *rx_seg;
> >>   		uint16_t n_seg;
> >>
> >>   		/* Extended multi-segment configuration check. */ @@ -
> >> 1774,13 +1825,27 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t
> >> rx_queue_id,
> >>   			return -EINVAL;
> >>   		}
> >>
> >> -		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
> >>   		n_seg = rx_conf->rx_nseg;
> >>
> >>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> >> {
> >> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >> -							   &mbp_buf_size,
> >> -							   &dev_info);
> >> +			ret = -1; /* To make sure at least one of below
> >> conditions becomes
> >> +true */
> >> +
> >> +			/* Check both NIX and application supports buffer-
> >> split capability */
> >> +			if (dev_info.rx_seg_capa.mode_split &&
> >> +			    rx_conf->mode_flag ==
> >> RTE_ETH_RXSEG_MODE_SPLIT) {
> >> +				ret = rte_eth_rx_queue_check_split(rx_conf-
> >>> rx_seg, n_seg,
> >> +
> >> &mbp_buf_size,
> >> +								   &dev_info);
> >> +			}
> >> +
> >> +			/* Check both NIX and application supports pool-sort
> >> capability */
> >> +			if (dev_info.rx_seg_capa.mode_sort &&
> >> +			    rx_conf->mode_flag ==
> >> RTE_ETH_RXSEG_MODE_SORT) {
> >> +				ret = rte_eth_rx_queue_check_sort(rx_conf-
> >>> rx_seg, n_seg,
> >> +
> >> &mbp_buf_size,
> >> +								  &dev_info);
> >> +			}
> >> +
> >>   			if (ret != 0)
> >>   				return ret;
> >>   		} else {
> >> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >> de9e970d4d..9f6787d7ad 100644
> >> --- a/lib/ethdev/rte_ethdev.h
> >> +++ b/lib/ethdev/rte_ethdev.h
> >> @@ -1204,16 +1204,46 @@ struct rte_eth_rxseg_split {
> >>   	uint32_t reserved; /**< Reserved field. */  };
> >>
> >> +/**
> >> + * The pool sort capability allows PMD to choose a memory pool based on
> >> +the
> >> + * packet's length. So, basically, PMD programs HW for receiving
> >> +packets from
> >> + * different pools, based on the packet's length.
> >> + *
> >> + * This is often useful for saving the memory where the application can
> >> +create
> >> + * a different pool to steer the specific size of the packet, thus
> >> +enabling
> >> + * effective use of memory.
> >> + */
> >> +struct rte_eth_rxseg_sort {
> >> +	struct rte_mempool *mp; /**< Memory pool to allocate packets
> >> from. */
> >> +	uint16_t length; /**< Packet data length. */
> >> +	uint32_t reserved; /**< Reserved field. */ };
> >> +
> >> +enum rte_eth_rxseg_mode {
> >> +	/**
> >> +	 * Buffer split mode: PMD split the received packets into multiple
> >> segments.
> >> +	 * @see struct rte_eth_rxseg_split
> >> +	 */
> >> +	RTE_ETH_RXSEG_MODE_SPLIT = RTE_BIT64(0),
> >> +	/**
> >> +	 * Pool sort mode: PMD to chooses a memory pool based on the
> >> packet's length.
> >> +	 * @see struct rte_eth_rxseg_sort
> >> +	 */
> >> +	RTE_ETH_RXSEG_MODE_SORT  = RTE_BIT64(1), };
> >> +
> >>   /**
> >>    * @warning
> >>    * @b EXPERIMENTAL: this structure may change without prior notice.
> >>    *
> >>    * A common structure used to describe Rx packet segment properties.
> >>    */
> >> -union rte_eth_rxseg {
> >> +struct rte_eth_rxseg {
> >>   	/* The settings for buffer split offload. */
> >>   	struct rte_eth_rxseg_split split;
> >> -	/* The other features settings should be added here. */
> >> +
> >> +	/*The settings for packet sort offload. */
> >> +	struct rte_eth_rxseg_sort sort;
> >>   };
> >>
> >>   /**
> >> @@ -1239,6 +1269,11 @@ struct rte_eth_rxconf {
> >>   	 * fields on rte_eth_dev_info structure are allowed to be set.
> >>   	 */
> >>   	uint64_t offloads;
> >> +	/**
> >> +	 * PMD may support more than one rxseg mode. This allows
> >> application
> >> +	 * to chose which mode to enable.
> >> +	 */
> >> +	enum rte_eth_rxseg_mode mode_flag;
> >>   	/**
> >>   	 * Points to the array of segment descriptions for an entire packet.
> >>   	 * Array elements are properties for consecutive Rx segments.
> >> @@ -1246,7 +1281,7 @@ struct rte_eth_rxconf {
> >>   	 * The supported capabilities of receiving segmentation is reported
> >>   	 * in rte_eth_dev_info.rx_seg_capa field.
> >>   	 */
> >> -	union rte_eth_rxseg *rx_seg;
> >> +	struct rte_eth_rxseg *rx_seg;
> >>
> >>   	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> >>   	void *reserved_ptrs[2];   /**< Reserved for future fields */
> >> @@ -1827,10 +1862,14 @@ struct rte_eth_switch_info {
> >>    */
> >>   struct rte_eth_rxseg_capa {
> >>   	__extension__
> >> +	uint32_t mode_split : 1; /**< Supports buffer split capability @see
> >> struct rte_eth_rxseg_split */
> >> +	uint32_t mode_sort : 1; /**< Supports pool sort capability @see
> >> struct
> >
> > The same doubt here. As I know, the 'rte_eth_rxseg_capa' structure is used for
> buffer split.
> >
> > Thanks,
> > Xuan
> >
> >> +rte_eth_rxseg_sort */
> >>   	uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
> >>   	uint32_t offset_allowed:1; /**< Supports buffer offsets. */
> >>   	uint32_t offset_align_log2:4; /**< Required offset alignment. */
> >>   	uint16_t max_nseg; /**< Maximum amount of segments to split. */
> >> +	/* < Maximum amount of pools that PMD can sort based on
> >> packet/segment lengths */
> >> +	uint16_t max_npool;
> >>   	uint16_t reserved; /**< Reserved field. */  };
> >>
> >> --
> >> 2.25.1
> >
  
Ferruh Yigit Sept. 6, 2022, 12:18 p.m. UTC | #4
On 8/30/2022 1:08 PM, Hanumanth Reddy Pothula wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Sent: Wednesday, August 24, 2022 9:04 PM
>> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
>> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
>> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
>> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>
>> External Email
>>
>> ----------------------------------------------------------------------
> 
> 
> Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for providing your valuable feedback.
> Please find responses inline.
> 
>> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
>>> Hi Hanumanth,
>>>
>>>> -----Original Message-----
>>>> From: Hanumanth Pothula <hpothula@marvell.com>
>>>> Sent: Saturday, August 13, 2022 1:25 AM
>>>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>
>>>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
>>>> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
>>>> stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
>>>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
>>>> <qi.z.zhang@intel.com>; viacheslavo@nvidia.com; jerinj@marvell.com;
>>>> ndabilpuram@marvell.com; Hanumanth Pothula <hpothula@marvell.com>
>>>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>>>
>>>> Presently, the 'Buffer Split' feature supports sending multiple
>>>> segments of the received packet to PMD, which programs the HW to
>>>> receive the packet in segments from different pools.
>>>>
>>>> This patch extends the feature to support the pool sort capability.
>>>> Some of the HW has support for choosing memory pools based on the
>>>> packet's size. The pool sort capability allows PMD to choose a memory
>>>> pool based on the packet's length.
>>>>
>>>> This is often useful for saving the memory where the application can
>>>> create a different pool to steer the specific size of the packet,
>>>> thus enabling effective use of memory.
>>>>
>>>> For example, let's say HW has a capability of three pools,
>>>>    - pool-1 size is 2K
>>>>    - pool-2 size is > 2K and < 4K
>>>>    - pool-3 size is > 4K
>>>> Here,
>>>>           pool-1 can accommodate packets with sizes < 2K
>>>>           pool-2 can accommodate packets with sizes > 2K and < 4K
>>>>           pool-3 can accommodate packets with sizes > 4K
>>>>
>>>> With pool sort capability enabled in SW, an application may create
>>>> three pools of different sizes and send them to PMD. Allowing PMD to
>>>> program HW based on packet lengths. So that packets with less than 2K
>>>> are received on pool-1, packets with lengths between 2K and 4K are
>>>> received on pool-2 and finally packets greater than 4K are received on pool-
>> 3.
>>>>
>>>> The following two capabilities are added to the rte_eth_rxseg_capa
>>>> structure, 1. pool_sort --> tells pool sort capability is supported by HW.
>>>> 2. max_npool --> max number of pools supported by HW.
>>>>
>>>> Defined new structure rte_eth_rxseg_sort, to be used only when pool
>>>> sort capability is present. If required this may be extended further
>>>> to support more configurations.
>>>>
>>>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
>>>>
>>>> v2:
>>>>    - Along with spec changes, uploading testpmd and driver changes.
>>>
>>> Thanks for CCing. It's an interesting feature.
>>>
>>> But I have one question here:
>>> Buffer split is for split receiving packets into multiple segments,
>>> while pool sort supports PMD to put the receiving packets into different pools
>> according to packet size.
>>> Every packet is still intact.
>>>
>>> So, at this level, pool sort does not belong to buffer split.
>>> And you already use a different function to check pool sort rather than check
>> buffer split.
>>>
>>> Should a new RX offload be introduced? like
>> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
>>>
> Please find my response below.
>>
>> Hi Hanumanth,
>>
>> I had the similar concern with the feature. I assume you want to benefit from
>> exiting config structure that gets multiple mempool as argument, since this
>> feature also needs multiple mempools, but the feature is different.
>>
>> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to decide if to
>> receive into multiple mempool or not, which doesn't have anything related split.
>> Also not sure about using the 'sort' keyword.
>> What do you think to introduce new fetaure, instead of extending existing split
>> one?
> 
> Actually we thought both BUFFER_SPLIT and POOL_SORT are similar features where RX
> pools are configured in certain way and thought not use up one more RX offload capability,
> as the existing software architecture can be extended to support pool_sort capability.
> Yes, as part of pool sort, there is no buffer split but pools are picked based on the buffer length.
> 
> Since you think it's better to use new RX offload for POOL_SORT, will go ahead and implement the same.
> 
>> This is optimisation, right? To enable us to use less memory for the packet
>> buffer, does it qualify to a device offload?
>>
> Yes, its qualify as a device offload and saves memory.
> Marvel NIC has a capability to receive packets on  two different pools based on its length.
> Below explained more on the same.
>>
>> Also, what is the relation with segmented Rx, how a PMD decide to use
>> segmented Rx or bigger mempool? How can application can configure this?
>>
>> Need to clarify the rules, based on your sample, if a 512 bytes packet received,
>> does it have to go pool-1, or can it go to any of three pools?
>>
> Here, Marvell NIC supports two HW pools, SPB(small packet buffer) pool and LPB(large packet buffer) pool.
> SPB pool can hold up to 4KB
> LPB pool can hold anything more than 4KB
> Smaller packets are received on SPB pool and larger packets on LPB pool, based on the RQ configuration.
> Here, in our case HW pools holds whole packet. So if a packet is divided into segments, lower layer
> HW going to receive all segments of the packet and then going to place the whole packet in SPB/LPB pool,
> based on the packet length.
> 

If the packet is bigger than 4KB, you have two options,
1- Use multiple chained buffers in SPB
2- Use single LPB buffer

As I understand (2) is used in this case, but I think we should clarify 
how this feature works with 'RTE_ETH_RX_OFFLOAD_SCATTER' offload, if it 
is requested by user.

Or lets say HW has two pools with 1K and 2K sizes, what is expected with 
4K packet, with or without scattered Rx offload?

> As pools are picked based on the packets length we used SORT term. In case you have any better term(word), please suggest.
> 

what about multiple pool, like RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL, I think 
it is more clear but I would like to get more comments from others, 
naming is hard ;)

>>
>> And I don't see any change in the 'net/cnxk' Rx burst code, when
>> multiple mempool used, while filling the mbufs shouldn't it check which
>> mempool is filled. How this works without update in the Rx burst code,
>> or am I missing some implementation detail?
>>
> Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool sort capability
> Here, in control path, HW pools are programmed based on the inputs it received from the application.
> Once the HW is programmed, packets are received on HW pools based the packets sizes.

I was expecting to changes in datapath too, something like in Rx burst 
function check if spb or lpb is used and update mbuf pointers accordingly.
But it seems HW doesn't work this way, can you please explain how this 
feature works transparent to datapath code?

>>
> 
> I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD, unless If you have any other suggestion/thoughts.
> 

<...>
  
Hanumanth Pothula Sept. 7, 2022, 7:02 a.m. UTC | #5
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Tuesday, September 6, 2022 5:48 PM
> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
> 
> On 8/30/2022 1:08 PM, Hanumanth Reddy Pothula wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >> Sent: Wednesday, August 24, 2022 9:04 PM
> >> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
> >> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>;
> Andrew
> >> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> >> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> >> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
> >> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> >> <ndabilpuram@marvell.com>
> >> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
> >> capability
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> -
> >
> >
> > Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for providing
> your valuable feedback.
> > Please find responses inline.
> >
> >> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
> >>> Hi Hanumanth,
> >>>
> >>>> -----Original Message-----
> >>>> From: Hanumanth Pothula <hpothula@marvell.com>
> >>>> Sent: Saturday, August 13, 2022 1:25 AM
> >>>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >>>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> >>>> <andrew.rybchenko@oktetlabs.ru>
> >>>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> >>>> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >>>> stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
> >>>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> >>>> <qi.z.zhang@intel.com>; viacheslavo@nvidia.com; jerinj@marvell.com;
> >>>> ndabilpuram@marvell.com; Hanumanth Pothula <hpothula@marvell.com>
> >>>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
> >>>>
> >>>> Presently, the 'Buffer Split' feature supports sending multiple
> >>>> segments of the received packet to PMD, which programs the HW to
> >>>> receive the packet in segments from different pools.
> >>>>
> >>>> This patch extends the feature to support the pool sort capability.
> >>>> Some of the HW has support for choosing memory pools based on the
> >>>> packet's size. The pool sort capability allows PMD to choose a
> >>>> memory pool based on the packet's length.
> >>>>
> >>>> This is often useful for saving the memory where the application
> >>>> can create a different pool to steer the specific size of the
> >>>> packet, thus enabling effective use of memory.
> >>>>
> >>>> For example, let's say HW has a capability of three pools,
> >>>>    - pool-1 size is 2K
> >>>>    - pool-2 size is > 2K and < 4K
> >>>>    - pool-3 size is > 4K
> >>>> Here,
> >>>>           pool-1 can accommodate packets with sizes < 2K
> >>>>           pool-2 can accommodate packets with sizes > 2K and < 4K
> >>>>           pool-3 can accommodate packets with sizes > 4K
> >>>>
> >>>> With pool sort capability enabled in SW, an application may create
> >>>> three pools of different sizes and send them to PMD. Allowing PMD
> >>>> to program HW based on packet lengths. So that packets with less
> >>>> than 2K are received on pool-1, packets with lengths between 2K and
> >>>> 4K are received on pool-2 and finally packets greater than 4K are
> >>>> received on pool-
> >> 3.
> >>>>
> >>>> The following two capabilities are added to the rte_eth_rxseg_capa
> >>>> structure, 1. pool_sort --> tells pool sort capability is supported by HW.
> >>>> 2. max_npool --> max number of pools supported by HW.
> >>>>
> >>>> Defined new structure rte_eth_rxseg_sort, to be used only when pool
> >>>> sort capability is present. If required this may be extended
> >>>> further to support more configurations.
> >>>>
> >>>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
> >>>>
> >>>> v2:
> >>>>    - Along with spec changes, uploading testpmd and driver changes.
> >>>
> >>> Thanks for CCing. It's an interesting feature.
> >>>
> >>> But I have one question here:
> >>> Buffer split is for split receiving packets into multiple segments,
> >>> while pool sort supports PMD to put the receiving packets into
> >>> different pools
> >> according to packet size.
> >>> Every packet is still intact.
> >>>
> >>> So, at this level, pool sort does not belong to buffer split.
> >>> And you already use a different function to check pool sort rather
> >>> than check
> >> buffer split.
> >>>
> >>> Should a new RX offload be introduced? like
> >> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
> >>>
> > Please find my response below.
> >>
> >> Hi Hanumanth,
> >>
> >> I had the similar concern with the feature. I assume you want to
> >> benefit from exiting config structure that gets multiple mempool as
> >> argument, since this feature also needs multiple mempools, but the feature is
> different.
> >>
> >> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to
> >> decide if to receive into multiple mempool or not, which doesn't have
> anything related split.
> >> Also not sure about using the 'sort' keyword.
> >> What do you think to introduce new fetaure, instead of extending
> >> existing split one?
> >
> > Actually we thought both BUFFER_SPLIT and POOL_SORT are similar
> > features where RX pools are configured in certain way and thought not
> > use up one more RX offload capability, as the existing software architecture
> can be extended to support pool_sort capability.
> > Yes, as part of pool sort, there is no buffer split but pools are picked based on
> the buffer length.
> >
> > Since you think it's better to use new RX offload for POOL_SORT, will go ahead
> and implement the same.
> >
> >> This is optimisation, right? To enable us to use less memory for the
> >> packet buffer, does it qualify to a device offload?
> >>
> > Yes, its qualify as a device offload and saves memory.
> > Marvel NIC has a capability to receive packets on  two different pools based
> on its length.
> > Below explained more on the same.
> >>
> >> Also, what is the relation with segmented Rx, how a PMD decide to use
> >> segmented Rx or bigger mempool? How can application can configure this?
> >>
> >> Need to clarify the rules, based on your sample, if a 512 bytes
> >> packet received, does it have to go pool-1, or can it go to any of three pools?
> >>
> > Here, Marvell NIC supports two HW pools, SPB(small packet buffer) pool and
> LPB(large packet buffer) pool.
> > SPB pool can hold up to 4KB
> > LPB pool can hold anything more than 4KB Smaller packets are received
> > on SPB pool and larger packets on LPB pool, based on the RQ configuration.
> > Here, in our case HW pools holds whole packet. So if a packet is
> > divided into segments, lower layer HW going to receive all segments of
> > the packet and then going to place the whole packet in SPB/LPB pool, based
> on the packet length.
> >
> 
> If the packet is bigger than 4KB, you have two options,
> 1- Use multiple chained buffers in SPB
> 2- Use single LPB buffer
> 
> As I understand (2) is used in this case, but I think we should clarify how this
> feature works with 'RTE_ETH_RX_OFFLOAD_SCATTER' offload, if it is requested
> by user.
> 
> Or lets say HW has two pools with 1K and 2K sizes, what is expected with 4K
> packet, with or without scattered Rx offload?
>

As mentioned, Marvell supports two pools, pool-1(SPB) and pool-2(LPB)
If the packet length is within pool-1 length and has only one segment then the packet is allocated from pool-1.
If the packet length is greater than pool-1 or has more than one segment then the packet is allocated from pool-2. 

So, here packets with a single segment and length less than 1K are allocated from pool-1 and
packets with multiple segments or packets with length greater than 1K are allocated from pool-2.

> > As pools are picked based on the packets length we used SORT term. In case
> you have any better term(word), please suggest.
> >
> 
> what about multiple pool, like RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL, I think
> it is more clear but I would like to get more comments from others, naming is
> hard ;)
> 
Yes, RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL is clearer than RTE_ETH_RX_OFFLOAD_SORT_POOL. 
Thanks for the suggestion. 
Will upload V4 with RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL.

> >>
> >> And I don't see any change in the 'net/cnxk' Rx burst code, when
> >> multiple mempool used, while filling the mbufs shouldn't it check
> >> which mempool is filled. How this works without update in the Rx
> >> burst code, or am I missing some implementation detail?
> >>
> > Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool
> > sort capability Here, in control path, HW pools are programmed based on the
> inputs it received from the application.
> > Once the HW is programmed, packets are received on HW pools based the
> packets sizes.
> 
> I was expecting to changes in datapath too, something like in Rx burst function
> check if spb or lpb is used and update mbuf pointers accordingly.
> But it seems HW doesn't work this way, can you please explain how this feature
> works transparent to datapath code?
> 
> >>
> >
> > I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD, unless
> If you have any other suggestion/thoughts.
> >
> 
> <...>
  
Ferruh Yigit Sept. 7, 2022, 11:24 a.m. UTC | #6
On 9/7/2022 8:02 AM, Hanumanth Reddy Pothula wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Sent: Tuesday, September 6, 2022 5:48 PM
>> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
>> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
>> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
>> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>
>> On 8/30/2022 1:08 PM, Hanumanth Reddy Pothula wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>>>> Sent: Wednesday, August 24, 2022 9:04 PM
>>>> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
>>>> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>;
>> Andrew
>>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
>>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
>>>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
>>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>>>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
>>>> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
>>>> <ndabilpuram@marvell.com>
>>>> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
>>>> capability
>>>>
>>>> External Email
>>>>
>>>> ---------------------------------------------------------------------
>>>> -
>>>
>>>
>>> Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for providing
>> your valuable feedback.
>>> Please find responses inline.
>>>
>>>> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
>>>>> Hi Hanumanth,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Hanumanth Pothula <hpothula@marvell.com>
>>>>>> Sent: Saturday, August 13, 2022 1:25 AM
>>>>>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>>>>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
>>>>>> <andrew.rybchenko@oktetlabs.ru>
>>>>>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
>>>>>> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
>>>>>> stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
>>>>>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
>>>>>> <qi.z.zhang@intel.com>; viacheslavo@nvidia.com; jerinj@marvell.com;
>>>>>> ndabilpuram@marvell.com; Hanumanth Pothula <hpothula@marvell.com>
>>>>>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>>>>>
>>>>>> Presently, the 'Buffer Split' feature supports sending multiple
>>>>>> segments of the received packet to PMD, which programs the HW to
>>>>>> receive the packet in segments from different pools.
>>>>>>
>>>>>> This patch extends the feature to support the pool sort capability.
>>>>>> Some of the HW has support for choosing memory pools based on the
>>>>>> packet's size. The pool sort capability allows PMD to choose a
>>>>>> memory pool based on the packet's length.
>>>>>>
>>>>>> This is often useful for saving the memory where the application
>>>>>> can create a different pool to steer the specific size of the
>>>>>> packet, thus enabling effective use of memory.
>>>>>>
>>>>>> For example, let's say HW has a capability of three pools,
>>>>>>     - pool-1 size is 2K
>>>>>>     - pool-2 size is > 2K and < 4K
>>>>>>     - pool-3 size is > 4K
>>>>>> Here,
>>>>>>            pool-1 can accommodate packets with sizes < 2K
>>>>>>            pool-2 can accommodate packets with sizes > 2K and < 4K
>>>>>>            pool-3 can accommodate packets with sizes > 4K
>>>>>>
>>>>>> With pool sort capability enabled in SW, an application may create
>>>>>> three pools of different sizes and send them to PMD. Allowing PMD
>>>>>> to program HW based on packet lengths. So that packets with less
>>>>>> than 2K are received on pool-1, packets with lengths between 2K and
>>>>>> 4K are received on pool-2 and finally packets greater than 4K are
>>>>>> received on pool-
>>>> 3.
>>>>>>
>>>>>> The following two capabilities are added to the rte_eth_rxseg_capa
>>>>>> structure, 1. pool_sort --> tells pool sort capability is supported by HW.
>>>>>> 2. max_npool --> max number of pools supported by HW.
>>>>>>
>>>>>> Defined new structure rte_eth_rxseg_sort, to be used only when pool
>>>>>> sort capability is present. If required this may be extended
>>>>>> further to support more configurations.
>>>>>>
>>>>>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
>>>>>>
>>>>>> v2:
>>>>>>     - Along with spec changes, uploading testpmd and driver changes.
>>>>>
>>>>> Thanks for CCing. It's an interesting feature.
>>>>>
>>>>> But I have one question here:
>>>>> Buffer split is for split receiving packets into multiple segments,
>>>>> while pool sort supports PMD to put the receiving packets into
>>>>> different pools
>>>> according to packet size.
>>>>> Every packet is still intact.
>>>>>
>>>>> So, at this level, pool sort does not belong to buffer split.
>>>>> And you already use a different function to check pool sort rather
>>>>> than check
>>>> buffer split.
>>>>>
>>>>> Should a new RX offload be introduced? like
>>>> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
>>>>>
>>> Please find my response below.
>>>>
>>>> Hi Hanumanth,
>>>>
>>>> I had the similar concern with the feature. I assume you want to
>>>> benefit from exiting config structure that gets multiple mempool as
>>>> argument, since this feature also needs multiple mempools, but the feature is
>> different.
>>>>
>>>> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to
>>>> decide if to receive into multiple mempool or not, which doesn't have
>> anything related split.
>>>> Also not sure about using the 'sort' keyword.
>>>> What do you think to introduce new fetaure, instead of extending
>>>> existing split one?
>>>
>>> Actually we thought both BUFFER_SPLIT and POOL_SORT are similar
>>> features where RX pools are configured in certain way and thought not
>>> use up one more RX offload capability, as the existing software architecture
>> can be extended to support pool_sort capability.
>>> Yes, as part of pool sort, there is no buffer split but pools are picked based on
>> the buffer length.
>>>
>>> Since you think it's better to use new RX offload for POOL_SORT, will go ahead
>> and implement the same.
>>>
>>>> This is optimisation, right? To enable us to use less memory for the
>>>> packet buffer, does it qualify to a device offload?
>>>>
>>> Yes, its qualify as a device offload and saves memory.
>>> Marvel NIC has a capability to receive packets on  two different pools based
>> on its length.
>>> Below explained more on the same.
>>>>
>>>> Also, what is the relation with segmented Rx, how a PMD decide to use
>>>> segmented Rx or bigger mempool? How can application can configure this?
>>>>
>>>> Need to clarify the rules, based on your sample, if a 512 bytes
>>>> packet received, does it have to go pool-1, or can it go to any of three pools?
>>>>
>>> Here, Marvell NIC supports two HW pools, SPB(small packet buffer) pool and
>> LPB(large packet buffer) pool.
>>> SPB pool can hold up to 4KB
>>> LPB pool can hold anything more than 4KB Smaller packets are received
>>> on SPB pool and larger packets on LPB pool, based on the RQ configuration.
>>> Here, in our case HW pools holds whole packet. So if a packet is
>>> divided into segments, lower layer HW going to receive all segments of
>>> the packet and then going to place the whole packet in SPB/LPB pool, based
>> on the packet length.
>>>
>>
>> If the packet is bigger than 4KB, you have two options,
>> 1- Use multiple chained buffers in SPB
>> 2- Use single LPB buffer
>>
>> As I understand (2) is used in this case, but I think we should clarify how this
>> feature works with 'RTE_ETH_RX_OFFLOAD_SCATTER' offload, if it is requested
>> by user.
>>
>> Or lets say HW has two pools with 1K and 2K sizes, what is expected with 4K
>> packet, with or without scattered Rx offload?
>>
> 
> As mentioned, Marvell supports two pools, pool-1(SPB) and pool-2(LPB)
> If the packet length is within pool-1 length and has only one segment then the packet is allocated from pool-1.
> If the packet length is greater than pool-1 or has more than one segment then the packet is allocated from pool-2.
> 
> So, here packets with a single segment and length less than 1K are allocated from pool-1 and
> packets with multiple segments or packets with length greater than 1K are allocated from pool-2.
> 

To have multiple segment or not is HW configuration, it is not external 
variable. Drivers mostly decide to configure HW to receive multiple 
segment or not based on buffer size and max packet size device support.
In this case since buffer size is not fixed, there are multiple buffer 
sizes, how driver will configure HW?

This is not specific to Marvell HW, for the case multiple mempool 
supported, it is better to clarify in this patch how it is works with 
'RTE_ETH_RX_OFFLOAD_SCATTER' offload.

>>> As pools are picked based on the packets length we used SORT term. In case
>> you have any better term(word), please suggest.
>>>
>>
>> what about multiple pool, like RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL, I think
>> it is more clear but I would like to get more comments from others, naming is
>> hard ;)
>>
> Yes, RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL is clearer than RTE_ETH_RX_OFFLOAD_SORT_POOL.
> Thanks for the suggestion.
> Will upload V4 with RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL.
> 
>>>>
>>>> And I don't see any change in the 'net/cnxk' Rx burst code, when
>>>> multiple mempool used, while filling the mbufs shouldn't it check
>>>> which mempool is filled. How this works without update in the Rx
>>>> burst code, or am I missing some implementation detail?
>>>>
>>> Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool
>>> sort capability Here, in control path, HW pools are programmed based on the
>> inputs it received from the application.
>>> Once the HW is programmed, packets are received on HW pools based the
>> packets sizes.
>>
>> I was expecting to changes in datapath too, something like in Rx burst function
>> check if spb or lpb is used and update mbuf pointers accordingly.
>> But it seems HW doesn't work this way, can you please explain how this feature
>> works transparent to datapath code?
>>
>>>>
>>>
>>> I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD, unless
>> If you have any other suggestion/thoughts.
>>>
>>
>> <...>
>
  
Hanumanth Pothula Sept. 7, 2022, 9:31 p.m. UTC | #7
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Wednesday, September 7, 2022 4:54 PM
> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
> 
> On 9/7/2022 8:02 AM, Hanumanth Reddy Pothula wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >> Sent: Tuesday, September 6, 2022 5:48 PM
> >> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
> >> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Andrew
> >> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> >> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> >> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
> >> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> >> <ndabilpuram@marvell.com>
> >> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
> >> capability
> >>
> >> On 8/30/2022 1:08 PM, Hanumanth Reddy Pothula wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >>>> Sent: Wednesday, August 24, 2022 9:04 PM
> >>>> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
> >>>> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>;
> >> Andrew
> >>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li,
> Xiaoyun
> >>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> >>>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> >>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >>>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
> >>>> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> >>>> <ndabilpuram@marvell.com>
> >>>> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
> >>>> capability
> >>>>
> >>>> External Email
> >>>>
> >>>> -------------------------------------------------------------------
> >>>> --
> >>>> -
> >>>
> >>>
> >>> Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for
> >>> providing
> >> your valuable feedback.
> >>> Please find responses inline.
> >>>
> >>>> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
> >>>>> Hi Hanumanth,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Hanumanth Pothula <hpothula@marvell.com>
> >>>>>> Sent: Saturday, August 13, 2022 1:25 AM
> >>>>>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >>>>>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> >>>>>> <andrew.rybchenko@oktetlabs.ru>
> >>>>>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> >>>>>> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >>>>>> stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>;
> >>>>>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi
> >>>>>> Z <qi.z.zhang@intel.com>; viacheslavo@nvidia.com;
> >>>>>> jerinj@marvell.com; ndabilpuram@marvell.com; Hanumanth Pothula
> >>>>>> <hpothula@marvell.com>
> >>>>>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
> >>>>>>
> >>>>>> Presently, the 'Buffer Split' feature supports sending multiple
> >>>>>> segments of the received packet to PMD, which programs the HW to
> >>>>>> receive the packet in segments from different pools.
> >>>>>>
> >>>>>> This patch extends the feature to support the pool sort capability.
> >>>>>> Some of the HW has support for choosing memory pools based on the
> >>>>>> packet's size. The pool sort capability allows PMD to choose a
> >>>>>> memory pool based on the packet's length.
> >>>>>>
> >>>>>> This is often useful for saving the memory where the application
> >>>>>> can create a different pool to steer the specific size of the
> >>>>>> packet, thus enabling effective use of memory.
> >>>>>>
> >>>>>> For example, let's say HW has a capability of three pools,
> >>>>>>     - pool-1 size is 2K
> >>>>>>     - pool-2 size is > 2K and < 4K
> >>>>>>     - pool-3 size is > 4K
> >>>>>> Here,
> >>>>>>            pool-1 can accommodate packets with sizes < 2K
> >>>>>>            pool-2 can accommodate packets with sizes > 2K and < 4K
> >>>>>>            pool-3 can accommodate packets with sizes > 4K
> >>>>>>
> >>>>>> With pool sort capability enabled in SW, an application may
> >>>>>> create three pools of different sizes and send them to PMD.
> >>>>>> Allowing PMD to program HW based on packet lengths. So that
> >>>>>> packets with less than 2K are received on pool-1, packets with
> >>>>>> lengths between 2K and 4K are received on pool-2 and finally
> >>>>>> packets greater than 4K are received on pool-
> >>>> 3.
> >>>>>>
> >>>>>> The following two capabilities are added to the
> >>>>>> rte_eth_rxseg_capa structure, 1. pool_sort --> tells pool sort capability
> is supported by HW.
> >>>>>> 2. max_npool --> max number of pools supported by HW.
> >>>>>>
> >>>>>> Defined new structure rte_eth_rxseg_sort, to be used only when
> >>>>>> pool sort capability is present. If required this may be extended
> >>>>>> further to support more configurations.
> >>>>>>
> >>>>>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
> >>>>>>
> >>>>>> v2:
> >>>>>>     - Along with spec changes, uploading testpmd and driver changes.
> >>>>>
> >>>>> Thanks for CCing. It's an interesting feature.
> >>>>>
> >>>>> But I have one question here:
> >>>>> Buffer split is for split receiving packets into multiple
> >>>>> segments, while pool sort supports PMD to put the receiving
> >>>>> packets into different pools
> >>>> according to packet size.
> >>>>> Every packet is still intact.
> >>>>>
> >>>>> So, at this level, pool sort does not belong to buffer split.
> >>>>> And you already use a different function to check pool sort rather
> >>>>> than check
> >>>> buffer split.
> >>>>>
> >>>>> Should a new RX offload be introduced? like
> >>>> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
> >>>>>
> >>> Please find my response below.
> >>>>
> >>>> Hi Hanumanth,
> >>>>
> >>>> I had the similar concern with the feature. I assume you want to
> >>>> benefit from exiting config structure that gets multiple mempool as
> >>>> argument, since this feature also needs multiple mempools, but the
> >>>> feature is
> >> different.
> >>>>
> >>>> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to
> >>>> decide if to receive into multiple mempool or not, which doesn't
> >>>> have
> >> anything related split.
> >>>> Also not sure about using the 'sort' keyword.
> >>>> What do you think to introduce new fetaure, instead of extending
> >>>> existing split one?
> >>>
> >>> Actually we thought both BUFFER_SPLIT and POOL_SORT are similar
> >>> features where RX pools are configured in certain way and thought
> >>> not use up one more RX offload capability, as the existing software
> >>> architecture
> >> can be extended to support pool_sort capability.
> >>> Yes, as part of pool sort, there is no buffer split but pools are
> >>> picked based on
> >> the buffer length.
> >>>
> >>> Since you think it's better to use new RX offload for POOL_SORT,
> >>> will go ahead
> >> and implement the same.
> >>>
> >>>> This is optimisation, right? To enable us to use less memory for
> >>>> the packet buffer, does it qualify to a device offload?
> >>>>
> >>> Yes, its qualify as a device offload and saves memory.
> >>> Marvel NIC has a capability to receive packets on  two different
> >>> pools based
> >> on its length.
> >>> Below explained more on the same.
> >>>>
> >>>> Also, what is the relation with segmented Rx, how a PMD decide to
> >>>> use segmented Rx or bigger mempool? How can application can configure
> this?
> >>>>
> >>>> Need to clarify the rules, based on your sample, if a 512 bytes
> >>>> packet received, does it have to go pool-1, or can it go to any of three
> pools?
> >>>>
> >>> Here, Marvell NIC supports two HW pools, SPB(small packet buffer)
> >>> pool and
> >> LPB(large packet buffer) pool.
> >>> SPB pool can hold up to 4KB
> >>> LPB pool can hold anything more than 4KB Smaller packets are
> >>> received on SPB pool and larger packets on LPB pool, based on the RQ
> configuration.
> >>> Here, in our case HW pools holds whole packet. So if a packet is
> >>> divided into segments, lower layer HW going to receive all segments
> >>> of the packet and then going to place the whole packet in SPB/LPB
> >>> pool, based
> >> on the packet length.
> >>>
> >>
> >> If the packet is bigger than 4KB, you have two options,
> >> 1- Use multiple chained buffers in SPB
> >> 2- Use single LPB buffer
> >>
> >> As I understand (2) is used in this case, but I think we should
> >> clarify how this feature works with 'RTE_ETH_RX_OFFLOAD_SCATTER'
> >> offload, if it is requested by user.
> >>
> >> Or lets say HW has two pools with 1K and 2K sizes, what is expected
> >> with 4K packet, with or without scattered Rx offload?
> >>
> >
> > As mentioned, Marvell supports two pools, pool-1(SPB) and pool-2(LPB)
> > If the packet length is within pool-1 length and has only one segment then the
> packet is allocated from pool-1.
> > If the packet length is greater than pool-1 or has more than one segment then
> the packet is allocated from pool-2.
> >
> > So, here packets with a single segment and length less than 1K are
> > allocated from pool-1 and packets with multiple segments or packets with
> length greater than 1K are allocated from pool-2.
> >
> 
> To have multiple segment or not is HW configuration, it is not external variable.
> Drivers mostly decide to configure HW to receive multiple segment or not based
> on buffer size and max packet size device support.
> In this case since buffer size is not fixed, there are multiple buffer sizes, how
> driver will configure HW?
> 
> This is not specific to Marvell HW, for the case multiple mempool supported, it is
> better to clarify in this patch how it is works with
> 'RTE_ETH_RX_OFFLOAD_SCATTER' offload.
> 

Here, application sends multiple pools with different buffer lengths to PMD and PMD further programs HW depending on its architecture.
Similarly, in this case, where multiple HW pools are present,  with 'RTE_ETH_RX_OFFLOAD_SCATTER' enabled, PMD receives packets/segments based on the HW architecture.
Depending the architecture, if any extra programming is required(either to implement some logic or HW programming) in RX path, its to be implemented in that NIC PMD.
As far as multiple pool support is considered, in Marvell case, once HW pools are programmed in control path, there is nothing to be done on fast path.

So, I think, It depends on HW architecture of how multiple pool and multiple segment receive is implemented.

Please suggest, if I am missing any generic scenario(use case) here.

> >>> As pools are picked based on the packets length we used SORT term.
> >>> In case
> >> you have any better term(word), please suggest.
> >>>
> >>
> >> what about multiple pool, like RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL, I
> >> think it is more clear but I would like to get more comments from
> >> others, naming is hard ;)
> >>
> > Yes, RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL is clearer than
> RTE_ETH_RX_OFFLOAD_SORT_POOL.
> > Thanks for the suggestion.
> > Will upload V4 with RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL.
> >
> >>>>
> >>>> And I don't see any change in the 'net/cnxk' Rx burst code, when
> >>>> multiple mempool used, while filling the mbufs shouldn't it check
> >>>> which mempool is filled. How this works without update in the Rx
> >>>> burst code, or am I missing some implementation detail?
> >>>>
> >>> Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool
> >>> sort capability Here, in control path, HW pools are programmed based
> >>> on the
> >> inputs it received from the application.
> >>> Once the HW is programmed, packets are received on HW pools based
> >>> the
> >> packets sizes.
> >>
> >> I was expecting to changes in datapath too, something like in Rx
> >> burst function check if spb or lpb is used and update mbuf pointers
> accordingly.
> >> But it seems HW doesn't work this way, can you please explain how
> >> this feature works transparent to datapath code?
> >>
> >>>>
> >>>
> >>> I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD,
> >>> unless
> >> If you have any other suggestion/thoughts.
> >>>
> >>
> >> <...>
> >
  
Ferruh Yigit Sept. 13, 2022, 9:28 a.m. UTC | #8
On 9/7/2022 10:31 PM, Hanumanth Reddy Pothula wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Sent: Wednesday, September 7, 2022 4:54 PM
>> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
>> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
>> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
>> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>
>> On 9/7/2022 8:02 AM, Hanumanth Reddy Pothula wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>>>> Sent: Tuesday, September 6, 2022 5:48 PM
>>>> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
>>>> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>> Andrew
>>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
>>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
>>>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
>>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>>>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
>>>> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
>>>> <ndabilpuram@marvell.com>
>>>> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
>>>> capability
>>>>
>>>> On 8/30/2022 1:08 PM, Hanumanth Reddy Pothula wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>>>>>> Sent: Wednesday, August 24, 2022 9:04 PM
>>>>>> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
>>>>>> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>;
>>>> Andrew
>>>>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li,
>> Xiaoyun
>>>>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
>>>>>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
>>>>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>>>>>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
>>>>>> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
>>>>>> <ndabilpuram@marvell.com>
>>>>>> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
>>>>>> capability
>>>>>>
>>>>>> External Email
>>>>>>
>>>>>> -------------------------------------------------------------------
>>>>>> --
>>>>>> -
>>>>>
>>>>>
>>>>> Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and for
>>>>> providing
>>>> your valuable feedback.
>>>>> Please find responses inline.
>>>>>
>>>>>> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
>>>>>>> Hi Hanumanth,
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Hanumanth Pothula <hpothula@marvell.com>
>>>>>>>> Sent: Saturday, August 13, 2022 1:25 AM
>>>>>>>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>>>>>>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
>>>>>>>> <andrew.rybchenko@oktetlabs.ru>
>>>>>>>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
>>>>>>>> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
>>>>>>>> stephen@networkplumber.org; Wang, YuanX
>> <yuanx.wang@intel.com>;
>>>>>>>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi
>>>>>>>> Z <qi.z.zhang@intel.com>; viacheslavo@nvidia.com;
>>>>>>>> jerinj@marvell.com; ndabilpuram@marvell.com; Hanumanth Pothula
>>>>>>>> <hpothula@marvell.com>
>>>>>>>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
>>>>>>>>
>>>>>>>> Presently, the 'Buffer Split' feature supports sending multiple
>>>>>>>> segments of the received packet to PMD, which programs the HW to
>>>>>>>> receive the packet in segments from different pools.
>>>>>>>>
>>>>>>>> This patch extends the feature to support the pool sort capability.
>>>>>>>> Some of the HW has support for choosing memory pools based on the
>>>>>>>> packet's size. The pool sort capability allows PMD to choose a
>>>>>>>> memory pool based on the packet's length.
>>>>>>>>
>>>>>>>> This is often useful for saving the memory where the application
>>>>>>>> can create a different pool to steer the specific size of the
>>>>>>>> packet, thus enabling effective use of memory.
>>>>>>>>
>>>>>>>> For example, let's say HW has a capability of three pools,
>>>>>>>>      - pool-1 size is 2K
>>>>>>>>      - pool-2 size is > 2K and < 4K
>>>>>>>>      - pool-3 size is > 4K
>>>>>>>> Here,
>>>>>>>>             pool-1 can accommodate packets with sizes < 2K
>>>>>>>>             pool-2 can accommodate packets with sizes > 2K and < 4K
>>>>>>>>             pool-3 can accommodate packets with sizes > 4K
>>>>>>>>
>>>>>>>> With pool sort capability enabled in SW, an application may
>>>>>>>> create three pools of different sizes and send them to PMD.
>>>>>>>> Allowing PMD to program HW based on packet lengths. So that
>>>>>>>> packets with less than 2K are received on pool-1, packets with
>>>>>>>> lengths between 2K and 4K are received on pool-2 and finally
>>>>>>>> packets greater than 4K are received on pool-
>>>>>> 3.
>>>>>>>>
>>>>>>>> The following two capabilities are added to the
>>>>>>>> rte_eth_rxseg_capa structure, 1. pool_sort --> tells pool sort capability
>> is supported by HW.
>>>>>>>> 2. max_npool --> max number of pools supported by HW.
>>>>>>>>
>>>>>>>> Defined new structure rte_eth_rxseg_sort, to be used only when
>>>>>>>> pool sort capability is present. If required this may be extended
>>>>>>>> further to support more configurations.
>>>>>>>>
>>>>>>>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
>>>>>>>>
>>>>>>>> v2:
>>>>>>>>      - Along with spec changes, uploading testpmd and driver changes.
>>>>>>>
>>>>>>> Thanks for CCing. It's an interesting feature.
>>>>>>>
>>>>>>> But I have one question here:
>>>>>>> Buffer split is for split receiving packets into multiple
>>>>>>> segments, while pool sort supports PMD to put the receiving
>>>>>>> packets into different pools
>>>>>> according to packet size.
>>>>>>> Every packet is still intact.
>>>>>>>
>>>>>>> So, at this level, pool sort does not belong to buffer split.
>>>>>>> And you already use a different function to check pool sort rather
>>>>>>> than check
>>>>>> buffer split.
>>>>>>>
>>>>>>> Should a new RX offload be introduced? like
>>>>>> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
>>>>>>>
>>>>> Please find my response below.
>>>>>>
>>>>>> Hi Hanumanth,
>>>>>>
>>>>>> I had the similar concern with the feature. I assume you want to
>>>>>> benefit from exiting config structure that gets multiple mempool as
>>>>>> argument, since this feature also needs multiple mempools, but the
>>>>>> feature is
>>>> different.
>>>>>>
>>>>>> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to
>>>>>> decide if to receive into multiple mempool or not, which doesn't
>>>>>> have
>>>> anything related split.
>>>>>> Also not sure about using the 'sort' keyword.
>>>>>> What do you think to introduce new fetaure, instead of extending
>>>>>> existing split one?
>>>>>
>>>>> Actually we thought both BUFFER_SPLIT and POOL_SORT are similar
>>>>> features where RX pools are configured in certain way and thought
>>>>> not use up one more RX offload capability, as the existing software
>>>>> architecture
>>>> can be extended to support pool_sort capability.
>>>>> Yes, as part of pool sort, there is no buffer split but pools are
>>>>> picked based on
>>>> the buffer length.
>>>>>
>>>>> Since you think it's better to use new RX offload for POOL_SORT,
>>>>> will go ahead
>>>> and implement the same.
>>>>>
>>>>>> This is optimisation, right? To enable us to use less memory for
>>>>>> the packet buffer, does it qualify to a device offload?
>>>>>>
>>>>> Yes, its qualify as a device offload and saves memory.
>>>>> Marvel NIC has a capability to receive packets on  two different
>>>>> pools based
>>>> on its length.
>>>>> Below explained more on the same.
>>>>>>
>>>>>> Also, what is the relation with segmented Rx, how a PMD decide to
>>>>>> use segmented Rx or bigger mempool? How can application can configure
>> this?
>>>>>>
>>>>>> Need to clarify the rules, based on your sample, if a 512 bytes
>>>>>> packet received, does it have to go pool-1, or can it go to any of three
>> pools?
>>>>>>
>>>>> Here, Marvell NIC supports two HW pools, SPB(small packet buffer)
>>>>> pool and
>>>> LPB(large packet buffer) pool.
>>>>> SPB pool can hold up to 4KB
>>>>> LPB pool can hold anything more than 4KB Smaller packets are
>>>>> received on SPB pool and larger packets on LPB pool, based on the RQ
>> configuration.
>>>>> Here, in our case HW pools holds whole packet. So if a packet is
>>>>> divided into segments, lower layer HW going to receive all segments
>>>>> of the packet and then going to place the whole packet in SPB/LPB
>>>>> pool, based
>>>> on the packet length.
>>>>>
>>>>
>>>> If the packet is bigger than 4KB, you have two options,
>>>> 1- Use multiple chained buffers in SPB
>>>> 2- Use single LPB buffer
>>>>
>>>> As I understand (2) is used in this case, but I think we should
>>>> clarify how this feature works with 'RTE_ETH_RX_OFFLOAD_SCATTER'
>>>> offload, if it is requested by user.
>>>>
>>>> Or lets say HW has two pools with 1K and 2K sizes, what is expected
>>>> with 4K packet, with or without scattered Rx offload?
>>>>
>>>
>>> As mentioned, Marvell supports two pools, pool-1(SPB) and pool-2(LPB)
>>> If the packet length is within pool-1 length and has only one segment then the
>> packet is allocated from pool-1.
>>> If the packet length is greater than pool-1 or has more than one segment then
>> the packet is allocated from pool-2.
>>>
>>> So, here packets with a single segment and length less than 1K are
>>> allocated from pool-1 and packets with multiple segments or packets with
>> length greater than 1K are allocated from pool-2.
>>>
>>
>> To have multiple segment or not is HW configuration, it is not external variable.
>> Drivers mostly decide to configure HW to receive multiple segment or not based
>> on buffer size and max packet size device support.
>> In this case since buffer size is not fixed, there are multiple buffer sizes, how
>> driver will configure HW?
>>
>> This is not specific to Marvell HW, for the case multiple mempool supported, it is
>> better to clarify in this patch how it is works with
>> 'RTE_ETH_RX_OFFLOAD_SCATTER' offload.
>>
> 
> Here, application sends multiple pools with different buffer lengths to PMD and PMD further programs HW depending on its architecture.
> Similarly, in this case, where multiple HW pools are present,  with 'RTE_ETH_RX_OFFLOAD_SCATTER' enabled, PMD receives packets/segments based on the HW architecture.
> Depending the architecture, if any extra programming is required(either to implement some logic or HW programming) in RX path, its to be implemented in that NIC PMD.

My intention is to clarify the relation between features, like Andrew 
suggested to have Rx segmentation on any pools including mixture of 
pools, that is an option, only please document this in the API 
documentation so that it is clear for future PMD/app developers.

> As far as multiple pool support is considered, in Marvell case, once HW pools are programmed in control path, there is nothing to be done on fast path.
> 

Out of curiosity, how this is transparent in fast path. Normally mbuf 
will point to a data buffer, if there are multiple pools, how this is 
not effected?
Is there any redirection, like mbuf buffer points to some kind of 
metedata (event etc..)?

> So, I think, It depends on HW architecture of how multiple pool and multiple segment receive is implemented.
> 
> Please suggest, if I am missing any generic scenario(use case) here.
> 
>>>>> As pools are picked based on the packets length we used SORT term.
>>>>> In case
>>>> you have any better term(word), please suggest.
>>>>>
>>>>
>>>> what about multiple pool, like RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL, I
>>>> think it is more clear but I would like to get more comments from
>>>> others, naming is hard ;)
>>>>
>>> Yes, RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL is clearer than
>> RTE_ETH_RX_OFFLOAD_SORT_POOL.
>>> Thanks for the suggestion.
>>> Will upload V4 with RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL.
>>>
>>>>>>
>>>>>> And I don't see any change in the 'net/cnxk' Rx burst code, when
>>>>>> multiple mempool used, while filling the mbufs shouldn't it check
>>>>>> which mempool is filled. How this works without update in the Rx
>>>>>> burst code, or am I missing some implementation detail?
>>>>>>
>>>>> Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool
>>>>> sort capability Here, in control path, HW pools are programmed based
>>>>> on the
>>>> inputs it received from the application.
>>>>> Once the HW is programmed, packets are received on HW pools based
>>>>> the
>>>> packets sizes.
>>>>
>>>> I was expecting to changes in datapath too, something like in Rx
>>>> burst function check if spb or lpb is used and update mbuf pointers
>> accordingly.
>>>> But it seems HW doesn't work this way, can you please explain how
>>>> this feature works transparent to datapath code?
>>>>
>>>>>>
>>>>>
>>>>> I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD,
>>>>> unless
>>>> If you have any other suggestion/thoughts.
>>>>>
>>>>
>>>> <...>
>>>
>
  
Hanumanth Pothula Sept. 13, 2022, 10 a.m. UTC | #9
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Tuesday, September 13, 2022 2:59 PM
> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort capability
> 
> On 9/7/2022 10:31 PM, Hanumanth Reddy Pothula wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >> Sent: Wednesday, September 7, 2022 4:54 PM
> >> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
> >> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Andrew
> >> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> >> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> >> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
> >> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> >> <ndabilpuram@marvell.com>
> >> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
> >> capability
> >>
> >> On 9/7/2022 8:02 AM, Hanumanth Reddy Pothula wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >>>> Sent: Tuesday, September 6, 2022 5:48 PM
> >>>> To: Hanumanth Reddy Pothula <hpothula@marvell.com>; Ding, Xuan
> >>>> <xuan.ding@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> >> Andrew
> >>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li,
> Xiaoyun
> >>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> >>>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> >>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >>>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
> >>>> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> >>>> <ndabilpuram@marvell.com>
> >>>> Subject: Re: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
> >>>> capability
> >>>>
> >>>> On 8/30/2022 1:08 PM, Hanumanth Reddy Pothula wrote:
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >>>>>> Sent: Wednesday, August 24, 2022 9:04 PM
> >>>>>> To: Ding, Xuan <xuan.ding@intel.com>; Hanumanth Reddy Pothula
> >>>>>> <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>;
> >>>> Andrew
> >>>>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> Cc: dev@dpdk.org; Wu, WenxuanX <wenxuanx.wu@intel.com>; Li,
> >> Xiaoyun
> >>>>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang, YuanX
> >>>>>> <yuanx.wang@intel.com>; mdr@ashroe.eu; Zhang, Yuying
> >>>>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >>>>>> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran
> >>>>>> <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> >>>>>> <ndabilpuram@marvell.com>
> >>>>>> Subject: [EXT] Re: [PATCH v2 1/3] ethdev: introduce pool sort
> >>>>>> capability
> >>>>>>
> >>>>>> External Email
> >>>>>>
> >>>>>> -----------------------------------------------------------------
> >>>>>> --
> >>>>>> --
> >>>>>> -
> >>>>>
> >>>>>
> >>>>> Thanks Ding Xuan and Ferruh Yigit for reviewing the changes and
> >>>>> for providing
> >>>> your valuable feedback.
> >>>>> Please find responses inline.
> >>>>>
> >>>>>> On 8/23/2022 4:26 AM, Ding, Xuan wrote:
> >>>>>>> Hi Hanumanth,
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Hanumanth Pothula <hpothula@marvell.com>
> >>>>>>>> Sent: Saturday, August 13, 2022 1:25 AM
> >>>>>>>> To: Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >>>>>>>> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> >>>>>>>> <andrew.rybchenko@oktetlabs.ru>
> >>>>>>>> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Wu,
> >>>>>>>> WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> >>>>>>>> <xiaoyun.li@intel.com>; stephen@networkplumber.org; Wang,
> YuanX
> >> <yuanx.wang@intel.com>;
> >>>>>>>> mdr@ashroe.eu; Zhang, Yuying <yuying.zhang@intel.com>; Zhang,
> >>>>>>>> Qi Z <qi.z.zhang@intel.com>; viacheslavo@nvidia.com;
> >>>>>>>> jerinj@marvell.com; ndabilpuram@marvell.com; Hanumanth Pothula
> >>>>>>>> <hpothula@marvell.com>
> >>>>>>>> Subject: [PATCH v2 1/3] ethdev: introduce pool sort capability
> >>>>>>>>
> >>>>>>>> Presently, the 'Buffer Split' feature supports sending multiple
> >>>>>>>> segments of the received packet to PMD, which programs the HW
> >>>>>>>> to receive the packet in segments from different pools.
> >>>>>>>>
> >>>>>>>> This patch extends the feature to support the pool sort capability.
> >>>>>>>> Some of the HW has support for choosing memory pools based on
> >>>>>>>> the packet's size. The pool sort capability allows PMD to
> >>>>>>>> choose a memory pool based on the packet's length.
> >>>>>>>>
> >>>>>>>> This is often useful for saving the memory where the
> >>>>>>>> application can create a different pool to steer the specific
> >>>>>>>> size of the packet, thus enabling effective use of memory.
> >>>>>>>>
> >>>>>>>> For example, let's say HW has a capability of three pools,
> >>>>>>>>      - pool-1 size is 2K
> >>>>>>>>      - pool-2 size is > 2K and < 4K
> >>>>>>>>      - pool-3 size is > 4K
> >>>>>>>> Here,
> >>>>>>>>             pool-1 can accommodate packets with sizes < 2K
> >>>>>>>>             pool-2 can accommodate packets with sizes > 2K and < 4K
> >>>>>>>>             pool-3 can accommodate packets with sizes > 4K
> >>>>>>>>
> >>>>>>>> With pool sort capability enabled in SW, an application may
> >>>>>>>> create three pools of different sizes and send them to PMD.
> >>>>>>>> Allowing PMD to program HW based on packet lengths. So that
> >>>>>>>> packets with less than 2K are received on pool-1, packets with
> >>>>>>>> lengths between 2K and 4K are received on pool-2 and finally
> >>>>>>>> packets greater than 4K are received on pool-
> >>>>>> 3.
> >>>>>>>>
> >>>>>>>> The following two capabilities are added to the
> >>>>>>>> rte_eth_rxseg_capa structure, 1. pool_sort --> tells pool sort
> >>>>>>>> capability
> >> is supported by HW.
> >>>>>>>> 2. max_npool --> max number of pools supported by HW.
> >>>>>>>>
> >>>>>>>> Defined new structure rte_eth_rxseg_sort, to be used only when
> >>>>>>>> pool sort capability is present. If required this may be
> >>>>>>>> extended further to support more configurations.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
> >>>>>>>>
> >>>>>>>> v2:
> >>>>>>>>      - Along with spec changes, uploading testpmd and driver changes.
> >>>>>>>
> >>>>>>> Thanks for CCing. It's an interesting feature.
> >>>>>>>
> >>>>>>> But I have one question here:
> >>>>>>> Buffer split is for split receiving packets into multiple
> >>>>>>> segments, while pool sort supports PMD to put the receiving
> >>>>>>> packets into different pools
> >>>>>> according to packet size.
> >>>>>>> Every packet is still intact.
> >>>>>>>
> >>>>>>> So, at this level, pool sort does not belong to buffer split.
> >>>>>>> And you already use a different function to check pool sort
> >>>>>>> rather than check
> >>>>>> buffer split.
> >>>>>>>
> >>>>>>> Should a new RX offload be introduced? like
> >>>>>> "RTE_ETH_RX_OFFLOAD_POOL_SORT".
> >>>>>>>
> >>>>> Please find my response below.
> >>>>>>
> >>>>>> Hi Hanumanth,
> >>>>>>
> >>>>>> I had the similar concern with the feature. I assume you want to
> >>>>>> benefit from exiting config structure that gets multiple mempool
> >>>>>> as argument, since this feature also needs multiple mempools, but
> >>>>>> the feature is
> >>>> different.
> >>>>>>
> >>>>>> It looks to me wrong to check 'OFFLOAD_BUFFER_SPLIT' offload to
> >>>>>> decide if to receive into multiple mempool or not, which doesn't
> >>>>>> have
> >>>> anything related split.
> >>>>>> Also not sure about using the 'sort' keyword.
> >>>>>> What do you think to introduce new fetaure, instead of extending
> >>>>>> existing split one?
> >>>>>
> >>>>> Actually we thought both BUFFER_SPLIT and POOL_SORT are similar
> >>>>> features where RX pools are configured in certain way and thought
> >>>>> not use up one more RX offload capability, as the existing
> >>>>> software architecture
> >>>> can be extended to support pool_sort capability.
> >>>>> Yes, as part of pool sort, there is no buffer split but pools are
> >>>>> picked based on
> >>>> the buffer length.
> >>>>>
> >>>>> Since you think it's better to use new RX offload for POOL_SORT,
> >>>>> will go ahead
> >>>> and implement the same.
> >>>>>
> >>>>>> This is optimisation, right? To enable us to use less memory for
> >>>>>> the packet buffer, does it qualify to a device offload?
> >>>>>>
> >>>>> Yes, its qualify as a device offload and saves memory.
> >>>>> Marvel NIC has a capability to receive packets on  two different
> >>>>> pools based
> >>>> on its length.
> >>>>> Below explained more on the same.
> >>>>>>
> >>>>>> Also, what is the relation with segmented Rx, how a PMD decide to
> >>>>>> use segmented Rx or bigger mempool? How can application can
> >>>>>> configure
> >> this?
> >>>>>>
> >>>>>> Need to clarify the rules, based on your sample, if a 512 bytes
> >>>>>> packet received, does it have to go pool-1, or can it go to any
> >>>>>> of three
> >> pools?
> >>>>>>
> >>>>> Here, Marvell NIC supports two HW pools, SPB(small packet buffer)
> >>>>> pool and
> >>>> LPB(large packet buffer) pool.
> >>>>> SPB pool can hold up to 4KB
> >>>>> LPB pool can hold anything more than 4KB Smaller packets are
> >>>>> received on SPB pool and larger packets on LPB pool, based on the
> >>>>> RQ
> >> configuration.
> >>>>> Here, in our case HW pools holds whole packet. So if a packet is
> >>>>> divided into segments, lower layer HW going to receive all
> >>>>> segments of the packet and then going to place the whole packet in
> >>>>> SPB/LPB pool, based
> >>>> on the packet length.
> >>>>>
> >>>>
> >>>> If the packet is bigger than 4KB, you have two options,
> >>>> 1- Use multiple chained buffers in SPB
> >>>> 2- Use single LPB buffer
> >>>>
> >>>> As I understand (2) is used in this case, but I think we should
> >>>> clarify how this feature works with 'RTE_ETH_RX_OFFLOAD_SCATTER'
> >>>> offload, if it is requested by user.
> >>>>
> >>>> Or lets say HW has two pools with 1K and 2K sizes, what is expected
> >>>> with 4K packet, with or without scattered Rx offload?
> >>>>
> >>>
> >>> As mentioned, Marvell supports two pools, pool-1(SPB) and
> >>> pool-2(LPB) If the packet length is within pool-1 length and has
> >>> only one segment then the
> >> packet is allocated from pool-1.
> >>> If the packet length is greater than pool-1 or has more than one
> >>> segment then
> >> the packet is allocated from pool-2.
> >>>
> >>> So, here packets with a single segment and length less than 1K are
> >>> allocated from pool-1 and packets with multiple segments or packets
> >>> with
> >> length greater than 1K are allocated from pool-2.
> >>>
> >>
> >> To have multiple segment or not is HW configuration, it is not external
> variable.
> >> Drivers mostly decide to configure HW to receive multiple segment or
> >> not based on buffer size and max packet size device support.
> >> In this case since buffer size is not fixed, there are multiple
> >> buffer sizes, how driver will configure HW?
> >>
> >> This is not specific to Marvell HW, for the case multiple mempool
> >> supported, it is better to clarify in this patch how it is works with
> >> 'RTE_ETH_RX_OFFLOAD_SCATTER' offload.
> >>
> >
> > Here, application sends multiple pools with different buffer lengths to PMD
> and PMD further programs HW depending on its architecture.
> > Similarly, in this case, where multiple HW pools are present,  with
> 'RTE_ETH_RX_OFFLOAD_SCATTER' enabled, PMD receives packets/segments
> based on the HW architecture.
> > Depending the architecture, if any extra programming is required(either to
> implement some logic or HW programming) in RX path, its to be implemented in
> that NIC PMD.
> 
> My intention is to clarify the relation between features, like Andrew suggested
> to have Rx segmentation on any pools including mixture of pools, that is an
> option, only please document this in the API documentation so that it is clear for
> future PMD/app developers.

Sure, will document this in the API documentation.
 
> 
> > As far as multiple pool support is considered, in Marvell case, once HW pools
> are programmed in control path, there is nothing to be done on fast path.
> >
> 
> Out of curiosity, how this is transparent in fast path. Normally mbuf will point to
> a data buffer, if there are multiple pools, how this is not effected?
> Is there any redirection, like mbuf buffer points to some kind of metedata (event
> etc..)?
> 
> > So, I think, It depends on HW architecture of how multiple pool and multiple
> segment receive is implemented.
> >
> > Please suggest, if I am missing any generic scenario(use case) here.
> >
> >>>>> As pools are picked based on the packets length we used SORT term.
> >>>>> In case
> >>>> you have any better term(word), please suggest.
> >>>>>
> >>>>
> >>>> what about multiple pool, like RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL, I
> >>>> think it is more clear but I would like to get more comments from
> >>>> others, naming is hard ;)
> >>>>
> >>> Yes, RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL is clearer than
> >> RTE_ETH_RX_OFFLOAD_SORT_POOL.
> >>> Thanks for the suggestion.
> >>> Will upload V4 with RTE_ETH_RX_OFFLOAD_MULTIPLE_POOL.
> >>>
> >>>>>>
> >>>>>> And I don't see any change in the 'net/cnxk' Rx burst code, when
> >>>>>> multiple mempool used, while filling the mbufs shouldn't it check
> >>>>>> which mempool is filled. How this works without update in the Rx
> >>>>>> burst code, or am I missing some implementation detail?
> >>>>>>
> >>>>> Please find PMD changes in patch [v2,3/3] net/cnxk: introduce pool
> >>>>> sort capability Here, in control path, HW pools are programmed
> >>>>> based on the
> >>>> inputs it received from the application.
> >>>>> Once the HW is programmed, packets are received on HW pools based
> >>>>> the
> >>>> packets sizes.
> >>>>
> >>>> I was expecting to changes in datapath too, something like in Rx
> >>>> burst function check if spb or lpb is used and update mbuf pointers
> >> accordingly.
> >>>> But it seems HW doesn't work this way, can you please explain how
> >>>> this feature works transparent to datapath code?
> >>>>
> >>>>>>
> >>>>>
> >>>>> I will upload V3 where POOL_SORT is implemented as new RX OFFLOAD,
> >>>>> unless
> >>>> If you have any other suggestion/thoughts.
> >>>>>
> >>>>
> >>>> <...>
> >>>
> >
  

Patch

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..7fd5443eb8 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1635,7 +1635,55 @@  rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
+rte_eth_rx_queue_check_sort(const struct rte_eth_rxseg *rx_seg,
+			     uint16_t n_seg, uint32_t *mbp_buf_size,
+			     const struct rte_eth_dev_info *dev_info)
+{
+	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
+	uint16_t seg_idx;
+
+	if (!seg_capa->multi_pools || n_seg > seg_capa->max_npool) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid capabilities, multi_pools:%d differnt length segments %u exceed supported %u\n",
+			       seg_capa->multi_pools, n_seg, seg_capa->max_nseg);
+		return -EINVAL;
+	}
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mpl = rx_seg[seg_idx].sort.mp;
+		uint32_t length = rx_seg[seg_idx].sort.length;
+
+		if (mpl == NULL) {
+			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
+			return -EINVAL;
+		}
+
+		if (mpl->private_data_size <
+			sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR,
+				       "%s private_data_size %u < %u\n",
+				       mpl->name, mpl->private_data_size,
+				       (unsigned int)sizeof
+					(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
+		length = length != 0 ? length : (*mbp_buf_size - RTE_PKTMBUF_HEADROOM);
+		if (*mbp_buf_size < length + RTE_PKTMBUF_HEADROOM) {
+			RTE_ETHDEV_LOG(ERR,
+				       "%s mbuf_data_room_size %u < %u))\n",
+				       mpl->name, *mbp_buf_size,
+				       length);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int
+rte_eth_rx_queue_check_split(const struct rte_eth_rxseg *rx_seg,
 			     uint16_t n_seg, uint32_t *mbp_buf_size,
 			     const struct rte_eth_dev_info *dev_info)
 {
@@ -1654,12 +1702,12 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 	 * Check the sizes and offsets against buffer sizes
 	 * for each segment specified in extended configuration.
 	 */
-	mp_first = rx_seg[0].mp;
+	mp_first = rx_seg[0].split.mp;
 	offset_mask = RTE_BIT32(seg_capa->offset_align_log2) - 1;
 	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
-		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
-		uint32_t length = rx_seg[seg_idx].length;
-		uint32_t offset = rx_seg[seg_idx].offset;
+		struct rte_mempool *mpl = rx_seg[seg_idx].split.mp;
+		uint32_t length = rx_seg[seg_idx].split.length;
+		uint32_t offset = rx_seg[seg_idx].split.offset;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1693,7 +1741,11 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
+		/* On segment length == 0, update segment's length with
+		 * the pool's length - headeroom space, to make sure enough
+		 * space is accomidate for header.
+		 **/
+		length = length != 0 ? length : (*mbp_buf_size - RTE_PKTMBUF_HEADROOM);
 		if (*mbp_buf_size < length + offset) {
 			RTE_ETHDEV_LOG(ERR,
 				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
@@ -1764,7 +1816,6 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			return -EINVAL;
 		}
 	} else {
-		const struct rte_eth_rxseg_split *rx_seg;
 		uint16_t n_seg;
 
 		/* Extended multi-segment configuration check. */
@@ -1774,13 +1825,27 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			return -EINVAL;
 		}
 
-		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
-							   &mbp_buf_size,
-							   &dev_info);
+			ret = -1; /* To make sure at least one of below conditions becomes true */
+
+			/* Check both NIX and application supports buffer-split capability */
+			if (dev_info.rx_seg_capa.mode_split &&
+			    rx_conf->mode_flag == RTE_ETH_RXSEG_MODE_SPLIT) {
+				ret = rte_eth_rx_queue_check_split(rx_conf->rx_seg, n_seg,
+								   &mbp_buf_size,
+								   &dev_info);
+			}
+
+			/* Check both NIX and application supports pool-sort capability */
+			if (dev_info.rx_seg_capa.mode_sort &&
+			    rx_conf->mode_flag == RTE_ETH_RXSEG_MODE_SORT) {
+				ret = rte_eth_rx_queue_check_sort(rx_conf->rx_seg, n_seg,
+								  &mbp_buf_size,
+								  &dev_info);
+			}
+
 			if (ret != 0)
 				return ret;
 		} else {
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..9f6787d7ad 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1204,16 +1204,46 @@  struct rte_eth_rxseg_split {
 	uint32_t reserved; /**< Reserved field. */
 };
 
+/**
+ * The pool sort capability allows PMD to choose a memory pool based on the
+ * packet's length. So, basically, PMD programs HW for receiving packets from
+ * different pools, based on the packet's length.
+ *
+ * This is often useful for saving the memory where the application can create
+ * a different pool to steer the specific size of the packet, thus enabling
+ * effective use of memory.
+ */
+struct rte_eth_rxseg_sort {
+	struct rte_mempool *mp; /**< Memory pool to allocate packets from. */
+	uint16_t length; /**< Packet data length. */
+	uint32_t reserved; /**< Reserved field. */
+};
+
+enum rte_eth_rxseg_mode {
+	/**
+	 * Buffer split mode: PMD split the received packets into multiple segments.
+	 * @see struct rte_eth_rxseg_split
+	 */
+	RTE_ETH_RXSEG_MODE_SPLIT = RTE_BIT64(0),
+	/**
+	 * Pool sort mode: PMD to chooses a memory pool based on the packet's length.
+	 * @see struct rte_eth_rxseg_sort
+	 */
+	RTE_ETH_RXSEG_MODE_SORT  = RTE_BIT64(1),
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
  *
  * A common structure used to describe Rx packet segment properties.
  */
-union rte_eth_rxseg {
+struct rte_eth_rxseg {
 	/* The settings for buffer split offload. */
 	struct rte_eth_rxseg_split split;
-	/* The other features settings should be added here. */
+
+	/*The settings for packet sort offload. */
+	struct rte_eth_rxseg_sort sort;
 };
 
 /**
@@ -1239,6 +1269,11 @@  struct rte_eth_rxconf {
 	 * fields on rte_eth_dev_info structure are allowed to be set.
 	 */
 	uint64_t offloads;
+	/**
+	 * PMD may support more than one rxseg mode. This allows application
+	 * to chose which mode to enable.
+	 */
+	enum rte_eth_rxseg_mode mode_flag;
 	/**
 	 * Points to the array of segment descriptions for an entire packet.
 	 * Array elements are properties for consecutive Rx segments.
@@ -1246,7 +1281,7 @@  struct rte_eth_rxconf {
 	 * The supported capabilities of receiving segmentation is reported
 	 * in rte_eth_dev_info.rx_seg_capa field.
 	 */
-	union rte_eth_rxseg *rx_seg;
+	struct rte_eth_rxseg *rx_seg;
 
 	uint64_t reserved_64s[2]; /**< Reserved for future fields */
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
@@ -1827,10 +1862,14 @@  struct rte_eth_switch_info {
  */
 struct rte_eth_rxseg_capa {
 	__extension__
+	uint32_t mode_split : 1; /**< Supports buffer split capability @see struct rte_eth_rxseg_split */
+	uint32_t mode_sort : 1; /**< Supports pool sort capability @see struct rte_eth_rxseg_sort */
 	uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
 	uint32_t offset_allowed:1; /**< Supports buffer offsets. */
 	uint32_t offset_align_log2:4; /**< Required offset alignment. */
 	uint16_t max_nseg; /**< Maximum amount of segments to split. */
+	/* < Maximum amount of pools that PMD can sort based on packet/segment lengths */
+	uint16_t max_npool;
 	uint16_t reserved; /**< Reserved field. */
 };