[RFC] ethdev: introduce maximum Rx buffer size

Message ID 20230808040234.12947-1-lihuisong@huawei.com (mailing list archive)
State Not Applicable, archived
Delegated to: Ferruh Yigit
Headers
Series [RFC] ethdev: introduce maximum Rx buffer size

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

lihuisong (C) Aug. 8, 2023, 4:02 a.m. UTC
  The Rx buffer size represents the amount of data hardware can receive
into one mbuf. The "min_rx_bufsize" field is the minimum Rx buffer size
supported by hardware. Actually, some engines, such as hns3, also have a
maximum Rx buffer specification. For these engines, the usable data size
of one mbuf in Rx also depends on the maximum buffer size supported by
hardware. So introduce a maximum Rx buffer size in struct rte_eth_dev_info
to report it to the user and avoid memory waste.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 lib/ethdev/rte_ethdev.c | 1 +
 lib/ethdev/rte_ethdev.h | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)
  

Comments

Andrew Rybchenko Aug. 11, 2023, 12:07 p.m. UTC | #1
On 8/8/23 07:02, Huisong Li wrote:
> The Rx buffer size represents the amount of data hardware can receive
> into one mbuf. The "min_rx_bufsize" field is the minimum Rx buffer size
> supported by hardware. Actually, some engines, such as hns3, also have a
> maximum Rx buffer specification. For these engines, the usable data size
> of one mbuf in Rx also depends on the maximum buffer size supported by
> hardware. So introduce a maximum Rx buffer size in struct rte_eth_dev_info
> to report it to the user and avoid memory waste.

I think the field should be defined for informational purposes only
(highlighted in the comments). I.e. if the application specifies a larger
Rx buffer, the driver should accept it and just pass the smaller value to HW.
Also, I think it would be useful to log a warning from Rx queue setup
if the provided Rx buffer is larger than the maximum reported by the driver.
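
Something along these lines inside rte_eth_rx_queue_setup() could emit that
warning. This is only a rough sketch of the idea, not the code that was
eventually merged; the helper name is made up, and the log macro usage simply
mirrors other messages in rte_ethdev.c.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Sketch: warn when the mempool data room exceeds the reported maximum. */
static void
rx_bufsize_warn(uint16_t port_id, const struct rte_eth_dev_info *dev_info,
		struct rte_mempool *mp)
{
	uint32_t buf_size = rte_pktmbuf_data_room_size(mp) -
			    RTE_PKTMBUF_HEADROOM;

	if (buf_size > dev_info->max_rx_bufsize)
		RTE_ETHDEV_LOG(INFO,
			"Port %u: mbuf data room (%u) is larger than the maximum Rx buffer size (%u) the device can use\n",
			port_id, buf_size, dev_info->max_rx_bufsize);
}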

> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> ---
>   lib/ethdev/rte_ethdev.c | 1 +
>   lib/ethdev/rte_ethdev.h | 4 ++--
>   2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 0840d2b594..6d1b92e607 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -3689,6 +3689,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>   	dev_info->min_mtu = RTE_ETHER_MIN_LEN - RTE_ETHER_HDR_LEN -
>   		RTE_ETHER_CRC_LEN;
>   	dev_info->max_mtu = UINT16_MAX;
> +	dev_info->max_rx_bufsize = UINT32_MAX;
>   
>   	if (*dev->dev_ops->dev_infos_get == NULL)
>   		return -ENOTSUP;
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04a2564f22..1f0ab9c5d8 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1779,8 +1779,8 @@ struct rte_eth_dev_info {
>   	struct rte_eth_switch_info switch_info;
>   	/** Supported error handling mode. */
>   	enum rte_eth_err_handle_mode err_handle_mode;
> -
> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> +	uint32_t max_rx_bufsize; /**< Maximum size of Rx buffer. */

IMHO, the comment should be aligned similarly to the comments below.
Since the next release is ABI-breaking, I think the field should be put
next to min_rx_bufsize to make it easier to notice.
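
For reference, that placement would look roughly like the excerpt of
struct rte_eth_dev_info below (other members elided; min_rx_bufsize and
max_rx_pktlen are existing fields, max_rx_bufsize is the proposed addition):

struct rte_eth_dev_info {
	/* ... */
	uint32_t min_rx_bufsize; /**< Minimum size of Rx buffer. */
	uint32_t max_rx_bufsize; /**< Maximum size of Rx buffer. */
	uint32_t max_rx_pktlen;  /**< Maximum configurable length of Rx pkt. */
	/* ... */
};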

> +	uint32_t reserved_32s[3]; /**< Reserved for future fields */
>   	void *reserved_ptrs[2];   /**< Reserved for future fields */
>   };
>
  
lihuisong (C) Aug. 15, 2023, 8:16 a.m. UTC | #2
Hi Andrew,
Thanks for your review.


On 2023/8/11 20:07, Andrew Rybchenko wrote:
> On 8/8/23 07:02, Huisong Li wrote:
>> The Rx buffer size stands for the size hardware supported to receive
>> packets in one mbuf. The "min_rx_bufsize" is the minimum buffer hardware
>> supported in Rx. Actually, some engines also have the maximum buffer
>> specification, like, hns3. For these engines, the available data size
>> of one mbuf in Rx also depends on the maximum buffer hardware supported.
>> So introduce maximum Rx buffer size in struct rte_eth_dev_info to report
>> user to avoid memory waste.
>
> I think that the field should be defined as for informational purposes
> only (highlighted in comments). I.e. if application specifies larger Rx
> buffer, driver should accept it and just pass smaller value value to HW.
Ok, will add it.
> Also I think it would be useful to log warning from Rx queue setup
> if provided Rx buffer is larger than maximum reported by the driver.
Ack
>
>>
>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>> ---
>>   lib/ethdev/rte_ethdev.c | 1 +
>>   lib/ethdev/rte_ethdev.h | 4 ++--
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index 0840d2b594..6d1b92e607 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -3689,6 +3689,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct 
>> rte_eth_dev_info *dev_info)
>>       dev_info->min_mtu = RTE_ETHER_MIN_LEN - RTE_ETHER_HDR_LEN -
>>           RTE_ETHER_CRC_LEN;
>>       dev_info->max_mtu = UINT16_MAX;
>> +    dev_info->max_rx_bufsize = UINT32_MAX;
>>         if (*dev->dev_ops->dev_infos_get == NULL)
>>           return -ENOTSUP;
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 04a2564f22..1f0ab9c5d8 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -1779,8 +1779,8 @@ struct rte_eth_dev_info {
>>       struct rte_eth_switch_info switch_info;
>>       /** Supported error handling mode. */
>>       enum rte_eth_err_handle_mode err_handle_mode;
>> -
>> -    uint64_t reserved_64s[2]; /**< Reserved for future fields */
>> +    uint32_t max_rx_bufsize; /**< Maximum size of Rx buffer. */
>
> IMHO, comment should be aligned similar to comments below.
> Since the next release is ABI breaking, I think it should be put
> nearby min_rx_bufsize to make it easier to notice it.
Yes, let's put min/max_rx_bufsize together.
>
>> +    uint32_t reserved_32s[3]; /**< Reserved for future fields */
>>       void *reserved_ptrs[2];   /**< Reserved for future fields */
>>   };
>
> .
  
Stephen Hemminger Oct. 29, 2023, 3:48 p.m. UTC | #3
On Sat, 28 Oct 2023 09:48:44 +0800
Huisong Li <lihuisong@huawei.com> wrote:

> The "min_rx_bufsize" field in struct rte_eth_dev_info stands for the minimum
> Rx buffer size supported by hardware. Actually, some engines, like hns3,
> also have a maximum Rx buffer specification.
> 
> If the mbuf data room size in the mempool is greater than the maximum Rx
> buffer size supported by HW, the data size the application can use in each
> mbuf is only as much as that maximum, not the whole data room size.
> 
> So introduce a maximum Rx buffer size, which is not enforced, just to
> inform the user and avoid memory waste.

I am not convinced this is really necessary.
Your device will use up to 4K of buffer size; I'm not sure why an application
would want to use much larger than that, because it would be wasting
a lot of buffer space (most packets are smaller) anyway.

The only case where it might be useful is if the application is using jumbo
frames (9K) and the application is not able to handle multi-segment packets.
Not handling multi-segment packets in SW is just programmer laziness.
  
lihuisong (C) Oct. 30, 2023, 1:25 a.m. UTC | #4
On 2023/10/29 23:48, Stephen Hemminger wrote:
> On Sat, 28 Oct 2023 09:48:44 +0800
> Huisong Li <lihuisong@huawei.com> wrote:
>
>> The "min_rx_bufsize" in struct rte_eth_dev_info stands for the minimum
>> Rx buffer size supported by hardware. Actually, some engines also have
>> the maximum Rx buffer specification, like, hns3.
>>
>> If mbuf data room size in mempool is greater then the maximum Rx buffer
>> size supported by HW, the data size application used in each mbuf is just
>> as much as the maximum Rx buffer size supported by HW instead of the whole
>> data room size.
>>
>> So introduce maximum Rx buffer size which is not enforced just to report
>> user to avoid memory waste.
> I am not convinced this is really necessary.
> Your device will use up to 4K of buffer size, not sure why an application
> would want to use much larger than that because it would be wasting
> a lot of buffer space (most packets are smaller) anyway.
>
> The only case where it might be useful is if application is using jumbo
> frames (9K) and the application was not able to handle multi segment packets.
Yeah, it is useful if the user wants a large packet (like 6K) to fit in one mbuf.
But at the current layer, the user doesn't know the maximum buffer size per
descriptor supported by HW.
> Not handling multi segment packets in SW is just programmer laziness.
Users decide their implementation based on the use cases in their project.
One point in favor may be that users don't want to do memcpy for
multi-segment packets and prefer to just use the first mbuf's memory.

There is already "min_rx_bufsize" reported in the ethdev layer.
Anyway, DPDK indeed lacks a way to report the maximum Rx buffer size per
HW descriptor.
>
> .
  
Stephen Hemminger Oct. 30, 2023, 6:48 p.m. UTC | #5
On Mon, 30 Oct 2023 09:25:34 +0800
"lihuisong (C)" <lihuisong@huawei.com> wrote:

> >  
> >> The "min_rx_bufsize" in struct rte_eth_dev_info stands for the
> >> minimum Rx buffer size supported by hardware. Actually, some
> >> engines also have the maximum Rx buffer specification, like, hns3.
> >>
> >> If mbuf data room size in mempool is greater then the maximum Rx
> >> buffer size supported by HW, the data size application used in
> >> each mbuf is just as much as the maximum Rx buffer size supported
> >> by HW instead of the whole data room size.
> >>
> >> So introduce maximum Rx buffer size which is not enforced just to
> >> report user to avoid memory waste.  
> > I am not convinced this is really necessary.
> > Your device will use up to 4K of buffer size, not sure why an
> > application would want to use much larger than that because it
> > would be wasting a lot of buffer space (most packets are smaller)
> > anyway.
> >
> > The only case where it might be useful is if application is using
> > jumbo frames (9K) and the application was not able to handle multi
> > segment packets.  
> Yeah, it is useful if user want a large packet (like, 6K) is in a
> mbuf. But, in current layer, user don't know what the maximum buffer
> size per descriptor supported by HW is.
> > Not handling multi segment packets in SW is just programmer
> > laziness.  
> User do decide their implement based on their cases in project.
> May it be a point for this that user don't want to do memcpy for
> multi segment packets and just use the first mbuf memory.
> 
> Now that there is the "min_rx_bufsize" to report in ethdev layer.
> Anyway, DPDK is indeed the lack of the way to report the maximum Rx 
> buffer size per hw descriptor.

My concern is that you are creating a special case for one driver.
And other drivers probably have a similar upper bound.

Could the warning be better handled in the driver-specific configure
routine rather than by updating the ethdev API? Something like:

   if (multi-segment-flag off) {
       if (mtu > driver max buf size)
           return error;
   } else {
       if (mtu > driver max buf size &&
           mtu < mempool_buf_size(mp)) {
           warn that packet maybe segmented ??
       }
   }
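
A rough C rendering of that driver-side check, placed in an Rx queue setup
callback where the mempool is available, might look like the sketch below.
The example_* names and EXAMPLE_MAX_RX_BUF_SIZE constant are placeholders,
not code from any real driver.

#include <stdbool.h>
#include <ethdev_driver.h>
#include <rte_mbuf.h>

#define EXAMPLE_MAX_RX_BUF_SIZE 4096 /* placeholder HW per-descriptor limit */

/* Hypothetical driver check along the lines of the pseudocode above. */
static int
example_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
		       uint16_t nb_desc, unsigned int socket_id,
		       const struct rte_eth_rxconf *rx_conf,
		       struct rte_mempool *mp)
{
	uint32_t mtu = dev->data->dev_conf.rxmode.mtu;
	uint32_t buf_size = rte_pktmbuf_data_room_size(mp) -
			    RTE_PKTMBUF_HEADROOM;
	bool scatter = dev->data->dev_conf.rxmode.offloads &
		       RTE_ETH_RX_OFFLOAD_SCATTER;

	if (!scatter) {
		/* a frame of MTU size cannot fit into a single descriptor */
		if (mtu > EXAMPLE_MAX_RX_BUF_SIZE)
			return -EINVAL;
	} else if (mtu > EXAMPLE_MAX_RX_BUF_SIZE && mtu < buf_size) {
		/* mbufs look large enough, yet HW will still segment:
		 * log a "packets may be segmented" warning here
		 */
	}

	/* ... normal queue setup would continue here ... */
	(void)queue_id; (void)nb_desc; (void)socket_id; (void)rx_conf;
	return 0;
}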
  
lihuisong (C) Oct. 31, 2023, 2:57 a.m. UTC | #6
On 2023/10/31 2:48, Stephen Hemminger wrote:
> On Mon, 30 Oct 2023 09:25:34 +0800
> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>
>>>   
>>>> The "min_rx_bufsize" in struct rte_eth_dev_info stands for the
>>>> minimum Rx buffer size supported by hardware. Actually, some
>>>> engines also have the maximum Rx buffer specification, like, hns3.
>>>>
>>>> If mbuf data room size in mempool is greater then the maximum Rx
>>>> buffer size supported by HW, the data size application used in
>>>> each mbuf is just as much as the maximum Rx buffer size supported
>>>> by HW instead of the whole data room size.
>>>>
>>>> So introduce maximum Rx buffer size which is not enforced just to
>>>> report user to avoid memory waste.
>>> I am not convinced this is really necessary.
>>> Your device will use up to 4K of buffer size, not sure why an
>>> application would want to use much larger than that because it
>>> would be wasting a lot of buffer space (most packets are smaller)
>>> anyway.
>>>
>>> The only case where it might be useful is if application is using
>>> jumbo frames (9K) and the application was not able to handle multi
>>> segment packets.
>> Yeah, it is useful if user want a large packet (like, 6K) is in a
>> mbuf. But, in current layer, user don't know what the maximum buffer
>> size per descriptor supported by HW is.
>>> Not handling multi segment packets in SW is just programmer
>>> laziness.
>> User do decide their implement based on their cases in project.
>> May it be a point for this that user don't want to do memcpy for
>> multi segment packets and just use the first mbuf memory.
>>
>> Now that there is the "min_rx_bufsize" to report in ethdev layer.
>> Anyway, DPDK is indeed the lack of the way to report the maximum Rx
>> buffer size per hw descriptor.
> My concern is that you are creating a special case for one driver.
I understand your concern.
> And other drivers probably have similar upper bound.
Yes, they also have a similar upper bound.
From the code, the maximum buffer size of most PMDs is 16K, and bnxt's is
9600 bytes.
Do we need to report this size? It's a common feature for all PMDs.
>
> Could the warning be better handled in the driver specific configure
> routine rather than updating the ethdev API.  Something like:
>
>     if (multi-segment-flag off) {
>         if (mtu > driver max buf size) {
>                 return error;
>     } else {
>         if (mtu > driver max buf size &&
>             mtu < mempool_buf_size(mp)) {
>           warn that packet maybe segmented ??
>         }
>     }
>
>
> .
  
Morten Brørup Oct. 31, 2023, 7:48 a.m. UTC | #7
> From: lihuisong (C) [mailto:lihuisong@huawei.com]
> Sent: Tuesday, 31 October 2023 03.58
> 
> On 2023/10/31 2:48, Stephen Hemminger wrote:
> > On Mon, 30 Oct 2023 09:25:34 +0800
> > "lihuisong (C)" <lihuisong@huawei.com> wrote:
> >
> >>>
> >>>> The "min_rx_bufsize" in struct rte_eth_dev_info stands for the
> >>>> minimum Rx buffer size supported by hardware. Actually, some
> >>>> engines also have the maximum Rx buffer specification, like, hns3.
> >>>>
> >>>> If mbuf data room size in mempool is greater then the maximum Rx
> >>>> buffer size supported by HW, the data size application used in
> >>>> each mbuf is just as much as the maximum Rx buffer size supported
> >>>> by HW instead of the whole data room size.
> >>>>
> >>>> So introduce maximum Rx buffer size which is not enforced just to
> >>>> report user to avoid memory waste.
> >>> I am not convinced this is really necessary.
> >>> Your device will use up to 4K of buffer size, not sure why an
> >>> application would want to use much larger than that because it
> >>> would be wasting a lot of buffer space (most packets are smaller)
> >>> anyway.
> >>>
> >>> The only case where it might be useful is if application is using
> >>> jumbo frames (9K) and the application was not able to handle multi
> >>> segment packets.
> >> Yeah, it is useful if user want a large packet (like, 6K) is in a
> >> mbuf. But, in current layer, user don't know what the maximum buffer
> >> size per descriptor supported by HW is.
> >>> Not handling multi segment packets in SW is just programmer
> >>> laziness.
> >> User do decide their implement based on their cases in project.
> >> May it be a point for this that user don't want to do memcpy for
> >> multi segment packets and just use the first mbuf memory.
> >>
> >> Now that there is the "min_rx_bufsize" to report in ethdev layer.
> >> Anyway, DPDK is indeed the lack of the way to report the maximum Rx
> >> buffer size per hw descriptor.
> > My concern is that you are creating a special case for one driver.
> understand your concern.
> > And other drivers probably have similar upper bound.
> Yes, they also have similar upper bound.
>  From the codes, the max buffer size of Most PMDs are 16K and bnxt is
> 9600Byte.
> Do we need to report this size? It's a common feature for all PMDs.

I think this could be a useful feature for applications not wanting to deal with scattered packets. I don't consider such applications exotic, so I support adding this feature.

> >
> > Could the warning be better handled in the driver specific configure
> > routine rather than updating the ethdev API.  Something like:
> >
> >     if (multi-segment-flag off) {

Many drivers ignore the RTE_ETH_RX_OFFLOAD_SCATTER configuration, and only use RTE_ETH_RX_OFFLOAD_SCATTER for capabilities reporting. Perhaps this should be reported in Bugzilla?

> >         if (mtu > driver max buf size) {
> >                 return error;
> >     } else {
> >         if (mtu > driver max buf size &&
> >             mtu < mempool_buf_size(mp)) {
> >           warn that packet maybe segmented ??
> >         }
> >     }

Such a log message would also be useful.
  
Stephen Hemminger Oct. 31, 2023, 3:40 p.m. UTC | #8
On Tue, 31 Oct 2023 10:57:45 +0800
"lihuisong (C)" <lihuisong@huawei.com> wrote:

> >> User do decide their implement based on their cases in project.
> >> May it be a point for this that user don't want to do memcpy for
> >> multi segment packets and just use the first mbuf memory.
> >>
> >> Now that there is the "min_rx_bufsize" to report in ethdev layer.
> >> Anyway, DPDK is indeed the lack of the way to report the maximum Rx
> >> buffer size per hw descriptor.  
> > My concern is that you are creating a special case for one driver.  
> understand your concern.
> > And other drivers probably have similar upper bound.  
> Yes, they also have similar upper bound.
>  From the codes, the max buffer size of Most PMDs are 16K and bnxt is 
> 9600Byte.
> Do we need to report this size? It's a common feature for all PMDs.

It would make sense then to have max_rx_bufsize set to 16K by default
in ethdev, and PMD could then raise/lower based on hardware.
  
lihuisong (C) Nov. 1, 2023, 2:36 a.m. UTC | #9
Hi Stephen,

On 2023/10/31 23:40, Stephen Hemminger wrote:
> On Tue, 31 Oct 2023 10:57:45 +0800
> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>
>>>> User do decide their implement based on their cases in project.
>>>> May it be a point for this that user don't want to do memcpy for
>>>> multi segment packets and just use the first mbuf memory.
>>>>
>>>> Now that there is the "min_rx_bufsize" to report in ethdev layer.
>>>> Anyway, DPDK is indeed the lack of the way to report the maximum Rx
>>>> buffer size per hw descriptor.
>>> My concern is that you are creating a special case for one driver.
>> understand your concern.
>>> And other drivers probably have similar upper bound.
>> Yes, they also have similar upper bound.
>>   From the codes, the max buffer size of Most PMDs are 16K and bnxt is
>> 9600Byte.
>> Do we need to report this size? It's a common feature for all PMDs.
> It would make sense then to have max_rx_bufsize set to 16K by default
> in ethdev, and PMD could then raise/lower based on hardware.
It is not appropriate to set 16K by default in the ethdev layer,
because I don't see any check for the upper bound in some drivers, like
axgbe, enetc and so on.
I'm not sure whether they have an upper bound at all.
And some drivers' maximum buffer size is "16384 (16K) - 128".
So it's better to set UINT32_MAX by default.
What do you think?
>
> .
  
Stephen Hemminger Nov. 1, 2023, 4:08 p.m. UTC | #10
On Wed, 1 Nov 2023 10:36:07 +0800
"lihuisong (C)" <lihuisong@huawei.com> wrote:

> > Do we need to report this size? It's a common feature for all PMDs.  
> > It would make sense then to have max_rx_bufsize set to 16K by default
> > in ethdev, and PMD could then raise/lower based on hardware.  
> It is not appropriate to set to 16K by default in ethdev layer.
> Because I don't see any check for the upper bound in some driver, like 
> axgbe, enetc and so on.
> I'm not sure if they have no upper bound.
> And some driver's maximum buffer size is "16384(16K) - 128"
> So it's better to set to UINT32_MAX by default.
> what do you think?

The goal is always to give the application a working upper bound, and to
enforce that as much as possible in the ethdev layer. It doesn't matter
which pattern does that. Fortunately, telling the application an incorrect
answer is not fatal:
if overestimated, the application's pool will waste space;
if underestimated, the application will get more fragmented packets.
  
lihuisong (C) Nov. 2, 2023, 1:59 a.m. UTC | #11
On 2023/11/2 0:08, Stephen Hemminger wrote:
> On Wed, 1 Nov 2023 10:36:07 +0800
> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>
>>> Do we need to report this size? It's a common feature for all PMDs.
>>> It would make sense then to have max_rx_bufsize set to 16K by default
>>> in ethdev, and PMD could then raise/lower based on hardware.
>> It is not appropriate to set to 16K by default in ethdev layer.
>> Because I don't see any check for the upper bound in some driver, like
>> axgbe, enetc and so on.
>> I'm not sure if they have no upper bound.
>> And some driver's maximum buffer size is "16384(16K) - 128"
>> So it's better to set to UINT32_MAX by default.
>> what do you think?
> The goal is always giving application a working upper bound, and enforcing
> that as much as possible in ethdev layer. It doesnt matter which pattern
> does that.  Fortunately, telling application an incorrect answer is not fatal.
> If over estimated, application pool would be wasting space.
> If under estimated, application will get more fragmented packets.
I know what you mean.
If we set UINT32_MAX, it just means that the driver doesn't report this upper
bound.
This is also a very common way of handling it, and it has no effect on the
drivers that don't report this value.
On the contrary, if we set a default value (like 16K) in ethdev, the user
may be misled and confused by it, right?
After all, this isn't the real upper bound of all drivers, and this
fixed default value may affect the behavior of some drivers whose upper
bound I couldn't find.
So I'd like to keep it as UINT32_MAX.
>
> .
  
Ferruh Yigit Nov. 2, 2023, 4:23 p.m. UTC | #12
On 11/2/2023 1:59 AM, lihuisong (C) wrote:
> 
> On 2023/11/2 0:08, Stephen Hemminger wrote:
>> On Wed, 1 Nov 2023 10:36:07 +0800
>> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>>
>>>> Do we need to report this size? It's a common feature for all PMDs.
>>>> It would make sense then to have max_rx_bufsize set to 16K by default
>>>> in ethdev, and PMD could then raise/lower based on hardware.
>>> It is not appropriate to set to 16K by default in ethdev layer.
>>> Because I don't see any check for the upper bound in some driver, like
>>> axgbe, enetc and so on.
>>> I'm not sure if they have no upper bound.
>>> And some driver's maximum buffer size is "16384(16K) - 128"
>>> So it's better to set to UINT32_MAX by default.
>>> what do you think?
>> The goal is always giving application a working upper bound, and
>> enforcing
>> that as much as possible in ethdev layer. It doesnt matter which pattern
>> does that.  Fortunately, telling application an incorrect answer is
>> not fatal.
>> If over estimated, application pool would be wasting space.
>> If under estimated, application will get more fragmented packets.
> I know what you mean.
> If we set UINT32_MAX, it just means that driver don't report this upper
> bound.
> This is also a very common way of handling. And it has no effect on the
> drivers that doesn't report this value.
> On the contrary, if we set a default value (like 16K) in ethdev, user
> may be misunderstood and confused by that, right?
> After all, this isn't the real upper bound of all drivers. And this
> fixed default value may affect the behavior of some driver that I didn't
> find their upper bound.
> So I'd like to keep it as UINT32_MAX.
> 


Hi Stephen, Morten,

I saw scattered Rx mentioned, so there may be some misalignment:
the purpose of the patch is not to let the application set as big an mbuf
size as possible so that it can escape from parsing multi-segment mbufs.
Indeed, the application can provide a large mbuf anyway, with the same
result, without knowing this information.

The main motivation is the other way around: a device may have a restriction
on the buffer size that a single descriptor can address, independent of
whether scattered Rx is used. If the mbuf size is bigger than this device
limit, each mbuf will have some unused space.
The patch's intention is to report this maximum per-mbuf/descriptor buffer
size, so that the application doesn't allocate bigger mbufs and waste memory.
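
To make that concrete, an application could consult the new field when sizing
its Rx mempool. A minimal sketch follows; the pool name, element count and
cache size are arbitrary placeholders.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>

/* Sketch: clamp the requested data room to what a single Rx descriptor can
 * actually use, so no per-mbuf space is wasted. If the PMD does not report
 * a limit, max_rx_bufsize stays at the UINT32_MAX default and nothing is
 * clamped.
 */
static struct rte_mempool *
create_rx_pool(uint16_t port_id, uint32_t wanted_data_room)
{
	struct rte_eth_dev_info dev_info;
	uint32_t data_room = wanted_data_room;

	if (rte_eth_dev_info_get(port_id, &dev_info) == 0 &&
	    data_room > dev_info.max_rx_bufsize)
		data_room = dev_info.max_rx_bufsize;

	return rte_pktmbuf_pool_create("rx_pool", 8192, 256, 0,
				       data_room + RTE_PKTMBUF_HEADROOM,
				       rte_socket_id());
}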
  
Morten Brørup Nov. 2, 2023, 4:51 p.m. UTC | #13
> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> Sent: Thursday, 2 November 2023 17.24
> 
> On 11/2/2023 1:59 AM, lihuisong (C) wrote:
> >
> > On 2023/11/2 0:08, Stephen Hemminger wrote:
> >> On Wed, 1 Nov 2023 10:36:07 +0800
> >> "lihuisong (C)" <lihuisong@huawei.com> wrote:
> >>
> >>>> Do we need to report this size? It's a common feature for all
> PMDs.
> >>>> It would make sense then to have max_rx_bufsize set to 16K by
> default
> >>>> in ethdev, and PMD could then raise/lower based on hardware.
> >>> It is not appropriate to set to 16K by default in ethdev layer.
> >>> Because I don't see any check for the upper bound in some driver,
> like
> >>> axgbe, enetc and so on.
> >>> I'm not sure if they have no upper bound.
> >>> And some driver's maximum buffer size is "16384(16K) - 128"
> >>> So it's better to set to UINT32_MAX by default.
> >>> what do you think?
> >> The goal is always giving application a working upper bound, and
> >> enforcing
> >> that as much as possible in ethdev layer. It doesnt matter which
> pattern
> >> does that.  Fortunately, telling application an incorrect answer is
> >> not fatal.
> >> If over estimated, application pool would be wasting space.
> >> If under estimated, application will get more fragmented packets.
> > I know what you mean.
> > If we set UINT32_MAX, it just means that driver don't report this
> upper
> > bound.
> > This is also a very common way of handling. And it has no effect on
> the
> > drivers that doesn't report this value.
> > On the contrary, if we set a default value (like 16K) in ethdev, user
> > may be misunderstood and confused by that, right?
> > After all, this isn't the real upper bound of all drivers. And this
> > fixed default value may affect the behavior of some driver that I
> didn't
> > find their upper bound.
> > So I'd like to keep it as UINT32_MAX.
> >
> 
> 
> Hi Stephen, Morten,
> 
> I saw scattered Rx mentioned, there may be some misalignment,
> the purpose of the patch is not to enable application to set as big as
> possible mbuf size, so that application can escape from parsing
> multi-segment mbufs.
> Indeed application can provide a large mbuf anyway, to have same
> result,
> without knowing this information.
> 
> Main motivation is other way around, device may have restriction on
> buffer size that a single descriptor can address, independent from
> scattered Rx used, if mbuf size is bigger than this device limit, each
> mbuf will have some unused space.
> Patch has intention to inform this max per mbuf/descriptor buffer size,
> so that application doesn't allocate bigger mbuf and waste memory.

Good point!

Let's categorize this patch series as a memory optimization for applications that support jumbo frames, but are trying to avoid (or reduce) scattered RX. :-)
  
Ferruh Yigit Nov. 2, 2023, 5:05 p.m. UTC | #14
On 11/2/2023 4:51 PM, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
>> Sent: Thursday, 2 November 2023 17.24
>>
>> On 11/2/2023 1:59 AM, lihuisong (C) wrote:
>>>
>>> On 2023/11/2 0:08, Stephen Hemminger wrote:
>>>> On Wed, 1 Nov 2023 10:36:07 +0800
>>>> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>>>>
>>>>>> Do we need to report this size? It's a common feature for all
>> PMDs.
>>>>>> It would make sense then to have max_rx_bufsize set to 16K by
>> default
>>>>>> in ethdev, and PMD could then raise/lower based on hardware.
>>>>> It is not appropriate to set to 16K by default in ethdev layer.
>>>>> Because I don't see any check for the upper bound in some driver,
>> like
>>>>> axgbe, enetc and so on.
>>>>> I'm not sure if they have no upper bound.
>>>>> And some driver's maximum buffer size is "16384(16K) - 128"
>>>>> So it's better to set to UINT32_MAX by default.
>>>>> what do you think?
>>>> The goal is always giving application a working upper bound, and
>>>> enforcing
>>>> that as much as possible in ethdev layer. It doesnt matter which
>> pattern
>>>> does that.  Fortunately, telling application an incorrect answer is
>>>> not fatal.
>>>> If over estimated, application pool would be wasting space.
>>>> If under estimated, application will get more fragmented packets.
>>> I know what you mean.
>>> If we set UINT32_MAX, it just means that driver don't report this
>> upper
>>> bound.
>>> This is also a very common way of handling. And it has no effect on
>> the
>>> drivers that doesn't report this value.
>>> On the contrary, if we set a default value (like 16K) in ethdev, user
>>> may be misunderstood and confused by that, right?
>>> After all, this isn't the real upper bound of all drivers. And this
>>> fixed default value may affect the behavior of some driver that I
>> didn't
>>> find their upper bound.
>>> So I'd like to keep it as UINT32_MAX.
>>>
>>
>>
>> Hi Stephen, Morten,
>>
>> I saw scattered Rx mentioned, there may be some misalignment,
>> the purpose of the patch is not to enable application to set as big as
>> possible mbuf size, so that application can escape from parsing
>> multi-segment mbufs.
>> Indeed application can provide a large mbuf anyway, to have same
>> result,
>> without knowing this information.
>>
>> Main motivation is other way around, device may have restriction on
>> buffer size that a single descriptor can address, independent from
>> scattered Rx used, if mbuf size is bigger than this device limit, each
>> mbuf will have some unused space.
>> Patch has intention to inform this max per mbuf/descriptor buffer size,
>> so that application doesn't allocate bigger mbuf and waste memory.
> 
> Good point!
> 
> Let's categorize this patch series as a memory optimization for applications that support jumbo frames, but are trying to avoid (or reduce) scattered RX. :-)
> 

It is a memory optimization patch, but again it has nothing to do with jumbo
frames or scattered Rx.
  
Morten Brørup Nov. 2, 2023, 5:12 p.m. UTC | #15
> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> Sent: Thursday, 2 November 2023 18.06
> 
> On 11/2/2023 4:51 PM, Morten Brørup wrote:
> >> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> >> Sent: Thursday, 2 November 2023 17.24
> >>
> >> On 11/2/2023 1:59 AM, lihuisong (C) wrote:
> >>>
> >>> On 2023/11/2 0:08, Stephen Hemminger wrote:
> >>>> On Wed, 1 Nov 2023 10:36:07 +0800
> >>>> "lihuisong (C)" <lihuisong@huawei.com> wrote:
> >>>>
> >>>>>> Do we need to report this size? It's a common feature for all
> >> PMDs.
> >>>>>> It would make sense then to have max_rx_bufsize set to 16K by
> >> default
> >>>>>> in ethdev, and PMD could then raise/lower based on hardware.
> >>>>> It is not appropriate to set to 16K by default in ethdev layer.
> >>>>> Because I don't see any check for the upper bound in some driver,
> >> like
> >>>>> axgbe, enetc and so on.
> >>>>> I'm not sure if they have no upper bound.
> >>>>> And some driver's maximum buffer size is "16384(16K) - 128"
> >>>>> So it's better to set to UINT32_MAX by default.
> >>>>> what do you think?
> >>>> The goal is always giving application a working upper bound, and
> >>>> enforcing
> >>>> that as much as possible in ethdev layer. It doesnt matter which
> >> pattern
> >>>> does that.  Fortunately, telling application an incorrect answer
> is
> >>>> not fatal.
> >>>> If over estimated, application pool would be wasting space.
> >>>> If under estimated, application will get more fragmented packets.
> >>> I know what you mean.
> >>> If we set UINT32_MAX, it just means that driver don't report this
> >> upper
> >>> bound.
> >>> This is also a very common way of handling. And it has no effect on
> >> the
> >>> drivers that doesn't report this value.
> >>> On the contrary, if we set a default value (like 16K) in ethdev,
> user
> >>> may be misunderstood and confused by that, right?
> >>> After all, this isn't the real upper bound of all drivers. And this
> >>> fixed default value may affect the behavior of some driver that I
> >> didn't
> >>> find their upper bound.
> >>> So I'd like to keep it as UINT32_MAX.
> >>>
> >>
> >>
> >> Hi Stephen, Morten,
> >>
> >> I saw scattered Rx mentioned, there may be some misalignment,
> >> the purpose of the patch is not to enable application to set as big
> as
> >> possible mbuf size, so that application can escape from parsing
> >> multi-segment mbufs.
> >> Indeed application can provide a large mbuf anyway, to have same
> >> result,
> >> without knowing this information.
> >>
> >> Main motivation is other way around, device may have restriction on
> >> buffer size that a single descriptor can address, independent from
> >> scattered Rx used, if mbuf size is bigger than this device limit,
> each
> >> mbuf will have some unused space.
> >> Patch has intention to inform this max per mbuf/descriptor buffer
> size,
> >> so that application doesn't allocate bigger mbuf and waste memory.
> >
> > Good point!
> >
> > Let's categorize this patch series as a memory optimization for
> applications that support jumbo frames, but are trying to avoid (or
> reduce) scattered RX. :-)
> >
> 
> It is a memory optimization patch, but again nothing to do with jumbo
> frames or scattered Rx.

I expect all NICs to support standard Ethernet frames without scattered RX.

So I consider this patch related to jumbo frames (and non-scattered RX). Is there any other use case?
  
Ferruh Yigit Nov. 2, 2023, 5:35 p.m. UTC | #16
On 11/2/2023 5:12 PM, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
>> Sent: Thursday, 2 November 2023 18.06
>>
>> On 11/2/2023 4:51 PM, Morten Brørup wrote:
>>>> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
>>>> Sent: Thursday, 2 November 2023 17.24
>>>>
>>>> On 11/2/2023 1:59 AM, lihuisong (C) wrote:
>>>>>
>>>>> On 2023/11/2 0:08, Stephen Hemminger wrote:
>>>>>> On Wed, 1 Nov 2023 10:36:07 +0800
>>>>>> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>>>>>>
>>>>>>>> Do we need to report this size? It's a common feature for all
>>>> PMDs.
>>>>>>>> It would make sense then to have max_rx_bufsize set to 16K by
>>>> default
>>>>>>>> in ethdev, and PMD could then raise/lower based on hardware.
>>>>>>> It is not appropriate to set to 16K by default in ethdev layer.
>>>>>>> Because I don't see any check for the upper bound in some driver,
>>>> like
>>>>>>> axgbe, enetc and so on.
>>>>>>> I'm not sure if they have no upper bound.
>>>>>>> And some driver's maximum buffer size is "16384(16K) - 128"
>>>>>>> So it's better to set to UINT32_MAX by default.
>>>>>>> what do you think?
>>>>>> The goal is always giving application a working upper bound, and
>>>>>> enforcing
>>>>>> that as much as possible in ethdev layer. It doesnt matter which
>>>> pattern
>>>>>> does that.  Fortunately, telling application an incorrect answer
>> is
>>>>>> not fatal.
>>>>>> If over estimated, application pool would be wasting space.
>>>>>> If under estimated, application will get more fragmented packets.
>>>>> I know what you mean.
>>>>> If we set UINT32_MAX, it just means that driver don't report this
>>>> upper
>>>>> bound.
>>>>> This is also a very common way of handling. And it has no effect on
>>>> the
>>>>> drivers that doesn't report this value.
>>>>> On the contrary, if we set a default value (like 16K) in ethdev,
>> user
>>>>> may be misunderstood and confused by that, right?
>>>>> After all, this isn't the real upper bound of all drivers. And this
>>>>> fixed default value may affect the behavior of some driver that I
>>>> didn't
>>>>> find their upper bound.
>>>>> So I'd like to keep it as UINT32_MAX.
>>>>>
>>>>
>>>>
>>>> Hi Stephen, Morten,
>>>>
>>>> I saw scattered Rx mentioned, there may be some misalignment,
>>>> the purpose of the patch is not to enable application to set as big
>> as
>>>> possible mbuf size, so that application can escape from parsing
>>>> multi-segment mbufs.
>>>> Indeed application can provide a large mbuf anyway, to have same
>>>> result,
>>>> without knowing this information.
>>>>
>>>> Main motivation is other way around, device may have restriction on
>>>> buffer size that a single descriptor can address, independent from
>>>> scattered Rx used, if mbuf size is bigger than this device limit,
>> each
>>>> mbuf will have some unused space.
>>>> Patch has intention to inform this max per mbuf/descriptor buffer
>> size,
>>>> so that application doesn't allocate bigger mbuf and waste memory.
>>>
>>> Good point!
>>>
>>> Let's categorize this patch series as a memory optimization for
>> applications that support jumbo frames, but are trying to avoid (or
>> reduce) scattered RX. :-)
>>>
>>
>> It is a memory optimization patch, but again nothing to do with jumbo
>> frames or scattered Rx.
> 
> I expect all NICs to support standard Ethernet frames without scattered RX.
> 
> So I consider this patch related to jumbo frames (and non-scattered RX). Is there any other use case?
> 

I was thinking this is mainly about misconfiguration by the application,
but if done intentionally, yes, the application's intention can be to
receive jumbo frames.
  
lihuisong (C) Nov. 3, 2023, 2:13 a.m. UTC | #17
On 2023/11/3 0:51, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
>> Sent: Thursday, 2 November 2023 17.24
>>
>> On 11/2/2023 1:59 AM, lihuisong (C) wrote:
>>> On 2023/11/2 0:08, Stephen Hemminger wrote:
>>>> On Wed, 1 Nov 2023 10:36:07 +0800
>>>> "lihuisong (C)" <lihuisong@huawei.com> wrote:
>>>>
>>>>>> Do we need to report this size? It's a common feature for all
>> PMDs.
>>>>>> It would make sense then to have max_rx_bufsize set to 16K by
>> default
>>>>>> in ethdev, and PMD could then raise/lower based on hardware.
>>>>> It is not appropriate to set to 16K by default in ethdev layer.
>>>>> Because I don't see any check for the upper bound in some driver,
>> like
>>>>> axgbe, enetc and so on.
>>>>> I'm not sure if they have no upper bound.
>>>>> And some driver's maximum buffer size is "16384(16K) - 128"
>>>>> So it's better to set to UINT32_MAX by default.
>>>>> what do you think?
>>>> The goal is always giving application a working upper bound, and
>>>> enforcing
>>>> that as much as possible in ethdev layer. It doesnt matter which
>> pattern
>>>> does that.  Fortunately, telling application an incorrect answer is
>>>> not fatal.
>>>> If over estimated, application pool would be wasting space.
>>>> If under estimated, application will get more fragmented packets.
>>> I know what you mean.
>>> If we set UINT32_MAX, it just means that driver don't report this
>> upper
>>> bound.
>>> This is also a very common way of handling. And it has no effect on
>> the
>>> drivers that doesn't report this value.
>>> On the contrary, if we set a default value (like 16K) in ethdev, user
>>> may be misunderstood and confused by that, right?
>>> After all, this isn't the real upper bound of all drivers. And this
>>> fixed default value may affect the behavior of some driver that I
>> didn't
>>> find their upper bound.
>>> So I'd like to keep it as UINT32_MAX.
>>>
>>
>> Hi Stephen, Morten,
>>
>> I saw scattered Rx mentioned, there may be some misalignment,
>> the purpose of the patch is not to enable application to set as big as
>> possible mbuf size, so that application can escape from parsing
>> multi-segment mbufs.
>> Indeed application can provide a large mbuf anyway, to have same
>> result,
>> without knowing this information.
>>
>> Main motivation is other way around, device may have restriction on
>> buffer size that a single descriptor can address, independent from
>> scattered Rx used, if mbuf size is bigger than this device limit, each
>> mbuf will have some unused space.
>> Patch has intention to inform this max per mbuf/descriptor buffer size,
>> so that application doesn't allocate bigger mbuf and waste memory.
> Good point!
+1
>
> Let's categorize this patch series as a memory optimization for applications that support jumbo frames, but are trying to avoid (or reduce) scattered RX. :-)
Users can choose to receive jumbo frames in one mbuf/descriptor rather
than in multiple mbufs even without this patch.
In other words, using a big Rx buffer size is just one way to receive jumbo
frames.
>
  
Ferruh Yigit Nov. 3, 2023, 11:53 a.m. UTC | #18
On 11/3/2023 10:27 AM, Huisong Li wrote:
> The "min_rx_bufsize" field in struct rte_eth_dev_info stands for the minimum
> Rx buffer size supported by hardware. Actually, some engines, like hns3,
> also have a maximum Rx buffer specification.
> 
> If the mbuf data room size in the mempool is greater than the maximum Rx
> buffer size supported by HW, the data size the application can use in each
> mbuf is only as much as that maximum, not the whole data room size.
> 
> So introduce a maximum Rx buffer size, which is not enforced, just to
> inform the user and avoid memory waste.
> 
> ---
> v5:
>  - fix a valiable name
>  - fix the log level and context in rte_eth_rx_queue_setup.
> 
> v4:
>  - fix the log in rte_eth_rx_queue_setup.
>  - add a comment in rel_notes.
> 
> v3:
>  - fix some comments for Morten's advice.
> 
> v2:
>  - fix some comments
>  - fix the log level when data room size is greater than maximum Rx
>    buffer size.
> 
> v1:
>  - move max_rx_bufsize to min_rx_bufsize closer in struct rte_eth_dev_info
>  - add max_rx_bufsize display in testpmd.
>  - hns3 reports maximum buffer size.
> 
> Huisong Li (3):
>   ethdev: introduce maximum Rx buffer size
>   app/testpmd: add maximum Rx buffer size display
>   net/hns3: report maximum buffer size
> 

For series,
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>


Series applied to dpdk-next-net/main, thanks.
  

Patch

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 0840d2b594..6d1b92e607 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -3689,6 +3689,7 @@  rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
 	dev_info->min_mtu = RTE_ETHER_MIN_LEN - RTE_ETHER_HDR_LEN -
 		RTE_ETHER_CRC_LEN;
 	dev_info->max_mtu = UINT16_MAX;
+	dev_info->max_rx_bufsize = UINT32_MAX;
 
 	if (*dev->dev_ops->dev_infos_get == NULL)
 		return -ENOTSUP;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04a2564f22..1f0ab9c5d8 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1779,8 +1779,8 @@  struct rte_eth_dev_info {
 	struct rte_eth_switch_info switch_info;
 	/** Supported error handling mode. */
 	enum rte_eth_err_handle_mode err_handle_mode;
-
-	uint64_t reserved_64s[2]; /**< Reserved for future fields */
+	uint32_t max_rx_bufsize; /**< Maximum size of Rx buffer. */
+	uint32_t reserved_32s[3]; /**< Reserved for future fields */
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
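
For a PMD that has such a limit, reporting it would be a small addition to its
dev_infos_get callback. A hypothetical sketch (the EXAMPLE_* constants are
placeholders, not the actual hns3 values):

#include <ethdev_driver.h>

#define EXAMPLE_MIN_RX_BUF_SIZE  256   /* placeholder HW minimum */
#define EXAMPLE_MAX_RX_BUF_SIZE  4096  /* placeholder HW maximum */

/* Sketch of a driver's dev_infos_get callback filling both bounds. */
static int
example_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
{
	RTE_SET_USED(dev);
	info->min_rx_bufsize = EXAMPLE_MIN_RX_BUF_SIZE;
	info->max_rx_bufsize = EXAMPLE_MAX_RX_BUF_SIZE;
	return 0;
}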