[RFC] ethdev: add a field for rte_eth_rxq_info

Message ID 1592894934-57856-1-git-send-email-tangchengchang@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series [RFC] ethdev: add a field for rte_eth_rxq_info |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Chengchang Tang June 23, 2020, 6:48 a.m. UTC
  In common practice, PMD configure the rx_buf_size according to the data
room size of the object in mempool. But in fact the final value is related
to the specifications of hw, and its values will affect the number of
fragments in recieving pkts.

At present, we seem to have no way to espose relevant information to upper
layer users.

Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
size used in recieving pkts for hw.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 lib/librte_ethdev/rte_ethdev.h | 1 +
 1 file changed, 1 insertion(+)

--
2.7.4
  

Comments

Andrew Rybchenko June 23, 2020, 9:30 a.m. UTC | #1
On 6/23/20 9:48 AM, Chengchang Tang wrote:
> In common practice, PMD configure the rx_buf_size according to the data
> room size of the object in mempool. But in fact the final value is related
> to the specifications of hw, and its values will affect the number of
> fragments in recieving pkts.
> 
> At present, we seem to have no way to espose relevant information to upper
> layer users.
> 
> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
> size used in recieving pkts for hw.
> 

I'm OK with the change in general.
I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
ethdev.

I think it is important to update PMDs which provides the
information to fill the field in.

> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

> ---
>  lib/librte_ethdev/rte_ethdev.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 0f6d053..82b7e98 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>  	uint16_t nb_desc;           /**< configured number of RXDs. */
> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>  } __rte_cache_min_aligned;
> 
>  /**
> --
> 2.7.4
>
  
Stephen Hemminger June 23, 2020, 2:48 p.m. UTC | #2
On Tue, 23 Jun 2020 14:48:54 +0800
Chengchang Tang <tangchengchang@huawei.com> wrote:

> In common practice, PMD configure the rx_buf_size according to the data
> room size of the object in mempool. But in fact the final value is related
> to the specifications of hw, and its values will affect the number of
> fragments in recieving pkts.
> 
> At present, we seem to have no way to espose relevant information to upper
> layer users.
> 
> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
> size used in recieving pkts for hw.
> 
> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> ---
>  lib/librte_ethdev/rte_ethdev.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 0f6d053..82b7e98 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>  	uint16_t nb_desc;           /**< configured number of RXDs. */
> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>  } __rte_cache_min_aligned;
> 
>  /**
> --
> 2.7.4
> 

Will have to wait until 20.11 as it is an ABI change.
  
Andrew Rybchenko June 23, 2020, 3:22 p.m. UTC | #3
On 6/23/20 5:48 PM, Stephen Hemminger wrote:
> On Tue, 23 Jun 2020 14:48:54 +0800
> Chengchang Tang <tangchengchang@huawei.com> wrote:
> 
>> In common practice, PMD configure the rx_buf_size according to the data
>> room size of the object in mempool. But in fact the final value is related
>> to the specifications of hw, and its values will affect the number of
>> fragments in recieving pkts.
>>
>> At present, we seem to have no way to espose relevant information to upper
>> layer users.
>>
>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>> size used in recieving pkts for hw.
>>
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>> ---
>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>> index 0f6d053..82b7e98 100644
>> --- a/lib/librte_ethdev/rte_ethdev.h
>> +++ b/lib/librte_ethdev/rte_ethdev.h
>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>  } __rte_cache_min_aligned;
>>
>>  /**
>> --
>> 2.7.4
>>
> 
> Will have to wait until 20.11 as it is an ABI change.
> 

I thought about it.
If I'm not mistaken it does  not change size of the structure.
  
Chengchang Tang June 24, 2020, 3:48 a.m. UTC | #4
On 2020/6/23 17:30, Andrew Rybchenko wrote:
> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>> In common practice, PMD configure the rx_buf_size according to the data
>> room size of the object in mempool. But in fact the final value is related
>> to the specifications of hw, and its values will affect the number of
>> fragments in recieving pkts.
>>
>> At present, we seem to have no way to espose relevant information to upper
>> layer users.
>>
>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>> size used in recieving pkts for hw.
>>
> 
> I'm OK with the change in general.
> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
> ethdev.
> 
> I think it is important to update PMDs which provides the
> information to fill the field in.

My plan is to divide the subsequent series into two patches,
one to modify rte_eth_rxq_info, and one to add our hns3 PMD
implementation of rxq_info_get. Should i update all the PMDs
that provide this information and test programs such as
testpmd at the same time?
> 
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> 
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
>> ---
>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>> index 0f6d053..82b7e98 100644
>> --- a/lib/librte_ethdev/rte_ethdev.h
>> +++ b/lib/librte_ethdev/rte_ethdev.h
>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>  } __rte_cache_min_aligned;
>>
>>  /**
>> --
>> 2.7.4
>>
> 
> 
> .
>
  
Ferruh Yigit June 24, 2020, 8:52 a.m. UTC | #5
On 6/24/2020 4:48 AM, Chengchang Tang wrote:
> 
> On 2020/6/23 17:30, Andrew Rybchenko wrote:
>> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>>> In common practice, PMD configure the rx_buf_size according to the data
>>> room size of the object in mempool. But in fact the final value is related
>>> to the specifications of hw, and its values will affect the number of
>>> fragments in recieving pkts.
>>>
>>> At present, we seem to have no way to espose relevant information to upper
>>> layer users.
>>>
>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>>> size used in recieving pkts for hw.
>>>
>>
>> I'm OK with the change in general.
>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
>> ethdev.
>>
>> I think it is important to update PMDs which provides the
>> information to fill the field in.
> 
> My plan is to divide the subsequent series into two patches,
> one to modify rte_eth_rxq_info, and one to add our hns3 PMD
> implementation of rxq_info_get. Should i update all the PMDs
> that provide this information and test programs such as
> testpmd at the same time?

Hi Chengchang, Andrew,

No objection to the change, but it should be crystal clear what is added. These
are for PMD developers to implement and when it is not clear we end up having
different implementations and inconsistencies.

There is already some confusion for the Rx packet size etc.. my concern is
adding more to it, here all we have is "size of RX buffer." comment, I think we
need more.
Adding a PMD implementation and testpmd updates helps to clarify the
intention/usage, so I suggest sending them as a single patch with this one.

Updating all PMDs is a bigger ask and sometimes too hard because of lack of
knowledge on the internals of other PMDs, although this is causing feature gaps
time to time, we are not mandating this to developers, so please update as many
PMD as you can, that you are confident, rest should be done by their maintainers.

>>
>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>
>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>
>>> ---
>>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>> index 0f6d053..82b7e98 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>>  } __rte_cache_min_aligned;
>>>
>>>  /**
>>> --
>>> 2.7.4
>>>
>>
>>
>> .
>>
>
  
Ferruh Yigit June 24, 2020, 6:32 p.m. UTC | #6
On 6/24/2020 9:52 AM, Ferruh Yigit wrote:
> On 6/24/2020 4:48 AM, Chengchang Tang wrote:
>>
>> On 2020/6/23 17:30, Andrew Rybchenko wrote:
>>> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>>>> In common practice, PMD configure the rx_buf_size according to the data
>>>> room size of the object in mempool. But in fact the final value is related
>>>> to the specifications of hw, and its values will affect the number of
>>>> fragments in recieving pkts.
>>>>
>>>> At present, we seem to have no way to espose relevant information to upper
>>>> layer users.
>>>>
>>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>>>> size used in recieving pkts for hw.
>>>>
>>>
>>> I'm OK with the change in general.
>>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
>>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
>>> ethdev.
>>>
>>> I think it is important to update PMDs which provides the
>>> information to fill the field in.
>>
>> My plan is to divide the subsequent series into two patches,
>> one to modify rte_eth_rxq_info, and one to add our hns3 PMD
>> implementation of rxq_info_get. Should i update all the PMDs
>> that provide this information and test programs such as
>> testpmd at the same time?
> 
> Hi Chengchang, Andrew,
> 
> No objection to the change, but it should be crystal clear what is added. These
> are for PMD developers to implement and when it is not clear we end up having
> different implementations and inconsistencies.
> 
> There is already some confusion for the Rx packet size etc.. my concern is
> adding more to it, here all we have is "size of RX buffer." comment, I think we
> need more.

cc'ed a few more people.

Back to this favorite topic, how to configure/limit the packet size.

Can you please help to have a common/correct understanding? I tried to clarify
as much as I got it, any comment welcome. (I know it is long, please bare with me)


The related config options I can see,
1) rte_eth_conf->rxmode->max_rx_pkt_len
2) rte_eth_dev_info->max_rx_pktlen
3) DEV_RX_OFFLOAD_JUMBO_FRAME
4) rte_eth_dev->data->mtu
5) DEV_RX_OFFLOAD_SCATTER
6) dev->data->scattered_rx
7) rte_eth_dev_info->min_mtu, rte_eth_dev_info->max_mtu

'mtu' (4): Both for Tx and Rx. The network layer payload length. Default value
'RTE_ETHER_MTU'.

'max_rx_pkt_len' (1): Only for Rx, maximum Rx frame length configured by
application.

'max_rx_pktlen' (2): Device reported value on what maximum Rx frame length it
can receive. Application shouldn't set Rx frame length more than this value.

'DEV_RX_OFFLOAD_JUMBO_FRAME' (3): Device Jumbo Frame capability.
	When not enabled the Rx frame length is 'MTU' + overhead
	When enabled Rx frame length is 'max_rx_pkt_len'

'DEV_RX_OFFLOAD_SCATTER' (5): Capability to scatter packet to multiple
descriptor by device and driver converting this to chained mbuf.

'dev->data->scattered_rx' (6): The current status of driver scattered Rx, in
device data mostly for PMD internal usage.

'rte_eth_dev_info->min_mtu' & 'rte_eth_dev_info->max_mtu' (7): minimum and
maximum MTU values device supported.
'max_mtu' == 'max_rx_pkt_len' - L2_OVERHEAD.


I can see two different limits,
a) The Rx frame length limit that device can receive from wire. Any packet
larger than this size will be dropped by device in an early stage.
b) The Rx buffer length limit that received packets are written to. Device
shouldn't DMA larger than reserved buffer size.

If device supports scattered Rx to multiple descriptors, it can be possible to
configure (a) > (b).
Otherwise configuration have to be (b) >= (a).

For example if the mbuf size is 2Kb and the device can receive up to 9000 bytes.
Options are:
- If device supports it, large packet will be scattered on multiple mbufs
- or need to configure device Rx frame length to 2K (mbuf size)
- or need to allocate mbuf big enough to get largest possible packet (9000)



Issues I see:
-------------

i) Although the code clearly says 'max_rx_pkt_len' is only valid when jumbo
frames enabled, some drivers are taking it account always.

ii) Some drivers enable 'scattered_rx' & 'jumbo frame' implicitly, without
having 'DEV_RX_OFFLOAD_JUMBO_FRAME' or 'DEV_RX_OFFLOAD_SCATTER' requested by
application.

iii) Having both 'mtu' & 'max_rx_pkt_len' are confusing, although they are not
exactly same thing they are related. Difference is MTU applies for Tx too, and
L2 network layer overhead is not included.
'MTU' can be more interested by upper layers, 'max_rx_pkt_len' is more driver
level information. And driver should know how to convert one to another.

iv) 'max_rx_pkt_len' provided as part of 'rte_eth_dev_configure()' and there is
no API to update it later.
'mtu' is not part of 'rte_eth_dev_configure()', it can only be updated later
with specific API.
But driver have to keep these two values consistent.

v) 'max_rx_pktlen' & 'max_mtu' reports from driver are redundant information.
Again they are not same thing, but correlated.


Suggested changes:
-----------------

Overall unify 'max_rx_pkt_len' & 'mtu' as much as possible, at first step:

i) Make 'max_rx_pkt_len' independent from 'DEV_RX_OFFLOAD_JUMBO_FRAME', so
'max_rx_pkt_len' value will be always valid, jumbo frame enabled or not.

ii) in '.dev_configure' convert 'max_rx_pkt_len' value to 'mtu' value, this will
be only point 'max_rx_pkt_len' is used, after that point PMD will always use
'mtu' value.
Even don't reflect 'rte_eth_dev_set_mtu()' changes to 'max_rx_pkt_len' anymore.

iii) Don't make 'max_rx_pkt_len' a mandatory config option, let it be '0' by
application, in that case 'rte_eth_dev_configure()' will set
"'max_rx_pkt_len' = RTE_ETHER_MAX_LEN" if 'DEV_RX_OFFLOAD_JUMBO_FRAME' disabled
"'max_rx_pkt_len' = 9000 if 'DEV_RX_OFFLOAD_JUMBO_FRAME' enabled

iv) Allow implicit update of 'DEV_RX_OFFLOAD_JUMBO_FRAME' on MTU set, since
setting a large MTU implies the jumbo frame request. And there is no harm to
application.

v) Do NOT allow implicit update of 'DEV_RX_OFFLOAD_SCATTER' on MTU set (when Rx
frame length > Rx buffer length), since application may not be capable of
parsing chained mbufs. Instead fails the MTU set in that case.
[This can break some applications, relying on this implicit set.]


Any comments?



Additional details:
-------------------

Behavior of some drivers:

What igb & ixgbe does
- Set Rx frame limit (a) using 'max_rx_pkt_len' (1)
- Set Rx buffer limit (b) using mbuf data size
- Enable Scattered Rx (5 & 6) if the Rx frame limit (a) bigger than Rx buffer
limit (b) (even user not requested for it)

What i40e does same as above, only differences
- Return error if jumbo frame enabled and 'max_rx_pkt_len' < RTE_ETHER_MAX_LEN

sfc:
- Set Rx frame limit (a)
  - using 'max_rx_pkt_len' (1) when jumbo frame enabled
  - using 'mtu' when jumbo frame not enabled.
- Set Rx buffer limit (b) using mbuf data size
- If Rx frame limit (a) bigger than Rx buffer limit (b), and user not requested
'DEV_RX_OFFLOAD_SCATTER', return error.

octeontx2:
- Set Rx frame limit (a) using 'max_rx_pkt_len' (1). Implicitly enable jumbo
frame based on 'max_rx_pkt_len'.
- I can't able find how Rx buffer limit (b) set
- Enable Scattered Rx (5) if the Rx frame limit (a) bigger than Rx buffer limit
(b) (even user not requested for it). 'dev->data->scattered_rx' not set at all.


> Adding a PMD implementation and testpmd updates helps to clarify the
> intention/usage, so I suggest sending them as a single patch with this one.
> 
> Updating all PMDs is a bigger ask and sometimes too hard because of lack of
> knowledge on the internals of other PMDs, although this is causing feature gaps
> time to time, we are not mandating this to developers, so please update as many
> PMD as you can, that you are confident, rest should be done by their maintainers.
> 
>>>
>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>
>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>
>>>> ---
>>>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>>> index 0f6d053..82b7e98 100644
>>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>>>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>>>  } __rte_cache_min_aligned;
>>>>
>>>>  /**
>>>> --
>>>> 2.7.4
>>>>
>>>
>>>
>>> .
>>>
>>
>
  
Andrew Rybchenko June 25, 2020, 9:06 a.m. UTC | #7
On 6/24/20 9:32 PM, Ferruh Yigit wrote:
> On 6/24/2020 9:52 AM, Ferruh Yigit wrote:
>> On 6/24/2020 4:48 AM, Chengchang Tang wrote:
>>>
>>> On 2020/6/23 17:30, Andrew Rybchenko wrote:
>>>> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>>>>> In common practice, PMD configure the rx_buf_size according to the data
>>>>> room size of the object in mempool. But in fact the final value is related
>>>>> to the specifications of hw, and its values will affect the number of
>>>>> fragments in recieving pkts.
>>>>>
>>>>> At present, we seem to have no way to espose relevant information to upper
>>>>> layer users.
>>>>>
>>>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>>>>> size used in recieving pkts for hw.
>>>>>
>>>>
>>>> I'm OK with the change in general.
>>>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
>>>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
>>>> ethdev.
>>>>
>>>> I think it is important to update PMDs which provides the
>>>> information to fill the field in.
>>>
>>> My plan is to divide the subsequent series into two patches,
>>> one to modify rte_eth_rxq_info, and one to add our hns3 PMD
>>> implementation of rxq_info_get. Should i update all the PMDs
>>> that provide this information and test programs such as
>>> testpmd at the same time?
>>
>> Hi Chengchang, Andrew,
>>
>> No objection to the change, but it should be crystal clear what is added. These
>> are for PMD developers to implement and when it is not clear we end up having
>> different implementations and inconsistencies.
>>
>> There is already some confusion for the Rx packet size etc.. my concern is
>> adding more to it, here all we have is "size of RX buffer." comment, I think we
>> need more.
> 
> cc'ed a few more people.
> 
> Back to this favorite topic, how to configure/limit the packet size.
> 
> Can you please help to have a common/correct understanding? I tried to clarify
> as much as I got it, any comment welcome. (I know it is long, please bare with me)
> 
> 
> The related config options I can see,
> 1) rte_eth_conf->rxmode->max_rx_pkt_len
> 2) rte_eth_dev_info->max_rx_pktlen
> 3) DEV_RX_OFFLOAD_JUMBO_FRAME
> 4) rte_eth_dev->data->mtu
> 5) DEV_RX_OFFLOAD_SCATTER
> 6) dev->data->scattered_rx
> 7) rte_eth_dev_info->min_mtu, rte_eth_dev_info->max_mtu
> 
> 'mtu' (4): Both for Tx and Rx. The network layer payload length. Default value
> 'RTE_ETHER_MTU'.
> 
> 'max_rx_pkt_len' (1): Only for Rx, maximum Rx frame length configured by
> application.
> 
> 'max_rx_pktlen' (2): Device reported value on what maximum Rx frame length it
> can receive. Application shouldn't set Rx frame length more than this value.
> 
> 'DEV_RX_OFFLOAD_JUMBO_FRAME' (3): Device Jumbo Frame capability.
> 	When not enabled the Rx frame length is 'MTU' + overhead
> 	When enabled Rx frame length is 'max_rx_pkt_len'
> 
> 'DEV_RX_OFFLOAD_SCATTER' (5): Capability to scatter packet to multiple
> descriptor by device and driver converting this to chained mbuf.
> 
> 'dev->data->scattered_rx' (6): The current status of driver scattered Rx, in
> device data mostly for PMD internal usage.
> 
> 'rte_eth_dev_info->min_mtu' & 'rte_eth_dev_info->max_mtu' (7): minimum and
> maximum MTU values device supported.
> 'max_mtu' == 'max_rx_pkt_len' - L2_OVERHEAD.
> 
> 
> I can see two different limits,
> a) The Rx frame length limit that device can receive from wire. Any packet
> larger than this size will be dropped by device in an early stage.
> b) The Rx buffer length limit that received packets are written to. Device
> shouldn't DMA larger than reserved buffer size.
> 
> If device supports scattered Rx to multiple descriptors, it can be possible to
> configure (a) > (b).
> Otherwise configuration have to be (b) >= (a).
> 
> For example if the mbuf size is 2Kb and the device can receive up to 9000 bytes.
> Options are:
> - If device supports it, large packet will be scattered on multiple mbufs
> - or need to configure device Rx frame length to 2K (mbuf size)
> - or need to allocate mbuf big enough to get largest possible packet (9000)
> 
> 
> 
> Issues I see:
> -------------
> 
> i) Although the code clearly says 'max_rx_pkt_len' is only valid when jumbo
> frames enabled, some drivers are taking it account always.

Ack, that's not good.

> ii) Some drivers enable 'scattered_rx' & 'jumbo frame' implicitly, without
> having 'DEV_RX_OFFLOAD_JUMBO_FRAME' or 'DEV_RX_OFFLOAD_SCATTER' requested by
> application.

Ack that it is a problem especially for scatter.

> iii) Having both 'mtu' & 'max_rx_pkt_len' are confusing, although they are not
> exactly same thing they are related. Difference is MTU applies for Tx too, and
> L2 network layer overhead is not included.
> 'MTU' can be more interested by upper layers, 'max_rx_pkt_len' is more driver
> level information. And driver should know how to convert one to another.

Agree

> iv) 'max_rx_pkt_len' provided as part of 'rte_eth_dev_configure()' and there is
> no API to update it later.
> 'mtu' is not part of 'rte_eth_dev_configure()', it can only be updated later
> with specific API.
> But driver have to keep these two values consistent.

Agree

> v) 'max_rx_pktlen' & 'max_mtu' reports from driver are redundant information.
> Again they are not same thing, but correlated.

Agree

> 
> Suggested changes:
> -----------------
> 
> Overall unify 'max_rx_pkt_len' & 'mtu' as much as possible, at first step:
> 
> i) Make 'max_rx_pkt_len' independent from 'DEV_RX_OFFLOAD_JUMBO_FRAME', so
> 'max_rx_pkt_len' value will be always valid, jumbo frame enabled or not.

It will make handling of the max_rx_pkt_len mandatory in
all network PMDs.

> 
> ii) in '.dev_configure' convert 'max_rx_pkt_len' value to 'mtu' value, this will
> be only point 'max_rx_pkt_len' is used, after that point PMD will always use
> 'mtu' value.

I'm not sure that it is a right direction.
Above you say that 'max_rx_pkt_len' is more driver level and
I agree with it. I guess most drivers operate it finally
(i.e. configure underlying HW in terms of max_rx_pkt_len,
not MTU). So, converted from max_rx_pkt_len to MTU on ethdev
level and covered back from MTU to max_rx_pkt_len in drivers.

> Even don't reflect 'rte_eth_dev_set_mtu()' changes to 'max_rx_pkt_len' anymore.

Not sure that I get it. max_rx_pkt_len is used on dev_configure
only. Is it reported on get somewhere?

> 
> iii) Don't make 'max_rx_pkt_len' a mandatory config option, let it be '0' by
> application, in that case 'rte_eth_dev_configure()' will set
> "'max_rx_pkt_len' = RTE_ETHER_MAX_LEN" if 'DEV_RX_OFFLOAD_JUMBO_FRAME' disabled
> "'max_rx_pkt_len' = 9000 if 'DEV_RX_OFFLOAD_JUMBO_FRAME' enabled

Why 9000? IMHO, if max_rx_pkt_len is 0, just use value derived
from MTU.

> 
> iv) Allow implicit update of 'DEV_RX_OFFLOAD_JUMBO_FRAME' on MTU set, since
> setting a large MTU implies the jumbo frame request. And there is no harm to
> application.

Yes and I'd deprecate DEV_RX_OFFLOAD_JUMBO_FRAME.

> 
> v) Do NOT allow implicit update of 'DEV_RX_OFFLOAD_SCATTER' on MTU set (when Rx
> frame length > Rx buffer length), since application may not be capable of
> parsing chained mbufs. Instead fails the MTU set in that case.
> [This can break some applications, relying on this implicit set.]

Yes, definitely.

> 
> 
> Any comments?
> 
> 
> 
> Additional details:
> -------------------
> 
> Behavior of some drivers:
> 
> What igb & ixgbe does
> - Set Rx frame limit (a) using 'max_rx_pkt_len' (1)
> - Set Rx buffer limit (b) using mbuf data size
> - Enable Scattered Rx (5 & 6) if the Rx frame limit (a) bigger than Rx buffer
> limit (b) (even user not requested for it)
> 
> What i40e does same as above, only differences
> - Return error if jumbo frame enabled and 'max_rx_pkt_len' < RTE_ETHER_MAX_LEN
> 
> sfc:
> - Set Rx frame limit (a)
>   - using 'max_rx_pkt_len' (1) when jumbo frame enabled
>   - using 'mtu' when jumbo frame not enabled.
> - Set Rx buffer limit (b) using mbuf data size
> - If Rx frame limit (a) bigger than Rx buffer limit (b), and user not requested
> 'DEV_RX_OFFLOAD_SCATTER', return error.

Ack

> 
> octeontx2:
> - Set Rx frame limit (a) using 'max_rx_pkt_len' (1). Implicitly enable jumbo
> frame based on 'max_rx_pkt_len'.
> - I can't able find how Rx buffer limit (b) set
> - Enable Scattered Rx (5) if the Rx frame limit (a) bigger than Rx buffer limit
> (b) (even user not requested for it). 'dev->data->scattered_rx' not set at all.
> 
> 
>> Adding a PMD implementation and testpmd updates helps to clarify the
>> intention/usage, so I suggest sending them as a single patch with this one.
>>
>> Updating all PMDs is a bigger ask and sometimes too hard because of lack of
>> knowledge on the internals of other PMDs, although this is causing feature gaps
>> time to time, we are not mandating this to developers, so please update as many
>> PMD as you can, that you are confident, rest should be done by their maintainers.
>>
>>>>
>>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>>
>>>>> ---
>>>>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>>>> index 0f6d053..82b7e98 100644
>>>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>>>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>>>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>>>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>>>>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>>>>  } __rte_cache_min_aligned;
>>>>>
>>>>>  /**
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 0f6d053..82b7e98 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1306,6 +1306,7 @@  struct rte_eth_rxq_info {
 	struct rte_eth_rxconf conf; /**< queue config parameters. */
 	uint8_t scattered_rx;       /**< scattered packets RX supported. */
 	uint16_t nb_desc;           /**< configured number of RXDs. */
+	uint16_t rx_bufsize;        /**< size of RX buffer. */
 } __rte_cache_min_aligned;

 /**