[v3,01/22] ethdev: fix MTU size exceeds max rx packet length

Message ID 20210114094537.13661-1-stevex.yang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series [v3,01/22] ethdev: fix MTU size exceeds max rx packet length |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-testing success Testing PASS

Commit Message

Steve Yang Jan. 14, 2021, 9:45 a.m. UTC
  Ethdev is using default Ethernet overhead to decide if provided
'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
and limits it to MAX if it is.

Since the application/driver used Ethernet overhead is different than
the ethdev one, check result is wrong.

If the driver is using Ethernet overhead bigger than the default one,
the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
correct Ethernet overhead is used to convert back, the resulting MTU is
less than the intended one, causing some packets to be dropped.

Like,
app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
driver  -> MTU = 1518 - 22 = 1496
Packets with size 1497-1500 are dropped although intention is to be able
to send/receive them.

The fix is to make ethdev use the correct Ethernet overhead for port,
instead of default one.

Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")

Signed-off-by: Steve Yang <stevex.yang@intel.com>
---
 lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
  

Comments

Ferruh Yigit Jan. 14, 2021, 4:36 p.m. UTC | #1
On 1/14/2021 9:45 AM, Steve Yang wrote:
> Ethdev is using default Ethernet overhead to decide if provided
> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
> and limits it to MAX if it is.
> 
> Since the application/driver used Ethernet overhead is different than
> the ethdev one, check result is wrong.
> 
> If the driver is using Ethernet overhead bigger than the default one,
> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
> correct Ethernet overhead is used to convert back, the resulting MTU is
> less than the intended one, causing some packets to be dropped.
> 
> Like,
> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
> driver  -> MTU = 1518 - 22 = 1496
> Packets with size 1497-1500 are dropped although intention is to be able
> to send/receive them.
> 
> The fix is to make ethdev use the correct Ethernet overhead for port,
> instead of default one.
> 
> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
> 
> Signed-off-by: Steve Yang <stevex.yang@intel.com>

<...>

> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>   			goto rollback;
>   		}
>   	} else {
> -		if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
> -			dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
> +		uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
> +		if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
> +		    pktlen > RTE_ETHER_MTU + overhead_len)
>   			/* Use default value */
>   			dev->data->dev_conf.rxmode.max_rx_pkt_len =
> -							RTE_ETHER_MAX_LEN;
> +						RTE_ETHER_MTU + overhead_len;

What do you think removing the above check, the else block, completely?
Since the 'max_rx_pkt_len' should not be used when jumbo frame is not set.

> +	}
> +
> +	/* Scale the MTU size to adapt max_rx_pkt_len */
> +	if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
> +		dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
> +				overhead_len;
>   	}

Above if block has exact same check, why not move it above block?
  
Ferruh Yigit Jan. 14, 2021, 5:13 p.m. UTC | #2
On 1/14/2021 4:36 PM, Ferruh Yigit wrote:
> On 1/14/2021 9:45 AM, Steve Yang wrote:
>> Ethdev is using default Ethernet overhead to decide if provided
>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>> and limits it to MAX if it is.
>>
>> Since the application/driver used Ethernet overhead is different than
>> the ethdev one, check result is wrong.
>>
>> If the driver is using Ethernet overhead bigger than the default one,
>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>> correct Ethernet overhead is used to convert back, the resulting MTU is
>> less than the intended one, causing some packets to be dropped.
>>
>> Like,
>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>> driver  -> MTU = 1518 - 22 = 1496
>> Packets with size 1497-1500 are dropped although intention is to be able
>> to send/receive them.
>>
>> The fix is to make ethdev use the correct Ethernet overhead for port,
>> instead of default one.
>>
>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>
>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
> 
> <...>
> 
>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>               goto rollback;
>>           }
>>       } else {
>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>               /* Use default value */
>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>> -                            RTE_ETHER_MAX_LEN;
>> +                        RTE_ETHER_MTU + overhead_len;
> 
> What do you think removing the above check, the else block, completely?
> Since the 'max_rx_pkt_len' should not be used when jumbo frame is not set.
> 

As I tested removing this check is causing problem because some PMDs are using 
the 'max_rx_pkt_len' even jumbo frame is not set.

Perhaps better to keep it, and make a separate patch later to remove this check, 
after PMDs fixed.

>> +    }
>> +
>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>> +                overhead_len;
>>       }
> 
> Above if block has exact same check, why not move it above block?

Can you still send a new version for above change please?
  
Andrew Boyer Jan. 14, 2021, 5:29 p.m. UTC | #3
> On Jan 14, 2021, at 12:13 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
> On 1/14/2021 4:36 PM, Ferruh Yigit wrote:
>> On 1/14/2021 9:45 AM, Steve Yang wrote:
>>> Ethdev is using default Ethernet overhead to decide if provided
>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>> and limits it to MAX if it is.
>>> 
>>> Since the application/driver used Ethernet overhead is different than
>>> the ethdev one, check result is wrong.
>>> 
>>> If the driver is using Ethernet overhead bigger than the default one,
>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>> less than the intended one, causing some packets to be dropped.
>>> 
>>> Like,
>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>> driver  -> MTU = 1518 - 22 = 1496
>>> Packets with size 1497-1500 are dropped although intention is to be able
>>> to send/receive them.
>>> 
>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>> instead of default one.
>>> 
>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>> 
>>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
>> <...>
>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>               goto rollback;
>>>           }
>>>       } else {
>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>               /* Use default value */
>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>> -                            RTE_ETHER_MAX_LEN;
>>> +                        RTE_ETHER_MTU + overhead_len;
>> What do you think removing the above check, the else block, completely?
>> Since the 'max_rx_pkt_len' should not be used when jumbo frame is not set.
> 
> As I tested removing this check is causing problem because some PMDs are using the 'max_rx_pkt_len' even jumbo frame is not set.
> 
> Perhaps better to keep it, and make a separate patch later to remove this check, after PMDs fixed.

Hello Ferruh -
Working on fixing our PMD here. Do you want PMDs to update the JUMBO_FRAME flag based on the mtu value in dev_set_mtu(), or do you want the application to be solely responsible for it?

Thanks,
Andrew

>>> +    }
>>> +
>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>> +                overhead_len;
>>>       }
>> Above if block has exact same check, why not move it above block?
> 
> Can you still send a new version for above change please?
  
Ferruh Yigit Jan. 14, 2021, 8:44 p.m. UTC | #4
On 1/14/2021 5:29 PM, Andrew Boyer wrote:
> 
> 
>> On Jan 14, 2021, at 12:13 PM, Ferruh Yigit <ferruh.yigit@intel.com 
>> <mailto:ferruh.yigit@intel.com>> wrote:
>>
>> On 1/14/2021 4:36 PM, Ferruh Yigit wrote:
>>> On 1/14/2021 9:45 AM, Steve Yang wrote:
>>>> Ethdev is using default Ethernet overhead to decide if provided
>>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>>> and limits it to MAX if it is.
>>>>
>>>> Since the application/driver used Ethernet overhead is different than
>>>> the ethdev one, check result is wrong.
>>>>
>>>> If the driver is using Ethernet overhead bigger than the default one,
>>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>>> less than the intended one, causing some packets to be dropped.
>>>>
>>>> Like,
>>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>>> driver  -> MTU = 1518 - 22 = 1496
>>>> Packets with size 1497-1500 are dropped although intention is to be able
>>>> to send/receive them.
>>>>
>>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>>> instead of default one.
>>>>
>>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>>
>>>> Signed-off-by: Steve Yang <stevex.yang@intel.com <mailto:stevex.yang@intel.com>>
>>> <...>
>>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>> goto rollback;
>>>> }
>>>> } else {
>>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>> /* Use default value */
>>>> dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>>> -                            RTE_ETHER_MAX_LEN;
>>>> +                        RTE_ETHER_MTU + overhead_len;
>>> What do you think removing the above check, the else block, completely?
>>> Since the 'max_rx_pkt_len' should not be used when jumbo frame is not set.
>>
>> As I tested removing this check is causing problem because some PMDs are using 
>> the 'max_rx_pkt_len' even jumbo frame is not set.
>>
>> Perhaps better to keep it, and make a separate patch later to remove this 
>> check, after PMDs fixed.
> 
> Hello Ferruh -
> Working on fixing our PMD here. Do you want PMDs to update the JUMBO_FRAME flag 
> based on the mtu value in dev_set_mtu(), or do you want the application to be 
> solely responsible for it?
> 

Hi Andrew,

Technically JUMBO_FRAME flag is an user config and application should set it. It 
is application's responsibility to check the capability and set the flag when 
necessary.

But, after above said, many PMDs set it based on provided MTU value, if the 
explicitly requested MTU is bigger than the RTE_ETHER_MTU, this means user 
implied the JUMBO_FRAME support, for this case PMDs set the flag implicitly 
instead of failing.

In another thread Andrew R. & Konstantin suggested to remove the JUMBO_FRAME 
flag, since it is redundant and causing this kind of confusion, instead driver 
can decide based on requested MTU value, and driver reported 'max_mtu' value can 
be used by application to detect the capability. We will probably do this 
change, but it can be done only in the ABI break release, v21.11.

For now, PMD can set the flag itself if requested MTU > RTE_ETHER_MTU and driver 
supports jumbo frames.

> Thanks,
> Andrew
> 
>>>> +    }
>>>> +
>>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>>> +                overhead_len;
>>>> }
>>> Above if block has exact same check, why not move it above block?
>>
>> Can you still send a new version for above change please?
>
  
Lijun Ou Jan. 15, 2021, 10:44 a.m. UTC | #5
Hi Steve
This is a very good job! But I have some question and suggestions.
Please check it.

在 2021/1/14 17:45, Steve Yang 写道:
> Ethdev is using default Ethernet overhead to decide if provided
> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
> and limits it to MAX if it is.
> 
> Since the application/driver used Ethernet overhead is different than
> the ethdev one, check result is wrong.
> 
> If the driver is using Ethernet overhead bigger than the default one,
> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
> correct Ethernet overhead is used to convert back, the resulting MTU is
> less than the intended one, causing some packets to be dropped.
> 
> Like,
> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
> driver  -> MTU = 1518 - 22 = 1496
> Packets with size 1497-1500 are dropped although intention is to be able
> to send/receive them.
> 
> The fix is to make ethdev use the correct Ethernet overhead for port,
> instead of default one.
> 
> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
> 
> Signed-off-by: Steve Yang <stevex.yang@intel.com>
> ---
>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>   1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 17ddacc78d..19ca4c4512 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>   	struct rte_eth_dev *dev;
>   	struct rte_eth_dev_info dev_info;
>   	struct rte_eth_conf orig_conf;
> +	uint16_t overhead_len;
>   	int diag;
>   	int ret;
> +	uint16_t old_mtu;
>   
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>   
> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>   		memcpy(&dev->data->dev_conf, dev_conf,
>   		       sizeof(dev->data->dev_conf));
>   
> +	/* Backup mtu for rollback */
> +	old_mtu = dev->data->mtu;
> +
>   	ret = rte_eth_dev_info_get(port_id, &dev_info);
>   	if (ret != 0)
>   		goto rollback;
>   
> +	/* Get the real Ethernet overhead length */
> +	if (dev_info.max_mtu != UINT16_MAX &&
> +	    dev_info.max_rx_pktlen > dev_info.max_mtu)
> +		overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
> +	else
> +		overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
> +
The Ethernet frame header length supported by each NIC may be different, 
which cause the difference of allowed packets size and drop you say.
The diference of Ethernet frame header length have an impact for the 
maximum Ethernet frame length, which is a boundary value
of enabling jumbo frame through the ' max_rx_pkt_len '.
However, we need to do the same thing you do above to get the 
overhead_len every time, which will
cause a lot of duplicate code in the framework and app. For examples, 
parsing and processing for '--max-pkt-len=xxx' parameter,
  and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying 
here dev_configure API.
It's a little redundant and troublesome.

Maybe, it is necessary for driver to directly report the supported 
maximum Ethernet frame length by rte_dev_info_get API.
As following:
struct rte_eth_dev_info {
	xxxx
	/**
	 * The maximum Ethernet frame length supported by each
	 * driver varies with the Ethernet header length.
	 */
	uint16_t eth_max_len;
	xxxx
}

int
rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
{
     xxx
	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
	dev_info->max_mtu = UINT16_MAX;
	dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
     xxx
}

And then:
xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info 
*info)
{
	xxx
	info->eth_max_len = xxx_ETHER_MAX _LEN;
	xxx
}
What do you think?
>   	/* If number of queues specified by application for both Rx and Tx is
>   	 * zero, use driver preferred values. This cannot be done individually
>   	 * as it is valid for either Tx or Rx (but not both) to be zero.
> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>   			goto rollback;
>   		}
>   	} else {
> -		if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
> -			dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
> +		uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
> +		if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
> +		    pktlen > RTE_ETHER_MTU + overhead_len)
>   			/* Use default value */
>   			dev->data->dev_conf.rxmode.max_rx_pkt_len =
> -							RTE_ETHER_MAX_LEN;
> +						RTE_ETHER_MTU + overhead_len;
> +	}
> +
> +	/* Scale the MTU size to adapt max_rx_pkt_len */
> +	if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
> +		dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
> +				overhead_len;
>   	}
Now that we update mtu here when jumbo frame enabled. It is necessary to 
check validity of max_rx_pkt_len.
As following:
if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
	if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
		dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + overhead_len) {
			xxxx
		}
} else {
		xxx
}

If it does not thing above, which will cause an unreasonable case.
Like,
dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
dev_conf->rxmode.max_rx_pkt_len = 1500
overhead_len = 26
dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 
26 = 1474

In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu > 
1500.
>   
>   	/*
> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>   	eth_dev_tx_queue_config(dev, 0);
>   rollback:
>   	memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
> +	if (old_mtu != dev->data->mtu)
> +		dev->data->mtu = old_mtu;
>   
>   	rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>   	return ret;
>
  
Ferruh Yigit Jan. 18, 2021, 10:42 a.m. UTC | #6
On 1/15/2021 10:44 AM, oulijun wrote:
> Hi Steve
> This is a very good job! But I have some question and suggestions.
> Please check it.
> 
> 在 2021/1/14 17:45, Steve Yang 写道:
>> Ethdev is using default Ethernet overhead to decide if provided
>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>> and limits it to MAX if it is.
>>
>> Since the application/driver used Ethernet overhead is different than
>> the ethdev one, check result is wrong.
>>
>> If the driver is using Ethernet overhead bigger than the default one,
>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>> correct Ethernet overhead is used to convert back, the resulting MTU is
>> less than the intended one, causing some packets to be dropped.
>>
>> Like,
>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>> driver  -> MTU = 1518 - 22 = 1496
>> Packets with size 1497-1500 are dropped although intention is to be able
>> to send/receive them.
>>
>> The fix is to make ethdev use the correct Ethernet overhead for port,
>> instead of default one.
>>
>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>
>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
>> ---
>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>> index 17ddacc78d..19ca4c4512 100644
>> --- a/lib/librte_ethdev/rte_ethdev.c
>> +++ b/lib/librte_ethdev/rte_ethdev.c
>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>       struct rte_eth_dev *dev;
>>       struct rte_eth_dev_info dev_info;
>>       struct rte_eth_conf orig_conf;
>> +    uint16_t overhead_len;
>>       int diag;
>>       int ret;
>> +    uint16_t old_mtu;
>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>           memcpy(&dev->data->dev_conf, dev_conf,
>>                  sizeof(dev->data->dev_conf));
>> +    /* Backup mtu for rollback */
>> +    old_mtu = dev->data->mtu;
>> +
>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>       if (ret != 0)
>>           goto rollback;
>> +    /* Get the real Ethernet overhead length */
>> +    if (dev_info.max_mtu != UINT16_MAX &&
>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>> +    else
>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>> +
> The Ethernet frame header length supported by each NIC may be different, which 
> cause the difference of allowed packets size and drop you say.
> The diference of Ethernet frame header length have an impact for the maximum 
> Ethernet frame length, which is a boundary value
> of enabling jumbo frame through the ' max_rx_pkt_len '.
> However, we need to do the same thing you do above to get the overhead_len every 
> time, which will
> cause a lot of duplicate code in the framework and app. For examples, parsing 
> and processing for '--max-pkt-len=xxx' parameter,
>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying here 
> dev_configure API.
> It's a little redundant and troublesome.
> 
> Maybe, it is necessary for driver to directly report the supported maximum 
> Ethernet frame length by rte_dev_info_get API.
> As following:
> struct rte_eth_dev_info {
>      xxxx
>      /**
>       * The maximum Ethernet frame length supported by each
>       * driver varies with the Ethernet header length.
>       */
>      uint16_t eth_max_len;
>      xxxx
> }
> 
> int
> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
> {
>      xxx
>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>      dev_info->max_mtu = UINT16_MAX;
>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>      xxx
> }
> 
> And then:
> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
> {
>      xxx
>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>      xxx
> }
> What do you think?

Hi Lijun,

The 'max_mtu' has been added in the past to solve exact same problem, perhaps it 
would be better to add something like 'eth_max_len' at that time.

But as Steve mentioned we are planning to switch to more MTU based 
configuration, which should remove the need of the field you mentioned.

>>       /* If number of queues specified by application for both Rx and Tx is
>>        * zero, use driver preferred values. This cannot be done individually
>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>               goto rollback;
>>           }
>>       } else {
>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>               /* Use default value */
>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>> -                            RTE_ETHER_MAX_LEN;
>> +                        RTE_ETHER_MTU + overhead_len;
>> +    }
>> +
>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>> +                overhead_len;
>>       }
> Now that we update mtu here when jumbo frame enabled. It is necessary to check 
> validity of max_rx_pkt_len.
> As following:
> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + overhead_len) {

It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
overhead_len" is a valid value or not, the API says:
"max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"

But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'

>              xxxx
>          }
> } else {
>          xxx
> }
> 
> If it does not thing above, which will cause an unreasonable case.
> Like,
> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
> dev_conf->rxmode.max_rx_pkt_len = 1500
> overhead_len = 26
> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = 1474
> 

This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 1500', the 
requeted MTU is 1474,
if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' as "1500 
+ overhead".

> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu > 1500.

Overall the 'max_rx_pkt_len' setting is a little messy, it has been tried to fix 
in last release but we reverted them because of unexpected side affects.

Current long term plan is,
- Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to work on MTU 
values. (instead one working on frame lenght and other on MTU)
- Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request by 
requested MTU value check
- App decide the jumbo frame capability from the 'max_mtu' check
- App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'

>>       /*
>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>       eth_dev_tx_queue_config(dev, 0);
>>   rollback:
>>       memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
>> +    if (old_mtu != dev->data->mtu)
>> +        dev->data->mtu = old_mtu;
>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>>       return ret;
>>
  
Lijun Ou Jan. 19, 2021, 8:46 a.m. UTC | #7
在 2021/1/18 18:42, Ferruh Yigit 写道:
> On 1/15/2021 10:44 AM, oulijun wrote:
>> Hi Steve
>> This is a very good job! But I have some question and suggestions.
>> Please check it.
>>
>> 在 2021/1/14 17:45, Steve Yang 写道:
>>> Ethdev is using default Ethernet overhead to decide if provided
>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>> and limits it to MAX if it is.
>>>
>>> Since the application/driver used Ethernet overhead is different than
>>> the ethdev one, check result is wrong.
>>>
>>> If the driver is using Ethernet overhead bigger than the default one,
>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>> less than the intended one, causing some packets to be dropped.
>>>
>>> Like,
>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>> driver  -> MTU = 1518 - 22 = 1496
>>> Packets with size 1497-1500 are dropped although intention is to be able
>>> to send/receive them.
>>>
>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>> instead of default one.
>>>
>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>
>>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
>>> ---
>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c 
>>> b/lib/librte_ethdev/rte_ethdev.c
>>> index 17ddacc78d..19ca4c4512 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>       struct rte_eth_dev *dev;
>>>       struct rte_eth_dev_info dev_info;
>>>       struct rte_eth_conf orig_conf;
>>> +    uint16_t overhead_len;
>>>       int diag;
>>>       int ret;
>>> +    uint16_t old_mtu;
>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>           memcpy(&dev->data->dev_conf, dev_conf,
>>>                  sizeof(dev->data->dev_conf));
>>> +    /* Backup mtu for rollback */
>>> +    old_mtu = dev->data->mtu;
>>> +
>>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>       if (ret != 0)
>>>           goto rollback;
>>> +    /* Get the real Ethernet overhead length */
>>> +    if (dev_info.max_mtu != UINT16_MAX &&
>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>> +    else
>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>> +
>> The Ethernet frame header length supported by each NIC may be 
>> different, which cause the difference of allowed packets size and drop 
>> you say.
>> The diference of Ethernet frame header length have an impact for the 
>> maximum Ethernet frame length, which is a boundary value
>> of enabling jumbo frame through the ' max_rx_pkt_len '.
>> However, we need to do the same thing you do above to get the 
>> overhead_len every time, which will
>> cause a lot of duplicate code in the framework and app. For examples, 
>> parsing and processing for '--max-pkt-len=xxx' parameter,
>>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying 
>> here dev_configure API.
>> It's a little redundant and troublesome.
>>
>> Maybe, it is necessary for driver to directly report the supported 
>> maximum Ethernet frame length by rte_dev_info_get API.
>> As following:
>> struct rte_eth_dev_info {
>>      xxxx
>>      /**
>>       * The maximum Ethernet frame length supported by each
>>       * driver varies with the Ethernet header length.
>>       */
>>      uint16_t eth_max_len;
>>      xxxx
>> }
>>
>> int
>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>> {
>>      xxx
>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>      dev_info->max_mtu = UINT16_MAX;
>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>>      xxx
>> }
>>
>> And then:
>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct 
>> rte_eth_dev_info *info)
>> {
>>      xxx
>>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>>      xxx
>> }
>> What do you think?
> 
> Hi Lijun,
> 
> The 'max_mtu' has been added in the past to solve exact same problem, 
> perhaps it would be better to add something like 'eth_max_len' at that 
> time.
> 
> But as Steve mentioned we are planning to switch to more MTU based 
> configuration, which should remove the need of the field you mentioned.
> 
>>>       /* If number of queues specified by application for both Rx and 
>>> Tx is
>>>        * zero, use driver preferred values. This cannot be done 
>>> individually
>>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>               goto rollback;
>>>           }
>>>       } else {
>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>               /* Use default value */
>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>> -                            RTE_ETHER_MAX_LEN;
>>> +                        RTE_ETHER_MTU + overhead_len;
>>> +    }
>>> +
>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>> +                overhead_len;
>>>       }
>> Now that we update mtu here when jumbo frame enabled. It is necessary 
>> to check validity of max_rx_pkt_len.
>> As following:
>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + 
>> overhead_len) {
> 
> It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
> overhead_len" is a valid value or not, the API says:
> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"
> 
> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'
> 
The MTU setting entry does need to be adjusted. I agree! But I think 
this place should be modified, and it's probably more reasonable.

Generally speaking, max_rx_pkt_len will be also updated with 'mtu + 
overhead_len' also when application sets mtu > 1500, which means 
JUMBO_FRAME is enabled.
Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 
1500 + overhead_len.
Besides, having following check for "rxmode.offloads & 
DEV_RX_OFFLOAD_JUMBO_FRAME == 0":
@@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
nb_rx_q, uint16_t nb_tx_q,
               goto rollback;
           }
       } else {
         uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
         if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
             pktlen > RTE_ETHER_MTU + overhead_len)
               /* Use default value */
               dev->data->dev_conf.rxmode.max_rx_pkt_len =
                         RTE_ETHER_MTU + overhead_len;
}
One of check condition for the branch is "pktlen > RTE_ETHER_MTU + 
overhead_len ".

In view of the above two aspects, when application set 
DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be 
greater than " RTE_ETHER_MTU + overhead_len "

Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary.

>>              xxxx
>>          }
>> } else {
>>          xxx
>> }
>>
>> If it does not thing above, which will cause an unreasonable case.
>> Like,
>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>> dev_conf->rxmode.max_rx_pkt_len = 1500
>> overhead_len = 26
>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 
>> - 26 = 1474
>>
> 
> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 
> 1500', the requeted MTU is 1474,
> if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' 
> as "1500 + overhead".
> 
>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu 
>> > 1500.
> 
> Overall the 'max_rx_pkt_len' setting is a little messy, it has been 
> tried to fix in last release but we reverted them because of unexpected 
> side affects.
> 
> Current long term plan is,
> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to 
> work on MTU values. (instead one working on frame lenght and other on MTU)
> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request 
> by requested MTU value check
> - App decide the jumbo frame capability from the 'max_mtu' check
> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
> 
>>>       /*
>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>       eth_dev_tx_queue_config(dev, 0);
>>>   rollback:
>>>       memcpy(&dev->data->dev_conf, &orig_conf, 
>>> sizeof(dev->data->dev_conf));
>>> +    if (old_mtu != dev->data->mtu)
>>> +        dev->data->mtu = old_mtu;
>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, 
>>> ret);
>>>       return ret;
>>>
> 
> .
>
  
Lijun Ou Jan. 23, 2021, 9:05 a.m. UTC | #8
在 2021/1/19 16:46, oulijun 写道:
> 
> 
> 在 2021/1/18 18:42, Ferruh Yigit 写道:
>> On 1/15/2021 10:44 AM, oulijun wrote:
>>> Hi Steve
>>> This is a very good job! But I have some question and suggestions.
>>> Please check it.
>>>
>>> 在 2021/1/14 17:45, Steve Yang 写道:
>>>> Ethdev is using default Ethernet overhead to decide if provided
>>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>>> and limits it to MAX if it is.
>>>>
>>>> Since the application/driver used Ethernet overhead is different than
>>>> the ethdev one, check result is wrong.
>>>>
>>>> If the driver is using Ethernet overhead bigger than the default one,
>>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>>> less than the intended one, causing some packets to be dropped.
>>>>
>>>> Like,
>>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>>> driver  -> MTU = 1518 - 22 = 1496
>>>> Packets with size 1497-1500 are dropped although intention is to be 
>>>> able
>>>> to send/receive them.
>>>>
>>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>>> instead of default one.
>>>>
>>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>>
>>>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
>>>> ---
>>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c 
>>>> b/lib/librte_ethdev/rte_ethdev.c
>>>> index 17ddacc78d..19ca4c4512 100644
>>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>       struct rte_eth_dev *dev;
>>>>       struct rte_eth_dev_info dev_info;
>>>>       struct rte_eth_conf orig_conf;
>>>> +    uint16_t overhead_len;
>>>>       int diag;
>>>>       int ret;
>>>> +    uint16_t old_mtu;
>>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>           memcpy(&dev->data->dev_conf, dev_conf,
>>>>                  sizeof(dev->data->dev_conf));
>>>> +    /* Backup mtu for rollback */
>>>> +    old_mtu = dev->data->mtu;
>>>> +
>>>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>       if (ret != 0)
>>>>           goto rollback;
>>>> +    /* Get the real Ethernet overhead length */
>>>> +    if (dev_info.max_mtu != UINT16_MAX &&
>>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>>> +    else
>>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>>> +
>>> The Ethernet frame header length supported by each NIC may be 
>>> different, which cause the difference of allowed packets size and 
>>> drop you say.
>>> The diference of Ethernet frame header length have an impact for the 
>>> maximum Ethernet frame length, which is a boundary value
>>> of enabling jumbo frame through the ' max_rx_pkt_len '.
>>> However, we need to do the same thing you do above to get the 
>>> overhead_len every time, which will
>>> cause a lot of duplicate code in the framework and app. For examples, 
>>> parsing and processing for '--max-pkt-len=xxx' parameter,
>>>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here 
>>> modifying here dev_configure API.
>>> It's a little redundant and troublesome.
>>>
>>> Maybe, it is necessary for driver to directly report the supported 
>>> maximum Ethernet frame length by rte_dev_info_get API.
>>> As following:
>>> struct rte_eth_dev_info {
>>>      xxxx
>>>      /**
>>>       * The maximum Ethernet frame length supported by each
>>>       * driver varies with the Ethernet header length.
>>>       */
>>>      uint16_t eth_max_len;
>>>      xxxx
>>> }
>>>
>>> int
>>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info 
>>> *dev_info)
>>> {
>>>      xxx
>>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>>      dev_info->max_mtu = UINT16_MAX;
>>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>>>      xxx
>>> }
>>>
>>> And then:
>>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct 
>>> rte_eth_dev_info *info)
>>> {
>>>      xxx
>>>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>>>      xxx
>>> }
>>> What do you think?
>>
>> Hi Lijun,
>>
>> The 'max_mtu' has been added in the past to solve exact same problem, 
>> perhaps it would be better to add something like 'eth_max_len' at that 
>> time.
>>
>> But as Steve mentioned we are planning to switch to more MTU based 
>> configuration, which should remove the need of the field you mentioned.
>>
>>>>       /* If number of queues specified by application for both Rx 
>>>> and Tx is
>>>>        * zero, use driver preferred values. This cannot be done 
>>>> individually
>>>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>               goto rollback;
>>>>           }
>>>>       } else {
>>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>>               /* Use default value */
>>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>>> -                            RTE_ETHER_MAX_LEN;
>>>> +                        RTE_ETHER_MTU + overhead_len;
>>>> +    }
>>>> +
>>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>>> +                overhead_len;
>>>>       }
>>> Now that we update mtu here when jumbo frame enabled. It is necessary 
>>> to check validity of max_rx_pkt_len.
>>> As following:
>>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + 
>>> overhead_len) {
>>
>> It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
>> overhead_len" is a valid value or not, the API says:
>> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"
>>
>> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
>> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'
>>
> The MTU setting entry does need to be adjusted. I agree! But I think 
> this place should be modified, and it's probably more reasonable.
> 
> Generally speaking, max_rx_pkt_len will be also updated with 'mtu + 
> overhead_len' also when application sets mtu > 1500, which means 
> JUMBO_FRAME is enabled.
> Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 
> 1500 + overhead_len.
> Besides, having following check for "rxmode.offloads & 
> DEV_RX_OFFLOAD_JUMBO_FRAME == 0":
> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
> nb_rx_q, uint16_t nb_tx_q,
>                goto rollback;
>            }
>        } else {
>          uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>          if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>              pktlen > RTE_ETHER_MTU + overhead_len)
>                /* Use default value */
>                dev->data->dev_conf.rxmode.max_rx_pkt_len =
>                          RTE_ETHER_MTU + overhead_len;
> }
> One of check condition for the branch is "pktlen > RTE_ETHER_MTU + 
> overhead_len ".
> 
> In view of the above two aspects, when application set 
> DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be 
> greater than " RTE_ETHER_MTU + overhead_len "
> 
> Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary.
> 
>>>              xxxx
>>>          }
>>> } else {
>>>          xxx
>>> }
>>>
>>> If it does not thing above, which will cause an unreasonable case.
>>> Like,
>>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>>> dev_conf->rxmode.max_rx_pkt_len = 1500
>>> overhead_len = 26
>>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 
>>> 1500 - 26 = 1474
>>>
>>
>> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 
>> 1500', the requeted MTU is 1474,
>> if application wants an MTU as 1500, it should set the 
>> 'max_rx_pkt_len' as "1500 + overhead".
>>
>>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when 
>>> mtu > 1500.
Now that there are two interfaces for modifying MTU, namely, 
'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()'.
They modify MTU based on 'max_rx_pkt_len' and 'mtu' value, respectively. 
The 'max_rx_pkt_len' field has been used
  in xxx_dev_configure ops in many PMD drivers. In addition, if 
'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, the value of
'max_rx_pkt_len' is also verified by using max_rx_pktlen and 
RTE_ETHER_MTU + overhead_len in i40e and ice driver.

Currently, DEV_RX_OFFLOAD_JUMBO_FRAME has not been removed, so it is 
necessary and reasonable
  to check 'max_rx_pkt_len' in 'rte_eth_dev_configure()'.

Please check it again. Looking forward to your reply.
>>
>> Overall the 'max_rx_pkt_len' setting is a little messy, it has been 
>> tried to fix in last release but we reverted them because of 
>> unexpected side affects.
>>
>> Current long term plan is,
>> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to 
>> work on MTU values. (instead one working on frame lenght and other on 
>> MTU)
>> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame 
>> request by requested MTU value check
>> - App decide the jumbo frame capability from the 'max_mtu' check
>> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
>>
>>>>       /*
>>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, 
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>       eth_dev_tx_queue_config(dev, 0);
>>>>   rollback:
>>>>       memcpy(&dev->data->dev_conf, &orig_conf, 
>>>> sizeof(dev->data->dev_conf));
>>>> +    if (old_mtu != dev->data->mtu)
>>>> +        dev->data->mtu = old_mtu;
>>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, 
>>>> dev_conf, ret);
>>>>       return ret;
>>>>
>>
>> .
>>
> .
>
  
Ferruh Yigit Jan. 25, 2021, 12:22 p.m. UTC | #9
On 1/19/2021 8:46 AM, oulijun wrote:
> 
> 
> 在 2021/1/18 18:42, Ferruh Yigit 写道:
>> On 1/15/2021 10:44 AM, oulijun wrote:
>>> Hi Steve
>>> This is a very good job! But I have some question and suggestions.
>>> Please check it.
>>>
>>> 在 2021/1/14 17:45, Steve Yang 写道:
>>>> Ethdev is using default Ethernet overhead to decide if provided
>>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>>> and limits it to MAX if it is.
>>>>
>>>> Since the application/driver used Ethernet overhead is different than
>>>> the ethdev one, check result is wrong.
>>>>
>>>> If the driver is using Ethernet overhead bigger than the default one,
>>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>>> less than the intended one, causing some packets to be dropped.
>>>>
>>>> Like,
>>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>>> driver  -> MTU = 1518 - 22 = 1496
>>>> Packets with size 1497-1500 are dropped although intention is to be able
>>>> to send/receive them.
>>>>
>>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>>> instead of default one.
>>>>
>>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>>
>>>> Signed-off-by: Steve Yang <stevex.yang@intel.com>
>>>> ---
>>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>>>> index 17ddacc78d..19ca4c4512 100644
>>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>       struct rte_eth_dev *dev;
>>>>       struct rte_eth_dev_info dev_info;
>>>>       struct rte_eth_conf orig_conf;
>>>> +    uint16_t overhead_len;
>>>>       int diag;
>>>>       int ret;
>>>> +    uint16_t old_mtu;
>>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>           memcpy(&dev->data->dev_conf, dev_conf,
>>>>                  sizeof(dev->data->dev_conf));
>>>> +    /* Backup mtu for rollback */
>>>> +    old_mtu = dev->data->mtu;
>>>> +
>>>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>       if (ret != 0)
>>>>           goto rollback;
>>>> +    /* Get the real Ethernet overhead length */
>>>> +    if (dev_info.max_mtu != UINT16_MAX &&
>>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>>> +    else
>>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>>> +
>>> The Ethernet frame header length supported by each NIC may be different, 
>>> which cause the difference of allowed packets size and drop you say.
>>> The diference of Ethernet frame header length have an impact for the maximum 
>>> Ethernet frame length, which is a boundary value
>>> of enabling jumbo frame through the ' max_rx_pkt_len '.
>>> However, we need to do the same thing you do above to get the overhead_len 
>>> every time, which will
>>> cause a lot of duplicate code in the framework and app. For examples, parsing 
>>> and processing for '--max-pkt-len=xxx' parameter,
>>>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying here 
>>> dev_configure API.
>>> It's a little redundant and troublesome.
>>>
>>> Maybe, it is necessary for driver to directly report the supported maximum 
>>> Ethernet frame length by rte_dev_info_get API.
>>> As following:
>>> struct rte_eth_dev_info {
>>>      xxxx
>>>      /**
>>>       * The maximum Ethernet frame length supported by each
>>>       * driver varies with the Ethernet header length.
>>>       */
>>>      uint16_t eth_max_len;
>>>      xxxx
>>> }
>>>
>>> int
>>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>>> {
>>>      xxx
>>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>>      dev_info->max_mtu = UINT16_MAX;
>>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>>>      xxx
>>> }
>>>
>>> And then:
>>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
>>> {
>>>      xxx
>>>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>>>      xxx
>>> }
>>> What do you think?
>>
>> Hi Lijun,
>>
>> The 'max_mtu' has been added in the past to solve exact same problem, perhaps 
>> it would be better to add something like 'eth_max_len' at that time.
>>
>> But as Steve mentioned we are planning to switch to more MTU based 
>> configuration, which should remove the need of the field you mentioned.
>>
>>>>       /* If number of queues specified by application for both Rx and Tx is
>>>>        * zero, use driver preferred values. This cannot be done individually
>>>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>               goto rollback;
>>>>           }
>>>>       } else {
>>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>>               /* Use default value */
>>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>>> -                            RTE_ETHER_MAX_LEN;
>>>> +                        RTE_ETHER_MTU + overhead_len;
>>>> +    }
>>>> +
>>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>>> +                overhead_len;
>>>>       }
>>> Now that we update mtu here when jumbo frame enabled. It is necessary to 
>>> check validity of max_rx_pkt_len.
>>> As following:
>>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + overhead_len) {
>>
>> It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
>> overhead_len" is a valid value or not, the API says:
>> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"
>>
>> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
>> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'
>>
> The MTU setting entry does need to be adjusted. I agree! But I think this place 
> should be modified, and it's probably more reasonable.
> 
> Generally speaking, max_rx_pkt_len will be also updated with 'mtu + 
> overhead_len' also when application sets mtu > 1500, which means JUMBO_FRAME is 
> enabled.
> Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 1500 + 
> overhead_len.
> Besides, having following check for "rxmode.offloads & 
> DEV_RX_OFFLOAD_JUMBO_FRAME == 0":
> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
> nb_rx_q, uint16_t nb_tx_q,
>                goto rollback;
>            }
>        } else {
>          uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>          if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>              pktlen > RTE_ETHER_MTU + overhead_len)
>                /* Use default value */
>                dev->data->dev_conf.rxmode.max_rx_pkt_len =
>                          RTE_ETHER_MTU + overhead_len;
> }
> One of check condition for the branch is "pktlen > RTE_ETHER_MTU + overhead_len ".
> 
> In view of the above two aspects, when application set 
> DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be greater than " 
> RTE_ETHER_MTU + overhead_len "
> 

Hi Lijun,

Most probably intention was like you said, but that is not implemented or 
documented that way.

The API doc only says 'max_rx_pkt_len' is valid only when 'JUMBO_FRAME' is set, 
but it doesn't say what is the valid range.

And when 'JUMBO_FRAME' is set, the check in the 'rte_eth_dev_configure()' is 
same from the beggining of the project, and it is between 'RTE_ETHER_MIN_LEN' to 
'dev_info.max_rx_pktlen', and changing it will be the behaviour change.

So I suggest lets not add this new config restriction, but it is application's 
responsiblity to set/unset the 'JUMBO_FRAME' flag based on the packet size, and 
we can implemet this in the testpmd.

> Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary.
> 
>>>              xxxx
>>>          }
>>> } else {
>>>          xxx
>>> }
>>>
>>> If it does not thing above, which will cause an unreasonable case.
>>> Like,
>>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>>> dev_conf->rxmode.max_rx_pkt_len = 1500
>>> overhead_len = 26
>>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = 
>>> 1474
>>>
>>
>> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 1500', 
>> the requeted MTU is 1474,
>> if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' as 
>> "1500 + overhead".
>>
>>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu > 1500.
>>
>> Overall the 'max_rx_pkt_len' setting is a little messy, it has been tried to 
>> fix in last release but we reverted them because of unexpected side affects.
>>
>> Current long term plan is,
>> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to work on 
>> MTU values. (instead one working on frame lenght and other on MTU)
>> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request by 
>> requested MTU value check
>> - App decide the jumbo frame capability from the 'max_mtu' check
>> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
>>
>>>>       /*
>>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>       eth_dev_tx_queue_config(dev, 0);
>>>>   rollback:
>>>>       memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
>>>> +    if (old_mtu != dev->data->mtu)
>>>> +        dev->data->mtu = old_mtu;
>>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>>>>       return ret;
>>>>
>>
>> .
>>
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 17ddacc78d..19ca4c4512 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1292,8 +1292,10 @@  rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 	struct rte_eth_dev *dev;
 	struct rte_eth_dev_info dev_info;
 	struct rte_eth_conf orig_conf;
+	uint16_t overhead_len;
 	int diag;
 	int ret;
+	uint16_t old_mtu;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
 
@@ -1319,10 +1321,20 @@  rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 		memcpy(&dev->data->dev_conf, dev_conf,
 		       sizeof(dev->data->dev_conf));
 
+	/* Backup mtu for rollback */
+	old_mtu = dev->data->mtu;
+
 	ret = rte_eth_dev_info_get(port_id, &dev_info);
 	if (ret != 0)
 		goto rollback;
 
+	/* Get the real Ethernet overhead length */
+	if (dev_info.max_mtu != UINT16_MAX &&
+	    dev_info.max_rx_pktlen > dev_info.max_mtu)
+		overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
+	else
+		overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
+
 	/* If number of queues specified by application for both Rx and Tx is
 	 * zero, use driver preferred values. This cannot be done individually
 	 * as it is valid for either Tx or Rx (but not both) to be zero.
@@ -1410,11 +1422,18 @@  rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 			goto rollback;
 		}
 	} else {
-		if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
-			dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
+		uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
+		if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
+		    pktlen > RTE_ETHER_MTU + overhead_len)
 			/* Use default value */
 			dev->data->dev_conf.rxmode.max_rx_pkt_len =
-							RTE_ETHER_MAX_LEN;
+						RTE_ETHER_MTU + overhead_len;
+	}
+
+	/* Scale the MTU size to adapt max_rx_pkt_len */
+	if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
+		dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
+				overhead_len;
 	}
 
 	/*
@@ -1549,6 +1568,8 @@  rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 	eth_dev_tx_queue_config(dev, 0);
 rollback:
 	memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
+	if (old_mtu != dev->data->mtu)
+		dev->data->mtu = old_mtu;
 
 	rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
 	return ret;