[v4,3/5] net/ice: fix max mtu size packets with vlan tag cannot be received by default

Message ID 20200928065541.7520-4-stevex.yang@intel.com (mailing list archive)
State Changes Requested, archived
Delegated to: Qi Zhang
Series: fix default max mtu size when device configured

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Steve Yang Sept. 28, 2020, 6:55 a.m. UTC
testpmd initializes the default max packet length to 1518, which doesn't
include the vlan tag size in the Ethernet overhead. When a max-MTU-length
packet carrying a vlan tag is sent, the total packet length exceeds 1518,
which causes the packet to be dropped directly on the NIC hw side.

ice can support dual vlan tags, which need 8 more bytes of max packet size,
so configure the correct max packet size in the dev_configure ops.

Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")

Signed-off-by: SteveX Yang <stevex.yang@intel.com>
---
 drivers/net/ice/ice_ethdev.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
  
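For reference, the arithmetic behind this fix as a minimal standalone sketch;
the macro values here are assumptions modelled on the ice driver of this
period, not copied from the patch itself:

#include <stdint.h>
#include <stdio.h>

/* Assumed values: 14-byte L2 header, 4-byte CRC, 4 bytes per VLAN tag;
 * ice reserves room for two VLAN tags (QinQ). */
#define ETHER_HDR_LEN    14
#define ETHER_CRC_LEN     4
#define VLAN_TAG_SIZE     4
#define ICE_ETH_OVERHEAD (ETHER_HDR_LEN + ETHER_CRC_LEN + 2 * VLAN_TAG_SIZE)

int main(void)
{
	uint32_t mtu = 1500;            /* ethdev default */
	uint32_t max_rx_pkt_len = 1518; /* testpmd default */
	uint32_t frame_size = mtu + ICE_ETH_OVERHEAD;

	/* 1526 > 1518: a full-MTU frame with one or two VLAN tags does
	 * not fit, so the NIC drops it unless the limit is raised. */
	printf("frame_size=%u fits=%d\n", (unsigned)frame_size,
	       frame_size <= max_rx_pkt_len);
	return 0;
}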

Comments

Qi Zhang Sept. 29, 2020, 11:59 a.m. UTC | #1
> -----Original Message-----
> From: Yang, SteveX <stevex.yang@intel.com>
> Sent: Monday, September 28, 2020 2:56 PM
> To: dev@dpdk.org
> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang, SteveX
> <stevex.yang@intel.com>
> Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag cannot
> be received by default
> 
> testpmd will initialize default max packet length to 1518 which doesn't include
> vlan tag size in ether overheader. Once, send the max mtu length packet with
> vlan tag, the max packet length will exceed 1518 that will cause packets
> dropped directly from NIC hw side.
> 
> ice can support dual vlan tags that need more 8 bytes for max packet size, so,
> configures the correct max packet size in dev_config ops.
> 
> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> 
> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> ---
>  drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> index cfd357b05..6b7098444 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
>  	struct ice_adapter *ad =
>  		ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>  	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>  	int ret;
> 
>  	/* Initialize to TRUE. If any of Rx queues doesn't meet the
> @@ -3157,6 +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>  	if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
>  		dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
> 
> +	/**
> +	 * Considering QinQ packet, max frame size should be equal or
> +	 * larger than total size of MTU and Ether overhead.
> +	 */

> +	if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {


Why do we need this check?
Can we just call ice_mtu_set directly?
And please remove the above comment, since the ether overhead is already considered in ice_mtu_set.


> +		ret = ice_mtu_set(dev, dev->data->mtu);
> +		if (ret != 0)
> +			return ret;
> +	}
> +
>  	ret = ice_init_rss(pf);
>  	if (ret) {
>  		PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> --
> 2.17.1
  
Ananyev, Konstantin Sept. 29, 2020, 11:01 p.m. UTC | #2
> 
> > -----Original Message-----
> > From: Yang, SteveX <stevex.yang@intel.com>
> > Sent: Monday, September 28, 2020 2:56 PM
> > To: dev@dpdk.org
> > Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
> > Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
> > Jingjing <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
> > Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang, SteveX
> > <stevex.yang@intel.com>
> > Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag cannot
> > be received by default
> >
> > testpmd will initialize default max packet length to 1518 which doesn't include
> > vlan tag size in ether overheader. Once, send the max mtu length packet with
> > vlan tag, the max packet length will exceed 1518 that will cause packets
> > dropped directly from NIC hw side.
> >
> > ice can support dual vlan tags that need more 8 bytes for max packet size, so,
> > configures the correct max packet size in dev_config ops.
> >
> > Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> >
> > Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> > ---
> >  drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index
> > cfd357b05..6b7098444 100644
> > --- a/drivers/net/ice/ice_ethdev.c
> > +++ b/drivers/net/ice/ice_ethdev.c
> > @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
> >  struct ice_adapter *ad =
> >  ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> >  struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> >  int ret;
> >
> >  /* Initialize to TRUE. If any of Rx queues doesn't meet the @@ -3157,6
> > +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> >  if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
> >  dev->data->dev_conf.rxmode.offloads |=
> > DEV_RX_OFFLOAD_RSS_HASH;
> >
> > +/**
> > + * Considering QinQ packet, max frame size should be equal or
> > + * larger than total size of MTU and Ether overhead.
> > + */
> 
> > +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> 
> 
> Why we need this check?
> Can we just call ice_mtu_set directly

I think that without that check we can silently overwrite the
dev_conf.rxmode.max_rx_pkt_len value provided by the user.

> And please remove above comment, since ether overhead is already considered in ice_mtu_set.
> 
> 
> > +ret = ice_mtu_set(dev, dev->data->mtu);
> > +if (ret != 0)
> > +return ret;
> > +}
> > +
> >  ret = ice_init_rss(pf);
> >  if (ret) {
> >  PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> > --
> > 2.17.1
>
  
Qi Zhang Sept. 30, 2020, 12:34 a.m. UTC | #3
> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Wednesday, September 30, 2020 7:02 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
> <stevex.yang@intel.com>; dev@dpdk.org
> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag
> cannot be received by default
> 
> >
> > > -----Original Message-----
> > > From: Yang, SteveX <stevex.yang@intel.com>
> > > Sent: Monday, September 28, 2020 2:56 PM
> > > To: dev@dpdk.org
> > > Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
> > > Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> > > <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> > > Beilei <beilei.xing@intel.com>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>; Yang, SteveX <stevex.yang@intel.com>
> > > Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan
> > > tag cannot be received by default
> > >
> > > testpmd will initialize default max packet length to 1518 which
> > > doesn't include vlan tag size in ether overheader. Once, send the
> > > max mtu length packet with vlan tag, the max packet length will
> > > exceed 1518 that will cause packets dropped directly from NIC hw side.
> > >
> > > ice can support dual vlan tags that need more 8 bytes for max packet
> > > size, so, configures the correct max packet size in dev_config ops.
> > >
> > > Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> > >
> > > Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> > > ---
> > >  drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/net/ice/ice_ethdev.c
> > > b/drivers/net/ice/ice_ethdev.c index
> > > cfd357b05..6b7098444 100644
> > > --- a/drivers/net/ice/ice_ethdev.c
> > > +++ b/drivers/net/ice/ice_ethdev.c
> > > @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
> > > struct ice_adapter *ad =
> > > ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> > >  struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > > +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> > >  int ret;
> > >
> > >  /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
> > > -3157,6
> > > +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> > >  if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
> > > dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
> > >
> > > +/**
> > > + * Considering QinQ packet, max frame size should be equal or
> > > + * larger than total size of MTU and Ether overhead.
> > > + */
> >
> > > +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> >
> >
> > Why we need this check?
> > Can we just call ice_mtu_set directly
> 
> I think that without that check we can silently overwrite provided by user
> dev_conf.rxmode.max_rx_pkt_len value.

OK, I see

But I still have one question:
dev->data->mtu is initialized to 1518 by default, but if the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
does that mean we will still set mtu to 1518? Is this expected?

Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len) here?



> 
> > And please remove above comment, since ether overhead is already
> considered in ice_mtu_set.
> >
> >
> > > +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return ret; }
> > > +
> > >  ret = ice_init_rss(pf);
> > >  if (ret) {
> > >  PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> > > --
> > > 2.17.1
> >
>
  
Qi Zhang Sept. 30, 2020, 2:32 a.m. UTC | #4
> -----Original Message-----
> From: Yang, SteveX <stevex.yang@intel.com>
> Sent: Wednesday, September 30, 2020 9:32 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag
> cannot be received by default
> 
> 
> 
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: Wednesday, September 30, 2020 8:35 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang, SteveX
> > <stevex.yang@intel.com>; dev@dpdk.org
> > Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
> > Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > vlan tag cannot be received by default
> >
> >
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Sent: Wednesday, September 30, 2020 7:02 AM
> > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
> > > <stevex.yang@intel.com>; dev@dpdk.org
> > > Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
> > > Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > > Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > > vlan tag cannot be received by default
> > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yang, SteveX <stevex.yang@intel.com>
> > > > > Sent: Monday, September 28, 2020 2:56 PM
> > > > > To: dev@dpdk.org
> > > > > Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
> > > > > <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> > > > > Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> > > > > <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
> > > > > Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang, SteveX
> > > > > <stevex.yang@intel.com>
> > > > > Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > > > > vlan tag cannot be received by default
> > > > >
> > > > > testpmd will initialize default max packet length to 1518 which
> > > > > doesn't include vlan tag size in ether overheader. Once, send
> > > > > the max mtu length packet with vlan tag, the max packet length
> > > > > will exceed 1518 that will cause packets dropped directly from NIC hw
> side.
> > > > >
> > > > > ice can support dual vlan tags that need more 8 bytes for max
> > > > > packet size, so, configures the correct max packet size in
> > > > > dev_config
> > ops.
> > > > >
> > > > > Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> > > > >
> > > > > Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> > > > > ---
> > > > >  drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> > > > >  1 file changed, 11 insertions(+)
> > > > >
> > > > > diff --git a/drivers/net/ice/ice_ethdev.c
> > > > > b/drivers/net/ice/ice_ethdev.c index
> > > > > cfd357b05..6b7098444 100644
> > > > > --- a/drivers/net/ice/ice_ethdev.c
> > > > > +++ b/drivers/net/ice/ice_ethdev.c
> > > > > @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
> > > > > struct ice_adapter *ad =
> > > > > ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> > > > >  struct ice_pf *pf =
> > > > > ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > > > > +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> > > > >  int ret;
> > > > >
> > > > >  /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
> > > > > -3157,6
> > > > > +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> > > > >  if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
> > > > > dev->data->dev_conf.rxmode.offloads |=
> > DEV_RX_OFFLOAD_RSS_HASH;
> > > > >
> > > > > +/**
> > > > > + * Considering QinQ packet, max frame size should be equal or
> > > > > + * larger than total size of MTU and Ether overhead.
> > > > > + */
> > > >
> > > > > +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> > > >
> > > >
> > > > Why we need this check?
> > > > Can we just call ice_mtu_set directly
> > >
> > > I think that without that check we can silently overwrite provided
> > > by user dev_conf.rxmode.max_rx_pkt_len value.
> >
> > OK, I see
> >
> > But still have one question
> > dev->data->mtu is initialized to 1518 as default , but if application
> > dev->data->set
> > dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure.
> > does that mean we will still will set mtu to 1518, is this expected?
> >
> 
> max_rx_pkt_len should be larger than mtu at least, so we should raise the
> max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).

OK, this describes the problem more generally; it would be better to use this to replace the existing code comment and commit log for easier understanding.
Please send a new version with the reworded text.


> Generally, the mtu value can be adjustable from user (e.g.: ip link set ens801f0
> mtu 1400), hence, we just adjust the max_rx_pkt_len to satisfy mtu
> requirement.
> 
> > Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
> > here?
> ice_mtu_set(dev, mtu) will append ether overhead to
> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
> parameter, or not the max_rx_pkt_len.
> 
> >
> >
> > >
> > > > And please remove above comment, since ether overhead is already
> > > considered in ice_mtu_set.
> Ether overhead is already considered in ice_mtu_set, but it also should be
> considered as the adjustment condition that if ice_mtu_set need be invoked.
> So, it perhaps should remain this comment before this if() condition.
> 
> > > >
> > > >
> > > > > +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return
> > > > > +ret; }
> > > > > +
> > > > >  ret = ice_init_rss(pf);
> > > > >  if (ret) {
> > > > >  PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> > > > > --
> > > > > 2.17.1
> > > >
> > >
> >
>
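
A standalone sketch of the ice_mtu_set() behaviour described above: it takes an
MTU (not a frame length), adds the Ether overhead itself, and keeps mtu and
max_rx_pkt_len in sync. All names and limits below are hypothetical stand-ins,
not the actual driver code:

#include <errno.h>
#include <stdint.h>

#define ICE_ETH_OVERHEAD   26   /* assumed: hdr + CRC + 2 VLAN tags */
#define ICE_FRAME_SIZE_MAX 9728 /* assumed hardware limit */
#define ETHER_MIN_MTU      68

struct mock_dev_data {
	uint16_t mtu;
	uint32_t max_rx_pkt_len;
};

static int
mock_ice_mtu_set(struct mock_dev_data *data, uint16_t mtu)
{
	uint32_t frame_size = (uint32_t)mtu + ICE_ETH_OVERHEAD;

	if (mtu < ETHER_MIN_MTU || frame_size > ICE_FRAME_SIZE_MAX)
		return -EINVAL;

	data->mtu = mtu;                 /* both fields stay in sync */
	data->max_rx_pkt_len = frame_size;
	return 0;
}

int main(void)
{
	struct mock_dev_data data = { .mtu = 1500, .max_rx_pkt_len = 1518 };

	/* 1500 + 26 = 1526: both fields move together */
	return mock_ice_mtu_set(&data, 1500);
}

Passing dev_conf.rxmode.max_rx_pkt_len as the second argument would add the
overhead twice, which is the point Steve makes above.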
  
Ferruh Yigit Oct. 14, 2020, 3:38 p.m. UTC | #5
On 9/30/2020 3:32 AM, Zhang, Qi Z wrote:
> 
> 
>> -----Original Message-----
>> From: Yang, SteveX <stevex.yang@intel.com>
>> Sent: Wednesday, September 30, 2020 9:32 AM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
>> Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
>> Beilei <beilei.xing@intel.com>
>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag
>> cannot be received by default
>>
>>
>>
>>> -----Original Message-----
>>> From: Zhang, Qi Z <qi.z.zhang@intel.com>
>>> Sent: Wednesday, September 30, 2020 8:35 AM
>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang, SteveX
>>> <stevex.yang@intel.com>; dev@dpdk.org
>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>> vlan tag cannot be received by default
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>> Sent: Wednesday, September 30, 2020 7:02 AM
>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>> vlan tag cannot be received by default
>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>>>> Sent: Monday, September 28, 2020 2:56 PM
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
>>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
>>>>>> Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
>>>>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang, SteveX
>>>>>> <stevex.yang@intel.com>
>>>>>> Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>> vlan tag cannot be received by default
>>>>>>
>>>>>> testpmd will initialize default max packet length to 1518 which
>>>>>> doesn't include vlan tag size in ether overheader. Once, send
>>>>>> the max mtu length packet with vlan tag, the max packet length
>>>>>> will exceed 1518 that will cause packets dropped directly from NIC hw
>> side.
>>>>>>
>>>>>> ice can support dual vlan tags that need more 8 bytes for max
>>>>>> packet size, so, configures the correct max packet size in
>>>>>> dev_config
>>> ops.
>>>>>>
>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>
>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>> ---
>>>>>>   drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>   1 file changed, 11 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>> cfd357b05..6b7098444 100644
>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>> struct ice_adapter *ad =
>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>   struct ice_pf *pf =
>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>   int ret;
>>>>>>
>>>>>>   /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
>>>>>> -3157,6
>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>   if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
>>>>>> dev->data->dev_conf.rxmode.offloads |=
>>> DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>
>>>>>> +/**
>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>> + */
>>>>>
>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>
>>>>>
>>>>> Why we need this check?
>>>>> Can we just call ice_mtu_set directly
>>>>
>>>> I think that without that check we can silently overwrite provided
>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
>>>
>>> OK, I see
>>>
>>> But still have one question
>>> dev->data->mtu is initialized to 1518 as default , but if application
>>> dev->data->set
>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure.
>>> does that mean we will still will set mtu to 1518, is this expected?
>>>
>>
>> max_rx_pkt_len should be larger than mtu at least, so we should raise the
>> max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
> 
> Ok, this describe the problem more general and better to replace exist code comment and commit log for easy understanding.
> Please send a new version for reword
> 

I didn't really get this set.

Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
this size is dropped.
Isn't this how it should be? Why are we trying to overwrite the user
configuration in the PMD to prevent this?

During eth_dev allocation, mtu is set to the default '1500' by the ethdev layer.
And testpmd sets 'max_rx_pkt_len' by default to '1518'.
I think Qi's concern above is valid: what if the user sets 'max_rx_pkt_len' to '1000'
and means it? The PMD will not honor the user config.


Why not simply increase the default 'max_rx_pkt_len' in testpmd?

And I guess, even better, what we need is to tell the application what frame
overhead the PMD accepts.
So the application can set a proper 'max_rx_pkt_len' value per port for a
given/requested MTU value (one possible sketch follows below).
@Ian, cc'ed, was complaining about almost the same thing years ago; these PMD overhead
macros and 'max_mtu'/'min_mtu' were added because of that. Perhaps he has a solution now?


And why can't this same thing happen to other PMDs? If this is a problem for all
PMDs, we should solve it at another level, not only for some PMDs.
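
On the overhead-discovery point above, a sketch of how an application could
derive a port's L2 overhead from fields that already exist in
rte_eth_dev_info ('max_rx_pktlen' and 'max_mtu'); this is an assumption-level
illustration, not part of this patch set:

#include <stdint.h>
#include <rte_ethdev.h>

/* Sketch only: derive the port's L2 overhead from dev_info, then size
 * the Rx frame length for a wanted MTU. Falls back to plain Ethernet
 * overhead when the port does not report a usable max_mtu. */
static uint32_t
frame_len_for_mtu(uint16_t port_id, uint16_t mtu)
{
	struct rte_eth_dev_info dev_info;
	uint32_t overhead = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;

	if (rte_eth_dev_info_get(port_id, &dev_info) == 0 &&
	    dev_info.max_mtu != UINT16_MAX &&
	    dev_info.max_rx_pktlen > dev_info.max_mtu)
		overhead = dev_info.max_rx_pktlen - dev_info.max_mtu;

	return (uint32_t)mtu + overhead; /* e.g. 1500 + 26 = 1526 on ice */
}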

> 
>> Generally, the mtu value can be adjustable from user (e.g.: ip link set ens801f0
>> mtu 1400), hence, we just adjust the max_rx_pkt_len to satisfy mtu
>> requirement.
>>
>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>> here?
>> ice_mtu_set(dev, mtu) will append ether overhead to
>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
>> parameter, or not the max_rx_pkt_len.
>>
>>>
>>>
>>>>
>>>>> And please remove above comment, since ether overhead is already
>>>> considered in ice_mtu_set.
>> Ether overhead is already considered in ice_mtu_set, but it also should be
>> considered as the adjustment condition that if ice_mtu_set need be invoked.
>> So, it perhaps should remain this comment before this if() condition.
>>
>>>>>
>>>>>
>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return
>>>>>> +ret; }
>>>>>> +
>>>>>>   ret = ice_init_rss(pf);
>>>>>>   if (ret) {
>>>>>>   PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>> --
>>>>>> 2.17.1
>>>>>
>>>>
>>>
>>
>
  
Ananyev, Konstantin Oct. 19, 2020, 10:49 a.m. UTC | #6
> > -----Original Message-----
> > From: Ferruh Yigit <ferruh.yigit@intel.com>
> > Sent: Wednesday, October 14, 2020 11:38 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
> > <stevex.yang@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; dev@dpdk.org
> > Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
> > Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > Xing, Beilei <beilei.xing@intel.com>; Stokes, Ian <ian.stokes@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets
> > with vlan tag cannot be received by default
> >
> > On 9/30/2020 3:32 AM, Zhang, Qi Z wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Yang, SteveX <stevex.yang@intel.com>
> > >> Sent: Wednesday, September 30, 2020 9:32 AM
> > >> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ananyev, Konstantin
> > >> <konstantin.ananyev@intel.com>; dev@dpdk.org
> > >> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
> > >> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
> > >> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > >> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > >> vlan tag cannot be received by default
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > >>> Sent: Wednesday, September 30, 2020 8:35 AM
> > >>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
> > SteveX
> > >>> <stevex.yang@intel.com>; dev@dpdk.org
> > >>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
> > >>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
> > >>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > >>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > >>> vlan tag cannot be received by default
> > >>>
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > >>>> Sent: Wednesday, September 30, 2020 7:02 AM
> > >>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
> > >>>> <stevex.yang@intel.com>; dev@dpdk.org
> > >>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
> > >>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
> > >>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > >>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > >>>> vlan tag cannot be received by default
> > >>>>
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Yang, SteveX <stevex.yang@intel.com>
> > >>>>>> Sent: Monday, September 28, 2020 2:56 PM
> > >>>>>> To: dev@dpdk.org
> > >>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
> > >>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> > Zhang,
> > >>>>>> Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> > >>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
> > >>>>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
> > SteveX
> > >>>>>> <stevex.yang@intel.com>
> > >>>>>> Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with
> > >>>>>> vlan tag cannot be received by default
> > >>>>>>
> > >>>>>> testpmd will initialize default max packet length to 1518 which
> > >>>>>> doesn't include vlan tag size in ether overheader. Once, send the
> > >>>>>> max mtu length packet with vlan tag, the max packet length will
> > >>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
> > >> side.
> > >>>>>>
> > >>>>>> ice can support dual vlan tags that need more 8 bytes for max
> > >>>>>> packet size, so, configures the correct max packet size in
> > >>>>>> dev_config
> > >>> ops.
> > >>>>>>
> > >>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> > >>>>>>
> > >>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> > >>>>>> ---
> > >>>>>>   drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> > >>>>>>   1 file changed, 11 insertions(+)
> > >>>>>>
> > >>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
> > >>>>>> b/drivers/net/ice/ice_ethdev.c index
> > >>>>>> cfd357b05..6b7098444 100644
> > >>>>>> --- a/drivers/net/ice/ice_ethdev.c
> > >>>>>> +++ b/drivers/net/ice/ice_ethdev.c
> > >>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
> > *dev)
> > >>>>>> struct ice_adapter *ad =
> > >>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> > >>>>>>   struct ice_pf *pf =
> > >>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > >>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> > >>>>>>   int ret;
> > >>>>>>
> > >>>>>>   /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
> > >>>>>> -3157,6
> > >>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> > >>>>>>   if (dev->data->dev_conf.rxmode.mq_mode &
> > ETH_MQ_RX_RSS_FLAG)
> > >>>>>> dev->data->dev_conf.rxmode.offloads |=
> > >>> DEV_RX_OFFLOAD_RSS_HASH;
> > >>>>>>
> > >>>>>> +/**
> > >>>>>> + * Considering QinQ packet, max frame size should be equal or
> > >>>>>> + * larger than total size of MTU and Ether overhead.
> > >>>>>> + */
> > >>>>>
> > >>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> > >>>>>
> > >>>>>
> > >>>>> Why we need this check?
> > >>>>> Can we just call ice_mtu_set directly
> > >>>>
> > >>>> I think that without that check we can silently overwrite provided
> > >>>> by user dev_conf.rxmode.max_rx_pkt_len value.
> > >>>
> > >>> OK, I see
> > >>>
> > >>> But still have one question
> > >>> dev->data->mtu is initialized to 1518 as default , but if
> > >>> dev->data->application set
> > >>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure.
> > >>> does that mean we will still will set mtu to 1518, is this expected?
> > >>>
> > >>
> > >> max_rx_pkt_len should be larger than mtu at least, so we should raise
> > >> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
> > >
> > > Ok, this describe the problem more general and better to replace exist
> > code comment and commit log for easy understanding.
> > > Please send a new version for reword
> > >
> >
> > I didn't really get this set.
> >
> > Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
> > this size is dropped.
> 
> Sure, it is normal case for dropping oversize data.
> 
> > Isn't this what should be, why we are trying to overwrite user configuration
> > in PMD to prevent this?
> >
> 
> But it is a confliction that application/user sets mtu & max_rx_pkt_len at the same time.
> This fix will make a decision when confliction occurred.
> MTU value will come from user operation (e.g.: port config mtu 0 1500) directly,
> so, the max_rx_pkt_len will resize itself to adapt expected MTU value if its size is smaller than MTU + Ether overhead.
> 
> > During eth_dev allocation, mtu set to default '1500', by ethdev layer.
> > And testpmd sets 'max_rx_pkt_len' by default to '1518'.
> > I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
> > and mean it? PMD will not honor the user config.
> 
> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
> 
> >
> > Why not simply increase the default 'max_rx_pkt_len' in testpmd?
> >
> The default 'max_rx_pkt_len' has been initialized to generical value (1518) and default 'mtu' is '1500' in testpmd,
> But it isn't suitable to those NIC drivers which Ether overhead is larger than 18. (e.g.: ice, i40e) if 'mtu' value is preferable.
> 
> > And I guess even better what we need is to tell to the application what the
> > frame overhead PMD accepts.
> > So the application can set proper 'max_rx_pkt_len' value per port for a
> > given/requested MTU value.
> > @Ian, cc'ed, was complaining almost same thing years ago, these PMD
> > overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
> > he has a solution now?

From my perspective the main problem here is:
we have 2 different variables for nearly the same thing:
rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len,
and 2 different APIs to update them: dev_mtu_set() and dev_configure().
And inside the majority of Intel PMDs we don't keep these 2 variables in sync:
- mtu_set() will update both variables.
- dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.

This patch fixes this inconsistency, which I think is a good thing.
Though yes, it introduces a change in behaviour.

Let's say the code is:
rte_eth_dev_set_mtu(port, 1500);
dev_conf.max_rx_pkt_len = 1000;
rte_eth_dev_configure(port, 1, 1, &dev_conf);

Before the patch this will result in:
mtu==1500, max_rx_pkt_len=1000;  // out of sync, looks wrong to me

After the patch:
mtu=1500, max_rx_pkt_len=1518;  // in sync, change in behaviour

If you think we need to preserve the current behaviour,
then I suppose the easiest thing would be to change the dev_configure() code
to update the mtu value based on max_rx_pkt_len,
i.e.: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
So the code snippet above will result in:
mtu=982, max_rx_pkt_len=1000; (both policies are modelled below)

 Konstantin
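
To make the two outcomes above concrete, a minimal standalone model of the two
dev_configure() policies; the 18-byte overhead matches the generic Ethernet
numbers used above (ice would add 8 more for QinQ), and every name here is
illustrative, not the real ethdev API:

#include <stdint.h>

#define OVERHEAD 18 /* generic L2 hdr + CRC; assumption for this model */

struct mock_port {
	uint16_t mtu;
	uint32_t max_rx_pkt_len;
};

/* patch behaviour: mtu wins, max_rx_pkt_len is raised to fit it */
static void
configure_mtu_wins(struct mock_port *p, uint32_t max_len)
{
	p->max_rx_pkt_len = max_len;
	if ((uint32_t)p->mtu + OVERHEAD > p->max_rx_pkt_len)
		p->max_rx_pkt_len = (uint32_t)p->mtu + OVERHEAD;
}

/* alternative above: max_rx_pkt_len wins, mtu is derived from it */
static void
configure_len_wins(struct mock_port *p, uint32_t max_len)
{
	p->max_rx_pkt_len = max_len;
	p->mtu = (uint16_t)(max_len - OVERHEAD);
}

int main(void)
{
	struct mock_port a = { .mtu = 1500 }, b = { .mtu = 1500 };

	configure_mtu_wins(&a, 1000); /* a: mtu=1500, len=1518 */
	configure_len_wins(&b, 1000); /* b: mtu=982,  len=1000 */
	return 0;
}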

> 
> >
> > And why this same thing can't happen to other PMDs? If this is a problem for
> > all PMDs, we should solve in other level, not for only some PMDs.
> >
> No, all PMDs exist the same issue, another proposal:
>  -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
>  - provide the uniform API for fetching the NIC's supported Ether Overhead size;
> Is it feasible?
> 
> > >
> > >> Generally, the mtu value can be adjustable from user (e.g.: ip link
> > >> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
> > >> satisfy mtu requirement.
> > >>
> > >>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
> > >>> here?
> > >> ice_mtu_set(dev, mtu) will append ether overhead to
> > >> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
> > >> parameter, or not the max_rx_pkt_len.
> > >>
> > >>>
> > >>>
> > >>>>
> > >>>>> And please remove above comment, since ether overhead is already
> > >>>> considered in ice_mtu_set.
> > >> Ether overhead is already considered in ice_mtu_set, but it also
> > >> should be considered as the adjustment condition that if ice_mtu_set
> > need be invoked.
> > >> So, it perhaps should remain this comment before this if() condition.
> > >>
> > >>>>>
> > >>>>>
> > >>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return
> > >>>>>> +ret; }
> > >>>>>> +
> > >>>>>>   ret = ice_init_rss(pf);
> > >>>>>>   if (ret) {
> > >>>>>>   PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> > >>>>>> --
> > >>>>>> 2.17.1
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
  
Ferruh Yigit Oct. 19, 2020, 1:07 p.m. UTC | #7
On 10/19/2020 11:49 AM, Ananyev, Konstantin wrote:
> 
>>> -----Original Message-----
>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>> Sent: Wednesday, October 14, 2020 11:38 PM
>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>>> <stevex.yang@intel.com>; Ananyev, Konstantin
>>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
>>> Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
>>> Xing, Beilei <beilei.xing@intel.com>; Stokes, Ian <ian.stokes@intel.com>
>>> Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets
>>> with vlan tag cannot be received by default
>>>
>>> On 9/30/2020 3:32 AM, Zhang, Qi Z wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>>> Sent: Wednesday, September 30, 2020 9:32 AM
>>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ananyev, Konstantin
>>>>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>> vlan tag cannot be received by default
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Zhang, Qi Z <qi.z.zhang@intel.com>
>>>>>> Sent: Wednesday, September 30, 2020 8:35 AM
>>>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
>>> SteveX
>>>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>> vlan tag cannot be received by default
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>>>>> Sent: Wednesday, September 30, 2020 7:02 AM
>>>>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>>>>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>>> vlan tag cannot be received by default
>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>>>>>>> Sent: Monday, September 28, 2020 2:56 PM
>>>>>>>>> To: dev@dpdk.org
>>>>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
>>>>>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
>>> Zhang,
>>>>>>>>> Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
>>>>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
>>>>>>>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
>>> SteveX
>>>>>>>>> <stevex.yang@intel.com>
>>>>>>>>> Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>>>>> vlan tag cannot be received by default
>>>>>>>>>
>>>>>>>>> testpmd will initialize default max packet length to 1518 which
>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
>>>>>>>>> max mtu length packet with vlan tag, the max packet length will
>>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
>>>>> side.
>>>>>>>>>
>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
>>>>>>>>> packet size, so, configures the correct max packet size in
>>>>>>>>> dev_config
>>>>>> ops.
>>>>>>>>>
>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>>
>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>>>>> ---
>>>>>>>>>    drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>>    1 file changed, 11 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
>>> *dev)
>>>>>>>>> struct ice_adapter *ad =
>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>>    struct ice_pf *pf =
>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>>    int ret;
>>>>>>>>>
>>>>>>>>>    /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
>>>>>>>>> -3157,6
>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>    if (dev->data->dev_conf.rxmode.mq_mode &
>>> ETH_MQ_RX_RSS_FLAG)
>>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
>>>>>> DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>>
>>>>>>>>> +/**
>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>>> + */
>>>>>>>>
>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>>>>
>>>>>>>>
>>>>>>>> Why we need this check?
>>>>>>>> Can we just call ice_mtu_set directly
>>>>>>>
>>>>>>> I think that without that check we can silently overwrite provided
>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
>>>>>>
>>>>>> OK, I see
>>>>>>
>>>>>> But still have one question
>>>>>> dev->data->mtu is initialized to 1518 as default , but if
>>>>>> dev->data->application set
>>>>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure.
>>>>>> does that mean we will still will set mtu to 1518, is this expected?
>>>>>>
>>>>>
>>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
>>>>
>>>> Ok, this describe the problem more general and better to replace exist
>>> code comment and commit log for easy understanding.
>>>> Please send a new version for reword
>>>>
>>>
>>> I didn't really get this set.
>>>
>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
>>> this size is dropped.
>>
>> Sure, it is normal case for dropping oversize data.
>>
>>> Isn't this what should be, why we are trying to overwrite user configuration
>>> in PMD to prevent this?
>>>
>>
>> But it is a confliction that application/user sets mtu & max_rx_pkt_len at the same time.
>> This fix will make a decision when confliction occurred.
>> MTU value will come from user operation (e.g.: port config mtu 0 1500) directly,
>> so, the max_rx_pkt_len will resize itself to adapt expected MTU value if its size is smaller than MTU + Ether overhead.
>>
>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
>>> and mean it? PMD will not honor the user config.
>>
>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
>>
>>>
>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>>
>> The default 'max_rx_pkt_len' has been initialized to generical value (1518) and default 'mtu' is '1500' in testpmd,
>> But it isn't suitable to those NIC drivers which Ether overhead is larger than 18. (e.g.: ice, i40e) if 'mtu' value is preferable.
>>
>>> And I guess even better what we need is to tell to the application what the
>>> frame overhead PMD accepts.
>>> So the application can set proper 'max_rx_pkt_len' value per port for a
>>> given/requested MTU value.
>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
>>> he has a solution now?
> 
>  From my perspective the main problem here:
> We have 2 different variables for nearly the same thing:
> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len.
> and 2 different API to update them: dev_mtu_set() and dev_configure().

According to the API, 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled',
although I'm not sure that is practically what is done for all drivers.

> And inside majority of Intel PMDs we don't keep these 2 variables in sync:
> - mtu_set() will update both variables.
> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
> 
> This patch fixes this inconsistency, which I think is a good thing.
> Though yes, it introduces change in behaviour.
> 
> Let say the code:
> rte_eth_dev_set_mtu(port, 1500);
> dev_conf.max_rx_pkt_len = 1000;
> rte_eth_dev_configure(port, 1, 1, &dev_conf);
> 

'rte_eth_dev_configure()' is one of the first APIs called; it is called before
'rte_eth_dev_set_mtu()'.

When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by the
ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.

And later, when 'rte_eth_dev_set_mtu()' is called, both MTU and 'max_rx_pkt_len'
are updated (mostly).


> Before the patch will result:
> mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
> 
> After the patch:
> mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour.
> 
> If you think we need to preserve current behaviour,
> then I suppose the easiest thing would be to change dev_config() code
> to update mtu value based on max_rx_pkt_len.
> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
> So the code snippet above will result:
> mtu=982,max_rx_pkt_len=1000;
> 

The 'max_rx_pkt_len' has been an annoyance for a long time; what do you think
about just dropping it?

By default the device will be up with the default MTU (1500); later
'rte_eth_dev_set_mtu' can be used to set the MTU, with no frame size setting at all.

Will this work?



And for the short term, for the above Intel PMDs, there must be a place where this
'max_rx_pkt_len' value is taken into account (mostly the 'start()' dev_ops); that
function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME is set, and
otherwise use the 'MTU' value (sketched below).

Without 'start()' being updated, the current logic won't work after stop & start anyway.
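
A sketch of that short-term 'start()' logic, assuming, per the API note above,
that 'max_rx_pkt_len' is honoured only when JUMBO_FRAME is set; the overhead
value and the helper are illustrative, not actual driver code:

#include <stdint.h>
#include <rte_ethdev.h>

#define ICE_ETH_OVERHEAD 26 /* assumed: hdr + CRC + 2 VLAN tags */

/* Illustrative only: pick the Rx frame limit in start() from
 * max_rx_pkt_len when JUMBO_FRAME is requested, else from MTU. */
static uint32_t
rx_frame_limit(const struct rte_eth_dev_data *data)
{
	const struct rte_eth_rxmode *rxmode = &data->dev_conf.rxmode;

	if (rxmode->offloads & DEV_RX_OFFLOAD_JUMBO_FRAME)
		return rxmode->max_rx_pkt_len;
	return (uint32_t)data->mtu + ICE_ETH_OVERHEAD;
}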



>   Konstantin
> 
>>
>>>
>>> And why this same thing can't happen to other PMDs? If this is a problem for
>>> all PMDs, we should solve in other level, not for only some PMDs.
>>>
>> No, all PMDs exist the same issue, another proposal:
>>   -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
>>   - provide the uniform API for fetching the NIC's supported Ether Overhead size;
>> Is it feasible?
>>
>>>>
>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
>>>>> satisfy mtu requirement.
>>>>>
>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>>> here?
>>>>> ice_mtu_set(dev, mtu) will append ether overhead to
>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
>>>>> parameter, or not the max_rx_pkt_len.
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> And please remove above comment, since ether overhead is already
>>>>>>> considered in ice_mtu_set.
>>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>>> should be considered as the adjustment condition that if ice_mtu_set
>>> need be invoked.
>>>>> So, it perhaps should remain this comment before this if() condition.
>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return
>>>>>>>>> +ret; }
>>>>>>>>> +
>>>>>>>>>    ret = ice_init_rss(pf);
>>>>>>>>>    if (ret) {
>>>>>>>>>    PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>
  
Ananyev, Konstantin Oct. 19, 2020, 2:07 p.m. UTC | #8
> >>>>>>>>>
> >>>>>>>>> testpmd will initialize default max packet length to 1518 which
> >>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
> >>>>>>>>> max mtu length packet with vlan tag, the max packet length will
> >>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
> >>>>> side.
> >>>>>>>>>
> >>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
> >>>>>>>>> packet size, so, configures the correct max packet size in
> >>>>>>>>> dev_config
> >>>>>> ops.
> >>>>>>>>>
> >>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> >>>>>>>>> ---
> >>>>>>>>>    drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> >>>>>>>>>    1 file changed, 11 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
> >>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
> >>>>>>>>> cfd357b05..6b7098444 100644
> >>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
> >>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
> >>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
> >>> *dev)
> >>>>>>>>> struct ice_adapter *ad =
> >>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> >>>>>>>>>    struct ice_pf *pf =
> >>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> >>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> >>>>>>>>>    int ret;
> >>>>>>>>>
> >>>>>>>>>    /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
> >>>>>>>>> -3157,6
> >>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> >>>>>>>>>    if (dev->data->dev_conf.rxmode.mq_mode &
> >>> ETH_MQ_RX_RSS_FLAG)
> >>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
> >>>>>> DEV_RX_OFFLOAD_RSS_HASH;
> >>>>>>>>>
> >>>>>>>>> +/**
> >>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
> >>>>>>>>> + * larger than total size of MTU and Ether overhead.
> >>>>>>>>> + */
> >>>>>>>>
> >>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Why we need this check?
> >>>>>>>> Can we just call ice_mtu_set directly
> >>>>>>>
> >>>>>>> I think that without that check we can silently overwrite provided
> >>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
> >>>>>>
> >>>>>> OK, I see
> >>>>>>
> >>>>>> But still have one question
> >>>>>> dev->data->mtu is initialized to 1518 as default , but if
> >>>>>> dev->data->application set
> >>>>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure.
> >>>>>> does that mean we will still will set mtu to 1518, is this expected?
> >>>>>>
> >>>>>
> >>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
> >>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
> >>>>
> >>>> Ok, this describe the problem more general and better to replace exist
> >>> code comment and commit log for easy understanding.
> >>>> Please send a new version for reword
> >>>>
> >>>
> >>> I didn't really get this set.
> >>>
> >>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
> >>> this size is dropped.
> >>
> >> Sure, it is normal case for dropping oversize data.
> >>
> >>> Isn't this what should be, why we are trying to overwrite user configuration
> >>> in PMD to prevent this?
> >>>
> >>
> >> But it is a confliction that application/user sets mtu & max_rx_pkt_len at the same time.
> >> This fix will make a decision when confliction occurred.
> >> MTU value will come from user operation (e.g.: port config mtu 0 1500) directly,
> >> so, the max_rx_pkt_len will resize itself to adapt expected MTU value if its size is smaller than MTU + Ether overhead.
> >>
> >>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
> >>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
> >>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
> >>> and mean it? PMD will not honor the user config.
> >>
> >> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
> >> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
> >>
> >>>
> >>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
> >>>
> >> The default 'max_rx_pkt_len' has been initialized to generical value (1518) and default 'mtu' is '1500' in testpmd,
> >> But it isn't suitable to those NIC drivers which Ether overhead is larger than 18. (e.g.: ice, i40e) if 'mtu' value is preferable.
> >>
> >>> And I guess even better what we need is to tell to the application what the
> >>> frame overhead PMD accepts.
> >>> So the application can set proper 'max_rx_pkt_len' value per port for a
> >>> given/requested MTU value.
> >>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
> >>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
> >>> he has a solution now?
> >
> >  From my perspective the main problem here:
> > We have 2 different variables for nearly the same thing:
> > rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len.
> > and 2 different API to update them: dev_mtu_set() and dev_configure().
> 
> According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled'
> Although not sure that is practically what is done for all drivers.

I think most of Intel PMDs use it unconditionally.

> 
> > And inside majority of Intel PMDs we don't keep these 2 variables in sync:
> > - mtu_set() will update both variables.
> > - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
> >
> > This patch fixes this inconsistency, which I think is a good thing.
> > Though yes, it introduces change in behaviour.
> >
> > Let say the code:
> > rte_eth_dev_set_mtu(port, 1500);
> > dev_conf.max_rx_pkt_len = 1000;
> > rte_eth_dev_configure(port, 1, 1, &dev_conf);
> >
> 
> 'rte_eth_dev_configure()' is one of the first APIs called, it is called before
> 'rte_eth_dev_set_mtu().

Usually yes.
But you can still do it sometime later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();

> 
> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by
> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.

See above.
The PMD doesn't know where this MTU value came from (the default ethdev value or a
user-specified value), and probably it shouldn't care.

>
> And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and 'max_rx_pkt_len'
> are updated (mostly).

Yes, in mtu_set() we update both.
But we don't update MTU in dev_configure(), only max_rx_pkt_len.
That is what this patch tries to fix (as I understand it).

> 
> 
> > Before the patch will result:
> > mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
> >
> > After the patch:
> > mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour.
> >
> > If you think we need to preserve current behaviour,
> > then I suppose the easiest thing would be to change dev_config() code
> > to update mtu value based on max_rx_pkt_len.
> > I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
> > So the code snippet above will result:
> > mtu=982,max_rx_pkt_len=1000;
> >
> 
> The 'max_rx_ptk_len' is annoyance for a long time, what do you think to just
> drop it?
>
> By default device will be up with default MTU (1500), later
> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all.
> 
> Will this work?

I think it might, but that's a big change, probably too risky at that stage...
 

> 
> 
> And for short term, for above Intel PMDs, there must be a place this
> 'max_rx_pkt_len' value taken into account (mostly 'start()' dev_ops), that
> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME set,
> otherwise use the 'MTU' value.

Even if we use max_rx_pkt_len only when JUMBO_FRAME is set,
I think we still need to keep the max_rx_pkt_len and MTU values in sync.

> 
> Without 'start()' updated the current logic won't work after stop & start anyway.
> 
> 
> >
> >
> >
> >>
> >>>
> >>> And why this same thing can't happen to other PMDs? If this is a problem for
> >>> all PMDs, we should solve in other level, not for only some PMDs.
> >>>
> >> No, all PMDs exist the same issue, another proposal:
> >>   -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
> >>   - provide the uniform API for fetching the NIC's supported Ether Overhead size;
> >> Is it feasible?
> >>
> >>>>
> >>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
> >>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
> >>>>> satisfy mtu requirement.
> >>>>>
> >>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
> >>>>>> here?
> >>>>> ice_mtu_set(dev, mtu) will append ether overhead to
> >>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
> >>>>> parameter, or not the max_rx_pkt_len.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>> And please remove above comment, since ether overhead is already
> >>>>>>> considered in ice_mtu_set.
> >>>>> Ether overhead is already considered in ice_mtu_set, but it also
> >>>>> should be considered as the adjustment condition that if ice_mtu_set
> >>> need be invoked.
> >>>>> So, it perhaps should remain this comment before this if() condition.
> >>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return
> >>>>>>>>> +ret; }
> >>>>>>>>> +
> >>>>>>>>>    ret = ice_init_rss(pf);
> >>>>>>>>>    if (ret) {
> >>>>>>>>>    PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> >>>>>>>>> --
> >>>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >
  
Ananyev, Konstantin Oct. 19, 2020, 2:28 p.m. UTC | #9
> 
> > >>>>>>>>>
> > >>>>>>>>> testpmd will initialize default max packet length to 1518 which
> > >>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
> > >>>>>>>>> max mtu length packet with vlan tag, the max packet length will
> > >>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
> > >>>>> side.
> > >>>>>>>>>
> > >>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
> > >>>>>>>>> packet size, so, configures the correct max packet size in
> > >>>>>>>>> dev_config
> > >>>>>> ops.
> > >>>>>>>>>
> > >>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> > >>>>>>>>> ---
> > >>>>>>>>>    drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> > >>>>>>>>>    1 file changed, 11 insertions(+)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
> > >>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
> > >>>>>>>>> cfd357b05..6b7098444 100644
> > >>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
> > >>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
> > >>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
> > >>> *dev)
> > >>>>>>>>> struct ice_adapter *ad =
> > >>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> > >>>>>>>>>    struct ice_pf *pf =
> > >>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > >>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> > >>>>>>>>>    int ret;
> > >>>>>>>>>
> > >>>>>>>>>    /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
> > >>>>>>>>> -3157,6
> > >>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> > >>>>>>>>>    if (dev->data->dev_conf.rxmode.mq_mode &
> > >>> ETH_MQ_RX_RSS_FLAG)
> > >>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
> > >>>>>> DEV_RX_OFFLOAD_RSS_HASH;
> > >>>>>>>>>
> > >>>>>>>>> +/**
> > >>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
> > >>>>>>>>> + * larger than total size of MTU and Ether overhead.
> > >>>>>>>>> + */
> > >>>>>>>>
> > >>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Why we need this check?
> > >>>>>>>> Can we just call ice_mtu_set directly
> > >>>>>>>
> > >>>>>>> I think that without that check we can silently overwrite provided
> > >>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
> > >>>>>>
> > >>>>>> OK, I see
> > >>>>>>
> > >>>>>> But still have one question
> >>>>>> dev->data->mtu is initialized to 1518 as default, but if
> >>>>>> the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
> >>>>>> does that mean we will still set mtu to 1518, is this expected?
> > >>>>>>
> > >>>>>
> > >>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
> > >>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
> > >>>>
> > >>>> Ok, this describe the problem more general and better to replace exist
> > >>> code comment and commit log for easy understanding.
> > >>>> Please send a new version for reword
> > >>>>
> > >>>
> > >>> I didn't really get this set.
> > >>>
> > >>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
> > >>> this size is dropped.
> > >>
> > >> Sure, it is normal case for dropping oversize data.
> > >>
> > >>> Isn't this what should be, why we are trying to overwrite user configuration
> > >>> in PMD to prevent this?
> > >>>
> > >>
> > >> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at the same time.
> > >> This fix makes a decision when that conflict occurs.
> > >> The MTU value will come from a user operation (e.g.: port config mtu 0 1500) directly,
> > >> so the max_rx_pkt_len will resize itself to fit the expected MTU value if it is smaller than MTU + Ether overhead.
> > >>
> > >>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
> > >>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
> > >>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
> > >>> and mean it? PMD will not honor the user config.
> > >>
> > >> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
> > >> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
> > >>
> > >>>
> > >>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
> > >>>
> > >> The default 'max_rx_pkt_len' has been initialized to a generic value (1518) and the default 'mtu' is '1500' in testpmd,
> > >> but that isn't suitable for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice, i40e) if the 'mtu' value is preferable.
> > >>
> > >>> And I guess even better what we need is to tell to the application what the
> > >>> frame overhead PMD accepts.
> > >>> So the application can set proper 'max_rx_pkt_len' value per port for a
> > >>> given/requested MTU value.
> > >>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
> > >>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
> > >>> he has a solution now?
> > >
> > >  From my perspective the main problem here:
> > > We have 2 different variables for nearly the same thing:
> > > rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len.
> > > and 2 different API to update them: dev_mtu_set() and dev_configure().
> >
> > According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled'
> > Although not sure that is practically what is done for all drivers.
> 
> I think most of Intel PMDs use it unconditionally.
> 
> >
> > > And inside majority of Intel PMDs we don't keep these 2 variables in sync:
> > > - mtu_set() will update both variables.
> > > - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
> > >
> > > This patch fixes this inconsistency, which I think is a good thing.
> > > Though yes, it introduces change in behaviour.
> > >
> > > Let say the code:
> > > rte_eth_dev_set_mtu(port, 1500);
> > > dev_conf.max_rx_pkt_len = 1000;
> > > rte_eth_dev_configure(port, 1, 1, &dev_conf);
> > >
> >
> > 'rte_eth_dev_configure()' is one of the first APIs called, it is called before
> > 'rte_eth_dev_set_mtu().
> 
> Usually yes.
> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();
> 
> >
> > When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by
> > ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.
> 
> See above.
> PMD doesn't know where this MTU value came from (default ethdev value or user specified value)
> and probably it shouldn't care.
> 
> >
> > And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and 'max_rx_pkt_len'
> > are updated (mostly).
> 
> Yes, in mtu_set() we update both.
> But we don't update MTU in dev_configure(), only max_rx_pkt_len.
> That what this patch tries to fix (as I understand it).

To be more precise - it doesn't change MTU value in dev_configure(),
but instead doesn't allow max_rx_pkt_len to become smaller
than MTU + OVERHEAD.
Probably changing MTU value instead is a better choice.
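
For illustration, a self-contained toy model of the two behaviours
(struct/function names and the 18-byte overhead are simplifications
for the example, not the real ethdev/PMD code):

#include <stdint.h>
#include <stdio.h>

#define OVERHEAD 18 /* generic Ether header + CRC; ice would use 26 */

struct dev_data {            /* toy stand-in for rte_eth_dev_data */
	uint16_t mtu;
	uint32_t max_rx_pkt_len;
};

/* what mtu_set() already does: keep both fields in sync */
static void mtu_set(struct dev_data *d, uint16_t mtu)
{
	d->mtu = mtu;
	d->max_rx_pkt_len = (uint32_t)mtu + OVERHEAD;
}

/* the alternative dev_configure(): honour the user frame size and
 * derive MTU from it, instead of raising max_rx_pkt_len */
static void dev_configure(struct dev_data *d, uint32_t max_rx_pkt_len)
{
	d->max_rx_pkt_len = max_rx_pkt_len;
	d->mtu = (uint16_t)(max_rx_pkt_len - OVERHEAD);
}

int main(void)
{
	struct dev_data d;

	mtu_set(&d, 1500);       /* mtu=1500, max_rx_pkt_len=1518 */
	dev_configure(&d, 1000); /* mtu=982,  max_rx_pkt_len=1000 */
	printf("mtu=%u max_rx_pkt_len=%u\n",
	       (unsigned)d.mtu, (unsigned)d.max_rx_pkt_len);
	return 0;
}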

> >
> >
> > > Before the patch will result:
> > > mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
> > >
> > > After the patch:
> > > mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour.
> > >
> > > If you think we need to preserve current behaviour,
> > > then I suppose the easiest thing would be to change dev_config() code
> > > to update mtu value based on max_rx_pkt_len.
> > > I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
> > > So the code snippet above will result:
> > > mtu=982,max_rx_pkt_len=1000;
> > >
> >
> > The 'max_rx_pkt_len' has been an annoyance for a long time; what do you
> > think about just dropping it?
> >
> > By default device will be up with default MTU (1500), later
> > 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all.
> >
> > Will this work?
> 
> I think it might, but that's a big change, probably too risky at that stage...
> 
> 
> >
> >
> > And for the short term, for the above Intel PMDs, there must be a place where this
> > 'max_rx_pkt_len' value is taken into account (mostly the 'start()' dev_ops); that
> > function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME is set,
> > otherwise use the 'MTU' value.
> 
> Even if we'll use max_rx_pkt_len only when JUMBO_FRAME is set,
> I think we still need to keep the max_rx_pkt_len and MTU values in sync.
> 
> >
> > Without 'start()' updated the current logic won't work after stop & start anyway.
> >
> >
> > >
> > >
> > >
> > >>
> > >>>
> > >>> And why this same thing can't happen to other PMDs? If this is a problem for
> > >>> all PMDs, we should solve in other level, not for only some PMDs.
> > >>>
> > >> No, all PMDs have the same issue; another proposal:
> > >>   -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
> > >>   - provide the uniform API for fetching the NIC's supported Ether Overhead size;
> > >> Is it feasible?
> > >>
> > >>>>
> > >>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
> > >>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
> > >>>>> satisfy mtu requirement.
> > >>>>>
> > >>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
> > >>>>>> here?
> > >>>>> ice_mtu_set(dev, mtu) will append ether overhead to
> > >>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
> > >>>>> parameter, or not the max_rx_pkt_len.
> > >>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> And please remove above comment, since ether overhead is already
> > >>>>>>> considered in ice_mtu_set.
> > >>>>> Ether overhead is already considered in ice_mtu_set, but it also
> > >>>>> should be considered as the adjustment condition that if ice_mtu_set
> > >>> need be invoked.
> > >>>>> So, it perhaps should remain this comment before this if() condition.
> > >>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> +		ret = ice_mtu_set(dev, dev->data->mtu);
> > >>>>>>>>> +		if (ret != 0)
> > >>>>>>>>> +			return ret;
> > >>>>>>>>> +	}
> > >>>>>>>>> +
> > >>>>>>>>>    ret = ice_init_rss(pf);
> > >>>>>>>>>    if (ret) {
> > >>>>>>>>>    PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> > >>>>>>>>> --
> > >>>>>>>>> 2.17.1
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >
  
Ferruh Yigit Oct. 19, 2020, 6:01 p.m. UTC | #10
On 10/19/2020 3:28 PM, Ananyev, Konstantin wrote:
> 
>>
>>>>>>>>>>>>
>>>>>>>>>>>> testpmd will initialize default max packet length to 1518 which
>>>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
>>>>>>>>>>>> max mtu length packet with vlan tag, the max packet length will
>>>>>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
>>>>>>>> side.
>>>>>>>>>>>>
>>>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
>>>>>>>>>>>> packet size, so, configures the correct max packet size in
>>>>>>>>>>>> dev_config
>>>>>>>>> ops.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>     drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>>>>>     1 file changed, 11 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
>>>>>> *dev)
>>>>>>>>>>>> struct ice_adapter *ad =
>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>>>>>     struct ice_pf *pf =
>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>>>>>     int ret;
>>>>>>>>>>>>
>>>>>>>>>>>>     /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
>>>>>>>>>>>> -3157,6
>>>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>>>     if (dev->data->dev_conf.rxmode.mq_mode &
>>>>>> ETH_MQ_RX_RSS_FLAG)
>>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
>>>>>>>>> DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>>>>>
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>>>>>> + */
>>>>>>>>>>>
>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Why we need this check?
>>>>>>>>>>> Can we just call ice_mtu_set directly
>>>>>>>>>>
>>>>>>>>>> I think that without that check we can silently overwrite provided
>>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
>>>>>>>>>
>>>>>>>>> OK, I see
>>>>>>>>>
>>>>>>>>> But still have one question
>>>>>>>>> dev->data->mtu is initialized to 1518 as default, but if
>>>>>>>>> the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
>>>>>>>>> does that mean we will still set mtu to 1518, is this expected?
>>>>>>>>>
>>>>>>>>
>>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
>>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
>>>>>>>
>>>>>>> Ok, this describe the problem more general and better to replace exist
>>>>>> code comment and commit log for easy understanding.
>>>>>>> Please send a new version for reword
>>>>>>>
>>>>>>
>>>>>> I didn't really get this set.
>>>>>>
>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
>>>>>> this size is dropped.
>>>>>
>>>>> Sure, it is normal case for dropping oversize data.
>>>>>
>>>>>> Isn't this what should be, why we are trying to overwrite user configuration
>>>>>> in PMD to prevent this?
>>>>>>
>>>>>
>>>>> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at the same time.
>>>>> This fix makes a decision when that conflict occurs.
>>>>> The MTU value will come from a user operation (e.g.: port config mtu 0 1500) directly,
>>>>> so the max_rx_pkt_len will resize itself to fit the expected MTU value if it is smaller than MTU + Ether overhead.
>>>>>
>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>>>>>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
>>>>>> and mean it? PMD will not honor the user config.
>>>>>
>>>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
>>>>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
>>>>>
>>>>>>
>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>>>>>
>>>>> The default 'max_rx_pkt_len' has been initialized to a generic value (1518) and the default 'mtu' is '1500' in testpmd,
>>>>> but that isn't suitable for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice, i40e) if the 'mtu' value is preferable.
>>>>>
>>>>>> And I guess even better what we need is to tell to the application what the
>>>>>> frame overhead PMD accepts.
>>>>>> So the application can set proper 'max_rx_pkt_len' value per port for a
>>>>>> given/requested MTU value.
>>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
>>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
>>>>>> he has a solution now?
>>>>
>>>>   From my perspective the main problem here:
>>>> We have 2 different variables for nearly the same thing:
>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len.
>>>> and 2 different API to update them: dev_mtu_set() and dev_configure().
>>>
>>> According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled'
>>> Although not sure that is practically what is done for all drivers.
>>
>> I think most of Intel PMDs use it unconditionally.
>>
>>>
>>>> And inside majority of Intel PMDs we don't keep these 2 variables in sync:
>>>> - mtu_set() will update both variables.
>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
>>>>
>>>> This patch fixes this inconsistency, which I think is a good thing.
>>>> Though yes, it introduces change in behaviour.
>>>>
>>>> Let say the code:
>>>> rte_eth_dev_set_mtu(port, 1500);
>>>> dev_conf.max_rx_pkt_len = 1000;
>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf);
>>>>
>>>
>>> 'rte_eth_dev_configure()' is one of the first APIs called, it is called before
>>> 'rte_eth_dev_set_mtu().
>>
>> Usually yes.
>> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();
>>
>>>
>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by
>>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.
>>
>> See above.
>> PMD doesn't know where this MTU value came from (default ethdev value or user specified value)
>> and probably it shouldn't care.
>>
>>>
>>> And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and 'max_rx_pkt_len'
>>> are updated (mostly).
>>
>> Yes, in mtu_set() we update both.
>> But we don't update MTU in dev_configure(), only max_rx_pkt_len.
>> That what this patch tries to fix (as I understand it).
> 
> To be more precise - it doesn't change MTU value in dev_configure(),
> but instead doesn't allow max_rx_pkt_len to become smaller
> than MTU + OVERHEAD.
> Probably changing MTU value instead is a better choice.
> 

+1 to change mtu for this case.
And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()' 
call, since PMD is using ('max_rx_pkt_len' - OVERHEAD) to set MTU.

But this won't solve the problem Steve is trying to solve.
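
For reference, the arithmetic behind that problem as a tiny standalone
program (constants as in rte_ether.h; VLAN_TAG_LEN is a local define
just for the example):

#include <stdio.h>

#define RTE_ETHER_MTU     1500
#define RTE_ETHER_HDR_LEN   14
#define RTE_ETHER_CRC_LEN    4
#define VLAN_TAG_LEN         4

int main(void)
{
	int plain = RTE_ETHER_MTU + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;

	/* prints: plain=1518 vlan=1522 qinq=1526; testpmd's default
	 * max_rx_pkt_len is 1518, so tagged max-MTU frames are dropped */
	printf("plain=%d vlan=%d qinq=%d\n",
	       plain, plain + VLAN_TAG_LEN, plain + 2 * VLAN_TAG_LEN);
	return 0;
}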

>>>
>>>
>>>> Before the patch will result:
>>>> mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
>>>>
>>>> After the patch:
>>>> mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour.
>>>>
>>>> If you think we need to preserve current behaviour,
>>>> then I suppose the easiest thing would be to change dev_config() code
>>>> to update mtu value based on max_rx_pkt_len.
>>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
>>>> So the code snippet above will result:
>>>> mtu=982,max_rx_pkt_len=1000;
>>>>
>>>
>>> The 'max_rx_pkt_len' has been an annoyance for a long time; what do you
>>> think about just dropping it?
>>>
>>> By default device will be up with default MTU (1500), later
>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all.
>>>
>>> Will this work?
>>
>> I think it might, but that's a big change, probably too risky at that stage...
>>

Definitely, I was thinking for 21.11. Let me send a deprecation notice and see
what happens.

>>
>>>
>>>
>>> And for the short term, for the above Intel PMDs, there must be a place where this
>>> 'max_rx_pkt_len' value is taken into account (mostly the 'start()' dev_ops); that
>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME is set,
>>> otherwise use the 'MTU' value.
>>
>> Even if we'll use max_rx_pkt_len only when JUMBO_FRAME is set,
>> I think we still need to keep the max_rx_pkt_len and MTU values in sync.
>>
>>>
>>> Without 'start()' updated the current logic won't work after stop & start anyway.
>>>
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> And why this same thing can't happen to other PMDs? If this is a problem for
>>>>>> all PMDs, we should solve in other level, not for only some PMDs.
>>>>>>
>>>>> No, all PMDs have the same issue; another proposal:
>>>>>    -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
>>>>>    - provide the uniform API for fetching the NIC's supported Ether Overhead size;
>>>>> Is it feasible?
>>>>>
>>>>>>>
>>>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
>>>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
>>>>>>>> satisfy mtu requirement.
>>>>>>>>
>>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>>>>>> here?
>>>>>>>> ice_mtu_set(dev, mtu) will append ether overhead to
>>>>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
>>>>>>>> parameter, or not the max_rx_pkt_len.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> And please remove above comment, since ether overhead is already
>>>>>>>>>> considered in ice_mtu_set.
>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>>>>>> should be considered as the adjustment condition that if ice_mtu_set
>>>>>> need be invoked.
>>>>>>>> So, it perhaps should remain this comment before this if() condition.
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> +		ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>>>>>> +		if (ret != 0)
>>>>>>>>>>>> +			return ret;
>>>>>>>>>>>> +	}
>>>>>>>>>>>> +
>>>>>>>>>>>>     ret = ice_init_rss(pf);
>>>>>>>>>>>>     if (ret) {
>>>>>>>>>>>>     PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>
  
Ferruh Yigit Oct. 19, 2020, 6:05 p.m. UTC | #11
On 10/19/2020 4:07 AM, Yang, SteveX wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Wednesday, October 14, 2020 11:38 PM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>> <stevex.yang@intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
>> Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
>> Xing, Beilei <beilei.xing@intel.com>; Stokes, Ian <ian.stokes@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets
>> with vlan tag cannot be received by default
>>
>> On 9/30/2020 3:32 AM, Zhang, Qi Z wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>> Sent: Wednesday, September 30, 2020 9:32 AM
>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ananyev, Konstantin
>>>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>> vlan tag cannot be received by default
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Zhang, Qi Z <qi.z.zhang@intel.com>
>>>>> Sent: Wednesday, September 30, 2020 8:35 AM
>>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
>> SteveX
>>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>> vlan tag cannot be received by default
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>>>> Sent: Wednesday, September 30, 2020 7:02 AM
>>>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>>>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>> vlan tag cannot be received by default
>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>>>>>> Sent: Monday, September 28, 2020 2:56 PM
>>>>>>>> To: dev@dpdk.org
>>>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
>>>>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
>> Zhang,
>>>>>>>> Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
>>>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
>>>>>>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
>> SteveX
>>>>>>>> <stevex.yang@intel.com>
>>>>>>>> Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>>>> vlan tag cannot be received by default
>>>>>>>>
>>>>>>>> testpmd will initialize default max packet length to 1518 which
>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
>>>>>>>> max mtu length packet with vlan tag, the max packet length will
>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
>>>> side.
>>>>>>>>
>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
>>>>>>>> packet size, so, configures the correct max packet size in
>>>>>>>> dev_config
>>>>> ops.
>>>>>>>>
>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>
>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>>>> ---
>>>>>>>>    drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>    1 file changed, 11 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
>> *dev)
>>>>>>>> struct ice_adapter *ad =
>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>    struct ice_pf *pf =
>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>    int ret;
>>>>>>>>
>>>>>>>>    /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
>>>>>>>> -3157,6
>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>    if (dev->data->dev_conf.rxmode.mq_mode &
>> ETH_MQ_RX_RSS_FLAG)
>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
>>>>> DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>> + */
>>>>>>>
>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>>>
>>>>>>>
>>>>>>> Why we need this check?
>>>>>>> Can we just call ice_mtu_set directly
>>>>>>
>>>>>> I think that without that check we can silently overwrite provided
>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
>>>>>
>>>>> OK, I see
>>>>>
>>>>> But still have one question
>>>>> dev->data->mtu is initialized to 1518 as default, but if
>>>>> the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
>>>>> does that mean we will still set mtu to 1518, is this expected?
>>>>>
>>>>
>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
>>>
>>> Ok, this describe the problem more general and better to replace exist
>> code comment and commit log for easy understanding.
>>> Please send a new version for reword
>>>
>>
>> I didn't really get this set.
>>
>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
>> this size is dropped.
> 
> Sure, it is normal case for dropping oversize data.
> 
>> Isn't this what should be, why we are trying to overwrite user configuration
>> in PMD to prevent this?
>>
> 
> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at the same time.
> This fix makes a decision when that conflict occurs.
> The MTU value will come from a user operation (e.g.: port config mtu 0 1500) directly,
> so the max_rx_pkt_len will resize itself to fit the expected MTU value if it is smaller than MTU + Ether overhead.
> 
>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
>> and mean it? PMD will not honor the user config.
> 
> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
> 
>>
>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>
> The default 'max_rx_pkt_len' has been initialized to a generic value (1518) and the default 'mtu' is '1500' in testpmd,
> but that isn't suitable for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice, i40e) if the 'mtu' value is preferable.
> 
>> And I guess even better what we need is to tell to the application what the
>> frame overhead PMD accepts.
>> So the application can set proper 'max_rx_pkt_len' value per port for a
>> given/requested MTU value.
>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
>> he has a solution now?
>>
> 
>>
>> And why this same thing can't happen to other PMDs? If this is a problem for
>> all PMDs, we should solve in other level, not for only some PMDs.
>>
> No, all PMDs have the same issue; another proposal:
>   -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
>   - provide the uniform API for fetching the NIC's supported Ether Overhead size;
> Is it feasible?
> 

The overhead can be calculated as "dev_info.max_rx_pktlen - dev_info.max_mtu".

What do you think about updating testpmd's 'init_config()' to set
'port->dev_conf.rxmode.max_rx_pkt_len' to "RTE_ETHER_MTU + overhead"?
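
A minimal sketch of that idea (untested; the helper name is made up here,
and 'port'/'pid' are as used inside testpmd's init_config()):

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

/* per-port L2 overhead the device needs on top of its MTU */
static uint32_t
eth_dev_get_overhead_len(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;

	if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
		return RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;

	if (dev_info.max_mtu != UINT16_MAX &&
	    dev_info.max_rx_pktlen > dev_info.max_mtu)
		return dev_info.max_rx_pktlen - dev_info.max_mtu;

	return RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
}

/* then, in init_config(), for each port:
 * port->dev_conf.rxmode.max_rx_pkt_len =
 *	RTE_ETHER_MTU + eth_dev_get_overhead_len(pid); */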


>>>
>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
>>>> satisfy mtu requirement.
>>>>
>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>> here?
>>>> ice_mtu_set(dev, mtu) will append ether overhead to
>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
>>>> parameter, or not the max_rx_pkt_len.
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> And please remove above comment, since ether overhead is already
>>>>>> considered in ice_mtu_set.
>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>> should be considered as the adjustment condition that if ice_mtu_set
>> need be invoked.
>>>> So, it perhaps should remain this comment before this if() condition.
>>>>
>>>>>>>
>>>>>>>
>>>>>>>> +		ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>> +		if (ret != 0)
>>>>>>>> +			return ret;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>>    ret = ice_init_rss(pf);
>>>>>>>>    if (ret) {
>>>>>>>>    PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
  
Ferruh Yigit Oct. 20, 2020, 8:13 a.m. UTC | #12
On 10/20/2020 3:57 AM, Yang, SteveX wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Tuesday, October 20, 2020 2:05 AM
>> To: Yang, SteveX <stevex.yang@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang,
>> Qiming <qiming.yang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
>> Xing, Beilei <beilei.xing@intel.com>; Stokes, Ian <ian.stokes@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets
>> with vlan tag cannot be received by default
>>
>> On 10/19/2020 4:07 AM, Yang, SteveX wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Sent: Wednesday, October 14, 2020 11:38 PM
>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>>>> <stevex.yang@intel.com>; Ananyev, Konstantin
>>>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
>>>> Stokes, Ian <ian.stokes@intel.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size
>>>> packets with vlan tag cannot be received by default
>>>>
>>>> On 9/30/2020 3:32 AM, Zhang, Qi Z wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>>>> Sent: Wednesday, September 30, 2020 9:32 AM
>>>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ananyev, Konstantin
>>>>>> <konstantin.ananyev@intel.com>; dev@dpdk.org
>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia <jia.guo@intel.com>;
>>>>>> Yang, Qiming <qiming.yang@intel.com>; Wu, Jingjing
>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>
>>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>> vlan tag cannot be received by default
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Zhang, Qi Z <qi.z.zhang@intel.com>
>>>>>>> Sent: Wednesday, September 30, 2020 8:35 AM
>>>>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
>>>> SteveX
>>>>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
>>>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>; Wu,
>>>>>>> Jingjing <jingjing.wu@intel.com>; Xing, Beilei
>>>>>>> <beilei.xing@intel.com>
>>>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>>> vlan tag cannot be received by default
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>>>>>> Sent: Wednesday, September 30, 2020 7:02 AM
>>>>>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, SteveX
>>>>>>>> <stevex.yang@intel.com>; dev@dpdk.org
>>>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
>>>>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>; Wu,
>>>>>>>> Jingjing <jingjing.wu@intel.com>; Xing, Beilei
>>>>>>>> <beilei.xing@intel.com>
>>>>>>>> Subject: RE: [PATCH v4 3/5] net/ice: fix max mtu size packets
>>>>>>>> with vlan tag cannot be received by default
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Yang, SteveX <stevex.yang@intel.com>
>>>>>>>>>> Sent: Monday, September 28, 2020 2:56 PM
>>>>>>>>>> To: dev@dpdk.org
>>>>>>>>>> Cc: Zhao1, Wei <wei.zhao1@intel.com>; Guo, Jia
>>>>>>>>>> <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
>>>> Zhang,
>>>>>>>>>> Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
>>>>>>>>>> <jingjing.wu@intel.com>; Xing, Beilei <beilei.xing@intel.com>;
>>>>>>>>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yang,
>>>> SteveX
>>>>>>>>>> <stevex.yang@intel.com>
>>>>>>>>>> Subject: [PATCH v4 3/5] net/ice: fix max mtu size packets with
>>>>>>>>>> vlan tag cannot be received by default
>>>>>>>>>>
>>>>>>>>>> testpmd will initialize default max packet length to 1518 which
>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send
>>>>>>>>>> the max mtu length packet with vlan tag, the max packet length
>>>>>>>>>> will exceed 1518 that will cause packets dropped directly from
>>>>>>>>>> NIC hw
>>>>>> side.
>>>>>>>>>>
>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
>>>>>>>>>> packet size, so, configures the correct max packet size in
>>>>>>>>>> dev_config
>>>>>>> ops.
>>>>>>>>>>
>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>>>     1 file changed, 11 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>>>>>>> struct ice_adapter *ad =
>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>>>     struct ice_pf *pf =
>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>>>     int ret;
>>>>>>>>>>
>>>>>>>>>>     /* Initialize to TRUE. If any of Rx queues doesn't meet the
>>>>>>>>>> @@
>>>>>>>>>> -3157,6
>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>     if (dev->data->dev_conf.rxmode.mq_mode &
>>>> ETH_MQ_RX_RSS_FLAG)
>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
>>>>>>> DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>>>
>>>>>>>>>> +/**
>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>>>> + */
>>>>>>>>>
>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len)
>> {
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Why we need this check?
>>>>>>>>> Can we just call ice_mtu_set directly
>>>>>>>>
>>>>>>>> I think that without that check we can silently overwrite
>>>>>>>> provided by user dev_conf.rxmode.max_rx_pkt_len value.
>>>>>>>
>>>>>>> OK, I see
>>>>>>>
>>>>>>> But still have one question
>>>>>>> dev->data->mtu is initialized to 1518 as default, but if
>>>>>>> the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
>>>>>>> does that mean we will still set mtu to 1518, is this expected?
>>>>>>>
>>>>>>
>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should
>>>>>> raise the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.:
>> 1500).
>>>>>
>>>>> Ok, this describe the problem more general and better to replace
>>>>> exist
>>>> code comment and commit log for easy understanding.
>>>>> Please send a new version for reword
>>>>>
>>>>
>>>> I didn't really get this set.
>>>>
>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame
>>>> bigger than this size is dropped.
>>>
>>> Sure, it is normal case for dropping oversize data.
>>>
>>>> Isn't this what should be, why we are trying to overwrite user
>>>> configuration in PMD to prevent this?
>>>>
>>>
>>> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at
>>> the same time. This fix makes a decision when that conflict occurs.
>>> The MTU value will come from a user operation (e.g.: port config mtu 0 1500)
>>> directly, so the max_rx_pkt_len will resize itself to fit the expected MTU
>>> value if it is smaller than MTU + Ether overhead.
>>>
>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>>>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to
>> '1000'
>>>> and mean it? PMD will not honor the user config.
>>>
>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's
>> the behavior expected?
>>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be
>> invalid.
>>>
>>>>
>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>>>
>>> The default 'max_rx_pkt_len' has been initialized to a generic value
>>> (1518) and the default 'mtu' is '1500' in testpmd, but that isn't suitable
>>> for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice, i40e)
>>> if the 'mtu' value is preferable.
>>>
>>>> And I guess even better what we need is to tell to the application
>>>> what the frame overhead PMD accepts.
>>>> So the application can set proper 'max_rx_pkt_len' value per port for
>>>> a given/requested MTU value.
>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that,
>>>> perhaps he has a solution now?
>>>>
>>>
>>>>
>>>> And why this same thing can't happen to other PMDs? If this is a
>>>> problem for all PMDs, we should solve in other level, not for only some
>> PMDs.
>>>>
>>> No, all PMDs have the same issue; another proposal:
>>>    -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in
>> rte_eth_dev_configure();
>>>    - provide the uniform API for fetching the NIC's supported Ether
>>> Overhead size; Is it feasible?
>>>
>>
>> overhead can be calculated as "dev_info.max_rx_pktlen -
>> dev_info.max_mtu"
>>
>> What do you think about updating testpmd's 'init_config()' to set
>> 'port->dev_conf.rxmode.max_rx_pkt_len' to "RTE_ETHER_MTU + overhead"?
> 
> If we update the testpmd-related code, this fix will only impact the testpmd
> application. Do we need to make the change more general for other
> applications or DPDK clients?
 >

This is something that needs to be done at the application level. The testpmd
update can serve as a sample usage for them.


> How about updating 'max_rx_pkt_len' within 'rte_eth_dev_configure()' of rte_ethdev?
> 

What do you propose to do in the ethdev layer?

>>
>>>>>
>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
>>>>>> satisfy mtu requirement.
>>>>>>
>>>>>>> Should we just call ice_mtu_set(dev,
>> dev_conf.rxmode.max_rx_pkt_len)
>>>>>>> here?
>>>>>> ice_mtu_set(dev, mtu) will append ether overhead to
>>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the
>> 2nd
>>>>>> parameter, or not the max_rx_pkt_len.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> And please remove above comment, since ether overhead is
>> already
>>>>>>>> considered in ice_mtu_set.
>>>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>>>> should be considered as the adjustment condition that if ice_mtu_set
>>>> need be invoked.
>>>>>> So, it perhaps should remain this comment before this if() condition.
>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> +		ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>>>> +		if (ret != 0)
>>>>>>>>>> +			return ret;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>>     ret = ice_init_rss(pf);
>>>>>>>>>>     if (ret) {
>>>>>>>>>>     PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>>> --
>>>>>>>>>> 2.17.1
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>
  
Ananyev, Konstantin Oct. 20, 2020, 9:07 a.m. UTC | #13
> >>>>>>>>>>>>
> >>>>>>>>>>>> testpmd will initialize default max packet length to 1518 which
> >>>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
> >>>>>>>>>>>> max mtu length packet with vlan tag, the max packet length will
> >>>>>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
> >>>>>>>> side.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
> >>>>>>>>>>>> packet size, so, configures the correct max packet size in
> >>>>>>>>>>>> dev_config
> >>>>>>>>> ops.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>     drivers/net/ice/ice_ethdev.c | 11 +++++++++++
> >>>>>>>>>>>>     1 file changed, 11 insertions(+)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
> >>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
> >>>>>>>>>>>> cfd357b05..6b7098444 100644
> >>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
> >>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
> >>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
> >>>>>> *dev)
> >>>>>>>>>>>> struct ice_adapter *ad =
> >>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> >>>>>>>>>>>>     struct ice_pf *pf =
> >>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> >>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
> >>>>>>>>>>>>     int ret;
> >>>>>>>>>>>>
> >>>>>>>>>>>>     /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
> >>>>>>>>>>>> -3157,6
> >>>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
> >>>>>>>>>>>>     if (dev->data->dev_conf.rxmode.mq_mode &
> >>>>>> ETH_MQ_RX_RSS_FLAG)
> >>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
> >>>>>>>>> DEV_RX_OFFLOAD_RSS_HASH;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +/**
> >>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
> >>>>>>>>>>>> + * larger than total size of MTU and Ether overhead.
> >>>>>>>>>>>> + */
> >>>>>>>>>>>
> >>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Why we need this check?
> >>>>>>>>>>> Can we just call ice_mtu_set directly
> >>>>>>>>>>
> >>>>>>>>>> I think that without that check we can silently overwrite provided
> >>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
> >>>>>>>>>
> >>>>>>>>> OK, I see
> >>>>>>>>>
> >>>>>>>>> But still have one question
> >>>>>>>>> dev->data->mtu is initialized to 1518 as default, but if
> >>>>>>>>> the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
> >>>>>>>>> does that mean we will still set mtu to 1518, is this expected?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
> >>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
> >>>>>>>
> >>>>>>> Ok, this describe the problem more general and better to replace exist
> >>>>>> code comment and commit log for easy understanding.
> >>>>>>> Please send a new version for reword
> >>>>>>>
> >>>>>>
> >>>>>> I didn't really get this set.
> >>>>>>
> >>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
> >>>>>> this size is dropped.
> >>>>>
> >>>>> Sure, it is normal case for dropping oversize data.
> >>>>>
> >>>>>> Isn't this what should be, why we are trying to overwrite user configuration
> >>>>>> in PMD to prevent this?
> >>>>>>
> >>>>>
> >>>>> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at the same time.
> >>>>> This fix makes a decision when that conflict occurs.
> >>>>> The MTU value will come from a user operation (e.g.: port config mtu 0 1500) directly,
> >>>>> so the max_rx_pkt_len will resize itself to fit the expected MTU value if it is smaller than MTU + Ether overhead.
> >>>>>
> >>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
> >>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
> >>>>>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
> >>>>>> and mean it? PMD will not honor the user config.
> >>>>>
> >>>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
> >>>>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
> >>>>>
> >>>>>>
> >>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
> >>>>>>
> >>>>> The default 'max_rx_pkt_len' has been initialized to a generic value (1518) and the default 'mtu' is '1500' in testpmd,
> >>>>> but that isn't suitable for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice, i40e) if the 'mtu' value is preferable.
> >>>>>
> >>>>>> And I guess even better what we need is to tell to the application what the
> >>>>>> frame overhead PMD accepts.
> >>>>>> So the application can set proper 'max_rx_pkt_len' value per port for a
> >>>>>> given/requested MTU value.
> >>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
> >>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
> >>>>>> he has a solution now?
> >>>>
> >>>>   From my perspective the main problem here:
> >>>> We have 2 different variables for nearly the same thing:
> >>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len.
> >>>> and 2 different API to update them: dev_mtu_set() and dev_configure().
> >>>
> >>> According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled'
> >>> Although not sure that is practically what is done for all drivers.
> >>
> >> I think most of Intel PMDs use it unconditionally.
> >>
> >>>
> >>>> And inside majority of Intel PMDs we don't keep these 2 variables in sync:
> >>>> - mtu_set() will update both variables.
> >>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
> >>>>
> >>>> This patch fixes this inconsistency, which I think is a good thing.
> >>>> Though yes, it introduces change in behaviour.
> >>>>
> >>>> Let say the code:
> >>>> rte_eth_dev_set_mtu(port, 1500);
> >>>> dev_conf.max_rx_pkt_len = 1000;
> >>>> rte_eth_dev_configure(port, 1, 1, &dev_conf);
> >>>>
> >>>
> >>> 'rte_eth_dev_configure()' is one of the first APIs called, it is called before
> >>> 'rte_eth_dev_set_mtu().
> >>
> >> Usually yes.
> >> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();
> >>
> >>>
> >>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by
> >>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.
> >>
> >> See above.
> >> PMD doesn't know where this MTU value came from (default ethdev value or user specified value)
> >> and probably it shouldn't care.
> >>
> >>>
> >>> And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and 'max_rx_pkt_len'
> >>> are updated (mostly).
> >>
> >> Yes, in mtu_set() we update both.
> >> But we don't update MTU in dev_configure(), only max_rx_pkt_len.
> >> That what this patch tries to fix (as I understand it).
> >
> > To be more precise - it doesn't change MTU value in dev_configure(),
> > but instead doesn't allow max_rx_pkt_len to become smaller
> > than MTU + OVERHEAD.
> > Probably changing MTU value instead is a better choice.
> >
> 
> +1 to change mtu for this case.
> And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()'
> call, since PMD is using ('max_rx_pkt_len' - OVERHEAD) to set MTU.

Hmm, I don't see that happening within Intel PMDs.
As far as I can read the code: if the user never calls mtu_set(), the MTU value is left intact.

> But this won't solve the problem Steve is trying to solve.

You mean we still need to update the test-pmd code to calculate max_rx_pkt_len
properly for the default mtu value?

> >>>
> >>>
> >>>> Before the patch will result:
> >>>> mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
> >>>>
> >>>> After the patch:
> >>>> mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour.
> >>>>
> >>>> If you think we need to preserve current behaviour,
> >>>> then I suppose the easiest thing would be to change dev_config() code
> >>>> to update mtu value based on max_rx_pkt_len.
> >>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
> >>>> So the code snippet above will result:
> >>>> mtu=982,max_rx_pkt_len=1000;
> >>>>
> >>>
> >>> The 'max_rx_pkt_len' has been an annoyance for a long time; what do you
> >>> think about just dropping it?
> >>>
> >>> By default device will be up with default MTU (1500), later
> >>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all.
> >>>
> >>> Will this work?
> >>
> >> I think it might, but that's a big change, probably too risky at that stage...
> >>
> 
> Definitely, I was thinking for 21.11. Let me send a deprecation notice and see
> what happens.
> 
> >>
> >>>
> >>>
> >>> And for the short term, for the above Intel PMDs, there must be a place where this
> >>> 'max_rx_pkt_len' value is taken into account (mostly the 'start()' dev_ops); that
> >>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME is set,
> >>> otherwise use the 'MTU' value.
> >>
> >> Even if we'll use max_rx_pkt_len only when JUMBO_FRAME is set,
> >> I think we still need to keep the max_rx_pkt_len and MTU values in sync.
> >>
> >>>
> >>> Without 'start()' updated the current logic won't work after stop & start anyway.
> >>>
> >>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> And why this same thing can't happen to other PMDs? If this is a problem for
> >>>>>> all PMDs, we should solve in other level, not for only some PMDs.
> >>>>>>
> >>>>> No, all PMDs have the same issue; another proposal:
> >>>>>    -  rte_ethdev provides the unique resize 'max_rx_pkt_len' in rte_eth_dev_configure();
> >>>>>    - provide the uniform API for fetching the NIC's supported Ether Overhead size;
> >>>>> Is it feasible?
> >>>>>
> >>>>>>>
> >>>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
> >>>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
> >>>>>>>> satisfy mtu requirement.
> >>>>>>>>
> >>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
> >>>>>>>>> here?
> >>>>>>>> ice_mtu_set(dev, mtu) will append ether overhead to
> >>>>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd
> >>>>>>>> parameter, or not the max_rx_pkt_len.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> And please remove above comment, since ether overhead is already
> >>>>>>>>>> considered in ice_mtu_set.
> >>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also
> >>>>>>>> should be considered as the adjustment condition that if ice_mtu_set
> >>>>>> need be invoked.
> >>>>>>>> So, it perhaps should remain this comment before this if() condition.
> >>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> +		ret = ice_mtu_set(dev, dev->data->mtu);
> >>>>>>>>>>>> +		if (ret != 0)
> >>>>>>>>>>>> +			return ret;
> >>>>>>>>>>>> +	}
> >>>>>>>>>>>> +
> >>>>>>>>>>>>     ret = ice_init_rss(pf);
> >>>>>>>>>>>>     if (ret) {
> >>>>>>>>>>>>     PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
> >>>>>>>>>>>> --
> >>>>>>>>>>>> 2.17.1
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >
  
Ferruh Yigit Oct. 20, 2020, 12:29 p.m. UTC | #14
On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote:
> 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> testpmd will initialize default max packet length to 1518 which
>>>>>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send the
>>>>>>>>>>>>>> max mtu length packet with vlan tag, the max packet length will
>>>>>>>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC hw
>>>>>>>>>> side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max
>>>>>>>>>>>>>> packet size, so, configures the correct max packet size in
>>>>>>>>>>>>>> dev_config
>>>>>>>>>>> ops.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang@intel.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>      drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>>>>>>>      1 file changed, 11 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev
>>>>>>>> *dev)
>>>>>>>>>>>>>> struct ice_adapter *ad =
>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>>>>>>>      struct ice_pf *pf =
>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>>>>>>>      int ret;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      /* Initialize to TRUE. If any of Rx queues doesn't meet the @@
>>>>>>>>>>>>>> -3157,6
>>>>>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>>>>>      if (dev->data->dev_conf.rxmode.mq_mode &
>>>>>>>> ETH_MQ_RX_RSS_FLAG)
>>>>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |=
>>>>>>>>>>> DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why we need this check?
>>>>>>>>>>>>> Can we just call ice_mtu_set directly
>>>>>>>>>>>>
>>>>>>>>>>>> I think that without that check we can silently overwrite provided
>>>>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
>>>>>>>>>>>
>>>>>>>>>>> OK, I see
>>>>>>>>>>>
>>>>>>>>>>> But still have one question
>>>>>>>>>>> dev->data->mtu is initialized to 1518 as default, but if
>>>>>>>>>>> the application sets dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
>>>>>>>>>>> does that mean we will still set mtu to 1518, is this expected?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
>>>>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
>>>>>>>>>
>>>>>>>>> Ok, this describe the problem more general and better to replace exist
>>>>>>>> code comment and commit log for easy understanding.
>>>>>>>>> Please send a new version for reword
>>>>>>>>>
>>>>>>>>
>>>>>>>> I didn't really get this set.
>>>>>>>>
>>>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
>>>>>>>> this size is dropped.
>>>>>>>
>>>>>>> Sure, it is normal case for dropping oversize data.
>>>>>>>
>>>>>>>> Isn't this what should be, why we are trying to overwrite user configuration
>>>>>>>> in PMD to prevent this?
>>>>>>>>
>>>>>>>
>>>>>>> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at the same time.
>>>>>>> This fix makes a decision when that conflict occurs.
>>>>>>> The MTU value will come from a user operation (e.g.: port config mtu 0 1500) directly,
>>>>>>> so the max_rx_pkt_len will resize itself to fit the expected MTU value if it is smaller than MTU + Ether overhead.
>>>>>>>
>>>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>>>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>>>>>>>> I think Qi's concern above is valid, what is user set 'max_rx_pkt_len' to '1000'
>>>>>>>> and mean it? PMD will not honor the user config.
>>>>>>>
>>>>>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', what's the behavior expected?
>>>>>>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' will be invalid.
>>>>>>>
>>>>>>>>
>>>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>>>>>>>
>>>>>>> The default 'max_rx_pkt_len' is initialized to a generic value (1518) and the default 'mtu' is '1500' in testpmd,
>>>>>>> but that isn't suitable for NIC drivers whose Ether overhead is larger than 18 bytes (e.g.: ice, i40e) if the 'mtu' value is preferable.
>>>>>>>
>>>>>>>> And I guess, even better, what we need is to tell the application what
>>>>>>>> frame overhead the PMD accepts,
>>>>>>>> so the application can set a proper 'max_rx_pkt_len' value per port for a
>>>>>>>> given/requested MTU value.
>>>>>>>> @Ian, cc'ed, was complaining about almost the same thing years ago; these PMD
>>>>>>>> overhead macros and 'max_mtu'/'min_mtu' were added because of that. Perhaps
>>>>>>>> he has a solution now?
>>>>>>
>>>>>> From my perspective the main problem here is that we have
>>>>>> 2 different variables for nearly the same thing:
>>>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len,
>>>>>> and 2 different APIs to update them: dev_mtu_set() and dev_configure().
>>>>>
>>>>> According to the API, 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled',
>>>>> although I am not sure that is practically what is done in all drivers.
>>>>
>>>> I think most of the Intel PMDs use it unconditionally.
>>>>
>>>>>
>>>>>> And inside the majority of Intel PMDs we don't keep these 2 variables in sync:
>>>>>> - mtu_set() will update both variables.
>>>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
>>>>>>
>>>>>> This patch fixes that inconsistency, which I think is a good thing.
>>>>>> Though yes, it introduces a change in behaviour.
>>>>>>
>>>>>> Let's say the code is:
>>>>>> rte_eth_dev_set_mtu(port, 1500);
>>>>>> dev_conf.max_rx_pkt_len = 1000;
>>>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf);
>>>>>>
>>>>>
>>>>> 'rte_eth_dev_configure()' is one of the first APIs called; it is called before
>>>>> 'rte_eth_dev_set_mtu()'.
>>>>
>>>> Usually yes.
>>>> But you can still do it sometime later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();
>>>>
>>>>>
>>>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by the
>>>>> ethdev layer, so it is not a user configuration, but 'max_rx_pkt_len' is.
>>>>
>>>> See above.
>>>> The PMD doesn't know where this MTU value came from (the default ethdev value or a user-specified value),
>>>> and probably it shouldn't care.
>>>>
>>>>>
>>>>> And later, when 'rte_eth_dev_set_mtu()' is called, both MTU and 'max_rx_pkt_len'
>>>>> are updated (mostly).
>>>>
>>>> Yes, in mtu_set() we update both.
>>>> But we don't update MTU in dev_configure(), only max_rx_pkt_len.
>>>> That is what this patch tries to fix (as I understand it).
>>>
>>> To be more precise - it doesn't change the MTU value in dev_configure(),
>>> but instead doesn't allow max_rx_pkt_len to become smaller
>>> than MTU + OVERHEAD.
>>> Probably changing the MTU value instead is a better choice.
>>>
>>
>> +1 to changing the mtu for this case.
>> And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()'
>> call, since the PMD uses ('max_rx_pkt_len' - OVERHEAD) to set the MTU.
> 
> Hmm, I don't see that happening within the Intel PMDs.
> As I read the code: if the user never calls mtu_set(), then the MTU value is left intact.
> 

I was checking ice,
in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device.
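
For reference, the pattern in question looks roughly like this (a paraphrased
sketch, not the literal ice code; the helper name is illustrative):

#include <rte_ethdev.h>
#include <rte_ether.h>

/* Paraphrased sketch of the start-path pattern under discussion: the HW Rx
 * frame limit is derived from rxmode.max_rx_pkt_len, while dev->data->mtu
 * is not consulted at all. */
static uint32_t
start_time_frame_size(const struct rte_eth_dev *dev)
{
	uint32_t max_frame = dev->data->dev_conf.rxmode.max_rx_pkt_len;

	if (max_frame == 0)
		max_frame = RTE_ETHER_MAX_LEN; /* 1518-byte fallback */

	return max_frame; /* this, not mtu, is what gets programmed */
}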

>> But this won't solve the problem Steve is trying to solve.
> 
> You mean we still need to update the test-pmd code to calculate max_rx_pkt_len
> properly for the default mtu value?
> 

Yes.
Because the target of this set is to be able to receive packets with payload size
'RTE_ETHER_MTU'; if the MTU is updated according to the provided 'max_rx_pkt_len',
the device still won't be able to receive those packets.
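
As a concrete illustration, the testpmd-side computation being argued for could
look like this (a minimal sketch only; ASSUMED_OVERHEAD is an assumption standing
in for the real per-driver value, which for ice/i40e is 26 bytes due to dual VLAN):

#include <rte_ethdev.h>
#include <rte_ether.h>

/* Assumed worst-case L2 overhead: Ether header + CRC + two 4-byte VLAN tags.
 * The real value is driver specific; ethdev exposes no field for it yet. */
#define ASSUMED_OVERHEAD (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN + 2 * 4)

/* Pick max_rx_pkt_len so that a payload of RTE_ETHER_MTU (1500) still fits
 * once the full overhead is added, instead of hard-coding 1518. */
static void
set_default_frame_size(struct rte_eth_conf *conf)
{
	conf->rxmode.max_rx_pkt_len = RTE_ETHER_MTU + ASSUMED_OVERHEAD;
}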

>>>>>
>>>>>
>>>>>> Before the patch the result will be:
>>>>>> mtu==1500, max_rx_pkt_len=1000;  // out of sync, looks wrong to me
>>>>>>
>>>>>> After the patch:
>>>>>> mtu=1500, max_rx_pkt_len=1518; // in sync, change in behaviour.
>>>>>>
>>>>>> If you think we need to preserve the current behaviour,
>>>>>> then I suppose the easiest thing would be to change the dev_configure() code
>>>>>> to update the mtu value based on max_rx_pkt_len,
>>>>>> i.e.: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
>>>>>> So the code snippet above will result in:
>>>>>> mtu=982, max_rx_pkt_len=1000;
>>>>>>
>>>>>
>>>>> The 'max_rx_pkt_len' has been an annoyance for a long time; what do you think about just
>>>>> dropping it?
>>>>>
>>>>> By default the device will be up with the default MTU (1500); later,
>>>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, with no frame size setting at all.
>>>>>
>>>>> Will this work?
>>>>
>>>> I think it might, but that's a big change, probably too risky at this stage...
>>>>
>>
>> Definitely, I was thinking of 21.11. Let me send a deprecation notice and see
>> what happens.
>>
>>>>
>>>>>
>>>>>
>>>>> And for the short term, for the above Intel PMDs, there must be a place where this
>>>>> 'max_rx_pkt_len' value is taken into account (mostly the 'start()' dev_ops); that
>>>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME is set,
>>>>> and otherwise use the 'MTU' value.
>>>>
>>>> Even if we use max_rx_pkt_len only when JUMBO_FRAME is set,
>>>> I think we still need to keep max_rx_pkt_len and MTU values in sync.
>>>>
>>>>>
>>>>> Without 'start()' being updated, the current logic won't work after stop & start anyway.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> And why can't this same thing happen with other PMDs? If this is a problem for
>>>>>>>> all PMDs, we should solve it at another level, not only for some PMDs.
>>>>>>>>
>>>>>>> No, all PMDs have the same issue. Another proposal:
>>>>>>>     - rte_ethdev provides the single place to resize 'max_rx_pkt_len', in rte_eth_dev_configure();
>>>>>>>     - provide a uniform API for fetching the NIC's supported Ether overhead size;
>>>>>>> Is it feasible?
>>>>>>>
>>>>>>>>>
>>>>>>>>>> Generally, the mtu value is adjustable by the user (e.g.: ip link
>>>>>>>>>> set ens801f0 mtu 1400); hence, we just adjust max_rx_pkt_len to
>>>>>>>>>> satisfy the mtu requirement.
>>>>>>>>>>
>>>>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>>>>>>>> here?
>>>>>>>>>> ice_mtu_set(dev, mtu) will append the ether overhead to
>>>>>>>>>> frame_size/max_rx_pkt_len, so we need to pass the mtu value as the 2nd
>>>>>>>>>> parameter, not the max_rx_pkt_len.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> And please remove the above comment, since the ether overhead is already
>>>>>>>>>>>>> considered in ice_mtu_set.
>>>>>>>>>> The Ether overhead is already considered in ice_mtu_set, but it should
>>>>>>>>>> also be considered in the condition that decides whether ice_mtu_set
>>>>>>>>>> needs to be invoked.
>>>>>>>>>> So, this comment should perhaps remain before the if() condition.
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +		ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>>>>>>>> +		if (ret != 0)
>>>>>>>>>>>>>> +			return ret;
>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>  	ret = ice_init_rss(pf);
>>>>>>>>>>>>>>  	if (ret) {
>>>>>>>>>>>>>>  		PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.17.1
  
Ananyev, Konstantin Oct. 21, 2020, 9:47 a.m. UTC | #15
> 
> On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote:
> [snip]
> 
> >> +1 to changing the mtu for this case.
> >> And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()'
> >> call, since the PMD uses ('max_rx_pkt_len' - OVERHEAD) to set the MTU.
> >
> > Hmm, I don't see that happening within the Intel PMDs.
> > As I read the code: if the user never calls mtu_set(), then the MTU value is left intact.
> >
> 
> I was checking ice,
> in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device.

Yes, I am not arguing with that.
What I am saying is that dev_configure() doesn't update the MTU based on max_rx_pkt_len,
while it probably should.
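
For illustration, the missing sync could look roughly like this in a PMD's
dev_configure (a sketch only; ICE_ETH_OVERHEAD is assumed to be the driver's
L2 overhead macro, and the helper name is hypothetical):

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

/* Sketch: keep dev->data->mtu consistent with the frame size that
 * dev_configure() received, so the two values cannot diverge. */
static int
sync_mtu_with_frame_size(struct rte_eth_dev *dev)
{
	uint32_t frame_size = dev->data->dev_conf.rxmode.max_rx_pkt_len;

	if (frame_size < (uint32_t)RTE_ETHER_MIN_MTU + ICE_ETH_OVERHEAD)
		return -EINVAL;

	dev->data->mtu = (uint16_t)(frame_size - ICE_ETH_OVERHEAD);
	return 0;
}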

> 
> >> But this won't solve the problem Steve is trying to solve.
> >
> > You mean we still need to update the test-pmd code to calculate max_rx_pkt_len
> > properly for the default mtu value?
> >
> 
> Yes.
> Because the target of this set is to be able to receive packets with payload size
> 'RTE_ETHER_MTU'; if the MTU is updated according to the provided 'max_rx_pkt_len',
> the device still won't be able to receive those packets.

Agree.

  
Ferruh Yigit Oct. 21, 2020, 10:36 a.m. UTC | #16
On 10/21/2020 10:47 AM, Ananyev, Konstantin wrote:
> [snip]
> 
>> I was checking ice,
>> in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device.
> 
> Yes, I am not arguing with that.
> What I am saying is that dev_configure() doesn't update the MTU based on max_rx_pkt_len,
> while it probably should.
> 

Yes, 'dev_configure()' doesn't update 'dev->data->mtu', so 'max_rx_pkt_len' &
'dev->data->mtu' may diverge there.

I think the best place to update 'dev->data->mtu' is where the device is actually
updated, but to prevent the divergence above we can update 'dev->data->mtu' in the
ethdev layer, in 'rte_eth_dev_configure()', based on 'max_rx_pkt_len'. Will that work?

The only concern I see is that if the user reads the MTU ('rte_eth_dev_get_mtu()') after
'rte_eth_dev_configure()' but before the device is configured, they will get the wrong
value. I guess that problem was already there, but changing the default value may
make it more visible.
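
A rough sketch of that ethdev-layer update (illustrative only; since ethdev
cannot know the PMD's real per-port overhead, plain header + CRC is assumed
here, and the helper name is hypothetical):

#include <rte_ethdev.h>
#include <rte_ether.h>

/* Sketch of the proposed sync inside rte_eth_dev_configure(): derive
 * dev->data->mtu from the application-provided max_rx_pkt_len. */
static void
ethdev_sync_mtu(struct rte_eth_dev_data *data,
		const struct rte_eth_conf *dev_conf)
{
	uint32_t frame = dev_conf->rxmode.max_rx_pkt_len;

	if (frame > RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN)
		data->mtu = (uint16_t)(frame -
			(RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN));
}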

  
Ananyev, Konstantin Oct. 21, 2020, 10:44 a.m. UTC | #17
> [snip]
> 
> Yes, 'dev_configure()' doesn't update 'dev->data->mtu', so 'max_rx_pkt_len' &
> 'dev->data->mtu' may diverge there.
> 
> I think the best place to update 'dev->data->mtu' is where the device is actually
> updated, but to prevent the divergence above we can update 'dev->data->mtu' in the
> ethdev layer, in 'rte_eth_dev_configure()', based on 'max_rx_pkt_len'. Will that work?

I think - yes.
At least, I don't foresee any implications with that.

  
Ferruh Yigit Oct. 21, 2020, 10:53 a.m. UTC | #18
On 10/21/2020 11:44 AM, Ananyev, Konstantin wrote:
> [snip]
> 
>> I think the best place to update 'dev->data->mtu' is where the device is actually
>> updated, but to prevent the divergence above we can update 'dev->data->mtu' in the
>> ethdev layer, in 'rte_eth_dev_configure()', based on 'max_rx_pkt_len'. Will that work?
> 
> I think - yes.
> At least, I don't foresee any implications with that.
> 

Thanks.

@Steve, I think there is agreement on two patches:
1- Update testpmd to take the overhead into account instead of setting
'max_rx_pkt_len' to 1518 blindly.

2- In 'rte_eth_dev_configure()', update 'dev->data->mtu' based on
'max_rx_pkt_len', again taking the overhead into account.

Would you mind updating the new version as above?

Thanks,
ferruh

>>
>> Only concern I see is if user reads the MTU ('rte_eth_dev_get_mtu()') after
>> 'rte_eth_dev_configure()' but before device configured, user will get the wrong
>> value, I guess that problem was already there but changing default value may
>> make it more visible.
>>
>>>>
>>>>>> But this won't solve the problem Steve is trying to solve.
>>>>>
>>>>> You mean we still need to update the test-pmd code to calculate max_rx_pkt_len
>>>>> properly for the default MTU value?
>>>>>
>>>>
>>>> Yes.
>>>> Because the target of this set is to be able to receive packets with payload size
>>>> 'RTE_ETHER_MTU'; if MTU is updated according to the provided 'max_rx_pkt_len', the
>>>> device still won't be able to receive those packets.
>>>
>>> Agree.
>>>
>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Before the patch, the result would be:
>>>>>>>>>> mtu=1500, max_rx_pkt_len=1000;  // out of sync, looks wrong to me
>>>>>>>>>>
>>>>>>>>>> After the patch:
>>>>>>>>>> mtu=1500, max_rx_pkt_len=1518; // in sync, change in behaviour.
>>>>>>>>>>
>>>>>>>>>> If you think we need to preserve the current behaviour,
>>>>>>>>>> then I suppose the easiest thing would be to change the dev_configure() code
>>>>>>>>>> to update the mtu value based on max_rx_pkt_len.
>>>>>>>>>> I.e.: dev_configure() {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
>>>>>>>>>> So the code snippet above will result in:
>>>>>>>>>> mtu=982, max_rx_pkt_len=1000;
>>>>>>>>>>
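
(For reference: with OVERHEAD = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN = 14 + 4
= 18 bytes, 1000 - 18 = 982, which is where the mtu=982 above comes from.)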
>>>>>>>>>
>>>>>>>>> The 'max_rx_pkt_len' has been an annoyance for a long time; what do you think
>>>>>>>>> about just dropping it?
>>>>>>>>>
>>>>>>>>> By default the device will be up with the default MTU (1500); later,
>>>>>>>>> 'rte_eth_dev_set_mtu()' can be used to set the MTU, with no frame size setting at all.
>>>>>>>>>
>>>>>>>>> Will this work?
>>>>>>>>
>>>>>>>> I think it might, but that's a big change, probably too risky at this stage...
>>>>>>>>
>>>>>>
>>>>>> Definitely, I was thinking of 21.11. Let me send a deprecation notice and see
>>>>>> what happens.
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And for the short term, for the above Intel PMDs, there must be a place where this
>>>>>>>>> 'max_rx_pkt_len' value is taken into account (mostly the 'start()' dev_ops); that
>>>>>>>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME is set, and
>>>>>>>>> otherwise use the 'MTU' value.
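
A sketch of that short-term shape, using ice's ICE_ETH_OVERHEAD as the
overhead term (illustrative only, not the actual driver code):

    /* In the PMD's start() path: pick the frame size to program into
     * the NIC depending on whether jumbo frames were requested.
     */
    uint32_t frame_size;

    if (dev->data->dev_conf.rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME)
        frame_size = dev->data->dev_conf.rxmode.max_rx_pkt_len;
    else
        frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;

    /* ... write frame_size to the device's max frame size register ... */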
>>>>>>>>
>>>>>>>> Even if we use max_rx_pkt_len only when JUMBO_FRAME is set,
>>>>>>>> I think we still need to keep the max_rx_pkt_len and MTU values in sync.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Without 'start()' being updated, the current logic won't work after stop & start anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> And why can't this same thing happen to other PMDs? If this is a problem for
>>>>>>>>>>>> all PMDs, we should solve it at another level, not only for some PMDs.
>>>>>>>>>>>>
>>>>>>>>>>> No, all PMDs have the same issue. Another proposal:
>>>>>>>>>>>       - rte_ethdev does the single resize of 'max_rx_pkt_len', in rte_eth_dev_configure();
>>>>>>>>>>>       - provide a uniform API for fetching the NIC's supported Ether overhead size;
>>>>>>>>>>> Is it feasible?
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Generally, the MTU value is adjustable by the user (e.g.: ip link
>>>>>>>>>>>>>> set ens801f0 mtu 1400); hence, we just adjust the max_rx_pkt_len to
>>>>>>>>>>>>>> satisfy the MTU requirement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>>>>>>>>>>>> here?
>>>>>>>>>>>>>> ice_mtu_set(dev, mtu) will append the ether overhead to
>>>>>>>>>>>>>> frame_size/max_rx_pkt_len, so we need to pass the mtu value as the 2nd
>>>>>>>>>>>>>> parameter, not the max_rx_pkt_len.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And please remove the above comment, since ether overhead is already
>>>>>>>>>>>>>>>>> considered in ice_mtu_set.
>>>>>>>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>>>>>>>>>>>> should be considered in the condition that decides whether ice_mtu_set
>>>>>>>>>>>>>> needs to be invoked.
>>>>>>>>>>>>>> So, perhaps this comment should remain before the if() condition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +	ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>>>>>>>>>>>> +	if (ret != 0)
>>>>>>>>>>>>>>>>>> +		return ret;
>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>        ret = ice_init_rss(pf);
>>>>>>>>>>>>>>>>>>        if (ret) {
>>>>>>>>>>>>>>>>>>        PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
  

Patch

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index cfd357b05..6b7098444 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3146,6 +3146,7 @@  ice_dev_configure(struct rte_eth_dev *dev)
 	struct ice_adapter *ad =
 		ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
 	int ret;
 
 	/* Initialize to TRUE. If any of Rx queues doesn't meet the
@@ -3157,6 +3158,16 @@  ice_dev_configure(struct rte_eth_dev *dev)
 	if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
 		dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
 
+	/**
+	 * Considering QinQ packet, max frame size should be equal or
+	 * larger than total size of MTU and Ether overhead.
+	 */
+	if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
+		ret = ice_mtu_set(dev, dev->data->mtu);
+		if (ret != 0)
+			return ret;
+	}
+
 	ret = ice_init_rss(pf);
 	if (ret) {
 		PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
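
For context on the QinQ comment in the hunk above: ice's overhead macro in
ice_ethdev.h is defined along these lines, where the two VLAN tag entries
are the extra 8 bytes for dual VLAN tags:

    #define ICE_VLAN_TAG_SIZE	4
    #define ICE_ETH_OVERHEAD \
    	(RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN + ICE_VLAN_TAG_SIZE * 2)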