[3/3] net/bonding: start ethdev prior to setting 8023ad flow

Message ID 20210622092531.73112-4-xhavli56@stud.fit.vutbr.cz (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series net/bonding: make dedicated queues work with mlx5 |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK
ci/iol-abi-testing success Testing PASS
ci/github-robot success github build: passed
ci/iol-testing success Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Functional fail Functional Testing issues
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Functional fail Functional Testing issues

Commit Message

Havlík Martin June 22, 2021, 9:25 a.m. UTC
  When dedicated queues are enabled, mlx5 PMD fails to install RTE Flows
if the underlying ethdev is not started:
bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set: port not started (slave_port=0 queue_id=1)

Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
Cc: Jan Viktorin <viktorin@cesnet.cz>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)
  

Comments

humin (Q) June 23, 2021, 7:04 a.m. UTC | #1
在 2021/6/22 17:25, Martin Havlik 写道:
> When dedicated queues are enabled, mlx5 PMD fails to install RTE Flows
> if the underlying ethdev is not started:
> bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set: port not started (slave_port=0 queue_id=1)
> 
Why mlx5 PMD doing flow create relys on port started ?
I noticed other PMDs did not has that reliance.

> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
> Cc: Jan Viktorin <viktorin@cesnet.cz>
> ---
>   drivers/net/bonding/rte_eth_bond_pmd.c | 26 ++++++++++++++++++--------
>   1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index a6755661c..fea3bc537 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>   			rte_flow_destroy(slave_eth_dev->data->port_id,
>   					internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
>   					&flow_error);
> +	}
>   
> +	/* Start device */
> +	errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> +	if (errval != 0) {
> +		RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
> +				slave_eth_dev->data->port_id, errval);
> +		return -1;
> +	}
> +
> +	if (internals->mode == BONDING_MODE_8023AD &&
> +			internals->mode4.dedicated_queues.enabled == 1) {
>   		errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
>   				slave_eth_dev->data->port_id);
>   		if (errval != 0) {
>   			RTE_BOND_LOG(ERR,
>   				"bond_ethdev_8023ad_flow_set: port=%d, err (%d)",
>   				slave_eth_dev->data->port_id, errval);
> +
> +			errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
> +			if (errval < 0) {
> +				RTE_BOND_LOG(ERR,
> +					"rte_eth_dev_stop: port=%d, err (%d)",
> +					slave_eth_dev->data->port_id, errval);
> +			}
>   			return errval;
>   		}
>   	}
>   
> -	/* Start device */
> -	errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> -	if (errval != 0) {
> -		RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
> -				slave_eth_dev->data->port_id, errval);
> -		return -1;
> -	}
> -
>   	/* If RSS is enabled for bonding, synchronize RETA */
>   	if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS) {
>   		int i;
>
  
Havlík Martin June 24, 2021, 11:57 a.m. UTC | #2
Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> 在 2021/6/22 17:25, Martin Havlik 写道:
>> When dedicated queues are enabled, mlx5 PMD fails to install RTE Flows
>> if the underlying ethdev is not started:
>> bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set: port 
>> not started (slave_port=0 queue_id=1)
>> 
> Why mlx5 PMD doing flow create relys on port started ?
> I noticed other PMDs did not has that reliance.
> 
After looking into it, I really can't answer this mlx5 centered 
question.
Closest related info we found so far is the 5th point in 
https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
but it only specifies it's the application's responsibility and that
flow rules are assumed invalid after port stop/close/restart but doesn't 
say anything about
<stop - flow rule create - start> vs <stop - start - flow rule create>
where the former is the point of failure on mlx5.
I'm addressing the maintainers for mlx5, who might know a bit more on 
the topic.
>> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
>> Cc: Jan Viktorin <viktorin@cesnet.cz>
>> ---
>>   drivers/net/bonding/rte_eth_bond_pmd.c | 26 
>> ++++++++++++++++++--------
>>   1 file changed, 18 insertions(+), 8 deletions(-)
>> 
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index a6755661c..fea3bc537 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev 
>> *bonded_eth_dev,
>>   			rte_flow_destroy(slave_eth_dev->data->port_id,
>>   
>> 					internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
>>   					&flow_error);
>> +	}
>>   +	/* Start device */
>> +	errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
>> +	if (errval != 0) {
>> +		RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
>> +				slave_eth_dev->data->port_id, errval);
>> +		return -1;
>> +	}
>> +
>> +	if (internals->mode == BONDING_MODE_8023AD &&
>> +			internals->mode4.dedicated_queues.enabled == 1) {
>>   		errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
>>   				slave_eth_dev->data->port_id);
>>   		if (errval != 0) {
>>   			RTE_BOND_LOG(ERR,
>>   				"bond_ethdev_8023ad_flow_set: port=%d, err (%d)",
>>   				slave_eth_dev->data->port_id, errval);
>> +
>> +			errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
>> +			if (errval < 0) {
>> +				RTE_BOND_LOG(ERR,
>> +					"rte_eth_dev_stop: port=%d, err (%d)",
>> +					slave_eth_dev->data->port_id, errval);
>> +			}
>>   			return errval;
>>   		}
>>   	}
>>   -	/* Start device */
>> -	errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
>> -	if (errval != 0) {
>> -		RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
>> -				slave_eth_dev->data->port_id, errval);
>> -		return -1;
>> -	}
>> -
>>   	/* If RSS is enabled for bonding, synchronize RETA */
>>   	if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS) 
>> {
>>   		int i;
>>
  
Matan Azrad July 4, 2021, 3:18 p.m. UTC | #3
From: Havlík Martin
> Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> > 在 2021/6/22 17:25, Martin Havlik 写道:
> >> When dedicated queues are enabled, mlx5 PMD fails to install RTE
> >> Flows if the underlying ethdev is not started:
> >> bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set:
> port
> >> not started (slave_port=0 queue_id=1)
> >>
> > Why mlx5 PMD doing flow create relys on port started ?
> > I noticed other PMDs did not has that reliance.
> >
> After looking into it, I really can't answer this mlx5 centered question.
> Closest related info we found so far is the 5th point in
> https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> but it only specifies it's the application's responsibility and that flow rules are
> assumed invalid after port stop/close/restart but doesn't say anything about
> <stop - flow rule create - start> vs <stop - start - flow rule create> where the
> former is the point of failure on mlx5.
> I'm addressing the maintainers for mlx5, who might know a bit more on the
> topic.

From rte_ethdev.h

* Please note that some configuration is not stored between calls to
 * rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration will
 * be retained:
 *
 *     - MTU
 *     - flow control settings
 *     - receive mode configuration (promiscuous mode, all-multicast mode,
 *       hardware checksum mode, RSS/VMDQ settings etc.)
 *     - VLAN filtering configuration
 *     - default MAC address
 *     - MAC addresses supplied to MAC address array
 *     - flow director filtering mode (but not filtering rules)
 *     - NIC queue statistics mappings


Mlx5 assumes flows are allowed to be configured only after rte_eth_dev_start().
Before start \ after stop - no flow is valid anymore.

Matan

> >> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
> >> Cc: Jan Viktorin <viktorin@cesnet.cz>
> >> ---
> >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
> >> ++++++++++++++++++--------
> >>   1 file changed, 18 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> index a6755661c..fea3bc537 100644
> >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
> >> *bonded_eth_dev,
> >>                      rte_flow_destroy(slave_eth_dev->data->port_id,
> >>
> >>                                      internals-
> >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
> >>                                      &flow_error);
> >> +    }
> >>   +  /* Start device */
> >> +    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> >> +    if (errval != 0) {
> >> +            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
> >> +                            slave_eth_dev->data->port_id, errval);
> >> +            return -1;
> >> +    }
> >> +
> >> +    if (internals->mode == BONDING_MODE_8023AD &&
> >> +                    internals->mode4.dedicated_queues.enabled == 1)
> >> + {
> >>              errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
> >>                              slave_eth_dev->data->port_id);
> >>              if (errval != 0) {
> >>                      RTE_BOND_LOG(ERR,
> >>                              "bond_ethdev_8023ad_flow_set: port=%d, err (%d)",
> >>                              slave_eth_dev->data->port_id, errval);
> >> +
> >> +                    errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
> >> +                    if (errval < 0) {
> >> +                            RTE_BOND_LOG(ERR,
> >> +                                    "rte_eth_dev_stop: port=%d, err (%d)",
> >> +                                    slave_eth_dev->data->port_id, errval);
> >> +                    }
> >>                      return errval;
> >>              }
> >>      }
> >>   -  /* Start device */
> >> -    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> >> -    if (errval != 0) {
> >> -            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
> >> -                            slave_eth_dev->data->port_id, errval);
> >> -            return -1;
> >> -    }
> >> -
> >>      /* If RSS is enabled for bonding, synchronize RETA */
> >>      if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
> >> ETH_MQ_RX_RSS) {
> >>              int i;
> >>
  
Jan Viktorin July 7, 2021, 3:54 p.m. UTC | #4
On Sun, 4 Jul 2021 15:18:01 +0000
Matan Azrad <matan@nvidia.com> wrote:

> From: Havlík Martin
> > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:  
> > > 在 2021/6/22 17:25, Martin Havlik 写道:  
> > >> When dedicated queues are enabled, mlx5 PMD fails to install RTE
> > >> Flows if the underlying ethdev is not started:
> > >> bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set:  
> > port  
> > >> not started (slave_port=0 queue_id=1)
> > >>  
> > > Why mlx5 PMD doing flow create relys on port started ?
> > > I noticed other PMDs did not has that reliance.
> > >  
> > After looking into it, I really can't answer this mlx5 centered
> > question. Closest related info we found so far is the 5th point in
> > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> > but it only specifies it's the application's responsibility and
> > that flow rules are assumed invalid after port stop/close/restart
> > but doesn't say anything about <stop - flow rule create - start> vs
> > <stop - start - flow rule create> where the former is the point of
> > failure on mlx5. I'm addressing the maintainers for mlx5, who might
> > know a bit more on the topic.  
> 

Hello Matan,

> From rte_ethdev.h
> 
> * Please note that some configuration is not stored between calls to
>  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> configuration will
>  * be retained:
>  *
>  *     - MTU
>  *     - flow control settings
>  *     - receive mode configuration (promiscuous mode, all-multicast
> mode,
>  *       hardware checksum mode, RSS/VMDQ settings etc.)
>  *     - VLAN filtering configuration
>  *     - default MAC address
>  *     - MAC addresses supplied to MAC address array
>  *     - flow director filtering mode (but not filtering rules)
>  *     - NIC queue statistics mappings

just after this section, you can find the following statement:

 * Any other configuration will not be stored and will need to be re-entered
 * before a call to rte_eth_dev_start().

It is not very clear how is this exactly related to flows (and this
applies for all the quoted section, I think) but at least it can be used
as a counter argument.

> 
> 
> Mlx5 assumes flows are allowed to be configured only after
> rte_eth_dev_start(). Before start \ after stop - no flow is valid
> anymore.

I believe that this discussion is not about validity of flows. Let the flows
be invalid after calling to rte_eth_dev_stop(). This is OK, flows must be
recreated and the bonding driver works this way. But why not *before start*?
Does somebody know how other drivers behaves in this situation? (We know and
can check for Intel, there it does not seem to be an issue.)

By the way, the mlx5 behaviour opens a (probably short) time window
between starting of a port and configuation of filtering flows. You may
want to start the port with thousands of flows that apply just when the
port starts (not after, that's late). This may introduce glitches in
filtering and measuring of traffic (well, it is a question how serious
issue could it be...).

This matters for the bonding case as well, doesn't it?. It is not
desirable to accidently omit a packet that was received by primary
ingress logic instead of being redirected into the dedicated queue.

Are there any chances that for mlx5 it would be possible to insert flow
rules before calling rte_eth_dev_start? Anyway, the behaviour should be
specified and documented in DPDK more precisely to avoid such
uncertainty in the future.

Jan

> 
> Matan
> 
> > >> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
> > >> Cc: Jan Viktorin <viktorin@cesnet.cz>
> > >> ---
> > >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
> > >> ++++++++++++++++++--------
> > >>   1 file changed, 18 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> index a6755661c..fea3bc537 100644
> > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
> > >> *bonded_eth_dev,
> > >>                      rte_flow_destroy(slave_eth_dev->data->port_id,
> > >>
> > >>                                      internals-  
> > >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],  
> > >>                                      &flow_error);
> > >> +    }
> > >>   +  /* Start device */
> > >> +    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > >> +    if (errval != 0) {
> > >> +            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > >> (%d)",
> > >> +                            slave_eth_dev->data->port_id,
> > >> errval);
> > >> +            return -1;
> > >> +    }
> > >> +
> > >> +    if (internals->mode == BONDING_MODE_8023AD &&
> > >> +                    internals->mode4.dedicated_queues.enabled
> > >> == 1)
> > >> + {
> > >>              errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
> > >>                              slave_eth_dev->data->port_id);
> > >>              if (errval != 0) {
> > >>                      RTE_BOND_LOG(ERR,
> > >>                              "bond_ethdev_8023ad_flow_set:
> > >> port=%d, err (%d)", slave_eth_dev->data->port_id, errval);
> > >> +
> > >> +                    errval =
> > >> rte_eth_dev_stop(slave_eth_dev->data->port_id);
> > >> +                    if (errval < 0) {
> > >> +                            RTE_BOND_LOG(ERR,
> > >> +                                    "rte_eth_dev_stop: port=%d,
> > >> err (%d)",
> > >> +
> > >> slave_eth_dev->data->port_id, errval);
> > >> +                    }
> > >>                      return errval;
> > >>              }
> > >>      }
> > >>   -  /* Start device */
> > >> -    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > >> -    if (errval != 0) {
> > >> -            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > >> (%d)",
> > >> -                            slave_eth_dev->data->port_id,
> > >> errval);
> > >> -            return -1;
> > >> -    }
> > >> -
> > >>      /* If RSS is enabled for bonding, synchronize RETA */
> > >>      if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
> > >> ETH_MQ_RX_RSS) {
> > >>              int i;
> > >>
  
Ori Kam July 11, 2021, 8:49 a.m. UTC | #5
Hi Jan,


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> Sent: Wednesday, July 7, 2021 6:54 PM
> 
> On Sun, 4 Jul 2021 15:18:01 +0000
> Matan Azrad <matan@nvidia.com> wrote:
> 
> > From: Havlík Martin
> > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> > > > 在 2021/6/22 17:25, Martin Havlik 写道:
> > > >> When dedicated queues are enabled, mlx5 PMD fails to install RTE
> > > >> Flows if the underlying ethdev is not started:
> > > >> bond_ethdev_8023ad_flow_set(267) -
> bond_ethdev_8023ad_flow_set:
> > > port
> > > >> not started (slave_port=0 queue_id=1)
> > > >>
> > > > Why mlx5 PMD doing flow create relys on port started ?
> > > > I noticed other PMDs did not has that reliance.
> > > >
> > > After looking into it, I really can't answer this mlx5 centered
> > > question. Closest related info we found so far is the 5th point in
> > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> > > but it only specifies it's the application's responsibility and that
> > > flow rules are assumed invalid after port stop/close/restart but
> > > doesn't say anything about <stop - flow rule create - start> vs
> > > <stop - start - flow rule create> where the former is the point of
> > > failure on mlx5. I'm addressing the maintainers for mlx5, who might
> > > know a bit more on the topic.
> >
> 
> Hello Matan,
> 
> > From rte_ethdev.h
> >
> > * Please note that some configuration is not stored between calls to
> >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration
> > will
> >  * be retained:
> >  *
> >  *     - MTU
> >  *     - flow control settings
> >  *     - receive mode configuration (promiscuous mode, all-multicast
> > mode,
> >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> >  *     - VLAN filtering configuration
> >  *     - default MAC address
> >  *     - MAC addresses supplied to MAC address array
> >  *     - flow director filtering mode (but not filtering rules)
> >  *     - NIC queue statistics mappings
> 
> just after this section, you can find the following statement:
> 
>  * Any other configuration will not be stored and will need to be re-entered
>  * before a call to rte_eth_dev_start().
> 
> It is not very clear how is this exactly related to flows (and this applies for all
> the quoted section, I think) but at least it can be used as a counter argument.
> 
I agree the doc is not clear, as I see it flows are not part of configuration, at least not
when we are talking about rte_flow.

> >
> >
> > Mlx5 assumes flows are allowed to be configured only after
> > rte_eth_dev_start(). Before start \ after stop - no flow is valid
> > anymore.
> 
> I believe that this discussion is not about validity of flows. Let the flows be
> invalid after calling to rte_eth_dev_stop(). This is OK, flows must be
> recreated and the bonding driver works this way. But why not *before
> start*?
Think about it this way by changing the configuration you may create invalid flows,
for example, you can only change the number of queues after port stop, so if
you create a flow with jump to queue 3 and then you remove queue 3 then,
the flow that is cached is not valid anymore. This goes for other configuration that
may affect the validity of a flow.

> Does somebody know how other drivers behaves in this situation? (We know
> and can check for Intel, there it does not seem to be an issue.)
> 
> By the way, the mlx5 behaviour opens a (probably short) time window
> between starting of a port and configuation of filtering flows. You may want
> to start the port with thousands of flows that apply just when the port starts
> (not after, that's late). This may introduce glitches in filtering and measuring
> of traffic (well, it is a question how serious issue could it be...).
> 
Agree but this is always true nothing is done is zero time and even if it was the insertion
is not in zero time, (assuming that even if the flows are stored by the PMD until start
they still will not all be inserted in the same time) 

> This matters for the bonding case as well, doesn't it?. It is not desirable to
> accidently omit a packet that was received by primary ingress logic instead of
> being redirected into the dedicated queue.
> 
> Are there any chances that for mlx5 it would be possible to insert flow rules
> before calling rte_eth_dev_start? Anyway, the behaviour should be specified
> and documented in DPDK more precisely to avoid such uncertainty in the
> future.
> 
I agree the documentation should be fixed.

Ori
> Jan
> 
> >
> > Matan
> >
> > > >> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
> > > >> Cc: Jan Viktorin <viktorin@cesnet.cz>
> > > >> ---
> > > >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
> > > >> ++++++++++++++++++--------
> > > >>   1 file changed, 18 insertions(+), 8 deletions(-)
> > > >>
> > > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > >> index a6755661c..fea3bc537 100644
> > > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
> > > >> *bonded_eth_dev,
> > > >>
> > > >> rte_flow_destroy(slave_eth_dev->data->port_id,
> > > >>
> > > >>                                      internals-
> > > >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
> > > >>                                      &flow_error);
> > > >> +    }
> > > >>   +  /* Start device */
> > > >> +    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > > >> +    if (errval != 0) {
> > > >> +            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > > >> (%d)",
> > > >> +                            slave_eth_dev->data->port_id,
> > > >> errval);
> > > >> +            return -1;
> > > >> +    }
> > > >> +
> > > >> +    if (internals->mode == BONDING_MODE_8023AD &&
> > > >> +                    internals->mode4.dedicated_queues.enabled
> > > >> == 1)
> > > >> + {
> > > >>              errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
> > > >>                              slave_eth_dev->data->port_id);
> > > >>              if (errval != 0) {
> > > >>                      RTE_BOND_LOG(ERR,
> > > >>                              "bond_ethdev_8023ad_flow_set:
> > > >> port=%d, err (%d)", slave_eth_dev->data->port_id, errval);
> > > >> +
> > > >> +                    errval =
> > > >> rte_eth_dev_stop(slave_eth_dev->data->port_id);
> > > >> +                    if (errval < 0) {
> > > >> +                            RTE_BOND_LOG(ERR,
> > > >> +                                    "rte_eth_dev_stop: port=%d,
> > > >> err (%d)",
> > > >> +
> > > >> slave_eth_dev->data->port_id, errval);
> > > >> +                    }
> > > >>                      return errval;
> > > >>              }
> > > >>      }
> > > >>   -  /* Start device */
> > > >> -    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > > >> -    if (errval != 0) {
> > > >> -            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > > >> (%d)",
> > > >> -                            slave_eth_dev->data->port_id,
> > > >> errval);
> > > >> -            return -1;
> > > >> -    }
> > > >> -
> > > >>      /* If RSS is enabled for bonding, synchronize RETA */
> > > >>      if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
> > > >> ETH_MQ_RX_RSS) {
> > > >>              int i;
> > > >>
  
Jan Viktorin July 11, 2021, 9:45 p.m. UTC | #6
On Sun, 11 Jul 2021 08:49:18 +0000
Ori Kam <orika@nvidia.com> wrote:

> Hi Jan,

Hi Ori,

> 
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> > Sent: Wednesday, July 7, 2021 6:54 PM
> > 
> > On Sun, 4 Jul 2021 15:18:01 +0000
> > Matan Azrad <matan@nvidia.com> wrote:
> >   
> > > From: Havlík Martin  
> > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:  
> > > > > 在 2021/6/22 17:25, Martin Havlik 写道:  
> > > > >> When dedicated queues are enabled, mlx5 PMD fails to install RTE
> > > > >> Flows if the underlying ethdev is not started:
> > > > >> bond_ethdev_8023ad_flow_set(267) -  
> > bond_ethdev_8023ad_flow_set:  
> > > > port  
> > > > >> not started (slave_port=0 queue_id=1)
> > > > >>  
> > > > > Why mlx5 PMD doing flow create relys on port started ?
> > > > > I noticed other PMDs did not has that reliance.
> > > > >  
> > > > After looking into it, I really can't answer this mlx5 centered
> > > > question. Closest related info we found so far is the 5th point in
> > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> > > > but it only specifies it's the application's responsibility and that
> > > > flow rules are assumed invalid after port stop/close/restart but
> > > > doesn't say anything about <stop - flow rule create - start> vs
> > > > <stop - start - flow rule create> where the former is the point of
> > > > failure on mlx5. I'm addressing the maintainers for mlx5, who might
> > > > know a bit more on the topic.  
> > >  
> > 
> > Hello Matan,
> >   
> > > From rte_ethdev.h
> > >
> > > * Please note that some configuration is not stored between calls to
> > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration
> > > will
> > >  * be retained:
> > >  *
> > >  *     - MTU
> > >  *     - flow control settings
> > >  *     - receive mode configuration (promiscuous mode, all-multicast
> > > mode,
> > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> > >  *     - VLAN filtering configuration
> > >  *     - default MAC address
> > >  *     - MAC addresses supplied to MAC address array
> > >  *     - flow director filtering mode (but not filtering rules)
> > >  *     - NIC queue statistics mappings  
> > 
> > just after this section, you can find the following statement:
> > 
> >  * Any other configuration will not be stored and will need to be re-entered
> >  * before a call to rte_eth_dev_start().
> > 
> > It is not very clear how is this exactly related to flows (and this applies for all
> > the quoted section, I think) but at least it can be used as a counter argument.
> >   
> I agree the doc is not clear, as I see it flows are not part of configuration, at least not
> when we are talking about rte_flow.

Agree.

> 
> > >
> > >
> > > Mlx5 assumes flows are allowed to be configured only after
> > > rte_eth_dev_start(). Before start \ after stop - no flow is valid
> > > anymore.  
> > 
> > I believe that this discussion is not about validity of flows. Let the flows be
> > invalid after calling to rte_eth_dev_stop(). This is OK, flows must be
> > recreated and the bonding driver works this way. But why not *before
> > start*?  
> Think about it this way by changing the configuration you may create invalid flows,
> for example, you can only change the number of queues after port stop, so if
> you create a flow with jump to queue 3 and then you remove queue 3 then,
> the flow that is cached is not valid anymore. This goes for other configuration that
> may affect the validity of a flow.

This is a valid argument, of course. The thing is whether it is
a responsibility of the PMD to take care of those corner cases or if
this is up to the application developer. If we respect the fact that
calling to stop invalidates all flows then when you do:

 > port stop 0
 > flow create 0 ingress pattern ... / end actions queue 6 / end
 > port config rxq 3
 > port start 0

it's clear that something is really wrong with the caller/user. I would
say that this is an application bug. I would expect that you first
reconfigure port and after that you modify flows. It seems quite
logical and intuitive, doesn't it?

Anyway, the PMD can possibly catch this rxq inconsistency but I can
imagine that there are more complicated sitations than just changing
count of queues. Any idea for a more complex real case?

Martin Havlik, can you please check how ixgbe and i40e behave in the
case above with "rxq change after flow created"?

> 
> > Does somebody know how other drivers behaves in this situation? (We know
> > and can check for Intel, there it does not seem to be an issue.)
> > 
> > By the way, the mlx5 behaviour opens a (probably short) time window
> > between starting of a port and configuation of filtering flows. You may want
> > to start the port with thousands of flows that apply just when the port starts
> > (not after, that's late). This may introduce glitches in filtering and measuring
> > of traffic (well, it is a question how serious issue could it be...).
> >   
> Agree but this is always true nothing is done is zero time and even if it was the insertion
> is not in zero time, (assuming that even if the flows are stored by the PMD until start
> they still will not all be inserted in the same time) 

Sorry, I probably do not understand well. Yes, creation of a flow would
never be zero time, agree. But if I create flows before the port
starts, than I do not have to care too much about how long the creation
takes. Because the card is expected to be configured already before
pushing the "start button".

Maybe, you assume that the created flows would first be cached inside
the PMD and when the port is finally started, it than transparently
write the queues to the HW. But this is not what I was talking about,
that's maybe even worse because you are hiding such behaviour from
users...

(I do not know the exact mlx5 implementation details, so my answer can
miss something, sorry for that.)

> 
> > This matters for the bonding case as well, doesn't it?. It is not desirable to
> > accidently omit a packet that was received by primary ingress logic instead of
> > being redirected into the dedicated queue.
> > 
> > Are there any chances that for mlx5 it would be possible to insert flow rules
> > before calling rte_eth_dev_start? Anyway, the behaviour should be specified
> > and documented in DPDK more precisely to avoid such uncertainty in the
> > future.
> >   
> I agree the documentation should be fixed.

+1

Jan

> 
> Ori
> > Jan
> >   
> > >
> > > Matan
> > >  
> > > > >> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
> > > > >> Cc: Jan Viktorin <viktorin@cesnet.cz>
> > > > >> ---
> > > > >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
> > > > >> ++++++++++++++++++--------
> > > > >>   1 file changed, 18 insertions(+), 8 deletions(-)
> > > > >>
> > > > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > >> index a6755661c..fea3bc537 100644
> > > > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
> > > > >> *bonded_eth_dev,
> > > > >>
> > > > >> rte_flow_destroy(slave_eth_dev->data->port_id,
> > > > >>
> > > > >>                                      internals-  
> > > > >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],  
> > > > >>                                      &flow_error);
> > > > >> +    }
> > > > >>   +  /* Start device */
> > > > >> +    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > > > >> +    if (errval != 0) {
> > > > >> +            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > > > >> (%d)",
> > > > >> +                            slave_eth_dev->data->port_id,
> > > > >> errval);
> > > > >> +            return -1;
> > > > >> +    }
> > > > >> +
> > > > >> +    if (internals->mode == BONDING_MODE_8023AD &&
> > > > >> +                    internals->mode4.dedicated_queues.enabled
> > > > >> == 1)
> > > > >> + {
> > > > >>              errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
> > > > >>                              slave_eth_dev->data->port_id);
> > > > >>              if (errval != 0) {
> > > > >>                      RTE_BOND_LOG(ERR,
> > > > >>                              "bond_ethdev_8023ad_flow_set:
> > > > >> port=%d, err (%d)", slave_eth_dev->data->port_id, errval);
> > > > >> +
> > > > >> +                    errval =
> > > > >> rte_eth_dev_stop(slave_eth_dev->data->port_id);
> > > > >> +                    if (errval < 0) {
> > > > >> +                            RTE_BOND_LOG(ERR,
> > > > >> +                                    "rte_eth_dev_stop: port=%d,
> > > > >> err (%d)",
> > > > >> +
> > > > >> slave_eth_dev->data->port_id, errval);
> > > > >> +                    }
> > > > >>                      return errval;
> > > > >>              }
> > > > >>      }
> > > > >>   -  /* Start device */
> > > > >> -    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > > > >> -    if (errval != 0) {
> > > > >> -            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > > > >> (%d)",
> > > > >> -                            slave_eth_dev->data->port_id,
> > > > >> errval);
> > > > >> -            return -1;
> > > > >> -    }
> > > > >> -
> > > > >>      /* If RSS is enabled for bonding, synchronize RETA */
> > > > >>      if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
> > > > >> ETH_MQ_RX_RSS) {
> > > > >>              int i;
> > > > >>  
>
  
Ori Kam July 12, 2021, 1:07 p.m. UTC | #7
Hi Jan,

> -----Original Message-----
> From: Jan Viktorin <viktorin@cesnet.cz>
> Sent: Monday, July 12, 2021 12:46 AM
> 
> On Sun, 11 Jul 2021 08:49:18 +0000
> Ori Kam <orika@nvidia.com> wrote:
> 
> > Hi Jan,
> 
> Hi Ori,
> 
> >
> >
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> > > Sent: Wednesday, July 7, 2021 6:54 PM
> > >
> > > On Sun, 4 Jul 2021 15:18:01 +0000
> > > Matan Azrad <matan@nvidia.com> wrote:
> > >
> > > > From: Havlík Martin
> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:
> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to install
> > > > > >> RTE Flows if the underlying ethdev is not started:
> > > > > >> bond_ethdev_8023ad_flow_set(267) -
> > > bond_ethdev_8023ad_flow_set:
> > > > > port
> > > > > >> not started (slave_port=0 queue_id=1)
> > > > > >>
> > > > > > Why mlx5 PMD doing flow create relys on port started ?
> > > > > > I noticed other PMDs did not has that reliance.
> > > > > >
> > > > > After looking into it, I really can't answer this mlx5 centered
> > > > > question. Closest related info we found so far is the 5th point
> > > > > in https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> > > > > but it only specifies it's the application's responsibility and
> > > > > that flow rules are assumed invalid after port
> > > > > stop/close/restart but doesn't say anything about <stop - flow
> > > > > rule create - start> vs <stop - start - flow rule create> where
> > > > > the former is the point of failure on mlx5. I'm addressing the
> > > > > maintainers for mlx5, who might know a bit more on the topic.
> > > >
> > >
> > > Hello Matan,
> > >
> > > > From rte_ethdev.h
> > > >
> > > > * Please note that some configuration is not stored between calls
> > > > to
> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> > > > configuration will
> > > >  * be retained:
> > > >  *
> > > >  *     - MTU
> > > >  *     - flow control settings
> > > >  *     - receive mode configuration (promiscuous mode, all-multicast
> > > > mode,
> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> > > >  *     - VLAN filtering configuration
> > > >  *     - default MAC address
> > > >  *     - MAC addresses supplied to MAC address array
> > > >  *     - flow director filtering mode (but not filtering rules)
> > > >  *     - NIC queue statistics mappings
> > >
> > > just after this section, you can find the following statement:
> > >
> > >  * Any other configuration will not be stored and will need to be
> > > re-entered
> > >  * before a call to rte_eth_dev_start().
> > >
> > > It is not very clear how is this exactly related to flows (and this
> > > applies for all the quoted section, I think) but at least it can be used as a
> counter argument.
> > >
> > I agree the doc is not clear, as I see it flows are not part of
> > configuration, at least not when we are talking about rte_flow.
> 
> Agree.
> 
> >
> > > >
> > > >
> > > > Mlx5 assumes flows are allowed to be configured only after
> > > > rte_eth_dev_start(). Before start \ after stop - no flow is valid
> > > > anymore.
> > >
> > > I believe that this discussion is not about validity of flows. Let
> > > the flows be invalid after calling to rte_eth_dev_stop(). This is
> > > OK, flows must be recreated and the bonding driver works this way.
> > > But why not *before start*?
> > Think about it this way by changing the configuration you may create
> > invalid flows, for example, you can only change the number of queues
> > after port stop, so if you create a flow with jump to queue 3 and then
> > you remove queue 3 then, the flow that is cached is not valid anymore.
> > This goes for other configuration that may affect the validity of a flow.
> 
> This is a valid argument, of course. The thing is whether it is a responsibility
> of the PMD to take care of those corner cases or if this is up to the
> application developer. If we respect the fact that calling to stop invalidates all
> flows then when you do:
> 
>  > port stop 0
>  > flow create 0 ingress pattern ... / end actions queue 6 / end  > port config
> rxq 3  > port start 0
> 
> it's clear that something is really wrong with the caller/user. I would say that
> this is an application bug. I would expect that you first reconfigure port and
> after that you modify flows. It seems quite logical and intuitive, doesn't it?
> 

I agree the use case I described is bad programming, but the same logic can be
applied on stopping the port why flows should be deleted?
And even more important on some HW and some flows can only be inserted after
start since there is no real traffic gate, basically the start means adding the flows
if a flow is in the HW it is working, adding a gate will result in PPS degradation.

> Anyway, the PMD can possibly catch this rxq inconsistency but I can imagine
> that there are more complicated sitations than just changing count of
> queues. Any idea for a more complex real case?
> 

Some of the resource may be created only after start,
But I don't have a very good example now.

> Martin Havlik, can you please check how ixgbe and i40e behave in the case
> above with "rxq change after flow created"?
> 
> >
> > > Does somebody know how other drivers behaves in this situation? (We
> > > know and can check for Intel, there it does not seem to be an
> > > issue.)
> > >
> > > By the way, the mlx5 behaviour opens a (probably short) time window
> > > between starting of a port and configuation of filtering flows. You
> > > may want to start the port with thousands of flows that apply just
> > > when the port starts (not after, that's late). This may introduce
> > > glitches in filtering and measuring of traffic (well, it is a question how
> serious issue could it be...).
> > >
> > Agree but this is always true nothing is done is zero time and even if
> > it was the insertion is not in zero time, (assuming that even if the
> > flows are stored by the PMD until start they still will not all be
> > inserted in the same time)
> 
> Sorry, I probably do not understand well. Yes, creation of a flow would never
> be zero time, agree. But if I create flows before the port starts, than I do not
> have to care too much about how long the creation takes. Because the card is
> expected to be configured already before pushing the "start button".
> 
> Maybe, you assume that the created flows would first be cached inside the
> PMD and when the port is finally started, it than transparently write the
> queues to the HW. But this is not what I was talking about, that's maybe even
> worse because you are hiding such behaviour from users...
> 
> (I do not know the exact mlx5 implementation details, so my answer can
> miss something, sorry for that.)
> 

You are correct this is HW/PMD implementation issue, in case of Nvidia we must
cache the flows and only insert it after start.
Since RTE flow is generic and HW is not generic sometimes the PMD needs to
translate this difference and in this gray area there is a game between
best API best performance understanding what the user really wants.
This is why the doc should be very clear on what is the expected.
but this is also the reason why such document is very hard to agree on.

Ori 
> >
> > > This matters for the bonding case as well, doesn't it?. It is not
> > > desirable to accidently omit a packet that was received by primary
> > > ingress logic instead of being redirected into the dedicated queue.
> > >
> > > Are there any chances that for mlx5 it would be possible to insert
> > > flow rules before calling rte_eth_dev_start? Anyway, the behaviour
> > > should be specified and documented in DPDK more precisely to avoid
> > > such uncertainty in the future.
> > >
> > I agree the documentation should be fixed.
> 
> +1
> 
> Jan
> 
> >
> > Ori
> > > Jan
> > >
> > > >
> > > > Matan
> > > >
> > > > > >> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
> > > > > >> Cc: Jan Viktorin <viktorin@cesnet.cz>
> > > > > >> ---
> > > > > >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
> > > > > >> ++++++++++++++++++--------
> > > > > >>   1 file changed, 18 insertions(+), 8 deletions(-)
> > > > > >>
> > > > > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > >> index a6755661c..fea3bc537 100644
> > > > > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
> > > > > >> *bonded_eth_dev,
> > > > > >>
> > > > > >> rte_flow_destroy(slave_eth_dev->data->port_id,
> > > > > >>
> > > > > >>                                      internals-
> > > > > >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
> > > > > >>                                      &flow_error);
> > > > > >> +    }
> > > > > >>   +  /* Start device */
> > > > > >> +    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > > > > >> +    if (errval != 0) {
> > > > > >> +            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u,
> > > > > >> + err
> > > > > >> (%d)",
> > > > > >> +                            slave_eth_dev->data->port_id,
> > > > > >> errval);
> > > > > >> +            return -1;
> > > > > >> +    }
> > > > > >> +
> > > > > >> +    if (internals->mode == BONDING_MODE_8023AD &&
> > > > > >> +
> > > > > >> + internals->mode4.dedicated_queues.enabled
> > > > > >> == 1)
> > > > > >> + {
> > > > > >>              errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
> > > > > >>                              slave_eth_dev->data->port_id);
> > > > > >>              if (errval != 0) {
> > > > > >>                      RTE_BOND_LOG(ERR,
> > > > > >>                              "bond_ethdev_8023ad_flow_set:
> > > > > >> port=%d, err (%d)", slave_eth_dev->data->port_id, errval);
> > > > > >> +
> > > > > >> +                    errval =
> > > > > >> rte_eth_dev_stop(slave_eth_dev->data->port_id);
> > > > > >> +                    if (errval < 0) {
> > > > > >> +                            RTE_BOND_LOG(ERR,
> > > > > >> +                                    "rte_eth_dev_stop:
> > > > > >> + port=%d,
> > > > > >> err (%d)",
> > > > > >> +
> > > > > >> slave_eth_dev->data->port_id, errval);
> > > > > >> +                    }
> > > > > >>                      return errval;
> > > > > >>              }
> > > > > >>      }
> > > > > >>   -  /* Start device */
> > > > > >> -    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> > > > > >> -    if (errval != 0) {
> > > > > >> -            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
> > > > > >> (%d)",
> > > > > >> -                            slave_eth_dev->data->port_id,
> > > > > >> errval);
> > > > > >> -            return -1;
> > > > > >> -    }
> > > > > >> -
> > > > > >>      /* If RSS is enabled for bonding, synchronize RETA */
> > > > > >>      if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
> > > > > >> ETH_MQ_RX_RSS) {
> > > > > >>              int i;
> > > > > >>
> >
  
Havlík Martin July 13, 2021, 8:18 a.m. UTC | #8
Dne 2021-07-12 15:07, Ori Kam napsal:
> Hi Jan,
> 
>> -----Original Message-----
>> From: Jan Viktorin <viktorin@cesnet.cz>
>> Sent: Monday, July 12, 2021 12:46 AM
>> 
>> On Sun, 11 Jul 2021 08:49:18 +0000
>> Ori Kam <orika@nvidia.com> wrote:
>> 
>> > Hi Jan,
>> 
>> Hi Ori,
>> 
>> >
>> >
>> > > -----Original Message-----
>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
>> > > Sent: Wednesday, July 7, 2021 6:54 PM
>> > >
>> > > On Sun, 4 Jul 2021 15:18:01 +0000
>> > > Matan Azrad <matan@nvidia.com> wrote:
>> > >
>> > > > From: Havlík Martin
>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:
>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to install
>> > > > > >> RTE Flows if the underlying ethdev is not started:
>> > > > > >> bond_ethdev_8023ad_flow_set(267) -
>> > > bond_ethdev_8023ad_flow_set:
>> > > > > port
>> > > > > >> not started (slave_port=0 queue_id=1)
>> > > > > >>
>> > > > > > Why mlx5 PMD doing flow create relys on port started ?
>> > > > > > I noticed other PMDs did not has that reliance.
>> > > > > >
>> > > > > After looking into it, I really can't answer this mlx5 centered
>> > > > > question. Closest related info we found so far is the 5th point
>> > > > > in https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
>> > > > > but it only specifies it's the application's responsibility and
>> > > > > that flow rules are assumed invalid after port
>> > > > > stop/close/restart but doesn't say anything about <stop - flow
>> > > > > rule create - start> vs <stop - start - flow rule create> where
>> > > > > the former is the point of failure on mlx5. I'm addressing the
>> > > > > maintainers for mlx5, who might know a bit more on the topic.
>> > > >
>> > >
>> > > Hello Matan,
>> > >
>> > > > From rte_ethdev.h
>> > > >
>> > > > * Please note that some configuration is not stored between calls
>> > > > to
>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
>> > > > configuration will
>> > > >  * be retained:
>> > > >  *
>> > > >  *     - MTU
>> > > >  *     - flow control settings
>> > > >  *     - receive mode configuration (promiscuous mode, all-multicast
>> > > > mode,
>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
>> > > >  *     - VLAN filtering configuration
>> > > >  *     - default MAC address
>> > > >  *     - MAC addresses supplied to MAC address array
>> > > >  *     - flow director filtering mode (but not filtering rules)
>> > > >  *     - NIC queue statistics mappings
>> > >
>> > > just after this section, you can find the following statement:
>> > >
>> > >  * Any other configuration will not be stored and will need to be
>> > > re-entered
>> > >  * before a call to rte_eth_dev_start().
>> > >
>> > > It is not very clear how is this exactly related to flows (and this
>> > > applies for all the quoted section, I think) but at least it can be used as a
>> counter argument.
>> > >
>> > I agree the doc is not clear, as I see it flows are not part of
>> > configuration, at least not when we are talking about rte_flow.
>> 
>> Agree.
>> 
>> >
>> > > >
>> > > >
>> > > > Mlx5 assumes flows are allowed to be configured only after
>> > > > rte_eth_dev_start(). Before start \ after stop - no flow is valid
>> > > > anymore.
>> > >
>> > > I believe that this discussion is not about validity of flows. Let
>> > > the flows be invalid after calling to rte_eth_dev_stop(). This is
>> > > OK, flows must be recreated and the bonding driver works this way.
>> > > But why not *before start*?
>> > Think about it this way by changing the configuration you may create
>> > invalid flows, for example, you can only change the number of queues
>> > after port stop, so if you create a flow with jump to queue 3 and then
>> > you remove queue 3 then, the flow that is cached is not valid anymore.
>> > This goes for other configuration that may affect the validity of a flow.
>> 
>> This is a valid argument, of course. The thing is whether it is a 
>> responsibility
>> of the PMD to take care of those corner cases or if this is up to the
>> application developer. If we respect the fact that calling to stop 
>> invalidates all
>> flows then when you do:
>> 
>>  > port stop 0
>>  > flow create 0 ingress pattern ... / end actions queue 6 / end  > 
>> port config
>> rxq 3  > port start 0
>> 
>> it's clear that something is really wrong with the caller/user. I 
>> would say that
>> this is an application bug. I would expect that you first reconfigure 
>> port and
>> after that you modify flows. It seems quite logical and intuitive, 
>> doesn't it?
>> 
> 
> I agree the use case I described is bad programming, but the same logic 
> can be
> applied on stopping the port why flows should be deleted?
> And even more important on some HW and some flows can only be inserted 
> after
> start since there is no real traffic gate, basically the start means
> adding the flows
> if a flow is in the HW it is working, adding a gate will result in PPS
> degradation.
> 
>> Anyway, the PMD can possibly catch this rxq inconsistency but I can 
>> imagine
>> that there are more complicated sitations than just changing count of
>> queues. Any idea for a more complex real case?
>> 
> 
> Some of the resource may be created only after start,
> But I don't have a very good example now.
> 
>> Martin Havlik, can you please check how ixgbe and i40e behave in the 
>> case
>> above with "rxq change after flow created"?
>> 
Both PMDs do not raise any errors in the mentioned case. The packets 
that should go to the non-existent queue are dropped at some point in 
the processing (from what I could see and understand).

I'm glad for the discussion that is taking place but I feel we have 
strayed from the original reason of it. That being the inability to 
enable dedicated queues for bonding mode 8023AD on mlx5. We have 
concluded that flow creation has to be done after port/device start. So 
moving the function call to create the flow after device start should 
solve the issue 
(https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my 
testing, flow create after start is not a problem on Intel PMDs.

I'm turning to you, Connor, as you have made the original question on 
this. Is the patch I presented applicable?

Martin
>> >
>> > > Does somebody know how other drivers behaves in this situation? (We
>> > > know and can check for Intel, there it does not seem to be an
>> > > issue.)
>> > >
>> > > By the way, the mlx5 behaviour opens a (probably short) time window
>> > > between starting of a port and configuation of filtering flows. You
>> > > may want to start the port with thousands of flows that apply just
>> > > when the port starts (not after, that's late). This may introduce
>> > > glitches in filtering and measuring of traffic (well, it is a question how
>> serious issue could it be...).
>> > >
>> > Agree but this is always true nothing is done is zero time and even if
>> > it was the insertion is not in zero time, (assuming that even if the
>> > flows are stored by the PMD until start they still will not all be
>> > inserted in the same time)
>> 
>> Sorry, I probably do not understand well. Yes, creation of a flow 
>> would never
>> be zero time, agree. But if I create flows before the port starts, 
>> than I do not
>> have to care too much about how long the creation takes. Because the 
>> card is
>> expected to be configured already before pushing the "start button".
>> 
>> Maybe, you assume that the created flows would first be cached inside 
>> the
>> PMD and when the port is finally started, it than transparently write 
>> the
>> queues to the HW. But this is not what I was talking about, that's 
>> maybe even
>> worse because you are hiding such behaviour from users...
>> 
>> (I do not know the exact mlx5 implementation details, so my answer can
>> miss something, sorry for that.)
>> 
> 
> You are correct this is HW/PMD implementation issue, in case of Nvidia 
> we must
> cache the flows and only insert it after start.
> Since RTE flow is generic and HW is not generic sometimes the PMD needs 
> to
> translate this difference and in this gray area there is a game between
> best API best performance understanding what the user really wants.
> This is why the doc should be very clear on what is the expected.
> but this is also the reason why such document is very hard to agree on.
> 
> Ori
>> >
>> > > This matters for the bonding case as well, doesn't it?. It is not
>> > > desirable to accidently omit a packet that was received by primary
>> > > ingress logic instead of being redirected into the dedicated queue.
>> > >
>> > > Are there any chances that for mlx5 it would be possible to insert
>> > > flow rules before calling rte_eth_dev_start? Anyway, the behaviour
>> > > should be specified and documented in DPDK more precisely to avoid
>> > > such uncertainty in the future.
>> > >
>> > I agree the documentation should be fixed.
>> 
>> +1
>> 
>> Jan
>> 
>> >
>> > Ori
>> > > Jan
>> > >
>> > > >
>> > > > Matan
>> > > >
>> > > > > >> Signed-off-by: Martin Havlik <xhavli56@stud.fit.vutbr.cz>
>> > > > > >> Cc: Jan Viktorin <viktorin@cesnet.cz>
>> > > > > >> ---
>> > > > > >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
>> > > > > >> ++++++++++++++++++--------
>> > > > > >>   1 file changed, 18 insertions(+), 8 deletions(-)
>> > > > > >>
>> > > > > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>> > > > > >> b/drivers/net/bonding/rte_eth_bond_pmd.c
>> > > > > >> index a6755661c..fea3bc537 100644
>> > > > > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> > > > > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> > > > > >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
>> > > > > >> *bonded_eth_dev,
>> > > > > >>
>> > > > > >> rte_flow_destroy(slave_eth_dev->data->port_id,
>> > > > > >>
>> > > > > >>                                      internals-
>> > > > > >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
>> > > > > >>                                      &flow_error);
>> > > > > >> +    }
>> > > > > >>   +  /* Start device */
>> > > > > >> +    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
>> > > > > >> +    if (errval != 0) {
>> > > > > >> +            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u,
>> > > > > >> + err
>> > > > > >> (%d)",
>> > > > > >> +                            slave_eth_dev->data->port_id,
>> > > > > >> errval);
>> > > > > >> +            return -1;
>> > > > > >> +    }
>> > > > > >> +
>> > > > > >> +    if (internals->mode == BONDING_MODE_8023AD &&
>> > > > > >> +
>> > > > > >> + internals->mode4.dedicated_queues.enabled
>> > > > > >> == 1)
>> > > > > >> + {
>> > > > > >>              errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
>> > > > > >>                              slave_eth_dev->data->port_id);
>> > > > > >>              if (errval != 0) {
>> > > > > >>                      RTE_BOND_LOG(ERR,
>> > > > > >>                              "bond_ethdev_8023ad_flow_set:
>> > > > > >> port=%d, err (%d)", slave_eth_dev->data->port_id, errval);
>> > > > > >> +
>> > > > > >> +                    errval =
>> > > > > >> rte_eth_dev_stop(slave_eth_dev->data->port_id);
>> > > > > >> +                    if (errval < 0) {
>> > > > > >> +                            RTE_BOND_LOG(ERR,
>> > > > > >> +                                    "rte_eth_dev_stop:
>> > > > > >> + port=%d,
>> > > > > >> err (%d)",
>> > > > > >> +
>> > > > > >> slave_eth_dev->data->port_id, errval);
>> > > > > >> +                    }
>> > > > > >>                      return errval;
>> > > > > >>              }
>> > > > > >>      }
>> > > > > >>   -  /* Start device */
>> > > > > >> -    errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
>> > > > > >> -    if (errval != 0) {
>> > > > > >> -            RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err
>> > > > > >> (%d)",
>> > > > > >> -                            slave_eth_dev->data->port_id,
>> > > > > >> errval);
>> > > > > >> -            return -1;
>> > > > > >> -    }
>> > > > > >> -
>> > > > > >>      /* If RSS is enabled for bonding, synchronize RETA */
>> > > > > >>      if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
>> > > > > >> ETH_MQ_RX_RSS) {
>> > > > > >>              int i;
>> > > > > >>
>> >
  
Andrew Rybchenko July 13, 2021, 9:26 a.m. UTC | #9
On 7/13/21 11:18 AM, Havlík Martin wrote:
> Dne 2021-07-12 15:07, Ori Kam napsal:
>> Hi Jan,
>>
>>> -----Original Message-----
>>> From: Jan Viktorin <viktorin@cesnet.cz>
>>> Sent: Monday, July 12, 2021 12:46 AM
>>>
>>> On Sun, 11 Jul 2021 08:49:18 +0000
>>> Ori Kam <orika@nvidia.com> wrote:
>>>
>>> > Hi Jan,
>>>
>>> Hi Ori,
>>>
>>> >
>>> >
>>> > > -----Original Message-----
>>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
>>> > > Sent: Wednesday, July 7, 2021 6:54 PM
>>> > >
>>> > > On Sun, 4 Jul 2021 15:18:01 +0000
>>> > > Matan Azrad <matan@nvidia.com> wrote:
>>> > >
>>> > > > From: Havlík Martin
>>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
>>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:
>>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to install
>>> > > > > >> RTE Flows if the underlying ethdev is not started:
>>> > > > > >> bond_ethdev_8023ad_flow_set(267) -
>>> > > bond_ethdev_8023ad_flow_set:
>>> > > > > port
>>> > > > > >> not started (slave_port=0 queue_id=1)
>>> > > > > >>
>>> > > > > > Why mlx5 PMD doing flow create relys on port started ?
>>> > > > > > I noticed other PMDs did not has that reliance.
>>> > > > > >
>>> > > > > After looking into it, I really can't answer this mlx5 centered
>>> > > > > question. Closest related info we found so far is the 5th point
>>> > > > > in https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
>>> > > > > but it only specifies it's the application's responsibility and
>>> > > > > that flow rules are assumed invalid after port
>>> > > > > stop/close/restart but doesn't say anything about <stop - flow
>>> > > > > rule create - start> vs <stop - start - flow rule create> where
>>> > > > > the former is the point of failure on mlx5. I'm addressing the
>>> > > > > maintainers for mlx5, who might know a bit more on the topic.
>>> > > >
>>> > >
>>> > > Hello Matan,
>>> > >
>>> > > > From rte_ethdev.h
>>> > > >
>>> > > > * Please note that some configuration is not stored between calls
>>> > > > to
>>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
>>> > > > configuration will
>>> > > >  * be retained:
>>> > > >  *
>>> > > >  *     - MTU
>>> > > >  *     - flow control settings
>>> > > >  *     - receive mode configuration (promiscuous mode,
>>> all-multicast
>>> > > > mode,
>>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
>>> > > >  *     - VLAN filtering configuration
>>> > > >  *     - default MAC address
>>> > > >  *     - MAC addresses supplied to MAC address array
>>> > > >  *     - flow director filtering mode (but not filtering rules)
>>> > > >  *     - NIC queue statistics mappings
>>> > >
>>> > > just after this section, you can find the following statement:
>>> > >
>>> > >  * Any other configuration will not be stored and will need to be
>>> > > re-entered
>>> > >  * before a call to rte_eth_dev_start().
>>> > >
>>> > > It is not very clear how is this exactly related to flows (and this
>>> > > applies for all the quoted section, I think) but at least it can
>>> be used as a
>>> counter argument.
>>> > >
>>> > I agree the doc is not clear, as I see it flows are not part of
>>> > configuration, at least not when we are talking about rte_flow.
>>>
>>> Agree.
>>>
>>> >
>>> > > >
>>> > > >
>>> > > > Mlx5 assumes flows are allowed to be configured only after
>>> > > > rte_eth_dev_start(). Before start \ after stop - no flow is valid
>>> > > > anymore.
>>> > >
>>> > > I believe that this discussion is not about validity of flows. Let
>>> > > the flows be invalid after calling to rte_eth_dev_stop(). This is
>>> > > OK, flows must be recreated and the bonding driver works this way.
>>> > > But why not *before start*?
>>> > Think about it this way by changing the configuration you may create
>>> > invalid flows, for example, you can only change the number of queues
>>> > after port stop, so if you create a flow with jump to queue 3 and then
>>> > you remove queue 3 then, the flow that is cached is not valid anymore.
>>> > This goes for other configuration that may affect the validity of a
>>> flow.
>>>
>>> This is a valid argument, of course. The thing is whether it is a
>>> responsibility
>>> of the PMD to take care of those corner cases or if this is up to the
>>> application developer. If we respect the fact that calling to stop
>>> invalidates all
>>> flows then when you do:
>>>
>>>  > port stop 0
>>>  > flow create 0 ingress pattern ... / end actions queue 6 / end  >
>>> port config
>>> rxq 3  > port start 0
>>>
>>> it's clear that something is really wrong with the caller/user. I
>>> would say that
>>> this is an application bug. I would expect that you first reconfigure
>>> port and
>>> after that you modify flows. It seems quite logical and intuitive,
>>> doesn't it?
>>>
>>
>> I agree the use case I described is bad programming, but the same
>> logic can be
>> applied on stopping the port why flows should be deleted?
>> And even more important on some HW and some flows can only be inserted
>> after
>> start since there is no real traffic gate, basically the start means
>> adding the flows
>> if a flow is in the HW it is working, adding a gate will result in PPS
>> degradation.
>>
>>> Anyway, the PMD can possibly catch this rxq inconsistency but I can
>>> imagine
>>> that there are more complicated sitations than just changing count of
>>> queues. Any idea for a more complex real case?
>>>
>>
>> Some of the resource may be created only after start,
>> But I don't have a very good example now.
>>
>>> Martin Havlik, can you please check how ixgbe and i40e behave in the
>>> case
>>> above with "rxq change after flow created"?
>>>
> Both PMDs do not raise any errors in the mentioned case. The packets
> that should go to the non-existent queue are dropped at some point in
> the processing (from what I could see and understand).
> 
> I'm glad for the discussion that is taking place but I feel we have
> strayed from the original reason of it. That being the inability to
> enable dedicated queues for bonding mode 8023AD on mlx5. We have
> concluded that flow creation has to be done after port/device start. So
> moving the function call to create the flow after device start should
> solve the issue
> (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my
> testing, flow create after start is not a problem on Intel PMDs.

It would be good to hear what net/bonding maintainers think
about it. I see no conclusion.

If net/mlx5 behaviour is fixed, will it fix the initial
problem?

> 
> I'm turning to you, Connor, as you have made the original question on
> this. Is the patch I presented applicable?
> 
> Martin
>>> >
>>> > > Does somebody know how other drivers behaves in this situation? (We
>>> > > know and can check for Intel, there it does not seem to be an
>>> > > issue.)
>>> > >
>>> > > By the way, the mlx5 behaviour opens a (probably short) time window
>>> > > between starting of a port and configuation of filtering flows. You
>>> > > may want to start the port with thousands of flows that apply just
>>> > > when the port starts (not after, that's late). This may introduce
>>> > > glitches in filtering and measuring of traffic (well, it is a
>>> question how
>>> serious issue could it be...).
>>> > >
>>> > Agree but this is always true nothing is done is zero time and even if
>>> > it was the insertion is not in zero time, (assuming that even if the
>>> > flows are stored by the PMD until start they still will not all be
>>> > inserted in the same time)
>>>
>>> Sorry, I probably do not understand well. Yes, creation of a flow
>>> would never
>>> be zero time, agree. But if I create flows before the port starts,
>>> than I do not
>>> have to care too much about how long the creation takes. Because the
>>> card is
>>> expected to be configured already before pushing the "start button".
>>>
>>> Maybe, you assume that the created flows would first be cached inside
>>> the
>>> PMD and when the port is finally started, it than transparently write
>>> the
>>> queues to the HW. But this is not what I was talking about, that's
>>> maybe even
>>> worse because you are hiding such behaviour from users...
>>>
>>> (I do not know the exact mlx5 implementation details, so my answer can
>>> miss something, sorry for that.)
>>>
>>
>> You are correct this is HW/PMD implementation issue, in case of Nvidia
>> we must
>> cache the flows and only insert it after start.
>> Since RTE flow is generic and HW is not generic sometimes the PMD
>> needs to
>> translate this difference and in this gray area there is a game between
>> best API best performance understanding what the user really wants.
>> This is why the doc should be very clear on what is the expected.
>> but this is also the reason why such document is very hard to agree on.

Do I understand correctly that net/mlx5 maintainers will follow
up?

>>
>> Ori
>>> >
>>> > > This matters for the bonding case as well, doesn't it?. It is not
>>> > > desirable to accidently omit a packet that was received by primary
>>> > > ingress logic instead of being redirected into the dedicated queue.
>>> > >
>>> > > Are there any chances that for mlx5 it would be possible to insert
>>> > > flow rules before calling rte_eth_dev_start? Anyway, the behaviour
>>> > > should be specified and documented in DPDK more precisely to avoid
>>> > > such uncertainty in the future.
>>> > >
>>> > I agree the documentation should be fixed.
>>>
>>> +1

Cc Thomas and Ferruh since ethdev documentation should be
clarified.

It looks like there is no consensus if the patch is a right
direction or wrong. For me it looks wrong taking all above
arguments in to account (mainly necessity to be able to
insert flows before pushing start button which enables the
traffic if HW supports it).

So, I'm applying first two patches and hold on this one.
  
Jan Viktorin July 13, 2021, 11:06 a.m. UTC | #10
On Tue, 13 Jul 2021 12:26:35 +0300
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:

> On 7/13/21 11:18 AM, Havlík Martin wrote:
> > Dne 2021-07-12 15:07, Ori Kam napsal:  
> >> Hi Jan,
> >>  
> >>> -----Original Message-----
> >>> From: Jan Viktorin <viktorin@cesnet.cz>
> >>> Sent: Monday, July 12, 2021 12:46 AM
> >>>
> >>> On Sun, 11 Jul 2021 08:49:18 +0000
> >>> Ori Kam <orika@nvidia.com> wrote:
> >>>  
> >>> > Hi Jan,  
> >>>
> >>> Hi Ori,
> >>>  
> >>> >
> >>> >  
> >>> > > -----Original Message-----
> >>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> >>> > >
> >>> > > On Sun, 4 Jul 2021 15:18:01 +0000
> >>> > > Matan Azrad <matan@nvidia.com> wrote:
> >>> > >  
> >>> > > > From: Havlík Martin  
> >>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:  
> >>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:  
> >>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to
> >>> > > > > >> install RTE Flows if the underlying ethdev is not
> >>> > > > > >> started: bond_ethdev_8023ad_flow_set(267) -  
> >>> > > bond_ethdev_8023ad_flow_set:  
> >>> > > > > port  
> >>> > > > > >> not started (slave_port=0 queue_id=1)
> >>> > > > > >>  
> >>> > > > > > Why mlx5 PMD doing flow create relys on port started ?
> >>> > > > > > I noticed other PMDs did not has that reliance.
> >>> > > > > >  
> >>> > > > > After looking into it, I really can't answer this mlx5
> >>> > > > > centered question. Closest related info we found so far
> >>> > > > > is the 5th point in
> >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> >>> > > > > but it only specifies it's the application's
> >>> > > > > responsibility and that flow rules are assumed invalid
> >>> > > > > after port stop/close/restart but doesn't say anything
> >>> > > > > about <stop - flow rule create - start> vs <stop - start
> >>> > > > > - flow rule create> where the former is the point of
> >>> > > > > failure on mlx5. I'm addressing the maintainers for mlx5,
> >>> > > > > who might know a bit more on the topic.  
> >>> > > >  
> >>> > >
> >>> > > Hello Matan,
> >>> > >  
> >>> > > > From rte_ethdev.h
> >>> > > >
> >>> > > > * Please note that some configuration is not stored between
> >>> > > > calls to
> >>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> >>> > > > configuration will
> >>> > > >  * be retained:
> >>> > > >  *
> >>> > > >  *     - MTU
> >>> > > >  *     - flow control settings
> >>> > > >  *     - receive mode configuration (promiscuous mode,  
> >>> all-multicast  
> >>> > > > mode,
> >>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> >>> > > >  *     - VLAN filtering configuration
> >>> > > >  *     - default MAC address
> >>> > > >  *     - MAC addresses supplied to MAC address array
> >>> > > >  *     - flow director filtering mode (but not filtering
> >>> > > >rules)
> >>> > > >  *     - NIC queue statistics mappings  
> >>> > >
> >>> > > just after this section, you can find the following statement:
> >>> > >
> >>> > >  * Any other configuration will not be stored and will need
> >>> > >to be
> >>> > > re-entered
> >>> > >  * before a call to rte_eth_dev_start().
> >>> > >
> >>> > > It is not very clear how is this exactly related to flows
> >>> > > (and this applies for all the quoted section, I think) but at
> >>> > > least it can  
> >>> be used as a
> >>> counter argument.  
> >>> > >  
> >>> > I agree the doc is not clear, as I see it flows are not part of
> >>> > configuration, at least not when we are talking about rte_flow.
> >>> >  
> >>>
> >>> Agree.
> >>>  
> >>> >  
> >>> > > >
> >>> > > >
> >>> > > > Mlx5 assumes flows are allowed to be configured only after
> >>> > > > rte_eth_dev_start(). Before start \ after stop - no flow is
> >>> > > > valid anymore.  
> >>> > >
> >>> > > I believe that this discussion is not about validity of
> >>> > > flows. Let the flows be invalid after calling to
> >>> > > rte_eth_dev_stop(). This is OK, flows must be recreated and
> >>> > > the bonding driver works this way. But why not *before
> >>> > > start*?  
> >>> > Think about it this way by changing the configuration you may
> >>> > create invalid flows, for example, you can only change the
> >>> > number of queues after port stop, so if you create a flow with
> >>> > jump to queue 3 and then you remove queue 3 then, the flow that
> >>> > is cached is not valid anymore. This goes for other
> >>> > configuration that may affect the validity of a  
> >>> flow.
> >>>
> >>> This is a valid argument, of course. The thing is whether it is a
> >>> responsibility
> >>> of the PMD to take care of those corner cases or if this is up to
> >>> the application developer. If we respect the fact that calling to
> >>> stop invalidates all
> >>> flows then when you do:
> >>>  
> >>>  > port stop 0
> >>>  > flow create 0 ingress pattern ... / end actions queue 6 / end
> >>> > port config
> >>> rxq 3  > port start 0
> >>>
> >>> it's clear that something is really wrong with the caller/user. I
> >>> would say that
> >>> this is an application bug. I would expect that you first
> >>> reconfigure port and
> >>> after that you modify flows. It seems quite logical and intuitive,
> >>> doesn't it?
> >>>  
> >>
> >> I agree the use case I described is bad programming, but the same
> >> logic can be
> >> applied on stopping the port why flows should be deleted?

I just took into account what Matan has written. Certainly, it is not
necessary to delete flows on stop.

> >> And even more important on some HW and some flows can only be
> >> inserted after
> >> start since there is no real traffic gate, basically the start
> >> means adding the flows
> >> if a flow is in the HW it is working, adding a gate will result in
> >> PPS degradation.

What do you mean by term "gate", here? Some FPGA logic? Or something
else? Why "adding a gate" would result in PPS degradation? Can you
make those ideas a bit clearer?

> >>  
> >>> Anyway, the PMD can possibly catch this rxq inconsistency but I
> >>> can imagine
> >>> that there are more complicated sitations than just changing
> >>> count of queues. Any idea for a more complex real case?
> >>>  
> >>
> >> Some of the resource may be created only after start,
> >> But I don't have a very good example now.

Please, try to find some reasonable example otherwise your argumentation
does not make much sense...

> >>  
> >>> Martin Havlik, can you please check how ixgbe and i40e behave in
> >>> the case
> >>> above with "rxq change after flow created"?
> >>>  
> > Both PMDs do not raise any errors in the mentioned case. The packets
> > that should go to the non-existent queue are dropped at some point
> > in the processing (from what I could see and understand).

Thanks for clarification. It would be nice if any other PMD can be
tested for this behaviour. Can sombody help us with other NICs?

> > 
> > I'm glad for the discussion that is taking place but I feel we have
> > strayed from the original reason of it. That being the inability to
> > enable dedicated queues for bonding mode 8023AD on mlx5. We have
> > concluded that flow creation has to be done after port/device
> > start. So moving the function call to create the flow after device
> > start should solve the issue
> > (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my
> > testing, flow create after start is not a problem on Intel PMDs.  
> 
> It would be good to hear what net/bonding maintainers think
> about it. I see no conclusion.
> 
> If net/mlx5 behaviour is fixed, will it fix the initial
> problem?

I believe that if mlx5 allows to create flows before calling to
rte_eth_dev_start() then the bonding would work as expected.

> 
> > 
> > I'm turning to you, Connor, as you have made the original question
> > on this. Is the patch I presented applicable?
> > 
> > Martin  
> >>> >  
> >>> > > Does somebody know how other drivers behaves in this
> >>> > > situation? (We know and can check for Intel, there it does
> >>> > > not seem to be an issue.)
> >>> > >
> >>> > > By the way, the mlx5 behaviour opens a (probably short) time
> >>> > > window between starting of a port and configuation of
> >>> > > filtering flows. You may want to start the port with
> >>> > > thousands of flows that apply just when the port starts (not
> >>> > > after, that's late). This may introduce glitches in filtering
> >>> > > and measuring of traffic (well, it is a  
> >>> question how
> >>> serious issue could it be...).  
> >>> > >  
> >>> > Agree but this is always true nothing is done is zero time and
> >>> > even if it was the insertion is not in zero time, (assuming
> >>> > that even if the flows are stored by the PMD until start they
> >>> > still will not all be inserted in the same time)  
> >>>
> >>> Sorry, I probably do not understand well. Yes, creation of a flow
> >>> would never
> >>> be zero time, agree. But if I create flows before the port starts,
> >>> than I do not
> >>> have to care too much about how long the creation takes. Because
> >>> the card is
> >>> expected to be configured already before pushing the "start
> >>> button".
> >>>
> >>> Maybe, you assume that the created flows would first be cached
> >>> inside the
> >>> PMD and when the port is finally started, it than transparently
> >>> write the
> >>> queues to the HW. But this is not what I was talking about, that's
> >>> maybe even
> >>> worse because you are hiding such behaviour from users...
> >>>
> >>> (I do not know the exact mlx5 implementation details, so my
> >>> answer can miss something, sorry for that.)
> >>>  
> >>
> >> You are correct this is HW/PMD implementation issue, in case of
> >> Nvidia we must

By "must" you probably mean "would have to"? Is that correct?

> >> cache the flows and only insert it after start.
> >> Since RTE flow is generic and HW is not generic sometimes the PMD
> >> needs to
> >> translate this difference and in this gray area there is a game
> >> between best API best performance understanding what the user

Yes, as a former FPGA dev, I can understand your concerns. RTE Flow is
a difficult beast.

Jan

> >> really wants. This is why the doc should be very clear on what is
> >> the expected. but this is also the reason why such document is
> >> very hard to agree on.  
> 
> Do I understand correctly that net/mlx5 maintainers will follow
> up?
> 
> >>
> >> Ori  
> >>> >  
> >>> > > This matters for the bonding case as well, doesn't it?. It is
> >>> > > not desirable to accidently omit a packet that was received
> >>> > > by primary ingress logic instead of being redirected into the
> >>> > > dedicated queue.
> >>> > >
> >>> > > Are there any chances that for mlx5 it would be possible to
> >>> > > insert flow rules before calling rte_eth_dev_start? Anyway,
> >>> > > the behaviour should be specified and documented in DPDK more
> >>> > > precisely to avoid such uncertainty in the future.
> >>> > >  
> >>> > I agree the documentation should be fixed.  
> >>>
> >>> +1  
> 
> Cc Thomas and Ferruh since ethdev documentation should be
> clarified.
> 
> It looks like there is no consensus if the patch is a right
> direction or wrong. For me it looks wrong taking all above
> arguments in to account (mainly necessity to be able to
> insert flows before pushing start button which enables the
> traffic if HW supports it).
> 
> So, I'm applying first two patches and hold on this one.
  
Ori Kam July 13, 2021, 5:17 p.m. UTC | #11
Hi Jan,

> -----Original Message-----
> From: Jan Viktorin <viktorin@cesnet.cz>
> Sent: Tuesday, July 13, 2021 2:06 PM
> 
> On Tue, 13 Jul 2021 12:26:35 +0300
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> 
> > On 7/13/21 11:18 AM, Havlík Martin wrote:
> > > Dne 2021-07-12 15:07, Ori Kam napsal:
> > >> Hi Jan,
> > >>
> > >>> -----Original Message-----
> > >>> From: Jan Viktorin <viktorin@cesnet.cz>
> > >>> Sent: Monday, July 12, 2021 12:46 AM
> > >>>
> > >>> On Sun, 11 Jul 2021 08:49:18 +0000 Ori Kam <orika@nvidia.com>
> > >>> wrote:
> > >>>
> > >>> > Hi Jan,
> > >>>
> > >>> Hi Ori,
> > >>>
> > >>> >
> > >>> >
> > >>> > > -----Original Message-----
> > >>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> > >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> > >>> > >
> > >>> > > On Sun, 4 Jul 2021 15:18:01 +0000 Matan Azrad
> > >>> > > <matan@nvidia.com> wrote:
> > >>> > >
> > >>> > > > From: Havlík Martin
> > >>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> > >>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:
> > >>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to
> > >>> > > > > >> install RTE Flows if the underlying ethdev is not
> > >>> > > > > >> started: bond_ethdev_8023ad_flow_set(267) -
> > >>> > > bond_ethdev_8023ad_flow_set:
> > >>> > > > > port
> > >>> > > > > >> not started (slave_port=0 queue_id=1)
> > >>> > > > > >>
> > >>> > > > > > Why mlx5 PMD doing flow create relys on port started ?
> > >>> > > > > > I noticed other PMDs did not has that reliance.
> > >>> > > > > >
> > >>> > > > > After looking into it, I really can't answer this mlx5
> > >>> > > > > centered question. Closest related info we found so far is
> > >>> > > > > the 5th point in
> > >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#cavea
> > >>> > > > > ts but it only specifies it's the application's
> > >>> > > > > responsibility and that flow rules are assumed invalid
> > >>> > > > > after port stop/close/restart but doesn't say anything
> > >>> > > > > about <stop - flow rule create - start> vs <stop - start
> > >>> > > > > - flow rule create> where the former is the point of
> > >>> > > > > failure on mlx5. I'm addressing the maintainers for mlx5,
> > >>> > > > > who might know a bit more on the topic.
> > >>> > > >
> > >>> > >
> > >>> > > Hello Matan,
> > >>> > >
> > >>> > > > From rte_ethdev.h
> > >>> > > >
> > >>> > > > * Please note that some configuration is not stored between
> > >>> > > >calls to
> > >>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> > >>> > > >configuration will
> > >>> > > >  * be retained:
> > >>> > > >  *
> > >>> > > >  *     - MTU
> > >>> > > >  *     - flow control settings
> > >>> > > >  *     - receive mode configuration (promiscuous mode,
> > >>> all-multicast
> > >>> > > > mode,
> > >>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> > >>> > > >  *     - VLAN filtering configuration
> > >>> > > >  *     - default MAC address
> > >>> > > >  *     - MAC addresses supplied to MAC address array
> > >>> > > >  *     - flow director filtering mode (but not filtering
> > >>> > > >rules)
> > >>> > > >  *     - NIC queue statistics mappings
> > >>> > >
> > >>> > > just after this section, you can find the following statement:
> > >>> > >
> > >>> > >  * Any other configuration will not be stored and will need to
> > >>> > >be  re-entered
> > >>> > >  * before a call to rte_eth_dev_start().
> > >>> > >
> > >>> > > It is not very clear how is this exactly related to flows (and
> > >>> > > this applies for all the quoted section, I think) but at least
> > >>> > > it can
> > >>> be used as a
> > >>> counter argument.
> > >>> > >
> > >>> > I agree the doc is not clear, as I see it flows are not part of
> > >>> > configuration, at least not when we are talking about rte_flow.
> > >>> >
> > >>>
> > >>> Agree.
> > >>>
> > >>> >
> > >>> > > >
> > >>> > > >
> > >>> > > > Mlx5 assumes flows are allowed to be configured only after
> > >>> > > > rte_eth_dev_start(). Before start \ after stop - no flow is
> > >>> > > > valid anymore.
> > >>> > >
> > >>> > > I believe that this discussion is not about validity of flows.
> > >>> > > Let the flows be invalid after calling to rte_eth_dev_stop().
> > >>> > > This is OK, flows must be recreated and the bonding driver
> > >>> > > works this way. But why not *before start*?
> > >>> > Think about it this way by changing the configuration you may
> > >>> > create invalid flows, for example, you can only change the
> > >>> > number of queues after port stop, so if you create a flow with
> > >>> > jump to queue 3 and then you remove queue 3 then, the flow that
> > >>> > is cached is not valid anymore. This goes for other
> > >>> > configuration that may affect the validity of a
> > >>> flow.
> > >>>
> > >>> This is a valid argument, of course. The thing is whether it is a
> > >>> responsibility of the PMD to take care of those corner cases or if
> > >>> this is up to the application developer. If we respect the fact
> > >>> that calling to stop invalidates all flows then when you do:
> > >>>
> > >>>  > port stop 0
> > >>>  > flow create 0 ingress pattern ... / end actions queue 6 / end
> > >>> > port config
> > >>> rxq 3  > port start 0
> > >>>
> > >>> it's clear that something is really wrong with the caller/user. I
> > >>> would say that this is an application bug. I would expect that you
> > >>> first reconfigure port and after that you modify flows. It seems
> > >>> quite logical and intuitive, doesn't it?
> > >>>
> > >>
> > >> I agree the use case I described is bad programming, but the same
> > >> logic can be applied on stopping the port why flows should be
> > >> deleted?
> 
> I just took into account what Matan has written. Certainly, it is not necessary
> to delete flows on stop.
> 

According to documentation it is, since flows are not maintained after stop.

> > >> And even more important on some HW and some flows can only be
> > >> inserted after start since there is no real traffic gate, basically
> > >> the start means adding the flows if a flow is in the HW it is
> > >> working, adding a gate will result in PPS degradation.
> 
> What do you mean by term "gate", here? Some FPGA logic? Or something
> else? Why "adding a gate" would result in PPS degradation? Can you make
> those ideas a bit clearer?
> 

Sure, I will try,
From Nvidia NIC point of view, the NIC always get traffic there is no "gate" meaning
starting the device is not just some bit that enables traffic, traffic is always there
there are number of reasons for this, (mainly the fact that Nvidia shares the device
with the kernel so any packet that the DPDK did take will be routed to the kernel)
So starting traffic means adding the flows that will direct the traffic to the queues.
Even if it was valid to add a "gate" rule it will mean that there will be an extra rule
which may lead to PPS degradation. 

Is that clearer?

> > >>
> > >>> Anyway, the PMD can possibly catch this rxq inconsistency but I
> > >>> can imagine that there are more complicated sitations than just
> > >>> changing count of queues. Any idea for a more complex real case?
> > >>>
> > >>
> > >> Some of the resource may be created only after start, But I don't
> > >> have a very good example now.
> 
> Please, try to find some reasonable example otherwise your argumentation
> does not make much sense...
> 

Agree let's leave it aside for now.

> > >>
> > >>> Martin Havlik, can you please check how ixgbe and i40e behave in
> > >>> the case above with "rxq change after flow created"?
> > >>>
> > > Both PMDs do not raise any errors in the mentioned case. The packets
> > > that should go to the non-existent queue are dropped at some point
> > > in the processing (from what I could see and understand).
> 
> Thanks for clarification. It would be nice if any other PMD can be tested for
> this behaviour. Can sombody help us with other NICs?
> 
> > >
> > > I'm glad for the discussion that is taking place but I feel we have
> > > strayed from the original reason of it. That being the inability to
> > > enable dedicated queues for bonding mode 8023AD on mlx5. We have
> > > concluded that flow creation has to be done after port/device start.
> > > So moving the function call to create the flow after device start
> > > should solve the issue
> > > (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my
> > > testing, flow create after start is not a problem on Intel PMDs.
> >
> > It would be good to hear what net/bonding maintainers think about it.
> > I see no conclusion.
> >
> > If net/mlx5 behaviour is fixed, will it fix the initial problem?
> 
> I believe that if mlx5 allows to create flows before calling to
> rte_eth_dev_start() then the bonding would work as expected.
> 
There are other solutions like enabling the mlx5 pmd in isolate mode.
This will require the bond device to just create the rule manually after start.

> >
> > >
> > > I'm turning to you, Connor, as you have made the original question
> > > on this. Is the patch I presented applicable?
> > >
> > > Martin
> > >>> >
> > >>> > > Does somebody know how other drivers behaves in this
> > >>> > > situation? (We know and can check for Intel, there it does not
> > >>> > > seem to be an issue.)
> > >>> > >
> > >>> > > By the way, the mlx5 behaviour opens a (probably short) time
> > >>> > > window between starting of a port and configuation of
> > >>> > > filtering flows. You may want to start the port with thousands
> > >>> > > of flows that apply just when the port starts (not after,
> > >>> > > that's late). This may introduce glitches in filtering and
> > >>> > > measuring of traffic (well, it is a
> > >>> question how
> > >>> serious issue could it be...).
> > >>> > >
> > >>> > Agree but this is always true nothing is done is zero time and
> > >>> > even if it was the insertion is not in zero time, (assuming that
> > >>> > even if the flows are stored by the PMD until start they still
> > >>> > will not all be inserted in the same time)
> > >>>
> > >>> Sorry, I probably do not understand well. Yes, creation of a flow
> > >>> would never be zero time, agree. But if I create flows before the
> > >>> port starts, than I do not have to care too much about how long
> > >>> the creation takes. Because the card is expected to be configured
> > >>> already before pushing the "start button".
> > >>>
> > >>> Maybe, you assume that the created flows would first be cached
> > >>> inside the PMD and when the port is finally started, it than
> > >>> transparently write the queues to the HW. But this is not what I
> > >>> was talking about, that's maybe even worse because you are hiding
> > >>> such behaviour from users...
> > >>>
> > >>> (I do not know the exact mlx5 implementation details, so my answer
> > >>> can miss something, sorry for that.)
> > >>>
> > >>
> > >> You are correct this is HW/PMD implementation issue, in case of
> > >> Nvidia we must
> 
> By "must" you probably mean "would have to"? Is that correct?
> 
Sorry for my English but I'm not sure what is the difference,
In any case what I meant is that currently we will need to cache the flows and only
apply them after start.

> > >> cache the flows and only insert it after start.
> > >> Since RTE flow is generic and HW is not generic sometimes the PMD
> > >> needs to translate this difference and in this gray area there is a
> > >> game between best API best performance understanding what the user
> 
> Yes, as a former FPGA dev, I can understand your concerns. RTE Flow is a
> difficult beast.
> 
+1 
> Jan
> 
> > >> really wants. This is why the doc should be very clear on what is
> > >> the expected. but this is also the reason why such document is very
> > >> hard to agree on.
> >
> > Do I understand correctly that net/mlx5 maintainers will follow up?

Yes, I will follow up on this.

> >
> > >>
> > >> Ori
> > >>> >
> > >>> > > This matters for the bonding case as well, doesn't it?. It is
> > >>> > > not desirable to accidently omit a packet that was received by
> > >>> > > primary ingress logic instead of being redirected into the
> > >>> > > dedicated queue.
> > >>> > >
> > >>> > > Are there any chances that for mlx5 it would be possible to
> > >>> > > insert flow rules before calling rte_eth_dev_start? Anyway,
> > >>> > > the behaviour should be specified and documented in DPDK more
> > >>> > > precisely to avoid such uncertainty in the future.
> > >>> > >
> > >>> > I agree the documentation should be fixed.
> > >>>
> > >>> +1
> >
> > Cc Thomas and Ferruh since ethdev documentation should be clarified.
> >
> > It looks like there is no consensus if the patch is a right direction
> > or wrong. For me it looks wrong taking all above arguments in to
> > account (mainly necessity to be able to insert flows before pushing
> > start button which enables the traffic if HW supports it).
> >
> > So, I'm applying first two patches and hold on this one.
  
Jan Viktorin July 14, 2021, 3 p.m. UTC | #12
On Tue, 13 Jul 2021 17:17:51 +0000
Ori Kam <orika@nvidia.com> wrote:

> Hi Jan,
> 
> > -----Original Message-----
> > From: Jan Viktorin <viktorin@cesnet.cz>
> > Sent: Tuesday, July 13, 2021 2:06 PM
> > 
> > On Tue, 13 Jul 2021 12:26:35 +0300
> > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> >   
> > > On 7/13/21 11:18 AM, Havlík Martin wrote:  
> > > > Dne 2021-07-12 15:07, Ori Kam napsal:  
> > > >> Hi Jan,
> > > >>  
> > > >>> -----Original Message-----
> > > >>> From: Jan Viktorin <viktorin@cesnet.cz>
> > > >>> Sent: Monday, July 12, 2021 12:46 AM
> > > >>>
> > > >>> On Sun, 11 Jul 2021 08:49:18 +0000 Ori Kam <orika@nvidia.com>
> > > >>> wrote:
> > > >>>  
> > > >>> > Hi Jan,  
> > > >>>
> > > >>> Hi Ori,
> > > >>>  
> > > >>> >
> > > >>> >  
> > > >>> > > -----Original Message-----
> > > >>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> > > >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> > > >>> > >
> > > >>> > > On Sun, 4 Jul 2021 15:18:01 +0000 Matan Azrad
> > > >>> > > <matan@nvidia.com> wrote:
> > > >>> > >  
> > > >>> > > > From: Havlík Martin  
> > > >>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:  
> > > >>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:  
> > > >>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails
> > > >>> > > > > >> to install RTE Flows if the underlying ethdev is
> > > >>> > > > > >> not started: bond_ethdev_8023ad_flow_set(267) -  
> > > >>> > > bond_ethdev_8023ad_flow_set:  
> > > >>> > > > > port  
> > > >>> > > > > >> not started (slave_port=0 queue_id=1)
> > > >>> > > > > >>  
> > > >>> > > > > > Why mlx5 PMD doing flow create relys on port
> > > >>> > > > > > started ? I noticed other PMDs did not has that
> > > >>> > > > > > reliance. 
> > > >>> > > > > After looking into it, I really can't answer this mlx5
> > > >>> > > > > centered question. Closest related info we found so
> > > >>> > > > > far is the 5th point in
> > > >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#cavea
> > > >>> > > > > ts but it only specifies it's the application's
> > > >>> > > > > responsibility and that flow rules are assumed invalid
> > > >>> > > > > after port stop/close/restart but doesn't say anything
> > > >>> > > > > about <stop - flow rule create - start> vs <stop -
> > > >>> > > > > start
> > > >>> > > > > - flow rule create> where the former is the point of
> > > >>> > > > > failure on mlx5. I'm addressing the maintainers for
> > > >>> > > > > mlx5, who might know a bit more on the topic.  
> > > >>> > > >  
> > > >>> > >
> > > >>> > > Hello Matan,
> > > >>> > >  
> > > >>> > > > From rte_ethdev.h
> > > >>> > > >
> > > >>> > > > * Please note that some configuration is not stored
> > > >>> > > > between
> > > >>> > > >calls to
> > > >>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> > > >>> > > >configuration will
> > > >>> > > >  * be retained:
> > > >>> > > >  *
> > > >>> > > >  *     - MTU
> > > >>> > > >  *     - flow control settings
> > > >>> > > >  *     - receive mode configuration (promiscuous mode,  
> > > >>> all-multicast  
> > > >>> > > > mode,
> > > >>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> > > >>> > > >  *     - VLAN filtering configuration
> > > >>> > > >  *     - default MAC address
> > > >>> > > >  *     - MAC addresses supplied to MAC address array
> > > >>> > > >  *     - flow director filtering mode (but not filtering
> > > >>> > > >rules)
> > > >>> > > >  *     - NIC queue statistics mappings  
> > > >>> > >
> > > >>> > > just after this section, you can find the following
> > > >>> > > statement:
> > > >>> > >
> > > >>> > >  * Any other configuration will not be stored and will
> > > >>> > >need to be  re-entered
> > > >>> > >  * before a call to rte_eth_dev_start().
> > > >>> > >
> > > >>> > > It is not very clear how is this exactly related to flows
> > > >>> > > (and this applies for all the quoted section, I think)
> > > >>> > > but at least it can  
> > > >>> be used as a
> > > >>> counter argument.  
> > > >>> > >  
> > > >>> > I agree the doc is not clear, as I see it flows are not
> > > >>> > part of configuration, at least not when we are talking
> > > >>> > about rte_flow. 
> > > >>>
> > > >>> Agree.
> > > >>>  
> > > >>> >  
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > Mlx5 assumes flows are allowed to be configured only
> > > >>> > > > after rte_eth_dev_start(). Before start \ after stop -
> > > >>> > > > no flow is valid anymore.  
> > > >>> > >
> > > >>> > > I believe that this discussion is not about validity of
> > > >>> > > flows. Let the flows be invalid after calling to
> > > >>> > > rte_eth_dev_stop(). This is OK, flows must be recreated
> > > >>> > > and the bonding driver works this way. But why not
> > > >>> > > *before start*?  
> > > >>> > Think about it this way by changing the configuration you
> > > >>> > may create invalid flows, for example, you can only change
> > > >>> > the number of queues after port stop, so if you create a
> > > >>> > flow with jump to queue 3 and then you remove queue 3 then,
> > > >>> > the flow that is cached is not valid anymore. This goes for
> > > >>> > other configuration that may affect the validity of a  
> > > >>> flow.
> > > >>>
> > > >>> This is a valid argument, of course. The thing is whether it
> > > >>> is a responsibility of the PMD to take care of those corner
> > > >>> cases or if this is up to the application developer. If we
> > > >>> respect the fact that calling to stop invalidates all flows
> > > >>> then when you do: 
> > > >>>  > port stop 0
> > > >>>  > flow create 0 ingress pattern ... / end actions queue 6 /
> > > >>> end
> > > >>> > port config  
> > > >>> rxq 3  > port start 0
> > > >>>
> > > >>> it's clear that something is really wrong with the
> > > >>> caller/user. I would say that this is an application bug. I
> > > >>> would expect that you first reconfigure port and after that
> > > >>> you modify flows. It seems quite logical and intuitive,
> > > >>> doesn't it? 
> > > >>
> > > >> I agree the use case I described is bad programming, but the
> > > >> same logic can be applied on stopping the port why flows
> > > >> should be deleted?  
> > 
> > I just took into account what Matan has written. Certainly, it is
> > not necessary to delete flows on stop.
> >   
> 
> According to documentation it is, since flows are not maintained
> after stop.

So why did you ask me? :) I didn't study port-stop behaviour at all,
so it was just my general opinion. Please, leave some reference to
the documentation you are talking about.

> 
> > > >> And even more important on some HW and some flows can only be
> > > >> inserted after start since there is no real traffic gate,
> > > >> basically the start means adding the flows if a flow is in the
> > > >> HW it is working, adding a gate will result in PPS
> > > >> degradation.  
> > 
> > What do you mean by term "gate", here? Some FPGA logic? Or something
> > else? Why "adding a gate" would result in PPS degradation? Can you
> > make those ideas a bit clearer?
> >   
> 
> Sure, I will try,
> From Nvidia NIC point of view, the NIC always get traffic there is no
> "gate" meaning starting the device is not just some bit that enables
> traffic, traffic is always there there are number of reasons for
> this, (mainly the fact that Nvidia shares the device with the kernel
> so any packet that the DPDK did take will be routed to the kernel) So
> starting traffic means adding the flows that will direct the traffic
> to the queues. Even if it was valid to add a "gate" rule it will mean
> that there will be an extra rule which may lead to PPS degradation. 
> 
> Is that clearer?

Not really. You still didn't explain the term "gate".

Another think, you say "starting traffic means adding the flows that
will direct the traffic to the queues". Does the term "flows" refer to
RTE Flows? Do you mean user-inserted RTE Flows? Or are those some
internal mlx-specific configuration flows? I consider RTE Flow as a
construct created by the user of DPDK either directly or indirectly
via some DPDK library (like bonding does it). I can work with such flows
via the DPDK RTE Flow API. The "flows" you are talking about, what
do you exactly mean by that?

I did not get why there should be any extra rule. But if so, how
does it relate to PPS degradation? Where can I see such PPS degradation?
In the kernel? The DPDK application is not active yet, so I don't see
any reasons for any PPS degradation in the DPDK app. Maybe, I am
missing something...

Probably, you expect that all packets are delivered both to the kernel
and to the DPDK application (that is however not started yet) and when
DPDK app starts, it can (temporarily?) affect performace seen in the
kernel. Is it so?

Another idea I have is that inserting an "extra rule" would mean some
added latency in the processing path, but this is not PPS
degradation... What am I missing here?

How significant would such PPS degradation be?

> 
> > > >>  
> > > >>> Anyway, the PMD can possibly catch this rxq inconsistency but
> > > >>> I can imagine that there are more complicated sitations than
> > > >>> just changing count of queues. Any idea for a more complex
> > > >>> real case? 
> > > >>
> > > >> Some of the resource may be created only after start, But I
> > > >> don't have a very good example now.  
> > 
> > Please, try to find some reasonable example otherwise your
> > argumentation does not make much sense...
> >   
> 
> Agree let's leave it aside for now.
> 
> > > >>  
> > > >>> Martin Havlik, can you please check how ixgbe and i40e behave
> > > >>> in the case above with "rxq change after flow created"?
> > > >>>  
> > > > Both PMDs do not raise any errors in the mentioned case. The
> > > > packets that should go to the non-existent queue are dropped at
> > > > some point in the processing (from what I could see and
> > > > understand).  
> > 
> > Thanks for clarification. It would be nice if any other PMD can be
> > tested for this behaviour. Can sombody help us with other NICs?
> >   
> > > >
> > > > I'm glad for the discussion that is taking place but I feel we
> > > > have strayed from the original reason of it. That being the
> > > > inability to enable dedicated queues for bonding mode 8023AD on
> > > > mlx5. We have concluded that flow creation has to be done after
> > > > port/device start. So moving the function call to create the
> > > > flow after device start should solve the issue
> > > > (https://mails.dpdk.org/archives/dev/2021-June/212210.html).
> > > > From my testing, flow create after start is not a problem on
> > > > Intel PMDs.  
> > >
> > > It would be good to hear what net/bonding maintainers think about
> > > it. I see no conclusion.
> > >
> > > If net/mlx5 behaviour is fixed, will it fix the initial problem?  
> > 
> > I believe that if mlx5 allows to create flows before calling to
> > rte_eth_dev_start() then the bonding would work as expected.
> >   
> There are other solutions like enabling the mlx5 pmd in isolate mode.
> This will require the bond device to just create the rule manually
> after start.

That is a kind of solution, thanks for this idea. But sorry, I consider
it more as a workaround. By the way, it does not solve the issue of
inserting thousands of flows atomically upon start of the port.

> 
> > >  
> > > >
> > > > I'm turning to you, Connor, as you have made the original
> > > > question on this. Is the patch I presented applicable?
> > > >
> > > > Martin  
> > > >>> >  
> > > >>> > > Does somebody know how other drivers behaves in this
> > > >>> > > situation? (We know and can check for Intel, there it
> > > >>> > > does not seem to be an issue.)
> > > >>> > >
> > > >>> > > By the way, the mlx5 behaviour opens a (probably short)
> > > >>> > > time window between starting of a port and configuation of
> > > >>> > > filtering flows. You may want to start the port with
> > > >>> > > thousands of flows that apply just when the port starts
> > > >>> > > (not after, that's late). This may introduce glitches in
> > > >>> > > filtering and measuring of traffic (well, it is a  
> > > >>> question how
> > > >>> serious issue could it be...).  
> > > >>> > >  
> > > >>> > Agree but this is always true nothing is done is zero time
> > > >>> > and even if it was the insertion is not in zero time,
> > > >>> > (assuming that even if the flows are stored by the PMD
> > > >>> > until start they still will not all be inserted in the same
> > > >>> > time)  
> > > >>>
> > > >>> Sorry, I probably do not understand well. Yes, creation of a
> > > >>> flow would never be zero time, agree. But if I create flows
> > > >>> before the port starts, than I do not have to care too much
> > > >>> about how long the creation takes. Because the card is
> > > >>> expected to be configured already before pushing the "start
> > > >>> button".
> > > >>>
> > > >>> Maybe, you assume that the created flows would first be cached
> > > >>> inside the PMD and when the port is finally started, it than
> > > >>> transparently write the queues to the HW. But this is not
> > > >>> what I was talking about, that's maybe even worse because you
> > > >>> are hiding such behaviour from users...
> > > >>>
> > > >>> (I do not know the exact mlx5 implementation details, so my
> > > >>> answer can miss something, sorry for that.)
> > > >>>  
> > > >>
> > > >> You are correct this is HW/PMD implementation issue, in case of
> > > >> Nvidia we must  
> > 
> > By "must" you probably mean "would have to"? Is that correct?
> >   
> Sorry for my English but I'm not sure what is the difference,

Google for it, otherwise this discussion becomes really very
complicated.

> In any case what I meant is that currently we will need to cache the
> flows and only apply them after start.

So, does it mean that you certainly *will* work on that?

> 
> > > >> cache the flows and only insert it after start.
> > > >> Since RTE flow is generic and HW is not generic sometimes the
> > > >> PMD needs to translate this difference and in this gray area
> > > >> there is a game between best API best performance
> > > >> understanding what the user  
> > 
> > Yes, as a former FPGA dev, I can understand your concerns. RTE Flow
> > is a difficult beast.
> >   
> +1 
> > Jan
> >   
> > > >> really wants. This is why the doc should be very clear on what
> > > >> is the expected. but this is also the reason why such document
> > > >> is very hard to agree on.  
> > >
> > > Do I understand correctly that net/mlx5 maintainers will follow
> > > up?  
> 
> Yes, I will follow up on this.
> 
> > >  
> > > >>
> > > >> Ori  
> > > >>> >  
> > > >>> > > This matters for the bonding case as well, doesn't it?.
> > > >>> > > It is not desirable to accidently omit a packet that was
> > > >>> > > received by primary ingress logic instead of being
> > > >>> > > redirected into the dedicated queue.
> > > >>> > >
> > > >>> > > Are there any chances that for mlx5 it would be possible
> > > >>> > > to insert flow rules before calling rte_eth_dev_start?
> > > >>> > > Anyway, the behaviour should be specified and documented
> > > >>> > > in DPDK more precisely to avoid such uncertainty in the
> > > >>> > > future. 
> > > >>> > I agree the documentation should be fixed.  
> > > >>>
> > > >>> +1  
> > >
> > > Cc Thomas and Ferruh since ethdev documentation should be
> > > clarified.
> > >
> > > It looks like there is no consensus if the patch is a right
> > > direction or wrong. For me it looks wrong taking all above
> > > arguments in to account (mainly necessity to be able to insert
> > > flows before pushing start button which enables the traffic if HW
> > > supports it).
> > >
> > > So, I'm applying first two patches and hold on this one.  

Andrew, I believe that it would be helpful to start some new thread
otherwise we would get lost here :). It seems that we will have few
more fixes for the bonding driver. Do you prefer an entirely new
patchset or v2 of this topic? Or any other advise how to proceed?

Jan

>
  
Thomas Monjalon July 15, 2021, 1:58 p.m. UTC | #13
14/07/2021 17:00, Jan Viktorin:
> > > On Tue, 13 Jul 2021 12:26:35 +0300
> > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> > > > >>> > > This matters for the bonding case as well, doesn't it?.
> > > > >>> > > It is not desirable to accidently omit a packet that was
> > > > >>> > > received by primary ingress logic instead of being
> > > > >>> > > redirected into the dedicated queue.
> > > > >>> > >
> > > > >>> > > Are there any chances that for mlx5 it would be possible
> > > > >>> > > to insert flow rules before calling rte_eth_dev_start?
> > > > >>> > > Anyway, the behaviour should be specified and documented
> > > > >>> > > in DPDK more precisely to avoid such uncertainty in the
> > > > >>> > > future. 
> > > > >>> > I agree the documentation should be fixed.  
> > > > >>>
> > > > >>> +1  
> > > >
> > > > Cc Thomas and Ferruh since ethdev documentation should be
> > > > clarified.
> > > >
> > > > It looks like there is no consensus if the patch is a right
> > > > direction or wrong. For me it looks wrong taking all above
> > > > arguments in to account (mainly necessity to be able to insert
> > > > flows before pushing start button which enables the traffic if HW
> > > > supports it).
> > > >
> > > > So, I'm applying first two patches and hold on this one.  
> 
> Andrew, I believe that it would be helpful to start some new thread
> otherwise we would get lost here :). It seems that we will have few
> more fixes for the bonding driver. Do you prefer an entirely new
> patchset or v2 of this topic? Or any other advise how to proceed?

This thread is about 3 things:
	- bonding issue
	- ethdev doc
	- mlx5 design
That's too much topics to address in one thread :)

You may restart the discussion with a doc update
if the stop/start requirement is not clear.
  
Ferruh Yigit Aug. 24, 2021, 1:18 p.m. UTC | #14
On 7/15/2021 2:58 PM, Thomas Monjalon wrote:
> 14/07/2021 17:00, Jan Viktorin:
>>>> On Tue, 13 Jul 2021 12:26:35 +0300
>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
>>>>>>>>>> This matters for the bonding case as well, doesn't it?.
>>>>>>>>>> It is not desirable to accidently omit a packet that was
>>>>>>>>>> received by primary ingress logic instead of being
>>>>>>>>>> redirected into the dedicated queue.
>>>>>>>>>>
>>>>>>>>>> Are there any chances that for mlx5 it would be possible
>>>>>>>>>> to insert flow rules before calling rte_eth_dev_start?
>>>>>>>>>> Anyway, the behaviour should be specified and documented
>>>>>>>>>> in DPDK more precisely to avoid such uncertainty in the
>>>>>>>>>> future. 
>>>>>>>>> I agree the documentation should be fixed.  
>>>>>>>>
>>>>>>>> +1  
>>>>>
>>>>> Cc Thomas and Ferruh since ethdev documentation should be
>>>>> clarified.
>>>>>
>>>>> It looks like there is no consensus if the patch is a right
>>>>> direction or wrong. For me it looks wrong taking all above
>>>>> arguments in to account (mainly necessity to be able to insert
>>>>> flows before pushing start button which enables the traffic if HW
>>>>> supports it).
>>>>>
>>>>> So, I'm applying first two patches and hold on this one.  
>>
>> Andrew, I believe that it would be helpful to start some new thread
>> otherwise we would get lost here :). It seems that we will have few
>> more fixes for the bonding driver. Do you prefer an entirely new
>> patchset or v2 of this topic? Or any other advise how to proceed?
> 
> This thread is about 3 things:
> 	- bonding issue
> 	- ethdev doc
> 	- mlx5 design
> That's too much topics to address in one thread :)
> 
> You may restart the discussion with a doc update
> if the stop/start requirement is not clear.
> 
> 

Is separate discussions created as follow up?
  
Jan Viktorin Aug. 26, 2021, 10:15 a.m. UTC | #15
On Tue, 24 Aug 2021 14:18:16 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 7/15/2021 2:58 PM, Thomas Monjalon wrote:
> > 14/07/2021 17:00, Jan Viktorin:  
> >>>> On Tue, 13 Jul 2021 12:26:35 +0300
> >>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:  
> >>>>>>>>>> This matters for the bonding case as well, doesn't it?.
> >>>>>>>>>> It is not desirable to accidently omit a packet that was
> >>>>>>>>>> received by primary ingress logic instead of being
> >>>>>>>>>> redirected into the dedicated queue.
> >>>>>>>>>>
> >>>>>>>>>> Are there any chances that for mlx5 it would be possible
> >>>>>>>>>> to insert flow rules before calling rte_eth_dev_start?
> >>>>>>>>>> Anyway, the behaviour should be specified and documented
> >>>>>>>>>> in DPDK more precisely to avoid such uncertainty in the
> >>>>>>>>>> future.   
> >>>>>>>>> I agree the documentation should be fixed.    
> >>>>>>>>
> >>>>>>>> +1    
> >>>>>
> >>>>> Cc Thomas and Ferruh since ethdev documentation should be
> >>>>> clarified.
> >>>>>
> >>>>> It looks like there is no consensus if the patch is a right
> >>>>> direction or wrong. For me it looks wrong taking all above
> >>>>> arguments in to account (mainly necessity to be able to insert
> >>>>> flows before pushing start button which enables the traffic if
> >>>>> HW supports it).
> >>>>>
> >>>>> So, I'm applying first two patches and hold on this one.    
> >>
> >> Andrew, I believe that it would be helpful to start some new thread
> >> otherwise we would get lost here :). It seems that we will have few
> >> more fixes for the bonding driver. Do you prefer an entirely new
> >> patchset or v2 of this topic? Or any other advise how to proceed?  
> > 
> > This thread is about 3 things:
> > 	- bonding issue
> > 	- ethdev doc
> > 	- mlx5 design
> > That's too much topics to address in one thread :)
> > 
> > You may restart the discussion with a doc update
> > if the stop/start requirement is not clear.
> > 
> >   
> 
> Is separate discussions created as follow up?
> 

Martin is recently unavailable. But he has already started a new
thread for _ethdev doc_ topic:

 [PATCH 0/4] doc: update RTE flow rule and bonding related info
 https://www.mail-archive.com/dev@dpdk.org/msg214517.html

to first clarify/document what is the current status and how to proceed.

Jan
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index a6755661c..fea3bc537 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1818,25 +1818,35 @@  slave_configure(struct rte_eth_dev *bonded_eth_dev,
 			rte_flow_destroy(slave_eth_dev->data->port_id,
 					internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
 					&flow_error);
+	}
 
+	/* Start device */
+	errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
+	if (errval != 0) {
+		RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
+				slave_eth_dev->data->port_id, errval);
+		return -1;
+	}
+
+	if (internals->mode == BONDING_MODE_8023AD &&
+			internals->mode4.dedicated_queues.enabled == 1) {
 		errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
 				slave_eth_dev->data->port_id);
 		if (errval != 0) {
 			RTE_BOND_LOG(ERR,
 				"bond_ethdev_8023ad_flow_set: port=%d, err (%d)",
 				slave_eth_dev->data->port_id, errval);
+
+			errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
+			if (errval < 0) {
+				RTE_BOND_LOG(ERR,
+					"rte_eth_dev_stop: port=%d, err (%d)",
+					slave_eth_dev->data->port_id, errval);
+			}
 			return errval;
 		}
 	}
 
-	/* Start device */
-	errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
-	if (errval != 0) {
-		RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
-				slave_eth_dev->data->port_id, errval);
-		return -1;
-	}
-
 	/* If RSS is enabled for bonding, synchronize RETA */
 	if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS) {
 		int i;