[0/4] net/mlx5: keep indirect actions across port restart

Message ID 20210727073121.895620-1-dkozlyuk@nvidia.com (mailing list archive)

Message

Dmitry Kozlyuk July 27, 2021, 7:31 a.m. UTC
  It was unspecified what happens to indirect actions when a port
is stopped, possibly reconfigured, and started again. MLX5 PMD,
the first one to use indirect actions, intended to keep them across
such a sequence, but the implementation was buggy. Patches 1-3 fix
the PMD behavior, patch 4 adds common specification with rationale.

Dmitry Kozlyuk (4):
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions across port restart
  ethdev: document indirect flow action life cycle

 doc/guides/prog_guide/rte_flow.rst |  10 +
 drivers/net/mlx5/linux/mlx5_os.c   |   5 -
 drivers/net/mlx5/mlx5_devx.c       | 204 +++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.c       | 292 ++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.h       |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 ++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +-------
 drivers/net/mlx5/mlx5_rx.h         |   4 +
 drivers/net/mlx5/mlx5_rxq.c        |  99 ++++++++--
 drivers/net/mlx5/mlx5_trigger.c    |  10 +
 lib/ethdev/rte_flow.h              |   4 +
 11 files changed, 680 insertions(+), 134 deletions(-)
  

Comments

Andrew Rybchenko July 28, 2021, 8:05 a.m. UTC | #1
On 7/27/21 10:31 AM, Dmitry Kozlyuk wrote:
> It was unspecified what happens to indirect actions when a port
> is stopped, possibly reconfigured, and started again. MLX5 PMD,
> the first one to use indirect actions, intended to keep them across
> such a sequence, but the implementation was buggy. Patches 1-3 fix
> the PMD behavior, patch 4 adds common specification with rationale.

I'm sorry, but it looks very inconsistent. If flow rules are not
preserved across restart, indirect actions should not be preserved
as well. We need very strong reasons to introduce the inconsistency.

If we finally accept it, I think it would be very useful to care
about PMDs which cannot preserve it in HW across restart from the
very beginning and save it in ethdev layer and restore on start
automatically (i.e. do not force all such PMDs to care about
the restore internally and basically duplicate the code).
  
Dmitry Kozlyuk July 28, 2021, 11:18 a.m. UTC | #2
Hi Andrew,

> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> On 7/27/21 10:31 AM, Dmitry Kozlyuk wrote:
> > It was unspecified what happens to indirect actions when a port is
> > stopped, possibly reconfigured, and started again. MLX5 PMD, the first
> > one to use indirect actions, intended to keep them across such a
> > sequence, but the implementation was buggy. Patches 1-3 fix the PMD
> > behavior, patch 4 adds common specification with rationale.
> 
> I'm sorry, but it looks very inconsistent. If flow rules are not preserved across
> restart, indirect actions should not be preserved as well. We need very strong
> reasons to introduce the inconsistency.

Indirect actions really don't need to behave like flow rules. They are just objects owned by the port and they can exist while it exists. Consider a counter: stopping and starting the port doesn't logically affect its state. MLX5 PMD destroys flow rules on port stop for internal reasons and documents this behavior, but ethdev API doesn't require it either.

> If we finally accept it, I think it would be very useful to care about PMDs which
> cannot preserve it in HW across restart from the very beginning and save it in
> ethdev layer and restore on start automatically (i.e. do not force all such PMDs
> to care about the restore internally and basically duplicate the code).

Or keeping indirect actions can be an advertised PMD capability.
Given Ori's comments to patch 4, I think the common spec needs more work.
For this patchset that fixes MLX5 we can have the behavior documented for PMD and not require it from all the drivers.
  
Ori Kam July 28, 2021, 12:07 p.m. UTC | #3
Hi Dmitry and Andrew,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> Sent: Wednesday, July 28, 2021 2:19 PM
> 
> Hi Andrew,
> 
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> On 7/27/21
> > 10:31 AM, Dmitry Kozlyuk wrote:
> > > It was unspecified what happens to indirect actions when a port is
> > > stopped, possibly reconfigured, and started again. MLX5 PMD, the
> > > first one to use indirect actions, intended to keep them across such
> > > a sequence, but the implementation was buggy. Patches 1-3 fix the
> > > PMD behavior, patch 4 adds common specification with rationale.
> >
> > I'm sorry, but it looks very inconsistent. If flow rules are not
> > preserved across restart, indirect actions should not be preserved as
> > well. We need very strong reasons to introduce the inconsistency.
> 
> Indirect actions really don't need to behave like flow rules. They are just
> objects owned by the port and they can exist while it exists. Consider a
> counter: stopping and starting the port doesn't logically affect its state. MLX5
> PMD destroys flow rules on port stop for internal reasons and documents
> this behavior, but ethdev API doesn't require it either.
> 
> > If we finally accept it, I think it would be very useful to care about
> > PMDs which cannot preserve it in HW across restart from the very
> > beginning and save it in ethdev layer and restore on start
> > automatically (i.e. do not force all such PMDs to care about the restore
> internally and basically duplicate the code).
> 
> Or keeping indirect actions can be an advertised PMD capability.
> Given Ori's comments to patch 4, I think the common spec needs more work.
> For this patchset that fixes MLX5 we can have the behavior documented for
> PMD and not require it from all the drivers.

This also affects whether flows can be stored or not (there was another thread about it).
I think we should have a device capability that says whether flows are preserved
and whether they can be created before start. The same goes for actions, but
what if some actions can be preserved and some cannot? For example, RSS
can't be preserved on some HW (or due to a configuration change), while others,
such as a counter, can.
I don't want to have a cap for each action; I think this info is best explained in each
driver's documentation.
Maybe we can have general flags, one for flows and one for actions, and each PMD will have
detailed documentation.


Best,
Ori
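
To illustrate the two-flag idea floated above, here is a minimal sketch of how an application might act on one capability bit for flow rules and one for indirect actions. The flag names and the `demo_` helpers are hypothetical and are not part of ethdev; the sketch only shows the decision the application would make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; illustrative names, not DPDK API. */
#define DEMO_CAPA_KEEP_FLOW_RULES       (UINT64_C(1) << 0)
#define DEMO_CAPA_KEEP_INDIRECT_ACTIONS (UINT64_C(1) << 1)

/* What the application has to redo after a stop/start cycle. */
struct demo_restart_plan {
	bool recreate_flow_rules;
	bool recreate_indirect_actions;
};

/* Derive the plan from the capability word a driver would advertise. */
static struct demo_restart_plan
demo_plan_restart(uint64_t dev_capa)
{
	struct demo_restart_plan plan = {
		.recreate_flow_rules =
			!(dev_capa & DEMO_CAPA_KEEP_FLOW_RULES),
		.recreate_indirect_actions =
			!(dev_capa & DEMO_CAPA_KEEP_INDIRECT_ACTIONS),
	};

	/* If nothing is advertised, nothing may be assumed to survive. */
	assert(dev_capa != 0 ||
	       (plan.recreate_flow_rules && plan.recreate_indirect_actions));
	return plan;
}
```

Two flags cannot express the per-action differences raised above (RSS versus counter); under this scheme those details would still live in each driver's documentation.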
  
Andrew Rybchenko July 28, 2021, 12:26 p.m. UTC | #4
On 7/28/21 2:18 PM, Dmitry Kozlyuk wrote:
> Hi Andrew,
> 
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> On 7/27/21 10:31 AM, Dmitry Kozlyuk wrote:
>>> It was unspecified what happens to indirect actions when a port is
>>> stopped, possibly reconfigured, and started again. MLX5 PMD, the first
>>> one to use indirect actions, intended to keep them across such a
>>> sequence, but the implementation was buggy. Patches 1-3 fix the PMD
>>> behavior, patch 4 adds common specification with rationale.
>>
>> I'm sorry, but it looks very inconsistent. If flow rules are not preserved across
>> restart, indirect actions should not be preserved as well. We need very strong
>> reasons to introduce the inconsistency.
> 
> Indirect actions really don't need to behave like flow rules. They are just objects owned by the port and they can exist while it exists. Consider a counter: stopping and starting the port doesn't logically affect its state. MLX5 PMD destroys flow rules on port stop for internal reasons and documents this behavior, but ethdev API doesn't require it either.

It all sounds bad. All these gray areas just make it hard for DPDK
applications to switch from one HW to another.
No rule should be motivated by PMD internal reasons.
We should not adjust ethdev rules to fit some PMD's behaviour.
ethdev rules should be motivated by common sense and convenience from
the application's point of view.

For example, it is strange to preserve an indirect RSS action with queues
specified across device reconfiguration when the queue count may change.
I'd say that reconfiguration must drop all indirect actions.
However, just stop/start could preserve both indirect actions and flow
rules since it could be more convenient from the application's point of view.
If the application really wants to remove all flow rules, it can call
rte_flow_flush().
The strong reason to flush indirect actions and flow rules across
restart is a possible failure to restore actions or rules on start.
However, maybe it is sufficient to document that start should really
fail if it can't restore everything, and that the application should retry
after rte_flow_flush(), taking it into account.

>> If we finally accept it, I think it would be very useful to care about PMDs which
>> cannot preserve it in HW across restart from the very beginning and save it in
>> ethdev layer and restore on start automatically (i.e. do not force all such PMDs
>> to care about the restore internally and basically duplicate the code).
> 
> Or keeping indirect actions can be an advertised PMD capability.
> Given Ori's comments to patch 4, I think the common spec needs more work.
> For this patchset that fixes MLX5 we can have the behavior documented for PMD and not require it from all the drivers.

Are you going to drop 4th patch?

In general documenting PMD behaviour specifics in its documentation is
a wrong direction since it does not help DPDK applications to be
portable across different HW.
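
The "start fails if it cannot restore everything, then retry after rte_flow_flush()" idea from this message can be sketched from the application's side. This is a toy model under stated assumptions, not DPDK code: `struct demo_port`, `restore_limit`, and every `demo_` function are hypothetical stand-ins, with `demo_port_start()` playing the role of the port start call and `demo_flow_flush()` the role of rte_flow_flush().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a port; not a DPDK structure. */
struct demo_port {
	unsigned int nb_rules;      /* rules the port was asked to keep */
	unsigned int restore_limit; /* rules the device can restore (assumed) */
	bool started;
};

/* Models a start that fails when it cannot restore everything. */
static int
demo_port_start(struct demo_port *port)
{
	assert(port != NULL);
	if (port->nb_rules > port->restore_limit)
		return -EIO;
	port->started = true;
	return 0;
}

/* Models a flush: every rule on the port is dropped. */
static void
demo_flow_flush(struct demo_port *port)
{
	assert(port != NULL);
	port->nb_rules = 0;
}

/*
 * Application policy: try to start; on failure, flush and retry once.
 * *flushed tells the caller whether it must re-create its rules.
 */
static int
demo_start_with_fallback(struct demo_port *port, bool *flushed)
{
	int ret = demo_port_start(port);

	*flushed = false;
	if (ret == 0)
		return 0;
	demo_flow_flush(port);
	*flushed = true;
	return demo_port_start(port);
}
```

The point of the `flushed` output is that the application learns explicitly when its rules are gone, instead of having to know each driver's behaviour in advance.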
  
Dmitry Kozlyuk July 28, 2021, 2:08 p.m. UTC | #5
> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: July 28, 2021 15:27
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> Cc: David Marchand <david.marchand@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port
> restart
> 
> On 7/28/21 2:18 PM, Dmitry Kozlyuk wrote:
> > Hi Andrew,
> >
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> On 7/27/21
> >> 10:31 AM, Dmitry Kozlyuk wrote:
> >>> It was unspecified what happens to indirect actions when a port is
> >>> stopped, possibly reconfigured, and started again. MLX5 PMD, the
> >>> first one to use indirect actions, intended to keep them across such
> >>> a sequence, but the implementation was buggy. Patches 1-3 fix the
> >>> PMD behavior, patch 4 adds common specification with rationale.
> >>
> >> I'm sorry, but it looks very inconsistent. If flow rules are not
> >> preserved across restart, indirect actions should not be preserved as
> >> well. We need very strong reasons to introduce the inconsistency.
> >
> > Indirect actions really don't need to behave like flow rules. They are just
> objects owned by the port and they can exist while it exists. Consider a counter:
> stopping and starting the port doesn't logically affect its state. MLX5 PMD
> destroys flow rules on port stop for internal reasons and documents this
> behavior, but ethdev API doesn't require it either.
> 
> It all sounds bad. All these gray areas just make it hard for DPDK applications to
> switch from one HW to another.
> Any rules must not be motivated because of some PMD internal reasons.
> We should not adjust ethdev rules to fit some PMD behaviour.
> ethdev rules should be motivated by common sense and convenience from
> applications point of view.

That is what this patchset is trying to do.
The current specification is unclear: the application doesn't know
whether it should destroy and recreate indirect actions or not.
MLX5 PMD is only mentioned above because it's the only one implementing
the indirect action API, but it's not an attempt to tailor the API to it, quite the opposite.

> For example, it is strange to preserve indirect RSS action with queues specified
> across device reconfiguration when queues count may change.
> I'd say that reconfiguration must drop all indirect actions.

I don't like it because 1) it is implicit, 2) it may be unnecessary even for RSS, and it's only one example of an indirect action.

> However, just stop/start could preserve both indirect actions and flow rules since
> it could be more convenient from application point of view.

For many cases I agree, but not for all.
What if an application creates numerous flows from its data path?
They are transient by nature, but PMD will have to save them all
at the cost of RAM and CPU but without benefit to anyone.
OTOH, application always controls indirect actions it creates,
because it is going to reuse or query them.
Therefore, it is both logical and convenient to preserve them.

> If application really wants to remove all flow rules, it can call rte_flow_flush().
> The strong reason to flush indirect actions and flow rules across restart is
> possible actions or rules restore failure on start.
> However, may be it is sufficient to document that start should really fail, if it
> can't restore everything and application should retry after rte_flow_flush()
> taking it into account.
> 
> >> If we finally accept it, I think it would be very useful to care
> >> about PMDs which cannot preserve it in HW across restart from the
> >> very beginning and save it in ethdev layer and restore on start
> >> automatically (i.e. do not force all such PMDs to care about the restore
> internally and basically duplicate the code).
> >
> > Or keeping indirect actions can be an advertised PMD capability.
> > Given Ori's comments to patch 4, I think the common spec needs more work.
> > For this patchset that fixes MLX5 we can have the behavior documented for
> PMD and not require it from all the drivers.
> 
> Are you going to drop 4th patch?

Yes.

> In general documenting PMD behaviour specifics in its documentation is a wrong
> direction since it does not help DPDK applications to be portable across different
> HW.

I agree. But currently there is a clear resource leak in MLX5 PMD, that can be solved either by destroying indirect actions on port stop or by keeping them (this is what PMD maintainers prefer). The leak should be fixed and what happens to indirect actions must be clearly documented. Ideally the fix should be aligned with common ethdev API, but if you and Ori think its design is wrong, then at least behavior can be described in PMD docs and later fixed or promoted to API.
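
The two ways of fixing the leak named above (destroy indirect actions on port stop, or keep them) can be contrasted with a toy model. This is only a sketch of the observable semantics, assuming a fixed-size slot table; it says nothing about how MLX5 PMD is implemented, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ACTIONS 8

/* Hypothetical life-cycle policies; neither name is DPDK API. */
enum demo_policy {
	DEMO_KEEP_ON_STOP,    /* handles stay valid across stop/start */
	DEMO_DESTROY_ON_STOP, /* stop releases every indirect action */
};

/* Toy port: a slot table standing in for driver-owned action objects. */
struct demo_port {
	enum demo_policy policy;
	bool in_use[DEMO_MAX_ACTIONS];
};

/* Allocate a slot; the slot index plays the role of an action handle. */
static int
demo_action_create(struct demo_port *port)
{
	assert(port != NULL);
	for (int i = 0; i < DEMO_MAX_ACTIONS; i++) {
		if (!port->in_use[i]) {
			port->in_use[i] = true;
			return i;
		}
	}
	return -1; /* table full */
}

/* Stop the port, applying the configured policy to indirect actions. */
static void
demo_port_stop(struct demo_port *port)
{
	assert(port != NULL);
	if (port->policy != DEMO_DESTROY_ON_STOP)
		return;
	for (int i = 0; i < DEMO_MAX_ACTIONS; i++)
		port->in_use[i] = false;
}

/* True if the handle still refers to a live action. */
static bool
demo_action_valid(const struct demo_port *port, int handle)
{
	return handle >= 0 && handle < DEMO_MAX_ACTIONS &&
	       port->in_use[handle];
}

/* Number of live actions currently held by the port. */
static size_t
demo_actions_in_use(const struct demo_port *port)
{
	size_t n = 0;

	for (int i = 0; i < DEMO_MAX_ACTIONS; i++)
		n += port->in_use[i] ? 1 : 0;
	return n;
}
```

Either policy is consistent as long as it is documented: under the first the application keeps using its handles after restart, under the second it must treat them as invalid and create the actions again.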
  
Ori Kam July 28, 2021, 5:07 p.m. UTC | #6
Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Wednesday, July 28, 2021 5:08 PM
> 
> > -----Original Message-----
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Sent: July 28, 2021 15:27
> > To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> > Cc: David Marchand <david.marchand@redhat.com>
> > Subject: Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions
> > across port restart
> >
> > On 7/28/21 2:18 PM, Dmitry Kozlyuk wrote:
> > > Hi Andrew,
> > >
> > >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> On
> 7/27/21
> > >> 10:31 AM, Dmitry Kozlyuk wrote:
> > >>> It was unspecified what happens to indirect actions when a port is
> > >>> stopped, possibly reconfigured, and started again. MLX5 PMD, the
> > >>> first one to use indirect actions, intended to keep them across
> > >>> such a sequence, but the implementation was buggy. Patches 1-3 fix
> > >>> the PMD behavior, patch 4 adds common specification with rationale.
> > >>
> > >> I'm sorry, but it looks very inconsistent. If flow rules are not
> > >> preserved across restart, indirect actions should not be preserved
> > >> as well. We need very strong reasons to introduce the inconsistency.
> > >
> > > Indirect actions really don't need to behave like flow rules. They
> > > are just
> > objects owned by the port and they can exist while it exists. Consider a
> counter:
> > stopping and starting the port doesn't logically affect its state.
> > MLX5 PMD destroys flow rules on port stop for internal reasons and
> > documents this behavior, but ethdev API doesn't require it either.
> >
> > It all sounds bad. All these gray areas just make it hard for DPDK
> > applications to switch from one HW to another.
> > Any rules must not be motivated because of some PMD internal reasons.
> > We should not adjust ethdev rules to fit some PMD behaviour.
> > ethdev rules should be motivated by common sense and convenience
> from
> > applications point of view.
> 
> That is what this patchset is trying to do.
> Current specification is unclear, application doesn't know if it should destroy
> and recreate indirect actions or not.
> MLX5 PMD is only mentioned above because it's the only one implementing
> indirect action API, but it's not an attempt to tailor API to it, quite the
> opposite.
> 
I agree gray areas are bad, but as more and more applications use more and more
flows, flow insertion becomes part of the datapath, and optimization of actions and rules
becomes even more critical. (To address this we are going to introduce an additional API
that will enable async insertion and allocation of resources before flow insertion.)
Since each HW implements actions and flows differently, forcing the exact same
behavior will result in performance degradation.


> > For example, it is strange to preserve indirect RSS action with queues
> > specified across device reconfiguration when queues count may change.
> > I'd say that reconfiguration must drop all indirect actions.
> 
> I don't like it because 1) it is implicit, 2) it may be unnecessary even for RSS,
> and it's only one example of an indirect action.
> 
> > However, just stop/start could preserve both indirect actions and flow
> > rules since it could be more convenient from application point of view.
> 
> For many cases I agree, but not for all.
> What if an application creates numerous flows from its data path?
> They are transient by nature, but PMD will have to save them all at the cost
> of RAM and CPU but without benefit to anyone.
> OTOH, application always controls indirect actions it creates, because it is
> going to reuse or query them.
> Therefore, it is both logical and convenient to preserve them.
> 
> > If application really wants to remove all flow rules, it can call
> rte_flow_flush().
> > The strong reason to flush indirect actions and flow rules across
> > restart is possible actions or rules restore failure on start.
> > However, may be it is sufficient to document that start should really
> > fail, if it can't restore everything and application should retry
> > after rte_flow_flush() taking it into account.
> >
> > >> If we finally accept it, I think it would be very useful to care
> > >> about PMDs which cannot preserve it in HW across restart from the
> > >> very beginning and save it in ethdev layer and restore on start
> > >> automatically (i.e. do not force all such PMDs to care about the
> > >> restore
> > internally and basically duplicate the code).
> > >
> > > Or keeping indirect actions can be an advertised PMD capability.
> > > Given Ori's comments to patch 4, I think the common spec needs more
> work.
> > > For this patchset that fixes MLX5 we can have the behavior
> > > documented for
> > PMD and not require it from all the drivers.
> >
> > Are you going to drop 4th patch?
> 
> Yes.
> 
> > In general documenting PMD behaviour specifics in its documentation is
> > a wrong direction since it does not help DPDK applications to be
> > portable across different HW.
> 
> I agree. But currently there is a clear resource leak in MLX5 PMD, that can be
> solved either by destroying indirect actions on port stop or by keeping them
> (this is what PMD maintainers prefer). The leak should be fixed and what
> happens to indirect actions must be clearly documented. Ideally the fix
> should be aligned with common ethdev API, but if you and Ori think its
> design is wrong, then at least behavior can be described in PMD docs and
> later fixed or promoted to API.

I think the application should be aware of the different possibilities between PMDs.
If possible, it is best that all PMDs act the same, but if there is a HW issue I think
different behavior is better than not supporting it at all.
In any case the doc should state the minimum requirement; if HW can support better than
that, it can do so.

Best,
Ori