[00/11] net/mlx5: flow insertion performance improvements

Message ID 20240228170046.176600-1-dsosnowski@nvidia.com

Dariusz Sosnowski Feb. 28, 2024, 5 p.m. UTC
The goal of this patchset is to improve the throughput of flow insertion
and deletion in the mlx5 PMD when the HW Steering flow engine is used.

- Patch 1 - Use a preallocated per-queue, per-actions-template buffer
  for storing translated flow actions, instead of allocating and
  filling one on demand for each flow operation.
- Patches 2-4 - Make resource index allocation optional. This allocation
  is skipped when the template table being created does not require it.
- Patches 5-7 - Reduce memory footprint of the internal flow queue.
- Patch 8 - Remove the indirection between a flow job and the flow itself
  by using the flow as the operation container.
- Patches 9-10 - Reduce the memory footprint of the flow struct by moving
  rarely used flow fields outside of the main flow struct.
  These fields are accessed only when needed.
  Also remove unneeded `zmalloc` usage.
- Patch 11 - Remove the unneeded device status check in flow create.
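The list above describes data-layout changes without showing them. As a rough illustration only, the following C sketch shows the general pattern under stated assumptions: every name in it (`struct flow`, `struct flow_aux`, `struct queue_ctx`, `flow_create()`, `flow_aux_get()`, `MAX_ACTIONS`, and so on) is hypothetical and does not reflect the actual mlx5 PMD structures. It shows a per-queue, per-actions-template action buffer allocated once at setup, a resource index taken only when the table needs one, a flow struct that doubles as the operation container, and rarely used fields moved into a lazily allocated side struct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound on actions in one actions template. */
#define MAX_ACTIONS 16

/* Hypothetical operation codes stored directly in the flow. */
enum flow_op { FLOW_OP_NONE = 0, FLOW_OP_CREATE, FLOW_OP_DESTROY };

/* Stand-in for one translated (DR rule) action. */
struct rule_action {
	uint32_t type;
	uint64_t conf;
};

/* Rarely used fields, kept out of the hot flow struct. */
struct flow_aux {
	uint32_t orig_rule_idx;
	uint32_t mtr_id;
};

/* Hot flow struct; it also serves as the operation container,
 * so no separate job object has to point back at it. */
struct flow {
	uint32_t res_idx;     /* 0 when the table needs no resource index */
	enum flow_op op;      /* operation pending on the queue */
	void *user_data;      /* handed back to the application on completion */
	struct flow_aux *aux; /* NULL until a rarely used field is needed */
};

/* Per-queue context holding one preallocated action buffer
 * per actions template, sized once at setup time. */
struct queue_ctx {
	struct rule_action *rule_acts; /* n_templates * MAX_ACTIONS entries */
	uint32_t n_templates;
	uint32_t next_idx;             /* trivial index allocator */
};

int queue_init(struct queue_ctx *q, uint32_t n_templates)
{
	q->rule_acts = calloc((size_t)n_templates * MAX_ACTIONS,
			      sizeof(*q->rule_acts));
	if (q->rule_acts == NULL)
		return -1;
	q->n_templates = n_templates;
	q->next_idx = 0;
	return 0;
}

void queue_fini(struct queue_ctx *q)
{
	free(q->rule_acts);
	q->rule_acts = NULL;
}

/* Scratch buffer for actions template at_idx on this queue. */
struct rule_action *queue_action_buf(struct queue_ctx *q, uint32_t at_idx)
{
	return &q->rule_acts[(size_t)at_idx * MAX_ACTIONS];
}

/* Every hot field is written explicitly, so the flow memory itself
 * does not have to be zero-allocated. */
int flow_create(struct queue_ctx *q, struct flow *f, uint32_t at_idx,
		bool table_needs_index, void *user_data)
{
	struct rule_action *acts;

	if (at_idx >= q->n_templates)
		return -1;
	acts = queue_action_buf(q, at_idx);
	/* Placeholder "translation" into the reused buffer: no malloc here. */
	acts[0].type = 1;
	acts[0].conf = at_idx;
	/* Take a resource index only when the template table needs one. */
	f->res_idx = table_needs_index ? ++q->next_idx : 0;
	f->op = FLOW_OP_CREATE;
	f->user_data = user_data;
	f->aux = NULL;
	return 0;
}

/* Allocate the auxiliary part lazily, on first access. */
struct flow_aux *flow_aux_get(struct flow *f)
{
	if (f->aux == NULL)
		f->aux = calloc(1, sizeof(*f->aux));
	return f->aux;
}

void flow_release(struct flow *f)
{
	free(f->aux);
	f->aux = NULL;
	f->op = FLOW_OP_NONE;
}
```

In this sketch, a caller would run `queue_init()` once per queue at configuration time and then call `flow_create()` on the fast path; only flows that actually touch a rarely used field pay for the `flow_aux_get()` allocation.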

In general, all of these changes combined result in the following
improvements (all numbers are averages, in Kflows/sec):

|              | Insertion  |   +%   | Deletion |   +%  |
|--------------|:----------:|:------:|:--------:|:-----:|
| baseline     |   6338.7   |        |  9739.6  |       |
| improvements |   6978.8   | +10.1% |  10432.4 | +7.1% |
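The `+%` columns are the relative change against the baseline, i.e. (improved - baseline) / baseline × 100. A minimal C helper that reproduces them (the function name `pct_gain` is illustrative only):

```c
/* Relative change of a measurement against its baseline, in percent. */
double pct_gain(double baseline, double improved)
{
	return (improved - baseline) / baseline * 100.0;
}
```

For example, `pct_gain(6338.7, 6978.8)` evaluates to about 10.10, matching the +10.1% shown for insertion, and `pct_gain(9739.6, 10432.4)` gives about 7.11 for deletion.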

The basic benchmark was run on a ConnectX-6 Dx (firmware 22.40.1000)
on a system with an Intel Xeon Platinum 8380 CPU.

Bing Zhao (2):
  net/mlx5: skip the unneeded resource index allocation
  net/mlx5: remove unneeded device status checking

Dariusz Sosnowski (7):
  net/mlx5: allocate local DR rule action buffers
  net/mlx5: remove action params from job
  net/mlx5: remove flow pattern from job
  net/mlx5: remove updated flow from job
  net/mlx5: use flow as operation container
  net/mlx5: move rarely used flow fields outside
  net/mlx5: reuse flow fields

Erez Shitrit (2):
  net/mlx5/hws: add check for matcher rule update support
  net/mlx5/hws: add check if matcher contains complex rules

 drivers/net/mlx5/hws/mlx5dr.h         |  16 +
 drivers/net/mlx5/hws/mlx5dr_action.c  |   6 +
 drivers/net/mlx5/hws/mlx5dr_action.h  |   2 +
 drivers/net/mlx5/hws/mlx5dr_matcher.c |  29 +
 drivers/net/mlx5/mlx5.h               |  29 +-
 drivers/net/mlx5/mlx5_flow.h          | 128 ++++-
 drivers/net/mlx5/mlx5_flow_hw.c       | 794 ++++++++++++++++----------
 7 files changed, 666 insertions(+), 338 deletions(-)

--
2.39.2
  

Comments

Ori Kam Feb. 29, 2024, 8:52 a.m. UTC
Hi Dariusz,

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Wednesday, February 28, 2024 7:01 PM
> 

Series-acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori