[v2,00/11] net/mlx5: flow insertion performance improvements

Message ID 20240229115157.201671-1-dsosnowski@nvidia.com (mailing list archive)

Message

Dariusz Sosnowski Feb. 29, 2024, 11:51 a.m. UTC
The goal of this patchset is to improve the throughput of flow insertion
and deletion in the mlx5 PMD when the HW Steering flow engine is used.

- Patch 1 - Use preallocated per-queue, per-actions template buffer
  for storing translated flow actions, instead of allocating and
  filling it on demand, on each flow operation.
- Patches 2-4 - Make resource index allocation optional. This allocation
  will be skipped when it is not required by the created template table.
- Patches 5-7 - Reduce memory footprint of the internal flow queue.
- Patch 8 - Remove indirection between flow job and flow itself,
  by using flow as an operation container.
- Patches 9-10 - Reduce memory footprint of flow struct by moving
  rarely used flow fields outside of the main flow struct.
  These fields will be accessed only when needed.
  Also remove unneeded `zmalloc` usage.
- Patch 11 - Remove unneeded device status check in flow create.
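
As an illustration of the idea behind Patch 1, the following is a minimal
sketch and not the code from the series. All names (`sketch_queue`,
`sketch_action_buf`, `SKETCH_MAX_ACTIONS` and the helper functions) are
hypothetical and do not correspond to actual mlx5 PMD structures. It assumes
one buffer per actions template per queue, allocated once at queue setup, so
that the per-flow path only looks up an existing slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the mlx5 PMD's actual structures. */
#define SKETCH_MAX_ACTIONS 16

struct sketch_rule_action {
	uint32_t type;
	uint64_t data;
};

/* One slot per actions template, holding the translated actions. */
struct sketch_action_buf {
	struct sketch_rule_action actions[SKETCH_MAX_ACTIONS];
};

struct sketch_queue {
	struct sketch_action_buf *bufs; /* Indexed by actions template. */
	size_t nb_templates;
};

/* Slow path: allocate all buffers once, when the queue is configured. */
static int
sketch_queue_init(struct sketch_queue *q, size_t nb_templates)
{
	q->bufs = calloc(nb_templates, sizeof(*q->bufs));
	if (q->bufs == NULL)
		return -1;
	q->nb_templates = nb_templates;
	return 0;
}

/* Fast path: reuse the preallocated slot; no allocation per flow operation. */
static struct sketch_rule_action *
sketch_queue_actions(struct sketch_queue *q, size_t template_idx)
{
	assert(template_idx < q->nb_templates);
	return q->bufs[template_idx].actions;
}

/* Release the buffers when the queue is torn down. */
static void
sketch_queue_fini(struct sketch_queue *q)
{
	free(q->bufs);
	q->bufs = NULL;
	q->nb_templates = 0;
}
```

In this sketch, a flow operation on a given queue would call
`sketch_queue_actions()` with its template index and write the translated
actions into the returned array, instead of calling an allocator each time.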

Taken together, these changes result in the following improvements
(all numbers are averaged Kflows/sec):

|              | Insertion  |   +%   | Deletion |   +%  |
|--------------|:----------:|:------:|:--------:|:-----:|
| baseline     |   6338.7   |        |  9739.6  |       |
| improvements |   6978.8   | +10.1% |  10432.4 | +7.1% |

The basic benchmark was run on a ConnectX-6 Dx (22.40.1000),
on a system with an Intel Xeon Platinum 8380 CPU.
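
For reference, the "+%" columns in the table above can be reproduced as the
relative gain over the baseline. The helper below is a small sketch;
`gain_percent` is a name chosen here for illustration and is not part of the
series or of any benchmark tool.

```c
#include <assert.h>

/* Relative gain of `improved` over `baseline`, in percent. */
static double
gain_percent(double baseline, double improved)
{
	assert(baseline > 0.0);
	return (improved - baseline) / baseline * 100.0;
}
```

Calling `gain_percent(6338.7, 6978.8)` gives roughly 10.1 for insertion, and
`gain_percent(9739.6, 10432.4)` gives roughly 7.1 for deletion, matching the
table.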

v2:

- Rebased.
- Applied Acked-by tags from previous version.

Bing Zhao (2):
  net/mlx5: skip the unneeded resource index allocation
  net/mlx5: remove unneeded device status checking

Dariusz Sosnowski (7):
  net/mlx5: allocate local DR rule action buffers
  net/mlx5: remove action params from job
  net/mlx5: remove flow pattern from job
  net/mlx5: remove updated flow from job
  net/mlx5: use flow as operation container
  net/mlx5: move rarely used flow fields outside
  net/mlx5: reuse flow fields

Erez Shitrit (2):
  net/mlx5/hws: add check for matcher rule update support
  net/mlx5/hws: add check if matcher contains complex rules

 drivers/net/mlx5/hws/mlx5dr.h         |  16 +
 drivers/net/mlx5/hws/mlx5dr_action.c  |   6 +
 drivers/net/mlx5/hws/mlx5dr_action.h  |   2 +
 drivers/net/mlx5/hws/mlx5dr_matcher.c |  29 +
 drivers/net/mlx5/mlx5.h               |  29 +-
 drivers/net/mlx5/mlx5_flow.h          | 128 ++++-
 drivers/net/mlx5/mlx5_flow_hw.c       | 794 ++++++++++++++++----------
 7 files changed, 666 insertions(+), 338 deletions(-)

--
2.39.2
  

Comments

Raslan Darawsheh March 3, 2024, 12:16 p.m. UTC | #1
Hi.

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Thursday, February 29, 2024 1:52 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Bing Zhao
> <bingz@nvidia.com>
> Subject: [PATCH v2 00/11] net/mlx5: flow insertion performance
> improvements
> 

Series applied to next-net-mlx.
Kindest regards,
Raslan Darawsheh