[v3,00/14] acl: introduce AVX512 classify methods

Message ID	20201005184526.7465-1-konstantin.ananyev@intel.com (mailing list archive)
Headers	IronPort-SDR: oolvSFpkH05q/7HftuCfBWV8U2BndxgKKVARwN2Fp6MQI6C1cxw6ztXd6rMYjBFvlFDc4dmEeU bzhpzsc/MCYQ== IronPort-SDR: qj8g4wDgpWj02vW8V6iQTAU2oHIVlwArfp17wr4cIbHqweYMWwkNq/vdQ5acELN+jaGQvofQ5s hkbtvu3rrv7g== From: Konstantin Ananyev <konstantin.ananyev@intel.com> To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev <konstantin.ananyev@intel.com> Date: Mon, 5 Oct 2020 19:45:12 +0100 Message-Id: <20201005184526.7465-1-konstantin.ananyev@intel.com> In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200915165025.543-1-konstantin.ananyev@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Subject: [dpdk-dev] [PATCH v3 00/14] acl: introduce AVX512 classify methods Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>
Series	acl: introduce AVX512 classify methods \| [v3,00/14] acl: introduce AVX512 classify methods [v3,01/14] acl: fix x86 build when compiler doesn't support AVX2 [v3,02/14] doc: fix missing classify methods in ACL guide [v3,03/14] acl: remove of unused enum value [v3,04/14] acl: remove library constructor [v3,05/14] app/acl: few small improvements [v3,06/14] test/acl: expand classify test coverage [v3,07/14] acl: add infrastructure to support AVX512 classify [v3,08/14] acl: introduce 256-bit width AVX512 classify implementation [v3,09/14] acl: update default classify algorithm selection [v3,10/14] acl: introduce 512-bit width AVX512 classify implementation [v3,11/14] acl: for AVX512 classify use 4B load whenever possible [v3,12/14] acl: deduplicate AVX512 code paths [v3,13/14] test/acl: add AVX512 classify support [v3,14/14] app/acl: add AVX512 classify support

Message ID

20201005184526.7465-1-konstantin.ananyev@intel.com (mailing list archive)

Headers

IronPort-SDR: 
 oolvSFpkH05q/7HftuCfBWV8U2BndxgKKVARwN2Fp6MQI6C1cxw6ztXd6rMYjBFvlFDc4dmEeU
 bzhpzsc/MCYQ==
IronPort-SDR: 
 qj8g4wDgpWj02vW8V6iQTAU2oHIVlwArfp17wr4cIbHqweYMWwkNq/vdQ5acELN+jaGQvofQ5s
 hkbtvu3rrv7g==
From: Konstantin Ananyev <konstantin.ananyev@intel.com>
To: dev@dpdk.org
Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com,
 Konstantin Ananyev <konstantin.ananyev@intel.com>
Date: Mon,  5 Oct 2020 19:45:12 +0100
Message-Id: <20201005184526.7465-1-konstantin.ananyev@intel.com>
In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com>
References: <20200915165025.543-1-konstantin.ananyev@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [dpdk-dev] [PATCH v3 00/14] acl: introduce AVX512 classify methods
Precedence: list
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Series

acl: introduce AVX512 classify methods |

Message

Ananyev, Konstantin Oct. 5, 2020, 6:45 p.m. UTC

  These patch series introduce support of AVX512 specific classify
implementation for ACL library.
It adds two new algorithms:
 - RTE_ACL_CLASSIFY_AVX512X16 - can process up to 16 flows in parallel.
   It uses 256-bit width instructions/registers only
   (to avoid frequency level change).
   On my SKX box test-acl shows ~15-30% improvement
   (depending on rule-set and input burst size)
   when switching from AVX2 to AVX512X16 classify algorithms.
 - RTE_ACL_CLASSIFY_AVX512X32 - can process up to 32 flows in parallel.
   It uses 512-bit width instructions/registers and provides higher
   performance then AVX512X16, but can cause frequency level change.
   On my SKX box test-acl shows ~50-70% improvement
   (depending on rule-set and input burst size)
   when switching from AVX2 to AVX512X32 classify algorithms.
   ICX and CLX testing showed similar level of speedup.

Current AVX512 classify implementation is only supported on x86_64.
Note that this series introduce a formal ABI incompatibility
with previous versions of ACL library.

v2 -> v3:
  Fix checkpatch warnings
  Split AVX512 algorithm into two and deduplicate common code
v1 -> v2:
  Deduplicated 8/16 code paths as much as possible
  Updated default algorithm selection
    Removed library constructor to make it easier integrate with
    https://patches.dpdk.org/project/dpdk/list/?series=11831
  Updated docs

These patch series depends on:
https://patches.dpdk.org/patch/79310/
to be applied first.

Konstantin Ananyev (14):
  acl: fix x86 build when compiler doesn't support AVX2
  doc: fix missing classify methods in ACL guide
  acl: remove of unused enum value
  acl: remove library constructor
  app/acl: few small improvements
  test/acl: expand classify test coverage
  acl: add infrastructure to support AVX512 classify
  acl: introduce 256-bit width AVX512 classify implementation
  acl: update default classify algorithm selection
  acl: introduce 512-bit width AVX512 classify implementation
  acl: for AVX512 classify use 4B load whenever possible
  acl: deduplicate AVX512 code paths
  test/acl: add AVX512 classify support
  app/acl: add AVX512 classify support

 app/test-acl/main.c                           |  23 +-
 app/test/test_acl.c                           | 105 ++--
 config/x86/meson.build                        |   3 +-
 .../prog_guide/packet_classif_access_ctrl.rst |  20 +
 doc/guides/rel_notes/deprecation.rst          |   4 -
 doc/guides/rel_notes/release_20_11.rst        |  12 +
 lib/librte_acl/acl.h                          |  16 +
 lib/librte_acl/acl_bld.c                      |  34 ++
 lib/librte_acl/acl_gen.c                      |   2 +-
 lib/librte_acl/acl_run_avx512.c               | 164 ++++++
 lib/librte_acl/acl_run_avx512_common.h        | 477 ++++++++++++++++++
 lib/librte_acl/acl_run_avx512x16.h            | 341 +++++++++++++
 lib/librte_acl/acl_run_avx512x8.h             | 253 ++++++++++
 lib/librte_acl/meson.build                    |  39 ++
 lib/librte_acl/rte_acl.c                      | 212 ++++++--
 lib/librte_acl/rte_acl.h                      |   4 +-
 16 files changed, 1609 insertions(+), 100 deletions(-)
 create mode 100644 lib/librte_acl/acl_run_avx512.c
 create mode 100644 lib/librte_acl/acl_run_avx512_common.h
 create mode 100644 lib/librte_acl/acl_run_avx512x16.h
 create mode 100644 lib/librte_acl/acl_run_avx512x8.h

Comments

David Marchand Oct. 6, 2020, 3:05 p.m. UTC | #1

On Mon, Oct 5, 2020 at 9:44 PM Konstantin Ananyev
<konstantin.ananyev@intel.com> wrote:
>
> These patch series introduce support of AVX512 specific classify
> implementation for ACL library.
> It adds two new algorithms:
>  - RTE_ACL_CLASSIFY_AVX512X16 - can process up to 16 flows in parallel.
>    It uses 256-bit width instructions/registers only
>    (to avoid frequency level change).
>    On my SKX box test-acl shows ~15-30% improvement
>    (depending on rule-set and input burst size)
>    when switching from AVX2 to AVX512X16 classify algorithms.
>  - RTE_ACL_CLASSIFY_AVX512X32 - can process up to 32 flows in parallel.
>    It uses 512-bit width instructions/registers and provides higher
>    performance then AVX512X16, but can cause frequency level change.
>    On my SKX box test-acl shows ~50-70% improvement
>    (depending on rule-set and input burst size)
>    when switching from AVX2 to AVX512X32 classify algorithms.
>    ICX and CLX testing showed similar level of speedup.
>
> Current AVX512 classify implementation is only supported on x86_64.
> Note that this series introduce a formal ABI incompatibility

The only API change I can see is in rte_acl_classify_alg() new error
code but I don't think we need an announcement for this.
As for ABI, we are breaking it in this release, so I see no pb.


> with previous versions of ACL library.
>
> v2 -> v3:
>   Fix checkpatch warnings
>   Split AVX512 algorithm into two and deduplicate common code

Patch 7 still references a RTE_MACHINE_CPUFLAG flag.
Can you rework now that those flags have been dropped?

Thanks.

Ananyev, Konstantin Oct. 6, 2020, 4:07 p.m. UTC | #2

> 
> On Mon, Oct 5, 2020 at 9:44 PM Konstantin Ananyev
> <konstantin.ananyev@intel.com> wrote:
> >
> > These patch series introduce support of AVX512 specific classify
> > implementation for ACL library.
> > It adds two new algorithms:
> >  - RTE_ACL_CLASSIFY_AVX512X16 - can process up to 16 flows in parallel.
> >    It uses 256-bit width instructions/registers only
> >    (to avoid frequency level change).
> >    On my SKX box test-acl shows ~15-30% improvement
> >    (depending on rule-set and input burst size)
> >    when switching from AVX2 to AVX512X16 classify algorithms.
> >  - RTE_ACL_CLASSIFY_AVX512X32 - can process up to 32 flows in parallel.
> >    It uses 512-bit width instructions/registers and provides higher
> >    performance then AVX512X16, but can cause frequency level change.
> >    On my SKX box test-acl shows ~50-70% improvement
> >    (depending on rule-set and input burst size)
> >    when switching from AVX2 to AVX512X32 classify algorithms.
> >    ICX and CLX testing showed similar level of speedup.
> >
> > Current AVX512 classify implementation is only supported on x86_64.
> > Note that this series introduce a formal ABI incompatibility
> 
> The only API change I can see is in rte_acl_classify_alg() new error
> code but I don't think we need an announcement for this.
> As for ABI, we are breaking it in this release, so I see no pb.

Cool, I just wanted to underline that patch #3:
https://patches.dpdk.org/patch/79786/
is a formal ABI breakage.

> 
> 
> > with previous versions of ACL library.
> >
> > v2 -> v3:
> >   Fix checkpatch warnings
> >   Split AVX512 algorithm into two and deduplicate common code
> 
> Patch 7 still references a RTE_MACHINE_CPUFLAG flag.
> Can you rework now that those flags have been dropped?
> 

Should be fixed in v4:
https://patches.dpdk.org/project/dpdk/list/?series=12721

One more thing to mention - this series has a dependency on Vladimir's patch:
https://patches.dpdk.org/patch/79310/ ("eal/x86: introduce AVX 512-bit type"),
so CI/travis would still report an error.

Thanks
Konstantin

David Marchand Oct. 8, 2020, 10:49 a.m. UTC | #3

On Tue, Oct 6, 2020 at 6:13 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
> > > v2 -> v3:
> > >   Fix checkpatch warnings
> > >   Split AVX512 algorithm into two and deduplicate common code
> >
> > Patch 7 still references a RTE_MACHINE_CPUFLAG flag.
> > Can you rework now that those flags have been dropped?
> >
>
> Should be fixed in v4:
> https://patches.dpdk.org/project/dpdk/list/?series=12721

Thanks.


>
> One more thing to mention - this series has a dependency on Vladimir's patch:
> https://patches.dpdk.org/patch/79310/ ("eal/x86: introduce AVX 512-bit type"),
> so CI/travis would still report an error.

Yes, I am looking at this patch and I think I will take it alone so
that we remove the dependency between those series.

Ray Kinsella Oct. 14, 2020, 9:23 a.m. UTC | #4

On 06/10/2020 17:07, Ananyev, Konstantin wrote:
> 
>>
>> On Mon, Oct 5, 2020 at 9:44 PM Konstantin Ananyev
>> <konstantin.ananyev@intel.com> wrote:
>>>
>>> These patch series introduce support of AVX512 specific classify
>>> implementation for ACL library.
>>> It adds two new algorithms:
>>>  - RTE_ACL_CLASSIFY_AVX512X16 - can process up to 16 flows in parallel.
>>>    It uses 256-bit width instructions/registers only
>>>    (to avoid frequency level change).
>>>    On my SKX box test-acl shows ~15-30% improvement
>>>    (depending on rule-set and input burst size)
>>>    when switching from AVX2 to AVX512X16 classify algorithms.
>>>  - RTE_ACL_CLASSIFY_AVX512X32 - can process up to 32 flows in parallel.
>>>    It uses 512-bit width instructions/registers and provides higher
>>>    performance then AVX512X16, but can cause frequency level change.
>>>    On my SKX box test-acl shows ~50-70% improvement
>>>    (depending on rule-set and input burst size)
>>>    when switching from AVX2 to AVX512X32 classify algorithms.
>>>    ICX and CLX testing showed similar level of speedup.
>>>
>>> Current AVX512 classify implementation is only supported on x86_64.
>>> Note that this series introduce a formal ABI incompatibility
>>
>> The only API change I can see is in rte_acl_classify_alg() new error
>> code but I don't think we need an announcement for this.
>> As for ABI, we are breaking it in this release, so I see no pb.
> 
> Cool, I just wanted to underline that patch #3:
> https://patches.dpdk.org/patch/79786/
> is a formal ABI breakage.

As David said, this is an ABI breaking release - so there is no requirement to maintain compatibility. 

https://doc.dpdk.org/guides/contributing/abi_policy.html

However the following requirements remain:-

* The acknowledgment of the maintainer of the component is mandatory, or if no maintainer is available for the component, the tree/sub-tree maintainer for that component must acknowledge the ABI change instead.
* The acknowledgment of three members of the technical board, as delegates of the technical board acknowledging the need for the ABI change, is also mandatory.

I guess you are the maintainer in this case, so that requirement is satisfied. 

> 
>>
>>
>>> with previous versions of ACL library.
>>>
>>> v2 -> v3:
>>>   Fix checkpatch warnings
>>>   Split AVX512 algorithm into two and deduplicate common code
>>
>> Patch 7 still references a RTE_MACHINE_CPUFLAG flag.
>> Can you rework now that those flags have been dropped?
>>
> 
> Should be fixed in v4:
> https://patches.dpdk.org/project/dpdk/list/?series=12721
> 
> One more thing to mention - this series has a dependency on Vladimir's patch:
> https://patches.dpdk.org/patch/79310/ ("eal/x86: introduce AVX 512-bit type"),
> so CI/travis would still report an error.
> 
> Thanks
> Konstantin
>