From patchwork Tue Sep 15 16:50:14 2020
X-Patchwork-Submitter: "Ananyev, Konstantin"
X-Patchwork-Id: 77763
X-Patchwork-Delegate: thomas@monjalon.net
From: Konstantin Ananyev
To: dev@dpdk.org
Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com,
 Konstantin Ananyev, stable@dpdk.org
Date: Tue, 15 Sep 2020 17:50:14 +0100
Message-Id: <20200915165025.543-2-konstantin.ananyev@intel.com>
In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com>
References: <20200807162829.11690-1-konstantin.ananyev@intel.com>
 <20200915165025.543-1-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCH v2 01/12] acl: fix x86 build when compiler doesn't
 support AVX2

Right now we define a dummy version of rte_acl_classify_avx2() only when
both x86 and AVX2 support are not detected, though it should depend on the
non-AVX2 condition alone.

Fixes: e53ce4e41379 ("acl: remove use of weak functions")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev
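For reference, a minimal sketch of the guard layout in rte_acl.c that this
patch produces; the stub body is an assumption here, mirroring the -ENOTSUP
pattern used by the AVX512 stub added later in this series (patch 07):

	#ifndef CC_AVX2_SUPPORT
	/* compiler cannot generate AVX2: provide a dummy fallback */
	int
	rte_acl_classify_avx2(__rte_unused const struct rte_acl_ctx *ctx,
		__rte_unused const uint8_t **data,
		__rte_unused uint32_t *results,
		__rte_unused uint32_t num,
		__rte_unused uint32_t categories)
	{
		return -ENOTSUP;
	}
	#endif

	#ifndef RTE_ARCH_X86
	/* non-x86 builds stub out rte_acl_classify_sse() the same way */
	#endif

The point of the fix: the AVX2 stub now depends only on compiler support,
while the SSE stub keeps the architecture guard.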
---
 lib/librte_acl/rte_acl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 777ec4d34..715b02359 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -16,7 +16,6 @@ static struct rte_tailq_elem rte_acl_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_acl_tailq)
 
-#ifndef RTE_ARCH_X86
 #ifndef CC_AVX2_SUPPORT
 /*
  * If the compiler doesn't support AVX2 instructions,
@@ -33,6 +32,7 @@ rte_acl_classify_avx2(__rte_unused const struct rte_acl_ctx *ctx,
 }
 #endif
 
+#ifndef RTE_ARCH_X86
 int
 rte_acl_classify_sse(__rte_unused const struct rte_acl_ctx *ctx,
 	__rte_unused const uint8_t **data,

From patchwork Tue Sep 15 16:50:15 2020
X-Patchwork-Submitter: "Ananyev, Konstantin"
X-Patchwork-Id: 77764
X-Patchwork-Delegate: thomas@monjalon.net
From: Konstantin Ananyev
To: dev@dpdk.org
Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com,
 Konstantin Ananyev, stable@dpdk.org
Date: Tue, 15 Sep 2020 17:50:15 +0100
Message-Id: <20200915165025.543-3-konstantin.ananyev@intel.com>
In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com>
References: <20200807162829.11690-1-konstantin.ananyev@intel.com>
 <20200915165025.543-1-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCH v2 02/12] doc: fix mixing classify methods in ACL
 guide

Add brief description for missing ACL classify algorithms:
RTE_ACL_CLASSIFY_NEON and RTE_ACL_CLASSIFY_ALTIVEC.

Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
Fixes: 1d73135f9f1c ("acl: add AltiVec for ppc64")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev
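As the guide text below notes, the classify method is purely a runtime
choice. A minimal usage sketch (classify_with_alg is a hypothetical wrapper,
single category) showing how a particular method can be requested for one
call through the existing rte_acl_classify_alg() API:

	#include <rte_acl.h>

	/* Hypothetical wrapper: classify 'num' flows with an explicitly
	 * chosen algorithm instead of the per-context default. */
	static inline int
	classify_with_alg(const struct rte_acl_ctx *ctx, const uint8_t **data,
		uint32_t *results, uint32_t num, enum rte_acl_classify_alg alg)
	{
		return rte_acl_classify_alg(ctx, data, results, num, 1, alg);
	}

Until patch 04 of this series, it remains the caller's responsibility to
make sure the chosen method is supported on the running platform.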
---
 doc/guides/prog_guide/packet_classif_access_ctrl.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/prog_guide/packet_classif_access_ctrl.rst b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
index 0345512b9..daf03e6d7 100644
--- a/doc/guides/prog_guide/packet_classif_access_ctrl.rst
+++ b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
@@ -373,6 +373,12 @@ There are several implementations of classify algorithm:
 * **RTE_ACL_CLASSIFY_AVX2**: vector implementation, can process up to 16 flows
   in parallel. Requires AVX2 support.
 
+* **RTE_ACL_CLASSIFY_NEON**: vector implementation, can process up to 8 flows
+  in parallel. Requires NEON support.
+
+* **RTE_ACL_CLASSIFY_ALTIVEC**: vector implementation, can process up to 8
+  flows in parallel. Requires ALTIVEC support.
+
 It is purely a runtime decision which method to choose, there is no
 build-time difference. All implementations operates over the same internal
 RT structures and use similar principles. The main difference is that vector
 implementations can manually exploit IA SIMD instructions and process several
 input data flows in parallel. At startup ACL library determines the highest
 available classify method for the given platform and sets it as default one.
 Though the user has an ability to override the default classifier function
 for a given ACL context or perform particular search using non-default
 classify method. In that case it is user responsibility to make sure that
 given platform supports selected classify implementation.

From patchwork Tue Sep 15 16:50:16 2020
X-Patchwork-Submitter: "Ananyev, Konstantin"
X-Patchwork-Id: 77765
X-Patchwork-Delegate: thomas@monjalon.net
From: Konstantin Ananyev
To: dev@dpdk.org
Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com,
 Konstantin Ananyev
Date: Tue, 15 Sep 2020 17:50:16 +0100
Message-Id: <20200915165025.543-4-konstantin.ananyev@intel.com>
In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com>
References: <20200807162829.11690-1-konstantin.ananyev@intel.com>
 <20200915165025.543-1-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCH v2 03/12] acl: remove unused enum value

Remove the unused enum value RTE_ACL_CLASSIFY_NUM. This enum value is not
used inside DPDK, while it prevents adding new classify algorithms without
causing an ABI breakage.

Note that this change introduces a formal ABI incompatibility with previous
versions of the ACL library.

Signed-off-by: Konstantin Ananyev
Reviewed-by: Ruifeng Wang
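Applications that relied on RTE_ACL_CLASSIFY_NUM as an iteration sentinel
now need an explicit list of methods. A hedged sketch of such a probe
(print_supported_algs is a hypothetical helper; it assumes the per-algorithm
validation that patch 04 of this series adds to rte_acl_set_ctx_classify()):

	#include <stdio.h>
	#include <rte_acl.h>
	#include <rte_common.h>

	/* probe which classify methods the running platform supports */
	static void
	print_supported_algs(struct rte_acl_ctx *ctx)
	{
		static const enum rte_acl_classify_alg algs[] = {
			RTE_ACL_CLASSIFY_SCALAR,
			RTE_ACL_CLASSIFY_SSE,
			RTE_ACL_CLASSIFY_AVX2,
			RTE_ACL_CLASSIFY_NEON,
			RTE_ACL_CLASSIFY_ALTIVEC,
		};
		uint32_t i;

		for (i = 0; i != RTE_DIM(algs); i++)
			if (rte_acl_set_ctx_classify(ctx, algs[i]) == 0)
				printf("alg %d is supported\n", algs[i]);
	}

This mirrors the explicit algorithm list that patch 06 of the series uses in
the unit tests.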
---
 doc/guides/rel_notes/deprecation.rst   | 4 ----
 doc/guides/rel_notes/release_20_11.rst | 4 ++++
 lib/librte_acl/rte_acl.h               | 1 -
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 52168f775..3279a01ef 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -288,10 +288,6 @@ Deprecation Notices
   - https://patches.dpdk.org/patch/71457/
   - https://patches.dpdk.org/patch/71456/
 
-* acl: ``RTE_ACL_CLASSIFY_NUM`` enum value will be removed.
-  This enum value is not used inside DPDK, while it prevents to add new
-  classify algorithms without causing an ABI breakage.
-
 * sched: To allow more traffic classes, flexible mapping of pipe queues to
   traffic classes, and subport level configuration of pipes and queues
   changes will be made to macros, data structures and API functions defined

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index b729bdf20..a9a1b0305 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -97,6 +97,10 @@ API Changes
   and the function ``rte_rawdev_queue_conf_get()``
   from ``void`` to ``int`` allowing the return of error codes from drivers.
 
+* acl: ``RTE_ACL_CLASSIFY_NUM`` enum value has been removed.
+  This enum value was not used inside DPDK, while it prevented to add new
+  classify algorithms without causing an ABI breakage.
+
 ABI Changes
 -----------

diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index aa22e70c6..b814423a6 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -241,7 +241,6 @@ enum rte_acl_classify_alg {
 	RTE_ACL_CLASSIFY_AVX2 = 3,    /**< requires AVX2 support. */
 	RTE_ACL_CLASSIFY_NEON = 4,    /**< requires NEON support. */
 	RTE_ACL_CLASSIFY_ALTIVEC = 5, /**< requires ALTIVEC support. */
-	RTE_ACL_CLASSIFY_NUM	/* should always be the last one. */
 };
 
 /**

From patchwork Tue Sep 15 16:50:17 2020
X-Patchwork-Submitter: "Ananyev, Konstantin"
X-Patchwork-Id: 77766
X-Patchwork-Delegate: thomas@monjalon.net
From: Konstantin Ananyev
To: dev@dpdk.org
Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com,
 Konstantin Ananyev
Date: Tue, 15 Sep 2020 17:50:17 +0100
Message-Id: <20200915165025.543-5-konstantin.ananyev@intel.com>
In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com>
References: <20200807162829.11690-1-konstantin.ananyev@intel.com>
 <20200915165025.543-1-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCH v2 04/12] acl: remove library constructor

Right now the ACL library determines the best possible (default) classify
method on a given platform with a special constructor function,
rte_acl_init(). This patch makes the following changes:

- Move selection of the default classify method into a separate private
  function and call it for each ACL context creation (rte_acl_create()).
- Remove the library constructor function.
- Make rte_acl_set_ctx_classify() check that the requested algorithm is
  supported on the given platform.

The purpose of these changes is to improve and simplify the algorithm
selection process and to prepare the ACL library for integration with the
"add max SIMD bitwidth to EAL" patch-set
(https://patches.dpdk.org/project/dpdk/list/?series=11831).

Signed-off-by: Konstantin Ananyev
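With this change rte_acl_set_ctx_classify() also accepts
RTE_ACL_CLASSIFY_DEFAULT and re-runs the best-method selection for the
context. A minimal sketch (pick_classify_method is a hypothetical helper)
of the error handling this enables:

	#include <rte_acl.h>

	/* prefer AVX2 when the build and CPU support it,
	 * otherwise fall back to the library's own best pick */
	static int
	pick_classify_method(struct rte_acl_ctx *ctx)
	{
		int rc;

		rc = rte_acl_set_ctx_classify(ctx, RTE_ACL_CLASSIFY_AVX2);
		if (rc == -ENOTSUP)
			rc = rte_acl_set_ctx_classify(ctx,
				RTE_ACL_CLASSIFY_DEFAULT);
		return rc;
	}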
---
 lib/librte_acl/rte_acl.c | 166 ++++++++++++++++++++++++++++++---------
 lib/librte_acl/rte_acl.h |   1 +
 2 files changed, 132 insertions(+), 35 deletions(-)

diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 715b02359..fbcf45fdc 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -79,57 +79,153 @@ static const rte_acl_classify_t classify_fns[] = {
 	[RTE_ACL_CLASSIFY_ALTIVEC] = rte_acl_classify_altivec,
 };
 
-/* by default, use always available scalar code path. */
-static enum rte_acl_classify_alg rte_acl_default_classify =
-	RTE_ACL_CLASSIFY_SCALAR;
+/*
+ * Helper function for acl_check_alg.
+ * Check support for ARM specific classify methods.
+ */
+static int
+acl_check_alg_arm(enum rte_acl_classify_alg alg)
+{
+	if (alg == RTE_ACL_CLASSIFY_NEON) {
+#if defined(RTE_ARCH_ARM64)
+		return 0;
+#elif defined(RTE_ARCH_ARM)
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+			return 0;
+		return -ENOTSUP;
+#else
+		return -ENOTSUP;
+#endif
+	}
+
+	return -EINVAL;
+}
 
-static void
-rte_acl_set_default_classify(enum rte_acl_classify_alg alg)
+/*
+ * Helper function for acl_check_alg.
+ * Check support for PPC specific classify methods.
+ */
+static int
+acl_check_alg_ppc(enum rte_acl_classify_alg alg)
 {
-	rte_acl_default_classify = alg;
+	if (alg == RTE_ACL_CLASSIFY_ALTIVEC) {
+#if defined(RTE_ARCH_PPC_64)
+		return 0;
+#else
+		return -ENOTSUP;
+#endif
+	}
+
+	return -EINVAL;
 }
 
-extern int
-rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_classify_alg alg)
+/*
+ * Helper function for acl_check_alg.
+ * Check support for x86 specific classify methods.
+ */
+static int
+acl_check_alg_x86(enum rte_acl_classify_alg alg)
 {
-	if (ctx == NULL || (uint32_t)alg >= RTE_DIM(classify_fns))
-		return -EINVAL;
+	if (alg == RTE_ACL_CLASSIFY_AVX2) {
+#ifdef CC_AVX2_SUPPORT
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
+			return 0;
+#endif
+		return -ENOTSUP;
+	}
 
-	ctx->alg = alg;
-	return 0;
+	if (alg == RTE_ACL_CLASSIFY_SSE) {
+#ifdef RTE_ARCH_X86
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
+			return 0;
+#endif
+		return -ENOTSUP;
+	}
+
+	return -EINVAL;
 }
 
 /*
- * Select highest available classify method as default one.
- * Note that CLASSIFY_AVX2 should be set as a default only
- * if both conditions are met:
- * at build time compiler supports AVX2 and target cpu supports AVX2.
+ * Check if input alg is supported by given platform/binary.
+ * Note that both conditions should be met:
+ * - at build time compiler supports ISA used by given method
+ * - at run time target cpu supports necessary ISA.
  */
-RTE_INIT(rte_acl_init)
+static int
+acl_check_alg(enum rte_acl_classify_alg alg)
 {
-	enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
+	switch (alg) {
+	case RTE_ACL_CLASSIFY_NEON:
+		return acl_check_alg_arm(alg);
+	case RTE_ACL_CLASSIFY_ALTIVEC:
+		return acl_check_alg_ppc(alg);
+	case RTE_ACL_CLASSIFY_AVX2:
+	case RTE_ACL_CLASSIFY_SSE:
+		return acl_check_alg_x86(alg);
+	/* scalar method is supported on all platforms */
+	case RTE_ACL_CLASSIFY_SCALAR:
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
 
-#if defined(RTE_ARCH_ARM64)
-	alg = RTE_ACL_CLASSIFY_NEON;
-#elif defined(RTE_ARCH_ARM)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
-		alg = RTE_ACL_CLASSIFY_NEON;
+/*
+ * Get preferred alg for given platform.
+ */
+static enum rte_acl_classify_alg
+acl_get_best_alg(void)
+{
+	/*
+	 * array of supported methods for each platform.
+	 * Note that order is important - from most to least preferable.
+	 */
+	static const enum rte_acl_classify_alg alg[] = {
+#if defined(RTE_ARCH_ARM)
+		RTE_ACL_CLASSIFY_NEON,
 #elif defined(RTE_ARCH_PPC_64)
-	alg = RTE_ACL_CLASSIFY_ALTIVEC;
-#else
-#ifdef CC_AVX2_SUPPORT
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		alg = RTE_ACL_CLASSIFY_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
-#else
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
+		RTE_ACL_CLASSIFY_ALTIVEC,
+#elif defined(RTE_ARCH_X86)
+		RTE_ACL_CLASSIFY_AVX2,
+		RTE_ACL_CLASSIFY_SSE,
 #endif
-		alg = RTE_ACL_CLASSIFY_SSE;
+		RTE_ACL_CLASSIFY_SCALAR,
+	};
 
-#endif
-	rte_acl_set_default_classify(alg);
+	uint32_t i;
+
+	/* find best possible alg */
+	for (i = 0; i != RTE_DIM(alg) && acl_check_alg(alg[i]) != 0; i++)
+		;
+
+	/* we always have to find something suitable */
+	RTE_VERIFY(i != RTE_DIM(alg));
+	return alg[i];
+}
+
+extern int
+rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_classify_alg alg)
+{
+	int32_t rc;
+
+	/* formal parameters check */
+	if (ctx == NULL || (uint32_t)alg >= RTE_DIM(classify_fns))
+		return -EINVAL;
+
+	/* user asked us to select the *best* one */
+	if (alg == RTE_ACL_CLASSIFY_DEFAULT)
+		alg = acl_get_best_alg();
+
+	/* check that given alg is supported */
+	rc = acl_check_alg(alg);
+	if (rc != 0)
+		return rc;
+
+	ctx->alg = alg;
+	return 0;
 }
 
+
 int
 rte_acl_classify_alg(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint32_t *results, uint32_t num, uint32_t categories,
@@ -262,7 +358,7 @@ rte_acl_create(const struct rte_acl_param *param)
 	ctx->max_rules = param->max_rule_num;
 	ctx->rule_sz = param->rule_size;
 	ctx->socket_id = param->socket_id;
-	ctx->alg = rte_acl_default_classify;
+	ctx->alg = acl_get_best_alg();
 	strlcpy(ctx->name, param->name, sizeof(ctx->name));
 
 	te->data = (void *) ctx;

diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index b814423a6..3999f15de 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -329,6 +329,7 @@ rte_acl_classify_alg(const struct rte_acl_ctx *ctx,
  *   existing algorithm, and that it could be run on the given CPU.
  * @return
  *   - -EINVAL if the parameters are invalid.
+ *   - -ENOTSUP if requested algorithm is not supported on given platform.
  *   - Zero if operation completed successfully.
*/ extern int From patchwork Tue Sep 15 16:50:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77767 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 49321A04C7; Tue, 15 Sep 2020 18:51:36 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 5C1451C194; Tue, 15 Sep 2020 18:51:07 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 7EC511C191 for ; Tue, 15 Sep 2020 18:51:06 +0200 (CEST) IronPort-SDR: 2al2VBKC6zTTlsDnmG6LoO5YHI8kBv7c1gmWjZWPWY39DqC2vgN+avX93OI2fFvOU/2AnLHJ9C nbYDt2WIPEow== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="139310982" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="139310982" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:06 -0700 IronPort-SDR: jh3TnRVESISoVKPxugliuBBLrzHQDzuLPXE3dGaoKLUMP3SbROqICEiMCEG+XEOXGz9etmZUge MsycQNMEmfWQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709403" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:04 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:18 +0100 Message-Id: <20200915165025.543-6-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 05/12] app/acl: few small improvements X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" - enhance output to print extra stats - use rte_rdtsc_precise() for cycle measurements Signed-off-by: Konstantin Ananyev --- app/test-acl/main.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/app/test-acl/main.c b/app/test-acl/main.c index 0a5dfb621..d9b65517c 100644 --- a/app/test-acl/main.c +++ b/app/test-acl/main.c @@ -862,9 +862,10 @@ search_ip5tuples(__rte_unused void *arg) { uint64_t pkt, start, tm; uint32_t i, lcore; + long double st; lcore = rte_lcore_id(); - start = rte_rdtsc(); + start = rte_rdtsc_precise(); pkt = 0; for (i = 0; i != config.iter_num; i++) { @@ -872,12 +873,16 @@ search_ip5tuples(__rte_unused void *arg) config.trace_step, config.alg.name); } - tm = rte_rdtsc() - start; + tm = rte_rdtsc_precise() - start; + + st = (long double)tm / rte_get_timer_hz(); dump_verbose(DUMP_NONE, stdout, "%s @lcore %u: %" PRIu32 " iterations, %" PRIu64 " pkts, %" - PRIu32 " categories, %" PRIu64 " cycles, %#Lf cycles/pkt\n", - __func__, lcore, i, pkt, config.run_categories, - tm, (pkt == 0) ? 
0 : (long double)tm / pkt); + PRIu32 " categories, %" PRIu64 " cycles (%.2Lf sec), " + "%.2Lf cycles/pkt, %.2Lf pkt/sec\n", + __func__, lcore, i, pkt, + config.run_categories, tm, st, + (pkt == 0) ? 0 : (long double)tm / pkt, pkt / st); return 0; } From patchwork Tue Sep 15 16:50:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77768 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 6AF75A04C7; Tue, 15 Sep 2020 18:51:49 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6A0A71C198; Tue, 15 Sep 2020 18:51:10 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id E7D451C198 for ; Tue, 15 Sep 2020 18:51:08 +0200 (CEST) IronPort-SDR: 4adVAMNmsmk6HYSzz+vUvc0KoYN/BhKqSalkgJHyyKmnTpAaISJSCvHhyrpWwMvq4yrhy6iXoe C5iLiz4DXrFw== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="139310995" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="139310995" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:08 -0700 IronPort-SDR: Q3vXBlZw1SbjE2CF6TUCHetxOu/aKGC8d2cIL3od5+SWBeFSXiz58PrNb49ZEF0TXacnVUG1BY uTtSaWU0Klgw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709423" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:07 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:19 +0100 Message-Id: <20200915165025.543-7-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 06/12] test/acl: expand classify test coverage X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Make classify test to run for all supported methods. Signed-off-by: Konstantin Ananyev --- app/test/test_acl.c | 103 ++++++++++++++++++++++---------------------- 1 file changed, 51 insertions(+), 52 deletions(-) diff --git a/app/test/test_acl.c b/app/test/test_acl.c index 316bf4d06..333b34757 100644 --- a/app/test/test_acl.c +++ b/app/test/test_acl.c @@ -266,22 +266,20 @@ rte_acl_ipv4vlan_build(struct rte_acl_ctx *ctx, } /* - * Test scalar and SSE ACL lookup. + * Test ACL lookup (selected alg). 
*/ static int -test_classify_run(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], - size_t dim) +test_classify_alg(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], + const uint8_t *data[], size_t dim, enum rte_acl_classify_alg alg) { - int ret, i; - uint32_t result, count; + int32_t ret; + uint32_t i, result, count; uint32_t results[dim * RTE_ACL_MAX_CATEGORIES]; - const uint8_t *data[dim]; - /* swap all bytes in the data to network order */ - bswap_test_data(test_data, dim, 1); - /* store pointers to test data */ - for (i = 0; i < (int) dim; i++) - data[i] = (uint8_t *)&test_data[i]; + /* set given classify alg, skip test if alg is not supported */ + ret = rte_acl_set_ctx_classify(acx, alg); + if (ret == -ENOTSUP) + return 0; /** * these will run quite a few times, it's necessary to test code paths @@ -291,12 +289,13 @@ test_classify_run(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], ret = rte_acl_classify(acx, data, results, count, RTE_ACL_MAX_CATEGORIES); if (ret != 0) { - printf("Line %i: SSE classify failed!\n", __LINE__); - goto err; + printf("Line %i: classify(alg=%d) failed!\n", + __LINE__, alg); + return ret; } /* check if we allow everything we should allow */ - for (i = 0; i < (int) count; i++) { + for (i = 0; i < count; i++) { result = results[i * RTE_ACL_MAX_CATEGORIES + ACL_ALLOW]; if (result != test_data[i].allow) { @@ -304,63 +303,63 @@ test_classify_run(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], "(expected %"PRIu32" got %"PRIu32")!\n", __LINE__, i, test_data[i].allow, result); - ret = -EINVAL; - goto err; + return -EINVAL; } } /* check if we deny everything we should deny */ - for (i = 0; i < (int) count; i++) { + for (i = 0; i < count; i++) { result = results[i * RTE_ACL_MAX_CATEGORIES + ACL_DENY]; if (result != test_data[i].deny) { printf("Line %i: Error in deny results at %i " "(expected %"PRIu32" got %"PRIu32")!\n", __LINE__, i, test_data[i].deny, result); - ret = -EINVAL; - goto err; + return -EINVAL; } } } - /* make a quick check for scalar */ - ret = rte_acl_classify_alg(acx, data, results, - dim, RTE_ACL_MAX_CATEGORIES, - RTE_ACL_CLASSIFY_SCALAR); - if (ret != 0) { - printf("Line %i: scalar classify failed!\n", __LINE__); - goto err; - } + /* restore default classify alg */ + return rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_DEFAULT); +} - /* check if we allow everything we should allow */ - for (i = 0; i < (int) dim; i++) { - result = results[i * RTE_ACL_MAX_CATEGORIES + ACL_ALLOW]; - if (result != test_data[i].allow) { - printf("Line %i: Error in allow results at %i " - "(expected %"PRIu32" got %"PRIu32")!\n", - __LINE__, i, test_data[i].allow, - result); - ret = -EINVAL; - goto err; - } - } +/* + * Test ACL lookup (all possible methods). 
+ */ +static int +test_classify_run(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], + size_t dim) +{ + int32_t ret; + uint32_t i; + const uint8_t *data[dim]; - /* check if we deny everything we should deny */ - for (i = 0; i < (int) dim; i++) { - result = results[i * RTE_ACL_MAX_CATEGORIES + ACL_DENY]; - if (result != test_data[i].deny) { - printf("Line %i: Error in deny results at %i " - "(expected %"PRIu32" got %"PRIu32")!\n", - __LINE__, i, test_data[i].deny, - result); - ret = -EINVAL; - goto err; - } - } + static const enum rte_acl_classify_alg alg[] = { + RTE_ACL_CLASSIFY_SCALAR, + RTE_ACL_CLASSIFY_SSE, + RTE_ACL_CLASSIFY_AVX2, + RTE_ACL_CLASSIFY_NEON, + RTE_ACL_CLASSIFY_ALTIVEC, + }; + + /* swap all bytes in the data to network order */ + bswap_test_data(test_data, dim, 1); + + /* store pointers to test data */ + for (i = 0; i < dim; i++) + data[i] = (uint8_t *)&test_data[i]; ret = 0; + for (i = 0; i != RTE_DIM(alg); i++) { + ret = test_classify_alg(acx, test_data, data, dim, alg[i]); + if (ret < 0) { + printf("Line %i: %s() for alg=%d failed, errno=%d\n", + __LINE__, __func__, alg[i], -ret); + break; + } + } -err: /* swap data back to cpu order so that next time tests don't fail */ bswap_test_data(test_data, dim, 0); return ret; From patchwork Tue Sep 15 16:50:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77769 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0B5CBA04C7; Tue, 15 Sep 2020 18:52:04 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 03F501C1A8; Tue, 15 Sep 2020 18:51:13 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 535BC1C1A0 for ; Tue, 15 Sep 2020 18:51:11 +0200 (CEST) IronPort-SDR: bWo8IzCot8JTPu4n4u12MLVulQalTRB0aQdAjVFmDGjFH9MKVPUS63y3a9+mj1ZukVYASugnvO u3BGSqinegTA== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="139311001" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="139311001" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:10 -0700 IronPort-SDR: PlgVQtPV52o/7RXMrkaVIzfUL6MWk4jXrO2r52h3ZPd3h726kqALs2qWiJ04oM9zme3A0oNr19 wm0chjuTNcAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709448" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:09 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:20 +0100 Message-Id: <20200915165025.543-8-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 07/12] acl: add infrastructure to support AVX512 classify X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
Errors-To: dev-bounces@dpdk.org Sender: "dev" Add necessary changes to support new AVX512 specific ACL classify algorithm: - changes in meson.build to check that build tools (compiler, assembler, etc.) do properly support AVX512. - run-time checks to make sure target platform does support AVX512. - dummy rte_acl_classify_avx512() for targets where AVX512 implementation couldn't be properly supported. Signed-off-by: Konstantin Ananyev Acked-by: Bruce Richardson --- config/x86/meson.build | 3 ++- lib/librte_acl/acl.h | 4 ++++ lib/librte_acl/acl_run_avx512.c | 17 ++++++++++++++ lib/librte_acl/meson.build | 39 +++++++++++++++++++++++++++++++++ lib/librte_acl/rte_acl.c | 29 ++++++++++++++++++++++++ lib/librte_acl/rte_acl.h | 1 + 6 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 lib/librte_acl/acl_run_avx512.c diff --git a/config/x86/meson.build b/config/x86/meson.build index 6ec020ef6..c5626e914 100644 --- a/config/x86/meson.build +++ b/config/x86/meson.build @@ -23,7 +23,8 @@ foreach f:base_flags endforeach optional_flags = ['AES', 'PCLMUL', - 'AVX', 'AVX2', 'AVX512F', + 'AVX', 'AVX2', + 'AVX512F', 'AVX512VL', 'AVX512CD', 'AVX512BW', 'RDRND', 'RDSEED'] foreach f:optional_flags if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h index 39d45a0c2..2022cf253 100644 --- a/lib/librte_acl/acl.h +++ b/lib/librte_acl/acl.h @@ -201,6 +201,10 @@ int rte_acl_classify_avx2(const struct rte_acl_ctx *ctx, const uint8_t **data, uint32_t *results, uint32_t num, uint32_t categories); +int +rte_acl_classify_avx512(const struct rte_acl_ctx *ctx, const uint8_t **data, + uint32_t *results, uint32_t num, uint32_t categories); + int rte_acl_classify_neon(const struct rte_acl_ctx *ctx, const uint8_t **data, uint32_t *results, uint32_t num, uint32_t categories); diff --git a/lib/librte_acl/acl_run_avx512.c b/lib/librte_acl/acl_run_avx512.c new file mode 100644 index 000000000..67274989d --- /dev/null +++ b/lib/librte_acl/acl_run_avx512.c @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#include "acl_run_sse.h" + +int +rte_acl_classify_avx512(const struct rte_acl_ctx *ctx, const uint8_t **data, + uint32_t *results, uint32_t num, uint32_t categories) +{ + if (num >= MAX_SEARCHES_SSE8) + return search_sse_8(ctx, data, results, num, categories); + if (num >= MAX_SEARCHES_SSE4) + return search_sse_4(ctx, data, results, num, categories); + + return rte_acl_classify_scalar(ctx, data, results, num, categories); +} diff --git a/lib/librte_acl/meson.build b/lib/librte_acl/meson.build index d1e2c184c..b2fd61cad 100644 --- a/lib/librte_acl/meson.build +++ b/lib/librte_acl/meson.build @@ -27,6 +27,45 @@ if dpdk_conf.has('RTE_ARCH_X86') cflags += '-DCC_AVX2_SUPPORT' endif + # compile AVX512 version if: + # we are building 64-bit binary AND binutils can generate proper code + + if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok.returncode() == 0 + + # compile AVX512 version if either: + # a. we have AVX512 supported in minimum instruction set + # baseline + # b. it's not minimum instruction set, but supported by + # compiler + # + # in former case, just add avx512 C file to files list + # in latter case, compile c file to static lib, using correct + # compiler flags, and then have the .o file from static lib + # linked into main lib. 
+ + if dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512F') and \ + dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512VL') and \ + dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512CD') and \ + dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512BW') + + sources += files('acl_run_avx512.c') + cflags += '-DCC_AVX512_SUPPORT' + + elif cc.has_multi_arguments('-mavx512f', '-mavx512vl', + '-mavx512cd', '-mavx512bw') + + avx512_tmplib = static_library('avx512_tmp', + 'acl_run_avx512.c', + dependencies: static_rte_eal, + c_args: cflags + + ['-mavx512f', '-mavx512vl', + '-mavx512cd', '-mavx512bw']) + objs += avx512_tmplib.extract_objects( + 'acl_run_avx512.c') + cflags += '-DCC_AVX512_SUPPORT' + endif + endif + elif dpdk_conf.has('RTE_ARCH_ARM') or dpdk_conf.has('RTE_ARCH_ARM64') cflags += '-flax-vector-conversions' sources += files('acl_run_neon.c') diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c index fbcf45fdc..fdcb7a798 100644 --- a/lib/librte_acl/rte_acl.c +++ b/lib/librte_acl/rte_acl.c @@ -16,6 +16,22 @@ static struct rte_tailq_elem rte_acl_tailq = { }; EAL_REGISTER_TAILQ(rte_acl_tailq) +#ifndef CC_AVX512_SUPPORT +/* + * If the compiler doesn't support AVX512 instructions, + * then the dummy one would be used instead for AVX512 classify method. + */ +int +rte_acl_classify_avx512(__rte_unused const struct rte_acl_ctx *ctx, + __rte_unused const uint8_t **data, + __rte_unused uint32_t *results, + __rte_unused uint32_t num, + __rte_unused uint32_t categories) +{ + return -ENOTSUP; +} +#endif + #ifndef CC_AVX2_SUPPORT /* * If the compiler doesn't support AVX2 instructions, @@ -77,6 +93,7 @@ static const rte_acl_classify_t classify_fns[] = { [RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_avx2, [RTE_ACL_CLASSIFY_NEON] = rte_acl_classify_neon, [RTE_ACL_CLASSIFY_ALTIVEC] = rte_acl_classify_altivec, + [RTE_ACL_CLASSIFY_AVX512] = rte_acl_classify_avx512, }; /* @@ -126,6 +143,17 @@ acl_check_alg_ppc(enum rte_acl_classify_alg alg) static int acl_check_alg_x86(enum rte_acl_classify_alg alg) { + if (alg == RTE_ACL_CLASSIFY_AVX512) { +#ifdef CC_AVX512_SUPPORT + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW)) + return 0; +#endif + return -ENOTSUP; + } + if (alg == RTE_ACL_CLASSIFY_AVX2) { #ifdef CC_AVX2_SUPPORT if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2)) @@ -159,6 +187,7 @@ acl_check_alg(enum rte_acl_classify_alg alg) return acl_check_alg_arm(alg); case RTE_ACL_CLASSIFY_ALTIVEC: return acl_check_alg_ppc(alg); + case RTE_ACL_CLASSIFY_AVX512: case RTE_ACL_CLASSIFY_AVX2: case RTE_ACL_CLASSIFY_SSE: return acl_check_alg_x86(alg); diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h index 3999f15de..d243a1c84 100644 --- a/lib/librte_acl/rte_acl.h +++ b/lib/librte_acl/rte_acl.h @@ -241,6 +241,7 @@ enum rte_acl_classify_alg { RTE_ACL_CLASSIFY_AVX2 = 3, /**< requires AVX2 support. */ RTE_ACL_CLASSIFY_NEON = 4, /**< requires NEON support. */ RTE_ACL_CLASSIFY_ALTIVEC = 5, /**< requires ALTIVEC support. */ + RTE_ACL_CLASSIFY_AVX512 = 6, /**< requires AVX512 support. 
*/ }; /** From patchwork Tue Sep 15 16:50:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77770 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A5C25A04C7; Tue, 15 Sep 2020 18:52:15 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 3D84E1C1AE; Tue, 15 Sep 2020 18:51:16 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 21CDD1C1AD for ; Tue, 15 Sep 2020 18:51:13 +0200 (CEST) IronPort-SDR: eybAKa8aaTIANSM2GIvygsM4jG4D+Ubob1s6I1krrbwLSD4I1HuH62kEOlfp7g9vtKwC4IOFyn 4hXNg8mB7spg== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="139311007" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="139311007" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:13 -0700 IronPort-SDR: qa+JqtzFw4KBpiei+I6rgoLYFx+oCYQNEYksB4rnx+qab6aqT2ccetWWqkPisuxYVEYlaVLqdz fS+3UPkWi7+Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709492" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:12 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:21 +0100 Message-Id: <20200915165025.543-9-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 08/12] acl: introduce AVX512 classify implementation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Introduce classify implementation that uses AVX512 specific ISA. Current approach uses mix of 256i/512-bit width registers/instructions and is able to process up to 16 flows in parallel. Note that for now only 64-bit version of rte_acl_classify_avx512() is available. Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl.h | 7 + lib/librte_acl/acl_gen.c | 2 +- lib/librte_acl/acl_run_avx512.c | 145 +++++++ lib/librte_acl/acl_run_avx512x8.h | 620 ++++++++++++++++++++++++++++++ 4 files changed, 773 insertions(+), 1 deletion(-) create mode 100644 lib/librte_acl/acl_run_avx512x8.h diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h index 2022cf253..3f0719f33 100644 --- a/lib/librte_acl/acl.h +++ b/lib/librte_acl/acl.h @@ -76,6 +76,13 @@ struct rte_acl_bitset { * input_byte - ((uint8_t *)&transition)[4 + input_byte / 64]. */ +/* + * Each ACL RT contains an idle nomatch node: + * a SINGLE node at predefined position (RTE_ACL_DFA_SIZE) + * that points to itself. + */ +#define RTE_ACL_IDLE_NODE (RTE_ACL_DFA_SIZE | RTE_ACL_NODE_SINGLE) + /* * Structure of a node is a set of ptrs and each ptr has a bit map * of values associated with this transition. 
diff --git a/lib/librte_acl/acl_gen.c b/lib/librte_acl/acl_gen.c index f1b9d12f1..e759a2ca1 100644 --- a/lib/librte_acl/acl_gen.c +++ b/lib/librte_acl/acl_gen.c @@ -496,7 +496,7 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie, * highest index, that points to itself) */ - node_array[RTE_ACL_DFA_SIZE] = RTE_ACL_DFA_SIZE | RTE_ACL_NODE_SINGLE; + node_array[RTE_ACL_DFA_SIZE] = RTE_ACL_IDLE_NODE; for (n = 0; n < RTE_ACL_DFA_SIZE; n++) node_array[n] = no_match; diff --git a/lib/librte_acl/acl_run_avx512.c b/lib/librte_acl/acl_run_avx512.c index 67274989d..353a3c004 100644 --- a/lib/librte_acl/acl_run_avx512.c +++ b/lib/librte_acl/acl_run_avx512.c @@ -4,10 +4,155 @@ #include "acl_run_sse.h" +/*sizeof(uint32_t) << match_log == sizeof(struct rte_acl_match_results)*/ +static const uint32_t match_log = 5; + +struct acl_flow_avx512 { + uint32_t num_packets; /* number of packets processed */ + uint32_t total_packets; /* max number of packets to process */ + uint32_t root_index; /* current root index */ + const uint64_t *trans; /* transition table */ + const uint32_t *data_index; /* input data indexes */ + const uint8_t **idata; /* input data */ + uint32_t *matches; /* match indexes */ +}; + +static inline void +acl_set_flow_avx512(struct acl_flow_avx512 *flow, const struct rte_acl_ctx *ctx, + uint32_t trie, const uint8_t *data[], uint32_t *matches, + uint32_t total_packets) +{ + flow->num_packets = 0; + flow->total_packets = total_packets; + flow->root_index = ctx->trie[trie].root_index; + flow->trans = ctx->trans_table; + flow->data_index = ctx->trie[trie].data_index; + flow->idata = data; + flow->matches = matches; +} + +/* + * Resolve matches for multiple categories (LE 8, use 128b instuctions/regs) + */ +static inline void +resolve_mcle8_avx512x1(uint32_t result[], + const struct rte_acl_match_results pr[], const uint32_t match[], + uint32_t nb_pkt, uint32_t nb_cat, uint32_t nb_trie) +{ + const int32_t *pri; + const uint32_t *pm, *res; + uint32_t i, j, k, mi, mn; + __mmask8 msk; + xmm_t cp, cr, np, nr; + + res = pr->results; + pri = pr->priority; + + for (k = 0; k != nb_pkt; k++, result += nb_cat) { + + mi = match[k] << match_log; + + for (j = 0; j != nb_cat; j += RTE_ACL_RESULTS_MULTIPLIER) { + + cr = _mm_loadu_si128((const xmm_t *)(res + mi + j)); + cp = _mm_loadu_si128((const xmm_t *)(pri + mi + j)); + + for (i = 1, pm = match + nb_pkt; i != nb_trie; + i++, pm += nb_pkt) { + + mn = j + (pm[k] << match_log); + + nr = _mm_loadu_si128((const xmm_t *)(res + mn)); + np = _mm_loadu_si128((const xmm_t *)(pri + mn)); + + msk = _mm_cmpgt_epi32_mask(cp, np); + cr = _mm_mask_mov_epi32(nr, msk, cr); + cp = _mm_mask_mov_epi32(np, msk, cp); + } + + _mm_storeu_si128((xmm_t *)(result + j), cr); + } + } +} + +/* + * Resolve matches for multiple categories (GT 8, use 512b instuctions/regs) + */ +static inline void +resolve_mcgt8_avx512x1(uint32_t result[], + const struct rte_acl_match_results pr[], const uint32_t match[], + uint32_t nb_pkt, uint32_t nb_cat, uint32_t nb_trie) +{ + const int32_t *pri; + const uint32_t *pm, *res; + uint32_t i, k, mi; + __mmask16 cm, sm; + __m512i cp, cr, np, nr; + + const uint32_t match_log = 5; + + res = pr->results; + pri = pr->priority; + + cm = (1 << nb_cat) - 1; + + for (k = 0; k != nb_pkt; k++, result += nb_cat) { + + mi = match[k] << match_log; + + cr = _mm512_maskz_loadu_epi32(cm, res + mi); + cp = _mm512_maskz_loadu_epi32(cm, pri + mi); + + for (i = 1, pm = match + nb_pkt; i != nb_trie; + i++, pm += nb_pkt) { + + mi = pm[k] << match_log; + + nr = 
_mm512_maskz_loadu_epi32(cm, res + mi); + np = _mm512_maskz_loadu_epi32(cm, pri + mi); + + sm = _mm512_cmpgt_epi32_mask(cp, np); + cr = _mm512_mask_mov_epi32(nr, sm, cr); + cp = _mm512_mask_mov_epi32(np, sm, cp); + } + + _mm512_mask_storeu_epi32(result, cm, cr); + } +} + +static inline ymm_t +_m512_mask_gather_epi8x8(__m512i pdata, __mmask8 mask) +{ + __m512i t; + rte_ymm_t v; + __rte_x86_zmm_t p; + + static const uint32_t zero; + + t = _mm512_set1_epi64((uintptr_t)&zero); + p.z = _mm512_mask_mov_epi64(t, mask, pdata); + + v.u32[0] = *(uint8_t *)p.u64[0]; + v.u32[1] = *(uint8_t *)p.u64[1]; + v.u32[2] = *(uint8_t *)p.u64[2]; + v.u32[3] = *(uint8_t *)p.u64[3]; + v.u32[4] = *(uint8_t *)p.u64[4]; + v.u32[5] = *(uint8_t *)p.u64[5]; + v.u32[6] = *(uint8_t *)p.u64[6]; + v.u32[7] = *(uint8_t *)p.u64[7]; + + return v.y; +} + + +#include "acl_run_avx512x8.h" + int rte_acl_classify_avx512(const struct rte_acl_ctx *ctx, const uint8_t **data, uint32_t *results, uint32_t num, uint32_t categories) { + if (num >= MAX_SEARCHES_AVX16) + return search_avx512x8x2(ctx, data, results, num, categories); if (num >= MAX_SEARCHES_SSE8) return search_sse_8(ctx, data, results, num, categories); if (num >= MAX_SEARCHES_SSE4) diff --git a/lib/librte_acl/acl_run_avx512x8.h b/lib/librte_acl/acl_run_avx512x8.h new file mode 100644 index 000000000..66fc26b26 --- /dev/null +++ b/lib/librte_acl/acl_run_avx512x8.h @@ -0,0 +1,620 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#define NUM_AVX512X8X2 (2 * CHAR_BIT) +#define MSK_AVX512X8X2 (NUM_AVX512X8X2 - 1) + +static const rte_ymm_t ymm_match_mask = { + .u32 = { + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + }, +}; + +static const rte_ymm_t ymm_index_mask = { + .u32 = { + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + }, +}; + +static const rte_ymm_t ymm_trlo_idle = { + .u32 = { + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + }, +}; + +static const rte_ymm_t ymm_trhi_idle = { + .u32 = { + 0, 0, 0, 0, + 0, 0, 0, 0, + }, +}; + +static const rte_ymm_t ymm_shuffle_input = { + .u32 = { + 0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c, + 0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c, + }, +}; + +static const rte_ymm_t ymm_four_32 = { + .u32 = { + 4, 4, 4, 4, + 4, 4, 4, 4, + }, +}; + +static const rte_ymm_t ymm_idx_add = { + .u32 = { + 0, 1, 2, 3, + 4, 5, 6, 7, + }, +}; + +static const rte_ymm_t ymm_range_base = { + .u32 = { + 0xffffff00, 0xffffff04, 0xffffff08, 0xffffff0c, + 0xffffff00, 0xffffff04, 0xffffff08, 0xffffff0c, + }, +}; + +/* + * Calculate the address of the next transition for + * all types of nodes. Note that only DFA nodes and range + * nodes actually transition to another node. Match + * nodes not supposed to be encountered here. + * For quad range nodes: + * Calculate number of range boundaries that are less than the + * input value. Range boundaries for each node are in signed 8 bit, + * ordered from -128 to 127. + * This is effectively a popcnt of bytes that are greater than the + * input byte. + * Single nodes are processed in the same ways as quad range nodes. 
+ */ +static __rte_always_inline ymm_t +calc_addr8(ymm_t index_mask, ymm_t next_input, ymm_t shuffle_input, + ymm_t four_32, ymm_t range_base, ymm_t tr_lo, ymm_t tr_hi) +{ + ymm_t addr, in, node_type, r, t; + ymm_t dfa_msk, dfa_ofs, quad_ofs; + + t = _mm256_xor_si256(index_mask, index_mask); + in = _mm256_shuffle_epi8(next_input, shuffle_input); + + /* Calc node type and node addr */ + node_type = _mm256_andnot_si256(index_mask, tr_lo); + addr = _mm256_and_si256(index_mask, tr_lo); + + /* mask for DFA type(0) nodes */ + dfa_msk = _mm256_cmpeq_epi32(node_type, t); + + /* DFA calculations. */ + r = _mm256_srli_epi32(in, 30); + r = _mm256_add_epi8(r, range_base); + t = _mm256_srli_epi32(in, 24); + r = _mm256_shuffle_epi8(tr_hi, r); + + dfa_ofs = _mm256_sub_epi32(t, r); + + /* QUAD/SINGLE calculations. */ + t = _mm256_cmpgt_epi8(in, tr_hi); + t = _mm256_lzcnt_epi32(t); + t = _mm256_srli_epi32(t, 3); + quad_ofs = _mm256_sub_epi32(four_32, t); + + /* blend DFA and QUAD/SINGLE. */ + t = _mm256_blendv_epi8(quad_ofs, dfa_ofs, dfa_msk); + + /* calculate address for next transitions. */ + addr = _mm256_add_epi32(addr, t); + return addr; +} + +/* + * Process 8 transitions in parallel. + * tr_lo contains low 32 bits for 8 transitions. + * tr_hi contains high 32 bits for 8 transitions. + * next_input contains up to 4 input bytes for 8 flows. + */ +static __rte_always_inline ymm_t +transition8(ymm_t next_input, const uint64_t *trans, ymm_t *tr_lo, ymm_t *tr_hi) +{ + const int32_t *tr; + ymm_t addr; + + tr = (const int32_t *)(uintptr_t)trans; + + /* Calculate the address (array index) for all 8 transitions. */ + addr = calc_addr8(ymm_index_mask.y, next_input, ymm_shuffle_input.y, + ymm_four_32.y, ymm_range_base.y, *tr_lo, *tr_hi); + + /* load lower 32 bits of 8 transactions at once. */ + *tr_lo = _mm256_i32gather_epi32(tr, addr, sizeof(trans[0])); + + next_input = _mm256_srli_epi32(next_input, CHAR_BIT); + + /* load high 32 bits of 8 transactions at once. */ + *tr_hi = _mm256_i32gather_epi32(tr + 1, addr, sizeof(trans[0])); + + return next_input; +} + +/* + * Execute first transition for up to 8 flows in parallel. + * next_input should contain one input byte for up to 8 flows. + * msk - mask of active flows. + * tr_lo contains low 32 bits for up to 8 transitions. + * tr_hi contains high 32 bits for up to 8 transitions. + */ +static __rte_always_inline void +first_trans8(const struct acl_flow_avx512 *flow, ymm_t next_input, + __mmask8 msk, ymm_t *tr_lo, ymm_t *tr_hi) +{ + const int32_t *tr; + ymm_t addr, root; + + tr = (const int32_t *)(uintptr_t)flow->trans; + + addr = _mm256_set1_epi32(UINT8_MAX); + root = _mm256_set1_epi32(flow->root_index); + + addr = _mm256_and_si256(next_input, addr); + addr = _mm256_add_epi32(root, addr); + + /* load lower 32 bits of 8 transactions at once. */ + *tr_lo = _mm256_mmask_i32gather_epi32(*tr_lo, msk, addr, tr, + sizeof(flow->trans[0])); + + /* load high 32 bits of 8 transactions at once. */ + *tr_hi = _mm256_mmask_i32gather_epi32(*tr_hi, msk, addr, (tr + 1), + sizeof(flow->trans[0])); +} + +/* + * Load and return next 4 input bytes for up to 8 flows in parallel. + * pdata - 8 pointers to flow input data + * mask - mask of active flows. + * di - data indexes for these 8 flows. 
+ */ +static inline ymm_t +get_next_bytes_avx512x8(const struct acl_flow_avx512 *flow, __m512i pdata, + __mmask8 mask, ymm_t *di, uint32_t bnum) +{ + const int32_t *div; + ymm_t one, zero; + ymm_t inp, t; + __m512i p; + + div = (const int32_t *)flow->data_index; + + one = _mm256_set1_epi32(1); + zero = _mm256_xor_si256(one, one); + + /* load data offsets for given indexes */ + t = _mm256_mmask_i32gather_epi32(zero, mask, *di, div, sizeof(div[0])); + + /* increment data indexes */ + *di = _mm256_mask_add_epi32(*di, mask, *di, one); + + p = _mm512_cvtepu32_epi64(t); + p = _mm512_add_epi64(p, pdata); + + /* load input byte(s), either one or four */ + if (bnum == sizeof(uint8_t)) + inp = _m512_mask_gather_epi8x8(p, mask); + else + inp = _mm512_mask_i64gather_epi32(zero, mask, p, NULL, + sizeof(uint8_t)); + return inp; +} + +/* + * Start up to 8 new flows. + * num - number of flows to start + * msk - mask of new flows. + * pdata - pointers to flow input data + * di - data indexes for these flows. + */ +static inline void +start_flow8(struct acl_flow_avx512 *flow, uint32_t num, uint32_t msk, + __m512i *pdata, ymm_t *idx, ymm_t *di) +{ + uint32_t nm; + ymm_t ni; + __m512i nd; + + /* load input data pointers for new flows */ + nm = (1 << num) - 1; + nd = _mm512_maskz_loadu_epi64(nm, flow->idata + flow->num_packets); + + /* calculate match indexes of new flows */ + ni = _mm256_set1_epi32(flow->num_packets); + ni = _mm256_add_epi32(ni, ymm_idx_add.y); + + /* merge new and existing flows data */ + *pdata = _mm512_mask_expand_epi64(*pdata, msk, nd); + *idx = _mm256_mask_expand_epi32(*idx, msk, ni); + *di = _mm256_maskz_mov_epi32(msk ^ UINT8_MAX, *di); + + flow->num_packets += num; +} + +/* + * Update flow and result masks based on the number of unprocessed flows. + */ +static inline uint32_t +update_flow_mask8(const struct acl_flow_avx512 *flow, __mmask8 *fmsk, + __mmask8 *rmsk) +{ + uint32_t i, j, k, m, n; + + fmsk[0] ^= rmsk[0]; + m = rmsk[0]; + + k = __builtin_popcount(m); + n = flow->total_packets - flow->num_packets; + + if (n < k) { + /* reduce mask */ + for (i = k - n; i != 0; i--) { + j = sizeof(m) * CHAR_BIT - 1 - __builtin_clz(m); + m ^= 1 << j; + } + } else + n = k; + + rmsk[0] = m; + fmsk[0] |= rmsk[0]; + + return n; +} + +/* + * Process found matches for up to 8 flows. + * fmsk - mask of active flows + * rmsk - mask of found matches + * pdata - pointers to flow input data + * di - data indexes for these flows + * idx - match indexed for given flows + * tr_lo contains low 32 bits for up to 8 transitions. + * tr_hi contains high 32 bits for up to 8 transitions. 
+ */ +static inline uint32_t +match_process_avx512x8(struct acl_flow_avx512 *flow, __mmask8 *fmsk, + __mmask8 *rmsk, __m512i *pdata, ymm_t *di, ymm_t *idx, + ymm_t *tr_lo, ymm_t *tr_hi) +{ + uint32_t n; + ymm_t res; + + if (rmsk[0] == 0) + return 0; + + /* extract match indexes */ + res = _mm256_and_si256(tr_lo[0], ymm_index_mask.y); + + /* mask matched transitions to nop */ + tr_lo[0] = _mm256_mask_mov_epi32(tr_lo[0], rmsk[0], ymm_trlo_idle.y); + tr_hi[0] = _mm256_mask_mov_epi32(tr_hi[0], rmsk[0], ymm_trhi_idle.y); + + /* save found match indexes */ + _mm256_mask_i32scatter_epi32(flow->matches, rmsk[0], + idx[0], res, sizeof(flow->matches[0])); + + /* update masks and start new flows for matches */ + n = update_flow_mask8(flow, fmsk, rmsk); + start_flow8(flow, n, rmsk[0], pdata, idx, di); + + return n; +} + + +static inline void +match_check_process_avx512x8x2(struct acl_flow_avx512 *flow, __mmask8 fm[2], + __m512i pdata[2], ymm_t di[2], ymm_t idx[2], ymm_t inp[2], + ymm_t tr_lo[2], ymm_t tr_hi[2]) +{ + uint32_t n[2]; + __mmask8 rm[2]; + + /* check for matches */ + rm[0] = _mm256_test_epi32_mask(tr_lo[0], ymm_match_mask.y); + rm[1] = _mm256_test_epi32_mask(tr_lo[1], ymm_match_mask.y); + + /* till unprocessed matches exist */ + while ((rm[0] | rm[1]) != 0) { + + /* process matches and start new flows */ + n[0] = match_process_avx512x8(flow, &fm[0], &rm[0], &pdata[0], + &di[0], &idx[0], &tr_lo[0], &tr_hi[0]); + n[1] = match_process_avx512x8(flow, &fm[1], &rm[1], &pdata[1], + &di[1], &idx[1], &tr_lo[1], &tr_hi[1]); + + /* execute first transition for new flows, if any */ + + if (n[0] != 0) { + inp[0] = get_next_bytes_avx512x8(flow, pdata[0], rm[0], + &di[0], sizeof(uint8_t)); + first_trans8(flow, inp[0], rm[0], &tr_lo[0], &tr_hi[0]); + + rm[0] = _mm256_test_epi32_mask(tr_lo[0], + ymm_match_mask.y); + } + + if (n[1] != 0) { + inp[1] = get_next_bytes_avx512x8(flow, pdata[1], rm[1], + &di[1], sizeof(uint8_t)); + first_trans8(flow, inp[1], rm[1], &tr_lo[1], &tr_hi[1]); + + rm[1] = _mm256_test_epi32_mask(tr_lo[1], + ymm_match_mask.y); + } + } +} + +/* + * Perform search for up to 16 flows in parallel. + * Use two sets of metadata, each serves 8 flows max. + * So in fact we perform search for 2x8 flows. 
+ */ +static inline void +search_trie_avx512x8x2(struct acl_flow_avx512 *flow) +{ + __mmask8 fm[2]; + __m512i pdata[2]; + ymm_t di[2], idx[2], inp[2], tr_lo[2], tr_hi[2]; + + /* first 1B load */ + start_flow8(flow, CHAR_BIT, UINT8_MAX, &pdata[0], &idx[0], &di[0]); + start_flow8(flow, CHAR_BIT, UINT8_MAX, &pdata[1], &idx[1], &di[1]); + + inp[0] = get_next_bytes_avx512x8(flow, pdata[0], UINT8_MAX, &di[0], + sizeof(uint8_t)); + inp[1] = get_next_bytes_avx512x8(flow, pdata[1], UINT8_MAX, &di[1], + sizeof(uint8_t)); + + first_trans8(flow, inp[0], UINT8_MAX, &tr_lo[0], &tr_hi[0]); + first_trans8(flow, inp[1], UINT8_MAX, &tr_lo[1], &tr_hi[1]); + + fm[0] = UINT8_MAX; + fm[1] = UINT8_MAX; + + /* match check */ + match_check_process_avx512x8x2(flow, fm, pdata, di, idx, inp, + tr_lo, tr_hi); + + while ((fm[0] | fm[1]) != 0) { + + /* load next 4B */ + + inp[0] = get_next_bytes_avx512x8(flow, pdata[0], fm[0], + &di[0], sizeof(uint32_t)); + inp[1] = get_next_bytes_avx512x8(flow, pdata[1], fm[1], + &di[1], sizeof(uint32_t)); + + /* main 4B loop */ + + inp[0] = transition8(inp[0], flow->trans, &tr_lo[0], &tr_hi[0]); + inp[1] = transition8(inp[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + inp[0] = transition8(inp[0], flow->trans, &tr_lo[0], &tr_hi[0]); + inp[1] = transition8(inp[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + inp[0] = transition8(inp[0], flow->trans, &tr_lo[0], &tr_hi[0]); + inp[1] = transition8(inp[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + inp[0] = transition8(inp[0], flow->trans, &tr_lo[0], &tr_hi[0]); + inp[1] = transition8(inp[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + /* check for matches */ + match_check_process_avx512x8x2(flow, fm, pdata, di, idx, inp, + tr_lo, tr_hi); + } +} + +/* + * resolve match index to actual result/priority offset. + */ +static inline ymm_t +resolve_match_idx_avx512x8(ymm_t mi) +{ + RTE_BUILD_BUG_ON(sizeof(struct rte_acl_match_results) != + 1 << (match_log + 2)); + return _mm256_slli_epi32(mi, match_log); +} + + +/* + * Resolve multiple matches for the same flow based on priority. 
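+ * For each trie the per-flow match index is turned into a table offset, the + * corresponding result/priority pair is gathered, and the pair with the + * highest priority is kept (a max-reduction across the tries).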
+ */ +static inline ymm_t +resolve_pri_avx512x8(const int32_t res[], const int32_t pri[], + const uint32_t match[], __mmask8 msk, uint32_t nb_trie, + uint32_t nb_skip) +{ + uint32_t i; + const uint32_t *pm; + __mmask8 m; + ymm_t cp, cr, np, nr, mch; + + const ymm_t zero = _mm256_set1_epi32(0); + + mch = _mm256_maskz_loadu_epi32(msk, match); + mch = resolve_match_idx_avx512x8(mch); + + cr = _mm256_mmask_i32gather_epi32(zero, msk, mch, res, sizeof(res[0])); + cp = _mm256_mmask_i32gather_epi32(zero, msk, mch, pri, sizeof(pri[0])); + + for (i = 1, pm = match + nb_skip; i != nb_trie; + i++, pm += nb_skip) { + + mch = _mm256_maskz_loadu_epi32(msk, pm); + mch = resolve_match_idx_avx512x8(mch); + + nr = _mm256_mmask_i32gather_epi32(zero, msk, mch, res, + sizeof(res[0])); + np = _mm256_mmask_i32gather_epi32(zero, msk, mch, pri, + sizeof(pri[0])); + + m = _mm256_cmpgt_epi32_mask(cp, np); + cr = _mm256_mask_mov_epi32(nr, m, cr); + cp = _mm256_mask_mov_epi32(np, m, cp); + } + + return cr; +} + +/* + * Resolve num (<= 8) matches for single category + */ +static inline void +resolve_sc_avx512x8(uint32_t result[], const int32_t res[], const int32_t pri[], + const uint32_t match[], uint32_t nb_pkt, uint32_t nb_trie, + uint32_t nb_skip) +{ + __mmask8 msk; + ymm_t cr; + + msk = (1 << nb_pkt) - 1; + cr = resolve_pri_avx512x8(res, pri, match, msk, nb_trie, nb_skip); + _mm256_mask_storeu_epi32(result, msk, cr); +} + +/* + * Resolve matches for single category + */ +static inline void +resolve_sc_avx512x8x2(uint32_t result[], + const struct rte_acl_match_results pr[], const uint32_t match[], + uint32_t nb_pkt, uint32_t nb_trie) +{ + uint32_t i, j, k, n; + const uint32_t *pm; + const int32_t *res, *pri; + __mmask8 m[2]; + ymm_t cp[2], cr[2], np[2], nr[2], mch[2]; + + res = (const int32_t *)pr->results; + pri = pr->priority; + + for (k = 0; k != (nb_pkt & ~MSK_AVX512X8X2); k += NUM_AVX512X8X2) { + + j = k + CHAR_BIT; + + /* load match indexes for first trie */ + mch[0] = _mm256_loadu_si256((const ymm_t *)(match + k)); + mch[1] = _mm256_loadu_si256((const ymm_t *)(match + j)); + + mch[0] = resolve_match_idx_avx512x8(mch[0]); + mch[1] = resolve_match_idx_avx512x8(mch[1]); + + /* load matches and their priorities for first trie */ + + cr[0] = _mm256_i32gather_epi32(res, mch[0], sizeof(res[0])); + cr[1] = _mm256_i32gather_epi32(res, mch[1], sizeof(res[0])); + + cp[0] = _mm256_i32gather_epi32(pri, mch[0], sizeof(pri[0])); + cp[1] = _mm256_i32gather_epi32(pri, mch[1], sizeof(pri[0])); + + /* select match with highest priority */ + for (i = 1, pm = match + nb_pkt; i != nb_trie; + i++, pm += nb_pkt) { + + mch[0] = _mm256_loadu_si256((const ymm_t *)(pm + k)); + mch[1] = _mm256_loadu_si256((const ymm_t *)(pm + j)); + + mch[0] = resolve_match_idx_avx512x8(mch[0]); + mch[1] = resolve_match_idx_avx512x8(mch[1]); + + nr[0] = _mm256_i32gather_epi32(res, mch[0], + sizeof(res[0])); + nr[1] = _mm256_i32gather_epi32(res, mch[1], + sizeof(res[0])); + + np[0] = _mm256_i32gather_epi32(pri, mch[0], + sizeof(pri[0])); + np[1] = _mm256_i32gather_epi32(pri, mch[1], + sizeof(pri[0])); + + m[0] = _mm256_cmpgt_epi32_mask(cp[0], np[0]); + m[1] = _mm256_cmpgt_epi32_mask(cp[1], np[1]); + + cr[0] = _mm256_mask_mov_epi32(nr[0], m[0], cr[0]); + cr[1] = _mm256_mask_mov_epi32(nr[1], m[1], cr[1]); + + cp[0] = _mm256_mask_mov_epi32(np[0], m[0], cp[0]); + cp[1] = _mm256_mask_mov_epi32(np[1], m[1], cp[1]); + } + + _mm256_storeu_si256((ymm_t *)(result + k), cr[0]); + _mm256_storeu_si256((ymm_t *)(result + j), cr[1]); + } + + n = nb_pkt - k; + if (n != 0) 
{ + if (n > CHAR_BIT) { + resolve_sc_avx512x8(result + k, res, pri, match + k, + CHAR_BIT, nb_trie, nb_pkt); + k += CHAR_BIT; + n -= CHAR_BIT; + } + resolve_sc_avx512x8(result + k, res, pri, match + k, n, + nb_trie, nb_pkt); + } +} + +static inline int +search_avx512x8x2(const struct rte_acl_ctx *ctx, const uint8_t **data, + uint32_t *results, uint32_t total_packets, uint32_t categories) +{ + uint32_t i, *pm; + const struct rte_acl_match_results *pr; + struct acl_flow_avx512 flow; + uint32_t match[ctx->num_tries * total_packets]; + + for (i = 0, pm = match; i != ctx->num_tries; i++, pm += total_packets) { + + /* setup for next trie */ + acl_set_flow_avx512(&flow, ctx, i, data, pm, total_packets); + + /* process the trie */ + search_trie_avx512x8x2(&flow); + } + + /* resolve matches */ + pr = (const struct rte_acl_match_results *) + (ctx->trans_table + ctx->match_index); + + if (categories == 1) + resolve_sc_avx512x8x2(results, pr, match, total_packets, + ctx->num_tries); + else if (categories <= RTE_ACL_MAX_CATEGORIES / 2) + resolve_mcle8_avx512x1(results, pr, match, total_packets, + categories, ctx->num_tries); + else + resolve_mcgt8_avx512x1(results, pr, match, total_packets, + categories, ctx->num_tries); + + return 0; +} From patchwork Tue Sep 15 16:50:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77771 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9F77DA04C7; Tue, 15 Sep 2020 18:52:28 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id F16A81C1B9; Tue, 15 Sep 2020 18:51:17 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id B621F1C1B2 for ; Tue, 15 Sep 2020 18:51:16 +0200 (CEST) IronPort-SDR: FIhsNfDhLIkGPE14u3wTugcneIZNAHJj7nkPxPzGCx249F7MgqgzKibQ5T40VZeZoW14Kdaj55 kTqRBrRmjuEg== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="139311015" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="139311015" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:16 -0700 IronPort-SDR: kJOrM57ePwN/rWaR+aurwRfSh7/y+2BocIe9dWy9BGTh84XzQFXclR2LnKn4n5klG2ThfZoZQ4 YXKPs1+yYeSA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709524" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:14 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:22 +0100 Message-Id: <20200915165025.543-10-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 09/12] acl: enhance AVX512 classify implementation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org 
Sender: "dev" Add search_avx512x16x2() which uses mostly 512-bit width registers/instructions and is able to process up to 32 flows in parallel. That allows further speedup of rte_acl_classify_avx512() for bursts with 32+ requests. Run-time code-path selection is done internally based on input burst size and is totally opaque to the user. Signed-off-by: Konstantin Ananyev --- This patch depends on: https://patches.dpdk.org/patch/73922/mbox/ to be applied first. .../prog_guide/packet_classif_access_ctrl.rst | 9 + doc/guides/rel_notes/release_20_11.rst | 5 + lib/librte_acl/acl_run_avx512.c | 162 ++++ lib/librte_acl/acl_run_avx512x16.h | 526 ++++++++++++++++++ lib/librte_acl/acl_run_avx512x8.h | 195 +------ 5 files changed, 709 insertions(+), 188 deletions(-) create mode 100644 lib/librte_acl/acl_run_avx512x16.h diff --git a/doc/guides/prog_guide/packet_classif_access_ctrl.rst b/doc/guides/prog_guide/packet_classif_access_ctrl.rst index daf03e6d7..f6c64fbd9 100644 --- a/doc/guides/prog_guide/packet_classif_access_ctrl.rst +++ b/doc/guides/prog_guide/packet_classif_access_ctrl.rst @@ -379,10 +379,19 @@ There are several implementations of classify algorithm: * **RTE_ACL_CLASSIFY_ALTIVEC**: vector implementation, can process up to 8 flows in parallel. Requires ALTIVEC support. +* **RTE_ACL_CLASSIFY_AVX512**: vector implementation, can process up to 32 + flows in parallel. Requires AVX512 support. + It is purely a runtime decision which method to choose, there is no build-time difference. All implementations operate over the same internal RT structures and use similar principles. The main difference is that vector implementations can manually exploit IA SIMD instructions and process several input data flows in parallel. At startup the ACL library determines the highest available classify method for the given platform and sets it as the default one. Though the user has the ability to override the default classifier function for a given ACL context or perform a particular search using a non-default classify method. In that case it is the user's responsibility to make sure that the given platform supports the selected classify implementation. +.. note:: + + Right now ``RTE_ACL_CLASSIFY_AVX512`` is not selected by default + (due to possible frequency level change), but it can be selected at + runtime by apps through the use of ACL API: ``rte_acl_set_ctx_classify``. + Application Programming Interface (API) Usage --------------------------------------------- diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index a9a1b0305..acdd12ef9 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -55,6 +55,11 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **Add new AVX512 specific classify algorithm for ACL library.** + + Added new ``RTE_ACL_CLASSIFY_AVX512`` vector implementation, + which can process up to 32 flows in parallel. Requires AVX512 support.
+ Removed Items ------------- diff --git a/lib/librte_acl/acl_run_avx512.c b/lib/librte_acl/acl_run_avx512.c index 353a3c004..60762b7d6 100644 --- a/lib/librte_acl/acl_run_avx512.c +++ b/lib/librte_acl/acl_run_avx512.c @@ -4,6 +4,11 @@ #include "acl_run_sse.h" +#define MASK16_BIT (sizeof(__mmask16) * CHAR_BIT) + +#define NUM_AVX512X16X2 (2 * MASK16_BIT) +#define MSK_AVX512X16X2 (NUM_AVX512X16X2 - 1) + /*sizeof(uint32_t) << match_log == sizeof(struct rte_acl_match_results)*/ static const uint32_t match_log = 5; @@ -31,6 +36,36 @@ acl_set_flow_avx512(struct acl_flow_avx512 *flow, const struct rte_acl_ctx *ctx, flow->matches = matches; } +/* + * Update flow and result masks based on the number of unprocessed flows. + */ +static inline uint32_t +update_flow_mask(const struct acl_flow_avx512 *flow, uint32_t *fmsk, + uint32_t *rmsk) +{ + uint32_t i, j, k, m, n; + + fmsk[0] ^= rmsk[0]; + m = rmsk[0]; + + k = __builtin_popcount(m); + n = flow->total_packets - flow->num_packets; + + if (n < k) { + /* reduce mask */ + for (i = k - n; i != 0; i--) { + j = sizeof(m) * CHAR_BIT - 1 - __builtin_clz(m); + m ^= 1 << j; + } + } else + n = k; + + rmsk[0] = m; + fmsk[0] |= rmsk[0]; + + return n; +} + /* * Resolve matches for multiple categories (LE 8, use 128b instructions/regs) */ @@ -144,13 +179,140 @@ _m512_mask_gather_epi8x8(__m512i pdata, __mmask8 mask) return v.y; } +/* + * resolve match index to actual result/priority offset. + */ +static inline __m512i +resolve_match_idx_avx512x16(__m512i mi) +{ + RTE_BUILD_BUG_ON(sizeof(struct rte_acl_match_results) != + 1 << (match_log + 2)); + return _mm512_slli_epi32(mi, match_log); +} + +/* + * Resolve multiple matches for the same flow based on priority. + */ +static inline __m512i +resolve_pri_avx512x16(const int32_t res[], const int32_t pri[], + const uint32_t match[], __mmask16 msk, uint32_t nb_trie, + uint32_t nb_skip) +{ + uint32_t i; + const uint32_t *pm; + __mmask16 m; + __m512i cp, cr, np, nr, mch; + + const __m512i zero = _mm512_set1_epi32(0); + + /* get match indexes */ + mch = _mm512_maskz_loadu_epi32(msk, match); + mch = resolve_match_idx_avx512x16(mch); + + /* read result and priority values for first trie */ + cr = _mm512_mask_i32gather_epi32(zero, msk, mch, res, sizeof(res[0])); + cp = _mm512_mask_i32gather_epi32(zero, msk, mch, pri, sizeof(pri[0])); + + /* + * read result and priority values for next tries and select one + * with highest priority.
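+ * Note that priorities are signed 32-bit values, hence the signed + * compare below.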
+ */ + for (i = 1, pm = match + nb_skip; i != nb_trie; + i++, pm += nb_skip) { + + mch = _mm512_maskz_loadu_epi32(msk, pm); + mch = resolve_match_idx_avx512x16(mch); + + nr = _mm512_mask_i32gather_epi32(zero, msk, mch, res, + sizeof(res[0])); + np = _mm512_mask_i32gather_epi32(zero, msk, mch, pri, + sizeof(pri[0])); + + m = _mm512_cmpgt_epi32_mask(cp, np); + cr = _mm512_mask_mov_epi32(nr, m, cr); + cp = _mm512_mask_mov_epi32(np, m, cp); + } + + return cr; +} + +/* + * Resolve num (<= 16) matches for single category + */ +static inline void +resolve_sc_avx512x16(uint32_t result[], const int32_t res[], + const int32_t pri[], const uint32_t match[], uint32_t nb_pkt, + uint32_t nb_trie, uint32_t nb_skip) +{ + __mmask16 msk; + __m512i cr; + + msk = (1 << nb_pkt) - 1; + cr = resolve_pri_avx512x16(res, pri, match, msk, nb_trie, nb_skip); + _mm512_mask_storeu_epi32(result, msk, cr); +} + +/* + * Resolve matches for single category + */ +static inline void +resolve_sc_avx512x16x2(uint32_t result[], + const struct rte_acl_match_results pr[], const uint32_t match[], + uint32_t nb_pkt, uint32_t nb_trie) +{ + uint32_t j, k, n; + const int32_t *res, *pri; + __m512i cr[2]; + + res = (const int32_t *)pr->results; + pri = pr->priority; + + for (k = 0; k != (nb_pkt & ~MSK_AVX512X16X2); k += NUM_AVX512X16X2) { + + j = k + MASK16_BIT; + + cr[0] = resolve_pri_avx512x16(res, pri, match + k, UINT16_MAX, + nb_trie, nb_pkt); + cr[1] = resolve_pri_avx512x16(res, pri, match + j, UINT16_MAX, + nb_trie, nb_pkt); + + _mm512_storeu_si512(result + k, cr[0]); + _mm512_storeu_si512(result + j, cr[1]); + } + + n = nb_pkt - k; + if (n != 0) { + if (n > MASK16_BIT) { + resolve_sc_avx512x16(result + k, res, pri, match + k, + MASK16_BIT, nb_trie, nb_pkt); + k += MASK16_BIT; + n -= MASK16_BIT; + } + resolve_sc_avx512x16(result + k, res, pri, match + k, n, + nb_trie, nb_pkt); + } +} #include "acl_run_avx512x8.h" +#include "acl_run_avx512x16.h" int rte_acl_classify_avx512(const struct rte_acl_ctx *ctx, const uint8_t **data, uint32_t *results, uint32_t num, uint32_t categories) { + const uint32_t max_iter = MAX_SEARCHES_AVX16 * MAX_SEARCHES_AVX16; + + /* split huge lookup (gt 256) into series of fixed size ones */ + while (num > max_iter) { + search_avx512x16x2(ctx, data, results, max_iter, categories); + data += max_iter; + results += max_iter * categories; + num -= max_iter; + } + + /* select classify method based on number of remaining requests */ + if (num >= 2 * MAX_SEARCHES_AVX16) + return search_avx512x16x2(ctx, data, results, num, categories); if (num >= MAX_SEARCHES_AVX16) return search_avx512x8x2(ctx, data, results, num, categories); if (num >= MAX_SEARCHES_SSE8) diff --git a/lib/librte_acl/acl_run_avx512x16.h b/lib/librte_acl/acl_run_avx512x16.h new file mode 100644 index 000000000..45b0b4db6 --- /dev/null +++ b/lib/librte_acl/acl_run_avx512x16.h @@ -0,0 +1,526 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +static const __rte_x86_zmm_t zmm_match_mask = { + .u32 = { + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + RTE_ACL_NODE_MATCH, + }, +}; + +static const __rte_x86_zmm_t zmm_index_mask = { + .u32 = { + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, +
RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + RTE_ACL_NODE_INDEX, + }, +}; + +static const __rte_x86_zmm_t zmm_trlo_idle = { + .u32 = { + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + RTE_ACL_IDLE_NODE, + }, +}; + +static const __rte_x86_zmm_t zmm_trhi_idle = { + .u32 = { + 0, 0, 0, 0, + 0, 0, 0, 0, + 0, 0, 0, 0, + 0, 0, 0, 0, + }, +}; + +static const __rte_x86_zmm_t zmm_shuffle_input = { + .u32 = { + 0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c, + 0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c, + 0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c, + 0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c, + }, +}; + +static const __rte_x86_zmm_t zmm_four_32 = { + .u32 = { + 4, 4, 4, 4, + 4, 4, 4, 4, + 4, 4, 4, 4, + 4, 4, 4, 4, + }, +}; + +static const __rte_x86_zmm_t zmm_idx_add = { + .u32 = { + 0, 1, 2, 3, + 4, 5, 6, 7, + 8, 9, 10, 11, + 12, 13, 14, 15, + }, +}; + +static const __rte_x86_zmm_t zmm_range_base = { + .u32 = { + 0xffffff00, 0xffffff04, 0xffffff08, 0xffffff0c, + 0xffffff00, 0xffffff04, 0xffffff08, 0xffffff0c, + 0xffffff00, 0xffffff04, 0xffffff08, 0xffffff0c, + 0xffffff00, 0xffffff04, 0xffffff08, 0xffffff0c, + }, +}; + +/* + * Calculate the address of the next transition for + * all types of nodes. Note that only DFA nodes and range + * nodes actually transition to another node. Match + * nodes are not supposed to be encountered here. + * For quad range nodes: + * Calculate number of range boundaries that are less than the + * input value. Range boundaries for each node are in signed 8 bit, + * ordered from -128 to 127. + * This is effectively a popcnt of the range boundary bytes that are + * less than the input byte. + * Single nodes are processed in the same way as quad range nodes. + */ +static __rte_always_inline __m512i +calc_addr16(__m512i index_mask, __m512i next_input, __m512i shuffle_input, + __m512i four_32, __m512i range_base, __m512i tr_lo, __m512i tr_hi) +{ + __mmask64 qm; + __mmask16 dfa_msk; + __m512i addr, in, node_type, r, t; + __m512i dfa_ofs, quad_ofs; + + t = _mm512_xor_si512(index_mask, index_mask); + in = _mm512_shuffle_epi8(next_input, shuffle_input); + + /* Calc node type and node addr */ + node_type = _mm512_andnot_si512(index_mask, tr_lo); + addr = _mm512_and_si512(index_mask, tr_lo); + + /* mask for DFA type(0) nodes */ + dfa_msk = _mm512_cmpeq_epi32_mask(node_type, t); + + /* DFA calculations. */ + r = _mm512_srli_epi32(in, 30); + r = _mm512_add_epi8(r, range_base); + t = _mm512_srli_epi32(in, 24); + r = _mm512_shuffle_epi8(tr_hi, r); + + dfa_ofs = _mm512_sub_epi32(t, r); + + /* QUAD/SINGLE calculations. */ + qm = _mm512_cmpgt_epi8_mask(in, tr_hi); + t = _mm512_maskz_set1_epi8(qm, (uint8_t)UINT8_MAX); + t = _mm512_lzcnt_epi32(t); + t = _mm512_srli_epi32(t, 3); + quad_ofs = _mm512_sub_epi32(four_32, t); + + /* blend DFA and QUAD/SINGLE. */ + t = _mm512_mask_mov_epi32(quad_ofs, dfa_msk, dfa_ofs); + + /* calculate address for next transitions. */ + addr = _mm512_add_epi32(addr, t); + return addr; +} + +/* + * Process 16 transitions in parallel. + * tr_lo contains low 32 bits for 16 transitions.
+ * tr_hi contains high 32 bits for 16 transitions. + * next_input contains up to 4 input bytes for 16 flows. + */ +static __rte_always_inline __m512i +transition16(__m512i next_input, const uint64_t *trans, __m512i *tr_lo, + __m512i *tr_hi) +{ + const int32_t *tr; + __m512i addr; + + tr = (const int32_t *)(uintptr_t)trans; + + /* Calculate the address (array index) for all 16 transitions. */ + addr = calc_addr16(zmm_index_mask.z, next_input, zmm_shuffle_input.z, + zmm_four_32.z, zmm_range_base.z, *tr_lo, *tr_hi); + + /* load lower 32 bits of 16 transitions at once. */ + *tr_lo = _mm512_i32gather_epi32(addr, tr, sizeof(trans[0])); + + next_input = _mm512_srli_epi32(next_input, CHAR_BIT); + + /* load high 32 bits of 16 transitions at once. */ + *tr_hi = _mm512_i32gather_epi32(addr, (tr + 1), sizeof(trans[0])); + + return next_input; +} + +/* + * Execute first transition for up to 16 flows in parallel. + * next_input should contain one input byte for up to 16 flows. + * msk - mask of active flows. + * tr_lo contains low 32 bits for up to 16 transitions. + * tr_hi contains high 32 bits for up to 16 transitions. + */ +static __rte_always_inline void +first_trans16(const struct acl_flow_avx512 *flow, __m512i next_input, + __mmask16 msk, __m512i *tr_lo, __m512i *tr_hi) +{ + const int32_t *tr; + __m512i addr, root; + + tr = (const int32_t *)(uintptr_t)flow->trans; + + addr = _mm512_set1_epi32(UINT8_MAX); + root = _mm512_set1_epi32(flow->root_index); + + addr = _mm512_and_si512(next_input, addr); + addr = _mm512_add_epi32(root, addr); + + /* load lower 32 bits of 16 transitions at once. */ + *tr_lo = _mm512_mask_i32gather_epi32(*tr_lo, msk, addr, tr, + sizeof(flow->trans[0])); + + /* load high 32 bits of 16 transitions at once. */ + *tr_hi = _mm512_mask_i32gather_epi32(*tr_hi, msk, addr, (tr + 1), + sizeof(flow->trans[0])); +} + +/* + * Load and return next 4 input bytes for up to 16 flows in parallel. + * pdata - 8x2 pointers to flow input data + * mask - mask of active flows. + * di - data indexes for these 16 flows.
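+ * bnum - number of input bytes to load per flow (one or four).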
+ */ +static inline __m512i +get_next_bytes_avx512x16(const struct acl_flow_avx512 *flow, __m512i pdata[2], + uint32_t msk, __m512i *di, uint32_t bnum) +{ + const int32_t *div; + __m512i one, zero, t, p[2]; + ymm_t inp[2]; + + static const __rte_x86_zmm_t zmm_pminp = { + .u32 = { + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, + 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, + }, + }; + + const __mmask16 pmidx_msk = 0x5555; + + static const __rte_x86_zmm_t zmm_pmidx[2] = { + [0] = { + .u32 = { + 0, 0, 1, 0, 2, 0, 3, 0, + 4, 0, 5, 0, 6, 0, 7, 0, + }, + }, + [1] = { + .u32 = { + 8, 0, 9, 0, 10, 0, 11, 0, + 12, 0, 13, 0, 14, 0, 15, 0, + }, + }, + }; + + div = (const int32_t *)flow->data_index; + + one = _mm512_set1_epi32(1); + zero = _mm512_xor_si512(one, one); + + /* load data offsets for given indexes */ + t = _mm512_mask_i32gather_epi32(zero, msk, *di, div, sizeof(div[0])); + + /* increment data indexes */ + *di = _mm512_mask_add_epi32(*di, msk, *di, one); + + /* + * unsigned expand 32-bit indexes to 64-bit + * (for later pointer arithmetic), i.e: + * for (i = 0; i != 16; i++) + * p[i/8].u64[i%8] = (uint64_t)t.u32[i]; + */ + p[0] = _mm512_maskz_permutexvar_epi32(pmidx_msk, zmm_pmidx[0].z, t); + p[1] = _mm512_maskz_permutexvar_epi32(pmidx_msk, zmm_pmidx[1].z, t); + + p[0] = _mm512_add_epi64(p[0], pdata[0]); + p[1] = _mm512_add_epi64(p[1], pdata[1]); + + /* load input byte(s), either one or four */ + if (bnum == sizeof(uint8_t)) { + inp[0] = _m512_mask_gather_epi8x8(p[0], (msk & UINT8_MAX)); + inp[1] = _m512_mask_gather_epi8x8(p[1], (msk >> CHAR_BIT)); + } else { + inp[0] = _mm512_mask_i64gather_epi32( + _mm512_castsi512_si256(zero), (msk & UINT8_MAX), + p[0], NULL, sizeof(uint8_t)); + inp[1] = _mm512_mask_i64gather_epi32( + _mm512_castsi512_si256(zero), (msk >> CHAR_BIT), + p[1], NULL, sizeof(uint8_t)); + } + + /* squeeze input into one 512-bit register */ + return _mm512_permutex2var_epi32(_mm512_castsi256_si512(inp[0]), + zmm_pminp.z, _mm512_castsi256_si512(inp[1])); } + +/* + * Start up to 16 new flows. + * num - number of flows to start + * msk - mask of new flows. + * pdata - pointers to flow input data + * idx - match indexes for given flows + * di - data indexes for these flows. + */ +static inline void +start_flow16(struct acl_flow_avx512 *flow, uint32_t num, uint32_t msk, + __m512i pdata[2], __m512i *idx, __m512i *di) +{ + uint32_t n, nm[2]; + __m512i ni, nd[2]; + + /* load input data pointers for new flows */ + n = __builtin_popcount(msk & UINT8_MAX); + nm[0] = (1 << n) - 1; + nm[1] = (1 << (num - n)) - 1; + + nd[0] = _mm512_maskz_loadu_epi64(nm[0], + flow->idata + flow->num_packets); + nd[1] = _mm512_maskz_loadu_epi64(nm[1], + flow->idata + flow->num_packets + n); + + /* calculate match indexes of new flows */ + ni = _mm512_set1_epi32(flow->num_packets); + ni = _mm512_add_epi32(ni, zmm_idx_add.z); + + /* merge new and existing flows data */ + pdata[0] = _mm512_mask_expand_epi64(pdata[0], (msk & UINT8_MAX), nd[0]); + pdata[1] = _mm512_mask_expand_epi64(pdata[1], (msk >> CHAR_BIT), nd[1]); + + /* update match and data indexes */ + *idx = _mm512_mask_expand_epi32(*idx, msk, ni); + *di = _mm512_maskz_mov_epi32(msk ^ UINT16_MAX, *di); + + flow->num_packets += num; +} + +/* + * Process found matches for up to 16 flows. + * fmsk - mask of active flows + * rmsk - mask of found matches + * pdata - pointers to flow input data + * di - data indexes for these flows + * idx - match indexes for given flows + * tr_lo contains low 32 bits for up to 16 transitions.
+ * tr_hi contains high 32 bits for up to 16 transitions. + */ +static inline uint32_t +match_process_avx512x16(struct acl_flow_avx512 *flow, uint32_t *fmsk, + uint32_t *rmsk, __m512i pdata[2], __m512i *di, __m512i *idx, + __m512i *tr_lo, __m512i *tr_hi) +{ + uint32_t n; + __m512i res; + + if (rmsk[0] == 0) + return 0; + + /* extract match indexes */ + res = _mm512_and_si512(tr_lo[0], zmm_index_mask.z); + + /* mask matched transitions to nop */ + tr_lo[0] = _mm512_mask_mov_epi32(tr_lo[0], rmsk[0], zmm_trlo_idle.z); + tr_hi[0] = _mm512_mask_mov_epi32(tr_hi[0], rmsk[0], zmm_trhi_idle.z); + + /* save found match indexes */ + _mm512_mask_i32scatter_epi32(flow->matches, rmsk[0], + idx[0], res, sizeof(flow->matches[0])); + + /* update masks and start new flows for matches */ + n = update_flow_mask(flow, fmsk, rmsk); + start_flow16(flow, n, rmsk[0], pdata, idx, di); + + return n; +} + +/* + * Test for matches up to 32 (2x16) flows at once, + * if matches exist - process them and start new flows. + */ +static inline void +match_check_process_avx512x16x2(struct acl_flow_avx512 *flow, uint32_t fm[2], + __m512i pdata[4], __m512i di[2], __m512i idx[2], __m512i inp[2], + __m512i tr_lo[2], __m512i tr_hi[2]) +{ + uint32_t n[2]; + uint32_t rm[2]; + + /* check for matches */ + rm[0] = _mm512_test_epi32_mask(tr_lo[0], zmm_match_mask.z); + rm[1] = _mm512_test_epi32_mask(tr_lo[1], zmm_match_mask.z); + + /* till unprocessed matches exist */ + while ((rm[0] | rm[1]) != 0) { + + /* process matches and start new flows */ + n[0] = match_process_avx512x16(flow, &fm[0], &rm[0], &pdata[0], + &di[0], &idx[0], &tr_lo[0], &tr_hi[0]); + n[1] = match_process_avx512x16(flow, &fm[1], &rm[1], &pdata[2], + &di[1], &idx[1], &tr_lo[1], &tr_hi[1]); + + /* execute first transition for new flows, if any */ + + if (n[0] != 0) { + inp[0] = get_next_bytes_avx512x16(flow, &pdata[0], + rm[0], &di[0], sizeof(uint8_t)); + first_trans16(flow, inp[0], rm[0], &tr_lo[0], + &tr_hi[0]); + rm[0] = _mm512_test_epi32_mask(tr_lo[0], + zmm_match_mask.z); + } + + if (n[1] != 0) { + inp[1] = get_next_bytes_avx512x16(flow, &pdata[2], + rm[1], &di[1], sizeof(uint8_t)); + first_trans16(flow, inp[1], rm[1], &tr_lo[1], + &tr_hi[1]); + rm[1] = _mm512_test_epi32_mask(tr_lo[1], + zmm_match_mask.z); + } + } +} + +/* + * Perform search for up to 32 flows in parallel. + * Use two sets of metadata, each serves 16 flows max. + * So in fact we perform search for 2x16 flows.
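+ * Flows that reach a match are retired immediately and their slots are + * refilled with pending packets, so both lanes stay as full as possible.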
+ */ +static inline void +search_trie_avx512x16x2(struct acl_flow_avx512 *flow) +{ + uint32_t fm[2]; + __m512i di[2], idx[2], in[2], pdata[4], tr_lo[2], tr_hi[2]; + + /* first 1B load */ + start_flow16(flow, MASK16_BIT, UINT16_MAX, &pdata[0], &idx[0], &di[0]); + start_flow16(flow, MASK16_BIT, UINT16_MAX, &pdata[2], &idx[1], &di[1]); + + in[0] = get_next_bytes_avx512x16(flow, &pdata[0], UINT16_MAX, &di[0], + sizeof(uint8_t)); + in[1] = get_next_bytes_avx512x16(flow, &pdata[2], UINT16_MAX, &di[1], + sizeof(uint8_t)); + + first_trans16(flow, in[0], UINT16_MAX, &tr_lo[0], &tr_hi[0]); + first_trans16(flow, in[1], UINT16_MAX, &tr_lo[1], &tr_hi[1]); + + fm[0] = UINT16_MAX; + fm[1] = UINT16_MAX; + + /* match check */ + match_check_process_avx512x16x2(flow, fm, pdata, di, idx, in, + tr_lo, tr_hi); + + while ((fm[0] | fm[1]) != 0) { + + /* load next 4B */ + + in[0] = get_next_bytes_avx512x16(flow, &pdata[0], fm[0], + &di[0], sizeof(uint32_t)); + in[1] = get_next_bytes_avx512x16(flow, &pdata[2], fm[1], + &di[1], sizeof(uint32_t)); + + /* main 4B loop */ + + in[0] = transition16(in[0], flow->trans, &tr_lo[0], &tr_hi[0]); + in[1] = transition16(in[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + in[0] = transition16(in[0], flow->trans, &tr_lo[0], &tr_hi[0]); + in[1] = transition16(in[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + in[0] = transition16(in[0], flow->trans, &tr_lo[0], &tr_hi[0]); + in[1] = transition16(in[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + in[0] = transition16(in[0], flow->trans, &tr_lo[0], &tr_hi[0]); + in[1] = transition16(in[1], flow->trans, &tr_lo[1], &tr_hi[1]); + + /* check for matches */ + match_check_process_avx512x16x2(flow, fm, pdata, di, idx, in, + tr_lo, tr_hi); + } +} + +static inline int +search_avx512x16x2(const struct rte_acl_ctx *ctx, const uint8_t **data, + uint32_t *results, uint32_t total_packets, uint32_t categories) +{ + uint32_t i, *pm; + const struct rte_acl_match_results *pr; + struct acl_flow_avx512 flow; + uint32_t match[ctx->num_tries * total_packets]; + + for (i = 0, pm = match; i != ctx->num_tries; i++, pm += total_packets) { + + /* setup for next trie */ + acl_set_flow_avx512(&flow, ctx, i, data, pm, total_packets); + + /* process the trie */ + search_trie_avx512x16x2(&flow); + } + + /* resolve matches */ + pr = (const struct rte_acl_match_results *) + (ctx->trans_table + ctx->match_index); + + if (categories == 1) + resolve_sc_avx512x16x2(results, pr, match, total_packets, + ctx->num_tries); + else if (categories <= RTE_ACL_MAX_CATEGORIES / 2) + resolve_mcle8_avx512x1(results, pr, match, total_packets, + categories, ctx->num_tries); + else + resolve_mcgt8_avx512x1(results, pr, match, total_packets, + categories, ctx->num_tries); + + return 0; +} diff --git a/lib/librte_acl/acl_run_avx512x8.h b/lib/librte_acl/acl_run_avx512x8.h index 66fc26b26..82171e8e0 100644 --- a/lib/librte_acl/acl_run_avx512x8.h +++ b/lib/librte_acl/acl_run_avx512x8.h @@ -260,36 +260,6 @@ start_flow8(struct acl_flow_avx512 *flow, uint32_t num, uint32_t msk, flow->num_packets += num; } -/* - * Update flow and result masks based on the number of unprocessed flows. 
- */ -static inline uint32_t -update_flow_mask8(const struct acl_flow_avx512 *flow, __mmask8 *fmsk, - __mmask8 *rmsk) -{ - uint32_t i, j, k, m, n; - - fmsk[0] ^= rmsk[0]; - m = rmsk[0]; - - k = __builtin_popcount(m); - n = flow->total_packets - flow->num_packets; - - if (n < k) { - /* reduce mask */ - for (i = k - n; i != 0; i--) { - j = sizeof(m) * CHAR_BIT - 1 - __builtin_clz(m); - m ^= 1 << j; - } - } else - n = k; - - rmsk[0] = m; - fmsk[0] |= rmsk[0]; - - return n; -} - /* * Process found matches for up to 8 flows. * fmsk - mask of active flows @@ -301,8 +271,8 @@ update_flow_mask8(const struct acl_flow_avx512 *flow, __mmask8 *fmsk, * tr_hi contains high 32 bits for up to 8 transitions. */ static inline uint32_t -match_process_avx512x8(struct acl_flow_avx512 *flow, __mmask8 *fmsk, - __mmask8 *rmsk, __m512i *pdata, ymm_t *di, ymm_t *idx, +match_process_avx512x8(struct acl_flow_avx512 *flow, uint32_t *fmsk, + uint32_t *rmsk, __m512i *pdata, ymm_t *di, ymm_t *idx, ymm_t *tr_lo, ymm_t *tr_hi) { uint32_t n; @@ -323,7 +293,7 @@ match_process_avx512x8(struct acl_flow_avx512 *flow, __mmask8 *fmsk, idx[0], res, sizeof(flow->matches[0])); /* update masks and start new flows for matches */ - n = update_flow_mask8(flow, fmsk, rmsk); + n = update_flow_mask(flow, fmsk, rmsk); start_flow8(flow, n, rmsk[0], pdata, idx, di); return n; @@ -331,12 +301,12 @@ match_process_avx512x8(struct acl_flow_avx512 *flow, __mmask8 *fmsk, static inline void -match_check_process_avx512x8x2(struct acl_flow_avx512 *flow, __mmask8 fm[2], +match_check_process_avx512x8x2(struct acl_flow_avx512 *flow, uint32_t fm[2], __m512i pdata[2], ymm_t di[2], ymm_t idx[2], ymm_t inp[2], ymm_t tr_lo[2], ymm_t tr_hi[2]) { uint32_t n[2]; - __mmask8 rm[2]; + uint32_t rm[2]; /* check for matches */ rm[0] = _mm256_test_epi32_mask(tr_lo[0], ymm_match_mask.y); @@ -381,7 +351,7 @@ match_check_process_avx512x8x2(struct acl_flow_avx512 *flow, __mmask8 fm[2], static inline void search_trie_avx512x8x2(struct acl_flow_avx512 *flow) { - __mmask8 fm[2]; + uint32_t fm[2]; __m512i pdata[2]; ymm_t di[2], idx[2], inp[2], tr_lo[2], tr_hi[2]; @@ -433,157 +403,6 @@ search_trie_avx512x8x2(struct acl_flow_avx512 *flow) } } -/* - * resolve match index to actual result/priority offset. - */ -static inline ymm_t -resolve_match_idx_avx512x8(ymm_t mi) -{ - RTE_BUILD_BUG_ON(sizeof(struct rte_acl_match_results) != - 1 << (match_log + 2)); - return _mm256_slli_epi32(mi, match_log); -} - - -/* - * Resolve multiple matches for the same flow based on priority. 
- */ -static inline ymm_t -resolve_pri_avx512x8(const int32_t res[], const int32_t pri[], - const uint32_t match[], __mmask8 msk, uint32_t nb_trie, - uint32_t nb_skip) -{ - uint32_t i; - const uint32_t *pm; - __mmask8 m; - ymm_t cp, cr, np, nr, mch; - - const ymm_t zero = _mm256_set1_epi32(0); - - mch = _mm256_maskz_loadu_epi32(msk, match); - mch = resolve_match_idx_avx512x8(mch); - - cr = _mm256_mmask_i32gather_epi32(zero, msk, mch, res, sizeof(res[0])); - cp = _mm256_mmask_i32gather_epi32(zero, msk, mch, pri, sizeof(pri[0])); - - for (i = 1, pm = match + nb_skip; i != nb_trie; - i++, pm += nb_skip) { - - mch = _mm256_maskz_loadu_epi32(msk, pm); - mch = resolve_match_idx_avx512x8(mch); - - nr = _mm256_mmask_i32gather_epi32(zero, msk, mch, res, - sizeof(res[0])); - np = _mm256_mmask_i32gather_epi32(zero, msk, mch, pri, - sizeof(pri[0])); - - m = _mm256_cmpgt_epi32_mask(cp, np); - cr = _mm256_mask_mov_epi32(nr, m, cr); - cp = _mm256_mask_mov_epi32(np, m, cp); - } - - return cr; -} - -/* - * Resolve num (<= 8) matches for single category - */ -static inline void -resolve_sc_avx512x8(uint32_t result[], const int32_t res[], const int32_t pri[], - const uint32_t match[], uint32_t nb_pkt, uint32_t nb_trie, - uint32_t nb_skip) -{ - __mmask8 msk; - ymm_t cr; - - msk = (1 << nb_pkt) - 1; - cr = resolve_pri_avx512x8(res, pri, match, msk, nb_trie, nb_skip); - _mm256_mask_storeu_epi32(result, msk, cr); -} - -/* - * Resolve matches for single category - */ -static inline void -resolve_sc_avx512x8x2(uint32_t result[], - const struct rte_acl_match_results pr[], const uint32_t match[], - uint32_t nb_pkt, uint32_t nb_trie) -{ - uint32_t i, j, k, n; - const uint32_t *pm; - const int32_t *res, *pri; - __mmask8 m[2]; - ymm_t cp[2], cr[2], np[2], nr[2], mch[2]; - - res = (const int32_t *)pr->results; - pri = pr->priority; - - for (k = 0; k != (nb_pkt & ~MSK_AVX512X8X2); k += NUM_AVX512X8X2) { - - j = k + CHAR_BIT; - - /* load match indexes for first trie */ - mch[0] = _mm256_loadu_si256((const ymm_t *)(match + k)); - mch[1] = _mm256_loadu_si256((const ymm_t *)(match + j)); - - mch[0] = resolve_match_idx_avx512x8(mch[0]); - mch[1] = resolve_match_idx_avx512x8(mch[1]); - - /* load matches and their priorities for first trie */ - - cr[0] = _mm256_i32gather_epi32(res, mch[0], sizeof(res[0])); - cr[1] = _mm256_i32gather_epi32(res, mch[1], sizeof(res[0])); - - cp[0] = _mm256_i32gather_epi32(pri, mch[0], sizeof(pri[0])); - cp[1] = _mm256_i32gather_epi32(pri, mch[1], sizeof(pri[0])); - - /* select match with highest priority */ - for (i = 1, pm = match + nb_pkt; i != nb_trie; - i++, pm += nb_pkt) { - - mch[0] = _mm256_loadu_si256((const ymm_t *)(pm + k)); - mch[1] = _mm256_loadu_si256((const ymm_t *)(pm + j)); - - mch[0] = resolve_match_idx_avx512x8(mch[0]); - mch[1] = resolve_match_idx_avx512x8(mch[1]); - - nr[0] = _mm256_i32gather_epi32(res, mch[0], - sizeof(res[0])); - nr[1] = _mm256_i32gather_epi32(res, mch[1], - sizeof(res[0])); - - np[0] = _mm256_i32gather_epi32(pri, mch[0], - sizeof(pri[0])); - np[1] = _mm256_i32gather_epi32(pri, mch[1], - sizeof(pri[0])); - - m[0] = _mm256_cmpgt_epi32_mask(cp[0], np[0]); - m[1] = _mm256_cmpgt_epi32_mask(cp[1], np[1]); - - cr[0] = _mm256_mask_mov_epi32(nr[0], m[0], cr[0]); - cr[1] = _mm256_mask_mov_epi32(nr[1], m[1], cr[1]); - - cp[0] = _mm256_mask_mov_epi32(np[0], m[0], cp[0]); - cp[1] = _mm256_mask_mov_epi32(np[1], m[1], cp[1]); - } - - _mm256_storeu_si256((ymm_t *)(result + k), cr[0]); - _mm256_storeu_si256((ymm_t *)(result + j), cr[1]); - } - - n = nb_pkt - k; - if (n != 0) 
{ - if (n > CHAR_BIT) { - resolve_sc_avx512x8(result + k, res, pri, match + k, - CHAR_BIT, nb_trie, nb_pkt); - k += CHAR_BIT; - n -= CHAR_BIT; - } - resolve_sc_avx512x8(result + k, res, pri, match + k, n, - nb_trie, nb_pkt); - } -} - static inline int search_avx512x8x2(const struct rte_acl_ctx *ctx, const uint8_t **data, uint32_t *results, uint32_t total_packets, uint32_t categories) @@ -607,7 +426,7 @@ search_avx512x8x2(const struct rte_acl_ctx *ctx, const uint8_t **data, (ctx->trans_table + ctx->match_index); if (categories == 1) - resolve_sc_avx512x8x2(results, pr, match, total_packets, + resolve_sc_avx512x16x2(results, pr, match, total_packets, ctx->num_tries); else if (categories <= RTE_ACL_MAX_CATEGORIES / 2) resolve_mcle8_avx512x1(results, pr, match, total_packets, From patchwork Tue Sep 15 16:50:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77772 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2BB4BA04C7; Tue, 15 Sep 2020 18:52:41 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 9688A1C19B; Tue, 15 Sep 2020 18:51:28 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 5FF011C113 for ; Tue, 15 Sep 2020 18:51:26 +0200 (CEST) IronPort-SDR: V6H9bi0DQdKHaq3Q9zV5yeIymwrZNUQdKzhOz1JM0AyqK7s+WmTixSH8+EmcJdrdAOAID5uBCS nsDdmQMDJx7Q== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="146995883" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="146995883" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:25 -0700 IronPort-SDR: z35yseePC/ucUxx0INURAX68ryVja9T8P3WHvr/ARFrLE12MLu23ToUmTPUt3HgHK1v77R9zwP vY1j2bZSwVuA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709566" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:23 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:23 +0100 Message-Id: <20200915165025.543-11-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 10/12] acl: for AVX512 classify use 4B load whenever possible X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" With the current ACL implementation the first field in the rule definition always has to be one byte long. Though for optimising the classify implementation it might be useful to be able to use 4B reads (as we do for the rest of the fields). So at the build phase, check the user-provided field definitions to determine whether it is safe to do 4B loads for the first ACL field. Then at run-time this information can be used to choose the classify behavior.
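As an illustration (a hypothetical layout, not part of this patch): with a three-field rule definition where the first (1B) field sits at offset 0 and further fields follow it at larger offsets, the build phase would select 4B loads for the first field:

	struct rte_acl_field_def defs[3] = {
		/* first field: 1B protocol value at offset 0 */
		{ .type = RTE_ACL_FIELD_TYPE_BITMASK, .size = sizeof(uint8_t),
			.field_index = 0, .input_index = 0, .offset = 0, },
		/* 4B source address at offset 4 */
		{ .type = RTE_ACL_FIELD_TYPE_MASK, .size = sizeof(uint32_t),
			.field_index = 1, .input_index = 1, .offset = 4, },
		/* 4B destination address at offset 8 */
		{ .type = RTE_ACL_FIELD_TYPE_MASK, .size = sizeof(uint32_t),
			.field_index = 2, .input_index = 2, .offset = 8, },
	};
	/* field 0 offset (0) < max offset (8) -> first_load_sz == sizeof(uint32_t) */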
Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl.h | 1 + lib/librte_acl/acl_bld.c | 34 ++++++++++++++++++++++++++++++ lib/librte_acl/acl_run_avx512.c | 7 ++++++ lib/librte_acl/acl_run_avx512x16.h | 8 +++---- lib/librte_acl/acl_run_avx512x8.h | 8 +++---- lib/librte_acl/rte_acl.c | 1 + 6 files changed, 51 insertions(+), 8 deletions(-) diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h index 3f0719f33..493dec2a2 100644 --- a/lib/librte_acl/acl.h +++ b/lib/librte_acl/acl.h @@ -169,6 +169,7 @@ struct rte_acl_ctx { int32_t socket_id; /** Socket ID to allocate memory from. */ enum rte_acl_classify_alg alg; + uint32_t first_load_sz; void *rules; uint32_t max_rules; uint32_t rule_sz; diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index d1f920b09..da10864cd 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -1581,6 +1581,37 @@ acl_check_bld_param(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg) return 0; } +/* + * With the current ACL implementation the first field in the rule + * definition always has to be one byte long. Though for optimising the + * *classify* implementation it might be useful to be able to use 4B reads + * (as we do for the rest of the fields). + * This function checks the input config to determine whether it is safe + * to do 4B loads for the first ACL field. For that we need to make sure + * that the first field in our rule definition doesn't have the biggest + * offset, i.e. we still have other fields located after the first one. + * Conversely, if the first field has the largest offset, it can occupy + * the very last byte in the input data buffer, and we have to do a + * single byte load for it. + */ +static uint32_t +get_first_load_size(const struct rte_acl_config *cfg) +{ + uint32_t i, max_ofs, ofs; + + ofs = 0; + max_ofs = 0; + + for (i = 0; i != cfg->num_fields; i++) { + if (cfg->defs[i].field_index == 0) + ofs = cfg->defs[i].offset; + else if (max_ofs < cfg->defs[i].offset) + max_ofs = cfg->defs[i].offset; + } + + return (ofs < max_ofs) ? sizeof(uint32_t) : sizeof(uint8_t); +} + int rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg) { @@ -1618,6 +1649,9 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg) /* set data indexes. */ acl_set_data_indexes(ctx); + /* determine whether we can always do a 4B load */ + ctx->first_load_sz = get_first_load_size(cfg); + /* copy in build config. */ ctx->config = *cfg; } diff --git a/lib/librte_acl/acl_run_avx512.c b/lib/librte_acl/acl_run_avx512.c index 60762b7d6..51bfa6a3b 100644 --- a/lib/librte_acl/acl_run_avx512.c +++ b/lib/librte_acl/acl_run_avx512.c @@ -16,6 +16,7 @@ struct acl_flow_avx512 { uint32_t num_packets; /* number of packets processed */ uint32_t total_packets; /* max number of packets to process */ uint32_t root_index; /* current root index */ + uint32_t first_load_sz; /* first load size for new packet */ const uint64_t *trans; /* transition table */ const uint32_t *data_index; /* input data indexes */ const uint8_t **idata; /* input data */ @@ -29,6 +30,7 @@ acl_set_flow_avx512(struct acl_flow_avx512 *flow, const struct rte_acl_ctx *ctx, { flow->num_packets = 0; flow->total_packets = total_packets; + flow->first_load_sz = ctx->first_load_sz; flow->root_index = ctx->trie[trie].root_index; flow->trans = ctx->trans_table; flow->data_index = ctx->trie[trie].data_index; @@ -155,6 +157,11 @@ resolve_mcgt8_avx512x1(uint32_t result[], } } +/* + * Unfortunately the current AVX512 ISA doesn't provide the ability to + * gather load on a byte quantity.
So we have to mimic it in SW, + * by doing 8x1B scalar loads. + */ static inline ymm_t _m512_mask_gather_epi8x8(__m512i pdata, __mmask8 mask) { diff --git a/lib/librte_acl/acl_run_avx512x16.h b/lib/librte_acl/acl_run_avx512x16.h index 45b0b4db6..df5f6135f 100644 --- a/lib/librte_acl/acl_run_avx512x16.h +++ b/lib/librte_acl/acl_run_avx512x16.h @@ -413,7 +413,7 @@ match_check_process_avx512x16x2(struct acl_flow_avx512 *flow, uint32_t fm[2], if (n[0] != 0) { inp[0] = get_next_bytes_avx512x16(flow, &pdata[0], - rm[0], &di[0], sizeof(uint8_t)); + rm[0], &di[0], flow->first_load_sz); first_trans16(flow, inp[0], rm[0], &tr_lo[0], &tr_hi[0]); rm[0] = _mm512_test_epi32_mask(tr_lo[0], @@ -422,7 +422,7 @@ match_check_process_avx512x16x2(struct acl_flow_avx512 *flow, uint32_t fm[2], if (n[1] != 0) { inp[1] = get_next_bytes_avx512x16(flow, &pdata[2], - rm[1], &di[1], sizeof(uint8_t)); + rm[1], &di[1], flow->first_load_sz); first_trans16(flow, inp[1], rm[1], &tr_lo[1], &tr_hi[1]); rm[1] = _mm512_test_epi32_mask(tr_lo[1], @@ -447,9 +447,9 @@ search_trie_avx512x16x2(struct acl_flow_avx512 *flow) start_flow16(flow, MASK16_BIT, UINT16_MAX, &pdata[2], &idx[1], &di[1]); in[0] = get_next_bytes_avx512x16(flow, &pdata[0], UINT16_MAX, &di[0], - sizeof(uint8_t)); + flow->first_load_sz); in[1] = get_next_bytes_avx512x16(flow, &pdata[2], UINT16_MAX, &di[1], - sizeof(uint8_t)); + flow->first_load_sz); first_trans16(flow, in[0], UINT16_MAX, &tr_lo[0], &tr_hi[0]); first_trans16(flow, in[1], UINT16_MAX, &tr_lo[1], &tr_hi[1]); diff --git a/lib/librte_acl/acl_run_avx512x8.h b/lib/librte_acl/acl_run_avx512x8.h index 82171e8e0..777451973 100644 --- a/lib/librte_acl/acl_run_avx512x8.h +++ b/lib/librte_acl/acl_run_avx512x8.h @@ -325,7 +325,7 @@ match_check_process_avx512x8x2(struct acl_flow_avx512 *flow, uint32_t fm[2], if (n[0] != 0) { inp[0] = get_next_bytes_avx512x8(flow, pdata[0], rm[0], - &di[0], sizeof(uint8_t)); + &di[0], flow->first_load_sz); first_trans8(flow, inp[0], rm[0], &tr_lo[0], &tr_hi[0]); rm[0] = _mm256_test_epi32_mask(tr_lo[0], @@ -334,7 +334,7 @@ match_check_process_avx512x8x2(struct acl_flow_avx512 *flow, uint32_t fm[2], if (n[1] != 0) { inp[1] = get_next_bytes_avx512x8(flow, pdata[1], rm[1], - &di[1], sizeof(uint8_t)); + &di[1], flow->first_load_sz); first_trans8(flow, inp[1], rm[1], &tr_lo[1], &tr_hi[1]); rm[1] = _mm256_test_epi32_mask(tr_lo[1], @@ -360,9 +360,9 @@ search_trie_avx512x8x2(struct acl_flow_avx512 *flow) start_flow8(flow, CHAR_BIT, UINT8_MAX, &pdata[1], &idx[1], &di[1]); inp[0] = get_next_bytes_avx512x8(flow, pdata[0], UINT8_MAX, &di[0], - sizeof(uint8_t)); + flow->first_load_sz); inp[1] = get_next_bytes_avx512x8(flow, pdata[1], UINT8_MAX, &di[1], - sizeof(uint8_t)); + flow->first_load_sz); first_trans8(flow, inp[0], UINT8_MAX, &tr_lo[0], &tr_hi[0]); first_trans8(flow, inp[1], UINT8_MAX, &tr_lo[1], &tr_hi[1]); diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c index fdcb7a798..9f16d28ea 100644 --- a/lib/librte_acl/rte_acl.c +++ b/lib/librte_acl/rte_acl.c @@ -486,6 +486,7 @@ rte_acl_dump(const struct rte_acl_ctx *ctx) printf("acl context <%s>@%p\n", ctx->name, ctx); printf(" socket_id=%"PRId32"\n", ctx->socket_id); printf(" alg=%"PRId32"\n", ctx->alg); + printf(" first_load_sz=%"PRIu32"\n", ctx->first_load_sz); printf(" max_rules=%"PRIu32"\n", ctx->max_rules); printf(" rule_size=%"PRIu32"\n", ctx->rule_sz); printf(" num_rules=%"PRIu32"\n", ctx->num_rules); From patchwork Tue Sep 15 16:50:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77773 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9A5F4A04C7; Tue, 15 Sep 2020 18:52:49 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 696901C1BF; Tue, 15 Sep 2020 18:51:30 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 318231C12B for ; Tue, 15 Sep 2020 18:51:28 +0200 (CEST) IronPort-SDR: fFdrmwHMY8yfe4VQDz2NWPYO0OwwCtBYrVFx46QqEAus6qT822O8fGwB5QQ9mNKIxsAoB1mtf6 ePDJgdcU6qmw== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="146995909" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="146995909" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:27 -0700 IronPort-SDR: DpPgxqruy5VS7jxRXZ5zIWYmCZHfIXfPh2bc05ocOzVgO+yztWJSMlAXIcjWts4rTssGI36YcG nuaWPfMUO+eQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709574" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:26 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:24 +0100 Message-Id: <20200915165025.543-12-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 11/12] test/acl: add AVX512 classify support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add AVX512 classify to the test coverage. Signed-off-by: Konstantin Ananyev --- app/test/test_acl.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/app/test/test_acl.c b/app/test/test_acl.c index 333b34757..11d69d2d5 100644 --- a/app/test/test_acl.c +++ b/app/test/test_acl.c @@ -278,8 +278,8 @@ test_classify_alg(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], /* set given classify alg, skip test if alg is not supported */ ret = rte_acl_set_ctx_classify(acx, alg); - if (ret == -ENOTSUP) - return 0; + if (ret != 0) + return (ret == -ENOTSUP) ? 
0 : ret; /** * these will run quite a few times, it's necessary to test code paths @@ -341,6 +341,7 @@ test_classify_run(struct rte_acl_ctx *acx, struct ipv4_7tuple test_data[], RTE_ACL_CLASSIFY_AVX2, RTE_ACL_CLASSIFY_NEON, RTE_ACL_CLASSIFY_ALTIVEC, + RTE_ACL_CLASSIFY_AVX512, }; /* swap all bytes in the data to network order */ From patchwork Tue Sep 15 16:50:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Ananyev, Konstantin" X-Patchwork-Id: 77774 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 7ADE7A04C7; Tue, 15 Sep 2020 18:53:00 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id ED6C81C12B; Tue, 15 Sep 2020 18:51:33 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 50A931C12F for ; Tue, 15 Sep 2020 18:51:30 +0200 (CEST) IronPort-SDR: kouCZ6erkYJSwTznh7t+j3keTKd2WeazuYqsevkQDYZ49Q3YKG3icneJ8Fnx2VD2rvNau8LdfJ K5n/U94qXfsw== X-IronPort-AV: E=McAfee;i="6000,8403,9745"; a="146995917" X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="146995917" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2020 09:51:30 -0700 IronPort-SDR: 8PC+mO32cgeqslgHI3Vaoni65VYG5ID9yaDpRrJK5tN7wcYtPhUeE+ObHGhsEa/8VqYyY2rSiV cakhgwOjSw9w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,430,1592895600"; d="scan'208";a="306709582" Received: from sivswdev08.ir.intel.com ([10.237.217.47]) by orsmga006.jf.intel.com with ESMTP; 15 Sep 2020 09:51:28 -0700 From: Konstantin Ananyev To: dev@dpdk.org Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev Date: Tue, 15 Sep 2020 17:50:25 +0100 Message-Id: <20200915165025.543-13-konstantin.ananyev@intel.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20200915165025.543-1-konstantin.ananyev@intel.com> References: <20200807162829.11690-1-konstantin.ananyev@intel.com> <20200915165025.543-1-konstantin.ananyev@intel.com> Subject: [dpdk-dev] [PATCH v2 12/12] app/acl: add AVX512 classify support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add ability to use AVX512 classify method. Signed-off-by: Konstantin Ananyev --- app/test-acl/main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/app/test-acl/main.c b/app/test-acl/main.c index d9b65517c..19b714335 100644 --- a/app/test-acl/main.c +++ b/app/test-acl/main.c @@ -81,6 +81,10 @@ static const struct acl_alg acl_alg[] = { .name = "altivec", .alg = RTE_ACL_CLASSIFY_ALTIVEC, }, + { + .name = "avx512", + .alg = RTE_ACL_CLASSIFY_AVX512, + }, }; static struct {