From patchwork Tue Apr 9 19:06:21 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivian Kong
X-Patchwork-Id: 52498
X-Patchwork-Delegate: thomas@monjalon.net
From: Vivian Kong
To: dev@dpdk.org
Date: Tue, 9 Apr 2019 15:06:21 -0400
Message-Id: <20190409190630.31975-4-vivkong@ca.ibm.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190409190630.31975-1-vivkong@ca.ibm.com>
References: <20190409190630.31975-1-vivkong@ca.ibm.com>
Subject: [dpdk-dev] [RFC 03/12] acl: add support for s390x architecture
Reply-To: vivkong@ca.ibm.com
List-Id: DPDK patches and discussions

Add big endian support for s390x architecture.

Signed-off-by: Vivian Kong
---
 app/test-acl/main.c             |  4 ++
 lib/librte_acl/Makefile         |  2 +
 lib/librte_acl/acl_bld.c        | 69 +++++++++++++++++++++++++++------
 lib/librte_acl/acl_gen.c        |  9 +++++
 lib/librte_acl/acl_run_scalar.c |  8 ++++
 lib/librte_acl/rte_acl.c        |  4 ++
 lib/librte_acl/rte_acl.h        |  1 +
 7 files changed, 85 insertions(+), 12 deletions(-)

diff --git a/app/test-acl/main.c b/app/test-acl/main.c
index b80179417..b6c5c9abd 100644
--- a/app/test-acl/main.c
+++ b/app/test-acl/main.c
@@ -81,6 +81,10 @@ static const struct acl_alg acl_alg[] = {
 		.name = "altivec",
 		.alg = RTE_ACL_CLASSIFY_ALTIVEC,
 	},
+	{
+		.name = "s390x",
+		.alg = RTE_ACL_CLASSIFY_S390X,
+	},
 };
 
 static struct {
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index ea5edf00a..f87693d1e 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -30,6 +30,8 @@ CFLAGS_acl_run_neon.o += -Wno-maybe-uninitialized
 endif
 else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_altivec.c
+else ifeq ($(CONFIG_RTE_ARCH_S390X),y)
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) +=
 else
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c
 endif
diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c
index b82191f42..16bf09304 100644
--- a/lib/librte_acl/acl_bld.c
+++ b/lib/librte_acl/acl_bld.c
@@ -777,6 +777,16 @@ acl_build_reset(struct rte_acl_ctx *ctx)
 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
 }
 
+static uint32_t get_le_byte_index(uint32_t index, int size)
+{
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	(void) size;
+	return index;
+#else
+	return size - 1 - index;
+#endif
+}
+
 static void
 acl_gen_range(struct acl_build_context *context,
 	const uint8_t *hi, const uint8_t *lo, int size, int level,
@@ -786,12 +796,19 @@ acl_gen_range(struct acl_build_context *context,
 	uint32_t n;
 
 	prev = root;
+
+	/* On big endian min and max point to highest byte.
+	 * Therefore iterate in opposite direction as on
+	 * little endian with helper function.
+	 */
 	for (n = size - 1; n > 0; n--) {
+		uint32_t le_idx = get_le_byte_index(n, size);
 		node = acl_alloc_node(context, level++);
-		acl_add_ptr_range(context, prev, node, lo[n], hi[n]);
+		acl_add_ptr_range(context, prev, node, lo[le_idx], hi[le_idx]);
 		prev = node;
 	}
-	acl_add_ptr_range(context, prev, end, lo[0], hi[0]);
+	const uint32_t first_idx = get_le_byte_index(0, size);
+	acl_add_ptr_range(context, prev, end, lo[first_idx], hi[first_idx]);
 }
 
 static struct rte_acl_node *
@@ -804,10 +821,16 @@ acl_gen_range_trie(struct acl_build_context *context,
 	const uint8_t *lo = min;
 	const uint8_t *hi = max;
 
+	/* On big endian min and max point to highest byte.
+	 * Therefore iterate in opposite direction as on
+	 * little endian.
+	 */
+	const int byte_index = get_le_byte_index(size-1, size);
+
 	*pend = acl_alloc_node(context, level+size);
 	root = acl_alloc_node(context, level++);
 
-	if (lo[size - 1] == hi[size - 1]) {
+	if (lo[byte_index] == hi[byte_index]) {
 		acl_gen_range(context, hi, lo, size, level, root, *pend);
 	} else {
 		uint8_t limit_lo[64];
@@ -819,27 +842,29 @@ acl_gen_range_trie(struct acl_build_context *context,
 		memset(limit_hi, UINT8_MAX, RTE_DIM(limit_hi));
 
 		for (n = size - 2; n >= 0; n--) {
-			hi_ff = (uint8_t)(hi_ff & hi[n]);
-			lo_00 = (uint8_t)(lo_00 | lo[n]);
+			const uint32_t le_idx = get_le_byte_index(n, size);
+			hi_ff = (uint8_t)(hi_ff & hi[le_idx]);
+			lo_00 = (uint8_t)(lo_00 | lo[le_idx]);
 		}
 
 		if (hi_ff != UINT8_MAX) {
-			limit_lo[size - 1] = hi[size - 1];
+			limit_lo[byte_index] = hi[byte_index];
 			acl_gen_range(context, hi, limit_lo, size, level,
 				root, *pend);
 		}
 
 		if (lo_00 != 0) {
-			limit_hi[size - 1] = lo[size - 1];
+			limit_hi[byte_index] = lo[byte_index];
 			acl_gen_range(context, limit_hi, lo, size, level,
 				root, *pend);
 		}
 
-		if (hi[size - 1] - lo[size - 1] > 1 ||
+		if (hi[byte_index] - lo[byte_index] > 1 ||
 				lo_00 == 0 ||
 				hi_ff == UINT8_MAX) {
-			limit_lo[size-1] = (uint8_t)(lo[size-1] + (lo_00 != 0));
-			limit_hi[size-1] = (uint8_t)(hi[size-1] -
+			limit_lo[byte_index] = (uint8_t)(lo[byte_index] +
+				(lo_00 != 0));
+			limit_hi[byte_index] = (uint8_t)(hi[byte_index] -
 				(hi_ff != UINT8_MAX));
 			acl_gen_range(context, limit_hi, limit_lo, size,
 				level, root, *pend);
@@ -863,13 +888,17 @@ acl_gen_mask_trie(struct acl_build_context *context,
 	root = acl_alloc_node(context, level++);
 	prev = root;
 
+	/* On big endian val and msk point to highest byte.
+	 * Therefore iterate in opposite direction as on
+	 * little endian with helper function.
+	 */
 	for (n = size - 1; n >= 0; n--) {
+		uint32_t le_idx = get_le_byte_index(n, size);
 		node = acl_alloc_node(context, level++);
-		acl_gen_mask(&bits, val[n] & msk[n], msk[n]);
+		acl_gen_mask(&bits, val[le_idx] & msk[le_idx], msk[le_idx]);
 		acl_add_ptr(context, prev, node, &bits);
 		prev = node;
 	}
-
 	*pend = prev;
 	return root;
 }
@@ -927,6 +956,14 @@ build_trie(struct acl_build_context *context, struct rte_acl_build_rule *head,
 					fld->mask_range.u32,
 					rule->config->defs[n].size);
 
+				/* Fields are aligned highest to lowest bit.
+				 * Mask needs to be shifted to follow the same
+				 * convention.
+				 */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+				mask = mask << 32;
+#endif
+
 				/* gen a mini-trie for this field */
 				merge = acl_gen_mask_trie(context,
 					&fld->value,
@@ -1017,6 +1054,14 @@ acl_calc_wildness(struct rte_acl_build_rule *head,
 			uint32_t bit_len = CHAR_BIT * config->defs[n].size;
 			uint64_t msk_val = RTE_LEN2MASK(bit_len,
 				typeof(msk_val));
+
+			/* Fields are aligned highest to lowest bit.
+			 * Mask needs to be shifted to follow the same
+			 * convention.
+			 */
+			if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+				msk_val <<= 32;
+
 			double size = bit_len;
 			int field_index = config->defs[n].field_index;
 			const struct rte_acl_field *fld = rule->f->field +
diff --git a/lib/librte_acl/acl_gen.c b/lib/librte_acl/acl_gen.c
index 35a0140b4..b4e8df0ce 100644
--- a/lib/librte_acl/acl_gen.c
+++ b/lib/librte_acl/acl_gen.c
@@ -360,7 +360,16 @@ acl_gen_node(struct rte_acl_node *node, uint64_t *node_array,
 		array_ptr = &node_array[index->quad_index];
 		acl_add_ptrs(node, array_ptr, no_match, 0);
 		qtrp = (uint32_t *)node->transitions;
+
+		/* Swap qtrp on big endian so that transitions[0]
+		 * is at the least significant byte.
+		 */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+		node->node_index = __bswap_32(qtrp[0]);
+#else
 		node->node_index = qtrp[0];
+#endif
+
 		node->node_index <<= sizeof(index->quad_index) * CHAR_BIT;
 		node->node_index |= index->quad_index | node->node_type;
 		index->quad_index += node->fanout;
diff --git a/lib/librte_acl/acl_run_scalar.c b/lib/librte_acl/acl_run_scalar.c
index 3d61e7940..9f01ef8d8 100644
--- a/lib/librte_acl/acl_run_scalar.c
+++ b/lib/librte_acl/acl_run_scalar.c
@@ -141,6 +141,14 @@ rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
 		input0 = GET_NEXT_4BYTES(parms, 0);
 		input1 = GET_NEXT_4BYTES(parms, 1);
 
+		/* input needs to be swapped because the rules get
+		 * swapped while building the trie.
+		 */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+		input0 = __bswap_32(input0);
+		input1 = __bswap_32(input1);
+#endif
+
 		for (n = 0; n < 4; n++) {
 
 			transition0 = scalar_transition(flows.trans,
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index c436a9bfd..6d4d3f239 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -64,6 +64,8 @@ static const rte_acl_classify_t classify_fns[] = {
 	[RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_avx2,
 	[RTE_ACL_CLASSIFY_NEON] = rte_acl_classify_neon,
 	[RTE_ACL_CLASSIFY_ALTIVEC] = rte_acl_classify_altivec,
+	/* use scalar for s390x for now */
+	[RTE_ACL_CLASSIFY_S390X] = rte_acl_classify_scalar,
 };
 
 /* by default, use always available scalar code path. */
@@ -103,6 +105,8 @@ RTE_INIT(rte_acl_init)
 	alg = RTE_ACL_CLASSIFY_NEON;
 #elif defined(RTE_ARCH_PPC_64)
 	alg = RTE_ACL_CLASSIFY_ALTIVEC;
+#elif defined(RTE_ARCH_S390X)
+	alg = RTE_ACL_CLASSIFY_S390X;
 #else
 #ifdef CC_AVX2_SUPPORT
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index aa22e70c6..9537196db 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -241,6 +241,7 @@ enum rte_acl_classify_alg {
 	RTE_ACL_CLASSIFY_AVX2 = 3,  /**< requires AVX2 support. */
 	RTE_ACL_CLASSIFY_NEON = 4,  /**< requires NEON support. */
 	RTE_ACL_CLASSIFY_ALTIVEC = 5,  /**< requires ALTIVEC support. */
+	RTE_ACL_CLASSIFY_S390X = 6,  /**< requires s390x z13 support. */
 	RTE_ACL_CLASSIFY_NUM /* should always be the last one. */
 };
 
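
Illustration only, not part of the patch: the index mapping done by get_le_byte_index() can be checked on any host with a small standalone sketch. The names le_byte_index(), msb_first() and check_walk() are hypothetical, and the byte order is passed as a run-time argument (an assumption made here so that both mappings can be exercised on one machine), whereas the patch selects it at compile time via __BYTE_ORDER__.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical run-time variant of the patch's get_le_byte_index():
 * maps a little-endian byte position to the in-memory offset of that
 * byte for the given byte order. */
static uint32_t le_byte_index(uint32_t index, uint32_t size, int big_endian)
{
	return big_endian ? size - 1 - index : index;
}

/* Copy a field into out[] most significant byte first, walking it the
 * same way the patched loops do (n = size - 1 down to 0). */
static void msb_first(const uint8_t *field, uint32_t size, int big_endian,
	uint8_t *out)
{
	uint32_t n;

	for (n = 0; n < size; n++)
		out[n] = field[le_byte_index(size - 1 - n, size, big_endian)];
}

/* 0x0A0B0C0D as stored on a little-endian and on a big-endian host
 * must yield the same most-significant-first byte sequence. */
static void check_walk(void)
{
	const uint8_t le_mem[4] = { 0x0D, 0x0C, 0x0B, 0x0A };
	const uint8_t be_mem[4] = { 0x0A, 0x0B, 0x0C, 0x0D };
	uint8_t a[4], b[4];

	msb_first(le_mem, 4, 0, a);
	msb_first(be_mem, 4, 1, b);
	assert(memcmp(a, b, sizeof(a)) == 0);
	assert(a[0] == 0x0A && a[3] == 0x0D);
}
```

Calling check_walk() from a test harness passes on either kind of host, because the sketch never reads the host's own byte order.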