From patchwork Thu Mar  7 20:39:40 2024
From: Paul Szczepanek
To: dev@dpdk.org
Cc: Paul Szczepanek, Honnappa Nagarahalli, Kamalakshitha Aligeri, Nathan Brown
Subject: [PATCH v8 1/4] ptr_compress: add pointer compression library
Date: Thu, 7 Mar 2024 20:39:40 +0000
Message-Id: <20240307203943.188101-2-paul.szczepanek@arm.com>
In-Reply-To: <20240307203943.188101-1-paul.szczepanek@arm.com>
References: <20230927150854.3670391-2-paul.szczepanek@arm.com>
 <20240307203943.188101-1-paul.szczepanek@arm.com>

Add a new utility header for compressing pointers. The provided
functions can store pointers as 32-bit or 16-bit offsets.

The compression takes advantage of the fact that pointers are usually
located in a limited memory region (like a mempool). We can compress
them by converting them to offsets from a base memory address. Offsets
can be stored in fewer bytes (dictated by the memory region size and
alignment of the pointer). For example: an 8-byte-aligned pointer that
is part of a 32GB memory pool can be stored in 4 bytes.
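
To illustrate the intended use, here is a minimal sketch (the array
size and the helper name are hypothetical, picked only for this
example; see the programmer's guide for the full usage example):

  #include <rte_ptr_compress.h>

  /* Illustrative sketch only: round-trip an array of pointers through
   * 32-bit offsets. Objects are assumed 8-byte aligned, hence
   * bit_shift = 3; all pointers must lie within 32GB of base. */
  static void
  example_round_trip(void *base, void **ptrs, size_t n)
  {
          uint32_t offsets[64];

          RTE_ASSERT(n <= 64);
          /* store pointers as 4-byte offsets from base */
          rte_ptr_compress_32(base, ptrs, offsets, n, 3);
          /* ... offsets[] can be passed between cores in half the space ... */
          /* recover the original pointers */
          rte_ptr_decompress_32(base, offsets, ptrs, n, 3);
  }
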
Suggested-by: Honnappa Nagarahalli
Signed-off-by: Paul Szczepanek
Signed-off-by: Kamalakshitha Aligeri
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Nathan Brown
---
Depends-on: patch-138071 ("lib: allow libraries with no sources")

 lib/meson.build                     |   1 +
 lib/ptr_compress/meson.build        |   4 +
 lib/ptr_compress/rte_ptr_compress.h | 266 ++++++++++++++++++++++++++++
 lib/ptr_compress/version.map        |   3 +
 4 files changed, 274 insertions(+)
 create mode 100644 lib/ptr_compress/meson.build
 create mode 100644 lib/ptr_compress/rte_ptr_compress.h
 create mode 100644 lib/ptr_compress/version.map

--
2.25.1

diff --git a/lib/meson.build b/lib/meson.build
index 0fcf3336d1..b46d3e15c6 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -14,6 +14,7 @@ libraries = [
         'argparse',
         'telemetry', # basic info querying
         'eal', # everything depends on eal
+        'ptr_compress',
         'ring',
         'rcu', # rcu depends on ring
         'mempool',
diff --git a/lib/ptr_compress/meson.build b/lib/ptr_compress/meson.build
new file mode 100644
index 0000000000..e92706a45f
--- /dev/null
+++ b/lib/ptr_compress/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Arm Limited
+
+headers = files('rte_ptr_compress.h')
diff --git a/lib/ptr_compress/rte_ptr_compress.h b/lib/ptr_compress/rte_ptr_compress.h
new file mode 100644
index 0000000000..97c084003d
--- /dev/null
+++ b/lib/ptr_compress/rte_ptr_compress.h
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef RTE_PTR_COMPRESS_H
+#define RTE_PTR_COMPRESS_H
+
+/**
+ * @file
+ * Pointer compression and decompression functions.
+ *
+ * When passing arrays full of pointers between threads, memory containing
+ * the pointers is copied multiple times, which is especially costly between
+ * cores. These functions allow us to compress the pointers.
+ *
+ * Compression takes advantage of the fact that pointers are usually located in
+ * a limited memory region (like a mempool). We compress them by converting them
+ * to offsets from a base memory address. Offsets can be stored in fewer bytes.
+ *
+ * The compression functions come in two varieties: 32-bit and 16-bit.
+ *
+ * To determine how many bits are needed to compress the pointer, calculate
+ * the biggest offset possible (highest value pointer - base pointer)
+ * and shift the value right according to alignment (shift by the exponent of
+ * the power of 2 of alignment: aligned by 4 - shift by 2, aligned by 8 - shift
+ * by 3, etc.). The resulting value must fit in either 32 or 16 bits.
+ *
+ * For a usage example and further explanation please see "Pointer Compression"
+ * in doc/guides/prog_guide/ptr_compress_lib.rst
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_vect.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Compress pointers into 32-bit offsets from base pointer.
+ *
+ * @note It is the programmer's responsibility to ensure the resulting offsets
+ * fit into 32 bits. Alignment of the structures pointed to by the pointers
+ * allows us to drop bits from the offsets. This is controlled by the bit_shift
+ * parameter. This means that if structures are aligned by 8 bytes they must be
+ * within 32GB of the base pointer. If there is no such alignment guarantee
+ * they must be within 4GB.
+ *
+ * @param ptr_base
+ *   A pointer used to calculate offsets of pointers in src_table.
+ * @param src_table
+ *   A pointer to an array of pointers.
+ * @param dest_table
+ *   A pointer to an array of compressed pointers returned by this function.
+ * @param n
+ *   The number of objects to compress, must be strictly positive.
+ * @param bit_shift
+ *   Byte alignment of memory pointed to by the pointers allows for
+ *   bits to be dropped from the offset and hence widen the memory region that
+ *   can be covered. This controls how many bits are right shifted.
+ **/
+static __rte_always_inline void
+rte_ptr_compress_32(void *ptr_base, void **src_table,
+		uint32_t *dest_table, size_t n, uint8_t bit_shift)
+{
+	unsigned int i = 0;
+#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
+	svuint64_t v_ptr_table;
+	svbool_t pg = svwhilelt_b64(i, n);
+	do {
+		v_ptr_table = svld1_u64(pg, (uint64_t *)src_table + i);
+		v_ptr_table = svsub_x(pg, v_ptr_table, (uint64_t)ptr_base);
+		v_ptr_table = svlsr_x(pg, v_ptr_table, bit_shift);
+		svst1w(pg, &dest_table[i], v_ptr_table);
+		i += svcntd();
+		pg = svwhilelt_b64(i, n);
+	} while (i < n);
+#elif defined __ARM_NEON && !defined RTE_ARCH_ARMv8_AARCH32
+	uint64_t ptr_diff;
+	uint64x2_t v_ptr_table;
+	/* right shift is done by left shifting by negative int */
+	int64x2_t v_shift = vdupq_n_s64(-bit_shift);
+	uint64x2_t v_ptr_base = vdupq_n_u64((uint64_t)ptr_base);
+	for (; i < (n & ~0x1); i += 2) {
+		v_ptr_table = vld1q_u64((const uint64_t *)src_table + i);
+		v_ptr_table = vsubq_u64(v_ptr_table, v_ptr_base);
+		v_ptr_table = vshlq_u64(v_ptr_table, v_shift);
+		vst1_u32(dest_table + i, vqmovn_u64(v_ptr_table));
+	}
+	/* process the leftover single item in case n is odd */
+	if (unlikely(n & 0x1)) {
+		ptr_diff = RTE_PTR_DIFF(src_table[i], ptr_base);
+		dest_table[i] = (uint32_t) (ptr_diff >> bit_shift);
+	}
+#else
+	uintptr_t ptr_diff;
+	for (; i < n; i++) {
+		ptr_diff = RTE_PTR_DIFF(src_table[i], ptr_base);
+		ptr_diff = ptr_diff >> bit_shift;
+		RTE_ASSERT(ptr_diff <= UINT32_MAX);
+		dest_table[i] = (uint32_t) ptr_diff;
+	}
+#endif
+}
+
+/**
+ * Decompress pointers from 32-bit offsets from base pointer.
+ *
+ * @param ptr_base
+ *   A pointer which was used to calculate offsets in src_table.
+ * @param src_table
+ *   A pointer to an array of compressed pointers.
+ * @param dest_table
+ *   A pointer to an array of decompressed pointers returned by this function.
+ * @param n
+ *   The number of objects to decompress, must be strictly positive.
+ * @param bit_shift
+ *   Byte alignment of memory pointed to by the pointers allows for
+ *   bits to be dropped from the offset and hence widen the memory region that
+ *   can be covered. This controls how many bits are left shifted when pointers
+ *   are recovered from the offsets.
+ **/
+static __rte_always_inline void
+rte_ptr_decompress_32(void *ptr_base, uint32_t *src_table,
+		void **dest_table, size_t n, uint8_t bit_shift)
+{
+	unsigned int i = 0;
+#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
+	svuint64_t v_ptr_table;
+	svbool_t pg = svwhilelt_b64(i, n);
+	do {
+		v_ptr_table = svld1uw_u64(pg, &src_table[i]);
+		v_ptr_table = svlsl_x(pg, v_ptr_table, bit_shift);
+		v_ptr_table = svadd_x(pg, v_ptr_table, (uint64_t)ptr_base);
+		svst1(pg, (uint64_t *)dest_table + i, v_ptr_table);
+		i += svcntd();
+		pg = svwhilelt_b64(i, n);
+	} while (i < n);
+#elif defined __ARM_NEON && !defined RTE_ARCH_ARMv8_AARCH32
+	uint64_t ptr_diff;
+	uint64x2_t v_ptr_table;
+	int64x2_t v_shift = vdupq_n_s64(bit_shift);
+	uint64x2_t v_ptr_base = vdupq_n_u64((uint64_t)ptr_base);
+	for (; i < (n & ~0x1); i += 2) {
+		v_ptr_table = vmovl_u32(vld1_u32(src_table + i));
+		v_ptr_table = vshlq_u64(v_ptr_table, v_shift);
+		v_ptr_table = vaddq_u64(v_ptr_table, v_ptr_base);
+		vst1q_u64((uint64_t *)dest_table + i, v_ptr_table);
+	}
+	/* process the leftover single item in case n is odd */
+	if (unlikely(n & 0x1)) {
+		ptr_diff = ((uint64_t) src_table[i]) << bit_shift;
+		dest_table[i] = RTE_PTR_ADD(ptr_base, ptr_diff);
+	}
+#else
+	uintptr_t ptr_diff;
+	for (; i < n; i++) {
+		ptr_diff = ((uintptr_t) src_table[i]) << bit_shift;
+		dest_table[i] = RTE_PTR_ADD(ptr_base, ptr_diff);
+	}
+#endif
+}
+
+/**
+ * Compress pointers into 16-bit offsets from base pointer.
+ *
+ * @note It is the programmer's responsibility to ensure the resulting offsets
+ * fit into 16 bits. Alignment of the structures pointed to by the pointers
+ * allows us to drop bits from the offsets. This is controlled by the bit_shift
+ * parameter. This means that if structures are aligned by 8 bytes they must be
+ * within 512KB of the base pointer. If there is no such alignment guarantee
+ * they must be within 64KB.
+ *
+ * @param ptr_base
+ *   A pointer used to calculate offsets of pointers in src_table.
+ * @param src_table
+ *   A pointer to an array of pointers.
+ * @param dest_table
+ *   A pointer to an array of compressed pointers returned by this function.
+ * @param n
+ *   The number of objects to compress, must be strictly positive.
+ * @param bit_shift
+ *   Byte alignment of memory pointed to by the pointers allows for
+ *   bits to be dropped from the offset and hence widen the memory region that
+ *   can be covered. This controls how many bits are right shifted.
+ **/
+static __rte_always_inline void
+rte_ptr_compress_16(void *ptr_base, void **src_table,
+		uint16_t *dest_table, size_t n, uint8_t bit_shift)
+{
+
+	unsigned int i = 0;
+#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
+	svuint64_t v_ptr_table;
+	svbool_t pg = svwhilelt_b64(i, n);
+	do {
+		v_ptr_table = svld1_u64(pg, (uint64_t *)src_table + i);
+		v_ptr_table = svsub_x(pg, v_ptr_table, (uint64_t)ptr_base);
+		v_ptr_table = svlsr_x(pg, v_ptr_table, bit_shift);
+		svst1h(pg, &dest_table[i], v_ptr_table);
+		i += svcntd();
+		pg = svwhilelt_b64(i, n);
+	} while (i < n);
+#else
+	uintptr_t ptr_diff;
+	for (; i < n; i++) {
+		ptr_diff = RTE_PTR_DIFF(src_table[i], ptr_base);
+		ptr_diff = ptr_diff >> bit_shift;
+		RTE_ASSERT(ptr_diff <= UINT16_MAX);
+		dest_table[i] = (uint16_t) ptr_diff;
+	}
+#endif
+}
+
+/**
+ * Decompress pointers from 16-bit offsets from base pointer.
+ *
+ * @param ptr_base
+ *   A pointer which was used to calculate offsets in src_table.
+ * @param src_table
+ *   A pointer to an array of compressed pointers.
+ * @param dest_table
+ *   A pointer to an array of decompressed pointers returned by this function.
+ * @param n
+ *   The number of objects to decompress, must be strictly positive.
+ * @param bit_shift
+ *   Byte alignment of memory pointed to by the pointers allows for
+ *   bits to be dropped from the offset and hence widen the memory region that
+ *   can be covered. This controls how many bits are left shifted when pointers
+ *   are recovered from the offsets.
+ **/
+static __rte_always_inline void
+rte_ptr_decompress_16(void *ptr_base, uint16_t *src_table,
+		void **dest_table, size_t n, uint8_t bit_shift)
+{
+	unsigned int i = 0;
+#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
+	svuint64_t v_ptr_table;
+	svbool_t pg = svwhilelt_b64(i, n);
+	do {
+		v_ptr_table = svld1uh_u64(pg, &src_table[i]);
+		v_ptr_table = svlsl_x(pg, v_ptr_table, bit_shift);
+		v_ptr_table = svadd_x(pg, v_ptr_table, (uint64_t)ptr_base);
+		svst1(pg, (uint64_t *)dest_table + i, v_ptr_table);
+		i += svcntd();
+		pg = svwhilelt_b64(i, n);
+	} while (i < n);
+#else
+	uintptr_t ptr_diff;
+	for (; i < n; i++) {
+		ptr_diff = ((uintptr_t) src_table[i]) << bit_shift;
+		dest_table[i] = RTE_PTR_ADD(ptr_base, ptr_diff);
+	}
+#endif
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_PTR_COMPRESS_H */
diff --git a/lib/ptr_compress/version.map b/lib/ptr_compress/version.map
new file mode 100644
index 0000000000..5535c79061
--- /dev/null
+++ b/lib/ptr_compress/version.map
@@ -0,0 +1,3 @@
+DPDK_24 {
+	local: *;
+};
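
For reference, a hedged sketch of how a caller might derive the
bit_shift argument described in the header's doc comment (this helper
is illustrative and not part of the patch; rte_bsf32() is the EAL
bit-scan helper from rte_common.h, and obj_align/region_len are
hypothetical parameters):

  #include <rte_common.h>
  #include <rte_ptr_compress.h>

  /* Illustrative sketch only: bit_shift is the log2 of the element
   * alignment, and the largest shifted offset in the region must still
   * fit the chosen offset width (here 32 bits). */
  static uint8_t
  example_bit_shift(uint32_t obj_align, uint64_t region_len)
  {
          uint8_t bit_shift = rte_bsf32(obj_align); /* e.g. 8 -> 3 */

          RTE_ASSERT(((region_len - 1) >> bit_shift) <= UINT32_MAX);
          return bit_shift;
  }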