From patchwork Tue Jul 13 06:49:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eli Britstein X-Patchwork-Id: 95725 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 56AF2A0C4A; Tue, 13 Jul 2021 08:49:52 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CFDC0411E8; Tue, 13 Jul 2021 08:49:41 +0200 (CEST) Received: from NAM04-BN8-obe.outbound.protection.outlook.com (mail-bn8nam08on2088.outbound.protection.outlook.com [40.107.100.88]) by mails.dpdk.org (Postfix) with ESMTP id 55506411D8; Tue, 13 Jul 2021 08:49:40 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OsPCd1dd+eWw0QruW/fDEF08DP8PWGfvx29Tu/VqNzOYmQyWg16hqPboKENyQiKZt+tiUU5+Lxr0qCoOP8gS3C1xmCJKCQIAryuND3Rk6b+UT8ksPZmOqsXPwjhyfKj0gljBMVX6ySq1gyHbuBpTY1Q1qdBW9m0LkSQ6Rm8DDLHW4s6RbmDEBJhxLTHl44kjvGooJb5HYKzoIZYqhwpCoCWlei20I0a7PZYcgyyn1cHmjXh99PIpJHPZoeqZ4DGMAMyI/demT6qyQjb/5WdJDToPsmAaMbt9OEOoAmXEcrWys8VwOo/ftC3Cev+fIyVhkA3ywrDcN1rUQsJ7Gdzddg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=iyCJ5RHi1etyom745svnhtwm9TvPTgafMQheU9WzybY=; b=nlf1RM9NYYDiP1ZZesvRLFgs1oz717a+eO/NnXMrvChwPEd4SN8F2MCfYTRiKbRbc5u1Fw/Y0dgwcuTvYILyERSJjq15pRsOGVBO42elvjMHgqEUrb9ljo5+LqQHsKoxAkqW7vF8Xmi/HtSMA8TvxiI3/driBG5I0roS3C3ihrO04FdhvO1GvarwtvTcN6akP5yD2mqkliMGiXP5VQi8Esn6funu/5jQqLC1Q2xyaTZp6tjiLBCkNsUxE+j8kAN2TXtW1GIiuF9C0oC25c1H7Pqvz0EyQRNqVDyv45u8FHpkw3uXyXVWtI+sSficDfTcmwon/cd5fRW4Gb+AsjHgxQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.34) smtp.rcpttodomain=intel.com smtp.mailfrom=nvidia.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=iyCJ5RHi1etyom745svnhtwm9TvPTgafMQheU9WzybY=; b=mYZgifL5ew6tveDX5FioEx/h7agB8lNPgxjuKnu47U82Uc3LEHi+wKQ974QdOECEM5DKDZiov3qSU9pQGPsc04j3xdfSljbRbzsokonaZ7mHPuNjCW0K6MLZkoQyaHMpI0UjBbOjme6voIYqo98Bfmyxrrsy0lPvsksQFPysPfI7XoGLkbNfAY6hxRamBKXT3A2QVBgMPwozMCvym25NqLS7+9PMdg7K37T7hIzZNwav3uqtuewPhuXSbX+jNTrLRVEIVP+jTklAZuGbVFDWY9BteD1qMktwiiIet+5o0AAidumItysGg8ZaFs5rKpiOZKow3VuZkzISOmDD+HlNjw== Received: from DM6PR03CA0079.namprd03.prod.outlook.com (2603:10b6:5:333::12) by BL0PR12MB5508.namprd12.prod.outlook.com (2603:10b6:208:1c1::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4308.25; Tue, 13 Jul 2021 06:49:39 +0000 Received: from DM6NAM11FT065.eop-nam11.prod.protection.outlook.com (2603:10b6:5:333:cafe::c5) by DM6PR03CA0079.outlook.office365.com (2603:10b6:5:333::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4308.20 via Frontend Transport; Tue, 13 Jul 2021 06:49:38 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.34) smtp.mailfrom=nvidia.com; intel.com; dkim=none (message not signed) header.d=none;intel.com; dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.34 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.34; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.34) by DM6NAM11FT065.mail.protection.outlook.com (10.13.172.109) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.4308.20 via Frontend Transport; Tue, 13 Jul 2021 06:49:38 +0000 Received: from nvidia.com (172.20.187.5) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 13 Jul 2021 06:49:34 +0000 From: Eli Britstein To: CC: Ilya Maximets , Gaetan Rivet , Majd Dibbiny , Asaf Penso , "Thomas Monjalon" , Harry Van Haaren , Date: Tue, 13 Jul 2021 09:49:10 +0300 Message-ID: <20210713064910.12793-4-elibr@nvidia.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20210713064910.12793-1-elibr@nvidia.com> References: <20210713064910.12793-1-elibr@nvidia.com> MIME-Version: 1.0 X-Originating-IP: [172.20.187.5] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 319b0c21-3a4f-4a7c-ad33-08d945ca6292 X-MS-TrafficTypeDiagnostic: BL0PR12MB5508: X-LD-Processed: 43083d15-7273-40c1-b7db-39efd9ccc17a,ExtAddr X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:4714; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Y+AdVwHkaewuVcERP9DXgoUXHlyJUbnTBN/ggFtbDVkj5LKCAZyxLDWhg8rT2qLPsDIj4HFwN2YPI0HNxJ08wsNiKOpoYFtEUEeSjqY3TgJtNHLbUR3kS0eMvOoSF1tJbSA18c7JdPfpVQizpkTiPRYjcGDttvfm0Tq1rITfszlN3gR4ZWdTeKmk8a39GB84B16Ka3xs1AuFk/EnjNLsHjSXMukNrLCgghcLnyrC6jRYLFrfklKcalxl+u1ynaCyU+v2xwUUBbisgiUzvVA23/H5ORkNv538grIk0+B4EaVRnPInuauRojLeiVTls9GmRtB4vHcyFxlPNr7HEDeiIkn0hVqxhrchM/n45FjJCu0wAGm34Zni34HmO3kZtkCcOqrNQiqlUhENuPkxM/3wN6XZg7vpSEMkqfQBSROvpgt+VMphxzqzoZpdXyVKUllHpStkfLjwj004nfSg3eZusBjd7Z1GsffO8iujzq6oFz3WsXTEdhTWgwyNF4JfZ+O8LkwJY3GVnCqd9jOBbIuImRyPdzyUjK6M3pner6i66RLFUH1Nm+16IMvITnZoUvNIX02LcRmVg5j+njSRpCKyO8lbZTFJ2wSzO9XWposg+G4lufYoanrw2q2trmltRuiQR+3N+PhXuqhfS0GkIRAyGcxTxYCQnINDarQNpWmxhlDeZaTEF6Bw3H8NYAovPliaFPimKu4oFEOvVgMQB6Hxm/XtkfQxubkZl/47wAC0UoA= X-Forefront-Antispam-Report: CIP:216.228.112.34; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid03.nvidia.com; CAT:NONE; SFS:(4636009)(346002)(376002)(39860400002)(136003)(396003)(36840700001)(46966006)(2906002)(36756003)(82310400003)(54906003)(34020700004)(70206006)(55016002)(5660300002)(356005)(6286002)(36906005)(47076005)(7636003)(70586007)(316002)(36860700001)(6916009)(16526019)(83380400001)(2616005)(8936002)(1076003)(26005)(82740400003)(86362001)(6666004)(4326008)(186003)(478600001)(8676002)(336012)(426003)(7696005); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 13 Jul 2021 06:49:38.7536 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 319b0c21-3a4f-4a7c-ad33-08d945ca6292 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.34]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT065.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL0PR12MB5508 Subject: [dpdk-dev] [PATCH 3/3] eal/x86: avoid cast-align warning in x86 memcpy functions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Functions and macros in x86 rte_memcpy.h may cause cast-align warnings, when using gcc flags '-Werror -Wcast-align': For example: .../include/rte_memcpy.h:499:42: error: cast increases required alignment of target type [-Werror=cast-align] 499 | xmm0 = _mm_loadu_si128((const __m128i *)(const __m128i *)src); | ^ As the code assumes correct alignment, add first a (void *) or (const void *) castings, to avoid the warnings. Fixes: 9484092baad3 ("eal/x86: optimize memcpy for AVX512 platforms") Cc: stable@dpdk.org Signed-off-by: Eli Britstein --- lib/eal/x86/include/rte_memcpy.h | 80 ++++++++++++++++++-------------- 1 file changed, 44 insertions(+), 36 deletions(-) diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h index 79f381dd9b..1b6c6e585f 100644 --- a/lib/eal/x86/include/rte_memcpy.h +++ b/lib/eal/x86/include/rte_memcpy.h @@ -303,8 +303,8 @@ rte_mov16(uint8_t *dst, const uint8_t *src) { __m128i xmm0; - xmm0 = _mm_loadu_si128((const __m128i *)src); - _mm_storeu_si128((__m128i *)dst, xmm0); + xmm0 = _mm_loadu_si128((const __m128i *)(const void *)src); + _mm_storeu_si128((__m128i *)(void *)dst, xmm0); } /** @@ -316,8 +316,8 @@ rte_mov32(uint8_t *dst, const uint8_t *src) { __m256i ymm0; - ymm0 = _mm256_loadu_si256((const __m256i *)src); - _mm256_storeu_si256((__m256i *)dst, ymm0); + ymm0 = _mm256_loadu_si256((const __m256i *)(const void *)src); + _mm256_storeu_si256((__m256i *)(void *)dst, ymm0); } /** @@ -354,16 +354,24 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n) __m256i ymm0, ymm1, ymm2, ymm3; while (n >= 128) { - ymm0 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 0 * 32)); + ymm0 = _mm256_loadu_si256((const __m256i *)(const void *) + ((const uint8_t *)src + 0 * 32)); n -= 128; - ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32)); - ymm2 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 2 * 32)); - ymm3 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 3 * 32)); + ymm1 = _mm256_loadu_si256((const __m256i *)(const void *) + ((const uint8_t *)src + 1 * 32)); + ymm2 = _mm256_loadu_si256((const __m256i *)(const void *) + ((const uint8_t *)src + 2 * 32)); + ymm3 = _mm256_loadu_si256((const __m256i *)(const void *) + ((const uint8_t *)src + 3 * 32)); src = (const uint8_t *)src + 128; - _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0); - _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1); - _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 2 * 32), ymm2); - _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 3 * 32), ymm3); + _mm256_storeu_si256((__m256i *)(void *) + ((uint8_t *)dst + 0 * 32), ymm0); + _mm256_storeu_si256((__m256i *)(void *) + ((uint8_t *)dst + 1 * 32), ymm1); + _mm256_storeu_si256((__m256i *)(void *) + ((uint8_t *)dst + 2 * 32), ymm2); + _mm256_storeu_si256((__m256i *)(void *) + ((uint8_t *)dst + 3 * 32), ymm3); dst = (uint8_t *)dst + 128; } } @@ -496,8 +504,8 @@ rte_mov16(uint8_t *dst, const uint8_t *src) { __m128i xmm0; - xmm0 = _mm_loadu_si128((const __m128i *)(const __m128i *)src); - _mm_storeu_si128((__m128i *)dst, xmm0); + xmm0 = _mm_loadu_si128((const __m128i *)(const void *)src); + _mm_storeu_si128((__m128i *)(void *)dst, xmm0); } /** @@ -581,25 +589,25 @@ rte_mov256(uint8_t *dst, const uint8_t *src) __extension__ ({ \ size_t tmp; \ while (len >= 128 + 16 - offset) { \ - xmm0 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 0 * 16)); \ + xmm0 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 0 * 16)); \ len -= 128; \ - xmm1 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 1 * 16)); \ - xmm2 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 2 * 16)); \ - xmm3 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 3 * 16)); \ - xmm4 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 4 * 16)); \ - xmm5 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 5 * 16)); \ - xmm6 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 6 * 16)); \ - xmm7 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 7 * 16)); \ - xmm8 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 8 * 16)); \ + xmm1 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 1 * 16)); \ + xmm2 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 2 * 16)); \ + xmm3 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 3 * 16)); \ + xmm4 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 4 * 16)); \ + xmm5 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 5 * 16)); \ + xmm6 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 6 * 16)); \ + xmm7 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 7 * 16)); \ + xmm8 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 8 * 16)); \ src = (const uint8_t *)src + 128; \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 2 * 16), _mm_alignr_epi8(xmm3, xmm2, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 3 * 16), _mm_alignr_epi8(xmm4, xmm3, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 4 * 16), _mm_alignr_epi8(xmm5, xmm4, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 5 * 16), _mm_alignr_epi8(xmm6, xmm5, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 6 * 16), _mm_alignr_epi8(xmm7, xmm6, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 7 * 16), _mm_alignr_epi8(xmm8, xmm7, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 2 * 16), _mm_alignr_epi8(xmm3, xmm2, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 3 * 16), _mm_alignr_epi8(xmm4, xmm3, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 4 * 16), _mm_alignr_epi8(xmm5, xmm4, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 5 * 16), _mm_alignr_epi8(xmm6, xmm5, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 6 * 16), _mm_alignr_epi8(xmm7, xmm6, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 7 * 16), _mm_alignr_epi8(xmm8, xmm7, offset)); \ dst = (uint8_t *)dst + 128; \ } \ tmp = len; \ @@ -609,13 +617,13 @@ __extension__ ({ dst = (uint8_t *)dst + tmp; \ if (len >= 32 + 16 - offset) { \ while (len >= 32 + 16 - offset) { \ - xmm0 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 0 * 16)); \ + xmm0 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 0 * 16)); \ len -= 32; \ - xmm1 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 1 * 16)); \ - xmm2 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 2 * 16)); \ + xmm1 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 1 * 16)); \ + xmm2 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 2 * 16)); \ src = (const uint8_t *)src + 32; \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \ - _mm_storeu_si128((__m128i *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \ + _mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \ dst = (uint8_t *)dst + 32; \ } \ tmp = len; \