Patch Detail
get:
Show a patch.
patch:
Update a patch.
put:
Update a patch.
GET /api/patches/29464/?format=api
https://patches.dpdk.org/api/patches/29464/?format=api", "web_url": "https://patches.dpdk.org/project/dpdk/patch/B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com/", "project": { "id": 1, "url": "https://patches.dpdk.org/api/projects/1/?format=api", "name": "DPDK", "link_name": "dpdk", "list_id": "dev.dpdk.org", "list_email": "dev@dpdk.org", "web_url": "http://core.dpdk.org", "scm_url": "git://dpdk.org/dpdk", "webscm_url": "http://git.dpdk.org/dpdk", "list_archive_url": "https://inbox.dpdk.org/dev", "list_archive_url_format": "https://inbox.dpdk.org/dev/{}", "commit_url_format": "" }, "msgid": "<B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com", "date": "2017-10-02T00:12:25", "name": "[dpdk-dev,v3,1/3] eal/x86: run-time dispatch over memcpy", "commit_ref": null, "pull_url": null, "state": "superseded", "archived": true, "hash": "9accf0ae97965018b5a9c37920c946a38f54fd7d", "submitter": { "id": 798, "url": "https://patches.dpdk.org/api/people/798/?format=api", "name": "Li, Xiaoyun", "email": "xiaoyun.li@intel.com" }, "delegate": null, "mbox": "https://patches.dpdk.org/project/dpdk/patch/B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com/mbox/", "series": [], "comments": "https://patches.dpdk.org/api/patches/29464/comments/", "check": "warning", "checks": "https://patches.dpdk.org/api/patches/29464/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [127.0.0.1])\n\tby dpdk.org (Postfix) with ESMTP id 173A91B1B9;\n\tMon, 2 Oct 2017 02:12:32 +0200 (CEST)", "from mga02.intel.com (mga02.intel.com [134.134.136.20])\n\tby dpdk.org (Postfix) with ESMTP id 42FA61B1B1\n\tfor <dev@dpdk.org>; Mon, 2 Oct 2017 02:12:30 +0200 (CEST)", "from orsmga001.jf.intel.com ([10.7.209.18])\n\tby orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t01 Oct 2017 17:12:29 -0700", "from fmsmsx103.amr.corp.intel.com ([10.18.124.201])\n\tby orsmga001.jf.intel.com with ESMTP; 01 Oct 2017 17:12:28 -0700", "from fmsmsx152.amr.corp.intel.com (10.18.125.5) by\n\tFMSMSX103.amr.corp.intel.com (10.18.124.201) with Microsoft SMTP\n\tServer (TLS) id 14.3.319.2; Sun, 1 Oct 2017 17:12:28 -0700", "from shsmsx152.ccr.corp.intel.com (10.239.6.52) by\n\tFMSMSX152.amr.corp.intel.com (10.18.125.5) with Microsoft SMTP Server\n\t(TLS) id 14.3.319.2; Sun, 1 Oct 2017 17:12:28 -0700", "from shsmsx101.ccr.corp.intel.com ([169.254.1.159]) by\n\tSHSMSX152.ccr.corp.intel.com ([169.254.6.93]) with mapi id\n\t14.03.0319.002; Mon, 2 Oct 2017 08:12:26 +0800" ], "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos; i=\"5.42,466,1500966000\"; d=\"scan'208\";\n\ta=\"1177598932\"", "From": "\"Li, Xiaoyun\" <xiaoyun.li@intel.com>", "To": "\"Ananyev, Konstantin\" <konstantin.ananyev@intel.com>, \"Richardson, Bruce\"\n\t<bruce.richardson@intel.com>", "CC": "\"Lu, Wenzhuo\" <wenzhuo.lu@intel.com>, \"Zhang, Helin\"\n\t<helin.zhang@intel.com>, \"dev@dpdk.org\" <dev@dpdk.org>", "Thread-Topic": "[PATCH v3 1/3] eal/x86: run-time dispatch over memcpy", "Thread-Index": "AQHTNps4d1S4LinEtUS+0otIwfUeF6LPKmYAgACLkxA=", "Date": "Mon, 2 Oct 2017 00:12:25 +0000", "Message-ID": "<B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com>", "References": "<1506411689-94690-1-git-send-email-xiaoyun.li@intel.com>\n\t<1506411689-94690-2-git-send-email-xiaoyun.li@intel.com>\n\t<2601191342CEEE43887BDE71AB9772585FAA2BD2@IRSMSX103.ger.corp.intel.com>", "In-Reply-To": "<2601191342CEEE43887BDE71AB9772585FAA2BD2@IRSMSX103.ger.corp.intel.com>", "Accept-Language": "en-US", "Content-Language": "en-US", "X-MS-Has-Attach": "", "X-MS-TNEF-Correlator": "", "dlp-product": "dlpe-windows", "dlp-version": "11.0.0.116", "dlp-reaction": "no-action", "x-ctpclassification": "CTP_IC", "x-titus-metadata-40": "eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYWFiOTY0ZGMtZDVmYS00MzI1LTk0YTctMWEzODU4ZmY5MDE0IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IndyakxKcE9ZeVdJejdSNFBlOFc0blRuWDRTdkJWdkUrN28zS2h2NmFENTg9In0=", "x-originating-ip": "[10.239.127.40]", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "quoted-printable", "MIME-Version": "1.0", "Subject": "Re: [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "content": "Hi\n\n> That means that each file with '#include <re_memcpy.h> will have its own\n> copy\n> of that function:\n> $ objdump -d x86_64-native-linuxapp-gcc/app/testpmd | grep\n> '<rte_memcpy_init>:' | sort -u | wc -l\n> 233\n> Same story for rte_memcpy_ptr and rte_memcpy_DEFAULT, etc...\n> Obviously we need (and want) only one copy of that stuff per binary.\n> \n> > +#ifdef CC_SUPPORT_AVX2\n> \n> Why do you assume this macro will be defined?\n> By whom?\n> There is no such macro with gcc:\n> $ gcc -march=native -dM -E - </dev/null 2>&1 | grep AVX2\n> #define __AVX2__ 1\n> , and you don't define it yourself.\n> When building with '-march=native' on BDW only rte_memcpy_DEFAULT get\n> compiled.\n> \n\nI defined it myself. But when I sort the patch, I forgot to modify the file in this version. Sorry about that.\nIt should be like this. To check whether the compiler supports AVX2 or AVX512.\n\n\n\n> To summarize: as I understand the goal of that patch was\n> (assuming that our current rte_memcpy() implementation is good in terms of\n> both performance and functionality):\n> 1. Based on current rte_memcpy() implementation define 3 x86 arch specific\n> rte_memcpy flavors:\n> a) rte_memcpy_SSE\n> b) rte_memcpy_AVX2\n> c) rte_memcpy_AVX512\n> 2. Select appropriate flavor based on current HW at runtime,\n> i.e. both 3 flavors should be present in the binary and selection should be\n> made\n> at program startup.\n> \n> As I can see none of the goals was achieved with the current patch,\n> instead a lot of redundant code was introduced.\n> So I think it is NACK for the current version.\n> What I think need to be done instead:\n> \n> 1. mv lib/librte_eal/common/include/arch/x86/rte_memcpy.h\n> lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h\n> 2. inside rte_memcpy_internal.h rename rte_memcpy() into\n> rte_memcpy_internal().\n> 3. create 3 files:\n> rte_memcpy_sse.c\n> rte_memcpy_avx2.c\n> rte_memcpy_avx512.c\n> \n> Inside each of these files we define corresponding rte_memcpy_xxx()\n> function.\n> I.E:\n> rte_memcpy_avx2.c:\n> ....\n> #ifndef RTE_MACHINE_CPUFLAG_AVX2\n> #error \"no avx2 support\"\n> endif\n> \n> #include \"rte_memcpy_internal.h\"\n> ...\n> \n> void *\n> rte_memcpy_avx2(void *dst, const void *src, size_t n)\n> {\n> return rte_memcpy_internal(dst, src, n);\n> }\n> \n> 4. Make changes inside lib/librte_eal/Makefile to ensure that each of\n> rte_memcpy_xxx()\n> get build with appropriate -march flags (I.E: avx2 with -mavx2, etc.)\n> You can use librte_acl/Makefile as a reference.\n> \n> 5. Create rte_memcpy.c and put rte_memcpy_ptr/rte_memcpy_init()\n> definitions in that file.\n> 6. Create new rte_memcpy.h and define rte_memcpy() in it:\n> \n> ...\n> #include <rte_memcpy_internal.h>\n> ...\n> \n> +#define RTE_X86_MEMCPY_THRESH 128\n> static inline void *\n> rte_memcpy(void *dst, const void *src, size_t n)\n> {\n> \tif (n <= RTE_X86_MEMCPY_THRESH)\n> return rte_memcpy_internal(dst, src, n);\n> else\n> \t return (*rte_memcpy_ptr)(dst, src, n);\n> }\n> \n> 7. Test it properly - i.e. build dpdk with default target and make sure each of\n> 3 flavors\n> could be selected properly at runtime based on underlying arch.\n> \n> 8. As a possible future improvement - with such changes we don't need a\n> generic inline\n> implementation. We can think about creating a faster version that need to\n> copy\n> <= 128B.\n> \n> Konstantin\n> \n\nWill modify it in next version.\nThanks.", "diff": "diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk\nindex a813c91..92399ec 100644\n--- a/mk/rte.cpuflags.mk\n+++ b/mk/rte.cpuflags.mk\n@@ -141,3 +141,17 @@ space:= $(empty) $(empty)\n CPUFLAGSTMP1 := $(addprefix RTE_CPUFLAG_,$(CPUFLAGS))\n CPUFLAGSTMP2 := $(subst $(space),$(comma),$(CPUFLAGSTMP1))\n CPUFLAGS_LIST := -DRTE_COMPILE_TIME_CPUFLAGS=$(CPUFLAGSTMP2)\n+\n+# Check if the compiler supports AVX512.\n+CC_SUPPORT_AVX512 := $(shell $(CC) -march=skylake-avx512 -dM -E - < /dev/null 2>&1 | grep -q AVX512 && echo 1)\n+ifeq ($(CC_SUPPORT_AVX512),1)\n+ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)\n+MACHINE_CFLAGS += -DCC_SUPPORT_AVX512\n+endif\n+endif\n+\n+# Check if the compiler supports AVX2.\n+CC_SUPPORT_AVX2 := $(shell $(CC) -march=core-avx2 -dM -E - < /dev/null 2>&1 | grep -q AVX2 && echo 1)\n+ifeq ($(CC_SUPPORT_AVX2),1)\n+MACHINE_CFLAGS += -DCC_SUPPORT_AVX2\n+endif\n", "prefixes": [ "dpdk-dev", "v3", "1/3" ] }{ "id": 29464, "url": "