Message ID: 1600306778-46470-1-git-send-email-wenzhuo.lu@intel.com (mailing list archive)
Headers:
From: Wenzhuo Lu <wenzhuo.lu@intel.com>
To: dev@dpdk.org
Cc: Wenzhuo Lu <wenzhuo.lu@intel.com>
Date: Thu, 17 Sep 2020 09:39:35 +0800
In-Reply-To: <1599717545-106571-1-git-send-email-wenzhuo.lu@intel.com>
References: <1599717545-106571-1-git-send-email-wenzhuo.lu@intel.com>
Subject: [dpdk-dev] [PATCH v2 0/3] enable AVX512 for iavf
List-Id: DPDK patches and discussions <dev.dpdk.org>
Series: enable AVX512 for iavf
Message
Wenzhuo Lu
Sept. 17, 2020, 1:39 a.m. UTC
AVX512 instructions are supported by more and more platforms. These instructions can be used in the data path to enhance the per-core performance of packet processing. Compared with the existing implementation, this patch set introduces some AVX512 instructions into the iavf data path, and we get a better per-core throughput.

v2:
- Update meson.build.
- Replace the deprecated 'buf_physaddr' with 'buf_iova'.

Wenzhuo Lu (3):
  net/iavf: enable AVX512 for legacy RX
  net/iavf: enable AVX512 for flexible RX
  net/iavf: enable AVX512 for TX

 doc/guides/rel_notes/release_20_11.rst  |    3 +
 drivers/net/iavf/iavf_ethdev.c          |    3 +-
 drivers/net/iavf/iavf_rxtx.c            |   69 +-
 drivers/net/iavf/iavf_rxtx.h            |   18 +
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 1720 +++++++++++++++++++++++++++++++
 drivers/net/iavf/meson.build            |   17 +
 6 files changed, 1818 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/iavf/iavf_rxtx_vec_avx512.c
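The v2 note about replacing the deprecated 'buf_physaddr' with 'buf_iova' refers to the renamed rte_mbuf field holding the buffer's IO address. As a rough sketch (using a toy struct, not the real rte_mbuf), the re-arm address computation changes only in which field name it reads:

```c
#include <stdint.h>

/* Toy stand-in for rte_mbuf (hypothetical, for illustration only):
 * DPDK renamed the field 'buf_physaddr' to 'buf_iova'. */
struct toy_mbuf {
    uint64_t buf_iova;  /* IO address of the buffer (was 'buf_physaddr') */
    uint16_t data_off;  /* offset of the packet data within the buffer */
};

/* Address written into an RX descriptor during re-arm:
 * the buffer's IO address plus the data offset. */
static uint64_t rearm_dma_addr(const struct toy_mbuf *m)
{
    return m->buf_iova + m->data_off;  /* was: m->buf_physaddr + m->data_off */
}
```

The mechanical rename leaves the generated code unchanged; it only moves the driver off the deprecated field name.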
Comments
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wenzhuo Lu
> Sent: Thursday, September 17, 2020 3:40 AM
>
> [cover letter quoted in full; snipped]

I am not sure I understand the full context here, so please bear with me if I'm completely off...

With this patch set, it looks like the driver manipulates the mempool cache directly, bypassing the libraries encapsulating it.

Isn't that going deeper into a library than expected? What if the implementation of the mempool library changes radically?

And if there are performance gains to be achieved by using vector instructions for manipulating the mempool, perhaps your vector optimizations should go into the mempool library instead?

Med venlig hilsen / kind regards
- Morten Brørup
On Thu, Sep 17, 2020 at 09:37:29AM +0200, Morten Brørup wrote:
> [cover letter quoted in full; snipped]
>
> With this patch set, it looks like the driver manipulates the mempool
> cache directly, bypassing the libraries encapsulating it.
>
> Isn't that going deeper into a library than expected? What if the
> implementation of the mempool library changes radically?
>
> And if there are performance gains to be achieved by using vector
> instructions for manipulating the mempool, perhaps your vector
> optimizations should go into the mempool library instead?

Looking specifically at the descriptor re-arm code, the benefit from working off the mempool cache directly comes from saving loads by merging the code blocks, rather than directly from the vectorization itself - though the vectorization doesn't hurt. The original code, with a separate mempool function, worked roughly like below:

1. mempool code loads mbuf pointers from cache
2. mempool code writes mbuf pointers to the SW ring for the NIC
3. driver code loads the mbuf pointers from the SW ring
4. driver code then does the rest of the descriptor re-arm

The benefit comes from eliminating step 3, the loads in the driver, which are dependent upon the previous stores. By having the driver itself read from the mempool cache (the code still uses mempool functions for every other part, since everything beyond the cache depends on the ring/stack/bucket implementation), we can have the stores go out, and while they are completing, reuse the already-loaded data to do the descriptor re-arm.

Hope this clarifies things.

/Bruce
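Bruce's four-step description can be sketched in plain C (toy types, not the actual iavf or mempool code) to show why merging the two code blocks removes the dependent loads:

```c
#include <stdint.h>

/* Toy types for illustration -- not the real DPDK structures. */
struct mbuf { uint64_t buf_iova; };
struct rx_desc { uint64_t addr; };

#define HEADROOM 128

/* Original flow (steps 1-4): the mempool copies pointers from its cache
 * into the SW ring, then the driver reloads them from the SW ring.
 * The loads in the second loop depend on the stores in the first. */
static void rearm_split(struct mbuf **cache, struct mbuf **sw_ring,
                        struct rx_desc *descs, int n)
{
    for (int i = 0; i < n; i++)                 /* steps 1-2: mempool code */
        sw_ring[i] = cache[i];
    for (int i = 0; i < n; i++)                 /* steps 3-4: driver code */
        descs[i].addr = sw_ring[i]->buf_iova + HEADROOM;
}

/* Merged flow: the driver reads the mempool cache once and reuses the
 * already-loaded pointer for both the SW ring and the descriptor,
 * eliminating the dependent reload of step 3. */
static void rearm_merged(struct mbuf **cache, struct mbuf **sw_ring,
                         struct rx_desc *descs, int n)
{
    for (int i = 0; i < n; i++) {
        struct mbuf *m = cache[i];              /* single load */
        sw_ring[i] = m;                         /* store for later cleanup */
        descs[i].addr = m->buf_iova + HEADROOM; /* descriptor re-arm */
    }
}
```

Both produce identical SW-ring and descriptor contents; the merged form just lets the SW-ring stores drain while the already-loaded pointers fill the descriptors.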
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, September 17, 2020 11:13 AM
>
> [quoted thread and explanation snipped]

Thank you for the detailed explanation, Bruce. It makes sense to me now.

So,
Acked-by: Morten Brørup <mb@smartsharesystems.com>

Med venlig hilsen / kind regards
- Morten Brørup