Message ID | 20200827101008.76906-1-jia.guo@intel.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id B7705A04B1; Thu, 27 Aug 2020 12:13:12 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0032C1C193; Thu, 27 Aug 2020 12:13:11 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id A06F01C191 for <dev@dpdk.org>; Thu, 27 Aug 2020 12:13:09 +0200 (CEST) IronPort-SDR: fikcGuVvN4pobjIyC0TQADv1mCXam4grn2KvPa4jkB0mLM+t9fwpzlLjXrI18V86i42Tb2wtg1 EiVTUqgoD6dA== X-IronPort-AV: E=McAfee;i="6000,8403,9725"; a="155712647" X-IronPort-AV: E=Sophos;i="5.76,359,1592895600"; d="scan'208";a="155712647" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Aug 2020 03:13:08 -0700 IronPort-SDR: y2voYv3DtP36yh1w2FkIPGKrxRrwic3zHruNkDQCm3+RvGVYpowW/yTYRoaA5Fi8sY7OYgzYnM J3P1GfRTOkSw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,359,1592895600"; d="scan'208";a="475150416" Received: from npg-dpdk-cvl-jeffguo-01.sh.intel.com ([10.67.111.128]) by orsmga005.jf.intel.com with ESMTP; 27 Aug 2020 03:13:05 -0700 From: Jeff Guo <jia.guo@intel.com> To: qiming.yang@intel.com, beilei.xing@intel.com, wei.zhao1@intel.com, qi.z.zhang@intel.com, jingjing.wu@intel.com Cc: bruce.richardson@intel.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com, mb@smartsharesystems.com, ferruh.yigit@intel.com, barbette@kth.se Date: Thu, 27 Aug 2020 18:10:03 +0800 Message-Id: <20200827101008.76906-1-jia.guo@intel.com> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Series |
maximize vector rx burst for PMDs
|
|
Message
Guo, Jia
Aug. 27, 2020, 10:10 a.m. UTC
The limitation of burst size in vector rx was removed, since it should retrieve as much received packets as possible. And also the scattered receive path should use a wrapper function to achieve the goal of burst maximizing. This patch set aims to maximize vector rx burst for for ixgbe/i40e/ice/iavf/fm10k PMDs. v2->v1: 1:add fm10k driver case 2:refine some doc Jeff Guo (5): net/ixgbe: maximize vector rx burst for ixgbe net/i40e: maximize vector rx burst for i40e net/ice: maximize vector rx burst for ice net/iavf: maximize vector rx burst for iavf net/fm10k: maximize vector rx burst for fm10k drivers/net/fm10k/fm10k_rxtx_vec.c | 39 +++++++-- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 61 ++++++++----- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 13 +-- drivers/net/i40e/i40e_rxtx_vec_neon.c | 55 +++++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 55 +++++++----- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +---- drivers/net/iavf/iavf_rxtx_vec_sse.c | 107 ++++++++++++++--------- drivers/net/ice/ice_rxtx_vec_avx2.c | 11 +-- drivers/net/ice/ice_rxtx_vec_sse.c | 49 +++++++---- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 72 +++++++-------- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 56 ++++++------ 11 files changed, 312 insertions(+), 227 deletions(-)
Comments
> From: Jeff Guo [mailto:jia.guo@intel.com] > Sent: Thursday, August 27, 2020 12:10 PM > > The limitation of burst size in vector rx was removed, since it should > retrieve as much received packets as possible. And also the scattered > receive path should use a wrapper function to achieve the goal of > burst maximizing. > > This patch set aims to maximize vector rx burst for for > ixgbe/i40e/ice/iavf/fm10k PMDs. > > v2->v1: > 1:add fm10k driver case > 2:refine some doc > I now noticed that the vector functions also does: nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP); I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33 (not 32) would only return 32 packets, even if more packets are available. (I assume that RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the requested packets using the non-vector function in order to support any nb_pkts value. That is, unless the rte_eth_rx_burst() API is extended with requirements to nb_pkts, as discussed in the other thread: http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3-MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266 Med venlig hilsen / kind regards - Morten Brørup
> -----Original Message----- > From: dev <dev-bounces@dpdk.org> On Behalf Of Morten Brørup > Sent: Thursday, August 27, 2020 20:38 > To: Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>; Xing, Beilei > <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, > Jingjing <jingjing.wu@intel.com> > Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; > Yigit, Ferruh <ferruh.yigit@intel.com>; barbette@kth.se > Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs > > > From: Jeff Guo [mailto:jia.guo@intel.com] > > Sent: Thursday, August 27, 2020 12:10 PM > > > > The limitation of burst size in vector rx was removed, since it should > > retrieve as much received packets as possible. And also the scattered > > receive path should use a wrapper function to achieve the goal of > > burst maximizing. > > > > This patch set aims to maximize vector rx burst for for > > ixgbe/i40e/ice/iavf/fm10k PMDs. > > > > v2->v1: > > 1:add fm10k driver case > > 2:refine some doc > > > > I now noticed that the vector functions also does: > nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP); > > I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33 > (not 32) would only return 32 packets, even if more packets are available. (I assume that > RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the > requested packets using the non-vector function in order to support any nb_pkts value. > This is vector instruction handling design requirement, like for avx2: #define ICE_DESCS_PER_LOOP_AVX 8, if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-) _ice_recv_raw_pkts_vec_avx2: for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX) Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the application burst size it wants. > That is, unless the rte_eth_rx_burst() API is extended with requirements to nb_pkts, as discussed in > the other thread: > http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3- > MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266 > > > Med venlig hilsen / kind regards > - Morten Brørup > >
On 8/28/2020 10:06 AM, Wang, Haiyue wrote: >> -----Original Message----- >> From: dev <dev-bounces@dpdk.org> On Behalf Of Morten Brørup >> Sent: Thursday, August 27, 2020 20:38 >> To: Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>; Xing, Beilei >> <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, >> Jingjing <jingjing.wu@intel.com> >> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; >> Yigit, Ferruh <ferruh.yigit@intel.com>; barbette@kth.se >> Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs >> >>> From: Jeff Guo [mailto:jia.guo@intel.com] >>> Sent: Thursday, August 27, 2020 12:10 PM >>> >>> The limitation of burst size in vector rx was removed, since it should >>> retrieve as much received packets as possible. And also the scattered >>> receive path should use a wrapper function to achieve the goal of >>> burst maximizing. >>> >>> This patch set aims to maximize vector rx burst for for >>> ixgbe/i40e/ice/iavf/fm10k PMDs. >>> >>> v2->v1: >>> 1:add fm10k driver case >>> 2:refine some doc >>> >> I now noticed that the vector functions also does: >> nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP); >> >> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33 >> (not 32) would only return 32 packets, even if more packets are available. (I assume that >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the >> requested packets using the non-vector function in order to support any nb_pkts value. >> > This is vector instruction handling design requirement, like for avx2: #define ICE_DESCS_PER_LOOP_AVX 8, > if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-) > > _ice_recv_raw_pkts_vec_avx2: > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX) > > Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the application > burst size it wants. > >> That is, unless the rte_eth_rx_burst() API is extended with requirements to nb_pkts, as discussed in >> the other thread: >> http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3- >> MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266 Agree with above haiyue said, and go through the discuss on the thread, i think vector path was born definitely for the spirit of dpdk, each driver could keep the performance base on the instinct requirement and define what is the specific "max", the path option could give to app, it could static choose one when set up queue or dynamic but not the driver scope, document could be add to AVI if need for benefit user. >> >> Med venlig hilsen / kind regards >> - Morten Brørup >> >>
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo > Sent: Friday, August 28, 2020 8:40 AM > > > On 8/28/2020 10:06 AM, Wang, Haiyue wrote: > >> -----Original Message----- > >> From: dev <dev-bounces@dpdk.org> On Behalf Of Morten Brørup > >> Sent: Thursday, August 27, 2020 20:38 > >> To: Guo, Jia <jia.guo@intel.com>; Yang, Qiming > <qiming.yang@intel.com>; Xing, Beilei > >> <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, Qi > Z <qi.z.zhang@intel.com>; Wu, > >> Jingjing <jingjing.wu@intel.com> > >> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org; > Zhang, Helin <helin.zhang@intel.com>; > >> Yigit, Ferruh <ferruh.yigit@intel.com>; barbette@kth.se > >> Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for > PMDs > >> > >>> From: Jeff Guo [mailto:jia.guo@intel.com] > >>> Sent: Thursday, August 27, 2020 12:10 PM > >>> > >>> The limitation of burst size in vector rx was removed, since it > should > >>> retrieve as much received packets as possible. And also the > scattered > >>> receive path should use a wrapper function to achieve the goal of > >>> burst maximizing. > >>> > >>> This patch set aims to maximize vector rx burst for for > >>> ixgbe/i40e/ice/iavf/fm10k PMDs. > >>> > >>> v2->v1: > >>> 1:add fm10k driver case > >>> 2:refine some doc > >>> > >> I now noticed that the vector functions also does: > >> nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP); > >> > >> I am not sure about this, but if I read it correctly, calling > rte_eth_rx_burst() with nb_pkts = 33 > >> (not 32) would only return 32 packets, even if more packets are > available. (I assume that > >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need > to read the remaining of the > >> requested packets using the non-vector function in order to support > any nb_pkts value. > >> > > This is vector instruction handling design requirement, like for > avx2: #define ICE_DESCS_PER_LOOP_AVX 8, > > if deep into the real loop handling, you will get the point why doing > RTE_ALIGN_FLOOR. ;-) > > > > _ice_recv_raw_pkts_vec_avx2: > > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, > rxdp += ICE_DESCS_PER_LOOP_AVX) > > > > Maybe it is worth to tell PMD to stop using vector by calling > rte_eth_rx_queue_setup with the application > > burst size it wants. > > > >> That is, unless the rte_eth_rx_burst() API is extended with > requirements to nb_pkts, as discussed in > >> the other thread: > >> http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3- > >> MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266 > > > Agree with above haiyue said, and go through the discuss on the thread, > i think vector path was born definitely for the spirit of dpdk, each > driver could keep the performance base on > > the instinct requirement and define what is the specific "max", the > path > option could give to app, it could static choose one when set up queue > or dynamic but not the driver scope, > > document could be add to AVI if need for benefit user. > Based on the discussion in the other thread, I think a minimum requirement to nb_pkts will be accepted. On that note, for the series: Acked-by: Morten Brørup <mb@smartsharesystems.com>
On Fri, 28 Aug 2020 14:39:33 +0800 Jeff Guo <jia.guo@intel.com> wrote: > >> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33 > >> (not 32) would only return 32 packets, even if more packets are available. (I assume that > >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the > >> requested packets using the non-vector function in order to support any nb_pkts value. > >> > > This is vector instruction handling design requirement, like for avx2: #define ICE_DESCS_PER_LOOP_AVX 8, > > if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-) > > > > _ice_recv_raw_pkts_vec_avx2: > > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX) > > > > Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the application > > burst size it wants. There is no need for yet another nerd knob in DPDK. The driver must accept any burst size > 0. If it wants to optimize for certain multiples, then it can do so in its burst handler. Like any optimized checksum calculator does. Let the CPU branch predictor do its job and find the best path through the driver code. Something like: uint16_t my_driver_recv_burst((void *prxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) { if (nb_pkts == 32) return my_driver_recv_burst32(prxq, rx_pkts, nb_pkts); ... else return my_driver_recv_burstN(prxq, rx_pkts, nb_pkts); ... } You could repeatedly call the burst32 if passed large nb_pkts.
> -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Saturday, August 29, 2020 04:31 > To: Guo, Jia <jia.guo@intel.com> > Cc: Wang, Haiyue <haiyue.wang@intel.com>; Morten Brørup <mb@smartsharesystems.com>; Yang, Qiming > <qiming.yang@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, > Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Richardson, Bruce > <bruce.richardson@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Yigit, Ferruh > <ferruh.yigit@intel.com>; barbette@kth.se > Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs > > On Fri, 28 Aug 2020 14:39:33 +0800 > Jeff Guo <jia.guo@intel.com> wrote: > > > >> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = > 33 > > >> (not 32) would only return 32 packets, even if more packets are available. (I assume that > > >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the > > >> requested packets using the non-vector function in order to support any nb_pkts value. > > >> > > > This is vector instruction handling design requirement, like for avx2: #define > ICE_DESCS_PER_LOOP_AVX 8, > > > if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-) > > > > > > _ice_recv_raw_pkts_vec_avx2: > > > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX) > > > > > > Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the > application > > > burst size it wants. > > There is no need for yet another nerd knob in DPDK. > > The driver must accept any burst size > 0. > If it wants to optimize for certain multiples, then it can do so in its burst handler. > Like any optimized checksum calculator does. I think if people care about performance, then PMDs have done every possible methods like the burst size, then no need to invent their burst handers. ;-) At least, burst size is an optimize hit for performance.