mbox series

[v2,0/5] maximize vector rx burst for PMDs

Message ID 20200827101008.76906-1-jia.guo@intel.com (mailing list archive)
Headers
Series maximize vector rx burst for PMDs |

Message

Guo, Jia Aug. 27, 2020, 10:10 a.m. UTC
  The limitation of burst size in vector rx was removed, since it should
retrieve as much received packets as possible. And also the scattered
receive path should use a wrapper function to achieve the goal of
burst maximizing.

This patch set aims to maximize vector rx burst for for
ixgbe/i40e/ice/iavf/fm10k PMDs.

v2->v1:
1:add fm10k driver case
2:refine some doc

Jeff Guo (5):
  net/ixgbe: maximize vector rx burst for ixgbe
  net/i40e: maximize vector rx burst for i40e
  net/ice: maximize vector rx burst for ice
  net/iavf: maximize vector rx burst for iavf
  net/fm10k: maximize vector rx burst for fm10k

 drivers/net/fm10k/fm10k_rxtx_vec.c       |  39 +++++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c |  61 ++++++++-----
 drivers/net/i40e/i40e_rxtx_vec_avx2.c    |  13 +--
 drivers/net/i40e/i40e_rxtx_vec_neon.c    |  55 +++++++-----
 drivers/net/i40e/i40e_rxtx_vec_sse.c     |  55 +++++++-----
 drivers/net/iavf/iavf_rxtx_vec_avx2.c    |  21 +----
 drivers/net/iavf/iavf_rxtx_vec_sse.c     | 107 ++++++++++++++---------
 drivers/net/ice/ice_rxtx_vec_avx2.c      |  11 +--
 drivers/net/ice/ice_rxtx_vec_sse.c       |  49 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c  |  72 +++++++--------
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c   |  56 ++++++------
 11 files changed, 312 insertions(+), 227 deletions(-)
  

Comments

Morten Brørup Aug. 27, 2020, 12:38 p.m. UTC | #1
> From: Jeff Guo [mailto:jia.guo@intel.com]
> Sent: Thursday, August 27, 2020 12:10 PM
> 
> The limitation of burst size in vector rx was removed, since it should
> retrieve as much received packets as possible. And also the scattered
> receive path should use a wrapper function to achieve the goal of
> burst maximizing.
> 
> This patch set aims to maximize vector rx burst for for
> ixgbe/i40e/ice/iavf/fm10k PMDs.
> 
> v2->v1:
> 1:add fm10k driver case
> 2:refine some doc
> 

I now noticed that the vector functions also does:
nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP);

I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33 (not 32) would only return 32 packets, even if more packets are available. (I assume that RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the requested packets using the non-vector function in order to support any nb_pkts value.

That is, unless the rte_eth_rx_burst() API is extended with requirements to nb_pkts, as discussed in the other thread:
http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3-MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266


Med venlig hilsen / kind regards
- Morten Brørup
  
Wang, Haiyue Aug. 28, 2020, 2:06 a.m. UTC | #2
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Morten Brørup
> Sent: Thursday, August 27, 2020 20:38
> To: Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>; Xing, Beilei
> <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>; barbette@kth.se
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs
> 
> > From: Jeff Guo [mailto:jia.guo@intel.com]
> > Sent: Thursday, August 27, 2020 12:10 PM
> >
> > The limitation of burst size in vector rx was removed, since it should
> > retrieve as much received packets as possible. And also the scattered
> > receive path should use a wrapper function to achieve the goal of
> > burst maximizing.
> >
> > This patch set aims to maximize vector rx burst for for
> > ixgbe/i40e/ice/iavf/fm10k PMDs.
> >
> > v2->v1:
> > 1:add fm10k driver case
> > 2:refine some doc
> >
> 
> I now noticed that the vector functions also does:
> nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP);
> 
> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33
> (not 32) would only return 32 packets, even if more packets are available. (I assume that
> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the
> requested packets using the non-vector function in order to support any nb_pkts value.
> 

This is vector instruction handling design requirement, like for avx2: #define ICE_DESCS_PER_LOOP_AVX 8,
if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-)

_ice_recv_raw_pkts_vec_avx2:
for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX)

Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the application
burst size it wants.

> That is, unless the rte_eth_rx_burst() API is extended with requirements to nb_pkts, as discussed in
> the other thread:
> http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3-
> MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266
> 
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
>
  
Guo, Jia Aug. 28, 2020, 6:39 a.m. UTC | #3
On 8/28/2020 10:06 AM, Wang, Haiyue wrote:
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Morten Brørup
>> Sent: Thursday, August 27, 2020 20:38
>> To: Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>; Xing, Beilei
>> <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
>> Jingjing <jingjing.wu@intel.com>
>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>;
>> Yigit, Ferruh <ferruh.yigit@intel.com>; barbette@kth.se
>> Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs
>>
>>> From: Jeff Guo [mailto:jia.guo@intel.com]
>>> Sent: Thursday, August 27, 2020 12:10 PM
>>>
>>> The limitation of burst size in vector rx was removed, since it should
>>> retrieve as much received packets as possible. And also the scattered
>>> receive path should use a wrapper function to achieve the goal of
>>> burst maximizing.
>>>
>>> This patch set aims to maximize vector rx burst for for
>>> ixgbe/i40e/ice/iavf/fm10k PMDs.
>>>
>>> v2->v1:
>>> 1:add fm10k driver case
>>> 2:refine some doc
>>>
>> I now noticed that the vector functions also does:
>> nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP);
>>
>> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33
>> (not 32) would only return 32 packets, even if more packets are available. (I assume that
>> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the
>> requested packets using the non-vector function in order to support any nb_pkts value.
>>
> This is vector instruction handling design requirement, like for avx2: #define ICE_DESCS_PER_LOOP_AVX 8,
> if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-)
>
> _ice_recv_raw_pkts_vec_avx2:
> for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX)
>
> Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the application
> burst size it wants.
>
>> That is, unless the rte_eth_rx_burst() API is extended with requirements to nb_pkts, as discussed in
>> the other thread:
>> http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3-
>> MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266


Agree with above haiyue said, and go through the discuss on the thread, 
i think vector path was born definitely for the spirit of dpdk, each 
driver could keep the performance base on

the instinct requirement and define what is the specific "max", the path 
option could give to app, it could static choose one when set up queue 
or dynamic but not the driver scope,

document could be add to AVI if need for benefit user.


>>
>> Med venlig hilsen / kind regards
>> - Morten Brørup
>>
>>
  
Morten Brørup Aug. 28, 2020, 11:45 a.m. UTC | #4
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Friday, August 28, 2020 8:40 AM
> 
> 
> On 8/28/2020 10:06 AM, Wang, Haiyue wrote:
> >> -----Original Message-----
> >> From: dev <dev-bounces@dpdk.org> On Behalf Of Morten Brørup
> >> Sent: Thursday, August 27, 2020 20:38
> >> To: Guo, Jia <jia.guo@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; Xing, Beilei
> >> <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang, Qi
> Z <qi.z.zhang@intel.com>; Wu,
> >> Jingjing <jingjing.wu@intel.com>
> >> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev@dpdk.org;
> Zhang, Helin <helin.zhang@intel.com>;
> >> Yigit, Ferruh <ferruh.yigit@intel.com>; barbette@kth.se
> >> Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for
> PMDs
> >>
> >>> From: Jeff Guo [mailto:jia.guo@intel.com]
> >>> Sent: Thursday, August 27, 2020 12:10 PM
> >>>
> >>> The limitation of burst size in vector rx was removed, since it
> should
> >>> retrieve as much received packets as possible. And also the
> scattered
> >>> receive path should use a wrapper function to achieve the goal of
> >>> burst maximizing.
> >>>
> >>> This patch set aims to maximize vector rx burst for for
> >>> ixgbe/i40e/ice/iavf/fm10k PMDs.
> >>>
> >>> v2->v1:
> >>> 1:add fm10k driver case
> >>> 2:refine some doc
> >>>
> >> I now noticed that the vector functions also does:
> >> nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP);
> >>
> >> I am not sure about this, but if I read it correctly, calling
> rte_eth_rx_burst() with nb_pkts = 33
> >> (not 32) would only return 32 packets, even if more packets are
> available. (I assume that
> >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need
> to read the remaining of the
> >> requested packets using the non-vector function in order to support
> any nb_pkts value.
> >>
> > This is vector instruction handling design requirement, like for
> avx2: #define ICE_DESCS_PER_LOOP_AVX 8,
> > if deep into the real loop handling, you will get the point why doing
> RTE_ALIGN_FLOOR. ;-)
> >
> > _ice_recv_raw_pkts_vec_avx2:
> > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX,
> rxdp += ICE_DESCS_PER_LOOP_AVX)
> >
> > Maybe it is worth to tell PMD to stop using vector by calling
> rte_eth_rx_queue_setup with the application
> > burst size it wants.
> >
> >> That is, unless the rte_eth_rx_burst() API is extended with
> requirements to nb_pkts, as discussed in
> >> the other thread:
> >> http://inbox.dpdk.org/dev/20200827114117.GD569@bricha3-
> >> MOBL.ger.corp.intel.com/T/#mc8051e9022d6aeb20c51c5a226b2274d3d6d4266
> 
> 
> Agree with above haiyue said, and go through the discuss on the thread,
> i think vector path was born definitely for the spirit of dpdk, each
> driver could keep the performance base on
> 
> the instinct requirement and define what is the specific "max", the
> path
> option could give to app, it could static choose one when set up queue
> or dynamic but not the driver scope,
> 
> document could be add to AVI if need for benefit user.
> 

Based on the discussion in the other thread, I think a minimum requirement to nb_pkts will be accepted.

On that note, for the series:

Acked-by: Morten Brørup <mb@smartsharesystems.com>
  
Stephen Hemminger Aug. 28, 2020, 8:30 p.m. UTC | #5
On Fri, 28 Aug 2020 14:39:33 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> >> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts = 33
> >> (not 32) would only return 32 packets, even if more packets are available. (I assume that
> >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the
> >> requested packets using the non-vector function in order to support any nb_pkts value.
> >>  
> > This is vector instruction handling design requirement, like for avx2: #define ICE_DESCS_PER_LOOP_AVX 8,
> > if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-)
> >
> > _ice_recv_raw_pkts_vec_avx2:
> > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX)
> >
> > Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the application
> > burst size it wants.

There is no need for yet another nerd knob in DPDK.

The driver must accept any burst size > 0.
If it wants to optimize for certain multiples, then it can do so in its burst handler.
Like any optimized checksum calculator does.

Let the CPU branch predictor do its job and find the best path through the
driver code.

Something like:

uint16_t
my_driver_recv_burst((void *prxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
	if (nb_pkts == 32)
		return my_driver_recv_burst32(prxq, rx_pkts, nb_pkts);
	...
	else
		return my_driver_recv_burstN(prxq, rx_pkts, nb_pkts);
	...

}

You could repeatedly call the burst32 if passed large nb_pkts.
  
Wang, Haiyue Aug. 31, 2020, 2:27 p.m. UTC | #6
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, August 29, 2020 04:31
> To: Guo, Jia <jia.guo@intel.com>
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; Morten Brørup <mb@smartsharesystems.com>; Yang, Qiming
> <qiming.yang@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Zhao1, Wei <wei.zhao1@intel.com>; Zhang,
> Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; barbette@kth.se
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] maximize vector rx burst for PMDs
> 
> On Fri, 28 Aug 2020 14:39:33 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
> 
> > >> I am not sure about this, but if I read it correctly, calling rte_eth_rx_burst() with nb_pkts =
> 33
> > >> (not 32) would only return 32 packets, even if more packets are available. (I assume that
> > >> RTE_I40E_DESCS_PER_LOOP is 32.) In this case, I guess that you need to read the remaining of the
> > >> requested packets using the non-vector function in order to support any nb_pkts value.
> > >>
> > > This is vector instruction handling design requirement, like for avx2: #define
> ICE_DESCS_PER_LOOP_AVX 8,
> > > if deep into the real loop handling, you will get the point why doing RTE_ALIGN_FLOOR. ;-)
> > >
> > > _ice_recv_raw_pkts_vec_avx2:
> > > for (i = 0, received = 0; i < nb_pkts; i += ICE_DESCS_PER_LOOP_AVX, rxdp += ICE_DESCS_PER_LOOP_AVX)
> > >
> > > Maybe it is worth to tell PMD to stop using vector by calling rte_eth_rx_queue_setup with the
> application
> > > burst size it wants.
> 
> There is no need for yet another nerd knob in DPDK.
> 
> The driver must accept any burst size > 0.
> If it wants to optimize for certain multiples, then it can do so in its burst handler.
> Like any optimized checksum calculator does.

I think if people care about performance, then PMDs have done every possible methods like
the burst size, then no need to invent their burst handers. ;-)

At least, burst size is an optimize hit for performance.