ethdev: rte_eth_rx_burst() nb_pkts requirements
diff mbox series

Message ID 20200914110511.95609-1-mb@smartsharesystems.com
State New
Delegated to: Ferruh Yigit
Headers show
Series
  • ethdev: rte_eth_rx_burst() nb_pkts requirements
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/checkpatch warning coding style issues

Commit Message

Morten Brørup Sept. 14, 2020, 11:05 a.m. UTC
Updated description of rte_eth_rx_burst() to reflect what drivers,
when using vector instructions, expect from nb_pkts.

Also discussed on the mailing list here:
http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smartserver.smartshare.dk/

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/librte_ethdev/rte_ethdev.h | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Bruce Richardson Sept. 14, 2020, 11:26 a.m. UTC | #1
On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> Updated description of rte_eth_rx_burst() to reflect what drivers,
> when using vector instructions, expect from nb_pkts.
> 
> Also discussed on the mailing list here:
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smartserver.smartshare.dk/
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/librte_ethdev/rte_ethdev.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 70295d7ab..41f8ba4ef 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -4469,6 +4469,10 @@ int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>   * burst-oriented optimizations in both synchronous and asynchronous
>   * packet processing environments with no overhead in both cases.
>   *
> + * @note
> + *   Some drivers using vector instructions require that *nb_pkts* is
> + *   divisible by 4 or 8, depending on the driver implementation.
> + *

Not technically true, in that the drivers will round the value down to the
nearest multiple of 4 or 8. So how about rewording as:

"Some drivers using vector instructions may round the *nb_pkts* driver to
a multiple of 4 or 8 depending upon the driver implementation."

>   * The rte_eth_rx_burst() function does not provide any error
>   * notification to avoid the corresponding overhead. As a hint, the
>   * upper-level application might check the status of the device link once
> @@ -4485,6 +4489,7 @@ int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>   *   must be large enough to store *nb_pkts* pointers in it.
>   * @param nb_pkts
>   *   The maximum number of packets to retrieve.
> + *   The value must be divisible by 8 in order to work with any driver.

Similarly here, I think it's better to state that it should be at least 8,
and any values not divisible by 8 may be rounded down.

>   * @return
>   *   The number of packets actually retrieved, which is the number
>   *   of pointers to *rte_mbuf* structures effectively supplied to the
> -- 
> 2.17.1
>
Morten Brørup Sept. 14, 2020, 12:42 p.m. UTC | #2
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Monday, September 14, 2020 1:27 PM
> 
> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> > Updated description of rte_eth_rx_burst() to reflect what drivers,
> > when using vector instructions, expect from nb_pkts.
> >
> > Also discussed on the mailing list here:
> >
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> erver.smartshare.dk/
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> b/lib/librte_ethdev/rte_ethdev.h
> > index 70295d7ab..41f8ba4ef 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -4469,6 +4469,10 @@ int
> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> >   * burst-oriented optimizations in both synchronous and asynchronous
> >   * packet processing environments with no overhead in both cases.
> >   *
> > + * @note
> > + *   Some drivers using vector instructions require that *nb_pkts*
> is
> > + *   divisible by 4 or 8, depending on the driver implementation.
> > + *
> 
> Not technically true, in that the drivers will round the value down to
> the
> nearest multiple of 4 or 8. So how about rewording as:
> 
> "Some drivers using vector instructions may round the *nb_pkts* driver
> to
> a multiple of 4 or 8 depending upon the driver implementation."
> 

You are correct about the driver behavior.

However, if you pass nb_pkts=9, the driver will return 8 packets,
and thus it does not conform to the API behavior of returning nb_pkts
if they are there.

This is why the description in this patch differs from the description we reached in the RFC discussion.

> >   * The rte_eth_rx_burst() function does not provide any error
> >   * notification to avoid the corresponding overhead. As a hint, the
> >   * upper-level application might check the status of the device link
> once
> > @@ -4485,6 +4489,7 @@ int rte_eth_dev_hairpin_capability_get(uint16_t
> port_id,
> >   *   must be large enough to store *nb_pkts* pointers in it.
> >   * @param nb_pkts
> >   *   The maximum number of packets to retrieve.
> > + *   The value must be divisible by 8 in order to work with any
> driver.
> 
> Similarly here, I think it's better to state that it should be at least
> 8,
> and any values not divisible by 8 may be rounded down.
> 
> >   * @return
> >   *   The number of packets actually retrieved, which is the number
> >   *   of pointers to *rte_mbuf* structures effectively supplied to
> the
> > --
> > 2.17.1
> >
Ferruh Yigit Oct. 13, 2020, 10:55 a.m. UTC | #3
On 9/14/2020 1:42 PM, Morten Brørup wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
>> Sent: Monday, September 14, 2020 1:27 PM
>>
>> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
>>> Updated description of rte_eth_rx_burst() to reflect what drivers,
>>> when using vector instructions, expect from nb_pkts.
>>>
>>> Also discussed on the mailing list here:
>>>
>> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
>> erver.smartshare.dk/
>>>
>>> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>>> ---
>>>   lib/librte_ethdev/rte_ethdev.h | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h
>> b/lib/librte_ethdev/rte_ethdev.h
>>> index 70295d7ab..41f8ba4ef 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -4469,6 +4469,10 @@ int
>> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>>>    * burst-oriented optimizations in both synchronous and asynchronous
>>>    * packet processing environments with no overhead in both cases.
>>>    *
>>> + * @note
>>> + *   Some drivers using vector instructions require that *nb_pkts*
>> is
>>> + *   divisible by 4 or 8, depending on the driver implementation.
>>> + *
>>
>> Not technically true, in that the drivers will round the value down to
>> the
>> nearest multiple of 4 or 8. So how about rewording as:
>>
>> "Some drivers using vector instructions may round the *nb_pkts* driver
>> to
>> a multiple of 4 or 8 depending upon the driver implementation."
>>
> 
> You are correct about the driver behavior.
> 
> However, if you pass nb_pkts=9, the driver will return 8 packets,
> and thus it does not conform to the API behavior of returning nb_pkts
> if they are there.
> 
> This is why the description in this patch differs from the description we reached in the RFC discussion.
> 

Hi Morten, Bruce,

+1 to document the this behavior.

But in the patch the wording is more strict:
"... require that *nb_pkts* is divisible by 4 or 8 ..."
"... The value must be divisible by 8 in order to work with any driver."

I am not sure the requirement is that strict. Application still provide any 
value for 'nb_pkts', so the value doesn't "have to" be divisible 8/4.

But for vector PMD case it will return number of packets round down to 8/4. 
Perhaps can add for vector PMD it must be at least 4/8?

Bruce's explanation sound more accurate to me, what do you think?


>>>    * The rte_eth_rx_burst() function does not provide any error
>>>    * notification to avoid the corresponding overhead. As a hint, the
>>>    * upper-level application might check the status of the device link
>> once
>>> @@ -4485,6 +4489,7 @@ int rte_eth_dev_hairpin_capability_get(uint16_t
>> port_id,
>>>    *   must be large enough to store *nb_pkts* pointers in it.
>>>    * @param nb_pkts
>>>    *   The maximum number of packets to retrieve.
>>> + *   The value must be divisible by 8 in order to work with any
>> driver.
>>
>> Similarly here, I think it's better to state that it should be at least
>> 8,
>> and any values not divisible by 8 may be rounded down.
>>
>>>    * @return
>>>    *   The number of packets actually retrieved, which is the number
>>>    *   of pointers to *rte_mbuf* structures effectively supplied to
>> the
>>> --
>>> 2.17.1
>>>
>
Morten Brørup Oct. 14, 2020, 8:26 a.m. UTC | #4
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Tuesday, October 13, 2020 12:56 PM
> 
> On 9/14/2020 1:42 PM, Morten Brørup wrote:
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> >> Sent: Monday, September 14, 2020 1:27 PM
> >>
> >> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> >>> Updated description of rte_eth_rx_burst() to reflect what drivers,
> >>> when using vector instructions, expect from nb_pkts.
> >>>
> >>> Also discussed on the mailing list here:
> >>>
> >> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> >> erver.smartshare.dk/
> >>>
> >>> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> >>> ---
> >>>   lib/librte_ethdev/rte_ethdev.h | 5 +++++
> >>>   1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.h
> >> b/lib/librte_ethdev/rte_ethdev.h
> >>> index 70295d7ab..41f8ba4ef 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.h
> >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> >>> @@ -4469,6 +4469,10 @@ int
> >> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> >>>    * burst-oriented optimizations in both synchronous and asynchronous
> >>>    * packet processing environments with no overhead in both cases.
> >>>    *
> >>> + * @note
> >>> + *   Some drivers using vector instructions require that *nb_pkts*
> >> is
> >>> + *   divisible by 4 or 8, depending on the driver implementation.
> >>> + *
> >>
> >> Not technically true, in that the drivers will round the value down to
> >> the
> >> nearest multiple of 4 or 8. So how about rewording as:
> >>
> >> "Some drivers using vector instructions may round the *nb_pkts* driver
> >> to
> >> a multiple of 4 or 8 depending upon the driver implementation."
> >>
> >
> > You are correct about the driver behavior.
> >
> > However, if you pass nb_pkts=9, the driver will return 8 packets,
> > and thus it does not conform to the API behavior of returning nb_pkts
> > if they are there.
> >
> > This is why the description in this patch differs from the description we
> reached in the RFC discussion.
> >
> 
> Hi Morten, Bruce,
> 
> +1 to document the this behavior.
> 
> But in the patch the wording is more strict:
> "... require that *nb_pkts* is divisible by 4 or 8 ..."
> "... The value must be divisible by 8 in order to work with any driver."
> 
> I am not sure the requirement is that strict. Application still provide any
> value for 'nb_pkts', so the value doesn't "have to" be divisible 8/4.
> 
> But for vector PMD case it will return number of packets round down to 8/4.
> Perhaps can add for vector PMD it must be at least 4/8?
> 
> Bruce's explanation sound more accurate to me, what do you think?
> 

I aim to keep the explanation in the documentation relatively simple. Keep the parameter description short, and add the details about vector driver behavior as a note to the function.

The reason for all this is the existing documentation describing how to use the rte_eth_rx_burst() function at high level:

The rte_eth_rx_burst() function returns the number of packets actually retrieved [...]. A return value equal to nb_pkts indicates [...] that other received packets remain in the input queue. Applications implementing a "retrieve as much received packets as possible" policy can check this specific case and keep invoking the rte_eth_rx_burst() function until a value less than nb_pkts is returned.

As an alternative to my proposed solution, we could add that vector drivers round down to 4 or 8, and the application's comparison of the nb_pkts and return value must consider this. But that would make the above description strangely complex, rather than just requiring that nb_pkts for vector drivers must be divisible by 4 or 8.

And as a minor detail, keeping my proposed restriction would also eliminate the vector drivers' need to round down.

I don't see a need to be able to call rte_eth_rx_burst() with a value not divisible by 4 or 8 for a vector driver, so my proposed restriction is a tradeoff favoring simplicity over unnecessary flexibility.

> 
> >>>    * The rte_eth_rx_burst() function does not provide any error
> >>>    * notification to avoid the corresponding overhead. As a hint, the
> >>>    * upper-level application might check the status of the device link
> >> once
> >>> @@ -4485,6 +4489,7 @@ int rte_eth_dev_hairpin_capability_get(uint16_t
> >> port_id,
> >>>    *   must be large enough to store *nb_pkts* pointers in it.
> >>>    * @param nb_pkts
> >>>    *   The maximum number of packets to retrieve.
> >>> + *   The value must be divisible by 8 in order to work with any
> >> driver.
> >>
> >> Similarly here, I think it's better to state that it should be at least
> >> 8,
> >> and any values not divisible by 8 may be rounded down.
> >>
> >>>    * @return
> >>>    *   The number of packets actually retrieved, which is the number
> >>>    *   of pointers to *rte_mbuf* structures effectively supplied to
> >> the
> >>> --
> >>> 2.17.1
> >>>
> >
>
Thomas Monjalon Oct. 14, 2020, 8:53 a.m. UTC | #5
14/10/2020 10:26, Morten Brørup:
> From: Ferruh Yigit
> > On 9/14/2020 1:42 PM, Morten Brørup wrote:
> > > From: Bruce Richardson
> > >> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> > >>> Updated description of rte_eth_rx_burst() to reflect what drivers,
> > >>> when using vector instructions, expect from nb_pkts.
> > >>>
> > >>> Also discussed on the mailing list here:
> > >>>
> > >> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> > >> erver.smartshare.dk/
> > >>>
> > >>> --- a/lib/librte_ethdev/rte_ethdev.h
> > >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> > >>> @@ -4469,6 +4469,10 @@ int
> > >> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > >>>    * burst-oriented optimizations in both synchronous and asynchronous
> > >>>    * packet processing environments with no overhead in both cases.
> > >>>    *
> > >>> + * @note
> > >>> + *   Some drivers using vector instructions require that *nb_pkts*
> > >> is
> > >>> + *   divisible by 4 or 8, depending on the driver implementation.
> > >>> + *
> > >>
> > >> Not technically true, in that the drivers will round the value down to
> > >> the
> > >> nearest multiple of 4 or 8. So how about rewording as:
> > >>
> > >> "Some drivers using vector instructions may round the *nb_pkts* driver
> > >> to
> > >> a multiple of 4 or 8 depending upon the driver implementation."
> > >>
> > >
> > > You are correct about the driver behavior.
> > >
> > > However, if you pass nb_pkts=9, the driver will return 8 packets,
> > > and thus it does not conform to the API behavior of returning nb_pkts
> > > if they are there.
> > >
> > > This is why the description in this patch differs from the description we
> > reached in the RFC discussion.
> > >
> > 
> > Hi Morten, Bruce,
> > 
> > +1 to document the this behavior.
> > 
> > But in the patch the wording is more strict:
> > "... require that *nb_pkts* is divisible by 4 or 8 ..."
> > "... The value must be divisible by 8 in order to work with any driver."
> > 
> > I am not sure the requirement is that strict. Application still provide any
> > value for 'nb_pkts', so the value doesn't "have to" be divisible 8/4.
> > 
> > But for vector PMD case it will return number of packets round down to 8/4.
> > Perhaps can add for vector PMD it must be at least 4/8?
> > 
> > Bruce's explanation sound more accurate to me, what do you think?
> > 
> 
> I aim to keep the explanation in the documentation relatively simple. Keep the parameter description short, and add the details about vector driver behavior as a note to the function.
> 
> The reason for all this is the existing documentation describing how to use the rte_eth_rx_burst() function at high level:
> 
> The rte_eth_rx_burst() function returns the number of packets actually retrieved [...]. A return value equal to nb_pkts indicates [...] that other received packets remain in the input queue. Applications implementing a "retrieve as much received packets as possible" policy can check this specific case and keep invoking the rte_eth_rx_burst() function until a value less than nb_pkts is returned.
> 
> As an alternative to my proposed solution, we could add that vector drivers round down to 4 or 8, and the application's comparison of the nb_pkts and return value must consider this. But that would make the above description strangely complex, rather than just requiring that nb_pkts for vector drivers must be divisible by 4 or 8.
> 
> And as a minor detail, keeping my proposed restriction would also eliminate the vector drivers' need to round down.
> 
> I don't see a need to be able to call rte_eth_rx_burst() with a value not divisible by 4 or 8 for a vector driver, so my proposed restriction is a tradeoff favoring simplicity over unnecessary flexibility.

It makes sense to me.
Bruce Richardson Oct. 14, 2020, 9:29 a.m. UTC | #6
On Wed, Oct 14, 2020 at 10:53:24AM +0200, Thomas Monjalon wrote:
> 14/10/2020 10:26, Morten Brørup:
> > From: Ferruh Yigit
> > > On 9/14/2020 1:42 PM, Morten Brørup wrote:
> > > > From: Bruce Richardson
> > > >> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> > > >>> Updated description of rte_eth_rx_burst() to reflect what drivers,
> > > >>> when using vector instructions, expect from nb_pkts.
> > > >>>
> > > >>> Also discussed on the mailing list here:
> > > >>>
> > > >> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> > > >> erver.smartshare.dk/
> > > >>>
> > > >>> --- a/lib/librte_ethdev/rte_ethdev.h
> > > >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> > > >>> @@ -4469,6 +4469,10 @@ int
> > > >> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > > >>>    * burst-oriented optimizations in both synchronous and asynchronous
> > > >>>    * packet processing environments with no overhead in both cases.
> > > >>>    *
> > > >>> + * @note
> > > >>> + *   Some drivers using vector instructions require that *nb_pkts*
> > > >> is
> > > >>> + *   divisible by 4 or 8, depending on the driver implementation.
> > > >>> + *
> > > >>
> > > >> Not technically true, in that the drivers will round the value down to
> > > >> the
> > > >> nearest multiple of 4 or 8. So how about rewording as:
> > > >>
> > > >> "Some drivers using vector instructions may round the *nb_pkts* driver
> > > >> to
> > > >> a multiple of 4 or 8 depending upon the driver implementation."
> > > >>
> > > >
> > > > You are correct about the driver behavior.
> > > >
> > > > However, if you pass nb_pkts=9, the driver will return 8 packets,
> > > > and thus it does not conform to the API behavior of returning nb_pkts
> > > > if they are there.
> > > >
> > > > This is why the description in this patch differs from the description we
> > > reached in the RFC discussion.
> > > >
> > > 
> > > Hi Morten, Bruce,
> > > 
> > > +1 to document the this behavior.
> > > 
> > > But in the patch the wording is more strict:
> > > "... require that *nb_pkts* is divisible by 4 or 8 ..."
> > > "... The value must be divisible by 8 in order to work with any driver."
> > > 
> > > I am not sure the requirement is that strict. Application still provide any
> > > value for 'nb_pkts', so the value doesn't "have to" be divisible 8/4.
> > > 
> > > But for vector PMD case it will return number of packets round down to 8/4.
> > > Perhaps can add for vector PMD it must be at least 4/8?
> > > 
> > > Bruce's explanation sound more accurate to me, what do you think?
> > > 
> > 
> > I aim to keep the explanation in the documentation relatively simple. Keep the parameter description short, and add the details about vector driver behavior as a note to the function.
> > 
> > The reason for all this is the existing documentation describing how to use the rte_eth_rx_burst() function at high level:
> > 
> > The rte_eth_rx_burst() function returns the number of packets actually retrieved [...]. A return value equal to nb_pkts indicates [...] that other received packets remain in the input queue. Applications implementing a "retrieve as much received packets as possible" policy can check this specific case and keep invoking the rte_eth_rx_burst() function until a value less than nb_pkts is returned.
> > 
> > As an alternative to my proposed solution, we could add that vector drivers round down to 4 or 8, and the application's comparison of the nb_pkts and return value must consider this. But that would make the above description strangely complex, rather than just requiring that nb_pkts for vector drivers must be divisible by 4 or 8.
> > 
> > And as a minor detail, keeping my proposed restriction would also eliminate the vector drivers' need to round down.
> > 
> > I don't see a need to be able to call rte_eth_rx_burst() with a value not divisible by 4 or 8 for a vector driver, so my proposed restriction is a tradeoff favoring simplicity over unnecessary flexibility.
> 
> It makes sense to me.
>

That sounds reasonable for what we have now. We just need to standardize on
either 4 or 8 as the required factor of the input size. I would suggest
having it as 4, and look to put in fallback paths for the few drivers which
don't support less than 8. I think that 8 is too large a min burst size to
support, for any apps that want small bursts for lower latency.

/Bruce
Morten Brørup Oct. 14, 2020, 11:14 a.m. UTC | #7
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, October 14, 2020 11:30 AM
> 
> On Wed, Oct 14, 2020 at 10:53:24AM +0200, Thomas Monjalon wrote:
> > 14/10/2020 10:26, Morten Brørup:
> > > From: Ferruh Yigit
> > > > On 9/14/2020 1:42 PM, Morten Brørup wrote:
> > > > > From: Bruce Richardson
> > > > >> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> > > > >>> Updated description of rte_eth_rx_burst() to reflect what
> drivers,
> > > > >>> when using vector instructions, expect from nb_pkts.
> > > > >>>
> > > > >>> Also discussed on the mailing list here:
> > > > >>>
> > > > >>
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> > > > >> erver.smartshare.dk/
> > > > >>>
> > > > >>> --- a/lib/librte_ethdev/rte_ethdev.h
> > > > >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > >>> @@ -4469,6 +4469,10 @@ int
> > > > >> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > > > >>>    * burst-oriented optimizations in both synchronous and
> asynchronous
> > > > >>>    * packet processing environments with no overhead in both
> cases.
> > > > >>>    *
> > > > >>> + * @note
> > > > >>> + *   Some drivers using vector instructions require that
> *nb_pkts*
> > > > >> is
> > > > >>> + *   divisible by 4 or 8, depending on the driver
> implementation.
> > > > >>> + *
> > > > >>
> > > > >> Not technically true, in that the drivers will round the value
> down to
> > > > >> the
> > > > >> nearest multiple of 4 or 8. So how about rewording as:
> > > > >>
> > > > >> "Some drivers using vector instructions may round the *nb_pkts*
> driver
> > > > >> to
> > > > >> a multiple of 4 or 8 depending upon the driver implementation."
> > > > >>
> > > > >
> > > > > You are correct about the driver behavior.
> > > > >
> > > > > However, if you pass nb_pkts=9, the driver will return 8 packets,
> > > > > and thus it does not conform to the API behavior of returning
> nb_pkts
> > > > > if they are there.
> > > > >
> > > > > This is why the description in this patch differs from the
> description we
> > > > reached in the RFC discussion.
> > > > >
> > > >
> > > > Hi Morten, Bruce,
> > > >
> > > > +1 to document the this behavior.
> > > >
> > > > But in the patch the wording is more strict:
> > > > "... require that *nb_pkts* is divisible by 4 or 8 ..."
> > > > "... The value must be divisible by 8 in order to work with any
> driver."
> > > >
> > > > I am not sure the requirement is that strict. Application still
> provide any
> > > > value for 'nb_pkts', so the value doesn't "have to" be divisible 8/4.
> > > >
> > > > But for vector PMD case it will return number of packets round down
> to 8/4.
> > > > Perhaps can add for vector PMD it must be at least 4/8?
> > > >
> > > > Bruce's explanation sound more accurate to me, what do you think?
> > > >
> > >
> > > I aim to keep the explanation in the documentation relatively simple.
> Keep the parameter description short, and add the details about vector
> driver behavior as a note to the function.
> > >
> > > The reason for all this is the existing documentation describing how to
> use the rte_eth_rx_burst() function at high level:
> > >
> > > The rte_eth_rx_burst() function returns the number of packets actually
> retrieved [...]. A return value equal to nb_pkts indicates [...] that other
> received packets remain in the input queue. Applications implementing a
> "retrieve as much received packets as possible" policy can check this
> specific case and keep invoking the rte_eth_rx_burst() function until a
> value less than nb_pkts is returned.
> > >
> > > As an alternative to my proposed solution, we could add that vector
> drivers round down to 4 or 8, and the application's comparison of the
> nb_pkts and return value must consider this. But that would make the above
> description strangely complex, rather than just requiring that nb_pkts for
> vector drivers must be divisible by 4 or 8.
> > >
> > > And as a minor detail, keeping my proposed restriction would also
> eliminate the vector drivers' need to round down.
> > >
> > > I don't see a need to be able to call rte_eth_rx_burst() with a value
> not divisible by 4 or 8 for a vector driver, so my proposed restriction is
> a tradeoff favoring simplicity over unnecessary flexibility.
> >
> > It makes sense to me.
> >
> 
> That sounds reasonable for what we have now. We just need to standardize on
> either 4 or 8 as the required factor of the input size. I would suggest
> having it as 4, and look to put in fallback paths for the few drivers which
> don't support less than 8. I think that 8 is too large a min burst size to
> support, for any apps that want small bursts for lower latency.
> 

8 works with all drivers today, so I prefer 8. Also, it seems more future proof for even higher bandwidths. If you need low latency, don't use vector drivers. Also, DPDK generally seems to prefer throughput over latency, so 8 seems like the better choice to me.

Bruce, if you have a very strong preference for standardizing on 4 and you can convince the affected driver developers to modify their drivers to support 4, we can go ahead with that.

> /Bruce
Bruce Richardson Oct. 14, 2020, 11:38 a.m. UTC | #8
On Wed, Oct 14, 2020 at 01:14:13PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Wednesday, October 14, 2020 11:30 AM
> > 
> > On Wed, Oct 14, 2020 at 10:53:24AM +0200, Thomas Monjalon wrote:
> > > 14/10/2020 10:26, Morten Brørup:
> > > > From: Ferruh Yigit
> > > > > On 9/14/2020 1:42 PM, Morten Brørup wrote:
> > > > > > From: Bruce Richardson
> > > > > >> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> > > > > >>> Updated description of rte_eth_rx_burst() to reflect what
> > drivers,
> > > > > >>> when using vector instructions, expect from nb_pkts.
> > > > > >>>
> > > > > >>> Also discussed on the mailing list here:
> > > > > >>>
> > > > > >>
> > http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> > > > > >> erver.smartshare.dk/
> > > > > >>>
> > > > > >>> --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > >>> @@ -4469,6 +4469,10 @@ int
> > > > > >> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > > > > >>>    * burst-oriented optimizations in both synchronous and
> > asynchronous
> > > > > >>>    * packet processing environments with no overhead in both
> > cases.
> > > > > >>>    *
> > > > > >>> + * @note
> > > > > >>> + *   Some drivers using vector instructions require that
> > *nb_pkts*
> > > > > >> is
> > > > > >>> + *   divisible by 4 or 8, depending on the driver
> > implementation.
> > > > > >>> + *
> > > > > >>
> > > > > >> Not technically true, in that the drivers will round the value
> > down to
> > > > > >> the
> > > > > >> nearest multiple of 4 or 8. So how about rewording as:
> > > > > >>
> > > > > >> "Some drivers using vector instructions may round the *nb_pkts*
> > driver
> > > > > >> to
> > > > > >> a multiple of 4 or 8 depending upon the driver implementation."
> > > > > >>
> > > > > >
> > > > > > You are correct about the driver behavior.
> > > > > >
> > > > > > However, if you pass nb_pkts=9, the driver will return 8 packets,
> > > > > > and thus it does not conform to the API behavior of returning
> > nb_pkts
> > > > > > if they are there.
> > > > > >
> > > > > > This is why the description in this patch differs from the
> > description we
> > > > > reached in the RFC discussion.
> > > > > >
> > > > >
> > > > > Hi Morten, Bruce,
> > > > >
> > > > > +1 to document the this behavior.
> > > > >
> > > > > But in the patch the wording is more strict:
> > > > > "... require that *nb_pkts* is divisible by 4 or 8 ..."
> > > > > "... The value must be divisible by 8 in order to work with any
> > driver."
> > > > >
> > > > > I am not sure the requirement is that strict. Application still
> > provide any
> > > > > value for 'nb_pkts', so the value doesn't "have to" be divisible 8/4.
> > > > >
> > > > > But for vector PMD case it will return number of packets round down
> > to 8/4.
> > > > > Perhaps can add for vector PMD it must be at least 4/8?
> > > > >
> > > > > Bruce's explanation sound more accurate to me, what do you think?
> > > > >
> > > >
> > > > I aim to keep the explanation in the documentation relatively simple.
> > Keep the parameter description short, and add the details about vector
> > driver behavior as a note to the function.
> > > >
> > > > The reason for all this is the existing documentation describing how to
> > use the rte_eth_rx_burst() function at high level:
> > > >
> > > > The rte_eth_rx_burst() function returns the number of packets actually
> > retrieved [...]. A return value equal to nb_pkts indicates [...] that other
> > received packets remain in the input queue. Applications implementing a
> > "retrieve as much received packets as possible" policy can check this
> > specific case and keep invoking the rte_eth_rx_burst() function until a
> > value less than nb_pkts is returned.
> > > >
> > > > As an alternative to my proposed solution, we could add that vector
> > drivers round down to 4 or 8, and the application's comparison of the
> > nb_pkts and return value must consider this. But that would make the above
> > description strangely complex, rather than just requiring that nb_pkts for
> > vector drivers must be divisible by 4 or 8.
> > > >
> > > > And as a minor detail, keeping my proposed restriction would also
> > eliminate the vector drivers' need to round down.
> > > >
> > > > I don't see a need to be able to call rte_eth_rx_burst() with a value
> > not divisible by 4 or 8 for a vector driver, so my proposed restriction is
> > a tradeoff favoring simplicity over unnecessary flexibility.
> > >
> > > It makes sense to me.
> > >
> > 
> > That sounds reasonable for what we have now. We just need to standardize on
> > either 4 or 8 as the required factor of the input size. I would suggest
> > having it as 4, and look to put in fallback paths for the few drivers which
> > don't support less than 8. I think that 8 is too large a min burst size to
> > support, for any apps that want small bursts for lower latency.
> > 
> 
> 8 works with all drivers today, so I prefer 8. Also, it seems more future proof for even higher bandwidths. If you need low latency, don't use vector drivers. Also, DPDK generally seems to prefer throughput over latency, so 8 seems like the better choice to me.
> 
> Bruce, if you have a very strong preference for standardizing on 4 and you can convince the affected driver developers to modify their drivers to support 4, we can go ahead with that.
> 
Let's go with 8 for now, and we can reduce in future if we can.
Morten Brørup Oct. 15, 2020, 7:36 a.m. UTC | #9
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, October 14, 2020 1:39 PM
> 
> On Wed, Oct 14, 2020 at 01:14:13PM +0200, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Wednesday, October 14, 2020 11:30 AM
> > >
> > > On Wed, Oct 14, 2020 at 10:53:24AM +0200, Thomas Monjalon wrote:
> > > > 14/10/2020 10:26, Morten Brørup:
> > > > > From: Ferruh Yigit
> > > > > > On 9/14/2020 1:42 PM, Morten Brørup wrote:
> > > > > > > From: Bruce Richardson
> > > > > > >> On Mon, Sep 14, 2020 at 01:05:11PM +0200, Morten Brørup wrote:
> > > > > > >>> Updated description of rte_eth_rx_burst() to reflect what
> > > drivers,
> > > > > > >>> when using vector instructions, expect from nb_pkts.
> > > > > > >>>
> > > > > > >>> Also discussed on the mailing list here:
> > > > > > >>>
> > > > > > >>
> > > http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35C61257@smarts
> > > > > > >> erver.smartshare.dk/
> > > > > > >>>
> > > > > > >>> --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > > >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > > >>> @@ -4469,6 +4469,10 @@ int
> > > > > > >> rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > > > > > >>>    * burst-oriented optimizations in both synchronous and
> > > asynchronous
> > > > > > >>>    * packet processing environments with no overhead in both
> > > cases.
> > > > > > >>>    *
> > > > > > >>> + * @note
> > > > > > >>> + *   Some drivers using vector instructions require that
> > > *nb_pkts*
> > > > > > >> is
> > > > > > >>> + *   divisible by 4 or 8, depending on the driver
> > > implementation.
> > > > > > >>> + *
> > > > > > >>
> > > > > > >> Not technically true, in that the drivers will round the value
> > > down to
> > > > > > >> the
> > > > > > >> nearest multiple of 4 or 8. So how about rewording as:
> > > > > > >>
> > > > > > >> "Some drivers using vector instructions may round the
> *nb_pkts*
> > > driver
> > > > > > >> to
> > > > > > >> a multiple of 4 or 8 depending upon the driver
> implementation."
> > > > > > >>
> > > > > > >
> > > > > > > You are correct about the driver behavior.
> > > > > > >
> > > > > > > However, if you pass nb_pkts=9, the driver will return 8
> packets,
> > > > > > > and thus it does not conform to the API behavior of returning
> > > nb_pkts
> > > > > > > if they are there.
> > > > > > >
> > > > > > > This is why the description in this patch differs from the
> > > description we
> > > > > > reached in the RFC discussion.
> > > > > > >
> > > > > >
> > > > > > Hi Morten, Bruce,
> > > > > >
> > > > > > +1 to document the this behavior.
> > > > > >
> > > > > > But in the patch the wording is more strict:
> > > > > > "... require that *nb_pkts* is divisible by 4 or 8 ..."
> > > > > > "... The value must be divisible by 8 in order to work with any
> > > driver."
> > > > > >
> > > > > > I am not sure the requirement is that strict. Application still
> > > provide any
> > > > > > value for 'nb_pkts', so the value doesn't "have to" be divisible
> 8/4.
> > > > > >
> > > > > > But for vector PMD case it will return number of packets round
> down
> > > to 8/4.
> > > > > > Perhaps can add for vector PMD it must be at least 4/8?
> > > > > >
> > > > > > Bruce's explanation sound more accurate to me, what do you think?
> > > > > >
> > > > >
> > > > > I aim to keep the explanation in the documentation relatively
> simple.
> > > Keep the parameter description short, and add the details about vector
> > > driver behavior as a note to the function.
> > > > >
> > > > > The reason for all this is the existing documentation describing
> how to
> > > use the rte_eth_rx_burst() function at high level:
> > > > >
> > > > > The rte_eth_rx_burst() function returns the number of packets
> actually
> > > retrieved [...]. A return value equal to nb_pkts indicates [...] that
> other
> > > received packets remain in the input queue. Applications implementing a
> > > "retrieve as much received packets as possible" policy can check this
> > > specific case and keep invoking the rte_eth_rx_burst() function until a
> > > value less than nb_pkts is returned.
> > > > >
> > > > > As an alternative to my proposed solution, we could add that vector
> > > drivers round down to 4 or 8, and the application's comparison of the
> > > nb_pkts and return value must consider this. But that would make the
> above
> > > description strangely complex, rather than just requiring that nb_pkts
> for
> > > vector drivers must be divisible by 4 or 8.
> > > > >
> > > > > And as a minor detail, keeping my proposed restriction would also
> > > eliminate the vector drivers' need to round down.
> > > > >
> > > > > I don't see a need to be able to call rte_eth_rx_burst() with a
> value
> > > not divisible by 4 or 8 for a vector driver, so my proposed restriction
> is
> > > a tradeoff favoring simplicity over unnecessary flexibility.
> > > >
> > > > It makes sense to me.
> > > >
> > >
> > > That sounds reasonable for what we have now. We just need to
> standardize on
> > > either 4 or 8 as the required factor of the input size. I would suggest
> > > having it as 4, and look to put in fallback paths for the few drivers
> which
> > > don't support less than 8. I think that 8 is too large a min burst size
> to
> > > support, for any apps that want small bursts for lower latency.
> > >
> >
> > 8 works with all drivers today, so I prefer 8. Also, it seems more future
> proof for even higher bandwidths. If you need low latency, don't use vector
> drivers. Also, DPDK generally seems to prefer throughput over latency, so 8
> seems like the better choice to me.
> >
> > Bruce, if you have a very strong preference for standardizing on 4 and
> you can convince the affected driver developers to modify their drivers to
> support 4, we can go ahead with that.
> >
> Let's go with 8 for now, and we can reduce in future if we can.

Great. If y'all agree now, can I get some Acks, then?

-Morten

Patch
diff mbox series

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 70295d7ab..41f8ba4ef 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -4469,6 +4469,10 @@  int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
  * burst-oriented optimizations in both synchronous and asynchronous
  * packet processing environments with no overhead in both cases.
  *
+ * @note
+ *   Some drivers using vector instructions require that *nb_pkts* is
+ *   divisible by 4 or 8, depending on the driver implementation.
+ *
  * The rte_eth_rx_burst() function does not provide any error
  * notification to avoid the corresponding overhead. As a hint, the
  * upper-level application might check the status of the device link once
@@ -4485,6 +4489,7 @@  int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
  *   must be large enough to store *nb_pkts* pointers in it.
  * @param nb_pkts
  *   The maximum number of packets to retrieve.
+ *   The value must be divisible by 8 in order to work with any driver.
  * @return
  *   The number of packets actually retrieved, which is the number
  *   of pointers to *rte_mbuf* structures effectively supplied to the