[v3,08/33] net/ena/hal: exponential backoff exp limit

Message ID 20240306122445.4350-9-shaibran@amazon.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series net/ena: v2.9.0 driver release |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Brandes, Shai March 6, 2024, 12:24 p.m. UTC
  From: Shai Brandes <shaibran@amazon.com>

limits the exponent in the exponential backoff
mechanism in order to avoid the value overflowing.

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>
---
 drivers/net/ena/hal/ena_com.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
  

Comments

Ferruh Yigit March 8, 2024, 5:24 p.m. UTC | #1
On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
> 
> limits the exponent in the exponential backoff
> mechanism in order to avoid the value overflowing.
> 

Is this a fix?

What was the impact of the overflowing if not limited? And is there a
significance of the value 16, can you please elaborate?


Also let me remind the patch subject format, (this may look
insignificant but helps to have more unified commit messages for
developers, and if not updated by author, maintainers update it and this
brings more overhead to maintainers):
"sub-module: verb object"

And we use verb 'fix' explicitly for all commits fixing something, and
that something can't be referenced as 'error', 'failure', 'issue',
'problem', etc... but it should be detailed.

Most of the times better to document NOT from driver internal
perspective, but impact of it, like "net/ena: set chain limit to 16" is
NOT a good one, it explains driver internal perspective (making all up)
but it can be something like:
"net/ena: support big packets by increasing link limit"

For this one, I am not sure impact of the change so hard for me to
propose an alternative, but just as example it can be something like:
"net/ena/base: avoid collision by limiting backoff delay"

> Signed-off-by: Shai Brandes <shaibran@amazon.com>
> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
> ---
>  drivers/net/ena/hal/ena_com.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
> index 6953a1fa33..31c37b0ab3 100644
> --- a/drivers/net/ena/hal/ena_com.c
> +++ b/drivers/net/ena/hal/ena_com.c
> @@ -34,6 +34,8 @@
>  
>  #define ENA_REGS_ADMIN_INTR_MASK 1
>  
> +#define ENA_MAX_BACKOFF_DELAY_EXP 16U
> +
>  #define ENA_MIN_ADMIN_POLL_US 100
>  
>  #define ENA_MAX_ADMIN_POLL_US 5000
> @@ -545,8 +547,9 @@ static int ena_com_comp_status_to_errno(struct ena_com_admin_queue *admin_queue,
>  
>  static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
>  {
> +	exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
>  	delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
> -	delay_us = ENA_MIN32(delay_us * (1U << exp), ENA_MAX_ADMIN_POLL_US);
> +	delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us * (1U << exp));
>  	ENA_USLEEP(delay_us);
>  }
>
  
Brandes, Shai March 10, 2024, 2:53 p.m. UTC | #2
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, March 8, 2024 7:24 PM
> To: Brandes, Shai <shaibran@amazon.com>
> Cc: dev@dpdk.org
> Subject: RE: [EXTERNAL] [PATCH v3 08/33] net/ena/hal: exponential backoff
> exp limit
> 
> CAUTION: This email originated from outside of the organization. Do not click
> links or open attachments unless you can confirm the sender and know the
> content is safe.
> 
> 
> 
> On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> > From: Shai Brandes <shaibran@amazon.com>
> >
> > limits the exponent in the exponential backoff mechanism in order to
> > avoid the value overflowing.
> >
> 
> Is this a fix?
> 
> What was the impact of the overflowing if not limited? And is there a
> significance of the value 16, can you please elaborate?
> 
[Brandes, Shai] I will restructure this patch, since this likely hides a fix in hal.
It is originated from the HAL release, from which I took the patches one by one, but the commit messages there tend to be (too) concise. 

> 
> Also let me remind the patch subject format, (this may look insignificant but
> helps to have more unified commit messages for developers, and if not
> updated by author, maintainers update it and this brings more overhead to
> maintainers):
> "sub-module: verb object"
> 
> And we use verb 'fix' explicitly for all commits fixing something, and that
> something can't be referenced as 'error', 'failure', 'issue', 'problem', etc... but
> it should be detailed.
> 
> Most of the times better to document NOT from driver internal perspective,
> but impact of it, like "net/ena: set chain limit to 16" is NOT a good one, it
> explains driver internal perspective (making all up) but it can be something
> like:
> "net/ena: support big packets by increasing link limit"
> 
> For this one, I am not sure impact of the change so hard for me to propose an
> alternative, but just as example it can be something like:
> "net/ena/base: avoid collision by limiting backoff delay"
> 
> > Signed-off-by: Shai Brandes <shaibran@amazon.com>
> > Reviewed-by: Amit Bernstein <amitbern@amazon.com>
> > ---
> >  drivers/net/ena/hal/ena_com.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ena/hal/ena_com.c
> > b/drivers/net/ena/hal/ena_com.c index 6953a1fa33..31c37b0ab3 100644
> > --- a/drivers/net/ena/hal/ena_com.c
> > +++ b/drivers/net/ena/hal/ena_com.c
> > @@ -34,6 +34,8 @@
> >
> >  #define ENA_REGS_ADMIN_INTR_MASK 1
> >
> > +#define ENA_MAX_BACKOFF_DELAY_EXP 16U
> > +
> >  #define ENA_MIN_ADMIN_POLL_US 100
> >
> >  #define ENA_MAX_ADMIN_POLL_US 5000
> > @@ -545,8 +547,9 @@ static int ena_com_comp_status_to_errno(struct
> > ena_com_admin_queue *admin_queue,
> >
> >  static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
> > {
> > +     exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
> >       delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
> > -     delay_us = ENA_MIN32(delay_us * (1U << exp),
> ENA_MAX_ADMIN_POLL_US);
> > +     delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us * (1U
> <<
> > + exp));
> >       ENA_USLEEP(delay_us);
> >  }
> >
  
Brandes, Shai March 12, 2024, 4:53 p.m. UTC | #3
> -----Original Message-----
> From: Brandes, Shai
> Sent: Sunday, March 10, 2024 4:54 PM
> To: 'Ferruh Yigit' <ferruh.yigit@amd.com>
> Cc: dev@dpdk.org
> Subject: RE: [EXTERNAL] [PATCH v3 08/33] net/ena/hal: exponential backoff
> exp limit
> 
> 
> 
> > -----Original Message-----
> > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > Sent: Friday, March 8, 2024 7:24 PM
> > To: Brandes, Shai <shaibran@amazon.com>
> > Cc: dev@dpdk.org
> > Subject: RE: [EXTERNAL] [PATCH v3 08/33] net/ena/hal: exponential
> > backoff exp limit
> >
> > CAUTION: This email originated from outside of the organization. Do
> > not click links or open attachments unless you can confirm the sender
> > and know the content is safe.
> >
> >
> >
> > On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> > > From: Shai Brandes <shaibran@amazon.com>
> > >
> > > limits the exponent in the exponential backoff mechanism in order to
> > > avoid the value overflowing.
> > >
> >
> > Is this a fix?
[Brandes, Shai] No, this is originated from a backport from the Linux community to our HAL.
The backoff mechanism is used to delay device reset, command completion checks, etc.
The backoff eventually could cause the delay to become excessive (1<<32).
So, this patch cap the backoff value of the exponent used for this backoff at (1<<16).
In addition, for uniformity and readability purposes, the min/max parameter
in the calls of ENA_MIN32 and ENA_MAX32 macros was changed to be first.


> >
> > What was the impact of the overflowing if not limited? And is there a
> > significance of the value 16, can you please elaborate?
> >
> [Brandes, Shai] I will restructure this patch, since this likely hides a fix in hal.
> It is originated from the HAL release, from which I took the patches one by
> one, but the commit messages there tend to be (too) concise.
> 
> >
> > Also let me remind the patch subject format, (this may look
> > insignificant but helps to have more unified commit messages for
> > developers, and if not updated by author, maintainers update it and
> > this brings more overhead to
> > maintainers):
> > "sub-module: verb object"
> >
> > And we use verb 'fix' explicitly for all commits fixing something, and
> > that something can't be referenced as 'error', 'failure', 'issue',
> > 'problem', etc... but it should be detailed.
> >
> > Most of the times better to document NOT from driver internal
> > perspective, but impact of it, like "net/ena: set chain limit to 16"
> > is NOT a good one, it explains driver internal perspective (making all
> > up) but it can be something
> > like:
> > "net/ena: support big packets by increasing link limit"
> >
> > For this one, I am not sure impact of the change so hard for me to
> > propose an alternative, but just as example it can be something like:
> > "net/ena/base: avoid collision by limiting backoff delay"
> >
> > > Signed-off-by: Shai Brandes <shaibran@amazon.com>
> > > Reviewed-by: Amit Bernstein <amitbern@amazon.com>
> > > ---
> > >  drivers/net/ena/hal/ena_com.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ena/hal/ena_com.c
> > > b/drivers/net/ena/hal/ena_com.c index 6953a1fa33..31c37b0ab3 100644
> > > --- a/drivers/net/ena/hal/ena_com.c
> > > +++ b/drivers/net/ena/hal/ena_com.c
> > > @@ -34,6 +34,8 @@
> > >
> > >  #define ENA_REGS_ADMIN_INTR_MASK 1
> > >
> > > +#define ENA_MAX_BACKOFF_DELAY_EXP 16U
> > > +
> > >  #define ENA_MIN_ADMIN_POLL_US 100
> > >
> > >  #define ENA_MAX_ADMIN_POLL_US 5000
> > > @@ -545,8 +547,9 @@ static int ena_com_comp_status_to_errno(struct
> > > ena_com_admin_queue *admin_queue,
> > >
> > >  static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
> > > {
> > > +     exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
> > >       delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
> > > -     delay_us = ENA_MIN32(delay_us * (1U << exp),
> > ENA_MAX_ADMIN_POLL_US);
> > > +     delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us *
> (1U
> > <<
> > > + exp));
> > >       ENA_USLEEP(delay_us);
> > >  }
> > >
  
Ferruh Yigit March 13, 2024, 11:25 a.m. UTC | #4
On 3/12/2024 4:53 PM, Brandes, Shai wrote:
> 
> 
>> -----Original Message-----
>> From: Brandes, Shai
>> Sent: Sunday, March 10, 2024 4:54 PM
>> To: 'Ferruh Yigit' <ferruh.yigit@amd.com>
>> Cc: dev@dpdk.org
>> Subject: RE: [EXTERNAL] [PATCH v3 08/33] net/ena/hal: exponential backoff
>> exp limit
>>
>>
>>
>>> -----Original Message-----
>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>> Sent: Friday, March 8, 2024 7:24 PM
>>> To: Brandes, Shai <shaibran@amazon.com>
>>> Cc: dev@dpdk.org
>>> Subject: RE: [EXTERNAL] [PATCH v3 08/33] net/ena/hal: exponential
>>> backoff exp limit
>>>
>>> CAUTION: This email originated from outside of the organization. Do
>>> not click links or open attachments unless you can confirm the sender
>>> and know the content is safe.
>>>
>>>
>>>
>>> On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
>>>> From: Shai Brandes <shaibran@amazon.com>
>>>>
>>>> limits the exponent in the exponential backoff mechanism in order to
>>>> avoid the value overflowing.
>>>>
>>>
>>> Is this a fix?
> [Brandes, Shai] No, this is originated from a backport from the Linux community to our HAL.
> The backoff mechanism is used to delay device reset, command completion checks, etc.
> The backoff eventually could cause the delay to become excessive (1<<32).
> So, this patch cap the backoff value of the exponent used for this backoff at (1<<16).
> In addition, for uniformity and readability purposes, the min/max parameter
> in the calls of ENA_MIN32 and ENA_MAX32 macros was changed to be first.
> 

I think this fixes the device reset.
Without cap in the backoff value, delay can be too long (depending input
to the function) so that can't reset the device.

But anyway, thanks for the clarification.

> 
>>>
>>> What was the impact of the overflowing if not limited? And is there a
>>> significance of the value 16, can you please elaborate?
>>>
>> [Brandes, Shai] I will restructure this patch, since this likely hides a fix in hal.
>> It is originated from the HAL release, from which I took the patches one by
>> one, but the commit messages there tend to be (too) concise.
>>
>>>
>>> Also let me remind the patch subject format, (this may look
>>> insignificant but helps to have more unified commit messages for
>>> developers, and if not updated by author, maintainers update it and
>>> this brings more overhead to
>>> maintainers):
>>> "sub-module: verb object"
>>>
>>> And we use verb 'fix' explicitly for all commits fixing something, and
>>> that something can't be referenced as 'error', 'failure', 'issue',
>>> 'problem', etc... but it should be detailed.
>>>
>>> Most of the times better to document NOT from driver internal
>>> perspective, but impact of it, like "net/ena: set chain limit to 16"
>>> is NOT a good one, it explains driver internal perspective (making all
>>> up) but it can be something
>>> like:
>>> "net/ena: support big packets by increasing link limit"
>>>
>>> For this one, I am not sure impact of the change so hard for me to
>>> propose an alternative, but just as example it can be something like:
>>> "net/ena/base: avoid collision by limiting backoff delay"
>>>
>>>> Signed-off-by: Shai Brandes <shaibran@amazon.com>
>>>> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
>>>> ---
>>>>  drivers/net/ena/hal/ena_com.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ena/hal/ena_com.c
>>>> b/drivers/net/ena/hal/ena_com.c index 6953a1fa33..31c37b0ab3 100644
>>>> --- a/drivers/net/ena/hal/ena_com.c
>>>> +++ b/drivers/net/ena/hal/ena_com.c
>>>> @@ -34,6 +34,8 @@
>>>>
>>>>  #define ENA_REGS_ADMIN_INTR_MASK 1
>>>>
>>>> +#define ENA_MAX_BACKOFF_DELAY_EXP 16U
>>>> +
>>>>  #define ENA_MIN_ADMIN_POLL_US 100
>>>>
>>>>  #define ENA_MAX_ADMIN_POLL_US 5000
>>>> @@ -545,8 +547,9 @@ static int ena_com_comp_status_to_errno(struct
>>>> ena_com_admin_queue *admin_queue,
>>>>
>>>>  static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
>>>> {
>>>> +     exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
>>>>       delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
>>>> -     delay_us = ENA_MIN32(delay_us * (1U << exp),
>>> ENA_MAX_ADMIN_POLL_US);
>>>> +     delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us *
>> (1U
>>> <<
>>>> + exp));
>>>>       ENA_USLEEP(delay_us);
>>>>  }
>>>>
>
  

Patch

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 6953a1fa33..31c37b0ab3 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -34,6 +34,8 @@ 
 
 #define ENA_REGS_ADMIN_INTR_MASK 1
 
+#define ENA_MAX_BACKOFF_DELAY_EXP 16U
+
 #define ENA_MIN_ADMIN_POLL_US 100
 
 #define ENA_MAX_ADMIN_POLL_US 5000
@@ -545,8 +547,9 @@  static int ena_com_comp_status_to_errno(struct ena_com_admin_queue *admin_queue,
 
 static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
 {
+	exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
 	delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
-	delay_us = ENA_MIN32(delay_us * (1U << exp), ENA_MAX_ADMIN_POLL_US);
+	delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us * (1U << exp));
 	ENA_USLEEP(delay_us);
 }