net/ena: Fix admin cq polling for 32-bit apps

Message ID 20190529210139.26766-1-dharton@cisco.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series net/ena: Fix admin cq polling for 32-bit apps |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues
ci/intel-Performance-Testing success Performance Testing PASS
ci/mellanox-Performance-Testing success Performance Testing PASS

Commit Message

David Harton May 29, 2019, 9:01 p.m. UTC
  Recent modifications to admin command queue polling logic
did not support 32-bit applications.  Updated the driver to
work for 32 or 64 bit applications as well as avoiding
roll-over possibility.

Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")

Signed-off-by: David Harton <dharton@cisco.com>
---
 drivers/net/ena/base/ena_com.c       | 10 +++++++---
 drivers/net/ena/base/ena_plat_dpdk.h |  6 +-----
 2 files changed, 8 insertions(+), 8 deletions(-)
  

Comments

Ferruh Yigit June 27, 2019, 3:37 p.m. UTC | #1
On 5/29/2019 10:01 PM, David Harton wrote:
> Recent modifications to admin command queue polling logic
> did not support 32-bit applications.  Updated the driver to
> work for 32 or 64 bit applications as well as avoiding
> roll-over possibility.
> 
> Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
> 
> Signed-off-by: David Harton <dharton@cisco.com>

Reminder of this patch, if there is not objection or review it will be merged.

Thanks,
ferruh
  
Michal Krawczyk June 28, 2019, 3:03 p.m. UTC | #2
Hi,

sorry for the late reply.

śr., 29 maj 2019 o 23:01 David Harton <dharton@cisco.com> napisał(a):
>
> Recent modifications to admin command queue polling logic
> did not support 32-bit applications.  Updated the driver to
> work for 32 or 64 bit applications as well as avoiding
> roll-over possibility.
>
> Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
>
> Signed-off-by: David Harton <dharton@cisco.com>
> ---
>  drivers/net/ena/base/ena_com.c       | 10 +++++++---
>  drivers/net/ena/base/ena_plat_dpdk.h |  6 +-----
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
> index b688067f7..b96adde3c 100644
> --- a/drivers/net/ena/base/ena_com.c
> +++ b/drivers/net/ena/base/ena_com.c
> @@ -547,10 +547,13 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
>                                                      struct ena_com_admin_queue *admin_queue)
>  {
>         unsigned long flags = 0;
> -       unsigned long timeout;
> +       u32 timeout_ms;
>         int ret;
>
> -       timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue->completion_timeout);
> +       /* Calculate ms granularity timeout from us completion_timeout
> +        * making sure we retry once if we have at least 1ms
> +        */
> +       timeout_ms = (admin_queue->completion_timeout / 1000) + (ENA_POLL_MS - 1);
>
>         while (1) {
>                  ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
> @@ -560,7 +563,7 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
>                  if (comp_ctx->status != ENA_CMD_SUBMITTED)
>                         break;
>
> -               if (ENA_TIME_EXPIRE(timeout)) {
> +               if (timeout_ms < ENA_POLL_MS) {
>                         ena_trc_err("Wait for completion (polling) timeout\n");
>                         /* ENA didn't have any completion */
>                         ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
> @@ -573,6 +576,7 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
>                 }
>
>                 ENA_MSLEEP(ENA_POLL_MS);
> +               timeout_ms -= ENA_POLL_MS;

This part can be problematic at the very overloaded systems - in that
case the ENA_MSLEEP can take a much longer than ENA_POLL_MS and in
this situation the time spent in this function can't be determined.
That's why we were checking time spent in sleep every ENA_TIME_EXPIRE
macro.
The issue can be observed especially in the kernel drivers, and
ena_com is common file for all ENA drivers.
  
David Harton June 28, 2019, 3:46 p.m. UTC | #3
> -----Original Message-----
> From: Michał Krawczyk <mk@semihalf.com>
> Sent: Friday, June 28, 2019 11:03 AM
> To: David Harton (dharton) <dharton@cisco.com>
> Cc: dev@dpdk.org; Marcin Wojtas <mw@semihalf.com>; Tzalik, Guy
> <gtzalik@amazon.com>; Schmeilin, Evgeny <evgenys@amazon.com>
> Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> 
> Hi,
> 
> sorry for the late reply.
> 
> śr., 29 maj 2019 o 23:01 David Harton <dharton@cisco.com> napisał(a):
> >
> > Recent modifications to admin command queue polling logic did not
> > support 32-bit applications.  Updated the driver to work for 32 or 64
> > bit applications as well as avoiding roll-over possibility.
> >
> > Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
> >
> > Signed-off-by: David Harton <dharton@cisco.com>
> > ---
> >  drivers/net/ena/base/ena_com.c       | 10 +++++++---
> >  drivers/net/ena/base/ena_plat_dpdk.h |  6 +-----
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ena/base/ena_com.c
> > b/drivers/net/ena/base/ena_com.c index b688067f7..b96adde3c 100644
> > --- a/drivers/net/ena/base/ena_com.c
> > +++ b/drivers/net/ena/base/ena_com.c
> > @@ -547,10 +547,13 @@ static int
> ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
> >                                                      struct
> > ena_com_admin_queue *admin_queue)  {
> >         unsigned long flags = 0;
> > -       unsigned long timeout;
> > +       u32 timeout_ms;
> >         int ret;
> >
> > -       timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue-
> >completion_timeout);
> > +       /* Calculate ms granularity timeout from us completion_timeout
> > +        * making sure we retry once if we have at least 1ms
> > +        */
> > +       timeout_ms = (admin_queue->completion_timeout / 1000) +
> > + (ENA_POLL_MS - 1);
> >
> >         while (1) {
> >                  ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); @@
> > -560,7 +563,7 @@ static int
> ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
> >                  if (comp_ctx->status != ENA_CMD_SUBMITTED)
> >                         break;
> >
> > -               if (ENA_TIME_EXPIRE(timeout)) {
> > +               if (timeout_ms < ENA_POLL_MS) {
> >                         ena_trc_err("Wait for completion (polling)
> timeout\n");
> >                         /* ENA didn't have any completion */
> >                         ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
> > @@ -573,6 +576,7 @@ static int
> ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
> >                 }
> >
> >                 ENA_MSLEEP(ENA_POLL_MS);
> > +               timeout_ms -= ENA_POLL_MS;
> 
> This part can be problematic at the very overloaded systems - in that case
> the ENA_MSLEEP can take a much longer than ENA_POLL_MS and in this
> situation the time spent in this function can't be determined.
> That's why we were checking time spent in sleep every ENA_TIME_EXPIRE
> macro.
> The issue can be observed especially in the kernel drivers, and ena_com is
> common file for all ENA drivers.

I don't understand the comment/concern.

The previous macros calculate the future cycle count based on a us timeout value (assuming 64 bit apps) and repeat the loop until the command is "submitted" or the current cycle count is greater than the calculated cycle count value sleeping ENA_POLL_MS between each iteration.

The new method accomplishes the same thing but instead of using a "cycle count" it uses the number of ms which the poll and sleep actions are based upon.

The differences with the new method are:
 - it uses less instructions
 - not susceptible to cycle count overrun (admittedly highyl unlikely)
 - (most importantly) works equally well for 32 or 64 bit apps 

Can you elaborate on your concern?

Thanks,
Dave
  
Michal Krawczyk July 1, 2019, 7:24 a.m. UTC | #4
+ folks responsible for ENA on other platforms as this code touches
every ENA target

pt., 28 cze 2019 o 17:46 David Harton (dharton) <dharton@cisco.com> napisał(a):
>
>
>
> > -----Original Message-----
> > From: Michał Krawczyk <mk@semihalf.com>
> > Sent: Friday, June 28, 2019 11:03 AM
> > To: David Harton (dharton) <dharton@cisco.com>
> > Cc: dev@dpdk.org; Marcin Wojtas <mw@semihalf.com>; Tzalik, Guy
> > <gtzalik@amazon.com>; Schmeilin, Evgeny <evgenys@amazon.com>
> > Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> >
> > Hi,
> >
> > sorry for the late reply.
> >
> > śr., 29 maj 2019 o 23:01 David Harton <dharton@cisco.com> napisał(a):
> > >
> > > Recent modifications to admin command queue polling logic did not
> > > support 32-bit applications.  Updated the driver to work for 32 or 64
> > > bit applications as well as avoiding roll-over possibility.
> > >
> > > Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
> > >
> > > Signed-off-by: David Harton <dharton@cisco.com>
> > > ---
> > >  drivers/net/ena/base/ena_com.c       | 10 +++++++---
> > >  drivers/net/ena/base/ena_plat_dpdk.h |  6 +-----
> > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/net/ena/base/ena_com.c
> > > b/drivers/net/ena/base/ena_com.c index b688067f7..b96adde3c 100644
> > > --- a/drivers/net/ena/base/ena_com.c
> > > +++ b/drivers/net/ena/base/ena_com.c
> > > @@ -547,10 +547,13 @@ static int
> > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
> > >                                                      struct
> > > ena_com_admin_queue *admin_queue)  {
> > >         unsigned long flags = 0;
> > > -       unsigned long timeout;
> > > +       u32 timeout_ms;
> > >         int ret;
> > >
> > > -       timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue-
> > >completion_timeout);
> > > +       /* Calculate ms granularity timeout from us completion_timeout
> > > +        * making sure we retry once if we have at least 1ms
> > > +        */
> > > +       timeout_ms = (admin_queue->completion_timeout / 1000) +
> > > + (ENA_POLL_MS - 1);
> > >
> > >         while (1) {
> > >                  ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); @@
> > > -560,7 +563,7 @@ static int
> > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
> > >                  if (comp_ctx->status != ENA_CMD_SUBMITTED)
> > >                         break;
> > >
> > > -               if (ENA_TIME_EXPIRE(timeout)) {
> > > +               if (timeout_ms < ENA_POLL_MS) {
> > >                         ena_trc_err("Wait for completion (polling)
> > timeout\n");
> > >                         /* ENA didn't have any completion */
> > >                         ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
> > > @@ -573,6 +576,7 @@ static int
> > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
> > >                 }
> > >
> > >                 ENA_MSLEEP(ENA_POLL_MS);
> > > +               timeout_ms -= ENA_POLL_MS;
> >
> > This part can be problematic at the very overloaded systems - in that case
> > the ENA_MSLEEP can take a much longer than ENA_POLL_MS and in this
> > situation the time spent in this function can't be determined.
> > That's why we were checking time spent in sleep every ENA_TIME_EXPIRE
> > macro.
> > The issue can be observed especially in the kernel drivers, and ena_com is
> > common file for all ENA drivers.
>
> I don't understand the comment/concern.
>
> The previous macros calculate the future cycle count based on a us timeout value (assuming 64 bit apps) and repeat the loop until the command is "submitted" or the current cycle count is greater than the calculated cycle count value sleeping ENA_POLL_MS between each iteration.
>
>
> The new method accomplishes the same thing but instead of using a "cycle count" it uses the number of ms which the poll and sleep actions are based upon.
>
> The differences with the new method are:
>  - it uses less instructions
>  - not susceptible to cycle count overrun (admittedly highyl unlikely)
>  - (most importantly) works equally well for 32 or 64 bit apps
>
> Can you elaborate on your concern?

The problem with this solution is that you are assuming that
ENA_MSLEEP will always sleep for ENA_POLL_MS which is not true. It can
sleep much more in busy systems.
The behavior of this function before your changes is minimizing that
time by getting current cycles in the ENA_TIME_EXPIRE. In the above
solution, we can not determine how much time we've sleepped. It could
be ENA_POLL_MS or even 10 second.

Thanks,
Michal

> Thanks,
> Dave
>
  
David Harton July 1, 2019, noon UTC | #5
> -----Original Message-----
> From: Michał Krawczyk <mk@semihalf.com>
> Sent: Monday, July 01, 2019 3:24 AM
> To: David Harton (dharton) <dharton@cisco.com>
> Cc: dev@dpdk.org; Marcin Wojtas <mw@semihalf.com>; Tzalik, Guy
> <gtzalik@amazon.com>; Schmeilin, Evgeny <evgenys@amazon.com>; Belgazal,
> Netanel <netanel@amazon.com>; Kiyanovski, Arthur <akiyano@amazon.com>;
> Chauskin, Igor <igorch@amazon.com>; Matushevsky, Alexander
> <matua@amazon.com>; sameehj@amazon.com
> Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> 
> + folks responsible for ENA on other platforms as this code touches
> every ENA target
> 
> pt., 28 cze 2019 o 17:46 David Harton (dharton) <dharton@cisco.com>
> napisał(a):
> >
> >
> >
> > > -----Original Message-----
> > > From: Michał Krawczyk <mk@semihalf.com>
> > > Sent: Friday, June 28, 2019 11:03 AM
> > > To: David Harton (dharton) <dharton@cisco.com>
> > > Cc: dev@dpdk.org; Marcin Wojtas <mw@semihalf.com>; Tzalik, Guy
> > > <gtzalik@amazon.com>; Schmeilin, Evgeny <evgenys@amazon.com>
> > > Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> > >
> > > Hi,
> > >
> > > sorry for the late reply.
> > >
> > > śr., 29 maj 2019 o 23:01 David Harton <dharton@cisco.com> napisał(a):
> > > >
> > > > Recent modifications to admin command queue polling logic did not
> > > > support 32-bit applications.  Updated the driver to work for 32 or
> > > > 64 bit applications as well as avoiding roll-over possibility.
> > > >
> > > > Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
> > > >
> > > > Signed-off-by: David Harton <dharton@cisco.com>
> > > > ---
> > > >  drivers/net/ena/base/ena_com.c       | 10 +++++++---
> > > >  drivers/net/ena/base/ena_plat_dpdk.h |  6 +-----
> > > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ena/base/ena_com.c
> > > > b/drivers/net/ena/base/ena_com.c index b688067f7..b96adde3c 100644
> > > > --- a/drivers/net/ena/base/ena_com.c
> > > > +++ b/drivers/net/ena/base/ena_com.c
> > > > @@ -547,10 +547,13 @@ static int
> > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > *comp_c
> > > >                                                      struct
> > > > ena_com_admin_queue *admin_queue)  {
> > > >         unsigned long flags = 0;
> > > > -       unsigned long timeout;
> > > > +       u32 timeout_ms;
> > > >         int ret;
> > > >
> > > > -       timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue-
> > > >completion_timeout);
> > > > +       /* Calculate ms granularity timeout from us
> completion_timeout
> > > > +        * making sure we retry once if we have at least 1ms
> > > > +        */
> > > > +       timeout_ms = (admin_queue->completion_timeout / 1000) +
> > > > + (ENA_POLL_MS - 1);
> > > >
> > > >         while (1) {
> > > >                  ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); @@
> > > > -560,7 +563,7 @@ static int
> > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > *comp_c
> > > >                  if (comp_ctx->status != ENA_CMD_SUBMITTED)
> > > >                         break;
> > > >
> > > > -               if (ENA_TIME_EXPIRE(timeout)) {
> > > > +               if (timeout_ms < ENA_POLL_MS) {
> > > >                         ena_trc_err("Wait for completion (polling)
> > > timeout\n");
> > > >                         /* ENA didn't have any completion */
> > > >                         ENA_SPINLOCK_LOCK(admin_queue->q_lock,
> > > > flags); @@ -573,6 +576,7 @@ static int
> > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > *comp_c
> > > >                 }
> > > >
> > > >                 ENA_MSLEEP(ENA_POLL_MS);
> > > > +               timeout_ms -= ENA_POLL_MS;
> > >
> > > This part can be problematic at the very overloaded systems - in
> > > that case the ENA_MSLEEP can take a much longer than ENA_POLL_MS and
> > > in this situation the time spent in this function can't be determined.
> > > That's why we were checking time spent in sleep every
> > > ENA_TIME_EXPIRE macro.
> > > The issue can be observed especially in the kernel drivers, and
> > > ena_com is common file for all ENA drivers.
> >
> > I don't understand the comment/concern.
> >
> > The previous macros calculate the future cycle count based on a us
> timeout value (assuming 64 bit apps) and repeat the loop until the command
> is "submitted" or the current cycle count is greater than the calculated
> cycle count value sleeping ENA_POLL_MS between each iteration.
> >
> >
> > The new method accomplishes the same thing but instead of using a "cycle
> count" it uses the number of ms which the poll and sleep actions are based
> upon.
> >
> > The differences with the new method are:
> >  - it uses less instructions
> >  - not susceptible to cycle count overrun (admittedly highyl unlikely)
> >  - (most importantly) works equally well for 32 or 64 bit apps
> >
> > Can you elaborate on your concern?
> 
> The problem with this solution is that you are assuming that ENA_MSLEEP
> will always sleep for ENA_POLL_MS which is not true. It can sleep much
> more in busy systems.
> The behavior of this function before your changes is minimizing that time
> by getting current cycles in the ENA_TIME_EXPIRE. In the above solution,
> we can not determine how much time we've sleepped. It could be ENA_POLL_MS
> or even 10 second.

Thanks, I understand your concern now.  It's true that what I added is much more coarse than what was there because any call to rte_delay_ms() is only guaranteed to wait at lease those ms and can wait longer.
 
Kinda scary to think though I ask to wait say 10ms and the context be switched out for 10s.  Not really a high-performing or low latency system in that case and if so I'm not sure I see the harm waiting for the coarse amount of time either.

However, if new approach is truly not desired then the original approach can preserved if it uses uint64_t to track clock cycles instead of the architecture dependent type currently used.

Regards,
Dave

> 
> Thanks,
> Michal
> 
> > Thanks,
> > Dave
> >
  
Michal Krawczyk July 1, 2019, 12:52 p.m. UTC | #6
pon., 1 lip 2019 o 14:01 David Harton (dharton) <dharton@cisco.com> napisał(a):
>
>
>
> > -----Original Message-----
> > From: Michał Krawczyk <mk@semihalf.com>
> > Sent: Monday, July 01, 2019 3:24 AM
> > To: David Harton (dharton) <dharton@cisco.com>
> > Cc: dev@dpdk.org; Marcin Wojtas <mw@semihalf.com>; Tzalik, Guy
> > <gtzalik@amazon.com>; Schmeilin, Evgeny <evgenys@amazon.com>; Belgazal,
> > Netanel <netanel@amazon.com>; Kiyanovski, Arthur <akiyano@amazon.com>;
> > Chauskin, Igor <igorch@amazon.com>; Matushevsky, Alexander
> > <matua@amazon.com>; sameehj@amazon.com
> > Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> >
> > + folks responsible for ENA on other platforms as this code touches
> > every ENA target
> >
> > pt., 28 cze 2019 o 17:46 David Harton (dharton) <dharton@cisco.com>
> > napisał(a):
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Michał Krawczyk <mk@semihalf.com>
> > > > Sent: Friday, June 28, 2019 11:03 AM
> > > > To: David Harton (dharton) <dharton@cisco.com>
> > > > Cc: dev@dpdk.org; Marcin Wojtas <mw@semihalf.com>; Tzalik, Guy
> > > > <gtzalik@amazon.com>; Schmeilin, Evgeny <evgenys@amazon.com>
> > > > Subject: Re: [PATCH] net/ena: Fix admin cq polling for 32-bit apps
> > > >
> > > > Hi,
> > > >
> > > > sorry for the late reply.
> > > >
> > > > śr., 29 maj 2019 o 23:01 David Harton <dharton@cisco.com> napisał(a):
> > > > >
> > > > > Recent modifications to admin command queue polling logic did not
> > > > > support 32-bit applications.  Updated the driver to work for 32 or
> > > > > 64 bit applications as well as avoiding roll-over possibility.
> > > > >
> > > > > Fixes: 3adcba9a89 ("net/ena: update HAL to the newer version")
> > > > >
> > > > > Signed-off-by: David Harton <dharton@cisco.com>
> > > > > ---
> > > > >  drivers/net/ena/base/ena_com.c       | 10 +++++++---
> > > > >  drivers/net/ena/base/ena_plat_dpdk.h |  6 +-----
> > > > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ena/base/ena_com.c
> > > > > b/drivers/net/ena/base/ena_com.c index b688067f7..b96adde3c 100644
> > > > > --- a/drivers/net/ena/base/ena_com.c
> > > > > +++ b/drivers/net/ena/base/ena_com.c
> > > > > @@ -547,10 +547,13 @@ static int
> > > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > > *comp_c
> > > > >                                                      struct
> > > > > ena_com_admin_queue *admin_queue)  {
> > > > >         unsigned long flags = 0;
> > > > > -       unsigned long timeout;
> > > > > +       u32 timeout_ms;
> > > > >         int ret;
> > > > >
> > > > > -       timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue-
> > > > >completion_timeout);
> > > > > +       /* Calculate ms granularity timeout from us
> > completion_timeout
> > > > > +        * making sure we retry once if we have at least 1ms
> > > > > +        */
> > > > > +       timeout_ms = (admin_queue->completion_timeout / 1000) +
> > > > > + (ENA_POLL_MS - 1);
> > > > >
> > > > >         while (1) {
> > > > >                  ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); @@
> > > > > -560,7 +563,7 @@ static int
> > > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > > *comp_c
> > > > >                  if (comp_ctx->status != ENA_CMD_SUBMITTED)
> > > > >                         break;
> > > > >
> > > > > -               if (ENA_TIME_EXPIRE(timeout)) {
> > > > > +               if (timeout_ms < ENA_POLL_MS) {
> > > > >                         ena_trc_err("Wait for completion (polling)
> > > > timeout\n");
> > > > >                         /* ENA didn't have any completion */
> > > > >                         ENA_SPINLOCK_LOCK(admin_queue->q_lock,
> > > > > flags); @@ -573,6 +576,7 @@ static int
> > > > ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx
> > > > *comp_c
> > > > >                 }
> > > > >
> > > > >                 ENA_MSLEEP(ENA_POLL_MS);
> > > > > +               timeout_ms -= ENA_POLL_MS;
> > > >
> > > > This part can be problematic at the very overloaded systems - in
> > > > that case the ENA_MSLEEP can take a much longer than ENA_POLL_MS and
> > > > in this situation the time spent in this function can't be determined.
> > > > That's why we were checking time spent in sleep every
> > > > ENA_TIME_EXPIRE macro.
> > > > The issue can be observed especially in the kernel drivers, and
> > > > ena_com is common file for all ENA drivers.
> > >
> > > I don't understand the comment/concern.
> > >
> > > The previous macros calculate the future cycle count based on a us
> > timeout value (assuming 64 bit apps) and repeat the loop until the command
> > is "submitted" or the current cycle count is greater than the calculated
> > cycle count value sleeping ENA_POLL_MS between each iteration.
> > >
> > >
> > > The new method accomplishes the same thing but instead of using a "cycle
> > count" it uses the number of ms which the poll and sleep actions are based
> > upon.
> > >
> > > The differences with the new method are:
> > >  - it uses less instructions
> > >  - not susceptible to cycle count overrun (admittedly highyl unlikely)
> > >  - (most importantly) works equally well for 32 or 64 bit apps
> > >
> > > Can you elaborate on your concern?
> >
> > The problem with this solution is that you are assuming that ENA_MSLEEP
> > will always sleep for ENA_POLL_MS which is not true. It can sleep much
> > more in busy systems.
> > The behavior of this function before your changes is minimizing that time
> > by getting current cycles in the ENA_TIME_EXPIRE. In the above solution,
> > we can not determine how much time we've sleepped. It could be ENA_POLL_MS
> > or even 10 second.
>
> Thanks, I understand your concern now.  It's true that what I added is much more coarse than what was there because any call to rte_delay_ms() is only guaranteed to wait at lease those ms and can wait longer.
>
> Kinda scary to think though I ask to wait say 10ms and the context be switched out for 10s.  Not really a high-performing or low latency system in that case and if so I'm not sure I see the harm waiting for the coarse amount of time either.
>
> However, if new approach is truly not desired then the original approach can preserved if it uses uint64_t to track clock cycles instead of the architecture dependent type currently used.

Using uint64_t instead of unsigned long is fine for me. That shouldn't
affect other platforms as much.

But let's hear from others if they are fine with that kind of change.

Thanks,
Michal

>
> Regards,
> Dave
>
> >
> > Thanks,
> > Michal
> >
> > > Thanks,
> > > Dave
> > >
  

Patch

diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
index b688067f7..b96adde3c 100644
--- a/drivers/net/ena/base/ena_com.c
+++ b/drivers/net/ena/base/ena_com.c
@@ -547,10 +547,13 @@  static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
 						     struct ena_com_admin_queue *admin_queue)
 {
 	unsigned long flags = 0;
-	unsigned long timeout;
+	u32 timeout_ms;
 	int ret;
 
-	timeout = ENA_GET_SYSTEM_TIMEOUT(admin_queue->completion_timeout);
+	/* Calculate ms granularity timeout from us completion_timeout
+	 * making sure we retry once if we have at least 1ms
+	 */
+	timeout_ms = (admin_queue->completion_timeout / 1000) + (ENA_POLL_MS - 1);
 
 	while (1) {
                 ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
@@ -560,7 +563,7 @@  static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
                 if (comp_ctx->status != ENA_CMD_SUBMITTED)
 			break;
 
-		if (ENA_TIME_EXPIRE(timeout)) {
+		if (timeout_ms < ENA_POLL_MS) {
 			ena_trc_err("Wait for completion (polling) timeout\n");
 			/* ENA didn't have any completion */
 			ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
@@ -573,6 +576,7 @@  static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
 		}
 
 		ENA_MSLEEP(ENA_POLL_MS);
+		timeout_ms -= ENA_POLL_MS;
 	}
 
 	if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/base/ena_plat_dpdk.h
index 902d91efb..0cb837e4f 100644
--- a/drivers/net/ena/base/ena_plat_dpdk.h
+++ b/drivers/net/ena/base/ena_plat_dpdk.h
@@ -178,7 +178,7 @@  do {                                                                   \
 	do {								\
 		struct timespec wait;					\
 		struct timeval now;					\
-		unsigned long timeout_us;				\
+		uint64_t timeout_us;					\
 		gettimeofday(&now, NULL);				\
 		wait.tv_sec = now.tv_sec + timeout / 1000000UL;		\
 		timeout_us = timeout % 1000000UL;			\
@@ -196,10 +196,6 @@  do {                                                                   \
 #define ena_wait_event_t ena_wait_queue_t
 #define ENA_MIGHT_SLEEP()
 
-#define ENA_TIME_EXPIRE(timeout)  (timeout < rte_get_timer_cycles())
-#define ENA_GET_SYSTEM_TIMEOUT(timeout_us)                             \
-       (timeout_us * rte_get_timer_hz() / 1000000 + rte_get_timer_cycles())
-
 /*
  * Each rte_memzone should have unique name.
  * To satisfy it, count number of allocations and add it to name.