mbox series

[v10,0/9] Add PMD power mgmt

Message ID 1603810749-22285-1-git-send-email-liang.j.ma@intel.com (mailing list archive)
Headers
Series Add PMD power mgmt |

Message

Liang, Ma Oct. 27, 2020, 2:59 p.m. UTC
  This patchset proposes a simple API for Ethernet drivers
to cause the CPU to enter a power-optimized state while
waiting for packets to arrive, along with a set of
generic intrinsics that facilitate that. This is achieved
through cooperation with the NIC driver that will allow
us to know address of wake up event, and wait for writes
on it.

On IA, this is achieved through using UMONITOR/UMWAIT
instructions. They are used in their raw opcode form
because there is no widespread compiler support for
them yet. Still, the API is made generic enough to
hopefully support other architectures, if they happen
to implement similar instructions.

To achieve power savings, there is a very simple mechanism
used: we're counting empty polls, and if a certain threshold
is reached, we get the address of next RX ring descriptor
from the NIC driver, arm the monitoring hardware, and
enter a power-optimized state. We will then wake up when
either a timeout happens, or a write happens (or generally
whenever CPU feels like waking up - this is platform-
specific), and proceed as normal. The empty poll counter is
reset whenever we actually get packets, so we only go to
sleep when we know nothing is going on. The mechanism is
generic which can be used for any write back descriptor.

Why are we putting it into ethdev as opposed to leaving
this up to the application? Our customers specifically
requested a way to do it wit minimal changes to the
application code. The current approach allows to just
flip a switch and automatically have power savings.

- Only 1:1 core to queue mapping is supported,
  meaning that each lcore must at most handle RX on a
  single queue
- Support 3 type policies. UMWAIT/PAUSE/Frequency_Scale
- Power management is enabled per-queue
- The API doesn't extend to other device types

Liang Ma (9):
  eal: add new x86 cpuid support for WAITPKG
  eal: add power management intrinsics
  eal: add intrinsics support check infrastructure
  ethdev: add simple power management API
  power: add PMD power management API and callback
  net/ixgbe: implement power management API
  net/i40e: implement power management API
  net/ice: implement power management API
  examples/l3fwd-power: enable PMD power mgmt

 doc/guides/prog_guide/power_man.rst           |  48 +++
 doc/guides/rel_notes/release_20_11.rst        |  15 +
 .../sample_app_ug/l3_forward_power_man.rst    |  13 +
 drivers/net/i40e/i40e_ethdev.c                |   1 +
 drivers/net/i40e/i40e_rxtx.c                  |  26 ++
 drivers/net/i40e/i40e_rxtx.h                  |   2 +
 drivers/net/ice/ice_ethdev.c                  |   1 +
 drivers/net/ice/ice_rxtx.c                    |  26 ++
 drivers/net/ice/ice_rxtx.h                    |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
 drivers/net/ixgbe/ixgbe_rxtx.c                |  25 ++
 drivers/net/ixgbe/ixgbe_rxtx.h                |   2 +
 examples/l3fwd-power/main.c                   |  46 ++-
 lib/librte_eal/arm/include/meson.build        |   1 +
 .../arm/include/rte_power_intrinsics.h        |  60 ++++
 lib/librte_eal/arm/rte_cpuflags.c             |   6 +
 lib/librte_eal/include/generic/rte_cpuflags.h |  26 ++
 .../include/generic/rte_power_intrinsics.h    | 123 +++++++
 lib/librte_eal/include/meson.build            |   1 +
 lib/librte_eal/ppc/include/meson.build        |   1 +
 .../ppc/include/rte_power_intrinsics.h        |  60 ++++
 lib/librte_eal/ppc/rte_cpuflags.c             |   7 +
 lib/librte_eal/version.map                    |   1 +
 lib/librte_eal/x86/include/meson.build        |   1 +
 lib/librte_eal/x86/include/rte_cpuflags.h     |   1 +
 .../x86/include/rte_power_intrinsics.h        | 135 ++++++++
 lib/librte_eal/x86/rte_cpuflags.c             |  14 +
 lib/librte_ethdev/rte_ethdev.c                |  23 ++
 lib/librte_ethdev/rte_ethdev.h                |  33 ++
 lib/librte_ethdev/rte_ethdev_driver.h         |  28 ++
 lib/librte_ethdev/version.map                 |   1 +
 lib/librte_power/meson.build                  |   5 +-
 lib/librte_power/rte_power_pmd_mgmt.c         | 320 ++++++++++++++++++
 lib/librte_power/rte_power_pmd_mgmt.h         |  92 +++++
 lib/librte_power/version.map                  |   4 +
 35 files changed, 1148 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eal/arm/include/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/ppc/include/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
  

Comments

Thomas Monjalon Oct. 27, 2020, 4:02 p.m. UTC | #1
Please be more patient.
I asked some questions on v9 (ethdev API is not generic enough),
and you send a v10 in the same minute you reply,
without making sure I agree with your answers.
  
Ajit Khaparde Oct. 27, 2020, 8:53 p.m. UTC | #2
On Tue, Oct 27, 2020 at 7:59 AM Liang Ma <liang.j.ma@intel.com> wrote:
>
> This patchset proposes a simple API for Ethernet drivers
> to cause the CPU to enter a power-optimized state while
> waiting for packets to arrive, along with a set of
> generic intrinsics that facilitate that. This is achieved
> through cooperation with the NIC driver that will allow
> us to know address of wake up event, and wait for writes
> on it.
Is the wake event the same as ring status or interrupt status register?

So in a way the PMD is passing the address of the next ring descriptor?
So that instead of the PMD polling it, the application peeks at it and
when ready asks the PMD to actually process the packet(s)?

>
> On IA, this is achieved through using UMONITOR/UMWAIT
> instructions. They are used in their raw opcode form
> because there is no widespread compiler support for
> them yet. Still, the API is made generic enough to
> hopefully support other architectures, if they happen
> to implement similar instructions.
>
> To achieve power savings, there is a very simple mechanism
> used: we're counting empty polls, and if a certain threshold
> is reached, we get the address of next RX ring descriptor
> from the NIC driver, arm the monitoring hardware, and
> enter a power-optimized state. We will then wake up when
> either a timeout happens, or a write happens (or generally
> whenever CPU feels like waking up - this is platform-
> specific), and proceed as normal. The empty poll counter is
> reset whenever we actually get packets, so we only go to
> sleep when we know nothing is going on. The mechanism is
> generic which can be used for any write back descriptor.
>
> Why are we putting it into ethdev as opposed to leaving
> this up to the application? Our customers specifically
> requested a way to do it wit minimal changes to the
> application code. The current approach allows to just
> flip a switch and automatically have power savings.
The application still has to know address of wake up event. Right?
And then it will need the logic to count empty polls and the threshold?
This will be done by application or something else?

>
> - Only 1:1 core to queue mapping is supported,
>   meaning that each lcore must at most handle RX on a
>   single queue
> - Support 3 type policies. UMWAIT/PAUSE/Frequency_Scale
> - Power management is enabled per-queue
> - The API doesn't extend to other device types
>
> Liang Ma (9):
>   eal: add new x86 cpuid support for WAITPKG
>   eal: add power management intrinsics
>   eal: add intrinsics support check infrastructure
>   ethdev: add simple power management API
>   power: add PMD power management API and callback
>   net/ixgbe: implement power management API
>   net/i40e: implement power management API
>   net/ice: implement power management API
>   examples/l3fwd-power: enable PMD power mgmt
>
>  doc/guides/prog_guide/power_man.rst           |  48 +++
>  doc/guides/rel_notes/release_20_11.rst        |  15 +
>  .../sample_app_ug/l3_forward_power_man.rst    |  13 +
>  drivers/net/i40e/i40e_ethdev.c                |   1 +
>  drivers/net/i40e/i40e_rxtx.c                  |  26 ++
>  drivers/net/i40e/i40e_rxtx.h                  |   2 +
>  drivers/net/ice/ice_ethdev.c                  |   1 +
>  drivers/net/ice/ice_rxtx.c                    |  26 ++
>  drivers/net/ice/ice_rxtx.h                    |   2 +
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
>  drivers/net/ixgbe/ixgbe_rxtx.c                |  25 ++
>  drivers/net/ixgbe/ixgbe_rxtx.h                |   2 +
>  examples/l3fwd-power/main.c                   |  46 ++-
>  lib/librte_eal/arm/include/meson.build        |   1 +
>  .../arm/include/rte_power_intrinsics.h        |  60 ++++
>  lib/librte_eal/arm/rte_cpuflags.c             |   6 +
>  lib/librte_eal/include/generic/rte_cpuflags.h |  26 ++
>  .../include/generic/rte_power_intrinsics.h    | 123 +++++++
>  lib/librte_eal/include/meson.build            |   1 +
>  lib/librte_eal/ppc/include/meson.build        |   1 +
>  .../ppc/include/rte_power_intrinsics.h        |  60 ++++
>  lib/librte_eal/ppc/rte_cpuflags.c             |   7 +
>  lib/librte_eal/version.map                    |   1 +
>  lib/librte_eal/x86/include/meson.build        |   1 +
>  lib/librte_eal/x86/include/rte_cpuflags.h     |   1 +
>  .../x86/include/rte_power_intrinsics.h        | 135 ++++++++
>  lib/librte_eal/x86/rte_cpuflags.c             |  14 +
>  lib/librte_ethdev/rte_ethdev.c                |  23 ++
>  lib/librte_ethdev/rte_ethdev.h                |  33 ++
>  lib/librte_ethdev/rte_ethdev_driver.h         |  28 ++
>  lib/librte_ethdev/version.map                 |   1 +
>  lib/librte_power/meson.build                  |   5 +-
>  lib/librte_power/rte_power_pmd_mgmt.c         | 320 ++++++++++++++++++
>  lib/librte_power/rte_power_pmd_mgmt.h         |  92 +++++
>  lib/librte_power/version.map                  |   4 +
>  35 files changed, 1148 insertions(+), 3 deletions(-)
>  create mode 100644 lib/librte_eal/arm/include/rte_power_intrinsics.h
>  create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
>  create mode 100644 lib/librte_eal/ppc/include/rte_power_intrinsics.h
>  create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h
>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>
> --
> 2.17.1
>
  
Liang, Ma Oct. 28, 2020, 12:13 p.m. UTC | #3
On 27 Oct 13:53, Ajit Khaparde wrote:
> On Tue, Oct 27, 2020 at 7:59 AM Liang Ma <liang.j.ma@intel.com> wrote:
> >
> > This patchset proposes a simple API for Ethernet drivers
> > to cause the CPU to enter a power-optimized state while
> > waiting for packets to arrive, along with a set of
> > generic intrinsics that facilitate that. This is achieved
> > through cooperation with the NIC driver that will allow
> > us to know address of wake up event, and wait for writes
> > on it.
> Is the wake event the same as ring status or interrupt status register?
  the wake event is any change happen to the monitoring address range. 
  can be a ring descriptor(e.g. Done Bits) but not limiti to that. 
> 
> So in a way the PMD is passing the address of the next ring descriptor?
> So that instead of the PMD polling it, the application peeks at it and
> when ready asks the PMD to actually process the packet(s)?
  NO, after PMD pssing the address, processor will monitoring and wait on
  the address range(aka, move to a sub-state). any change to the address
  range caused by other source(another processor, NIC etc) will wake up the
  processor then start polling again. 
> 
> >
> > On IA, this is achieved through using UMONITOR/UMWAIT
> > instructions. They are used in their raw opcode form
> > because there is no widespread compiler support for
> > them yet. Still, the API is made generic enough to
> > hopefully support other architectures, if they happen
> > to implement similar instructions.
> >
> > To achieve power savings, there is a very simple mechanism
> > used: we're counting empty polls, and if a certain threshold
> > is reached, we get the address of next RX ring descriptor
> > from the NIC driver, arm the monitoring hardware, and
> > enter a power-optimized state. We will then wake up when
> > either a timeout happens, or a write happens (or generally
> > whenever CPU feels like waking up - this is platform-
> > specific), and proceed as normal. The empty poll counter is
> > reset whenever we actually get packets, so we only go to
> > sleep when we know nothing is going on. The mechanism is
> > generic which can be used for any write back descriptor.
> >
> > Why are we putting it into ethdev as opposed to leaving
> > this up to the application? Our customers specifically
> > requested a way to do it wit minimal changes to the
> > application code. The current approach allows to just
> > flip a switch and automatically have power savings.
> The application still has to know address of wake up event. Right?
> And then it will need the logic to count empty polls and the threshold?
> This will be done by application or something else?
  the empty poll counting is done by the power library callback function
  which is included in the patch 5. APP has no need change any thing beside
  the initilization code(call the enable/disable API). please Ref the patch 9
  which demostrate how to use it with l3fwd-power.
> 
> >
> > - Only 1:1 core to queue mapping is supported,
> >   meaning that each lcore must at most handle RX on a
> >   single queue
> > - Support 3 type policies. UMWAIT/PAUSE/Frequency_Scale
> > - Power management is enabled per-queue
> > - The API doesn't extend to other device types
> >
> > Liang Ma (9):
> >   eal: add new x86 cpuid support for WAITPKG
> >   eal: add power management intrinsics
> >   eal: add intrinsics support check infrastructure
> >   ethdev: add simple power management API
> >   power: add PMD power management API and callback
> >   net/ixgbe: implement power management API
> >   net/i40e: implement power management API
> >   net/ice: implement power management API
> >   examples/l3fwd-power: enable PMD power mgmt
> >
> >  doc/guides/prog_guide/power_man.rst           |  48 +++
> >  doc/guides/rel_notes/release_20_11.rst        |  15 +
> >  .../sample_app_ug/l3_forward_power_man.rst    |  13 +
> >  drivers/net/i40e/i40e_ethdev.c                |   1 +
> >  drivers/net/i40e/i40e_rxtx.c                  |  26 ++
> >  drivers/net/i40e/i40e_rxtx.h                  |   2 +
> >  drivers/net/ice/ice_ethdev.c                  |   1 +
> >  drivers/net/ice/ice_rxtx.c                    |  26 ++
> >  drivers/net/ice/ice_rxtx.h                    |   2 +
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
> >  drivers/net/ixgbe/ixgbe_rxtx.c                |  25 ++
> >  drivers/net/ixgbe/ixgbe_rxtx.h                |   2 +
> >  examples/l3fwd-power/main.c                   |  46 ++-
> >  lib/librte_eal/arm/include/meson.build        |   1 +
> >  .../arm/include/rte_power_intrinsics.h        |  60 ++++
> >  lib/librte_eal/arm/rte_cpuflags.c             |   6 +
> >  lib/librte_eal/include/generic/rte_cpuflags.h |  26 ++
> >  .../include/generic/rte_power_intrinsics.h    | 123 +++++++
> >  lib/librte_eal/include/meson.build            |   1 +
> >  lib/librte_eal/ppc/include/meson.build        |   1 +
> >  .../ppc/include/rte_power_intrinsics.h        |  60 ++++
> >  lib/librte_eal/ppc/rte_cpuflags.c             |   7 +
> >  lib/librte_eal/version.map                    |   1 +
> >  lib/librte_eal/x86/include/meson.build        |   1 +
> >  lib/librte_eal/x86/include/rte_cpuflags.h     |   1 +
> >  .../x86/include/rte_power_intrinsics.h        | 135 ++++++++
> >  lib/librte_eal/x86/rte_cpuflags.c             |  14 +
> >  lib/librte_ethdev/rte_ethdev.c                |  23 ++
> >  lib/librte_ethdev/rte_ethdev.h                |  33 ++
> >  lib/librte_ethdev/rte_ethdev_driver.h         |  28 ++
> >  lib/librte_ethdev/version.map                 |   1 +
> >  lib/librte_power/meson.build                  |   5 +-
> >  lib/librte_power/rte_power_pmd_mgmt.c         | 320 ++++++++++++++++++
> >  lib/librte_power/rte_power_pmd_mgmt.h         |  92 +++++
> >  lib/librte_power/version.map                  |   4 +
> >  35 files changed, 1148 insertions(+), 3 deletions(-)
> >  create mode 100644 lib/librte_eal/arm/include/rte_power_intrinsics.h
> >  create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
> >  create mode 100644 lib/librte_eal/ppc/include/rte_power_intrinsics.h
> >  create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h
> >  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
> >  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
> >
> > --
> > 2.17.1
> >
  
Liang, Ma Oct. 28, 2020, 1:35 p.m. UTC | #4
Hi Thomas,
   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible. This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?

Regards
Liang

On 27 Oct 17:02, Thomas Monjalon wrote:
> Please be more patient.
> I asked some questions on v9 (ethdev API is not generic enough),
> and you send a v10 in the same minute you reply,
> without making sure I agree with your answers.
> 
> 
>
  
Jerin Jacob Oct. 28, 2020, 1:49 p.m. UTC | #5
On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
>
> Hi Thomas,
>   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.

I think, From the architecture point of view, the specific
functionally of UMONITOR may not be abstracted.
But from the ethdev callback point of view, Can it be abstracted in
such a way that packet notification available through
checking interrupt status register or ring descriptor location, etc by
the driver. Use that callback as a notification mechanism rather
than defining a memory-based scheme that UMONITOR expects? or similar
thoughts on abstraction.


> This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
>
> Regards
> Liang
>
> On 27 Oct 17:02, Thomas Monjalon wrote:
> > Please be more patient.
> > I asked some questions on v9 (ethdev API is not generic enough),
> > and you send a v10 in the same minute you reply,
> > without making sure I agree with your answers.
> >
> >
> >
  
Thomas Monjalon Oct. 28, 2020, 2:21 p.m. UTC | #6
28/10/2020 14:49, Jerin Jacob:
> On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> >
> > Hi Thomas,
> >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> 
> I think, From the architecture point of view, the specific
> functionally of UMONITOR may not be abstracted.
> But from the ethdev callback point of view, Can it be abstracted in
> such a way that packet notification available through
> checking interrupt status register or ring descriptor location, etc by
> the driver. Use that callback as a notification mechanism rather
> than defining a memory-based scheme that UMONITOR expects? or similar
> thoughts on abstraction.

I agree with Jerin.
The ethdev API is the blocking problem.
First problem: it is not well explained in doxygen.
Second problem: it is probably not generic enough (if we understand it well)

> > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?

Being experimental is not an excuse to throw something
which is not satisfying.
  
Ananyev, Konstantin Oct. 28, 2020, 2:57 p.m. UTC | #7
> 28/10/2020 14:49, Jerin Jacob:
> > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > >
> > > Hi Thomas,
> > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> >
> > I think, From the architecture point of view, the specific
> > functionally of UMONITOR may not be abstracted.
> > But from the ethdev callback point of view, Can it be abstracted in
> > such a way that packet notification available through
> > checking interrupt status register or ring descriptor location, etc by
> > the driver. Use that callback as a notification mechanism rather
> > than defining a memory-based scheme that UMONITOR expects? or similar
> > thoughts on abstraction.

I think there is probably some sort of misunderstanding.
This API is not about providing acync notification when next packet arrives.
This is about to putting core to sleep till some event (or timeout) happens.
From my perspective the closest analogy: cond_timedwait().
So we need PMD to tell us what will be the address of the condition variable
we should sleep on.  

> I agree with Jerin.
> The ethdev API is the blocking problem.
> First problem: it is not well explained in doxygen.
> Second problem: it is probably not generic enough (if we understand it well)

It is an address to sleep(/wakeup) on, plus expected value.
Honestly, I can't think-up of anything even more generic then that. 
If you guys have something particular in mind - please share.

> 
> > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> 
> Being experimental is not an excuse to throw something
> which is not satisfying.
> 
>
  
Jerin Jacob Oct. 28, 2020, 3:14 p.m. UTC | #8
On Wed, Oct 28, 2020 at 8:27 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
>
>
> > 28/10/2020 14:49, Jerin Jacob:
> > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > >
> > > > Hi Thomas,
> > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > >
> > > I think, From the architecture point of view, the specific
> > > functionally of UMONITOR may not be abstracted.
> > > But from the ethdev callback point of view, Can it be abstracted in
> > > such a way that packet notification available through
> > > checking interrupt status register or ring descriptor location, etc by
> > > the driver. Use that callback as a notification mechanism rather
> > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > thoughts on abstraction.
>
> I think there is probably some sort of misunderstanding.
> This API is not about providing acync notification when next packet arrives.
> This is about to putting core to sleep till some event (or timeout) happens.
> From my perspective the closest analogy: cond_timedwait().
> So we need PMD to tell us what will be the address of the condition variable
> we should sleep on.
>
> > I agree with Jerin.
> > The ethdev API is the blocking problem.
> > First problem: it is not well explained in doxygen.
> > Second problem: it is probably not generic enough (if we understand it well)
>
> It is an address to sleep(/wakeup) on, plus expected value.
> Honestly, I can't think-up of anything even more generic then that.
> If you guys have something particular in mind - please share.

Current PMD callback:
typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
**tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
*data_sz);

Can we make it as
typedef void (*core_sleep_t)(void *rxq)

if we do such abstraction and "move the polling on memory by HW/CPU"
to the driver using a helper function then
I can think of abstracting in some way in all PMDs.

Note: core_sleep_t can take some more arguments such as enumerated
policy if something more needs to be pushed to the driver.

Thoughts?

>
> >
> > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> >
> > Being experimental is not an excuse to throw something
> > which is not satisfying.
> >
> >
>
  
Liang, Ma Oct. 28, 2020, 3:30 p.m. UTC | #9
On 28 Oct 20:44, Jerin Jacob wrote:
> On Wed, Oct 28, 2020 at 8:27 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> >
> >
> >
> > > 28/10/2020 14:49, Jerin Jacob:
> > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > >
> > > > > Hi Thomas,
> > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > >
> > > > I think, From the architecture point of view, the specific
> > > > functionally of UMONITOR may not be abstracted.
> > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > such a way that packet notification available through
> > > > checking interrupt status register or ring descriptor location, etc by
> > > > the driver. Use that callback as a notification mechanism rather
> > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > thoughts on abstraction.
> >
> > I think there is probably some sort of misunderstanding.
> > This API is not about providing acync notification when next packet arrives.
> > This is about to putting core to sleep till some event (or timeout) happens.
> > From my perspective the closest analogy: cond_timedwait().
> > So we need PMD to tell us what will be the address of the condition variable
> > we should sleep on.
> >
> > > I agree with Jerin.
> > > The ethdev API is the blocking problem.
> > > First problem: it is not well explained in doxygen.
> > > Second problem: it is probably not generic enough (if we understand it well)
> >
> > It is an address to sleep(/wakeup) on, plus expected value.
> > Honestly, I can't think-up of anything even more generic then that.
> > If you guys have something particular in mind - please share.
> 
> Current PMD callback:
> typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> *data_sz);
> 
> Can we make it as
> typedef void (*core_sleep_t)(void *rxq)
How about void (*eth_core_sleep_helper_t)(void *rxq, enum scheme, void *paramter)
by this way, PMD can cast the parameter accorind to the scheme. 
e.g.  if scheme  MEM_MONITOR then cast to umwait way. 
however, this will introduce another problem.
we need add PMD query callback to figure out if PMD support this scheme.
> 
> if we do such abstraction and "move the polling on memory by HW/CPU"
> to the driver using a helper function then
> I can think of abstracting in some way in all PMDs.
> 
> Note: core_sleep_t can take some more arguments such as enumerated
> policy if something more needs to be pushed to the driver.
> 
> Thoughts?
> 
> >
> > >
> > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > >
> > > Being experimental is not an excuse to throw something
> > > which is not satisfying.
> > >
> > >
> >
  
Ananyev, Konstantin Oct. 28, 2020, 3:33 p.m. UTC | #10
> > > 28/10/2020 14:49, Jerin Jacob:
> > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > >
> > > > > Hi Thomas,
> > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > >
> > > > I think, From the architecture point of view, the specific
> > > > functionally of UMONITOR may not be abstracted.
> > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > such a way that packet notification available through
> > > > checking interrupt status register or ring descriptor location, etc by
> > > > the driver. Use that callback as a notification mechanism rather
> > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > thoughts on abstraction.
> >
> > I think there is probably some sort of misunderstanding.
> > This API is not about providing acync notification when next packet arrives.
> > This is about to putting core to sleep till some event (or timeout) happens.
> > From my perspective the closest analogy: cond_timedwait().
> > So we need PMD to tell us what will be the address of the condition variable
> > we should sleep on.
> >
> > > I agree with Jerin.
> > > The ethdev API is the blocking problem.
> > > First problem: it is not well explained in doxygen.
> > > Second problem: it is probably not generic enough (if we understand it well)
> >
> > It is an address to sleep(/wakeup) on, plus expected value.
> > Honestly, I can't think-up of anything even more generic then that.
> > If you guys have something particular in mind - please share.
> 
> Current PMD callback:
> typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> *data_sz);
> 
> Can we make it as
> typedef void (*core_sleep_t)(void *rxq)
> 
> if we do such abstraction and "move the polling on memory by HW/CPU"
> to the driver using a helper function then
> I can think of abstracting in some way in all PMDs.

Ok I see, thanks for explanation.
From my perspective main disadvantage of such approach -
it can't be extended easily.
If/when will have an ability for core to sleep/wake-up on multiple events
(multiple addresses) will have to either rework that API again. 

> 
> Note: core_sleep_t can take some more arguments such as enumerated
> policy if something more needs to be pushed to the driver.
> 
> Thoughts?
> 
> >
> > >
> > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > >
> > > Being experimental is not an excuse to throw something
> > > which is not satisfying.
> > >
> > >
> >
  
Jerin Jacob Oct. 28, 2020, 3:36 p.m. UTC | #11
On Wed, Oct 28, 2020 at 9:00 PM Liang, Ma <liang.j.ma@intel.com> wrote:
>
> On 28 Oct 20:44, Jerin Jacob wrote:
> > On Wed, Oct 28, 2020 at 8:27 PM Ananyev, Konstantin
> > <konstantin.ananyev@intel.com> wrote:
> > >
> > >
> > >
> > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > >
> > > > > > Hi Thomas,
> > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > >
> > > > > I think, From the architecture point of view, the specific
> > > > > functionally of UMONITOR may not be abstracted.
> > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > such a way that packet notification available through
> > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > the driver. Use that callback as a notification mechanism rather
> > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > thoughts on abstraction.
> > >
> > > I think there is probably some sort of misunderstanding.
> > > This API is not about providing acync notification when next packet arrives.
> > > This is about to putting core to sleep till some event (or timeout) happens.
> > > From my perspective the closest analogy: cond_timedwait().
> > > So we need PMD to tell us what will be the address of the condition variable
> > > we should sleep on.
> > >
> > > > I agree with Jerin.
> > > > The ethdev API is the blocking problem.
> > > > First problem: it is not well explained in doxygen.
> > > > Second problem: it is probably not generic enough (if we understand it well)
> > >
> > > It is an address to sleep(/wakeup) on, plus expected value.
> > > Honestly, I can't think-up of anything even more generic then that.
> > > If you guys have something particular in mind - please share.
> >
> > Current PMD callback:
> > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > *data_sz);
> >
> > Can we make it as
> > typedef void (*core_sleep_t)(void *rxq)
> How about void (*eth_core_sleep_helper_t)(void *rxq, enum scheme, void *paramter)
> by this way, PMD can cast the parameter accorind to the scheme.
> e.g.  if scheme  MEM_MONITOR then cast to umwait way.
> however, this will introduce another problem.
> we need add PMD query callback to figure out if PMD support this scheme.

I thought scheme/policy something that "application cares" like below
not arch specifics
1) wakeup up on first packet,
2) wake me up on first packet or timeout 100 us etc

Yes. We can have query on type of the policies supported.


> >
> > if we do such abstraction and "move the polling on memory by HW/CPU"
> > to the driver using a helper function then
> > I can think of abstracting in some way in all PMDs.
> >
> > Note: core_sleep_t can take some more arguments such as enumerated
> > policy if something more needs to be pushed to the driver.
> >
> > Thoughts?
> >
> > >
> > > >
> > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > >
> > > > Being experimental is not an excuse to throw something
> > > > which is not satisfying.
> > > >
> > > >
> > >
  
Jerin Jacob Oct. 28, 2020, 3:39 p.m. UTC | #12
On Wed, Oct 28, 2020 at 9:04 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
>
>
> > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > >
> > > > > > Hi Thomas,
> > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > >
> > > > > I think, From the architecture point of view, the specific
> > > > > functionally of UMONITOR may not be abstracted.
> > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > such a way that packet notification available through
> > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > the driver. Use that callback as a notification mechanism rather
> > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > thoughts on abstraction.
> > >
> > > I think there is probably some sort of misunderstanding.
> > > This API is not about providing acync notification when next packet arrives.
> > > This is about to putting core to sleep till some event (or timeout) happens.
> > > From my perspective the closest analogy: cond_timedwait().
> > > So we need PMD to tell us what will be the address of the condition variable
> > > we should sleep on.
> > >
> > > > I agree with Jerin.
> > > > The ethdev API is the blocking problem.
> > > > First problem: it is not well explained in doxygen.
> > > > Second problem: it is probably not generic enough (if we understand it well)
> > >
> > > It is an address to sleep(/wakeup) on, plus expected value.
> > > Honestly, I can't think-up of anything even more generic then that.
> > > If you guys have something particular in mind - please share.
> >
> > Current PMD callback:
> > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > *data_sz);
> >
> > Can we make it as
> > typedef void (*core_sleep_t)(void *rxq)
> >
> > if we do such abstraction and "move the polling on memory by HW/CPU"
> > to the driver using a helper function then
> > I can think of abstracting in some way in all PMDs.
>
> Ok I see, thanks for explanation.
> From my perspective main disadvantage of such approach -
> it can't be extended easily.
> If/when will have an ability for core to sleep/wake-up on multiple events
> (multiple addresses) will have to either rework that API again.

I think, we can enumerate the policies and pass the associated
structures as input to the driver.


>
> >
> > Note: core_sleep_t can take some more arguments such as enumerated
> > policy if something more needs to be pushed to the driver.
> >
> > Thoughts?
> >
> > >
> > > >
> > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > >
> > > > Being experimental is not an excuse to throw something
> > > > which is not satisfying.
> > > >
> > > >
> > >
  
Liang, Ma Oct. 28, 2020, 3:44 p.m. UTC | #13
On 28 Oct 21:06, Jerin Jacob wrote:
> On Wed, Oct 28, 2020 at 9:00 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> >
> > On 28 Oct 20:44, Jerin Jacob wrote:
> > > On Wed, Oct 28, 2020 at 8:27 PM Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > >
> > > > > > > Hi Thomas,
> > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > >
> > > > > > I think, From the architecture point of view, the specific
> > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > such a way that packet notification available through
> > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > thoughts on abstraction.
> > > >
> > > > I think there is probably some sort of misunderstanding.
> > > > This API is not about providing acync notification when next packet arrives.
> > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > From my perspective the closest analogy: cond_timedwait().
> > > > So we need PMD to tell us what will be the address of the condition variable
> > > > we should sleep on.
> > > >
> > > > > I agree with Jerin.
> > > > > The ethdev API is the blocking problem.
> > > > > First problem: it is not well explained in doxygen.
> > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > >
> > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > Honestly, I can't think-up of anything even more generic then that.
> > > > If you guys have something particular in mind - please share.
> > >
> > > Current PMD callback:
> > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > *data_sz);
> > >
> > > Can we make it as
> > > typedef void (*core_sleep_t)(void *rxq)
> > How about void (*eth_core_sleep_helper_t)(void *rxq, enum scheme, void *paramter)
> > by this way, PMD can cast the parameter accorind to the scheme.
> > e.g.  if scheme  MEM_MONITOR then cast to umwait way.
> > however, this will introduce another problem.
> > we need add PMD query callback to figure out if PMD support this scheme.
> 
> I thought scheme/policy something that "application cares" like below
> not arch specifics
> 1) wakeup up on first packet,
> 2) wake me up on first packet or timeout 100 us etc
I need clarify about current design a bit. the purposed API just get target address.
the API itself(include the PMD callback) will not demand the processor to idle/sleep.
we use the post rx callback to do that. for ethdev layer, this API only is used  to
fetch target address from PMD, which make minmal impact to existing code.

> Yes. We can have query on type of the policies supported.
> 
> 
> > >
> > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > to the driver using a helper function then
> > > I can think of abstracting in some way in all PMDs.
> > >
> > > Note: core_sleep_t can take some more arguments such as enumerated
> > > policy if something more needs to be pushed to the driver.
> > >
> > > Thoughts?
> > >
> > > >
> > > > >
> > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > >
> > > > > Being experimental is not an excuse to throw something
> > > > > which is not satisfying.
> > > > >
> > > > >
> > > >
  
Ananyev, Konstantin Oct. 28, 2020, 3:49 p.m. UTC | #14
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, October 28, 2020 3:40 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Ma, Liang J <liang.j.ma@intel.com>; dpdk-dev <dev@dpdk.org>; Ruifeng Wang (Arm
> Technology China) <ruifeng.wang@arm.com>; Wang, Haiyue <haiyue.wang@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>; Neil Horman <nhorman@tuxdriver.com>; McDaniel, Timothy
> <timothy.mcdaniel@intel.com>; Eads, Gage <gage.eads@intel.com>; Marcin Wojtas <mw@semihalf.com>; Guy Tzalik
> <gtzalik@amazon.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Harman Kalra <hkalra@marvell.com>; John Daley
> <johndale@cisco.com>; Wei Hu (Xavier <xavier.huwei@huawei.com>; Ziyang Xuan <xuanziyang2@huawei.com>; matan@nvidia.com; Yong
> Wang <yongwang@vmware.com>; david.marchand@redhat.com
> Subject: Re: [PATCH v10 0/9] Add PMD power mgmt
> 
> On Wed, Oct 28, 2020 at 9:04 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> >
> >
> >
> > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > >
> > > > > > > Hi Thomas,
> > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > >
> > > > > > I think, From the architecture point of view, the specific
> > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > such a way that packet notification available through
> > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > thoughts on abstraction.
> > > >
> > > > I think there is probably some sort of misunderstanding.
> > > > This API is not about providing acync notification when next packet arrives.
> > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > From my perspective the closest analogy: cond_timedwait().
> > > > So we need PMD to tell us what will be the address of the condition variable
> > > > we should sleep on.
> > > >
> > > > > I agree with Jerin.
> > > > > The ethdev API is the blocking problem.
> > > > > First problem: it is not well explained in doxygen.
> > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > >
> > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > Honestly, I can't think-up of anything even more generic then that.
> > > > If you guys have something particular in mind - please share.
> > >
> > > Current PMD callback:
> > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > *data_sz);
> > >
> > > Can we make it as
> > > typedef void (*core_sleep_t)(void *rxq)
> > >
> > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > to the driver using a helper function then
> > > I can think of abstracting in some way in all PMDs.
> >
> > Ok I see, thanks for explanation.
> > From my perspective main disadvantage of such approach -
> > it can't be extended easily.
> > If/when will have an ability for core to sleep/wake-up on multiple events
> > (multiple addresses) will have to either rework that API again.
> 
> I think, we can enumerate the policies and pass the associated
> structures as input to the driver.

What I am trying to say: with that API we will not be able to wait
for events from multiple devices (HW queues).
I.E. something like that:

get_wake_addr(port=X, ..., &addr[0], ...);
get_wake_addr(port=Y,..., &addr[1],...);
wait_on_multi(addr, 2); 

wouldn't be possible.

> 
> 
> >
> > >
> > > Note: core_sleep_t can take some more arguments such as enumerated
> > > policy if something more needs to be pushed to the driver.
> > >
> > > Thoughts?
> > >
> > > >
> > > > >
> > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > >
> > > > > Being experimental is not an excuse to throw something
> > > > > which is not satisfying.
> > > > >
> > > > >
> > > >
  
Jerin Jacob Oct. 28, 2020, 3:57 p.m. UTC | #15
On Wed, Oct 28, 2020 at 9:19 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, October 28, 2020 3:40 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Thomas Monjalon <thomas@monjalon.net>; Ma, Liang J <liang.j.ma@intel.com>; dpdk-dev <dev@dpdk.org>; Ruifeng Wang (Arm
> > Technology China) <ruifeng.wang@arm.com>; Wang, Haiyue <haiyue.wang@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>; Neil Horman <nhorman@tuxdriver.com>; McDaniel, Timothy
> > <timothy.mcdaniel@intel.com>; Eads, Gage <gage.eads@intel.com>; Marcin Wojtas <mw@semihalf.com>; Guy Tzalik
> > <gtzalik@amazon.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Harman Kalra <hkalra@marvell.com>; John Daley
> > <johndale@cisco.com>; Wei Hu (Xavier <xavier.huwei@huawei.com>; Ziyang Xuan <xuanziyang2@huawei.com>; matan@nvidia.com; Yong
> > Wang <yongwang@vmware.com>; david.marchand@redhat.com
> > Subject: Re: [PATCH v10 0/9] Add PMD power mgmt
> >
> > On Wed, Oct 28, 2020 at 9:04 PM Ananyev, Konstantin
> > <konstantin.ananyev@intel.com> wrote:
> > >
> > >
> > >
> > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > >
> > > > > > > > Hi Thomas,
> > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > >
> > > > > > > I think, From the architecture point of view, the specific
> > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > such a way that packet notification available through
> > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > thoughts on abstraction.
> > > > >
> > > > > I think there is probably some sort of misunderstanding.
> > > > > This API is not about providing acync notification when next packet arrives.
> > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > we should sleep on.
> > > > >
> > > > > > I agree with Jerin.
> > > > > > The ethdev API is the blocking problem.
> > > > > > First problem: it is not well explained in doxygen.
> > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > >
> > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > If you guys have something particular in mind - please share.
> > > >
> > > > Current PMD callback:
> > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > *data_sz);
> > > >
> > > > Can we make it as
> > > > typedef void (*core_sleep_t)(void *rxq)
> > > >
> > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > to the driver using a helper function then
> > > > I can think of abstracting in some way in all PMDs.
> > >
> > > Ok I see, thanks for explanation.
> > > From my perspective main disadvantage of such approach -
> > > it can't be extended easily.
> > > If/when will have an ability for core to sleep/wake-up on multiple events
> > > (multiple addresses) will have to either rework that API again.
> >
> > I think, we can enumerate the policies and pass the associated
> > structures as input to the driver.
>
> What I am trying to say: with that API we will not be able to wait
> for events from multiple devices (HW queues).
> I.E. something like that:
>
> get_wake_addr(port=X, ..., &addr[0], ...);
> get_wake_addr(port=Y,..., &addr[1],...);
> wait_on_multi(addr, 2);
>
> wouldn't be possible.

I see. But the current implementation dictates the only queue bound to
a core. Right?


>
> >
> >
> > >
> > > >
> > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > policy if something more needs to be pushed to the driver.
> > > >
> > > > Thoughts?
> > > >
> > > > >
> > > > > >
> > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > >
> > > > > > Being experimental is not an excuse to throw something
> > > > > > which is not satisfying.
> > > > > >
> > > > > >
> > > > >
  
Jerin Jacob Oct. 28, 2020, 4:01 p.m. UTC | #16
On Wed, Oct 28, 2020 at 9:14 PM Liang, Ma <liang.j.ma@intel.com> wrote:
>
> On 28 Oct 21:06, Jerin Jacob wrote:
> > On Wed, Oct 28, 2020 at 9:00 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > >
> > > On 28 Oct 20:44, Jerin Jacob wrote:
> > > > On Wed, Oct 28, 2020 at 8:27 PM Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > >
> > > > > > > > Hi Thomas,
> > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > >
> > > > > > > I think, From the architecture point of view, the specific
> > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > such a way that packet notification available through
> > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > thoughts on abstraction.
> > > > >
> > > > > I think there is probably some sort of misunderstanding.
> > > > > This API is not about providing acync notification when next packet arrives.
> > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > we should sleep on.
> > > > >
> > > > > > I agree with Jerin.
> > > > > > The ethdev API is the blocking problem.
> > > > > > First problem: it is not well explained in doxygen.
> > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > >
> > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > If you guys have something particular in mind - please share.
> > > >
> > > > Current PMD callback:
> > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > *data_sz);
> > > >
> > > > Can we make it as
> > > > typedef void (*core_sleep_t)(void *rxq)
> > > How about void (*eth_core_sleep_helper_t)(void *rxq, enum scheme, void *paramter)
> > > by this way, PMD can cast the parameter accorind to the scheme.
> > > e.g.  if scheme  MEM_MONITOR then cast to umwait way.
> > > however, this will introduce another problem.
> > > we need add PMD query callback to figure out if PMD support this scheme.
> >
> > I thought scheme/policy something that "application cares" like below
> > not arch specifics
> > 1) wakeup up on first packet,
> > 2) wake me up on first packet or timeout 100 us etc
> I need clarify about current design a bit. the purposed API just get target address.
> the API itself(include the PMD callback) will not demand the processor to idle/sleep.
> we use the post rx callback to do that. for ethdev layer, this API only is used  to
> fetch target address from PMD, which make minmal impact to existing code.

I understand that. But if we move that logic to common code in ethdev
as a set of internal
PMD helper functions(Helper functions for class of devices and/or
arch) then it should be
possible. Right?

I do understand that, It will call for some rework in the code. I will
leave up to ethdev maintainers
on specifics.


>
> > Yes. We can have query on type of the policies supported.
> >
> >
> > > >
> > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > to the driver using a helper function then
> > > > I can think of abstracting in some way in all PMDs.
> > > >
> > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > policy if something more needs to be pushed to the driver.
> > > >
> > > > Thoughts?
> > > >
> > > > >
> > > > > >
> > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > >
> > > > > > Being experimental is not an excuse to throw something
> > > > > > which is not satisfying.
> > > > > >
> > > > > >
> > > > >
  
Liang, Ma Oct. 28, 2020, 4:21 p.m. UTC | #17
On 28 Oct 21:31, Jerin Jacob wrote:
> On Wed, Oct 28, 2020 at 9:14 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> >
> > On 28 Oct 21:06, Jerin Jacob wrote:
> > > On Wed, Oct 28, 2020 at 9:00 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > >
> > > > On 28 Oct 20:44, Jerin Jacob wrote:
> > > > > On Wed, Oct 28, 2020 at 8:27 PM Ananyev, Konstantin
> > > > > <konstantin.ananyev@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Thomas,
> > > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > > >
> > > > > > > > I think, From the architecture point of view, the specific
> > > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > > such a way that packet notification available through
> > > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > > thoughts on abstraction.
> > > > > >
> > > > > > I think there is probably some sort of misunderstanding.
> > > > > > This API is not about providing acync notification when next packet arrives.
> > > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > > we should sleep on.
> > > > > >
> > > > > > > I agree with Jerin.
> > > > > > > The ethdev API is the blocking problem.
> > > > > > > First problem: it is not well explained in doxygen.
> > > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > > >
> > > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > > If you guys have something particular in mind - please share.
> > > > >
> > > > > Current PMD callback:
> > > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > > *data_sz);
> > > > >
> > > > > Can we make it as
> > > > > typedef void (*core_sleep_t)(void *rxq)
> > > > How about void (*eth_core_sleep_helper_t)(void *rxq, enum scheme, void *paramter)
> > > > by this way, PMD can cast the parameter accorind to the scheme.
> > > > e.g.  if scheme  MEM_MONITOR then cast to umwait way.
> > > > however, this will introduce another problem.
> > > > we need add PMD query callback to figure out if PMD support this scheme.
> > >
> > > I thought scheme/policy something that "application cares" like below
> > > not arch specifics
> > > 1) wakeup up on first packet,
> > > 2) wake me up on first packet or timeout 100 us etc
> > I need clarify about current design a bit. the purposed API just get target address.
> > the API itself(include the PMD callback) will not demand the processor to idle/sleep.
> > we use the post rx callback to do that. for ethdev layer, this API only is used  to
> > fetch target address from PMD, which make minmal impact to existing code.
> 
> I understand that. But if we move that logic to common code in ethdev
> as a set of internal
> PMD helper functions(Helper functions for class of devices and/or
> arch) then it should be
> possible. Right?
Although that's possible, but power manangment logic will be duplicated in all related PMD.
Advantage of current design is decouple power mgmt logic with PMD, just need minmal help
from PMD. there are 3 scheme already which are implemented as post rx callback. 
Put all power mgmt logic into PMD is way too heavy. Also current design is very easy to extend.
Any new scheme has no impact to ethdev layer unless it need some kind of help from PMD.
we have another 2 scheme, PAUSE and Freq Scale dont not need any PMD help.
> I do understand that, It will call for some rework in the code. I will
> leave up to ethdev maintainers
> on specifics.
> 
> 
> >
> > > Yes. We can have query on type of the policies supported.
> > >
> > >
> > > > >
> > > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > > to the driver using a helper function then
> > > > > I can think of abstracting in some way in all PMDs.
> > > > >
> > > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > > policy if something more needs to be pushed to the driver.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > > >
> > > > > > > Being experimental is not an excuse to throw something
> > > > > > > which is not satisfying.
> > > > > > >
> > > > > > >
> > > > > >
  
Ananyev, Konstantin Oct. 28, 2020, 4:38 p.m. UTC | #18
> 
> On Wed, Oct 28, 2020 at 9:19 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, October 28, 2020 3:40 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: Thomas Monjalon <thomas@monjalon.net>; Ma, Liang J <liang.j.ma@intel.com>; dpdk-dev <dev@dpdk.org>; Ruifeng Wang (Arm
> > > Technology China) <ruifeng.wang@arm.com>; Wang, Haiyue <haiyue.wang@intel.com>; Richardson, Bruce
> > > <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>; Neil Horman <nhorman@tuxdriver.com>; McDaniel, Timothy
> > > <timothy.mcdaniel@intel.com>; Eads, Gage <gage.eads@intel.com>; Marcin Wojtas <mw@semihalf.com>; Guy Tzalik
> > > <gtzalik@amazon.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Harman Kalra <hkalra@marvell.com>; John Daley
> > > <johndale@cisco.com>; Wei Hu (Xavier <xavier.huwei@huawei.com>; Ziyang Xuan <xuanziyang2@huawei.com>; matan@nvidia.com;
> Yong
> > > Wang <yongwang@vmware.com>; david.marchand@redhat.com
> > > Subject: Re: [PATCH v10 0/9] Add PMD power mgmt
> > >
> > > On Wed, Oct 28, 2020 at 9:04 PM Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Thomas,
> > > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From
> the
> > > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > > >
> > > > > > > > I think, From the architecture point of view, the specific
> > > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > > such a way that packet notification available through
> > > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > > thoughts on abstraction.
> > > > > >
> > > > > > I think there is probably some sort of misunderstanding.
> > > > > > This API is not about providing acync notification when next packet arrives.
> > > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > > we should sleep on.
> > > > > >
> > > > > > > I agree with Jerin.
> > > > > > > The ethdev API is the blocking problem.
> > > > > > > First problem: it is not well explained in doxygen.
> > > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > > >
> > > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > > If you guys have something particular in mind - please share.
> > > > >
> > > > > Current PMD callback:
> > > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > > *data_sz);
> > > > >
> > > > > Can we make it as
> > > > > typedef void (*core_sleep_t)(void *rxq)
> > > > >
> > > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > > to the driver using a helper function then
> > > > > I can think of abstracting in some way in all PMDs.
> > > >
> > > > Ok I see, thanks for explanation.
> > > > From my perspective main disadvantage of such approach -
> > > > it can't be extended easily.
> > > > If/when will have an ability for core to sleep/wake-up on multiple events
> > > > (multiple addresses) will have to either rework that API again.
> > >
> > > I think, we can enumerate the policies and pass the associated
> > > structures as input to the driver.
> >
> > What I am trying to say: with that API we will not be able to wait
> > for events from multiple devices (HW queues).
> > I.E. something like that:
> >
> > get_wake_addr(port=X, ..., &addr[0], ...);
> > get_wake_addr(port=Y,..., &addr[1],...);
> > wait_on_multi(addr, 2);
> >
> > wouldn't be possible.
> 
> I see. But the current implementation dictates the only queue bound to
> a core. Right?

Yes, current implementation of rte_power_monitor() supports only single address.
Though proposed API for both ethdev (get_wake_addr) and
power(rte_power_pmd_mgmt_queue_enable) don't dictate
one to one mapping as the only possible usage model.
 
> 
> 
> >
> > >
> > >
> > > >
> > > > >
> > > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > > policy if something more needs to be pushed to the driver.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > > >
> > > > > > > Being experimental is not an excuse to throw something
> > > > > > > which is not satisfying.
> > > > > > >
> > > > > > >
> > > > > >
  
Liang, Ma Oct. 28, 2020, 4:47 p.m. UTC | #19
On 28 Oct 21:27, Jerin Jacob wrote:
> On Wed, Oct 28, 2020 at 9:19 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> > > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Thomas,
> > > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > > >
> > > > > > > > I think, From the architecture point of view, the specific
> > > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > > such a way that packet notification available through
> > > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > > thoughts on abstraction.
> > > > > >
> > > > > > I think there is probably some sort of misunderstanding.
> > > > > > This API is not about providing acync notification when next packet arrives.
> > > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > > we should sleep on.
> > > > > >
> > > > > > > I agree with Jerin.
> > > > > > > The ethdev API is the blocking problem.
> > > > > > > First problem: it is not well explained in doxygen.
> > > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > > >
> > > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > > If you guys have something particular in mind - please share.
> > > > >
> > > > > Current PMD callback:
> > > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > > *data_sz);
> > > > >
> > > > > Can we make it as
> > > > > typedef void (*core_sleep_t)(void *rxq)
> > > > >
> > > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > > to the driver using a helper function then
> > > > > I can think of abstracting in some way in all PMDs.
> > > >
> > > > Ok I see, thanks for explanation.
> > > > From my perspective main disadvantage of such approach -
> > > > it can't be extended easily.
> > > > If/when will have an ability for core to sleep/wake-up on multiple events
> > > > (multiple addresses) will have to either rework that API again.
> > >
> > > I think, we can enumerate the policies and pass the associated
> > > structures as input to the driver.
> >
> > What I am trying to say: with that API we will not be able to wait
> > for events from multiple devices (HW queues).
> > I.E. something like that:
> >
> > get_wake_addr(port=X, ..., &addr[0], ...);
> > get_wake_addr(port=Y,..., &addr[1],...);
> > wait_on_multi(addr, 2);
> >
> > wouldn't be possible.
> 
> I see. But the current implementation dictates the only queue bound to
> a core. Right?
Current implementation only support 1:1 queue/core mapping is because of 
the limitation of umwait/umonitor which can not work with multiple address
range. However, for other scheme like PASUE/Freq Scale have no such limitation. 
The proposed API itself doesn't limit the 1:1 queue/core mapping. 
> 
> 
> >
> > >
> > >
> > > >
> > > > >
> > > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > > policy if something more needs to be pushed to the driver.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > > >
> > > > > > > Being experimental is not an excuse to throw something
> > > > > > > which is not satisfying.
> > > > > > >
> > > > > > >
> > > > > >
  
Timothy McDaniel Oct. 28, 2020, 4:54 p.m. UTC | #20
> -----Original Message-----
> From: Ma, Liang J <liang.j.ma@intel.com>
> Sent: Wednesday, October 28, 2020 11:48 AM
> To: Jerin Jacob <jerinjacobk@gmail.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>; dpdk-dev <dev@dpdk.org>; Ruifeng Wang (Arm
> Technology China) <ruifeng.wang@arm.com>; Wang, Haiyue
> <haiyue.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Hunt, David <david.hunt@intel.com>; Neil Horman <nhorman@tuxdriver.com>;
> McDaniel, Timothy <timothy.mcdaniel@intel.com>; Eads, Gage
> <gage.eads@intel.com>; Marcin Wojtas <mw@semihalf.com>; Guy Tzalik
> <gtzalik@amazon.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>;
> Harman Kalra <hkalra@marvell.com>; John Daley <johndale@cisco.com>; Wei
> Hu (Xavier <xavier.huwei@huawei.com>; Ziyang Xuan
> <xuanziyang2@huawei.com>; matan@nvidia.com; Yong Wang
> <yongwang@vmware.com>; david.marchand@redhat.com
> Subject: Re: [PATCH v10 0/9] Add PMD power mgmt
> 
> On 28 Oct 21:27, Jerin Jacob wrote:
> > On Wed, Oct 28, 2020 at 9:19 PM Ananyev, Konstantin
> > <konstantin.ananyev@intel.com> wrote:
> > > > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma
> <liang.j.ma@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Thomas,
> > > > > > > > > >   I think I addressed all of the questions in relation to V9. I don't
> think I can solve the issue of a generic API on my own. From the
> > > > > > > > Community Call last week Jerin also said that a generic was
> investigated but that a single solution wasn't feasible.
> > > > > > > > >
> > > > > > > > > I think, From the architecture point of view, the specific
> > > > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > > > such a way that packet notification available through
> > > > > > > > > checking interrupt status register or ring descriptor location, etc
> by
> > > > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > > > than defining a memory-based scheme that UMONITOR expects?
> or similar
> > > > > > > > > thoughts on abstraction.
> > > > > > >
> > > > > > > I think there is probably some sort of misunderstanding.
> > > > > > > This API is not about providing acync notification when next packet
> arrives.
> > > > > > > This is about to putting core to sleep till some event (or timeout)
> happens.
> > > > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > > > So we need PMD to tell us what will be the address of the condition
> variable
> > > > > > > we should sleep on.
> > > > > > >
> > > > > > > > I agree with Jerin.
> > > > > > > > The ethdev API is the blocking problem.
> > > > > > > > First problem: it is not well explained in doxygen.
> > > > > > > > Second problem: it is probably not generic enough (if we understand
> it well)
> > > > > > >
> > > > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > > > If you guys have something particular in mind - please share.
> > > > > >
> > > > > > Current PMD callback:
> > > > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > > > *data_sz);
> > > > > >
> > > > > > Can we make it as
> > > > > > typedef void (*core_sleep_t)(void *rxq)
> > > > > >
> > > > > > if we do such abstraction and "move the polling on memory by
> HW/CPU"
> > > > > > to the driver using a helper function then
> > > > > > I can think of abstracting in some way in all PMDs.
> > > > >
> > > > > Ok I see, thanks for explanation.
> > > > > From my perspective main disadvantage of such approach -
> > > > > it can't be extended easily.
> > > > > If/when will have an ability for core to sleep/wake-up on multiple events
> > > > > (multiple addresses) will have to either rework that API again.
> > > >
> > > > I think, we can enumerate the policies and pass the associated
> > > > structures as input to the driver.
> > >
> > > What I am trying to say: with that API we will not be able to wait
> > > for events from multiple devices (HW queues).
> > > I.E. something like that:
> > >
> > > get_wake_addr(port=X, ..., &addr[0], ...);
> > > get_wake_addr(port=Y,..., &addr[1],...);
> > > wait_on_multi(addr, 2);
> > >
> > > wouldn't be possible.
> >
> > I see. But the current implementation dictates the only queue bound to
> > a core. Right?
> Current implementation only support 1:1 queue/core mapping is because of
> the limitation of umwait/umonitor which can not work with multiple address
> range. However, for other scheme like PASUE/Freq Scale have no such
> limitation.
> The proposed API itself doesn't limit the 1:1 queue/core mapping.
> >
> >
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Note: core_sleep_t can take some more arguments such as
> enumerated
> > > > > > policy if something more needs to be pushed to the driver.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > > This API is experimental and other vendor support can be added
> as needed. If there are any other open issue let me know?
> > > > > > > >
> > > > > > > > Being experimental is not an excuse to throw something
> > > > > > > > which is not satisfying.
> > > > > > > >
> > > > > > > >
> > > > > > >

It would be nice if the low level definition of the UMWAIT and UMONOTOR instructions were split out
into their own inline function or macro so that any PMD could use the intrinsic without being tied to ethdev or
any of the other logic associated with this patch set.  This would be similar to rte_wmb, and so on
  
Ajit Khaparde Oct. 28, 2020, 5:02 p.m. UTC | #21
On Wed, Oct 28, 2020 at 9:47 AM Liang, Ma <liang.j.ma@intel.com> wrote:
>
> On 28 Oct 21:27, Jerin Jacob wrote:
> > On Wed, Oct 28, 2020 at 9:19 PM Ananyev, Konstantin
> > <konstantin.ananyev@intel.com> wrote:
> > > > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Thomas,
> > > > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own. From the
> > > > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > > > >
> > > > > > > > > I think, From the architecture point of view, the specific
> > > > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > > > such a way that packet notification available through
> > > > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > > > thoughts on abstraction.
> > > > > > >
> > > > > > > I think there is probably some sort of misunderstanding.
> > > > > > > This API is not about providing acync notification when next packet arrives.
> > > > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > > > we should sleep on.
> > > > > > >
> > > > > > > > I agree with Jerin.
> > > > > > > > The ethdev API is the blocking problem.
> > > > > > > > First problem: it is not well explained in doxygen.
> > > > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > > > >
> > > > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > > > If you guys have something particular in mind - please share.
> > > > > >
> > > > > > Current PMD callback:
> > > > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > > > *data_sz);
> > > > > >
> > > > > > Can we make it as
> > > > > > typedef void (*core_sleep_t)(void *rxq)
> > > > > >
> > > > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > > > to the driver using a helper function then
> > > > > > I can think of abstracting in some way in all PMDs.
> > > > >
> > > > > Ok I see, thanks for explanation.
> > > > > From my perspective main disadvantage of such approach -
> > > > > it can't be extended easily.
> > > > > If/when will have an ability for core to sleep/wake-up on multiple events
> > > > > (multiple addresses) will have to either rework that API again.
> > > >
> > > > I think, we can enumerate the policies and pass the associated
> > > > structures as input to the driver.
> > >
> > > What I am trying to say: with that API we will not be able to wait
> > > for events from multiple devices (HW queues).
> > > I.E. something like that:
> > >
> > > get_wake_addr(port=X, ..., &addr[0], ...);
> > > get_wake_addr(port=Y,..., &addr[1],...);
> > > wait_on_multi(addr, 2);
> > >
> > > wouldn't be possible.
> >
> > I see. But the current implementation dictates the only queue bound to
> > a core. Right?
> Current implementation only support 1:1 queue/core mapping is because of
> the limitation of umwait/umonitor which can not work with multiple address
> range. However, for other scheme like PASUE/Freq Scale have no such limitation.
> The proposed API itself doesn't limit the 1:1 queue/core mapping.

The PMD would not know if it is 1:1 queue/core or any other shared scheme.
So the intelligence and decision making is best left to the application.
I think PMD and the underlying hardware does not need to know what kind of
power management scheme is implemented.
IMHO the original API which provides the address, value and mask should suffice.
Any other callback or handshake between PMD and application may be an overkill.

> >
> >
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > > > policy if something more needs to be pushed to the driver.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > > > >
> > > > > > > > Being experimental is not an excuse to throw something
> > > > > > > > which is not satisfying.
> > > > > > > >
> > > > > > > >
> > > > > > >
  
Ananyev, Konstantin Oct. 28, 2020, 6:10 p.m. UTC | #22
> -----Original Message-----
> From: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Sent: Wednesday, October 28, 2020 5:02 PM
> To: Ma, Liang J <liang.j.ma@intel.com>
> Cc: Jerin Jacob <jerinjacobk@gmail.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>; dpdk-dev <dev@dpdk.org>; Ruifeng Wang (Arm Technology China) <ruifeng.wang@arm.com>; Wang, Haiyue
> <haiyue.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>; Neil Horman
> <nhorman@tuxdriver.com>; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Eads, Gage <gage.eads@intel.com>; Marcin Wojtas
> <mw@semihalf.com>; Guy Tzalik <gtzalik@amazon.com>; Harman Kalra <hkalra@marvell.com>; John Daley <johndale@cisco.com>; Wei
> Hu (Xavier <xavier.huwei@huawei.com>; Ziyang Xuan <xuanziyang2@huawei.com>; matan@nvidia.com; Yong Wang
> <yongwang@vmware.com>; david.marchand@redhat.com
> Subject: Re: [PATCH v10 0/9] Add PMD power mgmt
> 
> On Wed, Oct 28, 2020 at 9:47 AM Liang, Ma <liang.j.ma@intel.com> wrote:
> >
> > On 28 Oct 21:27, Jerin Jacob wrote:
> > > On Wed, Oct 28, 2020 at 9:19 PM Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com> wrote:
> > > > > > > > > 28/10/2020 14:49, Jerin Jacob:
> > > > > > > > > > On Wed, Oct 28, 2020 at 7:05 PM Liang, Ma <liang.j.ma@intel.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Thomas,
> > > > > > > > > > >   I think I addressed all of the questions in relation to V9. I don't think I can solve the issue of a generic API on my own.
> From the
> > > > > > > > > Community Call last week Jerin also said that a generic was investigated but that a single solution wasn't feasible.
> > > > > > > > > >
> > > > > > > > > > I think, From the architecture point of view, the specific
> > > > > > > > > > functionally of UMONITOR may not be abstracted.
> > > > > > > > > > But from the ethdev callback point of view, Can it be abstracted in
> > > > > > > > > > such a way that packet notification available through
> > > > > > > > > > checking interrupt status register or ring descriptor location, etc by
> > > > > > > > > > the driver. Use that callback as a notification mechanism rather
> > > > > > > > > > than defining a memory-based scheme that UMONITOR expects? or similar
> > > > > > > > > > thoughts on abstraction.
> > > > > > > >
> > > > > > > > I think there is probably some sort of misunderstanding.
> > > > > > > > This API is not about providing acync notification when next packet arrives.
> > > > > > > > This is about to putting core to sleep till some event (or timeout) happens.
> > > > > > > > From my perspective the closest analogy: cond_timedwait().
> > > > > > > > So we need PMD to tell us what will be the address of the condition variable
> > > > > > > > we should sleep on.
> > > > > > > >
> > > > > > > > > I agree with Jerin.
> > > > > > > > > The ethdev API is the blocking problem.
> > > > > > > > > First problem: it is not well explained in doxygen.
> > > > > > > > > Second problem: it is probably not generic enough (if we understand it well)
> > > > > > > >
> > > > > > > > It is an address to sleep(/wakeup) on, plus expected value.
> > > > > > > > Honestly, I can't think-up of anything even more generic then that.
> > > > > > > > If you guys have something particular in mind - please share.
> > > > > > >
> > > > > > > Current PMD callback:
> > > > > > > typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void
> > > > > > > **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t
> > > > > > > *data_sz);
> > > > > > >
> > > > > > > Can we make it as
> > > > > > > typedef void (*core_sleep_t)(void *rxq)
> > > > > > >
> > > > > > > if we do such abstraction and "move the polling on memory by HW/CPU"
> > > > > > > to the driver using a helper function then
> > > > > > > I can think of abstracting in some way in all PMDs.
> > > > > >
> > > > > > Ok I see, thanks for explanation.
> > > > > > From my perspective main disadvantage of such approach -
> > > > > > it can't be extended easily.
> > > > > > If/when will have an ability for core to sleep/wake-up on multiple events
> > > > > > (multiple addresses) will have to either rework that API again.
> > > > >
> > > > > I think, we can enumerate the policies and pass the associated
> > > > > structures as input to the driver.
> > > >
> > > > What I am trying to say: with that API we will not be able to wait
> > > > for events from multiple devices (HW queues).
> > > > I.E. something like that:
> > > >
> > > > get_wake_addr(port=X, ..., &addr[0], ...);
> > > > get_wake_addr(port=Y,..., &addr[1],...);
> > > > wait_on_multi(addr, 2);
> > > >
> > > > wouldn't be possible.
> > >
> > > I see. But the current implementation dictates the only queue bound to
> > > a core. Right?
> > Current implementation only support 1:1 queue/core mapping is because of
> > the limitation of umwait/umonitor which can not work with multiple address
> > range. However, for other scheme like PASUE/Freq Scale have no such limitation.
> > The proposed API itself doesn't limit the 1:1 queue/core mapping.
> 
> The PMD would not know if it is 1:1 queue/core or any other shared scheme.
> So the intelligence and decision making is best left to the application.
> I think PMD and the underlying hardware does not need to know what kind of
> power management scheme is implemented.

Yep, good point. 100% agree.

> IMHO the original API which provides the address, value and mask should suffice.
> Any other callback or handshake between PMD and application may be an overkill.
> 
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Note: core_sleep_t can take some more arguments such as enumerated
> > > > > > > policy if something more needs to be pushed to the driver.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > > This API is experimental and other vendor support can be added as needed. If there are any other open issue let me know?
> > > > > > > > >
> > > > > > > > > Being experimental is not an excuse to throw something
> > > > > > > > > which is not satisfying.
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
  
Liang, Ma Oct. 29, 2020, 9:19 a.m. UTC | #23
On 28 Oct 09:54, McDaniel, Timothy wrote:
<snip>
> > > > > > > >
> > > > > > > >
> 
> It would be nice if the low level definition of the UMWAIT and UMONOTOR instructions were split out
> into their own inline function or macro so that any PMD could use the intrinsic without being tied to ethdev or
> any of the other logic associated with this patch set.  This would be similar to rte_wmb, and so on
Current patches split the intrinsics out. and already part of eal lib. any other PMD can re-use it easily.
> 
>
  
Thomas Monjalon Oct. 29, 2020, 5:42 p.m. UTC | #24
27/10/2020 15:59, Liang Ma:
> Liang Ma (9):
>   eal: add new x86 cpuid support for WAITPKG
>   eal: add power management intrinsics
>   eal: add intrinsics support check infrastructure

EAL patches applied, thanks.

>   ethdev: add simple power management API

Waiting for doxygen reword which may help to get acks for ethdev API.

>   power: add PMD power management API and callback
>   net/ixgbe: implement power management API
>   net/i40e: implement power management API
>   net/ice: implement power management API
>   examples/l3fwd-power: enable PMD power mgmt
  
Liang, Ma Oct. 30, 2020, 9:36 a.m. UTC | #25
On 29 Oct 18:42, Thomas Monjalon wrote:
> 27/10/2020 15:59, Liang Ma:
> > Liang Ma (9):
> >   eal: add new x86 cpuid support for WAITPKG
> >   eal: add power management intrinsics
> >   eal: add intrinsics support check infrastructure
> 
> EAL patches applied, thanks.
thanks!
> 
> >   ethdev: add simple power management API
> 
> Waiting for doxygen reword which may help to get acks for ethdev API.
we are working on this. I will send out v11 today.
> 
> >   power: add PMD power management API and callback
> >   net/ixgbe: implement power management API
> >   net/i40e: implement power management API
> >   net/ice: implement power management API
> >   examples/l3fwd-power: enable PMD power mgmt
> 
> 
>
  
Thomas Monjalon Oct. 30, 2020, 9:58 a.m. UTC | #26
30/10/2020 10:36, Liang, Ma:
> On 29 Oct 18:42, Thomas Monjalon wrote:
> > 27/10/2020 15:59, Liang Ma:
> > > Liang Ma (9):
> > >   eal: add new x86 cpuid support for WAITPKG
> > >   eal: add power management intrinsics
> > >   eal: add intrinsics support check infrastructure
> > 
> > EAL patches applied, thanks.
> thanks!
> > 
> > >   ethdev: add simple power management API
> > 
> > Waiting for doxygen reword which may help to get acks for ethdev API.
> we are working on this. I will send out v11 today.

David sent some interesting questions about the EAL API.
Maybe some explanations are missing, and we could improve the API.
Please reply, I would like to solve the EAL questions quickly.