[RFC,1/5] eventdev: add power monitoring API on event port

Message ID: 20230419095427.563185-1-sivaprasad.tummala@amd.com (mailing list archive)
State: Changes Requested, archived
Delegated to: Jerin Jacob
Series: [RFC,1/5] eventdev: add power monitoring API on event port

Checks

Context        Check    Description
ci/checkpatch  success  coding style OK

Commit Message

Sivaprasad Tummala April 19, 2023, 9:54 a.m. UTC
Add a new API that returns a power monitoring condition for an event
port, so that a worker core in an eventdev-based pipelined application
can optimize power while no events are arriving on that port.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/eventdev/eventdev_pmd.h | 23 +++++++++++++++++++++++
 lib/eventdev/rte_eventdev.c | 24 ++++++++++++++++++++++++
 lib/eventdev/rte_eventdev.h | 25 +++++++++++++++++++++++++
 3 files changed, 72 insertions(+)
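As a rough, self-contained illustration of what the proposed `rte_event_port_get_monitor_addr()` hands back to a worker core, the C mock below assumes the condition works like DPDK's `struct rte_power_monitor_cond`: an address to watch, a callback that says whether sleeping may proceed, and some opaque driver data. The `mon_*` and `sim_port_*` names are hypothetical, and no real wait instruction (such as `umwait`) is issued.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in modelled on struct rte_power_monitor_cond. */
typedef int (*mon_clb_t)(uint64_t val, const uint64_t opaque[4]);

struct mon_cond {
	volatile void *addr;  /* address the core would monitor */
	mon_clb_t fn;         /* 0: sleeping may proceed, -1: abort */
	uint64_t opaque[4];   /* driver-private data handed to fn */
	uint8_t size;         /* size in bytes of the value at addr */
};

/* Simulated event port: a producer bumps 'tail' on every enqueue. */
struct sim_port {
	volatile uint64_t tail;
	uint64_t head;        /* consumer position */
};

/* Abort the sleep if the tail has moved past the consumer position. */
static int
sim_port_monitor_clb(uint64_t val, const uint64_t opaque[4])
{
	return val != opaque[0] ? -1 : 0;
}

/* What a driver's get_monitor_addr() dev_op might fill in (hypothetical). */
static int
sim_port_get_monitor_addr(struct sim_port *p, struct mon_cond *pmc)
{
	if (p == NULL || pmc == NULL)
		return -EINVAL;
	pmc->addr = &p->tail;
	pmc->fn = sim_port_monitor_clb;
	pmc->opaque[0] = p->head;  /* port is empty while tail == head */
	pmc->size = sizeof(uint64_t);
	return 0;
}

/* Re-read the watched value and ask the callback, as a monitor primitive
 * does once the address is armed: 1 if the core may sleep, 0 otherwise. */
static int
mon_may_sleep(const struct mon_cond *pmc)
{
	uint64_t val = *(const volatile uint64_t *)pmc->addr;

	return pmc->fn(val, pmc->opaque) == 0;
}
```

In DPDK, `rte_power_monitor()` performs this kind of check after arming the monitor and before waiting, so an event that lands between the last empty dequeue and the wait does not leave the core asleep.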
  

Comments

Jerin Jacob April 19, 2023, 10:15 a.m. UTC | #1
On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
<sivaprasad.tummala@amd.com> wrote:
>
> A new API to allow power monitoring condition on event port to
> optimize power when no events are arriving on an event port for
> the worker core to process in an eventdev based pipelined application.
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> + *
> + * @param dev_id
> + *   Eventdev id
> + * @param port_id
> + *   Eventdev port id
> + * @param pmc
> + *   Pointer to the power monitoring condition structure to be filled in.
> + *
> + * @return
> + *   - 0: Success.
> + *   - -ENOTSUP: Operation not supported.
> + *   - -EINVAL: Invalid parameters.
> + *   - -ENODEV: Invalid device ID.
> + */
> +__rte_experimental
> +int
> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> +               struct rte_power_monitor_cond *pmc);

+ eventdev driver maintainers

I think we don't need to expose this to the application, because:
1) Applications should be transparent to whether power saving is enabled or not.
2) Some HW and architectures already support power management in the driver
and in HW (not using the CPU architecture directly).

If so, that translates to the following:
a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
port_id, bool ena) for controlling power saving in the slow path.
b) Create a reusable PMD-private function based on the CPU architecture's
power saving primitive, to cover PMDs that don't have native power saving
support.
c) Update the PMD's dequeue burst callback behind rte_event_dequeue_burst()
to use (b).
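The alternative in a)-c) can be sketched as a self-contained mock. Everything here is hypothetical: `sketch_port_power_saving_ena_dis()` stands in for the suggested `rte_event_port_power_saving_ena_dis()`, a single device with a fixed number of ports is assumed, and the idle helper only counts calls instead of invoking a CPU power saving primitive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 8

/* Per-port state a PMD could keep (hypothetical). */
struct sketch_port {
	bool power_saving;         /* only toggled from the slow path */
	unsigned int pending;      /* events waiting on the port */
	unsigned long idle_waits;  /* times the idle helper ran */
};

static struct sketch_port ports[SKETCH_MAX_PORTS];

/* (a) Slow-path control, in the spirit of the suggested
 * rte_event_port_power_saving_ena_dis(dev_id, port_id, ena). */
static int
sketch_port_power_saving_ena_dis(uint8_t port_id, bool ena)
{
	if (port_id >= SKETCH_MAX_PORTS)
		return -EINVAL;
	ports[port_id].power_saving = ena;
	return 0;
}

/* (b) Reusable PMD-private idle helper. A real one would use the CPU's
 * power saving primitive; this mock only counts invocations. */
static void
sketch_pmd_idle_wait(struct sketch_port *p)
{
	p->idle_waits++;
}

/* (c) PMD dequeue: the application calls the same function whether or
 * not power saving is enabled, so its fast path does not change. */
static unsigned int
sketch_pmd_dequeue(uint8_t port_id, unsigned int max)
{
	struct sketch_port *p;
	unsigned int n;

	if (port_id >= SKETCH_MAX_PORTS)
		return 0;
	p = &ports[port_id];
	n = p->pending < max ? p->pending : max;
	p->pending -= n;
	if (n == 0 && p->power_saving)
		sketch_pmd_idle_wait(p);
	return n;
}
```

A PMD with native HW power management would not need (b) and could honour (a) in its own way.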




> +
>  /**
>   * Start an event device.
>   *
> --
> 2.34.1
>
  
Ferruh Yigit April 24, 2023, 4:06 p.m. UTC | #2
On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> <sivaprasad.tummala@amd.com> wrote:
>> [...]
> 
> + eventdev driver maintainers
> 
> I think, we don't need to expose this application due to applications
> 1)To make applications to be transparent whether power saving is enabled or not?
> 2)Some HW and Arch already supports power managent in driver and in HW
> (Not using  CPU architecture directly)
> 
> If so, that will be translated to following,
> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
> port_id, bool ena) for controlling power saving in slowpath.
> b) Create reusable PMD private function based on the CPU architecture
> power saving primitive to cover the PMD don't have native power saving
> support.
> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
> 
> 

Hi Jerin,

The ethdev approach seems to have been applied here.

In ethdev, the equivalent of 'rte_event_port_get_monitor_addr()' is
'rte_eth_get_monitor_addr()'.

Although 'rte_eth_get_monitor_addr()' is a public API, it is currently
only called from the Rx/Tx callback functions implemented in the power
library. But I assume the intention of making it public is to enable users
to implement their own callback functions with a custom power management
algorithm.

And probably the same is true for 'rte_event_port_get_monitor_addr()'.


Also, instead of implementing power features within PMDs, isn't it
better to have a common eventdev layer for them?

PMDs that benefit from a HW event manager can simply not implement the
.get_monitor_addr() dev_op, which keeps them free of power-related APIs.
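A minimal sketch of that opt-out, assuming the generic eventdev layer follows the common DPDK pattern of returning `-ENOTSUP` when an optional dev_op is left NULL. The `opt_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct opt_monitor_cond {
	volatile void *addr;
};

/* Minimal stand-in for a driver ops table; only the optional hook matters. */
struct opt_dev_ops {
	int (*get_monitor_addr)(uint8_t port_id, struct opt_monitor_cond *pmc);
};

/* Generic layer: a driver that leaves the hook NULL (e.g. one relying on
 * hardware power management) is reported as unsupported and never called. */
static int
opt_get_monitor_addr(const struct opt_dev_ops *ops, uint8_t port_id,
		     struct opt_monitor_cond *pmc)
{
	if (pmc == NULL)
		return -EINVAL;
	if (ops->get_monitor_addr == NULL)
		return -ENOTSUP;
	return ops->get_monitor_addr(port_id, pmc);
}

/* A software driver that does provide the hook. */
static volatile uint64_t opt_sw_tail;

static int
opt_sw_get_monitor_addr(uint8_t port_id, struct opt_monitor_cond *pmc)
{
	(void)port_id;
	pmc->addr = &opt_sw_tail;
	return 0;
}
```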
  
Jerin Jacob April 25, 2023, 4:09 a.m. UTC | #3
On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> > [...]
>
> Hi Jerin,

Hi Ferruh,

>
> ethdev approach seems applied here.

Understood. But no NIC HW supports power management at the HW level
the way eventdev HW does, so what we are doing for ethdev is the correct
abstraction for ethdev.

>
> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> 'rte_eth_get_monitor_addr()'.
>
> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> only called from Rx/Tx callback functions implemented in the power library.
> But I assume intention to make it public is to enable users to implement
> their own callback functions that has custom algorithm for the power
> management.

If there is a use case for customizing with the application's own callback,
we can provide that, provided that NULL is valid and selects the default algorithm.
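A minimal sketch of "NULL selects the default algorithm", assuming a per-port enable call that takes an optional policy callback. The callback type, the `pm_*` names and the 512-empty-poll threshold are illustrative assumptions, not the power library's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy callback: given the number of consecutive empty
 * polls, return nonzero if the core should enter a power-optimized wait. */
typedef int (*pm_idle_cb_t)(unsigned int empty_polls);

static int
pm_default_idle_cb(unsigned int empty_polls)
{
	return empty_polls >= 512;  /* assumed default threshold */
}

static int
pm_aggressive_idle_cb(unsigned int empty_polls)
{
	return empty_polls >= 8;    /* example application-specific policy */
}

struct pm_port_cfg {
	pm_idle_cb_t idle_cb;
};

/* Enabling with cb == NULL selects the built-in default algorithm. */
static void
pm_port_enable(struct pm_port_cfg *cfg, pm_idle_cb_t cb)
{
	cfg->idle_cb = (cb != NULL) ? cb : pm_default_idle_cb;
}
```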

>
> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>
>
> Also instead of implementing power features for withing PMDs, isn't it
> better to have a common eventdev layer for it?

We can have rte_eventdev_pmd_* APIs as non-public APIs.
My only objection is to introducing _monitor_ APIs at the eventdev level.
_monitor_ is just one way to do it in SW, so we need a higher level
of abstraction.

>
> For the PMDs benefit from HW event manager, just not implementing
> .get_monitor_addr() dev_ops will make them free from power related APIs.

But exposing low-level primitives makes the application fast-path code diverge.


>
>
>
>
  
Mattias Rönnblom April 25, 2023, 6:19 a.m. UTC | #4
On 2023-04-24 18:06, Ferruh Yigit wrote:
> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>> [...]
> 
> Hi Jerin,
> 
> ethdev approach seems applied here.
> 
> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> 'rte_eth_get_monitor_addr()'.
> 
> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> only called from Rx/Tx callback functions implemented in the power library.
> But I assume intention to make it public is to enable users to implement
> their own callback functions that has custom algorithm for the power
> management.
> 
> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> 
> 
> Also instead of implementing power features for withing PMDs, isn't it
> better to have a common eventdev layer for it?
> 

To allow that question to be answered, I think you need to be more
specific about what the "power features" are.

From what I can see, the get_monitor_addr() family of functions
addresses the fairly narrow case of allowing umwait (or the non-x86
equivalent) to be used to wait for new events. It leaves all the heavy
lifting to the app, which needs to figure out how loaded each CPU core
is, what backlog of work there is, how to shuffle work around to get the
most out of the power, how to translate wall-clock latency requirements
into the equation, what CPU (and/or accelerator/NIC-level) power
features to employ (e.g., DVFS, sleep states, umwait), etc.

In the context of Eventdev, optimizing for power may include packing 
more flows into the same port, in low-load situations. Keeping a few 
cores relatively busy, and the rest in some deep sleep state may well be 
the best solution for certain (most?) systems. For such a feature to 
work, the event device must be in the loop, but the mechanics could (and 
should) be generic. Eventdev could also control DVFS.
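The arithmetic behind packing more flows onto fewer ports can be sketched with a simple capacity model. This is an illustrative assumption, not something Eventdev provides: it supposes the aggregate event rate and a per-core capacity (with headroom) are known, and computes how many worker cores should stay active while the rest could be parked in a deep sleep state.

```c
#include <assert.h>

/* Hypothetical consolidation helper: how many worker cores to keep busy
 * for a given aggregate event rate, never fewer than one and never more
 * than are available. Rates are in events per second. */
static unsigned int
cores_needed(unsigned long total_rate, unsigned long per_core_capacity,
	     unsigned int max_cores)
{
	unsigned long n;

	if (per_core_capacity == 0)
		return max_cores;
	/* Ceiling division: a partially loaded core still has to be awake. */
	n = (total_rate + per_core_capacity - 1) / per_core_capacity;
	if (n == 0)
		n = 1;
	return n > max_cores ? max_cores : (unsigned int)n;
}
```

Deciding the capacity figure and when to migrate flows is exactly the policy part that, per the paragraph above, would still come from the app.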

A reasonably generic power management mechanism could go into Eventdev
(a combination of the event device drivers and some generic functions).
(Various policies would still need to come from the app.)

I think keeping this kind of functionality in Eventdev works well 
provided the only source of work is Eventdev events (i.e., most or all 
fast path lcores are "pure" event-based lcores). No non-eventdev timer 
wheels, no non-eventdev lookaside accelerator or I/O device access, no 
control plane rings to poll, etc.

If such a model is too limiting, another option is to put the central 
power management function in the service framework (with a lot of help 
from Eventdev, RTE timer, and other sources of work as well).

> For the PMDs benefit from HW event manager, just not implementing
> .get_monitor_addr() dev_ops will make them free from power related APIs.
> 
> 
>
  
Ferruh Yigit May 2, 2023, 10:43 a.m. UTC | #5
On 4/25/2023 7:19 AM, Mattias Rönnblom wrote:
> On 2023-04-24 18:06, Ferruh Yigit wrote:
>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>>> [...]
>>
>> Hi Jerin,
>>
>> ethdev approach seems applied here.
>>
>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
>> 'rte_eth_get_monitor_addr()'.
>>
>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
>> only called from Rx/Tx callback functions implemented in the power library.
>> But I assume intention to make it public is to enable users to implement
>> their own callback functions that has custom algorithm for the power
>> management.
>>
>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>>
>>
>> Also instead of implementing power features for withing PMDs, isn't it
>> better to have a common eventdev layer for it?
>>
> 
> To allow that question to be answered, I think you need to be more 
> specific what are "power features".
> 
>  From what it seems to me, the get_monitor_addr() family of functions 
> address the pretty narrow case of allowing umwait (or the non-x86 
> equivalent) to be used to wait for new events. It leaves all the heavy 
> lifting to the app, which needs to figure out how loaded each CPU core 
> is, what backlog of work there is, how to shuffle work around to get the 
> most out of the power, how to translate wall-clock latency requirements 
> into the equation, what CPU (and/or accelerator/NIC-level) power 
> features to employ (e.g., DVFS, sleep states, umwait), etc.
> 
> In the context of Eventdev, optimizing for power may include packing 
> more flows into the same port, in low-load situations. Keeping a few 
> cores relatively busy, and the rest in some deep sleep state may well be 
> the best solution for certain (most?) systems. For such a feature to 
> work, the event device must be in the loop, but the mechanics could (and 
> should) be generic. Eventdev could also control DVFS.
> 
> A reasonably generic power management mechanism could go into Eventdev a 
> combination of the event device drivers, and some generic functions). 
> (Various policies would still need to come from the app.)
> 
> I think keeping this kind of functionality in Eventdev works well 
> provided the only source of work is Eventdev events (i.e., most or all 
> fast path lcores are "pure" event-based lcores). No non-eventdev timer 
> wheels, no non-eventdev lookaside accelerator or I/O device access, no 
> control plane rings to poll, etc.
> 
> If such a model is too limiting, another option is to put the central 
> power management function in the service framework (with a lot of help 
> from Eventdev, RTE timer, and other sources of work as well).
> 

Hi Mattias,

The power management features in the scope of this patch are around
the umwait use case, as you mentioned.

It has default callbacks that applications can benefit from with minimal
involvement, but an application that wants a more sophisticated
algorithm needs to implement its own functions.

And I agree that more comprehensive power management has benefits, but
it has to start somewhere and we can grow it over time. It also
requires more support from the community, not just from some vendors.

I think it is a good start to enable some HW features for power
management and make existing APIs more HW agnostic.

>> For the PMDs benefit from HW event manager, just not implementing
>> .get_monitor_addr() dev_ops will make them free from power related APIs.
>>
>>
>>
>
  
Ferruh Yigit May 2, 2023, 11:19 a.m. UTC | #6
On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>>> [...]
>>
>> Hi Jerin,
> 
> Hi Ferruh,
> 
>>
>> ethdev approach seems applied here.
> 
> Understands that. But none of the NIC HW supports power management at
> HW level like eventdev, so that way
> for what we are doing for ethdev is a correct abstraction for ethdev.
> 

As I understand it, there are HW-based and SW-based event managers.
SW-based ones can benefit more from CPU power optimizations; HW event
managers that do not gain enough from it can simply ignore the feature.

>>
>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
>> 'rte_eth_get_monitor_addr()'.
>>
>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
>> only called from Rx/Tx callback functions implemented in the power library.
>> But I assume intention to make it public is to enable users to implement
>> their own callback functions that has custom algorithm for the power
>> management.
> 
> If there is a use case for customizing with own callback, we can provide that.
> Provided NULL is valid with default algorithm.
> 
>>
>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>>
>>
>> Also instead of implementing power features for withing PMDs, isn't it
>> better to have a common eventdev layer for it?
> 
> We can have rte_evetdev_pmd_* APIs as non-public APIs.
> My only objection is to NOT introduce _monitor_ APIs at eventdev level,
> Instead, _monitor_ is one way to do it in SW, So we need higher level
> of abstraction.
> 

I see, this seems to be a trade-off between flexibility and usability.
If applications have access to the _monitor_ APIs, they have more
flexibility to implement their own logic.

Another option is for the application to provide the policy through an
API, with the monitor API used to realize that policy, but in this case
it can be a challenge to find and implement the correct policies.

>>
>> For the PMDs benefit from HW event manager, just not implementing
>> .get_monitor_addr() dev_ops will make them free from power related APIs.
> 
> But application fast path code gets diverged by exposing low level primitives.
> 

I am not clear about the concern above, but for applications that use
the default callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to
be called to enable this feature; if it is not called, the datapath is
not impacted. And if no dequeue callback is added at all, custom or
default, the datapath is not impacted at all.
  
Jerin Jacob May 3, 2023, 7:58 a.m. UTC | #7
On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> > On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> >>> [...]
> >>
> >> Hi Jerin,
> >
> > Hi Ferruh,
> >
> >>
> >> ethdev approach seems applied here.
> >
> > Understands that. But none of the NIC HW supports power management at
> > HW level like eventdev, so that way
> > for what we are doing for ethdev is a correct abstraction for ethdev.
> >
>
> What I understand is there is HW based event manager and SW based ones,
> SW based ones can benefit more from CPU power optimizations, for HW
> event managers if there is not enough benefit they can just ignore the
> feature.
>
> >>
> >> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> >> 'rte_eth_get_monitor_addr()'.
> >>
> >> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> >> only called from Rx/Tx callback functions implemented in the power library.
> >> But I assume intention to make it public is to enable users to implement
> >> their own callback functions that has custom algorithm for the power
> >> management.
> >
> > If there is a use case for customizing with own callback, we can provide that.
> > Provided NULL is valid with default algorithm.
> >
> >>
> >> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> >>
> >>
> >> Also instead of implementing power features for withing PMDs, isn't it
> >> better to have a common eventdev layer for it?
> >
> > We can have rte_evetdev_pmd_* APIs as non-public APIs.
> > My only objection is to NOT introduce _monitor_ APIs at eventdev level,
> > Instead, _monitor_ is one way to do it in SW, So we need higher level
> > of abstraction.
> >
>
> I see, this seems a trade off between flexibility and usability. If
> application has access to _monitor_ APIs, they can be more flexible to
> implement their own logic.

OK.

>
> Another option can be application provides the policy with an API and
> monitor API used to realize the policy, but for this case it can be
> challenge to find and implement correct policies.

OK. If we can enumerate the policies, that would be ideal.
On the plus side, no changes would be needed in lib/power/.


>
> >>
> >> For the PMDs benefit from HW event manager, just not implementing
> >> .get_monitor_addr() dev_ops will make them free from power related APIs.
> >
> > But application fast path code gets diverged by exposing low level primitives.
> >
>
> I am not clear with concern above, but for application that use default
> callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
> to enable this feature, if not called datapath is not impacted.
> And if not dequeue callback added at all, custom or default, data path
> is not impacted at all.

The concern is the following code [1], which is executed even when no
callback is registered for this use case.
In eventdev, we process _one packet at a time_ in a lot of use cases
with latency-critical workloads like L1 processing.
In such cases, the cost of this code adds up.

[1]
  cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
    __ATOMIC_RELAXED);
  if (unlikely(cb != NULL))
       nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);
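The per-dequeue cost in the snippet above can be reproduced in a self-contained mock. It uses C11 `<stdatomic.h>` in place of the GCC `__atomic_load_n()` builtin, and the `dq_*` names are hypothetical; with no callback registered, each dequeue still pays one relaxed load and one branch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*dq_cb_t)(uint8_t port_id, uint16_t nb_rx);

#define DQ_MAX_PORTS 4

/* Per-port callback slot; NULL means "no callback registered". */
static _Atomic(dq_cb_t) dq_clbk[DQ_MAX_PORTS];

static uint16_t dq_driver_nb;  /* what the simulated driver dequeues */

static uint16_t
dq_dequeue_burst(uint8_t port_id)
{
	uint16_t nb_rx;
	dq_cb_t cb;

	if (port_id >= DQ_MAX_PORTS)
		return 0;
	nb_rx = dq_driver_nb;  /* stand-in for the PMD dequeue */
	cb = atomic_load_explicit(&dq_clbk[port_id], memory_order_relaxed);
	/* Paid on every call, even when nothing is registered. */
	if (cb != NULL)
		nb_rx = cb(port_id, nb_rx);
	return nb_rx;
}

static unsigned int dq_cb_calls;

static uint16_t
dq_count_cb(uint8_t port_id, uint16_t nb_rx)
{
	(void)port_id;
	dq_cb_calls++;
	return nb_rx;
}
```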

I see two options:
1) Enumerate the power policies and let drivers implement them through
non-public PMD helper functions.
OR
2) Move the power management callback into the driver via non-public PMD
helper functions, to avoid the cost for PMDs where power management is
done in HW and to remove the above extra check when NO callback is
registered [1].
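Option 1) could look roughly like the mock below: the application picks a policy from a fixed enumeration in the slow path, and each driver decides how to realize it on an empty dequeue. The policy names and the `pol_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration of power policies selectable in the slow path. */
enum pol_power_policy {
	POL_NONE = 0,  /* busy-poll, no power saving */
	POL_MONITOR,   /* arm a CPU monitor/wait on the port */
	POL_PAUSE,     /* short pause instructions between polls */
};

struct pol_driver {
	bool hw_native_pm;  /* device already manages power in HW */
};

enum pol_action {
	ACT_NONE_ON_CPU,  /* nothing to do on the CPU side */
	ACT_CPU_MONITOR,
	ACT_CPU_PAUSE,
};

/* What the driver does on an empty dequeue for a given policy. A device
 * with native HW power management needs no CPU-side action at all. */
static enum pol_action
pol_idle_action(const struct pol_driver *drv, enum pol_power_policy pol)
{
	if (drv->hw_native_pm || pol == POL_NONE)
		return ACT_NONE_ON_CPU;
	return pol == POL_MONITOR ? ACT_CPU_MONITOR : ACT_CPU_PAUSE;
}
```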

>
>
  
Ferruh Yigit May 3, 2023, 8:13 a.m. UTC | #8
On 5/3/2023 8:58 AM, Jerin Jacob wrote:
> On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
>>> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>>>
>>>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>>>>> [...]
>>>>
>>>> Hi Jerin,
>>>
>>> Hi Ferruh,
>>>
>>>>
>>>> ethdev approach seems applied here.
>>>
>>> Understands that. But none of the NIC HW supports power management at
>>> HW level like eventdev, so that way
>>> for what we are doing for ethdev is a correct abstraction for ethdev.
>>>
>>
>> What I understand is there is HW based event manager and SW based ones,
>> SW based ones can benefit more from CPU power optimizations, for HW
>> event managers if there is not enough benefit they can just ignore the
>> feature.
>>
>>>>
>>>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
>>>> 'rte_eth_get_monitor_addr()'.
>>>>
>>>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
>>>> only called from Rx/Tx callback functions implemented in the power library.
>>>> But I assume intention to make it public is to enable users to implement
>>>> their own callback functions that has custom algorithm for the power
>>>> management.
>>>
>>> If there is a use case for customizing with own callback, we can provide that.
>>> Provided NULL is valid with default algorithm.
>>>
>>>>
>>>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>>>>
>>>>
>>>> Also instead of implementing power features for withing PMDs, isn't it
>>>> better to have a common eventdev layer for it?
>>>
>>> We can have rte_evetdev_pmd_* APIs as non-public APIs.
>>> My only objection is to NOT introduce _monitor_ APIs at eventdev level,
>>> Instead, _monitor_ is one way to do it in SW, So we need higher level
>>> of abstraction.
>>>
>>
>> I see, this seems a trade off between flexibility and usability. If
>> application has access to _monitor_ APIs, they can be more flexible to
>> implement their own logic.
> 
> OK.
> 
>>
>> Another option can be application provides the policy with an API and
>> monitor API used to realize the policy, but for this case it can be
>> challenge to find and implement correct policies.
> 
> OK. If we can enumerate the policies, then it will be ideal.
> On plus side, there will not be any changes in needed in lib/power/
> 

If we are talking about a power framework where the user defines policies, I
expect parsing/defining policies will be in the power library and will
require changes in the power library anyway.

But as mentioned above, it is difficult to define a proper policy; this
is not really related to eventdev, it is more a power library issue. We can
continue to provide flexibility to the user in eventdev and discuss the
policy in a wider forum.

> 
>>
>>>>
>>>> For the PMDs that benefit from a HW event manager, just not implementing
>>>> .get_monitor_addr() in dev_ops will keep them free from power related APIs.
>>>
>>> But the application fast path code diverges when low level primitives are exposed.
>>>
>>
>> I am not clear on the concern above, but for applications that use the default
>> callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
>> to enable this feature; if it is not called, the datapath is not impacted.
>> And if no dequeue callback is added at all, custom or default, the data path
>> is not impacted at all.
> 
> The concern is the following code[1], when no callback is registered
> for this use case.
> In eventdev, we use _one packet at a time_ for a lot of use cases
> with latency-critical workloads like L1 processing.
> In such cases, the following code will add up.
> 
> [1]
>   cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
>     __ATOMIC_RELAXED);
>   if (unlikely(cb != NULL))
>        nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);
> 
> I see two options:
> 1) Enumerate the power policies and let the driver implement them through
> non-public PMD helper functions
> OR
> 2) Move the power management callback to the driver via non-public PMD
> helper functions, to avoid the cost for
> PMDs where power management is done in HW and to remove the above extra check
> when NO callback is registered[1]
> 

Got it, yes, there is an additional check with event callbacks. We can
add a compiler flag around it, as done in ethdev, so that it is not compiled
in when not needed. Will that work?
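For reference, ethdev compiles its Rx/Tx callback checks only when RTE_ETHDEV_RXTX_CALLBACKS is defined. The sketch below applies the same pattern to an eventdev-style dequeue path. It is illustrative only: the build flag EVENTDEV_DEQUEUE_CALLBACKS and the types struct event, struct port_fp_ops and demo_dequeue() are hypothetical simplifications, not DPDK definitions. With the flag undefined, both the callback slot and the per-dequeue load and branch disappear from the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real DPDK eventdev types. */
struct event {
	uint64_t u64;
};

typedef uint16_t (*dequeue_cb_t)(uint8_t port_id, struct event *ev,
				 uint16_t nb_rx, void *arg);

struct port_fp_ops {
	uint16_t (*dequeue)(void *port, struct event *ev, uint16_t nb);
	void *port;
#ifdef EVENTDEV_DEQUEUE_CALLBACKS /* hypothetical build flag */
	dequeue_cb_t cb;	/* published by the slow path, NULL if unused */
	void *cb_arg;
#endif
};

static inline uint16_t
event_dequeue_burst(struct port_fp_ops *ops, uint8_t port_id,
		    struct event *ev, uint16_t nb)
{
	uint16_t nb_rx = ops->dequeue(ops->port, ev, nb);

#ifdef EVENTDEV_DEQUEUE_CALLBACKS
	/* Same shape as the check quoted in [1]: one load and one branch
	 * on every dequeue, even when no callback is registered. */
	dequeue_cb_t cb = __atomic_load_n(&ops->cb, __ATOMIC_RELAXED);
	if (__builtin_expect(cb != NULL, 0))
		nb_rx = cb(port_id, ev, nb_rx, ops->cb_arg);
#else
	(void)port_id; /* only needed by the callback path */
#endif
	return nb_rx;
}

/* Demo driver dequeue: hands out up to nb of the events pending on the port. */
static uint16_t
demo_dequeue(void *port, struct event *ev, uint16_t nb)
{
	uint16_t *pending = port;
	uint16_t n;

	assert(pending != NULL);
	n = *pending < nb ? *pending : nb;
	for (uint16_t i = 0; i < n; i++)
		ev[i].u64 = i;
	*pending -= n;
	return n;
}
```

The cost of this approach is that the layout of the fast-path structure then depends on the build configuration, which is one of the concerns raised in the next reply.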
  
Jerin Jacob May 3, 2023, 8:26 a.m. UTC | #9
On Wed, May 3, 2023 at 1:44 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 5/3/2023 8:58 AM, Jerin Jacob wrote:
> > On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> >>> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>>>
> >>>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> >>>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> >>>>> <sivaprasad.tummala@amd.com> wrote:
> >>>>>>
> >>>>>> A new API to allow power monitoring condition on event port to
> >>>>>> optimize power when no events are arriving on an event port for
> >>>>>> the worker core to process in an eventdev based pipelined application.
> >>>>>>
> >>>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> >>>>>> + *
> >>>>>> + * @param dev_id
> >>>>>> + *   Eventdev id
> >>>>>> + * @param port_id
> >>>>>> + *   Eventdev port id
> >>>>>> + * @param pmc
> >>>>>> + *   The pointer to power-optimized monitoring condition structure.
> >>>>>> + *
> >>>>>> + * @return
> >>>>>> + *   - 0: Success.
> >>>>>> + *   -ENOTSUP: Operation not supported.
> >>>>>> + *   -EINVAL: Invalid parameters.
> >>>>>> + *   -ENODEV: Invalid device ID.
> >>>>>> + */
> >>>>>> +__rte_experimental
> >>>>>> +int
> >>>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> >>>>>> +               struct rte_power_monitor_cond *pmc);
> >>>>>
> >>>>> + eventdev driver maintainers
> >>>>>
> >>>>> I think we don't need to expose this to applications, because:
> >>>>> 1) Applications should stay agnostic of whether power saving is enabled or not.
> >>>>> 2) Some HW and architectures already support power management in the driver and in HW
> >>>>> (not using the CPU architecture directly).
> >>>>>
> >>>>> If so, that translates to the following:
> >>>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
> >>>>> port_id, bool ena) for controlling power saving in the slow path.
> >>>>> b) Create a reusable PMD private function based on the CPU architecture
> >>>>> power saving primitive to cover the PMDs that don't have native power saving
> >>>>> support.
> >>>>> c) Update the PMD's rte_event_dequeue_burst() callback to use (b).
> >>>>>
> >>>>>
> >>>>
> >>>> Hi Jerin,
> >>>
> >>> Hi Ferruh,
> >>>
> >>>>
> >>>> ethdev approach seems applied here.
> >>>
> >>> Understood. But none of the NIC HW supports power management at
> >>> the HW level like eventdev does, so what we are doing
> >>> for ethdev is the correct abstraction for ethdev.
> >>>
> >>
> >> What I understand is there are HW based event managers and SW based ones;
> >> SW based ones can benefit more from CPU power optimizations, and HW
> >> event managers can just ignore the feature if there is not enough
> >> benefit.
> >>
> >>>>
> >>>> In ethdev, the 'rte_event_port_get_monitor_addr()' equivalent is
> >>>> 'rte_eth_get_monitor_addr()'.
> >>>>
> >>>> Although 'rte_eth_get_monitor_addr()' is a public API, it is currently
> >>>> only called from Rx/Tx callback functions implemented in the power library.
> >>>> But I assume the intention of making it public is to enable users to implement
> >>>> their own callback functions that have a custom algorithm for power
> >>>> management.
> >>>
> >>> If there is a use case for a custom callback, we can provide that,
> >>> provided NULL is valid and selects the default algorithm.
> >>>
> >>>>
> >>>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> >>>>
> >>>>
> >>>> Also, instead of implementing power features within PMDs, isn't it
> >>>> better to have a common eventdev layer for it?
> >>>
> >>> We can have rte_eventdev_pmd_* APIs as non-public APIs.
> >>> My only objection: do NOT introduce _monitor_ APIs at the eventdev level.
> >>> _monitor_ is only one way to do it in SW, so we need a higher level
> >>> of abstraction.
> >>>
> >>
> >> I see, this seems to be a trade-off between flexibility and usability. If
> >> applications have access to the _monitor_ APIs, they have more flexibility
> >> to implement their own logic.
> >
> > OK.
> >
> >>
> >> Another option is that the application provides the policy via an API and
> >> the monitor API is used to realize the policy, but in this case it can be
> >> challenging to find and implement the correct policies.
> >
> > OK. If we can enumerate the policies, then that would be ideal.
> > On the plus side, no changes would be needed in lib/power/
> >
>
> If we are talking about a power framework where the user defines policies, I
> expect parsing/defining policies will be in the power library and will
> require changes in the power library anyway.

OK

>
> But as mentioned above, it is difficult to define a proper policy; this
> is not really related to eventdev, it is more a power library issue. We can
> continue to provide flexibility to the user in eventdev and discuss the
> policy in a wider forum.

OK.

>
> >
> >>
> >>>>
> >>>> For the PMDs that benefit from a HW event manager, just not implementing
> >>>> .get_monitor_addr() in dev_ops will keep them free from power related APIs.
> >>>
> >>> But the application fast path code diverges when low level primitives are exposed.
> >>>
> >>
> >> I am not clear on the concern above, but for applications that use the default
> >> callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
> >> to enable this feature; if it is not called, the datapath is not impacted.
> >> And if no dequeue callback is added at all, custom or default, the data path
> >> is not impacted at all.
> >
> > The concern is the following code[1], when no callback is registered
> > for this use case.
> > In eventdev, we use _one packet at a time_ for a lot of use cases
> > with latency-critical workloads like L1 processing.
> > In such cases, the following code will add up.
> >
> > [1]
> >   cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
> >     __ATOMIC_RELAXED);
> >   if (unlikely(cb != NULL))
> >        nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);
> >
> > I see two options:
> > 1) Enumerate the power policies and let the driver implement them through
> > non-public PMD helper functions
> > OR
> > 2) Move the power management callback to the driver via non-public PMD
> > helper functions, to avoid the cost for
> > PMDs where power management is done in HW and to remove the above extra check
> > when NO callback is registered[1]
> >
>
> Got it, yes, there is an additional check with event callbacks. We can
> add a compiler flag around it, as done in ethdev, so that it is not compiled
> in when not needed. Will that work?

I would prefer to expose a PMD helper function that can be called at the
end of the driver dequeue function, so that other PMDs can reuse it as
needed.
This avoids a compiler flag and cache line occupancy changes in struct
rte_eventdev and struct rte_event_fp_ops in generic code. Also, we may not
need a full-fledged generic callback scheme for this.

>
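A minimal sketch of the PMD helper approach follows, assuming a simple empty-poll policy. All names are hypothetical: struct pmd_port_power, pmd_port_power_saving_ena_dis() (loosely modelled on the proposed rte_event_port_power_saving_ena_dis()) and pmd_power_dequeue_tail(). The sleep hook stands in for an architecture primitive such as rte_power_monitor() or rte_power_pause(); in this sketch it only counts invocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port power state, private to the PMD. */
struct pmd_port_power {
	bool enabled;             /* toggled from the slow path */
	uint32_t empty_polls;     /* consecutive dequeues that returned 0 */
	uint32_t threshold;       /* empty polls tolerated before sleeping */
	void (*sleep)(void *ctx); /* stand-in for an arch monitor/pause primitive */
	void *ctx;
};

/* Hypothetical slow-path control, loosely modelled on the proposed
 * rte_event_port_power_saving_ena_dis(dev_id, port_id, ena). */
static void
pmd_port_power_saving_ena_dis(struct pmd_port_power *pwr, bool ena)
{
	assert(!ena || (pwr->threshold > 0 && pwr->sleep != NULL));
	pwr->enabled = ena;
	pwr->empty_polls = 0;
}

/* Hypothetical helper a driver calls as the last step of its dequeue
 * function; it passes nb_rx through unchanged. */
static inline uint16_t
pmd_power_dequeue_tail(struct pmd_port_power *pwr, uint16_t nb_rx)
{
	if (!pwr->enabled)
		return nb_rx;
	if (nb_rx != 0) {
		pwr->empty_polls = 0; /* work arrived, restart the count */
		return nb_rx;
	}
	if (++pwr->empty_polls >= pwr->threshold) {
		pwr->sleep(pwr->ctx);
		pwr->empty_polls = 0;
	}
	return nb_rx;
}

/* Demo sleep hook: only counts how often the helper would sleep. */
static void
count_sleep(void *ctx)
{
	++*(unsigned int *)ctx;
}
```

Because the helper lives in the driver, a PMD whose power management is done in HW simply never calls it, and the generic dequeue path carries no extra check.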
  
Sivaprasad Tummala May 3, 2023, 3:11 p.m. UTC | #10

Hi Jerin, 

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, May 3, 2023 1:57 PM
> To: Yigit, Ferruh <Ferruh.Yigit@amd.com>
> Cc: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>;
> david.hunt@intel.com; jerinj@marvell.com; harry.van.haaren@intel.com;
> dev@dpdk.org; Pavan Nikhilesh <pbhagavatula@marvell.com>; McDaniel, Timothy
> <timothy.mcdaniel@intel.com>; Shijith Thotton <sthotton@marvell.com>;
> Hemant Agrawal <hemant.agrawal@nxp.com>; Sachin Saxena
> <sachin.saxena@oss.nxp.com>; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; Peter Mccarthy
> <peter.mccarthy@intel.com>; Liang Ma <liangma@liangbit.com>
> Subject: Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
> 
> 
> On Wed, May 3, 2023 at 1:44 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> > On 5/3/2023 8:58 AM, Jerin Jacob wrote:
> > > On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> > >>
> > >> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> > >>> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> > >>>>
> > >>>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> > >>>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> > >>>>> <sivaprasad.tummala@amd.com> wrote:
> > >>>>>>
> > >>>>>> A new API to allow power monitoring condition on event port to
> > >>>>>> optimize power when no events are arriving on an event port for
> > >>>>>> the worker core to process in an eventdev based pipelined application.
> > >>>>>>
> > >>>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> > >>>>>> + *
> > >>>>>> + * @param dev_id
> > >>>>>> + *   Eventdev id
> > >>>>>> + * @param port_id
> > >>>>>> + *   Eventdev port id
> > >>>>>> + * @param pmc
> > >>>>>> + *   The pointer to power-optimized monitoring condition structure.
> > >>>>>> + *
> > >>>>>> + * @return
> > >>>>>> + *   - 0: Success.
> > >>>>>> + *   -ENOTSUP: Operation not supported.
> > >>>>>> + *   -EINVAL: Invalid parameters.
> > >>>>>> + *   -ENODEV: Invalid device ID.
> > >>>>>> + */
> > >>>>>> +__rte_experimental
> > >>>>>> +int
> > >>>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> > >>>>>> +               struct rte_power_monitor_cond *pmc);
> > >>>>>
> > >>>>> + eventdev driver maintainers
> > >>>>>
> > >>>>> I think we don't need to expose this to applications, because:
> > >>>>> 1) Applications should stay agnostic of whether power saving is enabled or not.
> > >>>>> 2) Some HW and architectures already support power management in the
> > >>>>> driver and in HW (not using the CPU architecture directly).
> > >>>>>
> > >>>>> If so, that translates to the following:
> > >>>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id,
> > >>>>> uint8_t port_id, bool ena) for controlling power saving in the slow path.
> > >>>>> b) Create a reusable PMD private function based on the CPU
> > >>>>> architecture power saving primitive to cover the PMDs that don't have
> > >>>>> native power saving support.
> > >>>>> c) Update the PMD's rte_event_dequeue_burst() callback to use (b).
> > >>>>>
> > >>>>>
> > >>>>
> > >>>> Hi Jerin,
> > >>>
> > >>> Hi Ferruh,
> > >>>
> > >>>>
> > >>>> ethdev approach seems applied here.
> > >>>
> > >>> Understood. But none of the NIC HW supports power management
> > >>> at the HW level like eventdev does, so what we are doing for
> > >>> ethdev is the correct abstraction for ethdev.
> > >>>
> > >>
> > >> What I understand is there are HW based event managers and SW based
> > >> ones; SW based ones can benefit more from CPU power optimizations,
> > >> and HW event managers can just ignore the feature if there is not
> > >> enough benefit.
> > >>
> > >>>>
> > >>>> In ethdev, the 'rte_event_port_get_monitor_addr()' equivalent is
> > >>>> 'rte_eth_get_monitor_addr()'.
> > >>>>
> > >>>> Although 'rte_eth_get_monitor_addr()' is a public API, it is
> > >>>> currently only called from Rx/Tx callback functions implemented in the
> > >>>> power library.
> > >>>> But I assume the intention of making it public is to enable users to
> > >>>> implement their own callback functions that have a custom algorithm
> > >>>> for power management.
> > >>>
> > >>> If there is a use case for a custom callback, we can provide that,
> > >>> provided NULL is valid and selects the default algorithm.
> > >>>
> > >>>>
> > >>>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> > >>>>
> > >>>>
> > >>>> Also, instead of implementing power features within PMDs,
> > >>>> isn't it better to have a common eventdev layer for it?
> > >>>
> > >>> We can have rte_eventdev_pmd_* APIs as non-public APIs.
> > >>> My only objection: do NOT introduce _monitor_ APIs at the eventdev
> > >>> level. _monitor_ is only one way to do it in SW, so we need a
> > >>> higher level of abstraction.
> > >>>
> > >>
> > >> I see, this seems to be a trade-off between flexibility and usability. If
> > >> applications have access to the _monitor_ APIs, they have more
> > >> flexibility to implement their own logic.
> > >
> > > OK.
> > >
> > >>
> > >> Another option is that the application provides the policy via an API
> > >> and the monitor API is used to realize the policy, but in this case it
> > >> can be challenging to find and implement the correct policies.
> > >
> > > OK. If we can enumerate the policies, then that would be ideal.
> > > On the plus side, no changes would be needed in lib/power/
> > >
> >
> > If we are talking about a power framework where the user defines policies,
> > I expect parsing/defining policies will be in the power library and
> > will require changes in the power library anyway.
> 
> OK
> 
> >
> > But as mentioned above, it is difficult to define a proper policy; this
> > is not really related to eventdev, it is more a power library issue. We can
> > continue to provide flexibility to the user in eventdev and discuss the
> > policy in a wider forum.
> 
> OK.
> 
> >
> > >
> > >>
> > >>>>
> > >>>> For the PMDs that benefit from a HW event manager, just not implementing
> > >>>> .get_monitor_addr() in dev_ops will keep them free from power related APIs.
> > >>>
> > >>> But the application fast path code diverges when low level primitives
> > >>> are exposed.
> > >>>
> > >>
> > >> I am not clear on the concern above, but for applications that use
> > >> the default callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs
> > >> to be called to enable this feature; if it is not called, the datapath is not impacted.
> > >> And if no dequeue callback is added at all, custom or default, the data
> > >> path is not impacted at all.
> > >
> > > The concern is the following code[1], when no callback is
> > > registered for this use case.
> > > In eventdev, we use _one packet at a time_ for a lot of use
> > > cases with latency-critical workloads like L1 processing.
> > > In such cases, the following code will add up.
> > >
> > > [1]
> > >   cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
> > >     __ATOMIC_RELAXED);
> > >   if (unlikely(cb != NULL))
> > >        nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);
> > >
> > > I see two options:
> > > 1) Enumerate the power policies and let the driver implement them through
> > > non-public PMD helper functions
> > > OR
> > > 2) Move the power management callback to the driver via non-public PMD
> > > helper functions, to avoid the cost for
> > > PMDs where power management is done in HW and to remove the above extra check
> > > when NO callback is registered[1]
> > >
> >
> > Got it, yes, there is an additional check with event callbacks. We can
> > add a compiler flag around it, as done in ethdev, so that it is not compiled
> > in when not needed. Will that work?
> 
> I would prefer to expose a PMD helper function that can be called at the end of the
> driver dequeue function, so that other PMDs can reuse it as needed.
> This avoids a compiler flag and cache line occupancy changes in struct rte_eventdev
> and struct rte_event_fp_ops in generic code. Also, we may not need a full-fledged
> generic callback scheme for this.
> 
> >
OK. Will fix this in the v1 patch.
  
Burakov, Anatoly May 17, 2023, 2:48 p.m. UTC | #11
On 4/19/2023 10:54 AM, Sivaprasad Tummala wrote:
> A new API to allow power monitoring condition on event port to
> optimize power when no events are arriving on an event port for
> the worker core to process in an eventdev based pipelined application.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---

General patchset comment: the implementation seems straightforward 
enough and closely follows ethdev, so I do not have any objections to it 
from the rte_power point of view - it's nice to see that the infrastructure
we have created turned out to be useful outside of the use cases we envisioned for it :)
  

Patch

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index aebab26852..7b12f80f57 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -481,6 +481,26 @@  typedef int (*eventdev_port_unlink_t)(struct rte_eventdev *dev, void *port,
 typedef int (*eventdev_port_unlinks_in_progress_t)(struct rte_eventdev *dev,
 		void *port);
 
+/**
+ * @internal
+ * Get address of memory location whose contents will change whenever there is
+ * new data to be received on an Event port.
+ *
+ * @param port
+ *   Eventdev port pointer.
+ * @param pmc
+ *   The pointer to power-optimized monitoring condition structure.
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success
+ * @retval -EINVAL
+ *   Invalid parameters
+ */
+typedef int (*event_get_monitor_addr_t)(void *port,
+		struct rte_power_monitor_cond *pmc);
+
 /**
  * Converts nanoseconds to *timeout_ticks* value for rte_event_dequeue()
  *
@@ -1376,6 +1396,9 @@  struct eventdev_ops {
 	eventdev_dump_t dump;
 	/* Dump internal information */
 
+	/** Get power monitoring condition for event port */
+	event_get_monitor_addr_t get_monitor_addr;
+
 	eventdev_xstats_get_t xstats_get;
 	/**< Get extended device statistics. */
 	eventdev_xstats_get_names_t xstats_get_names;
diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index 6ab4524332..ff77194783 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -860,6 +860,30 @@  rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 	return 0;
 }
 
+int
+rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
+		struct rte_power_monitor_cond *pmc)
+{
+	struct rte_eventdev *dev;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -ENODEV);
+	dev = &rte_eventdevs[dev_id];
+	if (!is_valid_port(dev, port_id)) {
+		RTE_EDEV_LOG_ERR("Invalid port_id=%" PRIu8, port_id);
+		return -EINVAL;
+	}
+
+	if (pmc == NULL) {
+		RTE_EDEV_LOG_ERR("devid %u port %u power monitor condition is NULL",
+				dev_id, port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->get_monitor_addr == NULL)
+		return -ENOTSUP;
+	return (*dev->dev_ops->get_monitor_addr)(dev->data->ports[port_id], pmc);
+}
+
 int
 rte_event_queue_attr_get(uint8_t dev_id, uint8_t queue_id, uint32_t attr_id,
 			uint32_t *attr_value)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..841b1fb9b5 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -215,6 +215,7 @@  extern "C" {
 #include <rte_errno.h>
 #include <rte_mbuf_pool_ops.h>
 #include <rte_mempool.h>
+#include <rte_power_intrinsics.h>
 
 #include "rte_eventdev_trace_fp.h"
 
@@ -984,6 +985,30 @@  int
 rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 			uint32_t *attr_value);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve the monitor condition for a given event port.
+ *
+ * @param dev_id
+ *   Eventdev id
+ * @param port_id
+ *   Eventdev port id
+ * @param pmc
+ *   The pointer to power-optimized monitoring condition structure.
+ *
+ * @return
+ *   - 0: Success.
+ *   - -ENOTSUP: Operation not supported.
+ *   - -EINVAL: Invalid parameters.
+ *   - -ENODEV: Invalid device ID.
+ */
+__rte_experimental
+int
+rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
+		struct rte_power_monitor_cond *pmc);
+
 /**
  * Start an event device.
  *
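The patch declares the driver op event_get_monitor_addr_t but does not show a driver filling in the monitor condition. The following is a minimal sketch of what a software PMD might do, assuming a hypothetical port (struct sw_port) whose producers advance a 32-bit tail index. To compile standalone it uses a local mirror of struct rte_power_monitor_cond under a different name (same field names, not the DPDK header), and would_sleep() is a simplified model of the check rte_power_monitor() makes before sleeping, not the real intrinsic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of rte_power_monitor_cond (field names only); NOT the DPDK
 * header, redeclared so this sketch compiles on its own. */
#define POWER_MONITOR_OPAQUE_SZ 4

typedef int (*power_monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[POWER_MONITOR_OPAQUE_SZ]);

struct power_monitor_cond {
	volatile void *addr;	/* location the core will watch */
	uint8_t size;		/* bytes to read at addr */
	power_monitor_clb_t fn;	/* decides whether to abort the sleep */
	uint64_t opaque[POWER_MONITOR_OPAQUE_SZ]; /* callback private data */
};

/* Hypothetical software event port: producers advance 'tail' when they
 * enqueue, the worker advances 'head' when it dequeues. */
struct sw_port {
	volatile uint32_t tail;
	uint32_t head;
};

/* Return -1 (abort sleep) if tail moved past the head snapshot in
 * opaque[0], 0 (proceed to sleep) if the port is still empty. */
static int
sw_port_monitor_cb(const uint64_t val,
		   const uint64_t opaque[POWER_MONITOR_OPAQUE_SZ])
{
	return (uint32_t)val != (uint32_t)opaque[0] ? -1 : 0;
}

/* Same shape as event_get_monitor_addr_t: int (*)(void *port, pmc). */
static int
sw_port_get_monitor_addr(void *port, struct power_monitor_cond *pmc)
{
	struct sw_port *p = port;

	if (p == NULL || pmc == NULL)
		return -EINVAL;
	pmc->addr = &p->tail;
	pmc->size = sizeof(p->tail);
	pmc->fn = sw_port_monitor_cb;
	pmc->opaque[0] = p->head;
	return 0;
}

/* Simplified model of the pre-sleep check in rte_power_monitor(): read
 * 'size' bytes at 'addr' and ask the callback. Returns 1 if the core would
 * enter the power-optimized state, 0 if it would return immediately. */
static int
would_sleep(const struct power_monitor_cond *pmc)
{
	uint64_t val;

	assert(pmc->size == sizeof(uint32_t)); /* sketch handles 4 bytes only */
	val = *(const volatile uint32_t *)pmc->addr;
	return pmc->fn(val, pmc->opaque) == 0;
}
```

The callback convention follows the one used by ethdev drivers for rte_eth_get_monitor_addr(): return -1 to abort the sleep when the monitored value already shows new work, or 0 to let the core wait until the monitored address is written or the timeout expires.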