[v9,04/10] ethdev: add simple power management API

Message ID 1603494392-7181-5-git-send-email-liang.j.ma@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series Add PMD power mgmt |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Liang, Ma Oct. 23, 2020, 11:06 p.m. UTC
  Add a simple API to allow getting address of next RX descriptor from the
PMD, as well as release notes information.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v8:
    - Rename version map file name.

    v7:
    - Fixed queue ID validation
    - Fixed documentation

    v6:
    - Rebase on top of latest main
    - Ensure the API checks queue ID (Konstantin)
    - Removed accidental inclusion of unrelated release notes
    v5:
    - Bring function format in line with other functions in the file
    - Ensure the API is supported by the driver before calling it (Konstantin)
---
 doc/guides/rel_notes/release_20_11.rst |  5 +++++
 lib/librte_ethdev/rte_ethdev.c         | 23 +++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h         | 28 ++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h  | 28 ++++++++++++++++++++++++++
 lib/librte_ethdev/version.map          |  1 +
 5 files changed, 85 insertions(+)
  

Comments

Thomas Monjalon Oct. 24, 2020, 8:39 p.m. UTC | #1
24/10/2020 01:06, Liang Ma:
> Add a simple API to allow getting address of next RX descriptor from the
> PMD, as well as release notes information.
> 
> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

You keep forgetting Cc ethdev maintainers
(it is automatic when using --cc-cmd devtools/get-maintainer.sh).
As a result we still don't have any feedback from Ferruh and Andrew.
And I believe such API requires having feedback from maintainers
of other NIC drivers. I tried to explain this concern already
in email and community meetings, but I see no progress.

Below are my comments.

> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -139,6 +139,11 @@ New Features
>    Hairpin Tx part flow rules can be inserted explicitly.
>    New API is added to get the hairpin peer ports list.
>  
> +* **ethdev: add 1 new EXPERIMENTAL API for PMD power management.**

The guidelines at the top of the file say using past tense.
No need to mention it is experimental as any new API.

> +
> +  * ``rte_eth_get_wake_addr()``
> +  * add new eth_dev_ops ``get_wake_addr``

No need to mention the dev_ops in the release notes.
Better to explain what it brings from an user perspective.

> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> +/**
> + * Retrieve the wake up address for the receive queue.

I guess how this function should be used,
but a bit more explanations would not hurt here.

> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The Rx queue on the Ethernet device for which information will be
> + *   retrieved.
> + * @param wake_addr
> + *   The pointer to the address which will be monitored.

This function does not make the address monitored, right?

> + * @param expected
> + *   The pointer to value to be expected when descriptor is set.

Not sure we should restrict it to a "descriptor".

Expecting a value or some bits looks too much restrictive.
I understand it probably fits well for Intel NICs,
but in the general case, we can imagine that any change
in a byte array could be a wake up signal.

> + * @param mask
> + *   The pointer to comparison bitmask for the expected value.
> + * @param data_sz
> + *   The pointer to data size for the expected value and comparison bitmask.

It is not clear that above 4 parameters are filled by the driver.

> + *
> + * @return
> + *   - 0: Success.
> + *   -ENOTSUP: Operation not supported.
> + *   -EINVAL: Invalid parameters.
> + *   -ENODEV: Invalid port ID.
> + */
> +__rte_experimental
> +int rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id,
> +		volatile void **wake_addr, uint64_t *expected, uint64_t *mask,
> +		uint8_t *data_sz);
[...]
> --- a/lib/librte_ethdev/version.map
> +++ b/lib/librte_ethdev/version.map
> @@ -244,6 +244,7 @@ EXPERIMENTAL {
>  	rte_flow_get_restore_info;
>  	rte_flow_tunnel_action_decap_release;
>  	rte_flow_tunnel_item_release;
> +	rte_eth_get_wake_addr;
>  };

Please sort in alphabetical order.
  
Liang, Ma Oct. 27, 2020, 11:15 a.m. UTC | #2
<snip>
thanks for your information. 
Sorry for that. 
All related maintainer(include other NIC PMD) will be Cced in v10.

> You keep forgetting Cc ethdev maintainers
> (it is automatic when using --cc-cmd devtools/get-maintainer.sh).
> As a result we still don't have any feedback from Ferruh and Andrew.
> And I believe such API requires having feedback from maintainers
> of other NIC drivers. I tried to explain this concern already
> in email and community meetings, but I see no progress.
> 
> Below are my comments.
> 
> > --- a/doc/guides/rel_notes/release_20_11.rst
> > +++ b/doc/guides/rel_notes/release_20_11.rst
> > @@ -139,6 +139,11 @@ New Features
> >    Hairpin Tx part flow rules can be inserted explicitly.
> >    New API is added to get the hairpin peer ports list.
> >  
> > +* **ethdev: add 1 new EXPERIMENTAL API for PMD power management.**
> 
> The guidelines at the top of the file say using past tense.
> No need to mention it is experimental as any new API.
agree
> 
> > +
> > +  * ``rte_eth_get_wake_addr()``
> > +  * add new eth_dev_ops ``get_wake_addr``
> 
> No need to mention the dev_ops in the release notes.
> Better to explain what it brings from an user perspective.
agree
> 
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > +/**
> > + * Retrieve the wake up address for the receive queue.
> 
> I guess how this function should be used,
> but a bit more explanations would not hurt here.
agree
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param queue_id
> > + *   The Rx queue on the Ethernet device for which information will be
> > + *   retrieved.
> > + * @param wake_addr
> > + *   The pointer to the address which will be monitored.
> 
> This function does not make the address monitored, right?
This function only get the target wakeup address. that does not monitor this address.
> 
> > + * @param expected
> > + *   The pointer to value to be expected when descriptor is set.
> 
> Not sure we should restrict it to a "descriptor".
  actully that is not limited to a descriptor, any writeback content should work.
> 
> Expecting a value or some bits looks too much restrictive.
> I understand it probably fits well for Intel NICs,
> but in the general case, we can imagine that any change
> in a byte array could be a wake up signal.
this parameter doesn not limited user how to use it.
In fact, current design can support any bits change within 64 bits content.
> 
> > + * @param mask
> > + *   The pointer to comparison bitmask for the expected value.
> > + * @param data_sz
> > + *   The pointer to data size for the expected value and comparison bitmask.
> 
> It is not clear that above 4 parameters are filled by the driver.
> 
> > + *
> > + * @return
> > + *   - 0: Success.
> > + *   -ENOTSUP: Operation not supported.
> > + *   -EINVAL: Invalid parameters.
> > + *   -ENODEV: Invalid port ID.
> > + */
> > +__rte_experimental
> > +int rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id,
> > +		volatile void **wake_addr, uint64_t *expected, uint64_t *mask,
> > +		uint8_t *data_sz);
> [...]
> > --- a/lib/librte_ethdev/version.map
> > +++ b/lib/librte_ethdev/version.map
> > @@ -244,6 +244,7 @@ EXPERIMENTAL {
> >  	rte_flow_get_restore_info;
> >  	rte_flow_tunnel_action_decap_release;
> >  	rte_flow_tunnel_item_release;
> > +	rte_eth_get_wake_addr;
> >  };
> 
> Please sort in alphabetical order.
agree will do
> 
>
  
Thomas Monjalon Oct. 27, 2020, 3:52 p.m. UTC | #3
27/10/2020 12:15, Liang, Ma:
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > +/**
> > > + * Retrieve the wake up address for the receive queue.
> > 
> > I guess how this function should be used,
> > but a bit more explanations would not hurt here.
> agree
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the Ethernet device.
> > > + * @param queue_id
> > > + *   The Rx queue on the Ethernet device for which information will be
> > > + *   retrieved.
> > > + * @param wake_addr
> > > + *   The pointer to the address which will be monitored.
> > 
> > This function does not make the address monitored, right?
> This function only get the target wakeup address. that does not monitor this address.
> > 
> > > + * @param expected
> > > + *   The pointer to value to be expected when descriptor is set.
> > 
> > Not sure we should restrict it to a "descriptor".
>   actully that is not limited to a descriptor, any writeback content should work.
> > 
> > Expecting a value or some bits looks too much restrictive.
> > I understand it probably fits well for Intel NICs,
> > but in the general case, we can imagine that any change
> > in a byte array could be a wake up signal.
> 
> this parameter doesn not limited user how to use it.
> In fact, current design can support any bits change within 64 bits content.

How the driver can specify that any value change should be monitored?
I understand that it is only a value/mask pair,
it does not give room for "any value".
  
Ananyev, Konstantin Oct. 27, 2020, 5:43 p.m. UTC | #4
> 27/10/2020 12:15, Liang, Ma:
> > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > +/**
> > > > + * Retrieve the wake up address for the receive queue.
> > >
> > > I guess how this function should be used,
> > > but a bit more explanations would not hurt here.
> > agree
> > > > + *
> > > > + * @param port_id
> > > > + *   The port identifier of the Ethernet device.
> > > > + * @param queue_id
> > > > + *   The Rx queue on the Ethernet device for which information will be
> > > > + *   retrieved.
> > > > + * @param wake_addr
> > > > + *   The pointer to the address which will be monitored.
> > >
> > > This function does not make the address monitored, right?
> > This function only get the target wakeup address. that does not monitor this address.
> > >
> > > > + * @param expected
> > > > + *   The pointer to value to be expected when descriptor is set.
> > >
> > > Not sure we should restrict it to a "descriptor".
> >   actully that is not limited to a descriptor, any writeback content should work.
> > >
> > > Expecting a value or some bits looks too much restrictive.
> > > I understand it probably fits well for Intel NICs,
> > > but in the general case, we can imagine that any change
> > > in a byte array could be a wake up signal.
> >
> > this parameter doesn not limited user how to use it.
> > In fact, current design can support any bits change within 64 bits content.
> 
> How the driver can specify that any value change should be monitored?
> I understand that it is only a value/mask pair,
> it does not give room for "any value".

As I can read the code, value=0, mask=0 will provide you with 'any value'.
Though it would mean that rte_power_monitor() will *always* go into sleep,
so not sure what will be there any practical usage for such case. 
Konstantin
  
Thomas Monjalon Oct. 27, 2020, 6:30 p.m. UTC | #5
27/10/2020 18:43, Ananyev, Konstantin:
> > 27/10/2020 12:15, Liang, Ma:
> > > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > +/**
> > > > > + * Retrieve the wake up address for the receive queue.
> > > >
> > > > I guess how this function should be used,
> > > > but a bit more explanations would not hurt here.
> > > agree
> > > > > + *
> > > > > + * @param port_id
> > > > > + *   The port identifier of the Ethernet device.
> > > > > + * @param queue_id
> > > > > + *   The Rx queue on the Ethernet device for which information will be
> > > > > + *   retrieved.
> > > > > + * @param wake_addr
> > > > > + *   The pointer to the address which will be monitored.
> > > >
> > > > This function does not make the address monitored, right?
> > > This function only get the target wakeup address. that does not monitor this address.
> > > >
> > > > > + * @param expected
> > > > > + *   The pointer to value to be expected when descriptor is set.
> > > >
> > > > Not sure we should restrict it to a "descriptor".
> > >   actully that is not limited to a descriptor, any writeback content should work.
> > > >
> > > > Expecting a value or some bits looks too much restrictive.
> > > > I understand it probably fits well for Intel NICs,
> > > > but in the general case, we can imagine that any change
> > > > in a byte array could be a wake up signal.
> > >
> > > this parameter doesn not limited user how to use it.
> > > In fact, current design can support any bits change within 64 bits content.
> > 
> > How the driver can specify that any value change should be monitored?
> > I understand that it is only a value/mask pair,
> > it does not give room for "any value".
> 
> As I can read the code, value=0, mask=0 will provide you with 'any value'.
> Though it would mean that rte_power_monitor() will *always* go into sleep,
> so not sure what will be there any practical usage for such case. 

I think what is missing is to allow waking up when the value
of a byte array is changing, without specifiying any value.
  
Ananyev, Konstantin Oct. 27, 2020, 11:29 p.m. UTC | #6
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, October 27, 2020 6:31 PM
> To: Ma, Liang J <liang.j.ma@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; viktorin@rehivetech.com; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ruifeng.wang@arm.com; Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>;
> jerinjacobk@gmail.com; nhorman@tuxdriver.com; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Eads, Gage
> <gage.eads@intel.com>; drc@linux.vnet.ibm.com; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; jerinj@marvell.com; hemant.agrawal@nxp.com; viacheslavo@nvidia.com; matan@nvidia.com;
> ajit.khaparde@broadcom.com; rahul.lakkireddy@chelsio.com; johndale@cisco.com; xavier.huwei@huawei.com; shahafs@nvidia.com;
> sthemmin@microsoft.com; g.singh@nxp.com; rmody@marvell.com; maxime.coquelin@redhat.com; david.marchand@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v9 04/10] ethdev: add simple power management API
> 
> 27/10/2020 18:43, Ananyev, Konstantin:
> > > 27/10/2020 12:15, Liang, Ma:
> > > > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > > +/**
> > > > > > + * Retrieve the wake up address for the receive queue.
> > > > >
> > > > > I guess how this function should be used,
> > > > > but a bit more explanations would not hurt here.
> > > > agree
> > > > > > + *
> > > > > > + * @param port_id
> > > > > > + *   The port identifier of the Ethernet device.
> > > > > > + * @param queue_id
> > > > > > + *   The Rx queue on the Ethernet device for which information will be
> > > > > > + *   retrieved.
> > > > > > + * @param wake_addr
> > > > > > + *   The pointer to the address which will be monitored.
> > > > >
> > > > > This function does not make the address monitored, right?
> > > > This function only get the target wakeup address. that does not monitor this address.
> > > > >
> > > > > > + * @param expected
> > > > > > + *   The pointer to value to be expected when descriptor is set.
> > > > >
> > > > > Not sure we should restrict it to a "descriptor".
> > > >   actully that is not limited to a descriptor, any writeback content should work.
> > > > >
> > > > > Expecting a value or some bits looks too much restrictive.
> > > > > I understand it probably fits well for Intel NICs,
> > > > > but in the general case, we can imagine that any change
> > > > > in a byte array could be a wake up signal.
> > > >
> > > > this parameter doesn not limited user how to use it.
> > > > In fact, current design can support any bits change within 64 bits content.
> > >
> > > How the driver can specify that any value change should be monitored?
> > > I understand that it is only a value/mask pair,
> > > it does not give room for "any value".
> >
> > As I can read the code, value=0, mask=0 will provide you with 'any value'.
> > Though it would mean that rte_power_monitor() will *always* go into sleep,
> > so not sure what will be there any practical usage for such case.
> 
> I think what is missing is to allow waking up when the value
> of a byte array is changing, without specifiying any value.


I think it will always wakeup on any write to wait_addr.
What you control with value/mask pair - when we should go to sleep.
In other words:
ret = rte_eth_get_wake_addr(port, queue, &wait_addr, &value, &mask, ....);

mask==0: always go to sleep, wakeup at any store to wait_addr.
mask!=0: go to sleep only if (*wait_addr & mask) == value, wakeup at any store to wait_addr.   

Liang, Anatoly - feel free to correct me here, if I missed something.
  
Ajit Khaparde Oct. 28, 2020, 3:24 a.m. UTC | #7
On Tue, Oct 27, 2020 at 4:29 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, October 27, 2020 6:31 PM
> > To: Ma, Liang J <liang.j.ma@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; viktorin@rehivetech.com; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > ruifeng.wang@arm.com; Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> > Wang, Haiyue <haiyue.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>;
> > jerinjacobk@gmail.com; nhorman@tuxdriver.com; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Eads, Gage
> > <gage.eads@intel.com>; drc@linux.vnet.ibm.com; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Yigit, Ferruh
> > <ferruh.yigit@intel.com>; jerinj@marvell.com; hemant.agrawal@nxp.com; viacheslavo@nvidia.com; matan@nvidia.com;
> > ajit.khaparde@broadcom.com; rahul.lakkireddy@chelsio.com; johndale@cisco.com; xavier.huwei@huawei.com; shahafs@nvidia.com;
> > sthemmin@microsoft.com; g.singh@nxp.com; rmody@marvell.com; maxime.coquelin@redhat.com; david.marchand@redhat.com
> > Subject: Re: [dpdk-dev] [PATCH v9 04/10] ethdev: add simple power management API
> >
> > 27/10/2020 18:43, Ananyev, Konstantin:
> > > > 27/10/2020 12:15, Liang, Ma:
> > > > > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > > > +/**
> > > > > > > + * Retrieve the wake up address for the receive queue.
> > > > > >
> > > > > > I guess how this function should be used,
> > > > > > but a bit more explanations would not hurt here.
> > > > > agree
> > > > > > > + *
> > > > > > > + * @param port_id
> > > > > > > + *   The port identifier of the Ethernet device.
> > > > > > > + * @param queue_id
> > > > > > > + *   The Rx queue on the Ethernet device for which information will be
> > > > > > > + *   retrieved.
> > > > > > > + * @param wake_addr
> > > > > > > + *   The pointer to the address which will be monitored.
> > > > > >
> > > > > > This function does not make the address monitored, right?
> > > > > This function only get the target wakeup address. that does not monitor this address.
> > > > > >
> > > > > > > + * @param expected
> > > > > > > + *   The pointer to value to be expected when descriptor is set.
> > > > > >
> > > > > > Not sure we should restrict it to a "descriptor".
> > > > >   actully that is not limited to a descriptor, any writeback content should work.
> > > > > >
> > > > > > Expecting a value or some bits looks too much restrictive.
> > > > > > I understand it probably fits well for Intel NICs,
> > > > > > but in the general case, we can imagine that any change
> > > > > > in a byte array could be a wake up signal.
> > > > >
> > > > > this parameter doesn not limited user how to use it.
> > > > > In fact, current design can support any bits change within 64 bits content.
> > > >
> > > > How the driver can specify that any value change should be monitored?
> > > > I understand that it is only a value/mask pair,
> > > > it does not give room for "any value".
> > >
> > > As I can read the code, value=0, mask=0 will provide you with 'any value'.
> > > Though it would mean that rte_power_monitor() will *always* go into sleep,
> > > so not sure what will be there any practical usage for such case.
> >
> > I think what is missing is to allow waking up when the value
> > of a byte array is changing, without specifiying any value.
>
>
> I think it will always wakeup on any write to wait_addr.
> What you control with value/mask pair - when we should go to sleep.
> In other words:
> ret = rte_eth_get_wake_addr(port, queue, &wait_addr, &value, &mask, ....);
>
> mask==0: always go to sleep, wakeup at any store to wait_addr.
> mask!=0: go to sleep only if (*wait_addr & mask) == value, wakeup at any store to wait_addr.
I did not get this impression on reading it first time. Till you put
it this way.
The comment "if the masked value is already matching, abort" stumped me.

>
> Liang, Anatoly - feel free to correct me here, if I missed something.
>
  
Liang, Ma Oct. 28, 2020, 12:24 p.m. UTC | #8
On 27 Oct 16:29, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, October 27, 2020 6:31 PM
> > To: Ma, Liang J <liang.j.ma@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; viktorin@rehivetech.com; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > ruifeng.wang@arm.com; Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> > Wang, Haiyue <haiyue.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David <david.hunt@intel.com>;
> > jerinjacobk@gmail.com; nhorman@tuxdriver.com; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Eads, Gage
> > <gage.eads@intel.com>; drc@linux.vnet.ibm.com; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Yigit, Ferruh
> > <ferruh.yigit@intel.com>; jerinj@marvell.com; hemant.agrawal@nxp.com; viacheslavo@nvidia.com; matan@nvidia.com;
> > ajit.khaparde@broadcom.com; rahul.lakkireddy@chelsio.com; johndale@cisco.com; xavier.huwei@huawei.com; shahafs@nvidia.com;
> > sthemmin@microsoft.com; g.singh@nxp.com; rmody@marvell.com; maxime.coquelin@redhat.com; david.marchand@redhat.com
> > Subject: Re: [dpdk-dev] [PATCH v9 04/10] ethdev: add simple power management API
> >
> > 27/10/2020 18:43, Ananyev, Konstantin:
> > > > 27/10/2020 12:15, Liang, Ma:
> > > > > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > > > +/**
> > > > > > > + * Retrieve the wake up address for the receive queue.
> > > > > >
> > > > > > I guess how this function should be used,
> > > > > > but a bit more explanations would not hurt here.
> > > > > agree
> > > > > > > + *
> > > > > > > + * @param port_id
> > > > > > > + *   The port identifier of the Ethernet device.
> > > > > > > + * @param queue_id
> > > > > > > + *   The Rx queue on the Ethernet device for which information will be
> > > > > > > + *   retrieved.
> > > > > > > + * @param wake_addr
> > > > > > > + *   The pointer to the address which will be monitored.
> > > > > >
> > > > > > This function does not make the address monitored, right?
> > > > > This function only get the target wakeup address. that does not monitor this address.
> > > > > >
> > > > > > > + * @param expected
> > > > > > > + *   The pointer to value to be expected when descriptor is set.
> > > > > >
> > > > > > Not sure we should restrict it to a "descriptor".
> > > > >   actully that is not limited to a descriptor, any writeback content should work.
> > > > > >
> > > > > > Expecting a value or some bits looks too much restrictive.
> > > > > > I understand it probably fits well for Intel NICs,
> > > > > > but in the general case, we can imagine that any change
> > > > > > in a byte array could be a wake up signal.
> > > > >
> > > > > this parameter doesn not limited user how to use it.
> > > > > In fact, current design can support any bits change within 64 bits content.
> > > >
> > > > How the driver can specify that any value change should be monitored?
> > > > I understand that it is only a value/mask pair,
> > > > it does not give room for "any value".
> > >
> > > As I can read the code, value=0, mask=0 will provide you with 'any value'.
> > > Though it would mean that rte_power_monitor() will *always* go into sleep,
> > > so not sure what will be there any practical usage for such case.
> >
> > I think what is missing is to allow waking up when the value
> > of a byte array is changing, without specifiying any value.
> 
> 
> I think it will always wakeup on any write to wait_addr.
> What you control with value/mask pair - when we should go to sleep.
> In other words:
> ret = rte_eth_get_wake_addr(port, queue, &wait_addr, &value, &mask, ....);
> 
> mask==0: always go to sleep, wakeup at any store to wait_addr.
> mask!=0: go to sleep only if (*wait_addr & mask) == value, wakeup at any store to wait_addr.
> 
> Liang, Anatoly - feel free to correct me here, if I missed something.
  mask and expected value is used to prevent race condition with NIC. 
  there is a interval between code get target address and issue umwait action. 
  therefore, the mask is a selector of the bits, if the current value  & mask already
  == the expected value just before issue the umwait,  that mean that address is no 
  longer useful for umwait because of DMA has happend.
  then the code path is abort and back to normal logic.
  

Patch

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index d8ac359e51..2827a000db 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -139,6 +139,11 @@  New Features
   Hairpin Tx part flow rules can be inserted explicitly.
   New API is added to get the hairpin peer ports list.
 
+* **ethdev: add 1 new EXPERIMENTAL API for PMD power management.**
+
+  * ``rte_eth_get_wake_addr()``
+  * add new eth_dev_ops ``get_wake_addr``
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index b12bb3854d..4f3115fe8e 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -5138,6 +5138,29 @@  rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id,
 		       dev->dev_ops->tx_burst_mode_get(dev, queue_id, mode));
 }
 
+int
+rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id,
+		volatile void **wake_addr, uint64_t *expected, uint64_t *mask,
+		uint8_t *data_sz)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_wake_addr, -ENOTSUP);
+
+	if (queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	return eth_err(port_id,
+		dev->dev_ops->get_wake_addr(dev->data->rx_queues[queue_id],
+			wake_addr, expected, mask, data_sz));
+}
+
 int
 rte_eth_dev_set_mc_addr_list(uint16_t port_id,
 			     struct rte_ether_addr *mc_addr_set,
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index e341a08817..11559e7bc8 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -4364,6 +4364,34 @@  __rte_experimental
 int rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_burst_mode *mode);
 
+/**
+ * Retrieve the wake up address for the receive queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The Rx queue on the Ethernet device for which information will be
+ *   retrieved.
+ * @param wake_addr
+ *   The pointer to the address which will be monitored.
+ * @param expected
+ *   The pointer to value to be expected when descriptor is set.
+ * @param mask
+ *   The pointer to comparison bitmask for the expected value.
+ * @param data_sz
+ *   The pointer to data size for the expected value and comparison bitmask.
+ *
+ * @return
+ *   - 0: Success.
+ *   -ENOTSUP: Operation not supported.
+ *   -EINVAL: Invalid parameters.
+ *   -ENODEV: Invalid port ID.
+ */
+__rte_experimental
+int rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id,
+		volatile void **wake_addr, uint64_t *expected, uint64_t *mask,
+		uint8_t *data_sz);
+
 /**
  * Retrieve device registers and register attributes (number of registers and
  * register size)
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c63b9f7eb7..d7548dfe74 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -752,6 +752,32 @@  typedef int (*eth_hairpin_queue_peer_unbind_t)
 	(struct rte_eth_dev *dev, uint16_t cur_queue, uint32_t direction);
 /**< @internal Unbind peer queue from the current queue. */
 
+/**
+ * @internal
+ * Get address of memory location whose contents will change whenever there is
+ * new data to be received on an RX queue.
+ *
+ * @param rxq
+ *   Ethdev queue pointer.
+ * @param tail_desc_addr
+ *   The pointer point to where the address will be stored.
+ * @param expected
+ *   The pointer point to value to be expected when descriptor is set.
+ * @param mask
+ *   The pointer point to comparison bitmask for the expected value.
+ * @param data_sz
+ *   Data size for the expected value (can be 1, 2, 4, or 8 bytes)
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success
+ * @retval -EINVAL
+ *   Invalid parameters
+ */
+typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask, uint8_t *data_sz);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -910,6 +936,8 @@  struct eth_dev_ops {
 	/**< Set up the connection between the pair of hairpin queues. */
 	eth_hairpin_queue_peer_unbind_t hairpin_queue_peer_unbind;
 	/**< Disconnect the hairpin queues of a pair from each other. */
+	eth_get_wake_addr_t get_wake_addr;
+	/**< Get next RX queue ring entry address. */
 };
 
 /**
diff --git a/lib/librte_ethdev/version.map b/lib/librte_ethdev/version.map
index 8ddda2547f..392c273712 100644
--- a/lib/librte_ethdev/version.map
+++ b/lib/librte_ethdev/version.map
@@ -244,6 +244,7 @@  EXPERIMENTAL {
 	rte_flow_get_restore_info;
 	rte_flow_tunnel_action_decap_release;
 	rte_flow_tunnel_item_release;
+	rte_eth_get_wake_addr;
 };
 
 INTERNAL {