[21.08,v1,1/2] power: invert the monitor check

Message ID 819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.burakov@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [21.08,v1,1/2] power: invert the monitor check |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Burakov, Anatoly May 11, 2021, 3:31 p.m. UTC
  Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value.

We can reverse the check, and instead have monitor sleep to be aborted
if the expected value *doesn't* match what's in memory. This allows us
to both implement all currently implemented driver code, as well as
support more use cases which don't easily map to previous semantics
(such as waiting on writes to AF_XDP counter value).

This commit also adjusts all current driver implementations to match the
new semantics.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/event/dlb2/dlb2.c                      | 2 +-
 drivers/net/i40e/i40e_rxtx.c                   | 2 +-
 drivers/net/iavf/iavf_rxtx.c                   | 2 +-
 drivers/net/ice/ice_rxtx.c                     | 2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c                 | 2 +-
 drivers/net/mlx5/mlx5_rx.c                     | 2 +-
 lib/eal/include/generic/rte_power_intrinsics.h | 8 ++++----
 lib/eal/x86/rte_power_intrinsics.c             | 4 ++--
 8 files changed, 12 insertions(+), 12 deletions(-)
  

Comments

Burakov, Anatoly May 11, 2021, 3:39 p.m. UTC | #1
On 11-May-21 4:31 PM, Anatoly Burakov wrote:
> Previously, the semantics of power monitor were such that we were
> checking current value against the expected value, and if they matched,
> then the sleep was aborted. This is somewhat inflexible, because it only
> allowed us to check for a specific value.
> 
> We can reverse the check, and instead have monitor sleep to be aborted
> if the expected value *doesn't* match what's in memory. This allows us
> to both implement all currently implemented driver code, as well as
> support more use cases which don't easily map to previous semantics
> (such as waiting on writes to AF_XDP counter value).
> 
> This commit also adjusts all current driver implementations to match the
> new semantics.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

<snip>

> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 6cd71a44eb..3cbbe5bf59 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -282,7 +282,7 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>   		return -rte_errno;
>   	}
>   	pmc->addr = &cqe->op_own;
> -	pmc->val =  !!idx;
> +	pmc->val =  !idx;
>   	pmc->mask = MLX5_CQE_OWNER_MASK;
>   	pmc->size = sizeof(uint8_t);
>   	return 0;

Both previous and current code seem suspicious to me, as no attention is 
paid to endianness of the code. I would appreciate if mlx5 maintainers 
chimed in here :)
  
Timothy McDaniel May 11, 2021, 4:28 p.m. UTC | #2
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, May 11, 2021 10:32 AM
> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; Viacheslav Ovsiienko <viacheslavo@nvidia.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: [21.08 PATCH v1 1/2] power: invert the monitor check
> 
> Previously, the semantics of power monitor were such that we were
> checking current value against the expected value, and if they matched,
> then the sleep was aborted. This is somewhat inflexible, because it only
> allowed us to check for a specific value.
> 
> We can reverse the check, and instead have monitor sleep to be aborted
> if the expected value *doesn't* match what's in memory. This allows us
> to both implement all currently implemented driver code, as well as
> support more use cases which don't easily map to previous semantics
> (such as waiting on writes to AF_XDP counter value).
> 
> This commit also adjusts all current driver implementations to match the
> new semantics.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  drivers/event/dlb2/dlb2.c                      | 2 +-
>  drivers/net/i40e/i40e_rxtx.c                   | 2 +-
>  drivers/net/iavf/iavf_rxtx.c                   | 2 +-
>  drivers/net/ice/ice_rxtx.c                     | 2 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c                 | 2 +-
>  drivers/net/mlx5/mlx5_rx.c                     | 2 +-
>  lib/eal/include/generic/rte_power_intrinsics.h | 8 ++++----
>  lib/eal/x86/rte_power_intrinsics.c             | 4 ++--
>  8 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index 3570678b9e..5701bbb8ab 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -3188,7 +3188,7 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>  			&cq_base[qm_port->cq_idx];
>  		monitor_addr++; /* cq_gen bit is in second 64bit location */
> 
> -		if (qm_port->gen_bit)
> +		if (!qm_port->gen_bit)
>  			expected_value = qe_mask.raw_qe[1];
>  		else
>  			expected_value = 0;
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 02cf5e787c..4617ae914a 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -88,7 +88,7 @@ i40e_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_64(1 <<
> I40E_RX_DESC_STATUS_DD_SHIFT);
> 
>  	/* registers are 64-bit */
> diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
> index 87f7eebc65..d8d9cc860c 100644
> --- a/drivers/net/iavf/iavf_rxtx.c
> +++ b/drivers/net/iavf/iavf_rxtx.c
> @@ -73,7 +73,7 @@ iavf_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> 
>  	/* registers are 64-bit */
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index 92fbbc18da..4e349bfa3f 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -43,7 +43,7 @@ ice_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> 
>  	/* register is 16-bit */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index d69f36e977..2793718171 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1385,7 +1385,7 @@ ixgbe_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> 
>  	/* the registers are 32-bit */
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 6cd71a44eb..3cbbe5bf59 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -282,7 +282,7 @@ int mlx5_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  		return -rte_errno;
>  	}
>  	pmc->addr = &cqe->op_own;
> -	pmc->val =  !!idx;
> +	pmc->val =  !idx;
>  	pmc->mask = MLX5_CQE_OWNER_MASK;
>  	pmc->size = sizeof(uint8_t);
>  	return 0;
> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h
> b/lib/eal/include/generic/rte_power_intrinsics.h
> index dddca3d41c..28c481a8d2 100644
> --- a/lib/eal/include/generic/rte_power_intrinsics.h
> +++ b/lib/eal/include/generic/rte_power_intrinsics.h
> @@ -45,10 +45,10 @@ struct rte_power_monitor_cond {
>   * Additionally, an expected value (`pmc->val`), mask (`pmc->mask`), and data
>   * size (`pmc->size`) are provided in the `pmc` power monitoring condition. If
>   * the mask is non-zero, the current value pointed to by the `pmc->addr` pointer
> - * will be read and compared against the expected value, and if they match, the
> - * entering of optimized power state will be aborted. This is intended to
> - * prevent the CPU from entering optimized power state and waiting on a write
> - * that has already happened by the time this API is called.
> + * will be read and compared against the expected value, and if they do not
> + * match, the entering of optimized power state will be aborted. This is
> + * intended to prevent the CPU from entering optimized power state and
> waiting
> + * on a write that has already happened by the time this API is called.
>   *
>   * @warning It is responsibility of the user to check if this function is
>   *   supported at runtime using `rte_cpu_get_intrinsics_support()` API call.
> diff --git a/lib/eal/x86/rte_power_intrinsics.c
> b/lib/eal/x86/rte_power_intrinsics.c
> index 39ea9fdecd..7f0588d70e 100644
> --- a/lib/eal/x86/rte_power_intrinsics.c
> +++ b/lib/eal/x86/rte_power_intrinsics.c
> @@ -116,8 +116,8 @@ rte_power_monitor(const struct
> rte_power_monitor_cond *pmc,
>  				pmc->addr, pmc->size);
>  		const uint64_t masked = cur_value & pmc->mask;
> 
> -		/* if the masked value is already matching, abort */
> -		if (masked == pmc->val)
> +		/* if the masked value is not matching, abort */
> +		if (masked != pmc->val)
>  			goto end;
>  	}
> 
> --
> 2.25.1

Acked-by: Timothy McDaniel  <timothy.mcdaniel@intel.com>
  
Alexander Kozyrev May 12, 2021, 5:57 p.m. UTC | #3
On 11-May-2021 11:39 AM, Anatoly Burakov wrote:
> On 11-May-21 4:31 PM, Anatoly Burakov wrote:
> > Previously, the semantics of power monitor were such that we were
> > checking current value against the expected value, and if they matched,
> > then the sleep was aborted. This is somewhat inflexible, because it only
> > allowed us to check for a specific value.
> >
> > We can reverse the check, and instead have monitor sleep to be aborted
> > if the expected value *doesn't* match what's in memory. This allows us
> > to both implement all currently implemented driver code, as well as
> > support more use cases which don't easily map to previous semantics
> > (such as waiting on writes to AF_XDP counter value).
> >
> > This commit also adjusts all current driver implementations to match the
> > new semantics.
> >
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> 
> <snip>
> 
> > diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> > index 6cd71a44eb..3cbbe5bf59 100644
> > --- a/drivers/net/mlx5/mlx5_rx.c
> > +++ b/drivers/net/mlx5/mlx5_rx.c
> > @@ -282,7 +282,7 @@ int mlx5_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
> >   		return -rte_errno;
> >   	}
> >   	pmc->addr = &cqe->op_own;
> > -	pmc->val =  !!idx;
> > +	pmc->val =  !idx;
> >   	pmc->mask = MLX5_CQE_OWNER_MASK;
> >   	pmc->size = sizeof(uint8_t);
> >   	return 0;
> 
> Both previous and current code seem suspicious to me, as no attention is
> paid to endianness of the code. I would appreciate if mlx5 maintainers
> chimed in here :)

It is hard to mess up with endianness when you only have 1 byte :)
  
Marvin Liu May 25, 2021, 9:15 a.m. UTC | #4
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Anatoly Burakov
> Sent: Tuesday, May 11, 2021 11:32 PM
> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Viacheslav
> Ovsiienko <viacheslavo@nvidia.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
> 
> Previously, the semantics of power monitor were such that we were
> checking current value against the expected value, and if they matched,
> then the sleep was aborted. This is somewhat inflexible, because it only
> allowed us to check for a specific value.
> 
> We can reverse the check, and instead have monitor sleep to be aborted
> if the expected value *doesn't* match what's in memory. This allows us
> to both implement all currently implemented driver code, as well as
> support more use cases which don't easily map to previous semantics
> (such as waiting on writes to AF_XDP counter value).
> 

Hi Anatoly,
In virtio spec, packed formatted descriptor utilizes two bits for representing the status. One bit for available status, one bit for used status. 
For checking the status more precisely, it is need to check value against the expected value. 
The monitor function in virtio datapath still can work with new semantics, but it may lead to some useless io call. 
Base on that, I'd like to keep previous semantics.

Regards,
Marvin

> This commit also adjusts all current driver implementations to match the
> new semantics.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  drivers/event/dlb2/dlb2.c                      | 2 +-
>  drivers/net/i40e/i40e_rxtx.c                   | 2 +-
>  drivers/net/iavf/iavf_rxtx.c                   | 2 +-
>  drivers/net/ice/ice_rxtx.c                     | 2 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c                 | 2 +-
>  drivers/net/mlx5/mlx5_rx.c                     | 2 +-
>  lib/eal/include/generic/rte_power_intrinsics.h | 8 ++++----
>  lib/eal/x86/rte_power_intrinsics.c             | 4 ++--
>  8 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index 3570678b9e..5701bbb8ab 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -3188,7 +3188,7 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>  			&cq_base[qm_port->cq_idx];
>  		monitor_addr++; /* cq_gen bit is in second 64bit location */
> 
> -		if (qm_port->gen_bit)
> +		if (!qm_port->gen_bit)
>  			expected_value = qe_mask.raw_qe[1];
>  		else
>  			expected_value = 0;
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 02cf5e787c..4617ae914a 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -88,7 +88,7 @@ i40e_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_64(1 <<
> I40E_RX_DESC_STATUS_DD_SHIFT);
> 
>  	/* registers are 64-bit */
> diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
> index 87f7eebc65..d8d9cc860c 100644
> --- a/drivers/net/iavf/iavf_rxtx.c
> +++ b/drivers/net/iavf/iavf_rxtx.c
> @@ -73,7 +73,7 @@ iavf_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> 
>  	/* registers are 64-bit */
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index 92fbbc18da..4e349bfa3f 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -43,7 +43,7 @@ ice_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> 
>  	/* register is 16-bit */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index d69f36e977..2793718171 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1385,7 +1385,7 @@ ixgbe_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	 * we expect the DD bit to be set to 1 if this descriptor was already
>  	 * written to.
>  	 */
> -	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +	pmc->val = 0;
>  	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> 
>  	/* the registers are 32-bit */
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 6cd71a44eb..3cbbe5bf59 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -282,7 +282,7 @@ int mlx5_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  		return -rte_errno;
>  	}
>  	pmc->addr = &cqe->op_own;
> -	pmc->val =  !!idx;
> +	pmc->val =  !idx;
>  	pmc->mask = MLX5_CQE_OWNER_MASK;
>  	pmc->size = sizeof(uint8_t);
>  	return 0;
> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h
> b/lib/eal/include/generic/rte_power_intrinsics.h
> index dddca3d41c..28c481a8d2 100644
> --- a/lib/eal/include/generic/rte_power_intrinsics.h
> +++ b/lib/eal/include/generic/rte_power_intrinsics.h
> @@ -45,10 +45,10 @@ struct rte_power_monitor_cond {
>   * Additionally, an expected value (`pmc->val`), mask (`pmc->mask`), and
> data
>   * size (`pmc->size`) are provided in the `pmc` power monitoring condition. If
>   * the mask is non-zero, the current value pointed to by the `pmc->addr`
> pointer
> - * will be read and compared against the expected value, and if they match,
> the
> - * entering of optimized power state will be aborted. This is intended to
> - * prevent the CPU from entering optimized power state and waiting on a
> write
> - * that has already happened by the time this API is called.
> + * will be read and compared against the expected value, and if they do not
> + * match, the entering of optimized power state will be aborted. This is
> + * intended to prevent the CPU from entering optimized power state and
> waiting
> + * on a write that has already happened by the time this API is called.
>   *
>   * @warning It is responsibility of the user to check if this function is
>   *   supported at runtime using `rte_cpu_get_intrinsics_support()` API call.
> diff --git a/lib/eal/x86/rte_power_intrinsics.c
> b/lib/eal/x86/rte_power_intrinsics.c
> index 39ea9fdecd..7f0588d70e 100644
> --- a/lib/eal/x86/rte_power_intrinsics.c
> +++ b/lib/eal/x86/rte_power_intrinsics.c
> @@ -116,8 +116,8 @@ rte_power_monitor(const struct
> rte_power_monitor_cond *pmc,
>  				pmc->addr, pmc->size);
>  		const uint64_t masked = cur_value & pmc->mask;
> 
> -		/* if the masked value is already matching, abort */
> -		if (masked == pmc->val)
> +		/* if the masked value is not matching, abort */
> +		if (masked != pmc->val)
>  			goto end;
>  	}
> 
> --
> 2.25.1
  
Burakov, Anatoly May 27, 2021, 1:06 p.m. UTC | #5
On 25-May-21 10:15 AM, Liu, Yong wrote:
> 
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Anatoly Burakov
>> Sent: Tuesday, May 11, 2021 11:32 PM
>> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Xing,
>> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
>> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> Wang, Haiyue <haiyue.wang@intel.com>; Matan Azrad
>> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Viacheslav
>> Ovsiienko <viacheslavo@nvidia.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev@intel.com>
>> Cc: Loftus, Ciara <ciara.loftus@intel.com>
>> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
>>
>> Previously, the semantics of power monitor were such that we were
>> checking current value against the expected value, and if they matched,
>> then the sleep was aborted. This is somewhat inflexible, because it only
>> allowed us to check for a specific value.
>>
>> We can reverse the check, and instead have monitor sleep to be aborted
>> if the expected value *doesn't* match what's in memory. This allows us
>> to both implement all currently implemented driver code, as well as
>> support more use cases which don't easily map to previous semantics
>> (such as waiting on writes to AF_XDP counter value).
>>
> 
> Hi Anatoly,
> In virtio spec, packed formatted descriptor utilizes two bits for representing the status. One bit for available status, one bit for used status.
> For checking the status more precisely, it is need to check value against the expected value.
> The monitor function in virtio datapath still can work with new semantics, but it may lead to some useless io call.
> Base on that, I'd like to keep previous semantics.
> 
> Regards,
> Marvin
> 

Thanks for your feedback! Would making this an option make things 
better? Because we need the inverted semantics for AF_XDP, it can't work 
without it. So, we either invert all of them, or we have an option to do 
regular or inverted check on a per-condition basis. Would that work?
  
Marvin Liu May 28, 2021, 1:07 a.m. UTC | #6
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Thursday, May 27, 2021 9:07 PM
> To: Liu, Yong <yong.liu@intel.com>; dev@dpdk.org; McDaniel, Timothy
> <timothy.mcdaniel@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: Re: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor
> check
> 
> On 25-May-21 10:15 AM, Liu, Yong wrote:
> >
> >
> >> -----Original Message-----
> >> From: dev <dev-bounces@dpdk.org> On Behalf Of Anatoly Burakov
> >> Sent: Tuesday, May 11, 2021 11:32 PM
> >> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>;
> Xing,
> >> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
> >> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> Wang, Haiyue <haiyue.wang@intel.com>; Matan Azrad
> >> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Viacheslav
> >> Ovsiienko <viacheslavo@nvidia.com>; Richardson, Bruce
> >> <bruce.richardson@intel.com>; Ananyev, Konstantin
> >> <konstantin.ananyev@intel.com>
> >> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> >> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
> >>
> >> Previously, the semantics of power monitor were such that we were
> >> checking current value against the expected value, and if they matched,
> >> then the sleep was aborted. This is somewhat inflexible, because it only
> >> allowed us to check for a specific value.
> >>
> >> We can reverse the check, and instead have monitor sleep to be aborted
> >> if the expected value *doesn't* match what's in memory. This allows us
> >> to both implement all currently implemented driver code, as well as
> >> support more use cases which don't easily map to previous semantics
> >> (such as waiting on writes to AF_XDP counter value).
> >>
> >
> > Hi Anatoly,
> > In virtio spec, packed formatted descriptor utilizes two bits for representing
> the status. One bit for available status, one bit for used status.
> > For checking the status more precisely, it is need to check value against the
> expected value.
> > The monitor function in virtio datapath still can work with new semantics,
> but it may lead to some useless io call.
> > Base on that, I'd like to keep previous semantics.
> >
> > Regards,
> > Marvin
> >
> 
> Thanks for your feedback! Would making this an option make things
> better? Because we need the inverted semantics for AF_XDP, it can't work
> without it. So, we either invert all of them, or we have an option to do
> regular or inverted check on a per-condition basis. Would that work?
> 

That will be great if we can select the check type based on input parameter.
Just in virtio datapath, we need both inverted and original semantics for different ring formats. 

Regards,
Marvin

> 
> --
> Thanks,
> Anatoly
  
Ananyev, Konstantin May 28, 2021, 9:09 a.m. UTC | #7
> >
> > On 25-May-21 10:15 AM, Liu, Yong wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: dev <dev-bounces@dpdk.org> On Behalf Of Anatoly Burakov
> > >> Sent: Tuesday, May 11, 2021 11:32 PM
> > >> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>;
> > Xing,
> > >> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
> > >> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > >> Wang, Haiyue <haiyue.wang@intel.com>; Matan Azrad
> > >> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Viacheslav
> > >> Ovsiienko <viacheslavo@nvidia.com>; Richardson, Bruce
> > >> <bruce.richardson@intel.com>; Ananyev, Konstantin
> > >> <konstantin.ananyev@intel.com>
> > >> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> > >> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
> > >>
> > >> Previously, the semantics of power monitor were such that we were
> > >> checking current value against the expected value, and if they matched,
> > >> then the sleep was aborted. This is somewhat inflexible, because it only
> > >> allowed us to check for a specific value.
> > >>
> > >> We can reverse the check, and instead have monitor sleep to be aborted
> > >> if the expected value *doesn't* match what's in memory. This allows us
> > >> to both implement all currently implemented driver code, as well as
> > >> support more use cases which don't easily map to previous semantics
> > >> (such as waiting on writes to AF_XDP counter value).
> > >>
> > >
> > > Hi Anatoly,
> > > In virtio spec, packed formatted descriptor utilizes two bits for representing
> > the status. One bit for available status, one bit for used status.
> > > For checking the status more precisely, it is need to check value against the
> > expected value.
> > > The monitor function in virtio datapath still can work with new semantics,
> > but it may lead to some useless io call.
> > > Base on that, I'd like to keep previous semantics.
> > >
> > > Regards,
> > > Marvin
> > >
> >
> > Thanks for your feedback! Would making this an option make things
> > better? Because we need the inverted semantics for AF_XDP, it can't work
> > without it. So, we either invert all of them, or we have an option to do
> > regular or inverted check on a per-condition basis. Would that work?
> >
> 
> That will be great if we can select the check type based on input parameter.
> Just in virtio datapath, we need both inverted and original semantics for different ring formats.
> 

Should we probably the consider introducing _check_ callback to be provided by PMD?
So we can leave these various check details inside PMD itself. 
And monitor will just read the specified address and call the callback.
Konstantin
  
Burakov, Anatoly June 1, 2021, 2:21 p.m. UTC | #8
On 28-May-21 10:09 AM, Ananyev, Konstantin wrote:
> 
>>>
>>> On 25-May-21 10:15 AM, Liu, Yong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: dev <dev-bounces@dpdk.org> On Behalf Of Anatoly Burakov
>>>>> Sent: Tuesday, May 11, 2021 11:32 PM
>>>>> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>;
>>> Xing,
>>>>> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
>>>>> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>>>>> Wang, Haiyue <haiyue.wang@intel.com>; Matan Azrad
>>>>> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Viacheslav
>>>>> Ovsiienko <viacheslavo@nvidia.com>; Richardson, Bruce
>>>>> <bruce.richardson@intel.com>; Ananyev, Konstantin
>>>>> <konstantin.ananyev@intel.com>
>>>>> Cc: Loftus, Ciara <ciara.loftus@intel.com>
>>>>> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
>>>>>
>>>>> Previously, the semantics of power monitor were such that we were
>>>>> checking current value against the expected value, and if they matched,
>>>>> then the sleep was aborted. This is somewhat inflexible, because it only
>>>>> allowed us to check for a specific value.
>>>>>
>>>>> We can reverse the check, and instead have monitor sleep to be aborted
>>>>> if the expected value *doesn't* match what's in memory. This allows us
>>>>> to both implement all currently implemented driver code, as well as
>>>>> support more use cases which don't easily map to previous semantics
>>>>> (such as waiting on writes to AF_XDP counter value).
>>>>>
>>>>
>>>> Hi Anatoly,
>>>> In virtio spec, packed formatted descriptor utilizes two bits for representing
>>> the status. One bit for available status, one bit for used status.
>>>> For checking the status more precisely, it is need to check value against the
>>> expected value.
>>>> The monitor function in virtio datapath still can work with new semantics,
>>> but it may lead to some useless io call.
>>>> Base on that, I'd like to keep previous semantics.
>>>>
>>>> Regards,
>>>> Marvin
>>>>
>>>
>>> Thanks for your feedback! Would making this an option make things
>>> better? Because we need the inverted semantics for AF_XDP, it can't work
>>> without it. So, we either invert all of them, or we have an option to do
>>> regular or inverted check on a per-condition basis. Would that work?
>>>
>>
>> That will be great if we can select the check type based on input parameter.
>> Just in virtio datapath, we need both inverted and original semantics for different ring formats.
>>
> 
> Should we probably the consider introducing _check_ callback to be provided by PMD?
> So we can leave these various check details inside PMD itself.
> And monitor will just read the specified address and call the callback.
> Konstantin
> 

Getting monitor condition *is* "the check" IMO. I think adding an option 
to the comparison should cover pretty much all worthwhile use cases 
without overcomplicating things. In any case, patches already sent [1] :)

[1] http://patches.dpdk.org/project/dpdk/list/?series=17191
  

Patch

diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index 3570678b9e..5701bbb8ab 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3188,7 +3188,7 @@  dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			&cq_base[qm_port->cq_idx];
 		monitor_addr++; /* cq_gen bit is in second 64bit location */
 
-		if (qm_port->gen_bit)
+		if (!qm_port->gen_bit)
 			expected_value = qe_mask.raw_qe[1];
 		else
 			expected_value = 0;
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 02cf5e787c..4617ae914a 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -88,7 +88,7 @@  i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	 * we expect the DD bit to be set to 1 if this descriptor was already
 	 * written to.
 	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	pmc->val = 0;
 	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
 
 	/* registers are 64-bit */
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 87f7eebc65..d8d9cc860c 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -73,7 +73,7 @@  iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	 * we expect the DD bit to be set to 1 if this descriptor was already
 	 * written to.
 	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	pmc->val = 0;
 	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
 
 	/* registers are 64-bit */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 92fbbc18da..4e349bfa3f 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -43,7 +43,7 @@  ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	 * we expect the DD bit to be set to 1 if this descriptor was already
 	 * written to.
 	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	pmc->val = 0;
 	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
 
 	/* register is 16-bit */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..2793718171 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1385,7 +1385,7 @@  ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	 * we expect the DD bit to be set to 1 if this descriptor was already
 	 * written to.
 	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	pmc->val = 0;
 	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
 
 	/* the registers are 32-bit */
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 6cd71a44eb..3cbbe5bf59 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -282,7 +282,7 @@  int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
+	pmc->val =  !idx;
 	pmc->mask = MLX5_CQE_OWNER_MASK;
 	pmc->size = sizeof(uint8_t);
 	return 0;
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..28c481a8d2 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -45,10 +45,10 @@  struct rte_power_monitor_cond {
  * Additionally, an expected value (`pmc->val`), mask (`pmc->mask`), and data
  * size (`pmc->size`) are provided in the `pmc` power monitoring condition. If
  * the mask is non-zero, the current value pointed to by the `pmc->addr` pointer
- * will be read and compared against the expected value, and if they match, the
- * entering of optimized power state will be aborted. This is intended to
- * prevent the CPU from entering optimized power state and waiting on a write
- * that has already happened by the time this API is called.
+ * will be read and compared against the expected value, and if they do not
+ * match, the entering of optimized power state will be aborted. This is
+ * intended to prevent the CPU from entering optimized power state and waiting
+ * on a write that has already happened by the time this API is called.
  *
  * @warning It is responsibility of the user to check if this function is
  *   supported at runtime using `rte_cpu_get_intrinsics_support()` API call.
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..7f0588d70e 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -116,8 +116,8 @@  rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 				pmc->addr, pmc->size);
 		const uint64_t masked = cur_value & pmc->mask;
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
+		/* if the masked value is not matching, abort */
+		if (masked != pmc->val)
 			goto end;
 	}