[v2,1/2] service: add API to retrieve service core active

Message ID 20200720143829.46280-1-harry.van.haaren@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series [v2,1/2] service: add API to retrieve service core active |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS

Commit Message

Van Haaren, Harry July 20, 2020, 2:38 p.m. UTC
  This commit adds a new experimental API which allows the user
to retrieve the active state of an lcore. Knowing when the service
lcore is completed its polling loop can be useful to applications
to avoid race conditions when e.g. finalizing statistics.

The service thread itself now has a variable to indicate if its
thread is active. When zero the service thread has completed its
service, and has returned from the service_runner_func() function.

Suggested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---

Thanks for feedback Lukasz, please have a look and see if this was
what you were expecting?

Honnappa/Phil, are the __atomic_load/store's correct?

---
 lib/librte_eal/common/rte_service.c  | 14 ++++++++++++++
 lib/librte_eal/include/rte_service.h |  9 +++++++++
 lib/librte_eal/rte_eal_version.map   |  1 +
 3 files changed, 24 insertions(+)
  

Comments

Lukasz Wojciechowski July 20, 2020, 5:45 p.m. UTC | #1
W dniu 20.07.2020 o 16:38, Harry van Haaren pisze:
> This commit adds a new experimental API which allows the user
> to retrieve the active state of an lcore. Knowing when the service
> lcore is completed its polling loop can be useful to applications
> to avoid race conditions when e.g. finalizing statistics.
>
> The service thread itself now has a variable to indicate if its
> thread is active. When zero the service thread has completed its
> service, and has returned from the service_runner_func() function.
>
> Suggested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
> ---
>
> Thanks for feedback Lukasz, please have a look and see if this was
> what you were expecting?
>
> Honnappa/Phil, are the __atomic_load/store's correct?
>
> ---
>   lib/librte_eal/common/rte_service.c  | 14 ++++++++++++++
>   lib/librte_eal/include/rte_service.h |  9 +++++++++
>   lib/librte_eal/rte_eal_version.map   |  1 +
>   3 files changed, 24 insertions(+)
>
> diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> index 6a0e0ff65..4c276b006 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -65,6 +65,7 @@ struct core_state {
>   	/* map of services IDs are run on this core */
>   	uint64_t service_mask;
>   	uint8_t runstate; /* running or stopped */
> +	uint8_t thread_active; /* indicates when thread is in service_run() */
>   	uint8_t is_service_core; /* set if core is currently a service core */
>   	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
>   	uint64_t loops;
> @@ -457,6 +458,8 @@ service_runner_func(void *arg)
>   	const int lcore = rte_lcore_id();
>   	struct core_state *cs = &lcore_states[lcore];
>   
> +	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> +
>   	/* runstate act as the guard variable. Use load-acquire
>   	 * memory order here to synchronize with store-release
>   	 * in runstate update functions.
> @@ -475,9 +478,20 @@ service_runner_func(void *arg)
>   		cs->loops++;
>   	}
>   
> +	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
>   	return 0;
>   }
>   
> +int32_t
> +rte_service_lcore_active(uint32_t lcore)
> +{
> +	if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
> +		return -EINVAL;
> +
> +	return __atomic_load_n(&lcore_states[lcore].thread_active,
> +			       __ATOMIC_RELAXED);
> +}
> +
>   int32_t
>   rte_service_lcore_count(void)
>   {
> diff --git a/lib/librte_eal/include/rte_service.h b/lib/librte_eal/include/rte_service.h
> index e2d0a6dd3..f7134b5c0 100644
> --- a/lib/librte_eal/include/rte_service.h
> +++ b/lib/librte_eal/include/rte_service.h
> @@ -261,6 +261,15 @@ int32_t rte_service_lcore_start(uint32_t lcore_id);
>    */
>   int32_t rte_service_lcore_stop(uint32_t lcore_id);
>   
> +/**
> + * Reports if a service lcore is currently running.
> + * @retval 0 Service thread is not active, and has been returned to EAL.
> + * @retval 1 Service thread is in the service core polling loop.
> + * @retval -EINVAL Invalid *lcore_id* provided.
> + */
> +__rte_experimental
> +int32_t rte_service_lcore_active(uint32_t lcore_id);
> +
>   /**
>    * Adds lcore to the list of service cores.
>    *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index bf0c17c23..d53d5d5b9 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -401,6 +401,7 @@ EXPERIMENTAL {
>   	rte_lcore_dump;
>   	rte_lcore_iterate;
>   	rte_mp_disable;
> +	rte_service_lcore_active;
>   	rte_thread_register;
>   	rte_thread_unregister;
>   };
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
  
Phil Yang July 21, 2020, 7:47 a.m. UTC | #2
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Harry van Haaren
> Sent: Monday, July 20, 2020 10:38 PM
> To: dev@dpdk.org
> Cc: david.marchand@redhat.com; igor.romanov@oktetlabs.ru; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; ferruh.yigit@intel.com; nd
> <nd@arm.com>; aconole@redhat.com;
> l.wojciechow@partner.samsung.com; Harry van Haaren
> <harry.van.haaren@intel.com>
> Subject: [dpdk-dev] [PATCH v2 1/2] service: add API to retrieve service core
> active
> 
> This commit adds a new experimental API which allows the user
> to retrieve the active state of an lcore. Knowing when the service
> lcore is completed its polling loop can be useful to applications
> to avoid race conditions when e.g. finalizing statistics.
> 
> The service thread itself now has a variable to indicate if its
> thread is active. When zero the service thread has completed its
> service, and has returned from the service_runner_func() function.
> 
> Suggested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> 

It looks good to me.

Reviewed-by: Phil Yang <phil.yang@arm.com>

Thanks,
Phil
<...>
  
Honnappa Nagarahalli July 21, 2020, 7:43 p.m. UTC | #3
<snip>

> Subject: [PATCH v2 1/2] service: add API to retrieve service core active
> 
> This commit adds a new experimental API which allows the user to retrieve
> the active state of an lcore. Knowing when the service lcore is completed its
> polling loop can be useful to applications to avoid race conditions when e.g.
> finalizing statistics.
> 
> The service thread itself now has a variable to indicate if its thread is active.
> When zero the service thread has completed its service, and has returned
> from the service_runner_func() function.
> 
> Suggested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> 
> ---
> 
> Thanks for feedback Lukasz, please have a look and see if this was what you
> were expecting?
> 
> Honnappa/Phil, are the __atomic_load/store's correct?
> 
> ---
>  lib/librte_eal/common/rte_service.c  | 14 ++++++++++++++
> lib/librte_eal/include/rte_service.h |  9 +++++++++
>  lib/librte_eal/rte_eal_version.map   |  1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/lib/librte_eal/common/rte_service.c
> b/lib/librte_eal/common/rte_service.c
> index 6a0e0ff65..4c276b006 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -65,6 +65,7 @@ struct core_state {
>  	/* map of services IDs are run on this core */
>  	uint64_t service_mask;
>  	uint8_t runstate; /* running or stopped */
> +	uint8_t thread_active; /* indicates when thread is in service_run() */
Nit, if possible would be good to describe the difference between 'runstate' and 'thread_active'.

>  	uint8_t is_service_core; /* set if core is currently a service core */
>  	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
>  	uint64_t loops;
> @@ -457,6 +458,8 @@ service_runner_func(void *arg)
>  	const int lcore = rte_lcore_id();
>  	struct core_state *cs = &lcore_states[lcore];
> 
> +	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
Essentially, we have to ensure that all the memory operations are contained within both stores (this one and the one below) to 'thread_active'.
We should use __ATOMIC_SEQ_CST, which will avoid any memory operations from getting hoisted above this line.
Performance should not be an issue as it will get executed only when the service core is started.
It would be good to add comment reasoning the memory order used.

Also, what happens if the user calls 'rte_service_lcore_active' just before the above statement is executed? It might not be a valid use case, but it is good to document the race conditions and correct sequence of API calls.

> +
>  	/* runstate act as the guard variable. Use load-acquire
>  	 * memory order here to synchronize with store-release
>  	 * in runstate update functions.
> @@ -475,9 +478,20 @@ service_runner_func(void *arg)
>  		cs->loops++;
>  	}
> 
> +	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
__ATOMIC_RELEASE would be enough. But, __ATOMIC_SEQ_CST should not cause any performance issues.

>  	return 0;
>  }
> 
> +int32_t
> +rte_service_lcore_active(uint32_t lcore) {
> +	if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
> +		return -EINVAL;
> +
> +	return __atomic_load_n(&lcore_states[lcore].thread_active,
> +			       __ATOMIC_RELAXED);
> +}
> +
>  int32_t
>  rte_service_lcore_count(void)
>  {
> diff --git a/lib/librte_eal/include/rte_service.h
> b/lib/librte_eal/include/rte_service.h
> index e2d0a6dd3..f7134b5c0 100644
> --- a/lib/librte_eal/include/rte_service.h
> +++ b/lib/librte_eal/include/rte_service.h
> @@ -261,6 +261,15 @@ int32_t rte_service_lcore_start(uint32_t lcore_id);
>   */
>  int32_t rte_service_lcore_stop(uint32_t lcore_id);
> 
> +/**
> + * Reports if a service lcore is currently running.
> + * @retval 0 Service thread is not active, and has been returned to EAL.
> + * @retval 1 Service thread is in the service core polling loop.
> + * @retval -EINVAL Invalid *lcore_id* provided.
> + */
> +__rte_experimental
> +int32_t rte_service_lcore_active(uint32_t lcore_id);
Would 'rte_service_lcore_may_be_active' better? It would be inline with 'rte_service_may_be_active'?

I think we need additional documentation for 'rte_service_lcore_stop' to indicate that the caller should not assume that the service thread on the lcore has stopped running and should call the above API to check.

> +
>  /**
>   * Adds lcore to the list of service cores.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index bf0c17c23..d53d5d5b9 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -401,6 +401,7 @@ EXPERIMENTAL {
>  	rte_lcore_dump;
>  	rte_lcore_iterate;
>  	rte_mp_disable;
> +	rte_service_lcore_active;
>  	rte_thread_register;
>  	rte_thread_unregister;
>  };
> --
> 2.17.1
  
David Marchand July 21, 2020, 7:50 p.m. UTC | #4
On Tue, Jul 21, 2020 at 9:43 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > @@ -457,6 +458,8 @@ service_runner_func(void *arg)
> >       const int lcore = rte_lcore_id();
> >       struct core_state *cs = &lcore_states[lcore];
> >
> > +     __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> Essentially, we have to ensure that all the memory operations are contained within both stores (this one and the one below) to 'thread_active'.
> We should use __ATOMIC_SEQ_CST, which will avoid any memory operations from getting hoisted above this line.
> Performance should not be an issue as it will get executed only when the service core is started.
> It would be good to add comment reasoning the memory order used.
>
> Also, what happens if the user calls 'rte_service_lcore_active' just before the above statement is executed? It might not be a valid use case, but it is good to document the race conditions and correct sequence of API calls.
>
> > +
> >       /* runstate act as the guard variable. Use load-acquire
> >        * memory order here to synchronize with store-release
> >        * in runstate update functions.
> > @@ -475,9 +478,20 @@ service_runner_func(void *arg)
> >               cs->loops++;
> >       }
> >
> > +     __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
> __ATOMIC_RELEASE would be enough. But, __ATOMIC_SEQ_CST should not cause any performance issues.

But then are we missing a ACQUIRE barrier in rte_service_lcore_active?


>
> >       return 0;
> >  }
> >
> > +int32_t
> > +rte_service_lcore_active(uint32_t lcore) {
> > +     if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
> > +             return -EINVAL;
> > +
> > +     return __atomic_load_n(&lcore_states[lcore].thread_active,
> > +                            __ATOMIC_RELAXED);
> > +}
> > +
> >  int32_t
> >  rte_service_lcore_count(void)
> >  {
> > diff --git a/lib/librte_eal/include/rte_service.h
> > b/lib/librte_eal/include/rte_service.h
> > index e2d0a6dd3..f7134b5c0 100644
> > --- a/lib/librte_eal/include/rte_service.h
> > +++ b/lib/librte_eal/include/rte_service.h
> > @@ -261,6 +261,15 @@ int32_t rte_service_lcore_start(uint32_t lcore_id);
> >   */
> >  int32_t rte_service_lcore_stop(uint32_t lcore_id);
> >
> > +/**
> > + * Reports if a service lcore is currently running.
> > + * @retval 0 Service thread is not active, and has been returned to EAL.
> > + * @retval 1 Service thread is in the service core polling loop.
> > + * @retval -EINVAL Invalid *lcore_id* provided.
> > + */
> > +__rte_experimental
> > +int32_t rte_service_lcore_active(uint32_t lcore_id);
> Would 'rte_service_lcore_may_be_active' better? It would be inline with 'rte_service_may_be_active'?
>
> I think we need additional documentation for 'rte_service_lcore_stop' to indicate that the caller should not assume that the service thread on the lcore has stopped running and should call the above API to check.

+1
Additional documentation can't hurt.
  
Honnappa Nagarahalli July 21, 2020, 8:23 p.m. UTC | #5
> Subject: Re: [PATCH v2 1/2] service: add API to retrieve service core active
> 
> On Tue, Jul 21, 2020 at 9:43 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > @@ -457,6 +458,8 @@ service_runner_func(void *arg)
> > >       const int lcore = rte_lcore_id();
> > >       struct core_state *cs = &lcore_states[lcore];
> > >
> > > +     __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> > Essentially, we have to ensure that all the memory operations are contained
> within both stores (this one and the one below) to 'thread_active'.
> > We should use __ATOMIC_SEQ_CST, which will avoid any memory
> operations from getting hoisted above this line.
> > Performance should not be an issue as it will get executed only when the
> service core is started.
> > It would be good to add comment reasoning the memory order used.
> >
> > Also, what happens if the user calls 'rte_service_lcore_active' just before the
> above statement is executed? It might not be a valid use case, but it is good
> to document the race conditions and correct sequence of API calls.
> >
> > > +
> > >       /* runstate act as the guard variable. Use load-acquire
> > >        * memory order here to synchronize with store-release
> > >        * in runstate update functions.
> > > @@ -475,9 +478,20 @@ service_runner_func(void *arg)
> > >               cs->loops++;
> > >       }
> > >
> > > +     __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
> > __ATOMIC_RELEASE would be enough. But, __ATOMIC_SEQ_CST should not
> cause any performance issues.
> 
> But then are we missing a ACQUIRE barrier in rte_service_lcore_active?
+1 (see below)

> 
> 
> >
> > >       return 0;
> > >  }
> > >
> > > +int32_t
> > > +rte_service_lcore_active(uint32_t lcore) {
> > > +     if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
> > > +             return -EINVAL;
> > > +
> > > +     return __atomic_load_n(&lcore_states[lcore].thread_active,
> > > +                            __ATOMIC_RELAXED); }
Agree with David. ACQUIRE is the safer order to ensure memory operations are not hoisted in cases like:

if (rte_service_lcore_active(lcore) == 0) {
	<do something>; /* Do not allow the memory operations to hoist above 'if' statement */
}

> > > +
> > >  int32_t
> > >  rte_service_lcore_count(void)
> > >  {
> > > diff --git a/lib/librte_eal/include/rte_service.h
> > > b/lib/librte_eal/include/rte_service.h
> > > index e2d0a6dd3..f7134b5c0 100644
> > > --- a/lib/librte_eal/include/rte_service.h
> > > +++ b/lib/librte_eal/include/rte_service.h
> > > @@ -261,6 +261,15 @@ int32_t rte_service_lcore_start(uint32_t lcore_id);
> > >   */
> > >  int32_t rte_service_lcore_stop(uint32_t lcore_id);
> > >
> > > +/**
> > > + * Reports if a service lcore is currently running.
> > > + * @retval 0 Service thread is not active, and has been returned to EAL.
> > > + * @retval 1 Service thread is in the service core polling loop.
> > > + * @retval -EINVAL Invalid *lcore_id* provided.
> > > + */
> > > +__rte_experimental
> > > +int32_t rte_service_lcore_active(uint32_t lcore_id);
> > Would 'rte_service_lcore_may_be_active' better? It would be inline with
> 'rte_service_may_be_active'?
> >
> > I think we need additional documentation for 'rte_service_lcore_stop' to
> indicate that the caller should not assume that the service thread on the lcore
> has stopped running and should call the above API to check.
> 
> +1
> Additional documentation can't hurt.
> 
> 
> --
> David Marchand
  
Van Haaren, Harry July 22, 2020, 10:14 a.m. UTC | #6
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Tuesday, July 21, 2020 9:24 PM
> To: David Marchand <david.marchand@redhat.com>
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org;
> igor.romanov@oktetlabs.ru; Yigit, Ferruh <ferruh.yigit@intel.com>; nd
> <nd@arm.com>; aconole@redhat.com; l.wojciechow@partner.samsung.com; Phil
> Yang <Phil.Yang@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v2 1/2] service: add API to retrieve service core active
> 
> > Subject: Re: [PATCH v2 1/2] service: add API to retrieve service core active
> >
> > On Tue, Jul 21, 2020 at 9:43 PM Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com> wrote:
> > > > @@ -457,6 +458,8 @@ service_runner_func(void *arg)
> > > >       const int lcore = rte_lcore_id();
> > > >       struct core_state *cs = &lcore_states[lcore];
> > > >
> > > > +     __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> > > Essentially, we have to ensure that all the memory operations are contained
> > within both stores (this one and the one below) to 'thread_active'.
> > > We should use __ATOMIC_SEQ_CST, which will avoid any memory
> > operations from getting hoisted above this line.
> > > Performance should not be an issue as it will get executed only when the
> > service core is started.
> > > It would be good to add comment reasoning the memory order used.

OK, will update to SEQ_CST in v3, and add comment.

> > > Also, what happens if the user calls 'rte_service_lcore_active' just before the
> > above statement is executed? It might not be a valid use case, but it is good
> > to document the race conditions and correct sequence of API calls.
> > >
> > > > +
> > > >       /* runstate act as the guard variable. Use load-acquire
> > > >        * memory order here to synchronize with store-release
> > > >        * in runstate update functions.
> > > > @@ -475,9 +478,20 @@ service_runner_func(void *arg)
> > > >               cs->loops++;
> > > >       }
> > > >
> > > > +     __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
> > > __ATOMIC_RELEASE would be enough. But, __ATOMIC_SEQ_CST should not
> > cause any performance issues.
> >
> > But then are we missing a ACQUIRE barrier in rte_service_lcore_active?
> +1 (see below)

OK, will update to SEQ_CST in v3, with comment.


> > > > +int32_t
> > > > +rte_service_lcore_active(uint32_t lcore) {
> > > > +     if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
> > > > +             return -EINVAL;
> > > > +
> > > > +     return __atomic_load_n(&lcore_states[lcore].thread_active,
> > > > +                            __ATOMIC_RELAXED); }
> Agree with David. ACQUIRE is the safer order to ensure memory operations are not
> hoisted in cases like:
> 
> if (rte_service_lcore_active(lcore) == 0) {
> 	<do something>; /* Do not allow the memory operations to hoist above 'if'
> statement */
> }

OK, will change to ACQUIRE in v3.

<snip>

> > > > +/**
> > > > + * Reports if a service lcore is currently running.
> > > > + * @retval 0 Service thread is not active, and has been returned to EAL.
> > > > + * @retval 1 Service thread is in the service core polling loop.
> > > > + * @retval -EINVAL Invalid *lcore_id* provided.
> > > > + */
> > > > +__rte_experimental
> > > > +int32_t rte_service_lcore_active(uint32_t lcore_id);
> > > Would 'rte_service_lcore_may_be_active' better? It would be inline with
> > 'rte_service_may_be_active'?

I think the implementation behind the API is different, so I think _may_be_ is not appropriate for service_lcore_active, keeping same function name for v3.

rte_service_lcore_active() checks at a particular point in the calling thread if another thread is active *at that time*. It is either active or not. This is defined, it is deterministic in that the result is either yes or no, and there is no ambiguity at any given check. You're right the value can change *just* after the check - but at the time of the check the answer was deterministic.

rte_service_may_be_active() checks if a service *could* be run by a service core. It is not deterministic. A service lcore only sets a service as "active on lcore" (or not active) when it polls it - this opens a window of nondeterministic result. When a runstate is set to off, there is a window of "unknown" before we know certainly that the service is not run on a service core anymore. That is why I believe the _may_be_ is appropriate for this API, it shows this non determinism.

> > > I think we need additional documentation for 'rte_service_lcore_stop' to
> > indicate that the caller should not assume that the service thread on the lcore
> > has stopped running and should call the above API to check.
> >
> > +1
> > Additional documentation can't hurt.

Will add a section to the _stop() and _lcore_active() in doxygen for v3.
  
Honnappa Nagarahalli July 22, 2020, 6:50 p.m. UTC | #7
+ Gage (as I referring to his commit below)

<snip>

> 
> > > > > +/**
> > > > > + * Reports if a service lcore is currently running.
> > > > > + * @retval 0 Service thread is not active, and has been returned to
> EAL.
> > > > > + * @retval 1 Service thread is in the service core polling loop.
> > > > > + * @retval -EINVAL Invalid *lcore_id* provided.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int32_t rte_service_lcore_active(uint32_t lcore_id);
> > > > Would 'rte_service_lcore_may_be_active' better? It would be inline
> > > > with
> > > 'rte_service_may_be_active'?
> 
> I think the implementation behind the API is different, so I think _may_be_ is
> not appropriate for service_lcore_active, keeping same function name for v3.
> 
> rte_service_lcore_active() checks at a particular point in the calling thread if
> another thread is active *at that time*. It is either active or not. This is
> defined, it is deterministic in that the result is either yes or no, and there is
> no ambiguity at any given check. You're right the value can change *just* after
> the check - but at the time of the check the answer was deterministic.
> 
> rte_service_may_be_active() checks if a service *could* be run by a service
> core. It is not deterministic. A service lcore only sets a service as "active on
> lcore" (or not active) when it polls it - this opens a window of
> nondeterministic result. When a runstate is set to off, there is a window of
> "unknown" before we know certainly that the service is not run on a service
> core anymore. That is why I believe the _may_be_ is appropriate for this API,
> it shows this non determinism.
> 

I am looking at this from the application usage perspective (not the implementation). I am pointing to the similarity that exists with the new API. i.e. when 'rte_service_lcore_stop' is called, it is not known if the service lcore has stopped. If 'rte_service_lcore_active' returns 1, it indicates the lcore 'may be' active (the reasoning could be different in this case), it is not guaranteed that it is active by the time caller checks it. But when the API returns 0, it guarantees that the service lcore (assuming it was started and verified that it was started), has stopped.

Looking at the commit e30dd31847d212cd1b766612cbd980c7d8240baa that added the 'rte_service_may_be_active', the use case is a mechanism to identify a quiescent state after a service was stopped.
The use case for the new API is also the same. We want to identify a quiescent state after a service lcore is stopped.

<snip>
  
Van Haaren, Harry July 23, 2020, 4:59 p.m. UTC | #8
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, July 22, 2020 7:50 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev@dpdk.org; igor.romanov@oktetlabs.ru; Yigit, Ferruh
> <ferruh.yigit@intel.com>; nd <nd@arm.com>; aconole@redhat.com;
> l.wojciechow@partner.samsung.com; Phil Yang <Phil.Yang@arm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Eads, Gage
> <gage.eads@intel.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v2 1/2] service: add API to retrieve service core active
> 
> + Gage (as I referring to his commit below)
> 
> <snip>
> 
> >
> > > > > > +/**
> > > > > > + * Reports if a service lcore is currently running.
> > > > > > + * @retval 0 Service thread is not active, and has been returned to
> > EAL.
> > > > > > + * @retval 1 Service thread is in the service core polling loop.
> > > > > > + * @retval -EINVAL Invalid *lcore_id* provided.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int32_t rte_service_lcore_active(uint32_t lcore_id);
> > > > > Would 'rte_service_lcore_may_be_active' better? It would be inline
> > > > > with
> > > > 'rte_service_may_be_active'?
> >
> > I think the implementation behind the API is different, so I think _may_be_ is
> > not appropriate for service_lcore_active, keeping same function name for v3.
> >
> > rte_service_lcore_active() checks at a particular point in the calling thread if
> > another thread is active *at that time*. It is either active or not. This is
> > defined, it is deterministic in that the result is either yes or no, and there is
> > no ambiguity at any given check. You're right the value can change *just* after
> > the check - but at the time of the check the answer was deterministic.
> >
> > rte_service_may_be_active() checks if a service *could* be run by a service
> > core. It is not deterministic. A service lcore only sets a service as "active on
> > lcore" (or not active) when it polls it - this opens a window of
> > nondeterministic result. When a runstate is set to off, there is a window of
> > "unknown" before we know certainly that the service is not run on a service
> > core anymore. That is why I believe the _may_be_ is appropriate for this API,
> > it shows this non determinism.
> >
> 
> I am looking at this from the application usage perspective (not the
> implementation). I am pointing to the similarity that exists with the new API. i.e.
> when 'rte_service_lcore_stop' is called, it is not known if the service lcore has
> stopped. If 'rte_service_lcore_active' returns 1, it indicates the lcore 'may be'
> active (the reasoning could be different in this case), it is not guaranteed that it
> is active by the time caller checks it. But when the API returns 0, it guarantees
> that the service lcore (assuming it was started and verified that it was started),
> has stopped.
> 
> Looking at the commit e30dd31847d212cd1b766612cbd980c7d8240baa that
> added the 'rte_service_may_be_active', the use case is a mechanism to identify a
> quiescent state after a service was stopped.
> The use case for the new API is also the same. We want to identify a quiescent
> state after a service lcore is stopped.
> 
> <snip>

Sure - if you feel strongly that we need the _may_be_ to focus the end-users
attention that this is a cross-thread check and has some quiescent property, lets
add it to err on the side of obvious. Will change API name for v3.

Saw the other feedback on patches too - will incorporate and send ASAP, might need
till tomorrow to get it done.
  

Patch

diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 6a0e0ff65..4c276b006 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -65,6 +65,7 @@  struct core_state {
 	/* map of services IDs are run on this core */
 	uint64_t service_mask;
 	uint8_t runstate; /* running or stopped */
+	uint8_t thread_active; /* indicates when thread is in service_run() */
 	uint8_t is_service_core; /* set if core is currently a service core */
 	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
 	uint64_t loops;
@@ -457,6 +458,8 @@  service_runner_func(void *arg)
 	const int lcore = rte_lcore_id();
 	struct core_state *cs = &lcore_states[lcore];
 
+	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
+
 	/* runstate act as the guard variable. Use load-acquire
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
@@ -475,9 +478,20 @@  service_runner_func(void *arg)
 		cs->loops++;
 	}
 
+	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
 	return 0;
 }
 
+int32_t
+rte_service_lcore_active(uint32_t lcore)
+{
+	if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
+		return -EINVAL;
+
+	return __atomic_load_n(&lcore_states[lcore].thread_active,
+			       __ATOMIC_RELAXED);
+}
+
 int32_t
 rte_service_lcore_count(void)
 {
diff --git a/lib/librte_eal/include/rte_service.h b/lib/librte_eal/include/rte_service.h
index e2d0a6dd3..f7134b5c0 100644
--- a/lib/librte_eal/include/rte_service.h
+++ b/lib/librte_eal/include/rte_service.h
@@ -261,6 +261,15 @@  int32_t rte_service_lcore_start(uint32_t lcore_id);
  */
 int32_t rte_service_lcore_stop(uint32_t lcore_id);
 
+/**
+ * Reports if a service lcore is currently running.
+ * @retval 0 Service thread is not active, and has been returned to EAL.
+ * @retval 1 Service thread is in the service core polling loop.
+ * @retval -EINVAL Invalid *lcore_id* provided.
+ */
+__rte_experimental
+int32_t rte_service_lcore_active(uint32_t lcore_id);
+
 /**
  * Adds lcore to the list of service cores.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bf0c17c23..d53d5d5b9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -401,6 +401,7 @@  EXPERIMENTAL {
 	rte_lcore_dump;
 	rte_lcore_iterate;
 	rte_mp_disable;
+	rte_service_lcore_active;
 	rte_thread_register;
 	rte_thread_unregister;
 };