eal: fix lcore state bug

Message ID 20200428012139.32196-1-l.wojciechow@partner.samsung.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series eal: fix lcore state bug

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-intel-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed
ci/iol-nxp-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Lukasz Wojciechowski April 28, 2020, 1:21 a.m. UTC
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However, the thread loop for slave lcores (eal_thread_loop) distinguishes
between these roles when setting the lcore state after the delegated
function returns: it sets the WAIT state for ROLE_SERVICE, but FINISHED
for ROLE_RTE. So changing the role to RTE before the slave lcores stop
working causes them to end in the FINISHED state. That is why
rte_eal_wait_lcore must be called after rte_service_lcore_reset_all to
bring the lcores back to a launchable (WAIT) state.
This has been fixed in the test app and clarified in the API documentation.
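
The role-dependent choice of the final state can be illustrated with a
small standalone model. This is only a sketch and not the EAL source: the
enum values reuse the names from the message above, while struct
lcore_model and the helpers finish_lcore_function, state_after_reset and
state_without_reset are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Simplified stand-ins for the EAL types; not the real definitions. */
enum lcore_role { ROLE_RTE, ROLE_SERVICE };
enum lcore_state { WAIT, RUNNING, FINISHED };

struct lcore_model {
	enum lcore_role role;
	enum lcore_state state;
	int ret;
};

/*
 * Hypothetical helper modelling what the slave thread loop does once the
 * delegated function has returned: store the return value, then choose
 * the final state from the role the lcore has at that moment.
 */
static void
finish_lcore_function(struct lcore_model *lc, int ret)
{
	assert(lc->state == RUNNING);
	lc->ret = ret;
	lc->state = (lc->role == ROLE_SERVICE) ? WAIT : FINISHED;
}

/* Model of a reset: the role flips back to ROLE_RTE while still running. */
static enum lcore_state
state_after_reset(void)
{
	struct lcore_model lc = { ROLE_SERVICE, RUNNING, 0 };

	lc.role = ROLE_RTE;            /* what the reset does to the role */
	finish_lcore_function(&lc, 0); /* the runner returns afterwards */
	return lc.state;
}

/* Without a reset the role is still ROLE_SERVICE when the runner returns. */
static enum lcore_state
state_without_reset(void)
{
	struct lcore_model lc = { ROLE_SERVICE, RUNNING, 0 };

	finish_lcore_function(&lc, 0);
	return lc.state;
}
```

In this model state_after_reset() yields FINISHED while
state_without_reset() yields WAIT, which is the asymmetry the message
describes.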

Setting the state to WAIT in rte_service_runner_func is premature,
as rte_service_runner_func is still part of the lcore function
delegated to the slave lcore. The state is overwritten anyway in the
slave lcore thread loop. This premature setting of the state to WAIT
might, however, cause an rte_eal_wait_lcore call made by the application
to return before the slave lcore thread sets the FINISHED state. That is
why it is removed from rte_service_runner_func in librte_eal.
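
The interleaving behind this race can be replayed step by step in a
single-threaded model. This is a sketch under assumptions, not the EAL
code: waiter_would_return and waiter_returns_early are hypothetical
helpers, and the model assumes a waiter that stops spinning as soon as it
observes either WAIT or FINISHED.

```c
#include <assert.h>

enum lcore_state { WAIT, RUNNING, FINISHED };

struct lcore_model {
	enum lcore_state state;
};

/*
 * Single poll of a simplified wait: returns 1 if a waiter would stop
 * waiting at this instant, 0 if it would keep spinning.
 */
static int
waiter_would_return(const struct lcore_model *lc)
{
	return lc->state == WAIT || lc->state == FINISHED;
}

/*
 * Replays the interleaving. premature_wait selects whether the runner
 * function writes WAIT itself before returning (the behaviour the patch
 * removes). Returns 1 if the waiter returned before the thread loop had
 * set the final state.
 */
static int
waiter_returns_early(int premature_wait)
{
	struct lcore_model lc = { RUNNING };
	int early;

	/* step 1: the runner function is about to return */
	if (premature_wait)
		lc.state = WAIT;

	/* step 2: the application polls before the thread loop runs */
	early = waiter_would_return(&lc);

	/* step 3: the thread loop overwrites the state (role is ROLE_RTE) */
	lc.state = FINISHED;

	return early;
}
```

With the premature write the waiter is released at step 2 and the lcore
still ends up in FINISHED at step 3; without it the waiter keeps spinning
until the thread loop has set the final state.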

Bugzilla ID: 464
Fixes: 21698354c832 ("service: introduce service cores concept")
Fixes: f038a81e1c56 ("service: add unit tests")
Cc: harry.van.haaren@intel.com
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
---
 app/test/test_service_cores.c        | 1 +
 lib/librte_eal/common/rte_service.c  | 2 --
 lib/librte_eal/include/rte_service.h | 4 ++++
 3 files changed, 5 insertions(+), 2 deletions(-)
  

Comments

Ruifeng Wang April 29, 2020, 3:11 a.m. UTC | #1

Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>

  
Phil Yang April 29, 2020, 3:07 p.m. UTC | #2
Hi Lukasz,

> The rte_service_lcore_reset_all function stops execution of services
> on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
> However, the thread loop for slave lcores (eal_thread_loop) distinguishes
> between these roles when setting the lcore state after the delegated
> function returns: it sets the WAIT state for ROLE_SERVICE, but FINISHED
> for ROLE_RTE. So changing the role to RTE before the slave lcores stop
> working causes them to end in the FINISHED state. That is why
> rte_eal_wait_lcore must be called after rte_service_lcore_reset_all to
> bring the lcores back to a launchable (WAIT) state.

Does it make sense to call rte_eal_mp_wait_lcore() inside rte_service_lcore_reset_all()?

<snip>

Thanks,
Phil
  
Lukasz Wojciechowski April 29, 2020, 9:32 p.m. UTC | #3
Hi Phil,

On 29.04.2020 at 17:07, Phil Yang wrote:
> Does it make sense to call rte_eal_mp_wait_lcore() inside rte_service_lcore_reset_all()?

Yes, I thought about it, and in my opinion the answer is no. If an lcore
that ran a function is in the FINISHED state, the value returned by that
function can still be read, and the only party likely to be interested
in that value is the one that delegated the service.

If we wait for the lcores to finish their jobs, read the values, and switch
them to the WAIT state, the values will be lost. The application might need
to read them, and we cannot take that possibility away from it.
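
A small standalone model of the concern, as a sketch only. It assumes a
wait that returns 0 for an lcore already in WAIT and otherwise hands back
the stored return value while moving the lcore to WAIT. All names here
(model_wait_lcore, model_reset_with_internal_wait and the two app_sees_*
helpers) are hypothetical and not part of the DPDK API.

```c
#include <assert.h>

enum lcore_state { WAIT, RUNNING, FINISHED };

struct lcore_model {
	enum lcore_state state;
	int ret;
};

/*
 * Simplified model of a wait on an lcore that has already stopped:
 * an lcore already in WAIT yields 0, a FINISHED one is moved to WAIT
 * and its stored return value is handed to the caller.
 */
static int
model_wait_lcore(struct lcore_model *lc)
{
	assert(lc->state != RUNNING); /* the model does not spin */
	if (lc->state == WAIT)
		return 0;
	lc->state = WAIT;
	return lc->ret;
}

/* Hypothetical reset that waits internally and drops the result. */
static void
model_reset_with_internal_wait(struct lcore_model *lc)
{
	(void)model_wait_lcore(lc);
}

/* Application waits itself: it sees what the lcore function returned. */
static int
app_sees_when_waiting_itself(int ret)
{
	struct lcore_model lc = { FINISHED, ret };

	return model_wait_lcore(&lc);
}

/* Library waited first: the application's later wait only sees 0. */
static int
app_sees_after_internal_wait(int ret)
{
	struct lcore_model lc = { FINISHED, ret };

	model_reset_with_internal_wait(&lc);
	return model_wait_lcore(&lc);
}
```

In this model an lcore function that returned -5 is visible to the
application only when the application performs the wait itself; after an
internal wait the application's own wait yields 0.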

  
Phil Yang April 30, 2020, 2:54 a.m. UTC | #4

Yeah. I think that is a good point.

Reviewed-by: Phil Yang <phil.yang@arm.com>

  
Lukasz Wojciechowski April 30, 2020, 9:06 a.m. UTC | #5
On 30.04.2020 at 04:54, Phil Yang wrote:
> Yeah. I think that is a good point.
>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
Thank you
  
Van Haaren, Harry May 8, 2020, 4:12 p.m. UTC | #6

I understand that on exiting, the lcore state is different per service or rte lcore.
The goal was to leave the lcore thread in a state as if it was never used.

As Phil suggested, doing the wait() inside service cores achieves that.
Lukasz's point is that this hides the service core return code.

Is it really a problem if application is not getting access to the return code of the service lcore?
What do we expect the application will care about? Today I'm not aware of any service-lcore
return value that the application should be checking.

  
Lukasz Wojciechowski May 8, 2020, 5:04 p.m. UTC | #7
On 08.05.2020 at 18:12, Van Haaren, Harry wrote:
> Is it really a problem if application is not getting access to the return code of the service lcore?
> What do we expect the application will care about? Today I'm not aware of any service-lcore
> return value that the application should be checking.

It would probably be convenient not to have to remember to call
wait() after rte_service_lcore_reset_all(). However, such a change
would alter the current behavior of the rte_service_lcore_reset_all
function. We cannot know whether someone relies on that behavior,
perhaps outside of the DPDK tree.

So I believe that this patch should focus on fixing the current bug
reported on Bugzilla and do nothing more. IMO a change of the API should
be discussed and introduced separately in another patch, if decided so.

  
Lukasz Wojciechowski May 18, 2020, 6:25 p.m. UTC | #8
Hi David,

The patch has been here for quite a while and I believe it's assigned to you.
Today there were some questions about it on
https://bugs.dpdk.org/show_bug.cgi?id=464

Is there anything else to be done, so it can be accepted?

Best regards
Lukasz

  
David Marchand May 18, 2020, 6:39 p.m. UTC | #9
On Mon, May 18, 2020 at 8:25 PM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
> Is there anything else to be done, so it can be accepted?

I want a clear ACK from a maintainer.
Here that is Harry, since this is services-specific code.
The current discussion seems unfinished to me; maybe Harry can conclude?

Besides, I am reluctant to take EAL changes in rc3 (apart from compilation
fixes or fixes for problems introduced in the current release).
  
Lukasz Wojciechowski May 18, 2020, 6:43 p.m. UTC | #10
On 18.05.2020 at 20:39, David Marchand wrote:
> I want a clear ACK from a maintainer.
> Here that is Harry, since this is services-specific code.
> The current discussion seems unfinished to me; maybe Harry can conclude?
>
> Besides, I am reluctant to take EAL changes in rc3 (apart from compilation
> fixes or fixes for problems introduced in the current release).

Thanks for the answer.

Let's wait for Harry's opinion and come back to the topic after current 
release.

Sorry for the rush.

  
Van Haaren, Harry May 20, 2020, 11:40 a.m. UTC | #11

Hi Lukasz,

Also reluctant to make changes in RC3, so let's fix early in the 20.08 timeframe.
Will schedule some time to dig into the detail here and provide detailed input.

Regards, -Harry
  
Lukasz Wojciechowski May 20, 2020, 12:47 p.m. UTC | #12
On 20.05.2020 at 13:40, Van Haaren, Harry wrote:
> Hi Lukasz,
>
> Also reluctant to make changes in RC3, so let's fix early in the 20.08 timeframe.
> Will schedule some time to dig into the detail here and provide detailed input.
>
> Regards, -Harry

No problem, all in good time.

Thanks,

Lukasz
  

Patch

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index a922c7ddc..2a4978e29 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -114,6 +114,7 @@  unregister_all(void)
 	}
 
 	rte_service_lcore_reset_all();
+	rte_eal_mp_wait_lcore();
 
 	return TEST_SUCCESS;
 }
diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 70d17a5d7..018876199 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -458,8 +458,6 @@  rte_service_runner_func(void *arg)
 		rte_smp_rmb();
 	}
 
-	lcore_config[lcore].state = WAIT;
-
 	return 0;
 }
 
diff --git a/lib/librte_eal/include/rte_service.h b/lib/librte_eal/include/rte_service.h
index d8701dd4c..acdda8c54 100644
--- a/lib/librte_eal/include/rte_service.h
+++ b/lib/librte_eal/include/rte_service.h
@@ -300,6 +300,10 @@  int32_t rte_service_lcore_count(void);
  * from duty, just unmaps all services / cores, and stops() the service cores.
  * The runstate of services is not modified.
  *
+ * The cores that are stopped with this call, are in FINISHED state and
+ * the application must take care of bringing them back to a launchable state:
+ * e.g. call *rte_eal_lcore_wait* on the lcore_id.
+ *
  * @retval 0 Success
  */
 int32_t rte_service_lcore_reset_all(void);