[v2] eal: fix lcore state bug
Commit Message
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However, the thread loop for slave lcores (eal_thread_loop) distinguishes
between these roles when it sets the lcore state after the delegated
function returns: it sets WAIT for ROLE_SERVICE, but FINISHED for ROLE_RTE.
So changing the role to RTE before the slave lcores have stopped working
causes them to end in the FINISHED state. That is why rte_eal_wait_lcore
must be called after rte_service_lcore_reset_all to bring the lcores back
to the launchable (WAIT) state.
This has been fixed in the test app, which now calls rte_eal_mp_wait_lcore,
and clarified in the API documentation.
Setting the state to WAIT in service_runner_func is premature,
as service_runner_func is still part of the function delegated to the
slave lcore. The state is overwritten anyway in the slave lcore thread
loop. However, this premature WAIT might cause rte_eal_wait_lcore, called
by the application, to return before the slave lcore thread sets the
FINISHED state. That is why the assignment is removed from
service_runner_func in librte_eal.
Bugzilla ID: 464
Fixes: 21698354c832 ("service: introduce service cores concept")
Fixes: f038a81e1c56 ("service: add unit tests")
Cc: harry.van.haaren@intel.com
Cc: stable@dpdk.org
Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
---
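Note: the role-dependent end state described in the commit message can be
sketched with a toy, single-threaded model. Every name below (model_lcore,
model_thread_loop_epilogue, model_reset_sequence and so on) is hypothetical
and exists only for illustration; this is not EAL code, and a real wait call
would spin until the worker finishes instead of returning an error.

```c
/* Toy model only: names echo the commit message, not real EAL symbols. */
enum model_role { MODEL_ROLE_RTE, MODEL_ROLE_SERVICE };
enum model_state { MODEL_WAIT, MODEL_RUNNING, MODEL_FINISHED };

struct model_lcore {
	enum model_role role;
	enum model_state state;
};

/* What the worker thread loop does once the delegated function returns:
 * the final state depends on the role the lcore has at that moment. */
static void model_thread_loop_epilogue(struct model_lcore *lc)
{
	lc->state = (lc->role == MODEL_ROLE_SERVICE) ? MODEL_WAIT
						     : MODEL_FINISHED;
}

/* Role switch done by the reset; the worker has not returned yet. */
static void model_service_reset(struct model_lcore *lc)
{
	lc->role = MODEL_ROLE_RTE;
}

/* Single-threaded stand-in for waiting on an lcore: a stopped lcore is
 * moved to WAIT. Returns -1 if the lcore is still running. */
static int model_wait_lcore(struct model_lcore *lc)
{
	if (lc->state == MODEL_RUNNING)
		return -1;
	lc->state = MODEL_WAIT;
	return 0;
}

/* Replays the sequence from the commit message and returns the state
 * the lcore ends in, with or without the extra wait call. */
enum model_state model_reset_sequence(int call_wait)
{
	struct model_lcore lc = { MODEL_ROLE_SERVICE, MODEL_RUNNING };

	model_service_reset(&lc);        /* role becomes RTE first */
	model_thread_loop_epilogue(&lc); /* worker returns afterwards */
	if (call_wait)
		model_wait_lcore(&lc);
	return lc.state;
}
```

In this model, model_reset_sequence(0) leaves the lcore in MODEL_FINISHED,
while model_reset_sequence(1) brings it back to MODEL_WAIT, which is the
effect the added wait call in the test app is meant to have.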
app/test/test_service_cores.c | 1 +
lib/librte_eal/common/rte_service.c | 2 --
lib/librte_eal/include/rte_service.h | 4 ++++
3 files changed, 5 insertions(+), 2 deletions(-)
Comments
> -----Original Message-----
> From: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Sent: Wednesday, July 8, 2020 2:38 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>
> Cc: dev@dpdk.org; l.wojciechow@partner.samsung.com; stable@dpdk.org
> Subject: [PATCH v2] eal: fix lcore state bug
Thanks for v2, applies cleanly. Tested patch with unit tests, bug description,
service_cores sample app, and testpmd running (idle) service cores in bg, all fine;
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
On Wed, Jul 8, 2020 at 4:52 PM Van Haaren, Harry
<harry.van.haaren@intel.com> wrote:
> > The rte_service_lcore_reset_all function stops execution of services
> > on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
> > However the thread loop for slave lcores (eal_thread_loop) distincts these
> > roles to set lcore state after processing delegated function.
> > It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
> > So changing the role to RTE before stopping work in slave lcores
> > causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
> > must be run after rte_service_lcore_reset_all to bring back lcores to
> > launchable (WAIT) state.
> > This has been fixed in test app and clarified in API documentation.
> >
> > Setting the state to WAIT in rte_service_runner_func is premature
> > as the rte_service_runner_func function is still a part of the lcore
> > function delegated to slave lcore. The state is overwritten anyway in
> > slave lcore thread loop. This premature setting state to WAIT might
> > however cause rte_eal_lcore_wait, that was called by the application,
> > to return before slave lcore thread set the FINISHED state. That's
> > why it is removed from librte_eal rte_service_runner_func function.
Thanks for the explanation and fix.
> >
> > Bugzilla ID: 464
> > Fixes: 21698354c832 ("service: introduce service cores concept")
> > Fixes: f038a81e1c56 ("service: add unit tests")
> > Cc: stable@dpdk.org
> >
Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
> > Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Applied, thanks Lukasz.
On 08.07.2020 at 19:10, David Marchand wrote:
> Applied, thanks Lukasz.
>
Great, thank you
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -114,6 +114,7 @@ unregister_all(void)
}
rte_service_lcore_reset_all();
+ rte_eal_mp_wait_lcore();
return TEST_SUCCESS;
}
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -475,8 +475,6 @@ service_runner_func(void *arg)
cs->loops++;
}
- lcore_config[lcore].state = WAIT;
-
return 0;
}
--- a/lib/librte_eal/include/rte_service.h
+++ b/lib/librte_eal/include/rte_service.h
@@ -304,6 +304,10 @@ int32_t rte_service_lcore_count(void);
* from duty, just unmaps all services / cores, and stops() the service cores.
* The runstate of services is not modified.
*
+ * The cores that are stopped with this call are left in the FINISHED state,
+ * and the application must bring them back to a launchable state,
+ * e.g. by calling *rte_eal_wait_lcore* on each lcore_id.
+ *
* @retval 0 Success
*/
int32_t rte_service_lcore_reset_all(void);
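
Note: the race that motivates removing the premature WAIT assignment from
service_runner_func can be replayed as one fixed interleaving in a toy,
single-threaded model. All names below (sim_state, sim_launch, sim_wait_try,
sim_relaunch_after_stop) are hypothetical. The model assumes that a launch is
refused unless the lcore is in WAIT and that the lcore's role has already been
switched to RTE; it is a sketch of the reasoning, not the EAL implementation.

```c
/* Toy model only: hypothetical names, one hard-coded interleaving. */
enum sim_state { SIM_WAIT, SIM_RUNNING, SIM_FINISHED };

/* Assumed launch rule: only an lcore in WAIT accepts new work. */
static int sim_launch(enum sim_state *st)
{
	if (*st != SIM_WAIT)
		return -1; /* lcore is not launchable */
	*st = SIM_RUNNING;
	return 0;
}

/* One polling step of a wait: returns 1 if the wait may return now
 * (state forced to WAIT), 0 if the lcore still looks busy. */
static int sim_wait_try(enum sim_state *st)
{
	if (*st == SIM_RUNNING)
		return 0;
	*st = SIM_WAIT;
	return 1;
}

/* Replays: runner function ends, application waits, thread loop writes
 * FINISHED, application tries to launch again. Returns the launch result. */
int sim_relaunch_after_stop(int premature_wait)
{
	enum sim_state st = SIM_RUNNING; /* role already switched to RTE */
	int waited;

	/* Worker: tail of the service runner function (old behaviour). */
	if (premature_wait)
		st = SIM_WAIT;

	/* Main: the application's wait polls before the thread loop runs. */
	waited = sim_wait_try(&st);

	/* Worker: thread loop epilogue for an RTE-role lcore. */
	st = SIM_FINISHED;

	/* Main: a wait that was still pending completes only now. */
	if (!waited)
		sim_wait_try(&st);

	return sim_launch(&st);
}
```

With the premature assignment, sim_relaunch_after_stop(1), the wait returns
too early, the lcore is then left in SIM_FINISHED and the relaunch is refused.
Without it, sim_relaunch_after_stop(0), the wait completes after the thread
loop has written FINISHED and the relaunch succeeds.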