Message ID | 20200708133733.29468-1-l.wojciechow@partner.samsung.com |
---|---|
State | Accepted, archived |
Delegated to: | David Marchand |
Context | Check | Description |
---|---|---|
ci/iol-intel-Performance | success | Performance Testing PASS |
ci/Intel-compilation | success | Compilation OK |
ci/iol-mellanox-Performance | fail | Performance Testing issues |
ci/iol-testing | success | Testing PASS |
ci/iol-broadcom-Performance | success | Performance Testing PASS |
ci/checkpatch | warning | coding style issues |
> -----Original Message-----
> From: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Sent: Wednesday, July 8, 2020 2:38 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>
> Cc: dev@dpdk.org; l.wojciechow@partner.samsung.com; stable@dpdk.org
> Subject: [PATCH v2] eal: fix lcore state bug
>
> The rte_service_lcore_reset_all function stops execution of services
> on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
> However the thread loop for slave lcores (eal_thread_loop) distincts these
> roles to set lcore state after processing delegated function.
> It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
> So changing the role to RTE before stopping work in slave lcores
> causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
> must be run after rte_service_lcore_reset_all to bring back lcores to
> launchable (WAIT) state.
> This has been fixed in test app and clarified in API documentation.
>
> Setting the state to WAIT in rte_service_runner_func is premature
> as the rte_service_runner_func function is still a part of the lcore
> function delegated to slave lcore. The state is overwritten anyway in
> slave lcore thread loop. This premature setting state to WAIT might
> however cause rte_eal_lcore_wait, that was called by the application,
> to return before slave lcore thread set the FINISHED state. That's
> why it is removed from librte_eal rte_service_runner_func function.
>
> Bugzilla ID: 464
> Fixes: 21698354c832 ("service: introduce service cores concept")
> Fixes: f038a81e1c56 ("service: add unit tests")
> Cc: harry.van.haaren@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>

Thanks for v2, applies cleanly.

Tested patch with unit tests, bug description, service_cores sample app, and testpmd running (idle) service cores in bg, all fine;

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
On Wed, Jul 8, 2020 at 4:52 PM Van Haaren, Harry <harry.van.haaren@intel.com> wrote:
> > The rte_service_lcore_reset_all function stops execution of services
> > on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
> > However the thread loop for slave lcores (eal_thread_loop) distincts these
> > roles to set lcore state after processing delegated function.
> > It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
> > So changing the role to RTE before stopping work in slave lcores
> > causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
> > must be run after rte_service_lcore_reset_all to bring back lcores to
> > launchable (WAIT) state.
> > This has been fixed in test app and clarified in API documentation.
> >
> > Setting the state to WAIT in rte_service_runner_func is premature
> > as the rte_service_runner_func function is still a part of the lcore
> > function delegated to slave lcore. The state is overwritten anyway in
> > slave lcore thread loop. This premature setting state to WAIT might
> > however cause rte_eal_lcore_wait, that was called by the application,
> > to return before slave lcore thread set the FINISHED state. That's
> > why it is removed from librte_eal rte_service_runner_func function.

Thanks for the explanation and fix.

> >
> > Bugzilla ID: 464
> > Fixes: 21698354c832 ("service: introduce service cores concept")
> > Fixes: f038a81e1c56 ("service: add unit tests")
> > Cc: stable@dpdk.org
> >

Reported-by: Sarosh Arif <sarosh.arif@emumba.com>

> > Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

Applied, thanks Lukasz.
On 08.07.2020 at 19:10, David Marchand wrote:
> On Wed, Jul 8, 2020 at 4:52 PM Van Haaren, Harry
> <harry.van.haaren@intel.com> wrote:
>>> The rte_service_lcore_reset_all function stops execution of services
>>> on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
>>> However the thread loop for slave lcores (eal_thread_loop) distincts these
>>> roles to set lcore state after processing delegated function.
>>> It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
>>> So changing the role to RTE before stopping work in slave lcores
>>> causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
>>> must be run after rte_service_lcore_reset_all to bring back lcores to
>>> launchable (WAIT) state.
>>> This has been fixed in test app and clarified in API documentation.
>>>
>>> Setting the state to WAIT in rte_service_runner_func is premature
>>> as the rte_service_runner_func function is still a part of the lcore
>>> function delegated to slave lcore. The state is overwritten anyway in
>>> slave lcore thread loop. This premature setting state to WAIT might
>>> however cause rte_eal_lcore_wait, that was called by the application,
>>> to return before slave lcore thread set the FINISHED state. That's
>>> why it is removed from librte_eal rte_service_runner_func function.
> Thanks for the explanation and fix.
>
>>> Bugzilla ID: 464
>>> Fixes: 21698354c832 ("service: introduce service cores concept")
>>> Fixes: f038a81e1c56 ("service: add unit tests")
>>> Cc: stable@dpdk.org
>>>
> Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
>>> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
>> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
> Applied, thanks Lukasz.
>
Great, thank you
diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 981e21213..6764a5d2c 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -114,6 +114,7 @@ unregister_all(void)
 	}
 
 	rte_service_lcore_reset_all();
+	rte_eal_mp_wait_lcore();
 
 	return TEST_SUCCESS;
 }
diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 6123a2124..b16698f21 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -475,8 +475,6 @@ service_runner_func(void *arg)
 		cs->loops++;
 	}
 
-	lcore_config[lcore].state = WAIT;
-
 	return 0;
 }
 
diff --git a/lib/librte_eal/include/rte_service.h b/lib/librte_eal/include/rte_service.h
index 3a1c735c5..e2d0a6dd3 100644
--- a/lib/librte_eal/include/rte_service.h
+++ b/lib/librte_eal/include/rte_service.h
@@ -304,6 +304,10 @@ int32_t rte_service_lcore_count(void);
  * from duty, just unmaps all services / cores, and stops() the service cores.
  * The runstate of services is not modified.
  *
+ * The cores that are stopped with this call, are in FINISHED state and
+ * the application must take care of bringing them back to a launchable state:
+ * e.g. call *rte_eal_lcore_wait* on the lcore_id.
+ *
  * @retval 0 Success
  */
 int32_t rte_service_lcore_reset_all(void);
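
The second hunk above removes the assignment `lcore_config[lcore].state = WAIT` from `service_runner_func`. The following is a minimal sequential sketch of why that early assignment could be harmful. It is not DPDK code: the `sketch_*` names and the single-threaded interleaving are assumptions made for illustration, and a real waiter would poll in a loop on another thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the EAL lcore state. */
enum sketch_state { SKETCH_WAIT, SKETCH_RUNNING, SKETCH_FINISHED };

/* Models one poll of a waiter: it only completes (and resets the state
 * to WAIT) once the worker is no longer RUNNING. */
static bool sketch_try_wait(enum sketch_state *st)
{
	assert(st != NULL);
	if (*st == SKETCH_RUNNING)
		return false;   /* a real waiter would keep polling here */
	*st = SKETCH_WAIT;
	return true;
}

/* One possible interleaving: the runner body ends, the application polls,
 * then the thread loop writes its final state.
 * 'premature_wait' models the assignment removed by the patch. */
static enum sketch_state sketch_interleave(bool premature_wait)
{
	enum sketch_state st = SKETCH_RUNNING;

	if (premature_wait)
		st = SKETCH_WAIT;               /* the removed line */

	bool waiter_done = sketch_try_wait(&st); /* application polls */

	st = SKETCH_FINISHED;   /* thread loop epilogue, role already RTE */

	if (!waiter_done)
		sketch_try_wait(&st);           /* waiter polls again */

	return st;
}
```

In this model, `sketch_interleave(true)` ends in `SKETCH_FINISHED`: the waiter returned early and the lcore is left in a non-launchable state. `sketch_interleave(false)` ends in `SKETCH_WAIT`, because the waiter only completes after the thread loop has written the final state.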
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However, the thread loop for slave lcores (eal_thread_loop) distinguishes
between these roles when it sets the lcore state after processing the
delegated function: it sets the WAIT state for ROLE_SERVICE, but FINISHED
for ROLE_RTE. So changing the role to RTE before the slave lcores stop
working causes the lcores to end in the FINISHED state. That is why
rte_eal_wait_lcore (or rte_eal_mp_wait_lcore, as used in the test app)
must be run after rte_service_lcore_reset_all to bring the lcores back to
the launchable (WAIT) state.
This has been fixed in the test app and clarified in the API documentation.

Setting the state to WAIT in service_runner_func is premature,
as service_runner_func is still a part of the lcore function
delegated to the slave lcore. The state is overwritten anyway in the
slave lcore thread loop. Setting the state to WAIT prematurely might,
however, cause rte_eal_wait_lcore, called by the application,
to return before the slave lcore thread sets the FINISHED state. That is
why the assignment is removed from service_runner_func in librte_eal.

Bugzilla ID: 464
Fixes: 21698354c832 ("service: introduce service cores concept")
Fixes: f038a81e1c56 ("service: add unit tests")
Cc: harry.van.haaren@intel.com
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
---
 app/test/test_service_cores.c        | 1 +
 lib/librte_eal/common/rte_service.c  | 2 --
 lib/librte_eal/include/rte_service.h | 4 ++++
 3 files changed, 5 insertions(+), 2 deletions(-)
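
The role and state interaction described in the commit message can be sketched as a small state model. This is an illustrative simplification, not the EAL implementation: the `model_*` types and functions are hypothetical names, and the wait helper only handles the FINISHED case, whereas a real wait call would also block while the lcore is still running.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the lcore role and state described above. */
enum model_role  { MODEL_ROLE_RTE, MODEL_ROLE_SERVICE };
enum model_state { MODEL_WAIT, MODEL_RUNNING, MODEL_FINISHED };

struct model_lcore {
	enum model_role role;
	enum model_state state;
};

/* Models the role switch done by the reset: SERVICE becomes RTE
 * before the worker has returned from its delegated function. */
static void model_reset_role(struct model_lcore *lc)
{
	assert(lc != NULL);
	lc->role = MODEL_ROLE_RTE;
}

/* Models what the thread loop does after the delegated function returns:
 * WAIT for a service core, FINISHED for a regular RTE core. */
static void model_thread_loop_epilogue(struct model_lcore *lc)
{
	assert(lc != NULL);
	lc->state = (lc->role == MODEL_ROLE_SERVICE) ? MODEL_WAIT
	                                             : MODEL_FINISHED;
}

/* Models the application-side wait: a FINISHED lcore is brought back
 * to the launchable WAIT state. */
static void model_wait_lcore(struct model_lcore *lc)
{
	assert(lc != NULL);
	if (lc->state == MODEL_FINISHED)
		lc->state = MODEL_WAIT;
}
```

Starting from a running service core, calling `model_reset_role` and then `model_thread_loop_epilogue` leaves the model in `MODEL_FINISHED`. Only the additional `model_wait_lcore` call returns it to `MODEL_WAIT`, which mirrors the wait call the patch adds to the test app after `rte_service_lcore_reset_all()`.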