[v2] eal: fix lcore state bug

Message ID: 20200708133733.29468-1-l.wojciechow@partner.samsung.com
State: Accepted, archived
Delegated to: David Marchand
Series: [v2] eal: fix lcore state bug

Checks

Context                       Check    Description
ci/checkpatch                 warning  coding style issues
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-testing                success  Testing PASS
ci/iol-mellanox-Performance   fail     Performance Testing issues
ci/Intel-compilation          success  Compilation OK
ci/iol-intel-Performance      success  Performance Testing PASS

Commit Message

Lukasz Wojciechowski July 8, 2020, 1:37 p.m. UTC
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However, the thread loop for slave lcores (eal_thread_loop) distinguishes
between these roles when it sets the lcore state after the delegated
function returns: it sets WAIT for ROLE_SERVICE, but FINISHED for ROLE_RTE.
So changing the role to ROLE_RTE before the slave lcores stop working
causes them to end in the FINISHED state. That is why rte_eal_wait_lcore
must be run after rte_service_lcore_reset_all to bring the lcores back to
a launchable (WAIT) state.
This has been fixed in the test app and clarified in the API documentation.
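
The role-dependent state transition described above can be sketched as a small
single-threaded model. This is only an illustration under simplifying
assumptions: every sim_* name is hypothetical and none of it is DPDK code. It
models how the state chosen at the end of the thread loop depends on the role
the lcore has at that moment, and why a wait call (the test in this patch uses
rte_eal_mp_wait_lcore) is needed after the reset.

```c
#include <assert.h>

/* Hypothetical stand-ins for the EAL lcore roles and states (not DPDK types). */
enum sim_role  { SIM_ROLE_RTE, SIM_ROLE_SERVICE };
enum sim_state { SIM_WAIT, SIM_RUNNING, SIM_FINISHED };

struct sim_lcore {
	enum sim_role role;
	enum sim_state state;
};

/* Models the end of one thread-loop iteration: the state stored after the
 * delegated function returns depends on the lcore's current role. */
static void sim_function_returned(struct sim_lcore *lc)
{
	lc->state = (lc->role == SIM_ROLE_SERVICE) ? SIM_WAIT : SIM_FINISHED;
}

/* Models the reset for one lcore: the role is switched back to RTE first,
 * and only afterwards does the service function return. */
static void sim_reset(struct sim_lcore *lc)
{
	lc->role = SIM_ROLE_RTE;
	sim_function_returned(lc);
}

/* Models waiting on the lcore: a FINISHED lcore becomes launchable again. */
static void sim_wait(struct sim_lcore *lc)
{
	if (lc->state == SIM_FINISHED)
		lc->state = SIM_WAIT;
}

/* Returns the state a running service lcore ends in after a reset,
 * with or without a following wait call. */
static enum sim_state sim_state_after_reset(int call_wait)
{
	struct sim_lcore lc = { SIM_ROLE_SERVICE, SIM_RUNNING };

	sim_reset(&lc);
	if (call_wait)
		sim_wait(&lc);
	return lc.state;
}

/* Small self-check of the two outcomes discussed in the commit message. */
static void sim_demo(void)
{
	assert(sim_state_after_reset(0) == SIM_FINISHED); /* reset only */
	assert(sim_state_after_reset(1) == SIM_WAIT);     /* reset + wait */
}
```

In this model, a reset alone leaves the lcore in the FINISHED state, while a
reset followed by a wait returns it to WAIT, which matches the one-line change
made to the test app.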

Setting the state to WAIT in service_runner_func is premature, because
service_runner_func is still part of the function delegated to the slave
lcore, and the slave lcore thread loop overwrites the state anyway.
The premature WAIT might, however, cause rte_eal_wait_lcore, called by
the application, to return before the slave lcore thread has set the
FINISHED state. That is why the assignment is removed from
service_runner_func in librte_eal.
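
The ordering problem described above can be replayed as one fixed interleaving
in a single-threaded sketch. It is a deliberately simplified model, not the EAL
implementation: the race_* names are hypothetical, and the waiter is reduced to
"stop once the state is no longer RUNNING, leaving the lcore in WAIT".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lcore states (not the DPDK enum). */
enum race_state { RACE_WAIT, RACE_RUNNING, RACE_FINISHED };

/* One poll of a simplified waiter. Returns true when it stops waiting;
 * in that case the lcore is left in the WAIT state. */
static bool race_wait_poll(enum race_state *state)
{
	if (*state == RACE_RUNNING)
		return false;
	*state = RACE_WAIT;
	return true;
}

/* Replays one interleaving and returns the state the lcore is left in.
 * premature_wait selects the old behaviour (runner stores WAIT itself). */
static enum race_state race_final_state(bool premature_wait)
{
	enum race_state state = RACE_RUNNING;
	bool waiter_done;

	/* Step 1: the service runner is about to return; the old code
	 * stored WAIT at this point. */
	if (premature_wait)
		state = RACE_WAIT;

	/* Step 2: the application polls before the thread loop has stored
	 * its own final state. */
	waiter_done = race_wait_poll(&state);

	/* Step 3: the thread loop stores FINISHED, because the role has
	 * already been switched back to ROLE_RTE. */
	state = RACE_FINISHED;

	/* Step 4: a waiter that has not returned yet polls again. */
	if (!waiter_done)
		waiter_done = race_wait_poll(&state);

	return state;
}

/* Small self-check of both variants of the interleaving. */
static void race_demo(void)
{
	assert(race_final_state(true) == RACE_FINISHED);  /* waiter returned too early */
	assert(race_final_state(false) == RACE_WAIT);     /* waiter saw FINISHED */
}
```

With the premature store, the waiter returns at step 2 and the lcore is later
left in FINISHED, so it is not launchable even though the wait call completed.
Without it, the waiter only returns after step 3 and the lcore ends in WAIT.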

Bugzilla ID: 464
Fixes: 21698354c832 ("service: introduce service cores concept")
Fixes: f038a81e1c56 ("service: add unit tests")
Cc: harry.van.haaren@intel.com
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
---
 app/test/test_service_cores.c        | 1 +
 lib/librte_eal/common/rte_service.c  | 2 --
 lib/librte_eal/include/rte_service.h | 4 ++++
 3 files changed, 5 insertions(+), 2 deletions(-)
  

Comments

Van Haaren, Harry July 8, 2020, 2:52 p.m. UTC | #1
> -----Original Message-----
> From: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Sent: Wednesday, July 8, 2020 2:38 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>
> Cc: dev@dpdk.org; l.wojciechow@partner.samsung.com; stable@dpdk.org
> Subject: [PATCH v2] eal: fix lcore state bug
> 
> The rte_service_lcore_reset_all function stops execution of services
> on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
> However the thread loop for slave lcores (eal_thread_loop) distincts these
> roles to set lcore state after processing delegated function.
> It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
> So changing the role to RTE before stopping work in slave lcores
> causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
> must be run after rte_service_lcore_reset_all to bring back lcores to
> launchable (WAIT) state.
> This has been fixed in test app and clarified in API documentation.
> 
> Setting the state to WAIT in rte_service_runner_func is premature
> as the rte_service_runner_func function is still a part of the lcore
> function delegated to slave lcore. The state is overwritten anyway in
> slave lcore thread loop. This premature setting state to WAIT might
> however cause rte_eal_lcore_wait, that was called by the application,
> to return before slave lcore thread set the FINISHED state. That's
> why it is removed from librte_eal rte_service_runner_func function.
> 
> Bugzilla ID: 464
> Fixes: 21698354c832 ("service: introduce service cores concept")
> Fixes: f038a81e1c56 ("service: add unit tests")
> Cc: harry.van.haaren@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>

Thanks for v2, applies cleanly. Tested patch with unit tests, bug description,
service_cores sample app, and testpmd running (idle) service cores in bg, all fine;

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
  
David Marchand July 8, 2020, 5:10 p.m. UTC | #2
On Wed, Jul 8, 2020 at 4:52 PM Van Haaren, Harry
<harry.van.haaren@intel.com> wrote:
> > The rte_service_lcore_reset_all function stops execution of services
> > on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
> > However the thread loop for slave lcores (eal_thread_loop) distincts these
> > roles to set lcore state after processing delegated function.
> > It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
> > So changing the role to RTE before stopping work in slave lcores
> > causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
> > must be run after rte_service_lcore_reset_all to bring back lcores to
> > launchable (WAIT) state.
> > This has been fixed in test app and clarified in API documentation.
> >
> > Setting the state to WAIT in rte_service_runner_func is premature
> > as the rte_service_runner_func function is still a part of the lcore
> > function delegated to slave lcore. The state is overwritten anyway in
> > slave lcore thread loop. This premature setting state to WAIT might
> > however cause rte_eal_lcore_wait, that was called by the application,
> > to return before slave lcore thread set the FINISHED state. That's
> > why it is removed from librte_eal rte_service_runner_func function.

Thanks for the explanation and fix.

> >
> > Bugzilla ID: 464
> > Fixes: 21698354c832 ("service: introduce service cores concept")
> > Fixes: f038a81e1c56 ("service: add unit tests")
> > Cc: stable@dpdk.org
> >

Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
> > Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

Applied, thanks Lukasz.
  
Lukasz Wojciechowski July 8, 2020, 7:37 p.m. UTC | #3
On 08.07.2020 at 19:10, David Marchand wrote:
> On Wed, Jul 8, 2020 at 4:52 PM Van Haaren, Harry
> <harry.van.haaren@intel.com> wrote:
>>> The rte_service_lcore_reset_all function stops execution of services
>>> on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
>>> However the thread loop for slave lcores (eal_thread_loop) distincts these
>>> roles to set lcore state after processing delegated function.
>>> It sets WAIT state for ROLE_SERVICE, but FINISHED for ROLE_RTE.
>>> So changing the role to RTE before stopping work in slave lcores
>>> causes lcores to end in FINISHED state. That is why the rte_eal_lcore_wait
>>> must be run after rte_service_lcore_reset_all to bring back lcores to
>>> launchable (WAIT) state.
>>> This has been fixed in test app and clarified in API documentation.
>>>
>>> Setting the state to WAIT in rte_service_runner_func is premature
>>> as the rte_service_runner_func function is still a part of the lcore
>>> function delegated to slave lcore. The state is overwritten anyway in
>>> slave lcore thread loop. This premature setting state to WAIT might
>>> however cause rte_eal_lcore_wait, that was called by the application,
>>> to return before slave lcore thread set the FINISHED state. That's
>>> why it is removed from librte_eal rte_service_runner_func function.
> Thanks for the explanation and fix.
>
>>> Bugzilla ID: 464
>>> Fixes: 21698354c832 ("service: introduce service cores concept")
>>> Fixes: f038a81e1c56 ("service: add unit tests")
>>> Cc: stable@dpdk.org
>>>
> Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
>>> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
>> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
> Applied, thanks Lukasz.
>
Great, thank you
  

Patch

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 981e21213..6764a5d2c 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -114,6 +114,7 @@  unregister_all(void)
 	}
 
 	rte_service_lcore_reset_all();
+	rte_eal_mp_wait_lcore();
 
 	return TEST_SUCCESS;
 }
diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 6123a2124..b16698f21 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -475,8 +475,6 @@  service_runner_func(void *arg)
 		cs->loops++;
 	}
 
-	lcore_config[lcore].state = WAIT;
-
 	return 0;
 }
 
diff --git a/lib/librte_eal/include/rte_service.h b/lib/librte_eal/include/rte_service.h
index 3a1c735c5..e2d0a6dd3 100644
--- a/lib/librte_eal/include/rte_service.h
+++ b/lib/librte_eal/include/rte_service.h
@@ -304,6 +304,10 @@  int32_t rte_service_lcore_count(void);
  * from duty, just unmaps all services / cores, and stops() the service cores.
  * The runstate of services is not modified.
  *
+ * The cores that are stopped with this call are in the FINISHED state, and
+ * the application must take care of bringing them back to a launchable state,
+ * e.g. by calling *rte_eal_wait_lcore* on the lcore_id.
+ *
  * @retval 0 Success
  */
 int32_t rte_service_lcore_reset_all(void);