[v3,2/2] service: fix potential stats race-condition on MT services

Message ID 20220711131825.3373195-2-harry.van.haaren@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series [v3,1/2] test/service: add perf measurements for with stats mode

Checks

Context | Check | Description
ci/checkpatch | warning | coding style issues
ci/Intel-compilation | success | Compilation OK
ci/iol-intel-Performance | success | Performance Testing PASS
ci/iol-intel-Functional | success | Functional Testing PASS
ci/iol-aarch64-compile-testing | success | Testing PASS
ci/iol-aarch64-unit-testing | success | Testing PASS
ci/iol-x86_64-unit-testing | success | Testing PASS
ci/iol-x86_64-compile-testing | success | Testing PASS
ci/github-robot: build | success | github build: passed
ci/iol-abi-testing | success | Testing PASS
ci/intel-Testing | success | Testing PASS

Commit Message

Van Haaren, Harry July 11, 2022, 1:18 p.m. UTC
  This commit fixes a potential racy add that could occur if
multiple service lcores were executing the same MT-safe service
at the same time, with service statistics collection enabled.

Because multiple threads can run and execute the service, the
stats values can have multiple writer threads, so atomic addition
is required for correctness.

Note that when an MT-unsafe service is executed, a spinlock is
held, so the stats increments are already protected. This fact is
used to avoid atomic add instructions when they are not required:
regular reads and increments are used, and only the store is
specified as atomic, reducing the perf impact on e.g. x86.

This patch causes a 1.25x increase in cycle cost for polling an
MT-safe service when statistics are enabled. No change was seen
for MT-unsafe services, or when statistics are disabled.
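
As a rough, standalone illustration of this scheme (the struct and
function names below are illustrative only, not the rte_service.c
code; the actual change is in the diff at the bottom of this page):

    #include <stdbool.h>
    #include <stdint.h>

    struct stats {
            /* Force natural alignment so plain 64-bit loads and stores
             * are single, untorn accesses on 32-bit as well as 64-bit
             * builds. */
            uint64_t calls __attribute__((aligned(8)));
            uint64_t cycles_spent __attribute__((aligned(8)));
    };

    static void
    stats_update(struct stats *st, uint64_t cycles, bool mt_safe)
    {
            if (mt_safe) {
                    /* Multiple writers possible: full atomic
                     * read-modify-write. */
                    __atomic_fetch_add(&st->cycles_spent, cycles,
                                    __ATOMIC_RELAXED);
                    __atomic_fetch_add(&st->calls, 1, __ATOMIC_RELAXED);
            } else {
                    /* Single writer (the service spinlock is held):
                     * plain read and add, with only the store made
                     * atomic so a concurrent reader never observes a
                     * torn value. */
                    __atomic_store_n(&st->cycles_spent,
                                    st->cycles_spent + cycles,
                                    __ATOMIC_RELAXED);
                    __atomic_store_n(&st->calls, st->calls + 1,
                                    __ATOMIC_RELAXED);
            }
    }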

Reported-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Suggested-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Suggested-by: Morten Brørup <mb@smartsharesystems.com>
Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---

v3:
- Fix the 32-bit build by forcing natural alignment of the uint64_t
  stats fields in the struct that contains them, using the
  __rte_aligned(8) macro.
- Note: I'm seeing a checkpatch "avoid externs in .c files" warning,
  but it doesn't make sense to me, so perhaps it's a false positive?

v2 (Thanks Honnappa, Morten, Bruce & Mattias for discussion):
- Improved handling of stat stores to ensure they're atomic by
  using __atomic_store_n() with regular loads/increments.
- Added BUILD_BUG_ON alignment checks for the uint64_t stats
  variables, tested with __rte_packed to ensure the build breaks
  (a standalone sketch of this check follows below).
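
For reference, a minimal standalone sketch of the alignment plus
build-time check mentioned above, using plain C11 equivalents of the
__rte_aligned()/RTE_BUILD_BUG_ON() macros (illustrative names only):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    struct stats_layout {
            uint32_t num_mapped_cores;
            /* Without the alignment attribute, a 32-bit ABI may give
             * this uint64_t only 4-byte alignment, so a plain 64-bit
             * load could tear. */
            uint64_t calls __attribute__((aligned(8)));
    };

    /* Build-time check that the field really is naturally aligned;
     * replacing the aligned attribute with a packed one makes this
     * fail to compile. */
    static_assert(offsetof(struct stats_layout, calls) % 8 == 0,
                  "calls must be 8-byte aligned for untorn plain loads");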
---
 lib/eal/common/rte_service.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)
  

Comments

David Marchand Oct. 5, 2022, 1:06 p.m. UTC | #1
On Mon, Jul 11, 2022 at 3:18 PM Harry van Haaren
<harry.van.haaren@intel.com> wrote:
>
> This commit fixes a potential racy add that could occur if
> multiple service lcores were executing the same MT-safe service
> at the same time, with service statistics collection enabled.
>
> Because multiple threads can run and execute the service, the
> stats values can have multiple writer threads, so atomic addition
> is required for correctness.
>
> Note that when an MT-unsafe service is executed, a spinlock is
> held, so the stats increments are already protected. This fact is
> used to avoid atomic add instructions when they are not required:
> regular reads and increments are used, and only the store is
> specified as atomic, reducing the perf impact on e.g. x86.
>
> This patch causes a 1.25x increase in cycle cost for polling an
> MT-safe service when statistics are enabled. No change was seen
> for MT-unsafe services, or when statistics are disabled.

Fixes: 21698354c832 ("service: introduce service cores concept")

I did not mark for backport since the commitlog indicates a performance impact.
You can still ask for backport by pinging LTS maintainers.

>
> Reported-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Suggested-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>


Series applied, thanks.
  

Patch

diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index d2b7275ac0..94cb056196 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -50,10 +50,17 @@  struct rte_service_spec_impl {
 	 * on currently.
 	 */
 	uint32_t num_mapped_cores;
-	uint64_t calls;
-	uint64_t cycles_spent;
+
+	/* 32-bit builds won't naturally align a uint64_t, so force alignment,
+	 * allowing regular reads to be atomic.
+	 */
+	uint64_t calls __rte_aligned(8);
+	uint64_t cycles_spent __rte_aligned(8);
 } __rte_cache_aligned;
 
+/* Mask used to ensure uint64_t 8 byte vars are naturally aligned. */
+#define RTE_SERVICE_STAT_ALIGN_MASK (8 - 1)
+
 /* the internal values of a service core */
 struct core_state {
 	/* map of services IDs are run on this core */
@@ -359,13 +366,29 @@  service_runner_do_callback(struct rte_service_spec_impl *s,
 {
 	void *userdata = s->spec.callback_userdata;
 
+	/* Ensure the atomically stored variables are naturally aligned,
+	 * as required for regular loads to be atomic.
+	 */
+	RTE_BUILD_BUG_ON((offsetof(struct rte_service_spec_impl, calls)
+		& RTE_SERVICE_STAT_ALIGN_MASK) != 0);
+	RTE_BUILD_BUG_ON((offsetof(struct rte_service_spec_impl, cycles_spent)
+		& RTE_SERVICE_STAT_ALIGN_MASK) != 0);
+
 	if (service_stats_enabled(s)) {
 		uint64_t start = rte_rdtsc();
 		s->spec.callback(userdata);
 		uint64_t end = rte_rdtsc();
-		s->cycles_spent += end - start;
+		uint64_t cycles = end - start;
 		cs->calls_per_service[service_idx]++;
-		s->calls++;
+		if (service_mt_safe(s)) {
+			__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
+			__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
+		} else {
+			uint64_t cycles_new = s->cycles_spent + cycles;
+			uint64_t calls_new = s->calls + 1;
+			__atomic_store_n(&s->cycles_spent, cycles_new, __ATOMIC_RELAXED);
+			__atomic_store_n(&s->calls, calls_new, __ATOMIC_RELAXED);
+		}
 	} else
 		s->spec.callback(userdata);
 }