[v5] event/cnxk: use WFE LDP loop for getwork routine

Message ID 20240227081153.20826-1-pbhagavatula@marvell.com (mailing list archive)
State Accepted, archived
Delegated to: Jerin Jacob
Series [v5] event/cnxk: use WFE LDP loop for getwork routine

Checks

Context                        Check     Description
ci/checkpatch                  warning   coding style issues
ci/loongarch-compilation       success   Compilation OK
ci/loongarch-unit-testing      success   Unit Testing PASS
ci/Intel-compilation           success   Compilation OK
ci/intel-Testing               success   Testing PASS
ci/github-robot: build         success   github build: passed
ci/intel-Functional            success   Functional PASS
ci/iol-mellanox-Performance    success   Performance Testing PASS
ci/iol-abi-testing             success   Testing PASS
ci/iol-unit-amd64-testing      success   Testing PASS
ci/iol-unit-arm64-testing      success   Testing PASS
ci/iol-compile-amd64-testing   success   Testing PASS
ci/iol-compile-arm64-testing   success   Testing PASS
ci/iol-sample-apps-testing     success   Testing PASS
ci/iol-intel-Performance       success   Performance Testing PASS
ci/iol-broadcom-Performance    success   Performance Testing PASS
ci/iol-intel-Functional        success   Functional Testing PASS
ci/iol-broadcom-Functional     success   Functional Testing PASS

Commit Message

Pavan Nikhilesh Bhagavatula Feb. 27, 2024, 8:11 a.m. UTC
  From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use a WFE/LDP loop while polling for GETWORK completion to save
power.
The feature is disabled by default and can be enabled by configuring
meson with 'RTE_ARM_USE_WFE' enabled.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 v4 Changes:
 - Split patches
 v5 Changes:
 - Update release notes and documentation.

 doc/guides/eventdevs/cnxk.rst          |  9 +++++
 doc/guides/rel_notes/release_24_03.rst |  4 ++
 drivers/event/cnxk/cn10k_worker.h      | 52 +++++++++++++++++++++-----
 3 files changed, 56 insertions(+), 9 deletions(-)

--
2.25.1
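
Note for readers unfamiliar with the pattern: the WFE path in this patch issues the GETWORK request with a plain register write, then repeatedly loads a 64-bit pair and tests a "pending" bit in the first word, waiting between loads. The following is only a portable sketch of that control flow, not the driver code. The `sim_gws` structure, the `sim_*` helpers, `cpu_wait_hint()` and the bit position used for `PEND_GET_WORK_BIT` are all hypothetical stand-ins; the real driver reads the hardware through `ldp` and waits with `sevl`/`wfe` in inline assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. The driver uses
 * SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT from its own headers.
 */
#define PEND_GET_WORK_BIT 63
#define PEND_GET_WORK_MASK (1ULL << PEND_GET_WORK_BIT)

/* Hypothetical software model of the tag/WQP register pair. */
struct sim_gws {
	uint64_t tag;            /* word 0, carries the pending bit */
	uint64_t wqp;            /* word 1 */
	unsigned int reads_left; /* loads before the result becomes visible */
	uint64_t final_tag;
	uint64_t final_wqp;
};

/* Models the request write: the pending bit is set until "hardware" is done. */
static void
sim_issue_get_work(struct sim_gws *g, unsigned int latency, uint64_t tag,
		   uint64_t wqp)
{
	assert((tag & PEND_GET_WORK_MASK) == 0);
	g->tag = PEND_GET_WORK_MASK;
	g->wqp = 0;
	g->reads_left = latency;
	g->final_tag = tag;
	g->final_wqp = wqp;
}

/* Models the paired load (ldp in the patch). */
static void
sim_load_pair(struct sim_gws *g, uint64_t out[2])
{
	if (g->reads_left > 0)
		g->reads_left--;
	if (g->reads_left == 0) {
		g->tag = g->final_tag;
		g->wqp = g->final_wqp;
	}
	out[0] = g->tag;
	out[1] = g->wqp;
}

/* Placeholder for the wait step; the patch uses sevl/wfe here. */
static inline void
cpu_wait_hint(void)
{
}

/* Load once, then wait and reload while the pending bit is still set.
 * Returns the number of wait steps taken.
 */
static unsigned int
poll_get_work(struct sim_gws *g, uint64_t out[2])
{
	unsigned int waits = 0;

	sim_load_pair(g, out);
	while (out[0] & PEND_GET_WORK_MASK) {
		cpu_wait_hint();
		waits++;
		sim_load_pair(g, out);
	}
	return waits;
}
```

The initial load before the loop mirrors the early `tbz ... done` exit in the patch: if the result is already available, no wait step is taken at all.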
  

Comments

Jerin Jacob March 1, 2024, 12:12 p.m. UTC | #1
On Tue, Feb 27, 2024 at 1:42 PM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Use a WFE/LDP loop while polling for GETWORK completion to save
> power.
> The feature is disabled by default and can be enabled by configuring
> meson with 'RTE_ARM_USE_WFE' enabled.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

1) Changed the subject to "event/cnxk: support power-saving during dequeue operation"
2) Applied the following diff
for-main]dell[dpdk-next-eventdev] $ git diff
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 7e68b697c2..080815c000 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -140,8 +140,7 @@ New Features

 * **Updated Marvell cnxk eventdev driver.**

-  * Added ARM WFE instruction in ``GETWORK(rte_event_dev_dequeue)`` routine
-    to save power while waiting for SSO to schedule work.
+  * Added power-saving functionality during polling within the ``rte_event_dequeue_burst()`` API.


Applied to dpdk-next-eventdev/for-main. Thanks
  

Patch

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..49ba11c902 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@  Runtime Config Options

     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0

+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE to save power when polling for
+transactions on the SSO bus, i.e., in the event dequeue call an ARM core can
+enter WFE and exit when either work has been scheduled or the dequeue timeout
+has been reached. This feature can be selected by configuring meson with
+``RTE_ARM_USE_WFE`` enabled.
+
 Debugging Options
 -----------------

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 879bb4944c..7e68b697c2 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -138,6 +138,10 @@  New Features
     to support TLS v1.2, TLS v1.3 and DTLS v1.2.
   * Added PMD API to allow raw submission of instructions to CPT.

+* **Updated Marvell cnxk eventdev driver.**
+
+  * Added ARM WFE instruction in ``GETWORK(rte_event_dev_dequeue)`` routine
+    to save power while waiting for SSO to schedule work.

 Removed Items
 -------------
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@  cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,

 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbz %[x0], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbnz %[x0], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbz %[wdata], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbnz %[wdata], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);