net/mlx5: fix timestamp initialization on empty clock queue

Message ID 20210728142335.31324-1-viacheslavo@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Series: net/mlx5: fix timestamp initialization on empty clock queue

Checks

Context                   Check    Description
ci/checkpatch             success  coding style OK
ci/github-robot           success  github build: passed
ci/Intel-compilation      success  Compilation OK
ci/intel-Testing          success  Testing PASS
ci/iol-intel-Functional   success  Functional Testing PASS
ci/iol-intel-Performance  success  Performance Testing PASS
ci/iol-abi-testing        success  Testing PASS
ci/iol-testing            success  Testing PASS

Commit Message

Slava Ovsiienko July 28, 2021, 2:23 p.m. UTC
The clock queue may commit completions with a delay
after queue initialization is done, and the only Clock Queue
completion entry (CQE) might keep the invalid status until
the first CQE update happens.

The mlx5_txpp_update_timestamp() routine wrongly recognized
the invalid status as an error and reported lost synchronization.

The patch recognizes the invalid status as "not updated yet",
and the accurate scheduling initialization routine now waits
until the first CQE update happens.

Some collateral typos in comments are fixed as well.

Fixes: 77522be0a56d ("net/mlx5: introduce clock queue service routine")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_txpp.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
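
To make the new interpretation concrete, here is a minimal standalone
sketch (not the driver code): the opcode nibble of the single Clock
Queue CQE is read, an assumed "invalid" value is treated as "no
completion posted yet", and only other nonzero opcodes are treated as
lost synchronization. The names cqe_sim, OPCODE_INVALID and
first_update_done() are hypothetical stand-ins for the
MLX5_CQE_OPCODE()/MLX5_CQE_INVALID logic in the patch below.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OPCODE_INVALID 0xF /* assumed "CQE not written yet" opcode */

struct cqe_sim {
	uint8_t op_own; /* opcode in the high nibble, as in mlx5 CQEs */
};

/* Extract the opcode nibble, mirroring the MLX5_CQE_OPCODE() idea. */
static uint8_t cqe_opcode(const struct cqe_sim *cqe)
{
	return cqe->op_own >> 4;
}

/*
 * Old behavior: any nonzero opcode was treated as "sync lost".
 * New behavior: the invalid opcode means the hardware has not posted
 * the first completion yet, so initialization keeps waiting.
 */
static bool first_update_done(const struct cqe_sim *cqe)
{
	uint8_t opcode = cqe_opcode(cqe);

	if (opcode == 0)
		return true;  /* valid completion observed */
	if (opcode == OPCODE_INVALID)
		return false; /* not updated yet: keep polling */
	fprintf(stderr, "clock queue sync lost (%X)\n", opcode);
	return true;          /* actual error: stop waiting */
}

int main(void)
{
	struct cqe_sim cqe = { .op_own = OPCODE_INVALID << 4 };

	printf("updated: %d\n", first_update_done(&cqe)); /* 0: waiting */
	cqe.op_own = 0; /* simulate the first hardware update */
	printf("updated: %d\n", first_update_done(&cqe)); /* 1: done */
	return 0;
}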
  

Comments

Raslan Darawsheh July 29, 2021, 9:45 a.m. UTC | #1
Hi,
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Wednesday, July 28, 2021 5:24 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] net/mlx5: fix timestamp initialization on empty clock queue
> 
> The committing completions by clock queue might be delayed
> after queue initialization done and the only Clock Queue
Added the missing auxiliary verb here; it should be
"after queue initialization is done".

> completion entry (CQE) might keep the invalid status till
> the CQE first update happens.
> 
> The mlx5_txpp_update_timestamp() wrongly recognized invalid
> status as error and reported about lost synchronization.
> 
> The patch recognizes the invalid status as "not updated yet"
> and accurate scheduling initialization routine waits till
> CQE first update happens.
> 
> Some collateral typos in comment are fixed as well.
> 
> Fixes: 77522be0a56d ("net/mlx5: introduce clock queue service routine")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  drivers/net/mlx5/mlx5_txpp.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index d90399afb5..4f6da9f2d1 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -530,8 +530,8 @@  mlx5_atomic_read_cqe(rte_int128_t *from, rte_int128_t *ts)
 {
 	/*
 	 * The only CQE of Clock Queue is being continuously
-	 * update by hardware with soecified rate. We have to
-	 * read timestump and WQE completion index atomically.
+	 * updated by hardware with specified rate. We must
+	 * read timestamp and WQE completion index atomically.
 	 */
 #if defined(RTE_ARCH_X86_64)
 	rte_int128_t src;
@@ -592,13 +592,22 @@  mlx5_txpp_update_timestamp(struct mlx5_dev_ctx_shared *sh)
 	} to;
 	uint64_t ts;
 	uint16_t ci;
+	uint8_t opcode;
 
 	mlx5_atomic_read_cqe((rte_int128_t *)&cqe->timestamp, &to.u128);
-	if (to.cts.op_own >> 4) {
-		DRV_LOG(DEBUG, "Clock Queue error sync lost.");
-		__atomic_fetch_add(&sh->txpp.err_clock_queue,
+	opcode = MLX5_CQE_OPCODE(to.cts.op_own);
+	if (opcode) {
+		if (opcode != MLX5_CQE_INVALID) {
+			/*
+			 * Commit the error state if and only if
+			 * we have got at least one actual completion.
+			 */
+			DRV_LOG(DEBUG,
+				"Clock Queue error sync lost (%X).", opcode);
+			__atomic_fetch_add(&sh->txpp.err_clock_queue,
 				   1, __ATOMIC_RELAXED);
-		sh->txpp.sync_lost = 1;
+			sh->txpp.sync_lost = 1;
+		}
 		return;
 	}
 	ci = rte_be_to_cpu_16(to.cts.wqe_counter);
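
The first hunk's corrected comment states the underlying requirement:
hardware continuously rewrites the single Clock Queue CQE, so the
timestamp and the WQE completion index must be snapshotted in one
atomic 128-bit read, or the two fields could come from different
hardware updates. Below is a minimal standalone sketch of that idea;
the union cqe_snap layout is hypothetical, and the GCC/Clang 16-byte
atomic builtin stands in for the driver's rte_int128_t helpers.

/* Build with: gcc -O2 sketch.c -latomic */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef unsigned __int128 u128;

union cqe_snap {
	u128 raw;
	struct {
		uint64_t timestamp;   /* free-running device clock */
		uint16_t wqe_counter; /* WQE completion index */
	} f;
};

static void atomic_read_cqe(const union cqe_snap *from, union cqe_snap *to)
{
	/*
	 * A single 16-byte atomic load: both fields come from the same
	 * hardware update, never a torn mix of two updates. On x86-64
	 * this lowers to a cmpxchg16b-based sequence (possibly via
	 * libatomic), the same idea as the driver's mlx5_atomic_read_cqe().
	 */
	to->raw = __atomic_load_n(&from->raw, __ATOMIC_RELAXED);
}

int main(void)
{
	union cqe_snap cqe, snap;

	memset(&cqe, 0, sizeof(cqe));
	cqe.f.timestamp = 0x0123456789abcdefULL;
	cqe.f.wqe_counter = 42;
	atomic_read_cqe(&cqe, &snap);
	printf("ts=%" PRIx64 " ci=%u\n", snap.f.timestamp,
	       (unsigned int)snap.f.wqe_counter);
	return 0;
}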