drivers/net/mlx5: fix mlx5 send packet failed

Message ID 1742899140-2013-1-git-send-email-liuwenbo109@gmail.com (mailing list archive)
State New
Delegated to: Raslan Darawsheh
Series drivers/net/mlx5: fix mlx5 send packet failed

Checks

Context                         Check    Description
ci/checkpatch                   warning  coding style issues
ci/loongarch-compilation        success  Compilation OK
ci/loongarch-unit-testing       success  Unit Testing PASS
ci/iol-mellanox-Functional      success  Functional Testing PASS
ci/iol-marvell-Functional       success  Functional Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-unit-arm64-testing       warning  Testing issues
ci/iol-abi-testing              warning  Testing issues
ci/iol-sample-apps-testing      warning  Testing issues
ci/iol-compile-amd64-testing    fail     Testing issues
ci/iol-compile-arm64-testing    fail     Testing issues
ci/github-robot: build          fail     github build: failed
ci/Intel-compilation            fail     Compilation issues
ci/intel-Testing                success  Testing PASS
ci/intel-Functional             success  Functional PASS
ci/iol-unit-amd64-testing       fail     Testing issues
ci/iol-intel-Functional         success  Functional Testing PASS

Commit Message

Wenbo Liu March 25, 2025, 10:39 a.m. UTC
Test Environment: ARM architecture, OpenEuler operating system
CPU: HUAWEI Kunpeng 920 5220, BIOS Vendor ID: HiSilicon
Network Card: Mellanox Technologies MT27800 Family [ConnectX-5]
DPDK program sending self-encapsulated packets with MAC, IP, and UDP headers
continuously prints the following errors and ceases packet transmission

mlx5_common: Failed to modify SQ using DevX
mlx5_net: Cannot change the Tx SQ state to RESET Remote I/O error

Signed-off-by: Wenbo Liu <liuwenbo109@gmail.com>
---
 drivers/net/mlx5/mlx5_tx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
  

Comments

Stephen Hemminger March 25, 2025, 2:10 p.m. UTC | #1
On Tue, 25 Mar 2025 18:39:00 +0800
Wenbo Liu <liuwenbo109@gmail.com> wrote:

> Test Environment: ARM architecture, OpenEuler operating system
> CPU: HUAWEI Kunpeng 920 5220, BIOS Vendor ID: HiSilicon
> Network Card: Mellanox Technologies MT27800 Family [ConnectX-5]
> DPDK program sending self-encapsulated packets with MAC, IP, and UDP headers
> continuously prints the following errors and ceases packet transmission
> 
> mlx5_common: Failed to modify SQ using DevX
> mlx5_net: Cannot change the Tx SQ state to RESET Remote I/O error
> 
> Signed-off-by: Wenbo Liu <liuwenbo109@gmail.com>

The patch has compile failures in CI and a coding style (indentation) issue reported by checkpatch.
  
Dariusz Sosnowski June 3, 2025, 2:49 p.m. UTC | #2
Hi,

Adding all mlx5 maintainers.

On Tue, Mar 25, 2025 at 06:39:00PM +0800, Wenbo Liu wrote:
> Test Environment: ARM architecture, OpenEuler operating system
> CPU: HUAWEI Kunpeng 920 5220, BIOS Vendor ID: HiSilicon
> Network Card: Mellanox Technologies MT27800 Family [ConnectX-5]
> DPDK program sending self-encapsulated packets with MAC, IP, and UDP headers
> continuously prints the following errors and ceases packet transmission
> 
> mlx5_common: Failed to modify SQ using DevX
> mlx5_net: Cannot change the Tx SQ state to RESET Remote I/O error
> 
> Signed-off-by: Wenbo Liu <liuwenbo109@gmail.com>
> ---
>  drivers/net/mlx5/mlx5_tx.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
> index 4286876..5cf9873 100644
> --- a/drivers/net/mlx5/mlx5_tx.c
> +++ b/drivers/net/mlx5/mlx5_tx.c
> @@ -186,6 +186,7 @@
>  	volatile struct mlx5_cqe *last_cqe = NULL;
>  	bool ring_doorbell = false;
>  	int ret;
> +	int offset = 0;
>  
>  	do {
>  		volatile struct mlx5_cqe *cqe;
> @@ -205,8 +206,11 @@
>  			 * here, before we might perform SQ reset.
>  			 */
>  			rte_wmb();
> +#if (RTE_CACHE_LINE_SIZE == 128)
> +                        offset = 64;
> +#endif
>  			ret = mlx5_tx_error_cqe_handle
> -				(txq, (volatile struct mlx5_error_cqe *)cqe);
> +				(txq, (volatile struct mlx5_err_cqe *)(((char *)cqe) + offset));

Could you please elaborate on what exactly this changes?
I think I'm missing something.
struct mlx5_error_cqe, used here originally, already takes into account
the possibility of the cache line being 128B
(https://github.com/DPDK/dpdk/blob/main/drivers/common/mlx5/mlx5_prm.h#L427),
which achieves the equivalent of the above diff.

>  			if (unlikely(ret < 0)) {
>  				/*
>  				 * Some error occurred on queue error
> -- 
> 1.8.3.1
> 

Best regards,
Dariusz Sosnowski
  

Patch

diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 4286876..5cf9873 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -186,6 +186,7 @@ 
 	volatile struct mlx5_cqe *last_cqe = NULL;
 	bool ring_doorbell = false;
 	int ret;
+	int offset = 0;
 
 	do {
 		volatile struct mlx5_cqe *cqe;
@@ -205,8 +206,11 @@ 
 			 * here, before we might perform SQ reset.
 			 */
 			rte_wmb();
+#if (RTE_CACHE_LINE_SIZE == 128)
+                        offset = 64;
+#endif
 			ret = mlx5_tx_error_cqe_handle
-				(txq, (volatile struct mlx5_error_cqe *)cqe);
+				(txq, (volatile struct mlx5_err_cqe *)(((char *)cqe) + offset));
 			if (unlikely(ret < 0)) {
 				/*
 				 * Some error occurred on queue error