vdpa/mlx5: fix notification timing

Message ID 1595858444-126652-1-git-send-email-matan@mellanox.com (mailing list archive)
State Accepted, archived
Delegated to: Maxime Coquelin
Series vdpa/mlx5: fix notification timing

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Matan Azrad July 27, 2020, 2 p.m. UTC
The issue is relevant only for the timer event modes: 0 and 1.

When the HW finishes consuming a burst of the guest Rx descriptors,
it writes a CQE to the CQ.
When traffic stops, the mlx5 driver arms the CQ to get a notification
when a specific CQE index is written - the index to arm is the next
CQE index the driver should poll.

The mlx5 driver also configured the kernel driver to send a
notification to the guest callfd at the same time as the armed CQE
event.
This means the guest was notified only for the first CQE of each poll
cycle, so when the driver polled the CQEs of all the available virtio
queue descriptors, the guest was not notified for the rest because no
new CQE arrived to trigger another guest notification.

Hence, the Rx queues could get stuck when the guest was not working
in poll mode.

Remove the prior kernel notification, and notify the guest manually
after CQ polling.

Fixes: a9dd7275a149 ("vdpa/mlx5: optimize notification events")

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)
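
To illustrate the fix, below is a minimal sketch of the corrected
flow: after polling the CQ, the driver kicks the guest through the
callfd eventfd itself instead of relying on the kernel's armed-CQE
event. The sketch_cq struct and cq_poll_sketch() helper are
hypothetical stand-ins for the driver's mlx5_vdpa_cq state and
mlx5_vdpa_cq_poll(); only eventfd_write() and the callfd != -1 check
appear in the patch itself.

#include <sys/eventfd.h>

/* Hypothetical stand-in for the driver's per-queue CQ state. */
struct sketch_cq {
	int callfd; /* guest notification eventfd, or -1 if none */
	/* ... CQ ring and doorbell state ... */
};

/* Hypothetical: consume ready CQEs, as mlx5_vdpa_cq_poll() does. */
void cq_poll_sketch(struct sketch_cq *cq);

static void
timer_tick_sketch(struct sketch_cq *cq)
{
	cq_poll_sketch(cq);
	/*
	 * Kick the guest for the consumed descriptors on every poll
	 * cycle; before the fix, this kick came only with the first,
	 * armed CQE of a cycle.
	 */
	if (cq->callfd != -1)
		eventfd_write(cq->callfd, (eventfd_t)1);
}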
  

Comments

Maxime Coquelin July 28, 2020, 1:40 p.m. UTC | #1
On 7/27/20 4:00 PM, Matan Azrad wrote:
> The issue is relevant only for the timer event modes: 0 and 1.
> 
> [...]
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Xueming Li <xuemingl@mellanox.com>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime
  
Maxime Coquelin July 28, 2020, 3:48 p.m. UTC | #2
On 7/27/20 4:00 PM, Matan Azrad wrote:
> The issue is relevant only for the timer event modes: 0 and 1.
> 
> [...]
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Xueming Li <xuemingl@mellanox.com>

Applied to dpdk-next-virtio/master

Thanks,
Maxime
  

Patch

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index e14b380..0414c91 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -172,17 +172,6 @@ 
 		rte_errno = errno;
 		goto error;
 	}
-	if (callfd != -1 &&
-	    priv->event_mode != MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT) {
-		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
-							      callfd,
-							      cq->cq->obj, 0);
-		if (ret) {
-			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
-			rte_errno = errno;
-			goto error;
-		}
-	}
 	cq->callfd = callfd;
 	/* Init CQ to ones to be in HW owner in the start. */
 	cq->cqes[0].op_own = MLX5_CQE_OWNER_MASK;
@@ -352,11 +341,11 @@ 
 						   struct mlx5_vdpa_virtq, eqp);
 
 		mlx5_vdpa_cq_poll(cq);
+		/* Notify guest for descs consuming. */
+		if (cq->callfd != -1)
+			eventfd_write(cq->callfd, (eventfd_t)1);
 		if (priv->event_mode == MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT) {
 			mlx5_vdpa_cq_arm(priv, cq);
-			/* Notify guest for descs consuming. */
-			if (cq->callfd != -1)
-				eventfd_write(cq->callfd, (eventfd_t)1);
 			return;
 		}
 		/* Don't arm again - timer will take control. */
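
For reference, the arming the commit message describes works roughly
as in the sketch below (illustrative only, reconstructed from the
description above; the struct, macro, and field names are assumptions
rather than the mlx5 definitions). The driver writes the index of the
next CQE it expects to poll into the CQ arm doorbell; the HW then
raises a single event once that CQE is written, which is why a guest
kick tied to that event fired at most once per poll cycle.

#include <stdint.h>

#define SKETCH_CQ_CI_MASK 0xffffffu /* hypothetical index mask */

/* Hypothetical stand-in for the armable part of the CQ state. */
struct sketch_arm_cq {
	uint32_t cq_ci;            /* next CQE index to be polled */
	volatile uint32_t *arm_db; /* CQ arm doorbell record */
};

static void
cq_arm_sketch(struct sketch_arm_cq *cq)
{
	/*
	 * Request one event for when the CQE at cq_ci is written.
	 * Arming is one-shot, so no further events arrive until the
	 * CQ is armed again.
	 */
	*cq->arm_db = cq->cq_ci & SKETCH_CQ_CI_MASK;
}

In the timer modes the driver intentionally does not re-arm here, as
the context line above notes; the timer thread keeps polling instead.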