[v2,3/3] net/vhost: fix interrupt mode

Message ID 20200728165021.216291-4-maxime.coquelin@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Maxime Coquelin
Headers
Series Fix Vhost regressions |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/travis-robot success Travis build: passed
ci/Intel-compilation success Compilation OK

Commit Message

Maxime Coquelin July 28, 2020, 4:50 p.m. UTC
  At .new_device() time, only the first vring pair is
now ready, other vrings are consfigured later.

Problem is that when application will setup and enable
interrupts, only the first queue pair Rx interrupt will
be enabled.

This patches fixes the issue by setting the number of
max interrupts to the number of Rx queues that will be
later initialized. Then, as soon as a Rx vring is ready,
it removes the corresponding uninitialized epoll event,
and install a new one with the valid FD.

Fixes: 604052ae5395 ("net/vhost: support queue update")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 39 ++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 6 deletions(-)
  

Comments

Matan Azrad July 28, 2020, 7:03 p.m. UTC | #1
From: Maxime Coquelin:
> At .new_device() time, only the first vring pair is now ready, other vrings are
> consfigured later.
> 
> Problem is that when application will setup and enable interrupts, only the
> first queue pair Rx interrupt will be enabled.
> 
> This patches fixes the issue by setting the number of max interrupts to the
> number of Rx queues that will be later initialized. Then, as soon as a Rx vring
> is ready, it removes the corresponding uninitialized epoll event, and install a
> new one with the valid FD.

Doesn't it race condition to the application decision?
App may change the configuration per queue in any time by the app control thread.
The vhost PMD may change it usynchronically from the vhost control thread in the vring state callback.

I already mentioned it in other thread on this topic but didn't get reply.

> Fixes: 604052ae5395 ("net/vhost: support queue update")
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
  
Maxime Coquelin July 29, 2020, 7:52 a.m. UTC | #2
On 7/28/20 9:03 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin:
>> At .new_device() time, only the first vring pair is now ready, other vrings are
>> consfigured later.
>>
>> Problem is that when application will setup and enable interrupts, only the
>> first queue pair Rx interrupt will be enabled.
>>
>> This patches fixes the issue by setting the number of max interrupts to the
>> number of Rx queues that will be later initialized. Then, as soon as a Rx vring
>> is ready, it removes the corresponding uninitialized epoll event, and install a
>> new one with the valid FD.
> 
> Doesn't it race condition to the application decision?
> App may change the configuration per queue in any time by the app control thread.
> The vhost PMD may change it usynchronically from the vhost control thread in the vring state callback.

Yes you are right there could be a race here,I'm looking into getting it
done in a safe way. Yet it is good to get the confirmation from Intel
that it does fix the problem on their side.

Based on David suggestion, it might be made safe by relying on
eth_rxq_intr_enable()/eth_rxq_intr_disable().

If we cannot solve it in a safe way, then we'll have no other choice
than reverting partially your patch.

Maxime

> I already mentioned it in other thread on this topic but didn't get reply.
> 
>> Fixes: 604052ae5395 ("net/vhost: support queue update")
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>
  
Matan Azrad July 29, 2020, 8:43 a.m. UTC | #3
Hi Maxime

From: Maxime Coquelin:
> On 7/28/20 9:03 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> At .new_device() time, only the first vring pair is now ready, other
> >> vrings are consfigured later.
> >>
> >> Problem is that when application will setup and enable interrupts,
> >> only the first queue pair Rx interrupt will be enabled.
> >>
> >> This patches fixes the issue by setting the number of max interrupts
> >> to the number of Rx queues that will be later initialized. Then, as
> >> soon as a Rx vring is ready, it removes the corresponding
> >> uninitialized epoll event, and install a new one with the valid FD.
> >
> > Doesn't it race condition to the application decision?
> > App may change the configuration per queue in any time by the app
> control thread.
> > The vhost PMD may change it usynchronically from the vhost control
> thread in the vring state callback.
> 
> Yes you are right there could be a race here,I'm looking into getting it done in
> a safe way. Yet it is good to get the confirmation from Intel that it does fix the
> problem on their side.

Intel case\l3fw-power is only one case, we can put out the fire now by solving specific case but we need a fix for the global usage.

> Based on David suggestion, it might be made safe by relying on
> eth_rxq_intr_enable()/eth_rxq_intr_disable().

Yes, maybe.

One more option to adjust the vhost PMD is to do the new_device callback logic when the last queue is ready as was done before,
By this way, the vhost PMD will use the same timing as before.

> If we cannot solve it in a safe way, then we'll have no other choice than
> reverting partially your patch.

I'm sure we can find a solution.

As you probably remember we did the design for the readiness series together (a long discussion in other thread),
It came to solve an architecture issue in QEMU last versions which might affect vhost lib users in other cases.
We took it into account that some vhost lib users should do adjustment for the new behavior.


Matan 

> 
> Maxime
> 
> > I already mentioned it in other thread on this topic but didn't get reply.
> >
> >> Fixes: 604052ae5395 ("net/vhost: support queue update")
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >
  
Maxime Coquelin July 29, 2020, 9:10 a.m. UTC | #4
On 7/29/20 10:43 AM, Matan Azrad wrote:
> Hi Maxime
> 
> From: Maxime Coquelin:
>> On 7/28/20 9:03 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> At .new_device() time, only the first vring pair is now ready, other
>>>> vrings are consfigured later.
>>>>
>>>> Problem is that when application will setup and enable interrupts,
>>>> only the first queue pair Rx interrupt will be enabled.
>>>>
>>>> This patches fixes the issue by setting the number of max interrupts
>>>> to the number of Rx queues that will be later initialized. Then, as
>>>> soon as a Rx vring is ready, it removes the corresponding
>>>> uninitialized epoll event, and install a new one with the valid FD.
>>>
>>> Doesn't it race condition to the application decision?
>>> App may change the configuration per queue in any time by the app
>> control thread.
>>> The vhost PMD may change it usynchronically from the vhost control
>> thread in the vring state callback.
>>
>> Yes you are right there could be a race here,I'm looking into getting it done in
>> a safe way. Yet it is good to get the confirmation from Intel that it does fix the
>> problem on their side.
> 
> Intel case\l3fw-power is only one case, we can put out the fire now by solving specific case but we need a fix for the global usage.
> 
>> Based on David suggestion, it might be made safe by relying on
>> eth_rxq_intr_enable()/eth_rxq_intr_disable().
> 
> Yes, maybe.
> 
> One more option to adjust the vhost PMD is to do the new_device callback logic when the last queue is ready as was done before,
> By this way, the vhost PMD will use the same timing as before.

Yes, that's one option but that would not be ideal because we would
lose the benefits of what the initial series brings, and it is not a
light change that late in the release.

Trying to fix interrupts, we mainly risk to break interrupt support
which is not widely used. Trying to wait for all queue event to be
received may cause other regressions in all use-cases.

>> If we cannot solve it in a safe way, then we'll have no other choice than
>> reverting partially your patch.
> 
> I'm sure we can find a solution.
> 
> As you probably remember we did the design for the readiness series together (a long discussion in other thread),
> It came to solve an architecture issue in QEMU last versions which might affect vhost lib users in other cases.
> We took it into account that some vhost lib users should do adjustment for the new behavior.
> 
> 
> Matan 
> 
>>
>> Maxime
>>
>>> I already mentioned it in other thread on this topic but didn't get reply.
>>>
>>>> Fixes: 604052ae5395 ("net/vhost: support queue update")
>>>>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>
>
  

Patch

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 951929c663..0f6a0bbbf7 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -5,6 +5,7 @@ 
 #include <unistd.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <sys/epoll.h>
 
 #include <rte_mbuf.h>
 #include <rte_ethdev_driver.h>
@@ -593,7 +594,6 @@  eth_vhost_install_intr(struct rte_eth_dev *dev)
 {
 	struct rte_vhost_vring vring;
 	struct vhost_queue *vq;
-	int count = 0;
 	int nb_rxq = dev->data->nb_rx_queues;
 	int i;
 	int ret;
@@ -623,6 +623,8 @@  eth_vhost_install_intr(struct rte_eth_dev *dev)
 
 	VHOST_LOG(INFO, "Prepare intr vec\n");
 	for (i = 0; i < nb_rxq; i++) {
+		dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + i;
+		dev->intr_handle->efds[i] = -1;
 		vq = dev->data->rx_queues[i];
 		if (!vq) {
 			VHOST_LOG(INFO, "rxq-%d not setup yet, skip!\n", i);
@@ -641,14 +643,12 @@  eth_vhost_install_intr(struct rte_eth_dev *dev)
 				"rxq-%d's kickfd is invalid, skip!\n", i);
 			continue;
 		}
-		dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + i;
 		dev->intr_handle->efds[i] = vring.kickfd;
-		count++;
 		VHOST_LOG(INFO, "Installed intr vec for rxq-%d\n", i);
 	}
 
-	dev->intr_handle->nb_efd = count;
-	dev->intr_handle->max_intr = count + 1;
+	dev->intr_handle->nb_efd = nb_rxq;
+	dev->intr_handle->max_intr = nb_rxq + 1;
 	dev->intr_handle->type = RTE_INTR_HANDLE_VDEV;
 
 	return 0;
@@ -836,8 +836,11 @@  vring_conf_update(int vid, struct rte_eth_dev *eth_dev, uint16_t vring_id)
 	struct rte_eth_conf *dev_conf = &eth_dev->data->dev_conf;
 	struct pmd_internal *internal = eth_dev->data->dev_private;
 	struct rte_vhost_vring vring;
+	struct rte_intr_handle *handle;
+	struct rte_epoll_event rev;
 	int rx_idx = vring_id % 2 ? (vring_id - 1) >> 1 : -1;
 	int ret = 0;
+	int epfd;
 
 	/*
 	 * The vring kickfd may be changed after the new device notification.
@@ -857,7 +860,31 @@  vring_conf_update(int vid, struct rte_eth_dev *eth_dev, uint16_t vring_id)
 		if (vring.kickfd != eth_dev->intr_handle->efds[rx_idx]) {
 			VHOST_LOG(INFO, "kickfd for rxq-%d was changed.\n",
 					  rx_idx);
-			eth_dev->intr_handle->efds[rx_idx] = vring.kickfd;
+
+			/*
+			 * First remove invalid epoll event, and then isntall
+			 * the new one. May be solved with a proper API in the
+			 * future.
+			 */
+			handle = eth_dev->intr_handle;
+			handle->efds[rx_idx] = vring.kickfd;
+			epfd = handle->elist[rx_idx].epfd;
+			rev = handle->elist[rx_idx];
+			ret = rte_epoll_ctl(epfd, EPOLL_CTL_DEL, rev.fd,
+					&handle->elist[rx_idx]);
+			if (ret) {
+				VHOST_LOG(ERR, "Delete epoll event failed.\n");
+				return ret;
+			}
+
+			rev.fd = vring.kickfd;
+			handle->elist[rx_idx] = rev;
+			ret = rte_epoll_ctl(epfd, EPOLL_CTL_ADD, rev.fd,
+					&handle->elist[rx_idx]);
+			if (ret) {
+				VHOST_LOG(ERR, "Add epoll event failed.\n");
+				return ret;
+			}
 		}
 	}