[v2,3/3] net/vhost: fix interrupt mode
Checks
Commit Message
At .new_device() time, only the first vring pair is
now ready, other vrings are consfigured later.
Problem is that when application will setup and enable
interrupts, only the first queue pair Rx interrupt will
be enabled.
This patches fixes the issue by setting the number of
max interrupts to the number of Rx queues that will be
later initialized. Then, as soon as a Rx vring is ready,
it removes the corresponding uninitialized epoll event,
and install a new one with the valid FD.
Fixes: 604052ae5395 ("net/vhost: support queue update")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
drivers/net/vhost/rte_eth_vhost.c | 39 ++++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 6 deletions(-)
Comments
From: Maxime Coquelin:
> At .new_device() time, only the first vring pair is now ready, other vrings are
> consfigured later.
>
> Problem is that when application will setup and enable interrupts, only the
> first queue pair Rx interrupt will be enabled.
>
> This patches fixes the issue by setting the number of max interrupts to the
> number of Rx queues that will be later initialized. Then, as soon as a Rx vring
> is ready, it removes the corresponding uninitialized epoll event, and install a
> new one with the valid FD.
Doesn't it race condition to the application decision?
App may change the configuration per queue in any time by the app control thread.
The vhost PMD may change it usynchronically from the vhost control thread in the vring state callback.
I already mentioned it in other thread on this topic but didn't get reply.
> Fixes: 604052ae5395 ("net/vhost: support queue update")
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
On 7/28/20 9:03 PM, Matan Azrad wrote:
>
>
> From: Maxime Coquelin:
>> At .new_device() time, only the first vring pair is now ready, other vrings are
>> consfigured later.
>>
>> Problem is that when application will setup and enable interrupts, only the
>> first queue pair Rx interrupt will be enabled.
>>
>> This patches fixes the issue by setting the number of max interrupts to the
>> number of Rx queues that will be later initialized. Then, as soon as a Rx vring
>> is ready, it removes the corresponding uninitialized epoll event, and install a
>> new one with the valid FD.
>
> Doesn't it race condition to the application decision?
> App may change the configuration per queue in any time by the app control thread.
> The vhost PMD may change it usynchronically from the vhost control thread in the vring state callback.
Yes you are right there could be a race here,I'm looking into getting it
done in a safe way. Yet it is good to get the confirmation from Intel
that it does fix the problem on their side.
Based on David suggestion, it might be made safe by relying on
eth_rxq_intr_enable()/eth_rxq_intr_disable().
If we cannot solve it in a safe way, then we'll have no other choice
than reverting partially your patch.
Maxime
> I already mentioned it in other thread on this topic but didn't get reply.
>
>> Fixes: 604052ae5395 ("net/vhost: support queue update")
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>
Hi Maxime
From: Maxime Coquelin:
> On 7/28/20 9:03 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> At .new_device() time, only the first vring pair is now ready, other
> >> vrings are consfigured later.
> >>
> >> Problem is that when application will setup and enable interrupts,
> >> only the first queue pair Rx interrupt will be enabled.
> >>
> >> This patches fixes the issue by setting the number of max interrupts
> >> to the number of Rx queues that will be later initialized. Then, as
> >> soon as a Rx vring is ready, it removes the corresponding
> >> uninitialized epoll event, and install a new one with the valid FD.
> >
> > Doesn't it race condition to the application decision?
> > App may change the configuration per queue in any time by the app
> control thread.
> > The vhost PMD may change it usynchronically from the vhost control
> thread in the vring state callback.
>
> Yes you are right there could be a race here,I'm looking into getting it done in
> a safe way. Yet it is good to get the confirmation from Intel that it does fix the
> problem on their side.
Intel case\l3fw-power is only one case, we can put out the fire now by solving specific case but we need a fix for the global usage.
> Based on David suggestion, it might be made safe by relying on
> eth_rxq_intr_enable()/eth_rxq_intr_disable().
Yes, maybe.
One more option to adjust the vhost PMD is to do the new_device callback logic when the last queue is ready as was done before,
By this way, the vhost PMD will use the same timing as before.
> If we cannot solve it in a safe way, then we'll have no other choice than
> reverting partially your patch.
I'm sure we can find a solution.
As you probably remember we did the design for the readiness series together (a long discussion in other thread),
It came to solve an architecture issue in QEMU last versions which might affect vhost lib users in other cases.
We took it into account that some vhost lib users should do adjustment for the new behavior.
Matan
>
> Maxime
>
> > I already mentioned it in other thread on this topic but didn't get reply.
> >
> >> Fixes: 604052ae5395 ("net/vhost: support queue update")
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >
On 7/29/20 10:43 AM, Matan Azrad wrote:
> Hi Maxime
>
> From: Maxime Coquelin:
>> On 7/28/20 9:03 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> At .new_device() time, only the first vring pair is now ready, other
>>>> vrings are consfigured later.
>>>>
>>>> Problem is that when application will setup and enable interrupts,
>>>> only the first queue pair Rx interrupt will be enabled.
>>>>
>>>> This patches fixes the issue by setting the number of max interrupts
>>>> to the number of Rx queues that will be later initialized. Then, as
>>>> soon as a Rx vring is ready, it removes the corresponding
>>>> uninitialized epoll event, and install a new one with the valid FD.
>>>
>>> Doesn't it race condition to the application decision?
>>> App may change the configuration per queue in any time by the app
>> control thread.
>>> The vhost PMD may change it usynchronically from the vhost control
>> thread in the vring state callback.
>>
>> Yes you are right there could be a race here,I'm looking into getting it done in
>> a safe way. Yet it is good to get the confirmation from Intel that it does fix the
>> problem on their side.
>
> Intel case\l3fw-power is only one case, we can put out the fire now by solving specific case but we need a fix for the global usage.
>
>> Based on David suggestion, it might be made safe by relying on
>> eth_rxq_intr_enable()/eth_rxq_intr_disable().
>
> Yes, maybe.
>
> One more option to adjust the vhost PMD is to do the new_device callback logic when the last queue is ready as was done before,
> By this way, the vhost PMD will use the same timing as before.
Yes, that's one option but that would not be ideal because we would
lose the benefits of what the initial series brings, and it is not a
light change that late in the release.
Trying to fix interrupts, we mainly risk to break interrupt support
which is not widely used. Trying to wait for all queue event to be
received may cause other regressions in all use-cases.
>> If we cannot solve it in a safe way, then we'll have no other choice than
>> reverting partially your patch.
>
> I'm sure we can find a solution.
>
> As you probably remember we did the design for the readiness series together (a long discussion in other thread),
> It came to solve an architecture issue in QEMU last versions which might affect vhost lib users in other cases.
> We took it into account that some vhost lib users should do adjustment for the new behavior.
>
>
> Matan
>
>>
>> Maxime
>>
>>> I already mentioned it in other thread on this topic but didn't get reply.
>>>
>>>> Fixes: 604052ae5395 ("net/vhost: support queue update")
>>>>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>
>
@@ -5,6 +5,7 @@
#include <unistd.h>
#include <pthread.h>
#include <stdbool.h>
+#include <sys/epoll.h>
#include <rte_mbuf.h>
#include <rte_ethdev_driver.h>
@@ -593,7 +594,6 @@ eth_vhost_install_intr(struct rte_eth_dev *dev)
{
struct rte_vhost_vring vring;
struct vhost_queue *vq;
- int count = 0;
int nb_rxq = dev->data->nb_rx_queues;
int i;
int ret;
@@ -623,6 +623,8 @@ eth_vhost_install_intr(struct rte_eth_dev *dev)
VHOST_LOG(INFO, "Prepare intr vec\n");
for (i = 0; i < nb_rxq; i++) {
+ dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + i;
+ dev->intr_handle->efds[i] = -1;
vq = dev->data->rx_queues[i];
if (!vq) {
VHOST_LOG(INFO, "rxq-%d not setup yet, skip!\n", i);
@@ -641,14 +643,12 @@ eth_vhost_install_intr(struct rte_eth_dev *dev)
"rxq-%d's kickfd is invalid, skip!\n", i);
continue;
}
- dev->intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + i;
dev->intr_handle->efds[i] = vring.kickfd;
- count++;
VHOST_LOG(INFO, "Installed intr vec for rxq-%d\n", i);
}
- dev->intr_handle->nb_efd = count;
- dev->intr_handle->max_intr = count + 1;
+ dev->intr_handle->nb_efd = nb_rxq;
+ dev->intr_handle->max_intr = nb_rxq + 1;
dev->intr_handle->type = RTE_INTR_HANDLE_VDEV;
return 0;
@@ -836,8 +836,11 @@ vring_conf_update(int vid, struct rte_eth_dev *eth_dev, uint16_t vring_id)
struct rte_eth_conf *dev_conf = ð_dev->data->dev_conf;
struct pmd_internal *internal = eth_dev->data->dev_private;
struct rte_vhost_vring vring;
+ struct rte_intr_handle *handle;
+ struct rte_epoll_event rev;
int rx_idx = vring_id % 2 ? (vring_id - 1) >> 1 : -1;
int ret = 0;
+ int epfd;
/*
* The vring kickfd may be changed after the new device notification.
@@ -857,7 +860,31 @@ vring_conf_update(int vid, struct rte_eth_dev *eth_dev, uint16_t vring_id)
if (vring.kickfd != eth_dev->intr_handle->efds[rx_idx]) {
VHOST_LOG(INFO, "kickfd for rxq-%d was changed.\n",
rx_idx);
- eth_dev->intr_handle->efds[rx_idx] = vring.kickfd;
+
+ /*
+ * First remove invalid epoll event, and then isntall
+ * the new one. May be solved with a proper API in the
+ * future.
+ */
+ handle = eth_dev->intr_handle;
+ handle->efds[rx_idx] = vring.kickfd;
+ epfd = handle->elist[rx_idx].epfd;
+ rev = handle->elist[rx_idx];
+ ret = rte_epoll_ctl(epfd, EPOLL_CTL_DEL, rev.fd,
+ &handle->elist[rx_idx]);
+ if (ret) {
+ VHOST_LOG(ERR, "Delete epoll event failed.\n");
+ return ret;
+ }
+
+ rev.fd = vring.kickfd;
+ handle->elist[rx_idx] = rev;
+ ret = rte_epoll_ctl(epfd, EPOLL_CTL_ADD, rev.fd,
+ &handle->elist[rx_idx]);
+ if (ret) {
+ VHOST_LOG(ERR, "Add epoll event failed.\n");
+ return ret;
+ }
}
}