[dpdk-dev] vhost: flush used->idx update before reading avail->flags

Message ID 1429720392-25345-1-git-send-email-huawei.xie@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Huawei Xie April 22, 2015, 4:33 p.m. UTC
  update of used->idx and read of avail->flags could be reordered.
memory fence should be used to ensure the order, otherwise guest could see a stale used->idx value after it toggles the interrupt suppression flag.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost_rxtx.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Luke Gorrie April 23, 2015, 1:57 p.m. UTC | #1
On 22 April 2015 at 18:33, Huawei Xie <huawei.xie@intel.com> wrote:

> update of used->idx and read of avail->flags could be reordered.
> memory fence should be used to ensure the order, otherwise guest could see
> a stale used->idx value after it toggles the interrupt suppression flag.
>

This patch looks right to me.
  
Linhaifeng April 24, 2015, 1:01 a.m. UTC | #2
On 2015/4/23 0:33, Huawei Xie wrote:
> update of used->idx and read of avail->flags could be reordered.
> memory fence should be used to ensure the order, otherwise guest could see a stale used->idx value after it toggles the interrupt suppression flag.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> ---
>  lib/librte_vhost/vhost_rxtx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 510ffe8..6afba35 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -178,6 +178,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  	*(volatile uint16_t *)&vq->used->idx += count;
>  	vq->last_used_idx = res_end_idx;
>  
> +	/* flush used->idx update before we read avail->flags. */
> +	rte_mb();
> +
>  	/* Kick the guest if necessary. */
>  	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>  		eventfd_write((int)vq->callfd, 1);
> 

If not add memory fence what would happen? Packets loss or interrupt loss?How to test it ?
  
Huawei Xie April 24, 2015, 2:40 a.m. UTC | #3
> -----Original Message-----
> From: Linhaifeng [mailto:haifeng.lin@huawei.com]
> Sent: Friday, April 24, 2015 9:01 AM
> To: Xie, Huawei; dev@dpdk.org
> Cc: luke@snabb.co; mst@redhat.com
> Subject: Re: [PATCH] vhost: flush used->idx update before reading avail-
> >flags
> 
> 
> 
> On 2015/4/23 0:33, Huawei Xie wrote:
> > update of used->idx and read of avail->flags could be reordered.
> > memory fence should be used to ensure the order, otherwise guest could
> see a stale used->idx value after it toggles the interrupt suppression flag.
> >
> > Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> > index 510ffe8..6afba35 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -178,6 +178,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
> >  	*(volatile uint16_t *)&vq->used->idx += count;
> >  	vq->last_used_idx = res_end_idx;
> >
> > +	/* flush used->idx update before we read avail->flags. */
> > +	rte_mb();
> > +
> >  	/* Kick the guest if necessary. */
> >  	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> >  		eventfd_write((int)vq->callfd, 1);
> >
> 
> If not add memory fence what would happen? Packets loss or interrupt
> loss?How to test it ?

If those two store and load are reordered,  guest toggles the interrupt suppression flag, then it checks that there is no more work to do.
Actually there is.
  
Luke Gorrie April 24, 2015, 7:27 a.m. UTC | #4
On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote:

> If not add memory fence what would happen? Packets loss or interrupt
> loss?How to test it ?
>

You should be able to test it like this:

1. Boot two Linux kernel (e.g. 3.13) guests.
2. Connect them via vhost switch.
3. Run continuous traffic between them (e.g. iperf).

I would expect that within a reasonable timeframe (< 1 hour) one of the
guests' network interfaces will hang indefinitely due to a missed interrupt.

You won't be able to reproduce this using DPDK guests because they are not
using the same interrupt suppression method.

This is a serious real-world problem. I wouldn't deploy the vhost
implementation without this fix.

Cheers,
-Luke
  
Linhaifeng June 9, 2015, 7:04 a.m. UTC | #5
On 2015/4/24 15:27, Luke Gorrie wrote:
> On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote:
> 
>> If not add memory fence what would happen? Packets loss or interrupt
>> loss?How to test it ?
>>
> 
> You should be able to test it like this:
> 
> 1. Boot two Linux kernel (e.g. 3.13) guests.
> 2. Connect them via vhost switch.
> 3. Run continuous traffic between them (e.g. iperf).
> 
> I would expect that within a reasonable timeframe (< 1 hour) one of the
> guests' network interfaces will hang indefinitely due to a missed interrupt.
> 
> You won't be able to reproduce this using DPDK guests because they are not
> using the same interrupt suppression method.
> 
> This is a serious real-world problem. I wouldn't deploy the vhost
> implementation without this fix.
> 
> Cheers,
> -Luke
> 

I think this patch can't resole this problem. On the other hand we still would miss interrupt.

After add rte_mb() function the we want the case is :
1.write used->idx. ring is full or empty.
2.virtio_net open interrupt.
3.read avail->flags.

but this case(miss interrupt) would happen too:
1.write used->idx. ring is full or empty.
2.read avail->flags.
3.virtio_net open interrupt.
  
Luke Gorrie June 9, 2015, 7:45 a.m. UTC | #6
On 9 June 2015 at 09:04, Linhaifeng <haifeng.lin@huawei.com> wrote:

> On 2015/4/24 15:27, Luke Gorrie wrote:
> > You should be able to test it like this:
> >
> > 1. Boot two Linux kernel (e.g. 3.13) guests.
> > 2. Connect them via vhost switch.
> > 3. Run continuous traffic between them (e.g. iperf).
> >
> > I would expect that within a reasonable timeframe (< 1 hour) one of the
> > guests' network interfaces will hang indefinitely due to a missed
> interrupt.
> >
> > You won't be able to reproduce this using DPDK guests because they are
> not
> > using the same interrupt suppression method.
>
> I think this patch can't resole this problem. On the other hand we still
> would miss interrupt.
>

For what it is worth, we were able to reproduce the problem as described
above with older Snabb Switch releases and we were also able to verify that
inserting a memory barrier fixes this problem.

This is the relevant commit in the snabbswitch repo for reference:
https://github.com/SnabbCo/snabbswitch/commit/c33cdd8704246887e11d7c353f773f7b488a47f2

In a nutshell, we added an MFENCE instruction after writing used->idx and
before checking VRING_F_NO_INTERRUPT.

I have not tested this case under DPDK myself and so I am not really
certain which memory barrier operations are sufficient/insufficient in that
context. I hope that our experience is relevant/helpful though and I am
happy to explain more about that if I have missed any important details.

Cheers,
-Luke
  
Michael S. Tsirkin June 9, 2015, 8:46 a.m. UTC | #7
On Tue, Jun 09, 2015 at 03:04:02PM +0800, Linhaifeng wrote:
> 
> 
> On 2015/4/24 15:27, Luke Gorrie wrote:
> > On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote:
> > 
> >> If not add memory fence what would happen? Packets loss or interrupt
> >> loss?How to test it ?
> >>
> > 
> > You should be able to test it like this:
> > 
> > 1. Boot two Linux kernel (e.g. 3.13) guests.
> > 2. Connect them via vhost switch.
> > 3. Run continuous traffic between them (e.g. iperf).
> > 
> > I would expect that within a reasonable timeframe (< 1 hour) one of the
> > guests' network interfaces will hang indefinitely due to a missed interrupt.
> > 
> > You won't be able to reproduce this using DPDK guests because they are not
> > using the same interrupt suppression method.
> > 
> > This is a serious real-world problem. I wouldn't deploy the vhost
> > implementation without this fix.
> > 
> > Cheers,
> > -Luke
> > 
> 
> I think this patch can't resole this problem. On the other hand we still would miss interrupt.
> 
> After add rte_mb() function the we want the case is :
> 1.write used->idx. ring is full or empty.
> 2.virtio_net open interrupt.
> 3.read avail->flags.
> 
> but this case(miss interrupt) would happen too:
> 1.write used->idx. ring is full or empty.
> 2.read avail->flags.
> 3.virtio_net open interrupt.
> 

That's why a correct guest, after detecting an empty used ring, must always
re-check used idx at least once after writing avail->flags.

By the way, similarly, host side must re-check avail idx after writing
used flags. I don't see where snabbswitch does it - is that a bug
in snabbswitch?
  
Huawei Xie June 9, 2015, 1:34 p.m. UTC | #8
On 6/9/2015 4:47 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 09, 2015 at 03:04:02PM +0800, Linhaifeng wrote:
>>
>> On 2015/4/24 15:27, Luke Gorrie wrote:
>>> On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote:
>>>
>>>> If not add memory fence what would happen? Packets loss or interrupt
>>>> loss?How to test it ?
>>>>
>>> You should be able to test it like this:
>>>
>>> 1. Boot two Linux kernel (e.g. 3.13) guests.
>>> 2. Connect them via vhost switch.
>>> 3. Run continuous traffic between them (e.g. iperf).
>>>
>>> I would expect that within a reasonable timeframe (< 1 hour) one of the
>>> guests' network interfaces will hang indefinitely due to a missed interrupt.
>>>
>>> You won't be able to reproduce this using DPDK guests because they are not
>>> using the same interrupt suppression method.
>>>
>>> This is a serious real-world problem. I wouldn't deploy the vhost
>>> implementation without this fix.
>>>
>>> Cheers,
>>> -Luke
>>>
>> I think this patch can't resole this problem. On the other hand we still would miss interrupt.
>>
>> After add rte_mb() function the we want the case is :
>> 1.write used->idx. ring is full or empty.
>> 2.virtio_net open interrupt.
>> 3.read avail->flags.
>>
>> but this case(miss interrupt) would happen too:
>> 1.write used->idx. ring is full or empty.
>> 2.read avail->flags.
>> 3.virtio_net open interrupt.
>>
> That's why a correct guest, after detecting an empty used ring, must always
> re-check used idx at least once after writing avail->flags.
>
> By the way, similarly, host side must re-check avail idx after writing
> used flags. I don't see where snabbswitch does it - is that a bug
> in snabbswitch?
>
yes, both host and guest should recheck if there is more work added
after they toggle the flag.
For DPDK vHost, as it runs in polling mode, we will recheck avail idx
soon, so we don't need recheck.
  
Linhaifeng June 10, 2015, 2:36 a.m. UTC | #9
On 2015/6/9 21:34, Xie, Huawei wrote:
> On 6/9/2015 4:47 PM, Michael S. Tsirkin wrote:
>> On Tue, Jun 09, 2015 at 03:04:02PM +0800, Linhaifeng wrote:
>>>
>>> On 2015/4/24 15:27, Luke Gorrie wrote:
>>>> On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote:
>>>>
>>>>> If not add memory fence what would happen? Packets loss or interrupt
>>>>> loss?How to test it ?
>>>>>
>>>> You should be able to test it like this:
>>>>
>>>> 1. Boot two Linux kernel (e.g. 3.13) guests.
>>>> 2. Connect them via vhost switch.
>>>> 3. Run continuous traffic between them (e.g. iperf).
>>>>
>>>> I would expect that within a reasonable timeframe (< 1 hour) one of the
>>>> guests' network interfaces will hang indefinitely due to a missed interrupt.
>>>>
>>>> You won't be able to reproduce this using DPDK guests because they are not
>>>> using the same interrupt suppression method.
>>>>
>>>> This is a serious real-world problem. I wouldn't deploy the vhost
>>>> implementation without this fix.
>>>>
>>>> Cheers,
>>>> -Luke
>>>>
>>> I think this patch can't resole this problem. On the other hand we still would miss interrupt.
>>>
>>> After add rte_mb() function the we want the case is :
>>> 1.write used->idx. ring is full or empty.
>>> 2.virtio_net open interrupt.
>>> 3.read avail->flags.
>>>
>>> but this case(miss interrupt) would happen too:
>>> 1.write used->idx. ring is full or empty.
>>> 2.read avail->flags.
>>> 3.virtio_net open interrupt.
>>>
>> That's why a correct guest, after detecting an empty used ring, must always
>> re-check used idx at least once after writing avail->flags.
>>
>> By the way, similarly, host side must re-check avail idx after writing
>> used flags. I don't see where snabbswitch does it - is that a bug
>> in snabbswitch?
>>
> yes, both host and guest should recheck if there is more work added
> after they toggle the flag.
> For DPDK vHost, as it runs in polling mode, we will recheck avail idx
> soon, so we don't need recheck.
> 
> 

DPDK does check the avail idx but does nothing like this:
if (vq->last_used_idx == avail_idx) {
	return;
}

If we miss an interrupt after calling rte_mb(),
(!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) is False;
while (vq->last_used_idx == avail_idx) is True,

then the guest will miss the interrupt forever and virtio-net would stop working.

Would this case happen?
  
Luke Gorrie June 10, 2015, 8:30 a.m. UTC | #10
On 9 June 2015 at 10:46, Michael S. Tsirkin <mst@redhat.com> wrote:

> By the way, similarly, host side must re-check avail idx after writing
> used flags. I don't see where snabbswitch does it - is that a bug
> in snabbswitch?


Good question.

Snabb Switch does not use interrupts from the guest. We always set
VRING_F_NO_NOTIFY to tell the guest that it need not interrupt us. Then we
run in poll mode and in practice check the avail ring for new descriptors
every 20us or so.

So the argument for not needing this check in both Snabb Switch and DPDK is
that we are running poll mode and don't notice whether interrupts are being
sent or not.

Is that a solid argument or do I misunderstand what the race condition is?

Cheers,
-Luke
  
Linhaifeng June 11, 2015, 12:50 a.m. UTC | #11
On 2015/6/10 16:30, Luke Gorrie wrote:
> On 9 June 2015 at 10:46, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>> By the way, similarly, host side must re-check avail idx after writing
>> used flags. I don't see where snabbswitch does it - is that a bug
>> in snabbswitch?
> 
> 
> Good question.
> 
> Snabb Switch does not use interrupts from the guest. We always set
> VRING_F_NO_NOTIFY to tell the guest that it need not interrupt us. Then we
> run in poll mode and in practice check the avail ring for new descriptors
> every 20us or so.

Yes, host not need guest to notify when in poll mode but host also need to
notify guest who use virtio_net when vring is full or emtpy.If host loss this
notification guest would stop working.

> 
> So the argument for not needing this check in both Snabb Switch and DPDK is
> that we are running poll mode and don't notice whether interrupts are being
> sent or not.
> 
> Is that a solid argument or do I misunderstand what the race condition is?
> 
> Cheers,
> -Luke
>
  

Patch

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 510ffe8..6afba35 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -178,6 +178,9 @@  virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	*(volatile uint16_t *)&vq->used->idx += count;
 	vq->last_used_idx = res_end_idx;
 
+	/* flush used->idx update before we read avail->flags. */
+	rte_mb();
+
 	/* Kick the guest if necessary. */
 	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
 		eventfd_write((int)vq->callfd, 1);