Message ID | 1429720392-25345-1-git-send-email-huawei.xie@intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Thomas Monjalon |
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id 026865A52; Wed, 22 Apr 2015 18:33:35 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 6553E5A3E for <dev@dpdk.org>; Wed, 22 Apr 2015 18:33:33 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP; 22 Apr 2015 09:33:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,624,1422950400"; d="scan'208";a="484255692" Received: from shvmail01.sh.intel.com ([10.239.29.42]) by FMSMGA003.fm.intel.com with ESMTP; 22 Apr 2015 09:33:20 -0700 Received: from shecgisg003.sh.intel.com (shecgisg003.sh.intel.com [10.239.29.90]) by shvmail01.sh.intel.com with ESMTP id t3MGXIbD014889; Thu, 23 Apr 2015 00:33:18 +0800 Received: from shecgisg003.sh.intel.com (localhost [127.0.0.1]) by shecgisg003.sh.intel.com (8.13.6/8.13.6/SuSE Linux 0.8) with ESMTP id t3MGXGFJ025379; Thu, 23 Apr 2015 00:33:18 +0800 Received: (from hxie5@localhost) by shecgisg003.sh.intel.com (8.13.6/8.13.6/Submit) id t3MGXEGP025375; Thu, 23 Apr 2015 00:33:14 +0800 From: Huawei Xie <huawei.xie@intel.com> To: dev@dpdk.org Date: Thu, 23 Apr 2015 00:33:12 +0800 Message-Id: <1429720392-25345-1-git-send-email-huawei.xie@intel.com> X-Mailer: git-send-email 1.7.4.1 Cc: mst@redhat.com Subject: [dpdk-dev] [PATCH] vhost: flush used->idx update before reading avail->flags X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK <dev.dpdk.org> List-Unsubscribe: <http://dpdk.org/ml/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://dpdk.org/ml/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <http://dpdk.org/ml/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Commit Message
Huawei Xie
April 22, 2015, 4:33 p.m. UTC
update of used->idx and read of avail->flags could be reordered.
memory fence should be used to ensure the order, otherwise guest could see a stale used->idx value after it toggles the interrupt suppression flag.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
lib/librte_vhost/vhost_rxtx.c | 3 +++
1 file changed, 3 insertions(+)
Comments
On 22 April 2015 at 18:33, Huawei Xie <huawei.xie@intel.com> wrote: > update of used->idx and read of avail->flags could be reordered. > memory fence should be used to ensure the order, otherwise guest could see > a stale used->idx value after it toggles the interrupt suppression flag. > This patch looks right to me.
On 2015/4/23 0:33, Huawei Xie wrote: > update of used->idx and read of avail->flags could be reordered. > memory fence should be used to ensure the order, otherwise guest could see a stale used->idx value after it toggles the interrupt suppression flag. > > Signed-off-by: Huawei Xie <huawei.xie@intel.com> > --- > lib/librte_vhost/vhost_rxtx.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c > index 510ffe8..6afba35 100644 > --- a/lib/librte_vhost/vhost_rxtx.c > +++ b/lib/librte_vhost/vhost_rxtx.c > @@ -178,6 +178,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, > *(volatile uint16_t *)&vq->used->idx += count; > vq->last_used_idx = res_end_idx; > > + /* flush used->idx update before we read avail->flags. */ > + rte_mb(); > + > /* Kick the guest if necessary. */ > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) > eventfd_write((int)vq->callfd, 1); > If not add memory fence what would happen? Packets loss or interrupt loss?How to test it ?
> -----Original Message----- > From: Linhaifeng [mailto:haifeng.lin@huawei.com] > Sent: Friday, April 24, 2015 9:01 AM > To: Xie, Huawei; dev@dpdk.org > Cc: luke@snabb.co; mst@redhat.com > Subject: Re: [PATCH] vhost: flush used->idx update before reading avail- > >flags > > > > On 2015/4/23 0:33, Huawei Xie wrote: > > update of used->idx and read of avail->flags could be reordered. > > memory fence should be used to ensure the order, otherwise guest could > see a stale used->idx value after it toggles the interrupt suppression flag. > > > > Signed-off-by: Huawei Xie <huawei.xie@intel.com> > > --- > > lib/librte_vhost/vhost_rxtx.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c > > index 510ffe8..6afba35 100644 > > --- a/lib/librte_vhost/vhost_rxtx.c > > +++ b/lib/librte_vhost/vhost_rxtx.c > > @@ -178,6 +178,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t > queue_id, > > *(volatile uint16_t *)&vq->used->idx += count; > > vq->last_used_idx = res_end_idx; > > > > + /* flush used->idx update before we read avail->flags. */ > > + rte_mb(); > > + > > /* Kick the guest if necessary. */ > > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) > > eventfd_write((int)vq->callfd, 1); > > > > If not add memory fence what would happen? Packets loss or interrupt > loss?How to test it ? If those two store and load are reordered, guest toggles the interrupt suppression flag, then it checks that there is no more work to do. Actually there is.
On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote: > If not add memory fence what would happen? Packets loss or interrupt > loss?How to test it ? > You should be able to test it like this: 1. Boot two Linux kernel (e.g. 3.13) guests. 2. Connect them via vhost switch. 3. Run continuous traffic between them (e.g. iperf). I would expect that within a reasonable timeframe (< 1 hour) one of the guests' network interfaces will hang indefinitely due to a missed interrupt. You won't be able to reproduce this using DPDK guests because they are not using the same interrupt suppression method. This is a serious real-world problem. I wouldn't deploy the vhost implementation without this fix. Cheers, -Luke
On 2015/4/24 15:27, Luke Gorrie wrote: > On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote: > >> If not add memory fence what would happen? Packets loss or interrupt >> loss?How to test it ? >> > > You should be able to test it like this: > > 1. Boot two Linux kernel (e.g. 3.13) guests. > 2. Connect them via vhost switch. > 3. Run continuous traffic between them (e.g. iperf). > > I would expect that within a reasonable timeframe (< 1 hour) one of the > guests' network interfaces will hang indefinitely due to a missed interrupt. > > You won't be able to reproduce this using DPDK guests because they are not > using the same interrupt suppression method. > > This is a serious real-world problem. I wouldn't deploy the vhost > implementation without this fix. > > Cheers, > -Luke > I think this patch can't resole this problem. On the other hand we still would miss interrupt. After add rte_mb() function the we want the case is : 1.write used->idx. ring is full or empty. 2.virtio_net open interrupt. 3.read avail->flags. but this case(miss interrupt) would happen too: 1.write used->idx. ring is full or empty. 2.read avail->flags. 3.virtio_net open interrupt.
On 9 June 2015 at 09:04, Linhaifeng <haifeng.lin@huawei.com> wrote: > On 2015/4/24 15:27, Luke Gorrie wrote: > > You should be able to test it like this: > > > > 1. Boot two Linux kernel (e.g. 3.13) guests. > > 2. Connect them via vhost switch. > > 3. Run continuous traffic between them (e.g. iperf). > > > > I would expect that within a reasonable timeframe (< 1 hour) one of the > > guests' network interfaces will hang indefinitely due to a missed > interrupt. > > > > You won't be able to reproduce this using DPDK guests because they are > not > > using the same interrupt suppression method. > > I think this patch can't resole this problem. On the other hand we still > would miss interrupt. > For what it is worth, we were able to reproduce the problem as described above with older Snabb Switch releases and we were also able to verify that inserting a memory barrier fixes this problem. This is the relevant commit in the snabbswitch repo for reference: https://github.com/SnabbCo/snabbswitch/commit/c33cdd8704246887e11d7c353f773f7b488a47f2 In a nutshell, we added an MFENCE instruction after writing used->idx and before checking VRING_F_NO_INTERRUPT. I have not tested this case under DPDK myself and so I am not really certain which memory barrier operations are sufficient/insufficient in that context. I hope that our experience is relevant/helpful though and I am happy to explain more about that if I have missed any important details. Cheers, -Luke
On Tue, Jun 09, 2015 at 03:04:02PM +0800, Linhaifeng wrote: > > > On 2015/4/24 15:27, Luke Gorrie wrote: > > On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote: > > > >> If not add memory fence what would happen? Packets loss or interrupt > >> loss?How to test it ? > >> > > > > You should be able to test it like this: > > > > 1. Boot two Linux kernel (e.g. 3.13) guests. > > 2. Connect them via vhost switch. > > 3. Run continuous traffic between them (e.g. iperf). > > > > I would expect that within a reasonable timeframe (< 1 hour) one of the > > guests' network interfaces will hang indefinitely due to a missed interrupt. > > > > You won't be able to reproduce this using DPDK guests because they are not > > using the same interrupt suppression method. > > > > This is a serious real-world problem. I wouldn't deploy the vhost > > implementation without this fix. > > > > Cheers, > > -Luke > > > > I think this patch can't resole this problem. On the other hand we still would miss interrupt. > > After add rte_mb() function the we want the case is : > 1.write used->idx. ring is full or empty. > 2.virtio_net open interrupt. > 3.read avail->flags. > > but this case(miss interrupt) would happen too: > 1.write used->idx. ring is full or empty. > 2.read avail->flags. > 3.virtio_net open interrupt. > That's why a correct guest, after detecting an empty used ring, must always re-check used idx at least once after writing avail->flags. By the way, similarly, host side must re-check avail idx after writing used flags. I don't see where snabbswitch does it - is that a bug in snabbswitch?
On 6/9/2015 4:47 PM, Michael S. Tsirkin wrote: > On Tue, Jun 09, 2015 at 03:04:02PM +0800, Linhaifeng wrote: >> >> On 2015/4/24 15:27, Luke Gorrie wrote: >>> On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote: >>> >>>> If not add memory fence what would happen? Packets loss or interrupt >>>> loss?How to test it ? >>>> >>> You should be able to test it like this: >>> >>> 1. Boot two Linux kernel (e.g. 3.13) guests. >>> 2. Connect them via vhost switch. >>> 3. Run continuous traffic between them (e.g. iperf). >>> >>> I would expect that within a reasonable timeframe (< 1 hour) one of the >>> guests' network interfaces will hang indefinitely due to a missed interrupt. >>> >>> You won't be able to reproduce this using DPDK guests because they are not >>> using the same interrupt suppression method. >>> >>> This is a serious real-world problem. I wouldn't deploy the vhost >>> implementation without this fix. >>> >>> Cheers, >>> -Luke >>> >> I think this patch can't resole this problem. On the other hand we still would miss interrupt. >> >> After add rte_mb() function the we want the case is : >> 1.write used->idx. ring is full or empty. >> 2.virtio_net open interrupt. >> 3.read avail->flags. >> >> but this case(miss interrupt) would happen too: >> 1.write used->idx. ring is full or empty. >> 2.read avail->flags. >> 3.virtio_net open interrupt. >> > That's why a correct guest, after detecting an empty used ring, must always > re-check used idx at least once after writing avail->flags. > > By the way, similarly, host side must re-check avail idx after writing > used flags. I don't see where snabbswitch does it - is that a bug > in snabbswitch? > yes, both host and guest should recheck if there is more work added after they toggle the flag. For DPDK vHost, as it runs in polling mode, we will recheck avail idx soon, so we don't need recheck.
On 2015/6/9 21:34, Xie, Huawei wrote: > On 6/9/2015 4:47 PM, Michael S. Tsirkin wrote: >> On Tue, Jun 09, 2015 at 03:04:02PM +0800, Linhaifeng wrote: >>> >>> On 2015/4/24 15:27, Luke Gorrie wrote: >>>> On 24 April 2015 at 03:01, Linhaifeng <haifeng.lin@huawei.com> wrote: >>>> >>>>> If not add memory fence what would happen? Packets loss or interrupt >>>>> loss?How to test it ? >>>>> >>>> You should be able to test it like this: >>>> >>>> 1. Boot two Linux kernel (e.g. 3.13) guests. >>>> 2. Connect them via vhost switch. >>>> 3. Run continuous traffic between them (e.g. iperf). >>>> >>>> I would expect that within a reasonable timeframe (< 1 hour) one of the >>>> guests' network interfaces will hang indefinitely due to a missed interrupt. >>>> >>>> You won't be able to reproduce this using DPDK guests because they are not >>>> using the same interrupt suppression method. >>>> >>>> This is a serious real-world problem. I wouldn't deploy the vhost >>>> implementation without this fix. >>>> >>>> Cheers, >>>> -Luke >>>> >>> I think this patch can't resole this problem. On the other hand we still would miss interrupt. >>> >>> After add rte_mb() function the we want the case is : >>> 1.write used->idx. ring is full or empty. >>> 2.virtio_net open interrupt. >>> 3.read avail->flags. >>> >>> but this case(miss interrupt) would happen too: >>> 1.write used->idx. ring is full or empty. >>> 2.read avail->flags. >>> 3.virtio_net open interrupt. >>> >> That's why a correct guest, after detecting an empty used ring, must always >> re-check used idx at least once after writing avail->flags. >> >> By the way, similarly, host side must re-check avail idx after writing >> used flags. I don't see where snabbswitch does it - is that a bug >> in snabbswitch? >> > yes, both host and guest should recheck if there is more work added > after they toggle the flag. > For DPDK vHost, as it runs in polling mode, we will recheck avail idx > soon, so we don't need recheck. > > DPDK does check the avail idx but does nothing like this: if (vq->last_used_idx == avail_idx) { return; } If we miss an interrupt after calling rte_mb(), (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) is False; while (vq->last_used_idx == avail_idx) is True, then the guest will miss the interrupt forever and virtio-net would stop working. Would this case happen?
On 9 June 2015 at 10:46, Michael S. Tsirkin <mst@redhat.com> wrote: > By the way, similarly, host side must re-check avail idx after writing > used flags. I don't see where snabbswitch does it - is that a bug > in snabbswitch? Good question. Snabb Switch does not use interrupts from the guest. We always set VRING_F_NO_NOTIFY to tell the guest that it need not interrupt us. Then we run in poll mode and in practice check the avail ring for new descriptors every 20us or so. So the argument for not needing this check in both Snabb Switch and DPDK is that we are running poll mode and don't notice whether interrupts are being sent or not. Is that a solid argument or do I misunderstand what the race condition is? Cheers, -Luke
On 2015/6/10 16:30, Luke Gorrie wrote: > On 9 June 2015 at 10:46, Michael S. Tsirkin <mst@redhat.com> wrote: > >> By the way, similarly, host side must re-check avail idx after writing >> used flags. I don't see where snabbswitch does it - is that a bug >> in snabbswitch? > > > Good question. > > Snabb Switch does not use interrupts from the guest. We always set > VRING_F_NO_NOTIFY to tell the guest that it need not interrupt us. Then we > run in poll mode and in practice check the avail ring for new descriptors > every 20us or so. Yes, host not need guest to notify when in poll mode but host also need to notify guest who use virtio_net when vring is full or emtpy.If host loss this notification guest would stop working. > > So the argument for not needing this check in both Snabb Switch and DPDK is > that we are running poll mode and don't notice whether interrupts are being > sent or not. > > Is that a solid argument or do I misunderstand what the race condition is? > > Cheers, > -Luke >
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 510ffe8..6afba35 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -178,6 +178,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, *(volatile uint16_t *)&vq->used->idx += count; vq->last_used_idx = res_end_idx; + /* flush used->idx update before we read avail->flags. */ + rte_mb(); + /* Kick the guest if necessary. */ if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) eventfd_write((int)vq->callfd, 1);