Message ID | 1461259092-9309-1-git-send-email-huawei.xie@intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Yuanhan Liu |
Headers |
Return-Path: <dev-bounces@dpdk.org>
From: Huawei Xie <huawei.xie@intel.com>
To: dev@dpdk.org
Date: Fri, 22 Apr 2016 01:18:12 +0800
Message-Id: <1461259092-9309-1-git-send-email-huawei.xie@intel.com>
Subject: [dpdk-dev] [RFC PATCH] avail idx update optimizations
List-Id: patches and discussions about DPDK <dev.dpdk.org>
Commit Message
Huawei Xie
April 21, 2016, 5:18 p.m. UTC
eliminate unnecessary cache to cache transfer between virtio and vhost core

---
 drivers/net/virtio/virtqueue.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comments
Forgot to cc the mailing list.

On 4/22/2016 9:53 PM, Xie, Huawei wrote:
> Hi:
>
> This is a series of virtio/vhost idx/ring update optimizations for cache
> to cache transfer. Actually I don't expect many of them, as virtio/vhost
> already does this quite right.
>
> For this patch, in a VM2VM test, I observed ~6% performance increase.
> In VM1, run testpmd with txonly mode
> In VM2, run testpmd with rxonly mode
> In host, run testpmd (with two vhost PMDs) with io forward
>
> Michael:
> We have talked about this method when I tried the fixed ring.
>
>
> On 4/22/2016 5:12 PM, Xie, Huawei wrote:
>> eliminate unnecessary cache to cache transfer between virtio and vhost
>> core
>>
>> ---
>>  drivers/net/virtio/virtqueue.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>> index 4e9239e..8c46a83 100644
>> --- a/drivers/net/virtio/virtqueue.h
>> +++ b/drivers/net/virtio/virtqueue.h
>> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
>>  	 * descriptor.
>>  	 */
>>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
>> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
>> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>  	vq->vq_avail_idx++;
>>  }
>>
On Sun, Apr 24, 2016 at 02:45:22AM +0000, Xie, Huawei wrote:
> Forgot to cc the mailing list.
>
> On 4/22/2016 9:53 PM, Xie, Huawei wrote:
> > Hi:
> >
> > This is a series of virtio/vhost idx/ring update optimizations for cache
> > to cache transfer. Actually I don't expect many of them, as virtio/vhost
> > already does this quite right.

Hmm - is it a series or a single patch?

> > For this patch, in a VM2VM test, I observed ~6% performance increase.

Interesting. In that case, it seems likely that the new ring layout
would give you an even bigger performance gain.
Could you take a look at tools/virtio/ringtest/ring.c
in the latest Linux and tell me what you think?
In particular, I know you looked at using vectored instructions
to do ring updates - would the layout in tools/virtio/ringtest/ring.c
interfere with that?

> > In VM1, run testpmd with txonly mode
> > In VM2, run testpmd with rxonly mode
> > In host, run testpmd (with two vhost PMDs) with io forward
> >
> > Michael:
> > We have talked about this method when I tried the fixed ring.
> >
> >
> > On 4/22/2016 5:12 PM, Xie, Huawei wrote:
> >> eliminate unnecessary cache to cache transfer between virtio and vhost
> >> core

Yes, I remember proposing this, but you probably should include the
explanation about why this works in the commit log:

- pre-format avail ring with expected descriptor index values
- as long as entries are consumed in-order, there's no
  need to modify the avail ring
- as long as avail ring is not modified, it can be
  valid in caches of both consumer and producer

> >>
> >> ---
> >>  drivers/net/virtio/virtqueue.h | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> >> index 4e9239e..8c46a83 100644
> >> --- a/drivers/net/virtio/virtqueue.h
> >> +++ b/drivers/net/virtio/virtqueue.h
> >> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
> >>  	 * descriptor.
> >>  	 */
> >>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
> >> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
> >> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
> >> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
> >>  	vq->vq_avail_idx++;
> >>  }
> >>
> >
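A minimal C sketch of the scheme outlined above (pre-formatted avail ring plus the conditional store from the patch). The struct and function names are illustrative assumptions, not the actual virtio/DPDK definitions:

```c
/*
 * Sketch only: pre-format the avail ring once at setup so that, as long as
 * descriptors are recycled in order, the front end never has to write
 * avail->ring[] again on the hot path.
 */
#include <stdint.h>

struct avail_ring_sketch {
	uint16_t flags;
	uint16_t idx;
	uint16_t ring[];		/* avail ring entries */
};

/* Setup-time helper: seed ring[i] = i, the expected in-order index. */
static void
avail_ring_prefill(struct avail_ring_sketch *avail, uint16_t nentries)
{
	for (uint16_t i = 0; i < nentries; i++)
		avail->ring[i] = i;
}

/*
 * Enqueue path, mirroring the patch: when descriptors are used in order,
 * ring[slot] already holds desc_idx, so the store - and the resulting
 * cache-line ownership transfer between the virtio and vhost cores - is
 * skipped in the common case.
 */
static void
avail_ring_publish(struct avail_ring_sketch *avail, uint16_t *shadow_idx,
		   uint16_t nentries, uint16_t desc_idx)
{
	uint16_t slot = *shadow_idx & (nentries - 1);

	if (avail->ring[slot] != desc_idx)	/* expected to be rare */
		avail->ring[slot] = desc_idx;
	(*shadow_idx)++;
}
```

With the ring pre-seeded and entries consumed in order, the cache line holding avail->ring can stay in the shared state in both the producer's and the consumer's caches.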
On 4/24/2016 5:15 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 24, 2016 at 02:45:22AM +0000, Xie, Huawei wrote:
>> Forgot to cc the mailing list.
>>
>> On 4/22/2016 9:53 PM, Xie, Huawei wrote:
>>> Hi:
>>>
>>> This is a series of virtio/vhost idx/ring update optimizations for cache
>>> to cache transfer. Actually I don't expect many of them, as virtio/vhost
>>> already does this quite right.
> Hmm - is it a series or a single patch?

Another one in my mind is caching a copy of the avail index in vhost. Use the
cached copy if there are enough slots and then sync with the index in the ring.
Haven't evaluated that idea.

>>> For this patch, in a VM2VM test, I observed ~6% performance increase.
> Interesting. In that case, it seems likely that the new ring layout
> would give you an even bigger performance gain.
> Could you take a look at tools/virtio/ringtest/ring.c
> in the latest Linux and tell me what you think?
> In particular, I know you looked at using vectored instructions
> to do ring updates - would the layout in tools/virtio/ringtest/ring.c
> interfere with that?

Thanks. Would check. You know I previously tried fixing the avail ring in the
virtio driver. In a purely vhost->virtio test, it could give a 2~3 times
performance boost, but the gain isn't that obvious with a physical NIC
involved; I am investigating that issue.
I had planned to sync with you on whether it is generic enough that we
could have a negotiated feature, either for in-order descriptor
processing or something like the fixed avail ring. Would check your new ring layout.

>>> In VM1, run testpmd with txonly mode
>>> In VM2, run testpmd with rxonly mode
>>> In host, run testpmd (with two vhost PMDs) with io forward
>>>
>>> Michael:
>>> We have talked about this method when I tried the fixed ring.
>>>
>>>
>>> On 4/22/2016 5:12 PM, Xie, Huawei wrote:
>>>> eliminate unnecessary cache to cache transfer between virtio and vhost
>>>> core
> Yes, I remember proposing this, but you probably should include the
> explanation about why this works in the commit log:
>
> - pre-format avail ring with expected descriptor index values
> - as long as entries are consumed in-order, there's no
>   need to modify the avail ring
> - as long as avail ring is not modified, it can be
>   valid in caches of both consumer and producer

Yes, would add the explanation in the formal patch.

>>>> ---
>>>>  drivers/net/virtio/virtqueue.h | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>>>> index 4e9239e..8c46a83 100644
>>>> --- a/drivers/net/virtio/virtqueue.h
>>>> +++ b/drivers/net/virtio/virtqueue.h
>>>> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
>>>>  	 * descriptor.
>>>>  	 */
>>>>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
>>>> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
>>>> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>>  	vq->vq_avail_idx++;
>>>>  }
>>>>
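A hedged sketch of the "cached avail index" idea mentioned in the reply above: keep a private copy of the guest-written avail index on the vhost side and only re-read the shared cache line when the local view runs dry. Field and function names are assumptions for illustration, not the real rte_vhost structures:

```c
#include <stdint.h>

struct vhost_vq_sketch {
	volatile uint16_t *avail_idx;	/* shared index written by the guest */
	uint16_t cached_avail_idx;	/* last value read from *avail_idx */
	uint16_t last_used_idx;		/* next entry this core will consume */
};

/* Return how many entries are known to be available, refreshing the cached
 * copy (and touching the shared cache line) only when it looks empty. */
static uint16_t
vhost_avail_entries(struct vhost_vq_sketch *vq)
{
	uint16_t avail = (uint16_t)(vq->cached_avail_idx - vq->last_used_idx);

	if (avail == 0) {
		vq->cached_avail_idx = *vq->avail_idx;	/* sync with the ring */
		avail = (uint16_t)(vq->cached_avail_idx - vq->last_used_idx);
	}
	return avail;
}
```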
On 4/24/2016 9:24 PM, Xie, Huawei wrote:
> On 4/24/2016 5:15 PM, Michael S. Tsirkin wrote:
>> On Sun, Apr 24, 2016 at 02:45:22AM +0000, Xie, Huawei wrote:
>>> Forgot to cc the mailing list.
>>>
>>> On 4/22/2016 9:53 PM, Xie, Huawei wrote:
>>>> Hi:
>>>>
>>>> This is a series of virtio/vhost idx/ring update optimizations for cache
>>>> to cache transfer. Actually I don't expect many of them, as virtio/vhost
>>>> already does this quite right.
>> Hmm - is it a series or a single patch?
> Another one in my mind is caching a copy of the avail index in vhost. Use the
> cached copy if there are enough slots and then sync with the index in the ring.
> Haven't evaluated that idea.

Tried the cached vhost idx, which gives slightly better perf, but will hold
that patch, as I guess we couldn't blindly set this cached avail idx to 0,
which might cause issues in live migration.

>
>>>> For this patch, in a VM2VM test, I observed ~6% performance increase.
>> Interesting. In that case, it seems likely that the new ring layout
>> would give you an even bigger performance gain.
>> Could you take a look at tools/virtio/ringtest/ring.c
>> in the latest Linux and tell me what you think?
>> In particular, I know you looked at using vectored instructions
>> to do ring updates - would the layout in tools/virtio/ringtest/ring.c
>> interfere with that?
> Thanks. Would check. You know I previously tried fixing the avail ring in the
> virtio driver. In a purely vhost->virtio test, it could give a 2~3 times
> performance boost, but the gain isn't that obvious with a physical NIC
> involved; I am investigating that issue.
> I had planned to sync with you on whether it is generic enough that we
> could have a negotiated feature, either for in-order descriptor
> processing or something like the fixed avail ring. Would check your new ring layout.
>
>
>>>> In VM1, run testpmd with txonly mode
>>>> In VM2, run testpmd with rxonly mode
>>>> In host, run testpmd (with two vhost PMDs) with io forward
>>>>
>>>> Michael:
>>>> We have talked about this method when I tried the fixed ring.
>>>>
>>>>
>>>> On 4/22/2016 5:12 PM, Xie, Huawei wrote:
>>>>> eliminate unnecessary cache to cache transfer between virtio and vhost
>>>>> core
>> Yes, I remember proposing this, but you probably should include the
>> explanation about why this works in the commit log:
>>
>> - pre-format avail ring with expected descriptor index values
>> - as long as entries are consumed in-order, there's no
>>   need to modify the avail ring
>> - as long as avail ring is not modified, it can be
>>   valid in caches of both consumer and producer
> Yes, would add the explanation in the formal patch.
>
>>>>> ---
>>>>>  drivers/net/virtio/virtqueue.h | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>>>>> index 4e9239e..8c46a83 100644
>>>>> --- a/drivers/net/virtio/virtqueue.h
>>>>> +++ b/drivers/net/virtio/virtqueue.h
>>>>> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
>>>>>  	 * descriptor.
>>>>>  	 */
>>>>>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
>>>>> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>>> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
>>>>> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>>>  	vq->vq_avail_idx++;
>>>>>  }
>>>>>
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 4e9239e..8c46a83 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
 	 * descriptor.
 	 */
 	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
-	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
 	vq->vq_avail_idx++;
 }