[dpdk-dev,RFC] avail idx update optimizations

Message ID 1461259092-9309-1-git-send-email-huawei.xie@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Yuanhan Liu

Commit Message

Huawei Xie April 21, 2016, 5:18 p.m. UTC
eliminate unnecessary cache-to-cache transfers between the virtio and
vhost cores

---
 drivers/net/virtio/virtqueue.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Huawei Xie April 24, 2016, 2:45 a.m. UTC | #1
Forgot to cc the mailing list.

On 4/22/2016 9:53 PM, Xie, Huawei wrote:
> Hi:
>
> This is a series of virtio/vhost idx/ring update optimizations for
> cache-to-cache transfer. Actually, I don't expect many of them, as
> virtio/vhost already does this quite right.
>
> For this patch, in a VM2VM test, I observed a ~6% performance increase.
> In VM1, run testpmd in txonly mode.
> In VM2, run testpmd in rxonly mode.
> In the host, run testpmd (with two vhost PMDs) in io forward mode.
>
> Michael:
> We have talked about this method when I tried the fixed ring.
>
>
> On 4/22/2016 5:12 PM, Xie, Huawei wrote:
>> eliminate unnecessary cache to cache transfer between virtio and vhost
>> core
>>
>> ---
>>  drivers/net/virtio/virtqueue.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>> index 4e9239e..8c46a83 100644
>> --- a/drivers/net/virtio/virtqueue.h
>> +++ b/drivers/net/virtio/virtqueue.h
>> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
>>  	 * descriptor.
>>  	 */
>>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
>> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
>> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>  	vq->vq_avail_idx++;
>>  }
>>  
>
  
Michael S. Tsirkin April 24, 2016, 9:15 a.m. UTC | #2
On Sun, Apr 24, 2016 at 02:45:22AM +0000, Xie, Huawei wrote:
> Forgot to cc the mailing list.
> 
> On 4/22/2016 9:53 PM, Xie, Huawei wrote:
> > Hi:
> >
> > This is a series of virtio/vhost idx/ring update optimizations for
> > cache-to-cache transfer. Actually, I don't expect many of them, as
> > virtio/vhost already does this quite right.

Hmm - is it a series or a single patch?

> > For this patch, in a VM2VM test, I observed a ~6% performance increase.

Interesting. In that case, it seems likely that the new ring layout
would give you an even bigger performance gain.
Could you take a look at tools/virtio/ringtest/ring.c
in the latest Linux and tell me what you think?
In particular, I know you looked at using vectored instructions
to do ring updates - would the layout in tools/virtio/ringtest/ring.c
interfere with that?

> > In VM1, run testpmd in txonly mode.
> > In VM2, run testpmd in rxonly mode.
> > In the host, run testpmd (with two vhost PMDs) in io forward mode.
> >
> > Michael:
> > We have talked about this method when I tried the fixed ring.
> >
> >
> > On 4/22/2016 5:12 PM, Xie, Huawei wrote:
> >> eliminate unnecessary cache to cache transfer between virtio and vhost
> >> core

Yes, I remember proposing this, but you should probably include the
explanation of why this works in the commit log:

- pre-format avail ring with expected descriptor index values
- as long as entries are consumed in-order, there's no
  need to modify the avail ring
- as long as avail ring is not modified, it can be
  valid in caches of both consumer and producer


> >>
> >> ---
> >>  drivers/net/virtio/virtqueue.h | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> >> index 4e9239e..8c46a83 100644
> >> --- a/drivers/net/virtio/virtqueue.h
> >> +++ b/drivers/net/virtio/virtqueue.h
> >> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
> >>  	 * descriptor.
> >>  	 */
> >>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
> >> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
> >> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
> >> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
> >>  	vq->vq_avail_idx++;
> >>  }
> >>  
> >
>
  
Huawei Xie April 24, 2016, 1:23 p.m. UTC | #3
On 4/24/2016 5:15 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 24, 2016 at 02:45:22AM +0000, Xie, Huawei wrote:
>> Forgot to cc the mailing list.
>>
>> On 4/22/2016 9:53 PM, Xie, Huawei wrote:
>>> Hi:
>>>
>>> This is a series of virtio/vhost idx/ring update optimizations for
>>> cache-to-cache transfer. Actually, I don't expect many of them, as
>>> virtio/vhost already does this quite right.
> Hmm - is it a series or a single patch?

Another optimization I have in mind is caching a copy of the avail index
in vhost: use the cached copy if there are enough slots, and sync with
the index in the ring only when needed.
I haven't evaluated that idea yet.

>>> For this patch, in a VM2VM test, I observed a ~6% performance increase.
> Interesting. In that case, it seems likely that the new ring layout
> would give you an even bigger performance gain.
> Could you take a look at tools/virtio/ringtest/ring.c
> in the latest Linux and tell me what you think?
> In particular, I know you looked at using vectored instructions
> to do ring updates - would the layout in tools/virtio/ringtest/ring.c
> interfere with that?

Thanks, I will check. As you know, I have tried using a fixed avail ring
in the virtio driver. In a pure vhost->virtio test, it gives a 2~3x
performance boost, but the gain isn't as obvious once a physical NIC is
involved; I am investigating that issue.
I had planned to sync with you on whether it is generic enough for a
negotiated feature, either for in-order descriptor processing or for
something like a fixed avail ring. I will check your new ring layout.


>
>>> In VM1, run testpmd in txonly mode.
>>> In VM2, run testpmd in rxonly mode.
>>> In the host, run testpmd (with two vhost PMDs) in io forward mode.
>>>
>>> Michael:
>>> We have talked about this method when I tried the fixed ring.
>>>
>>>
>>> On 4/22/2016 5:12 PM, Xie, Huawei wrote:
>>>> eliminate unnecessary cache to cache transfer between virtio and vhost
>>>> core
> Yes, I remember proposing this, but you should probably include the
> explanation of why this works in the commit log:
>
> - pre-format avail ring with expected descriptor index values
> - as long as entries are consumed in-order, there's no
>   need to modify the avail ring
> - as long as avail ring is not modified, it can be
>   valid in caches of both consumer and producer

Yes, I will add the explanation in the formal patch.


>
>>>> ---
>>>>  drivers/net/virtio/virtqueue.h | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>>>> index 4e9239e..8c46a83 100644
>>>> --- a/drivers/net/virtio/virtqueue.h
>>>> +++ b/drivers/net/virtio/virtqueue.h
>>>> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
>>>>  	 * descriptor.
>>>>  	 */
>>>>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
>>>> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
>>>> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>>  	vq->vq_avail_idx++;
>>>>  }
>>>>
  
Huawei Xie April 28, 2016, 9:17 a.m. UTC | #4
On 4/24/2016 9:24 PM, Xie, Huawei wrote:
> On 4/24/2016 5:15 PM, Michael S. Tsirkin wrote:
>> On Sun, Apr 24, 2016 at 02:45:22AM +0000, Xie, Huawei wrote:
>>> Forgot to cc the mailing list.
>>>
>>> On 4/22/2016 9:53 PM, Xie, Huawei wrote:
>>>> Hi:
>>>>
>>>> This is a series of virtio/vhost idx/ring update optimizations for
>>>> cache-to-cache transfer. Actually, I don't expect many of them, as
>>>> virtio/vhost already does this quite right.
>> Hmm - is it a series or a single patch?
> Another optimization I have in mind is caching a copy of the avail index
> in vhost: use the cached copy if there are enough slots, and sync with
> the index in the ring only when needed.
> I haven't evaluated that idea yet.

I tried the cached vhost idx, which gives slightly better performance,
but I will hold this patch: I suspect we can't blindly set the cached
avail idx to 0, which might cause issues with live migration.

>
>>>> For this patch, in a VM2VM test, I observed a ~6% performance increase.
>> Interesting. In that case, it seems likely that the new ring layout
>> would give you an even bigger performance gain.
>> Could you take a look at tools/virtio/ringtest/ring.c
>> in the latest Linux and tell me what you think?
>> In particular, I know you looked at using vectored instructions
>> to do ring updates - would the layout in tools/virtio/ringtest/ring.c
>> interfere with that?
> Thanks, I will check. As you know, I have tried using a fixed avail ring
> in the virtio driver. In a pure vhost->virtio test, it gives a 2~3x
> performance boost, but the gain isn't as obvious once a physical NIC is
> involved; I am investigating that issue.
> I had planned to sync with you on whether it is generic enough for a
> negotiated feature, either for in-order descriptor processing or for
> something like a fixed avail ring. I will check your new ring layout.
>
>
>>>> In VM1, run testpmd in txonly mode.
>>>> In VM2, run testpmd in rxonly mode.
>>>> In the host, run testpmd (with two vhost PMDs) in io forward mode.
>>>>
>>>> Michael:
>>>> We have talked about this method when I tried the fixed ring.
>>>>
>>>>
>>>> On 4/22/2016 5:12 PM, Xie, Huawei wrote:
>>>>> eliminate unnecessary cache to cache transfer between virtio and vhost
>>>>> core
>> Yes, I remember proposing this, but you should probably include the
>> explanation of why this works in the commit log:
>>
>> - pre-format avail ring with expected descriptor index values
>> - as long as entries are consumed in-order, there's no
>>   need to modify the avail ring
>> - as long as avail ring is not modified, it can be
>>   valid in caches of both consumer and producer
> Yes, I will add the explanation in the formal patch.
>
>
>>>>> ---
>>>>>  drivers/net/virtio/virtqueue.h | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>>>>> index 4e9239e..8c46a83 100644
>>>>> --- a/drivers/net/virtio/virtqueue.h
>>>>> +++ b/drivers/net/virtio/virtqueue.h
>>>>> @@ -302,7 +302,8 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
>>>>>  	 * descriptor.
>>>>>  	 */
>>>>>  	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
>>>>> -	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>>> +	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
>>>>> +		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
>>>>>  	vq->vq_avail_idx++;
>>>>>  }
>>>>>  
>
  

Patch

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 4e9239e..8c46a83 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -302,7 +302,8 @@  vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
 	 * descriptor.
 	 */
 	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
-	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
 	vq->vq_avail_idx++;
 }