[dpdk-dev,03/21] vhost: protect virtio_net device struct
Checks
Commit Message
On 09/06/2017 09:30 AM, Tiwei Bie wrote:
> On Wed, Sep 06, 2017 at 09:15:47AM +0200, Maxime Coquelin wrote:
>> Hi Tiwei,
>>
>> On 09/06/2017 03:15 AM, Tiwei Bie wrote:
>>> On Tue, Sep 05, 2017 at 01:00:42PM +0200, Maxime Coquelin wrote:
>>>> On 09/05/2017 12:07 PM, Tiwei Bie wrote:
>>>>> On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:
>>>>>> On 09/05/2017 06:45 AM, Tiwei Bie wrote:
>>>>>>> On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:
>>>>>>>> virtio_net device might be accessed while being reallocated
>>>>>>>> in case of NUMA awareness. This case might be theoretical,
>>>>>>>> but it will be needed anyway to protect vrings pages against
>>>>>>>> invalidation.
>>>>>>>>
>>>>>>>> The virtio_net devs are now protected with a readers/writers
>>>>>>>> lock, so that before reallocating the device, it is ensured
>>>>>>>> that it is not being referenced by the processing threads.
>>>>>>>>
>>>>>>> [...]
>>>>>>>> +struct virtio_net *
>>>>>>>> +get_device(int vid)
>>>>>>>> +{
>>>>>>>> + struct virtio_net *dev;
>>>>>>>> +
>>>>>>>> + rte_rwlock_read_lock(&vhost_devices[vid].lock);
>>>>>>>> +
>>>>>>>> + dev = __get_device(vid);
>>>>>>>> + if (unlikely(!dev))
>>>>>>>> + rte_rwlock_read_unlock(&vhost_devices[vid].lock);
>>>>>>>> +
>>>>>>>> + return dev;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +put_device(int vid)
>>>>>>>> +{
>>>>>>>> + rte_rwlock_read_unlock(&vhost_devices[vid].lock);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>
>>>>>>> This patch introduced a per-device rwlock which needs to be acquired
>>>>>>> unconditionally in the data path. So for each vhost device, the IO
>>>>>>> threads of different queues will need to acquire/release this lock
>>>>>>> during each enqueue and dequeue operation, which will cause cache
>>>>>>> contention when multiple queues are enabled and handled by different
>>>>>>> cores. With this patch alone, I saw ~7% performance drop when enabling
>>>>>>> 6 queues to do 64bytes iofwd loopback test. Is there any way to avoid
>>>>>>> introducing this lock to the data path?
>>>>>>
>>>>>> First, I'd like to thank you for running the MQ test.
>>>>>> I agree it may have a performance impact in this case.
>>>>>>
>>>>>> This lock has currently two purposes:
>>>>>> 1. Prevent referencing freed virtio_dev struct in case of numa_realloc.
>>>>>> 2. Protect vring pages against invalidation.
>>>>>>
>>>>>> For 2., it can be fixed by using the per-vq IOTLB lock (it was not the
>>>>>> case in my early prototypes that had per device IOTLB cache).
>>>>>>
>>>>>> For 1., this is an existing problem, so we might consider it is
>>>>>> acceptable to keep current state. Maybe it could be improved by only
>>>>>> reallocating in case VQ0 is not on the right NUMA node, the other VQs
>>>>>> not being initialized at this point.
>>>>>>
>>>>>> If we do this we might be able to get rid of this lock, I need some more
>>>>>> time though to ensure I'm not missing something.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>
>>>>> Cool. So it's possible that the lock in the data path will be
>>>>> acquired only when the IOMMU feature is enabled. It will be
>>>>> great!
>>>>>
>>>>> Besides, I just did a very simple MQ test to verify my thoughts.
>>>>> Lei (CC'ed in this mail) may do a thorough performance test for
>>>>> this patch set to evaluate the performance impacts.
>>>>
>>>> I'll try to post v2 this week including the proposed change.
>>>> Maybe it'll be better Lei waits for the v2.
>>>>
>>>
>>> Cool. Sure. Thank you! :)
>>
>> I have done the changes, you can find the v2 on my gitlab repo:
>> https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_v2
>>
>> I'm testing it right now, but if you'd like to run some early benchmark
>> before I post the series, there it is!
>>
>
> Got it. Thanks! :)
Just to let you know that I have updated my branch to remove another
regression with iommu=off by inlining the noiommu part of
vhost_iova_to_vva call (See below for the patch, that is squashed into
my branch).
Without this, when running microbenchmarks (txonly, rxonly, ...) I
noticed a 4% perf degradation.
I think I'll have to post the series without testing PVP, because I had
to change the machine I use as packet generator, and now I have X710
NICs that seems to be unsupported with Moongen :(.
I have been advised to us TRex instead, but I'll need some time to set
it up...
Regards,
Maxime
ps: Are you coming to Dublin?
> Best regards,
> Tiwei Bie
>
Subject: [PATCH] vhost: inline IOMMU feature check
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/librte_vhost/vhost.c | 5 +----
lib/librte_vhost/vhost.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 5 deletions(-)
int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq);
void vring_invalidate(struct virtio_net *dev, struct vhost_virtqueue *vq);
Comments
On Wed, Sep 06, 2017 at 10:02:29PM +0200, Maxime Coquelin wrote:
> On 09/06/2017 09:30 AM, Tiwei Bie wrote:
> > On Wed, Sep 06, 2017 at 09:15:47AM +0200, Maxime Coquelin wrote:
> > > Hi Tiwei,
> > >
> > > On 09/06/2017 03:15 AM, Tiwei Bie wrote:
> > > > On Tue, Sep 05, 2017 at 01:00:42PM +0200, Maxime Coquelin wrote:
> > > > > On 09/05/2017 12:07 PM, Tiwei Bie wrote:
> > > > > > On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:
> > > > > > > On 09/05/2017 06:45 AM, Tiwei Bie wrote:
> > > > > > > > On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:
> > > > > > > > > virtio_net device might be accessed while being reallocated
> > > > > > > > > in case of NUMA awareness. This case might be theoretical,
> > > > > > > > > but it will be needed anyway to protect vrings pages against
> > > > > > > > > invalidation.
> > > > > > > > >
> > > > > > > > > The virtio_net devs are now protected with a readers/writers
> > > > > > > > > lock, so that before reallocating the device, it is ensured
> > > > > > > > > that it is not being referenced by the processing threads.
> > > > > > > > >
> > > > > > > > [...]
> > > > > > > > > +struct virtio_net *
> > > > > > > > > +get_device(int vid)
> > > > > > > > > +{
> > > > > > > > > + struct virtio_net *dev;
> > > > > > > > > +
> > > > > > > > > + rte_rwlock_read_lock(&vhost_devices[vid].lock);
> > > > > > > > > +
> > > > > > > > > + dev = __get_device(vid);
> > > > > > > > > + if (unlikely(!dev))
> > > > > > > > > + rte_rwlock_read_unlock(&vhost_devices[vid].lock);
> > > > > > > > > +
> > > > > > > > > + return dev;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +void
> > > > > > > > > +put_device(int vid)
> > > > > > > > > +{
> > > > > > > > > + rte_rwlock_read_unlock(&vhost_devices[vid].lock);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > >
> > > > > > > > This patch introduced a per-device rwlock which needs to be acquired
> > > > > > > > unconditionally in the data path. So for each vhost device, the IO
> > > > > > > > threads of different queues will need to acquire/release this lock
> > > > > > > > during each enqueue and dequeue operation, which will cause cache
> > > > > > > > contention when multiple queues are enabled and handled by different
> > > > > > > > cores. With this patch alone, I saw ~7% performance drop when enabling
> > > > > > > > 6 queues to do 64bytes iofwd loopback test. Is there any way to avoid
> > > > > > > > introducing this lock to the data path?
> > > > > > >
> > > > > > > First, I'd like to thank you for running the MQ test.
> > > > > > > I agree it may have a performance impact in this case.
> > > > > > >
> > > > > > > This lock has currently two purposes:
> > > > > > > 1. Prevent referencing freed virtio_dev struct in case of numa_realloc.
> > > > > > > 2. Protect vring pages against invalidation.
> > > > > > >
> > > > > > > For 2., it can be fixed by using the per-vq IOTLB lock (it was not the
> > > > > > > case in my early prototypes that had per device IOTLB cache).
> > > > > > >
> > > > > > > For 1., this is an existing problem, so we might consider it is
> > > > > > > acceptable to keep current state. Maybe it could be improved by only
> > > > > > > reallocating in case VQ0 is not on the right NUMA node, the other VQs
> > > > > > > not being initialized at this point.
> > > > > > >
> > > > > > > If we do this we might be able to get rid of this lock, I need some more
> > > > > > > time though to ensure I'm not missing something.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > >
> > > > > > Cool. So it's possible that the lock in the data path will be
> > > > > > acquired only when the IOMMU feature is enabled. It will be
> > > > > > great!
> > > > > >
> > > > > > Besides, I just did a very simple MQ test to verify my thoughts.
> > > > > > Lei (CC'ed in this mail) may do a thorough performance test for
> > > > > > this patch set to evaluate the performance impacts.
> > > > >
> > > > > I'll try to post v2 this week including the proposed change.
> > > > > Maybe it'll be better Lei waits for the v2.
> > > > >
> > > >
> > > > Cool. Sure. Thank you! :)
> > >
> > > I have done the changes, you can find the v2 on my gitlab repo:
> > > https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_v2
> > >
> > > I'm testing it right now, but if you'd like to run some early benchmark
> > > before I post the series, there it is!
> > >
> >
> > Got it. Thanks! :)
>
> Just to let you know that I have updated my branch to remove another
> regression with iommu=off by inlining the noiommu part of
> vhost_iova_to_vva call (See below for the patch, that is squashed into
> my branch).
>
> Without this, when running microbenchmarks (txonly, rxonly, ...) I
> noticed a 4% perf degradation.
>
Nice work!
Best regards,
Tiwei Bie
@@ -55,7 +55,7 @@
struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
-uint64_t vhost_iova_to_vva(struct virtio_net *dev, struct
vhost_virtqueue *vq,
+uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct
vhost_virtqueue *vq,
uint64_t iova, uint64_t size, uint8_t perm)
{
uint64_t vva, tmp_size;
@@ -63,9 +63,6 @@ uint64_t vhost_iova_to_vva(struct virtio_net *dev,
struct vhost_virtqueue *vq,
if (unlikely(!size))
return 0;
- if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
- return rte_vhost_gpa_to_vva(dev->mem, iova);
-
tmp_size = size;
vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
@@ -355,8 +355,18 @@ struct vhost_device_ops const
*vhost_driver_callback_get(const char *path);
*/
void vhost_backend_cleanup(struct virtio_net *dev);
-uint64_t vhost_iova_to_vva(struct virtio_net *dev, struct
vhost_virtqueue *vq,
+uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct
vhost_virtqueue *vq,
uint64_t iova, uint64_t size, uint8_t perm);
+
+static __rte_always_inline uint64_t
+vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+ uint64_t iova, uint64_t size, uint8_t perm)
+{
+ if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+ return rte_vhost_gpa_to_vva(dev->mem, iova);
+
+ return __vhost_iova_to_vva(dev, vq, iova, size, perm);
+}