[dpdk-dev,RFC,v2,08/12] lib/librte_vhost: vhost-user support

Message ID 549104F7.20906@igel.co.jp (mailing list archive)
State Not Applicable, archived

Commit Message

Tetsuya Mukawa Dec. 17, 2014, 4:22 a.m. UTC
  (2014/12/17 12:31), Tetsuya Mukawa wrote:
> (2014/12/17 10:06), Xie, Huawei wrote:
>>>> +{
>>>> +	struct virtio_net *dev = get_device(ctx);
>>>> +
>>>> +	/* We have to stop the queue (virtio) if it is running. */
>>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
>>>> +		notify_ops->destroy_device(dev);
>>> I have one concern about the finalization of vrings.
>>> Can the vhost backend stop RX/TX access to the vring before replying to
>>> this message?
>>>
>>> QEMU sends this message when the virtio-net device is finalized by the
>>> virtio-net driver on the guest.
>>> After finalization, the memory used by the vring will be freed by the
>>> virtio-net driver, because that memory was allocated by the virtio-net
>>> driver.
>>> Because of this, I guess the vhost backend must stop accessing the vring
>>> before replying to this message.
>>>
>>> I am not sure what a good way to stop the access is.
>>> One idea is adding a condition check when rte_vhost_dequeue_burst()
>>> and rte_vhost_enqueue_burst() are called.
>>> Anyway, we probably need to wait until the access has stopped before replying.
>>>
>>> Thanks,
>>> Tetsuya
>>>
>> I think we have discussed a similar question.
> Sorry, I may not have understood your point correctly in the
> former email.
>
>> It is actually the same issue whether the virtio APP in the guest is crashed or finalized.
> I guess when the APP is finalized correctly, we can have a solution.
> Could you please read the comment I wrote below?
>
>> The virtio APP will only write to the STATUS register without waiting for/syncing with the vhost backend.
> Yes, the virtio APP only writes to the STATUS register. I agree with that.
>
> When the register is written by the guest, KVM will trap the access, and the
> context will switch to QEMU. QEMU then works as below.
> (While the following steps run, the guest is waiting because the context
> is in QEMU.)
>
> Could you please check the following against the latest QEMU code?
> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= the virtio APP will
> wait until this function returns.
> 2. virtio_set_status() [hw/virtio/virtio.c]
> 3. virtio_net_set_status() [hw/net/virtio-net.c]
> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> 5. vhost_net_stop() [hw/net/vhost_net.c]
> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> 7. vhost_dev_stop() [hw/virtio/vhost.c]
> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> 9. vhost_user_call() [virtio/vhost-user.c]
> 10. The VHOST_USER_GET_VRING_BASE message is sent to the backend, and QEMU
> waits for the backend's reply.
>
> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> APP is already stopped.
> Also, QEMU will wait for the vhost-user backend's reply because GET_VRING_BASE
> is a synchronous message.
> Because of the above, I guess the virtio APP can wait for the vhost backend's
> finalization.
>
>> After that, not only the guest memory pointed to by the vring entries but also the vring itself isn't usable any more.
>> The memory for the vring, or pointed to by its entries, might be used by other APPs.
> I agree.
>
>> This will crash the guest (rather than the vhost, do you agree?).
> I guess we cannot assume how the freed memory will be used.
> In some cases, a new APP still works, but the vhost backend can access an
> inconsistent vring structure.
> In that case the vhost backend could receive illegal packets.
> For example, the avail_idx value might be changed to 0xFFFF by a new APP.
> (I am not sure the RX/TX functions can handle such a value correctly.)
>
> Anyway, my point is that if we can finalize the vhost backend correctly, we only
> need to take care of the crash case.
> (If so, it's very nice :))
> So let's make sure whether the virtio APP can wait for the finalization or not.
> I am thinking about how to confirm that now.
>

I added a sleep() to QEMU as shown in the patch at the bottom of this page.

When I run 'dpdk_nic_bind.py' to trigger GET_VRING_BASE, the command
takes 10 seconds to finish.
So we can assume that the virtio APP is able to wait for the finalization of
the vhost backend.
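
For what it is worth, a minimal sketch of the condition-check idea mentioned earlier in this thread could look like the following. It is only an illustration and assumes the destroy_device() callback clears VIRTIO_DEV_RUNNING before the GET_VRING_BASE reply is sent; the helper name is hypothetical.

/* Hypothetical guard at the top of rte_vhost_enqueue_burst() and
 * rte_vhost_dequeue_burst(): once VIRTIO_DEV_RUNNING is cleared, no
 * further vring access happens, so the reply to GET_VRING_BASE can be
 * sent safely afterwards. */
static inline int
vhost_dev_is_running(struct virtio_net *dev)
{
	return (dev->flags & VIRTIO_DEV_RUNNING) != 0;
}

	/* at the start of the enqueue/dequeue burst paths: */
	if (!vhost_dev_is_running(dev))
		return 0;	/* device is being finalized; do not touch the vring */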

Thanks,
Tetsuya

>> If you mean this issue, I think we have no solution but one workaround: keep the huge page files of the crashed app,
>> bind virtio to igb_uio, and then delete the huge page files.
> Yes, I agree.
> If the virtio APP crashes, this will be a solution.
>
> Thanks,
> Tetsuya
>
>> In our implementation, when vhost sends the message, we will call the destroy_device provided by the vSwitch to ask the
>> vSwitch to stop processing the vring, but this won't solve the issue I mention above, because the virtio APP in the guest won't
>> wait for us.
>>
>> Could you explain a bit more? Is it the same issue?
>>
>>
>> -huawei
>>
>>
>>
>
  

Comments

Huawei Xie Dec. 17, 2014, 5:31 p.m. UTC | #1
> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Tuesday, December 16, 2014 9:22 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: Linhaifeng (haifeng.lin@huawei.com)
> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
> 
> (2014/12/17 12:31), Tetsuya Mukawa wrote:
> > (2014/12/17 10:06), Xie, Huawei wrote:
> >>>> +{
> >>>> +	struct virtio_net *dev = get_device(ctx);
> >>>> +
> >>>> +	/* We have to stop the queue (virtio) if it is running. */
> >>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
> >>>> +		notify_ops->destroy_device(dev);
> >>> I have an one concern about finalization of vrings.
> >>> Can vhost-backend stop accessing RX/TX to the vring before replying to
> >>> this message?
> >>>
> >>> QEMU sends this message when virtio-net device is finalized by
> >>> virtio-net driver on the guest.
> >>> After finalization, memories used by the vring will be freed by
> >>> virtio-net driver, because these memories are allocated by virtio-net
> >>> driver.
> >>> Because of this, I guess vhost-backend must stop accessing to vring
> >>> before replying to this message.
> >>>
> >>> I am not sure what is a good way to stop accessing.
> >>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
> >>> and rte_vhost_enqueue_burst() is called.
> >>> Anyway we probably need to wait for stopping access before replying.
> >>>
> >>> Thanks,
> >>> Tetsuya
> >>>
> >> I think we have discussed the similar question.
> > Sorry, probably I might not be able to got your point correctly at the
> > former email.
> >
> >> It is actually the same issue whether the virtio APP in guest is crashed, or is
> finalized.
> > I guess when the APP is finalized correctly, we can have a solution.
> > Could you please read comment I wrote later?
> >
> >> The virtio APP will only write to the STATUS register without waiting/syncing
> to vhost backend.
> > Yes, virtio APP only write to the STATUS register. I agree with it.
> >
> > When the register is written by guest, KVM will catch it, and the
> > context will be change to QEMU. And QEMU will works like below.
> > (Also while doing following steps, guest is waiting because the context
> > is in QEMU)
> >
> > Could you please see below with latest QEMU code?
> > 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
> > wait for replying of this function.
> > 2. virtio_set_status() [hw/virtio/virtio.c]
> > 3. virtio_net_set_status() [hw/net/virtio-net.c]
> > 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> > 5. vhost_net_stop() [hw/net/vhost_net.c]
> > 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> > 7. vhost_dev_stop() [hw/virtio/vhost.c]
> > 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> > 9. vhost_user_call() [virtio/vhost-user.c]
> > 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
> > for backend reply.
> >
> > When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> > APP is stopped.
> > Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
> > is synchronous message.
> > Because of above, I guess virtio APP can wait for vhost-backend
> > finalization.
> >
> >> After that, not only the guest memory pointed by vring entry but also the
> vring itself isn't usable any more.
> >> The memory for vring or pointed by vring entry might be used by other APPs.
> > I agree.
> >
> >> This will crash guest(rather than the vhost, do you agree?).
> > I guess we cannot assume how the freed memory is used.
> > In some cases, a new APP still works, but vhost backend can access
> > inconsistency vring structure.
> > In the case vhost backend could receive illegal packets.
> > For example, avail_idx value might be changed to be 0xFFFF by a new APP.
> > (I am not sure RX/TX functions can handle such a value correctly)

Yes, I fully understand your concern and this is a very good point.
My point is that, in theory, a well-written vhost backend is either able to detect the error,
or will crash the guest VM (the virtio APP or even the guest OS) if it couldn't.
For example, if avail_idx is set to 0xFFFF and the vhost backend accepts the value blindly, it will
1. for the TX ring, receive illegal packets. This is OK.
2. for the RX ring, DMA into guest memory for entries before the avail_idx, which will crash the guest virtual machine.
In the current implementation, if there is any chance our vhost backend aborts itself during error handling, that is incorrect. We
need to check our code for such a case.
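
As a rough illustration of the kind of sanity check being described, something like the following could be done before touching the ring. This is only a sketch: the struct fields follow the librte_vhost style of this series, but the helper itself is hypothetical.

/* Hypothetical check: the number of new entries announced by the guest can
 * never exceed the ring size. If it does (e.g. avail_idx was overwritten
 * with 0xFFFF by another application reusing the freed memory), the ring is
 * corrupt and the backend should stop instead of DMAing into guest memory. */
static inline int
vring_avail_idx_is_sane(struct vhost_virtqueue *vq)
{
	uint16_t avail_idx = *((volatile uint16_t *)&vq->avail->idx);
	uint16_t pending = (uint16_t)(avail_idx - vq->last_used_idx);

	return pending <= vq->size;
}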

> >
> > Anyway, my point is if we can finalize vhost backend correctly, we only
> > need to take care of crashing case.
> > (If so, it's very nice :))
> > So let's make sure whether virtio APP can wait for finalization, or not.
> > I am thinking how to do it now.

Sorry for the confusion.
The STATUS write must be synchronized with qemu.
The vcpu thread for the virtio APP won't continue until qemu finishes the simulation
and resumes the vcpu thread.

In the RFC patch, the message handler will first call the destroy_device provided by the vSwitch,
which will cause the vSwitch to stop processing this virtio_device, and then the handler will send the reply.
Is there an issue with the RFC patch?
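
To make the ordering concrete, here is a simplified sketch of how the GET_VRING_BASE handling described above could look. It is illustrative only: get_device(), notify_ops->destroy_device() and VIRTIO_DEV_RUNNING appear in the RFC patch, while the other names and the reply plumbing are placeholders.

/* Sketch: stop the device *before* returning, so QEMU sends the
 * GET_VRING_BASE reply (and the guest's STATUS write completes) only
 * after the backend has stopped touching the vrings. */
static int
handle_get_vring_base(struct vhost_device_ctx ctx,
		      struct vhost_vring_state *state)
{
	struct virtio_net *dev = get_device(ctx);

	/* Ask the vSwitch to stop using this device; this also clears
	 * VIRTIO_DEV_RUNNING so the RX/TX burst paths bail out. */
	if (dev->flags & VIRTIO_DEV_RUNNING)
		notify_ops->destroy_device(dev);

	/* Report the current ring index back to QEMU. */
	state->num = dev->virtqueue[state->index]->last_used_idx;

	return 0;	/* the caller sends the reply only after we return */
}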

> >
> 
> I added sleep() like below.
> 
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque,
> uint32_t addr, uint32_t val)
>              virtio_pci_stop_ioeventfd(proxy);
>          }
> 
>          virtio_set_status(vdev, val & 0xFF);
> +        if (val == 0)
> +            sleep(10);
Yes, the register simulation for the virtio device must be a synchronous operation. :)
> 
>          if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>              virtio_pci_start_ioeventfd(proxy);
> 
> When I type 'dpdk_nic_bind.py' to cause GET_VRING_BASE, this command
> takes 10 seconds to be finished.
> So we can assume that virtio APP is able to wait for finalization of
> vhost backend.
> 
> Thanks,
> Tetsuya
> 
> >> If you mean this issue, I think we have no solution but one walk around: keep
> the huge page files of crashed app, and
> >> bind virtio to igb_uio and then delete the huge page files.
> > Yes I agree.
> > If the virtio APP is crashed, this will be a solution.
> >
> > Thanks,
> > Tetsuya
> >
> >> In our implementation, when vhost sends the message,  we will call the
> destroy_device provided by the vSwitch to ask the
> >> vSwitch to stop processing the vring, but this willn't solve the issue I mention
> above, because the virtio APP in guest will n't
> >> wait us.
> >>
> >> Could you explain a bit more? Is it the same issue?
> >>
> >>
> >> -huawei
> >>
> >>
> >>
> >
>
  
Tetsuya Mukawa Dec. 19, 2014, 3:36 a.m. UTC | #2
(2014/12/18 2:31), Xie, Huawei wrote:
>
>> -----Original Message-----
>> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
>> Sent: Tuesday, December 16, 2014 9:22 PM
>> To: Xie, Huawei; dev@dpdk.org
>> Cc: Linhaifeng (haifeng.lin@huawei.com)
>> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
>>
>> (2014/12/17 12:31), Tetsuya Mukawa wrote:
>>> (2014/12/17 10:06), Xie, Huawei wrote:
>>>>>> +{
>>>>>> +	struct virtio_net *dev = get_device(ctx);
>>>>>> +
>>>>>> +	/* We have to stop the queue (virtio) if it is running. */
>>>>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
>>>>>> +		notify_ops->destroy_device(dev);
>>>>> I have an one concern about finalization of vrings.
>>>>> Can vhost-backend stop accessing RX/TX to the vring before replying to
>>>>> this message?
>>>>>
>>>>> QEMU sends this message when virtio-net device is finalized by
>>>>> virtio-net driver on the guest.
>>>>> After finalization, memories used by the vring will be freed by
>>>>> virtio-net driver, because these memories are allocated by virtio-net
>>>>> driver.
>>>>> Because of this, I guess vhost-backend must stop accessing to vring
>>>>> before replying to this message.
>>>>>
>>>>> I am not sure what is a good way to stop accessing.
>>>>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
>>>>> and rte_vhost_enqueue_burst() is called.
>>>>> Anyway we probably need to wait for stopping access before replying.
>>>>>
>>>>> Thanks,
>>>>> Tetsuya
>>>>>
>>>> I think we have discussed the similar question.
>>> Sorry, probably I might not be able to got your point correctly at the
>>> former email.
>>>
>>>> It is actually the same issue whether the virtio APP in guest is crashed, or is
>> finalized.
>>> I guess when the APP is finalized correctly, we can have a solution.
>>> Could you please read comment I wrote later?
>>>
>>>> The virtio APP will only write to the STATUS register without waiting/syncing
>> to vhost backend.
>>> Yes, virtio APP only write to the STATUS register. I agree with it.
>>>
>>> When the register is written by guest, KVM will catch it, and the
>>> context will be change to QEMU. And QEMU will works like below.
>>> (Also while doing following steps, guest is waiting because the context
>>> is in QEMU)
>>>
>>> Could you please see below with latest QEMU code?
>>> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
>>> wait for replying of this function.
>>> 2. virtio_set_status() [hw/virtio/virtio.c]
>>> 3. virtio_net_set_status() [hw/net/virtio-net.c]
>>> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
>>> 5. vhost_net_stop() [hw/net/vhost_net.c]
>>> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
>>> 7. vhost_dev_stop() [hw/virtio/vhost.c]
>>> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
>>> 9. vhost_user_call() [virtio/vhost-user.c]
>>> 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
>>> for backend reply.
>>>
>>> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
>>> APP is stopped.
>>> Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
>>> is synchronous message.
>>> Because of above, I guess virtio APP can wait for vhost-backend
>>> finalization.
>>>
>>>> After that, not only the guest memory pointed by vring entry but also the
>> vring itself isn't usable any more.
>>>> The memory for vring or pointed by vring entry might be used by other APPs.
>>> I agree.
>>>
>>>> This will crash guest(rather than the vhost, do you agree?).
>>> I guess we cannot assume how the freed memory is used.
>>> In some cases, a new APP still works, but vhost backend can access
>>> inconsistency vring structure.
>>> In the case vhost backend could receive illegal packets.
>>> For example, avail_idx value might be changed to be 0xFFFF by a new APP.
>>> (I am not sure RX/TX functions can handle such a value correctly)
> Yes, I fully understand your concern and this is a very good point.
> my point is, in theory, a well written vhost backend is either able to detect the error,
> or will crash the guest VM(virtio APP or even guest OS)  if it couldn't.
> For example, if avail_idx is set to 0Xfffff, if vhost_backend accepts this value blindly, it will 
> 1. for tx ring, receive illegal packets. This is ok.
> 2. for rx ring, dma to guest memory before the availd_idx, which will crash guest virtual machine.
> In current implementation, if there is chance our vhost backend aborts itself in error handling, that is incorrect. We
> need to check our code if there is such case.

I agree. I will also check the code.

>
>>> Anyway, my point is if we can finalize vhost backend correctly, we only
>>> need to take care of crashing case.
>>> (If so, it's very nice :))
>>> So let's make sure whether virtio APP can wait for finalization, or not.
>>> I am thinking how to do it now.
> Sorry for the confuse. 
> The STATUS write must be synchronized with qemu.
> The vcpu thread for the virtio APP willn't continue until qemu  finishes the simulation
> and resumes the vcpu thread.
>
> In the RFC patch, the message handler will first call destroy_device provided by vSwitch
> which will cause the vSwitch stop processing this virtio_device, then handler will send the reply.
> Is there an issue with the RFC patch?

Thanks. I had a misunderstanding about the destroy_device handler.
I agree with your implementation. It should work correctly.

Thanks,
Tetsuya

>> I added sleep() like below.
>>
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque,
>> uint32_t addr, uint32_t val)
>>              virtio_pci_stop_ioeventfd(proxy);
>>          }
>>
>>          virtio_set_status(vdev, val & 0xFF);
>> +        if (val == 0)
>> +            sleep(10);
> Yes, the register simulation for virtio device must be synching operation. :).
>>          if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>>              virtio_pci_start_ioeventfd(proxy);
>>
>> When I type 'dpdk_nic_bind.py' to cause GET_VRING_BASE, this command
>> takes 10 seconds to be finished.
>> So we can assume that virtio APP is able to wait for finalization of
>> vhost backend.
>>
>> Thanks,
>> Tetsuya
>>
>>>> If you mean this issue, I think we have no solution but one walk around: keep
>> the huge page files of crashed app, and
>>>> bind virtio to igb_uio and then delete the huge page files.
>>> Yes I agree.
>>> If the virtio APP is crashed, this will be a solution.
>>>
>>> Thanks,
>>> Tetsuya
>>>
>>>> In our implementation, when vhost sends the message,  we will call the
>> destroy_device provided by the vSwitch to ask the
>>>> vSwitch to stop processing the vring, but this willn't solve the issue I mention
>> above, because the virtio APP in guest will n't
>>>> wait us.
>>>>
>>>> Could you explain a bit more? Is it the same issue?
>>>>
>>>>
>>>> -huawei
>>>>
>>>>
>>>>
  

Patch

--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             virtio_pci_stop_ioeventfd(proxy);
         }
 
         virtio_set_status(vdev, val & 0xFF);
+        if (val == 0)
+            sleep(10);
 
         if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
             virtio_pci_start_ioeventfd(proxy);