[dpdk-dev,v5,1/3] vhost: Add callback and private data for vhost PMD

Message ID 1448355603-21275-2-git-send-email-mukawa@igel.co.jp (mailing list archive)
State Changes Requested, archived

Commit Message

Tetsuya Mukawa Nov. 24, 2015, 9 a.m. UTC
  The vhost PMD will be a wrapper of the vhost library, but some of the
vhost library APIs cannot be mapped to ethdev library APIs.
Because of this, in some cases, we still need to use vhost library APIs
for a port created by the vhost PMD.

Currently, when a virtio device is created or destroyed, the vhost
library calls one of its callback handlers. The vhost PMD needs to use
this pair of callback handlers to know which virtio devices are
actually connected.
Because only one pair of callbacks can be registered with the vhost
library, if the PMD uses it, DPDK applications have no way to know the
events.

This may break legacy DPDK applications that use the vhost library. To
prevent this, this patch adds one more pair of callbacks to the vhost
library, especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they
need additional specific handling for virtio device creation and
destruction.

For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the
setting.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_vhost_version.map        |  6 +++
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++---
 lib/librte_vhost/virtio-net.c                 | 60 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  4 +-
 5 files changed, 73 insertions(+), 13 deletions(-)
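
For reference, a minimal sketch of how the two callback pairs would
coexist after this patch. The signatures follow rte_virtio_net.h of
this release; the PMD-side registration symbol is not visible in the
diffstat, so it is only referenced in a comment, and the handler bodies
are assumptions.

#include <rte_virtio_net.h>

/* The application's pair, registered through the existing API. */
static int
app_new_device(struct virtio_net *dev)
{
	/* e.g. the example from the commit message above: */
	rte_vhost_enable_guest_notification(dev, VIRTIO_RXQ, 0);
	return 0;
}

static void
app_destroy_device(volatile struct virtio_net *dev)
{
	(void)dev; /* application-specific teardown */
}

static const struct virtio_net_device_ops app_ops = {
	.new_device     = app_new_device,
	.destroy_device = app_destroy_device,
};

static void
setup_callbacks(void)
{
	/* The legacy path keeps working for applications... */
	rte_vhost_driver_callback_register(&app_ops);

	/*
	 * ...while the vhost PMD registers its own, second pair through
	 * the new hook this patch adds (see the diff for the symbol).
	 */
}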
  

Comments

Yuanhan Liu Dec. 17, 2015, 11:42 a.m. UTC | #1
On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> The vhost PMD will be a wrapper of vhost library, but some of vhost
> library APIs cannot be mapped to ethdev library APIs.
> Becasue of this, in some cases, we still need to use vhost library APIs
> for a port created by the vhost PMD.
> 
> Currently, when virtio device is created and destroyed, vhost library
> will call one of callback handlers. The vhost PMD need to use this
> pair of callback handlers to know which virtio devices are connected
> actually.
> Because we can register only one pair of callbacks to vhost library, if
> the PMD use it, DPDK applications cannot have a way to know the events.
> 
> This may break legacy DPDK applications that uses vhost library. To prevent
> it, this patch adds one more pair of callbacks to vhost library especially
> for the vhost PMD.
> With the patch, legacy applications can use the vhost PMD even if they need
> additional specific handling for virtio device creation and destruction.
> 
> For example, legacy application can call
> rte_vhost_enable_guest_notification() in callbacks to change setting.

TBH, I never liked it since the beginning. Introducing two callbacks
for one event is a bit messy, and therefore error prone.

I have been thinking about this occasionally over the last few weeks,
and have come up with something: we may introduce another layer of
callbacks based on the vhost PMD itself, via a new API:

	rte_eth_vhost_register_callback().

We would then call those new callbacks inside the vhost PMD's
new_device() and destroy_device() implementations.

We could have the same callbacks as vhost has, but I don't think
new_device() and destroy_device() sound like good names for a PMD.
Maybe a name like "link_state_changed" is better?

What do you think of that?
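
Something like this, roughly (nothing below exists yet; the callback
type and event name are illustrative):

#include <stdint.h>

enum rte_eth_vhost_event {
	RTE_ETH_VHOST_EVENT_LINK_STATE_CHANGED,
};

typedef void (*rte_eth_vhost_cb_fn)(uint8_t port_id,
		enum rte_eth_vhost_event event, void *user_data);

/* Proposed entry point; the vhost PMD would invoke the registered
 * callback from its internal new_device()/destroy_device(). */
int rte_eth_vhost_register_callback(uint8_t port_id,
		rte_eth_vhost_cb_fn cb, void *user_data);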


On the other hand, I'm still wondering whether it is really necessary
to let the application call vhost functions like
rte_vhost_enable_guest_notification() with the vhost PMD.

	--yliu
  
Tetsuya Mukawa Dec. 18, 2015, 3:15 a.m. UTC | #2
On 2015/12/17 20:42, Yuanhan Liu wrote:
> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>> library APIs cannot be mapped to ethdev library APIs.
>> Becasue of this, in some cases, we still need to use vhost library APIs
>> for a port created by the vhost PMD.
>>
>> Currently, when virtio device is created and destroyed, vhost library
>> will call one of callback handlers. The vhost PMD need to use this
>> pair of callback handlers to know which virtio devices are connected
>> actually.
>> Because we can register only one pair of callbacks to vhost library, if
>> the PMD use it, DPDK applications cannot have a way to know the events.
>>
>> This may break legacy DPDK applications that uses vhost library. To prevent
>> it, this patch adds one more pair of callbacks to vhost library especially
>> for the vhost PMD.
>> With the patch, legacy applications can use the vhost PMD even if they need
>> additional specific handling for virtio device creation and destruction.
>>
>> For example, legacy application can call
>> rte_vhost_enable_guest_notification() in callbacks to change setting.
> TBH, I never liked it since the beginning. Introducing two callbacks
> for one event is a bit messy, and therefore error prone.

I agree with you.

> I have been thinking this occasionally last few weeks, and have came
> up something that we may introduce another layer callback based on
> the vhost pmd itself, by a new API:
>
> 	rte_eth_vhost_register_callback().
>
> And we then call those new callback inside the vhost pmd new_device()
> and vhost pmd destroy_device() implementations.
>
> And we could have same callbacks like vhost have, but I'm thinking
> that new_device() and destroy_device() doesn't sound like a good name
> to a PMD driver. Maybe a name like "link_state_changed" is better?
>
> What do you think of that?

Yes, "link_state_changed" would be good.

BTW, I thought it was OK for a DPDK app that uses the vhost PMD to call
vhost library APIs directly.
But you probably find that strange. Is this correct?

If so, how about implementing the legacy link status interrupt
mechanism in the vhost PMD?
For example, a DPDK app could register a callback handler as in
"examples/link_status_interrupt".

Also, if the app doesn't call vhost library APIs directly,
rte_eth_vhost_portid2vdev() will be unnecessary, because the app no
longer needs to handle the virtio device structure.

>
>
> On the other hand, I'm still thinking is that really necessary to let
> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
> with the vhost PMD driver?

The basic concept of my patch is that the vhost PMD provides the
features that the vhost library provides.

How about removing rte_vhost_enable_guest_notification() from the vhost
library?
(I am also not sure what the use cases are.)
If we can do this, the vhost PMD doesn't need to take care of it either.
Or, if rte_vhost_enable_guest_notification() will be removed in the
future, the vhost PMD can simply ignore it.


Let me sum up my thinking about your questions:
 - Change the concept of the patch so that vhost library APIs are not
called directly; these should be wrapped by ethdev APIs.
 - Remove rte_eth_vhost_portid2vdev(), because of the above concept
change.
 - Implement the legacy link status change interrupt in the vhost PMD
instead of using our own callback mechanism.
 - Check whether we can remove rte_vhost_enable_guest_notification()
from the vhost library.


Hi Xie,

Do you know the use cases of rte_vhost_enable_guest_notification()?

Thanks,
Tetsuya
  
Tetsuya Mukawa Dec. 18, 2015, 3:36 a.m. UTC | #3
On 2015/12/18 12:15, Tetsuya Mukawa wrote:
> On 2015/12/17 20:42, Yuanhan Liu wrote:
>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>> library APIs cannot be mapped to ethdev library APIs.
>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>> for a port created by the vhost PMD.
>>>
>>> Currently, when virtio device is created and destroyed, vhost library
>>> will call one of callback handlers. The vhost PMD need to use this
>>> pair of callback handlers to know which virtio devices are connected
>>> actually.
>>> Because we can register only one pair of callbacks to vhost library, if
>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>
>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>> it, this patch adds one more pair of callbacks to vhost library especially
>>> for the vhost PMD.
>>> With the patch, legacy applications can use the vhost PMD even if they need
>>> additional specific handling for virtio device creation and destruction.
>>>
>>> For example, legacy application can call
>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>> TBH, I never liked it since the beginning. Introducing two callbacks
>> for one event is a bit messy, and therefore error prone.
> I agree with you.
>
>> I have been thinking this occasionally last few weeks, and have came
>> up something that we may introduce another layer callback based on
>> the vhost pmd itself, by a new API:
>>
>> 	rte_eth_vhost_register_callback().
>>
>> And we then call those new callback inside the vhost pmd new_device()
>> and vhost pmd destroy_device() implementations.
>>
>> And we could have same callbacks like vhost have, but I'm thinking
>> that new_device() and destroy_device() doesn't sound like a good name
>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>
>> What do you think of that?
> Yes,  "link_state_changed" will be good.
>
> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> vhost library APIs directly.
> But probably you may feel strangeness about it. Is this correct?
>
> If so, how about implementing legacy status interrupt mechanism to vhost
> PMD?
> For example, an DPDK app can register callback handler like
> "examples/link_status_interrupt".

Addition:
In this case, we don't need a special pmd_priv field in the vhost
library. The vhost PMD will register the callbacks, and in these
callbacks the legacy interrupt mechanism will be triggered, so the
user can receive the interrupt from ethdev.

Tetsuya

> Also, if the app doesn't call vhost library APIs directly,
> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> need to handle virtio device structure anymore.
>
>>
>> On the other hand, I'm still thinking is that really necessary to let
>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>> with the vhost PMD driver?
> Basic concept of my patch is that vhost PMD will provides the features
> that vhost library provides.
>
> How about removing rte_vhost_enable_guest_notification() from "vhost
> library"?
> (I also not sure what are use cases)
> If we can do this, vhost PMD also doesn't need to take care of it.
> Or if rte_vhost_enable_guest_notification() will be removed in the
> future, vhost PMD is able to ignore it.
>
>
> Please let me correct up my thinking about your questions.
>  - Change concept of patch not to call vhost library APIs directly.
> These should be wrapped by ethdev APIs.
>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>  - Implement legacy status changed interrupt to vhost PMD instead of
> using own callback mechanism.
>  - Check if we can remove rte_vhost_enable_guest_notification() from
> vhost library.
>
>
> Hi Xie,
>
> Do you know the use cases of rte_vhost_enable_guest_notification()?
>
> Thanks,
> Tetsuya
  
Yuanhan Liu Dec. 18, 2015, 4:15 a.m. UTC | #4
On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/17 20:42, Yuanhan Liu wrote:
> > On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> >> The vhost PMD will be a wrapper of vhost library, but some of vhost
> >> library APIs cannot be mapped to ethdev library APIs.
> >> Becasue of this, in some cases, we still need to use vhost library APIs
> >> for a port created by the vhost PMD.
> >>
> >> Currently, when virtio device is created and destroyed, vhost library
> >> will call one of callback handlers. The vhost PMD need to use this
> >> pair of callback handlers to know which virtio devices are connected
> >> actually.
> >> Because we can register only one pair of callbacks to vhost library, if
> >> the PMD use it, DPDK applications cannot have a way to know the events.
> >>
> >> This may break legacy DPDK applications that uses vhost library. To prevent
> >> it, this patch adds one more pair of callbacks to vhost library especially
> >> for the vhost PMD.
> >> With the patch, legacy applications can use the vhost PMD even if they need
> >> additional specific handling for virtio device creation and destruction.
> >>
> >> For example, legacy application can call
> >> rte_vhost_enable_guest_notification() in callbacks to change setting.
> > TBH, I never liked it since the beginning. Introducing two callbacks
> > for one event is a bit messy, and therefore error prone.
> 
> I agree with you.
> 
> > I have been thinking this occasionally last few weeks, and have came
> > up something that we may introduce another layer callback based on
> > the vhost pmd itself, by a new API:
> >
> > 	rte_eth_vhost_register_callback().
> >
> > And we then call those new callback inside the vhost pmd new_device()
> > and vhost pmd destroy_device() implementations.
> >
> > And we could have same callbacks like vhost have, but I'm thinking
> > that new_device() and destroy_device() doesn't sound like a good name
> > to a PMD driver. Maybe a name like "link_state_changed" is better?
> >
> > What do you think of that?
> 
> Yes,  "link_state_changed" will be good.
> 
> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> vhost library APIs directly.
> But probably you may feel strangeness about it. Is this correct?

Unluckily, that's true :)

> 
> If so, how about implementing legacy status interrupt mechanism to vhost
> PMD?
> For example, an DPDK app can register callback handler like
> "examples/link_status_interrupt".
> 
> Also, if the app doesn't call vhost library APIs directly,
> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> need to handle virtio device structure anymore.
> 
> >
> >
> > On the other hand, I'm still thinking is that really necessary to let
> > the application be able to call vhost functions like rte_vhost_enable_guest_notification()
> > with the vhost PMD driver?
> 
> Basic concept of my patch is that vhost PMD will provides the features
> that vhost library provides.

I don't think that's necessary. Let's just treat it as a normal PMD,
having nothing to do with the vhost library.

> How about removing rte_vhost_enable_guest_notification() from "vhost
> library"?
> (I also not sure what are use cases)
> If we can do this, vhost PMD also doesn't need to take care of it.
> Or if rte_vhost_enable_guest_notification() will be removed in the
> future, vhost PMD is able to ignore it.

You could either call it in vhost-pmd (which you have already done) or
ignore it in vhost-pmd, but don't remove it from the vhost library.

> 
> Please let me correct up my thinking about your questions.
>  - Change concept of patch not to call vhost library APIs directly.
> These should be wrapped by ethdev APIs.
>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>  - Implement legacy status changed interrupt to vhost PMD instead of
> using own callback mechanism.
>  - Check if we can remove rte_vhost_enable_guest_notification() from
> vhost library.

So, how about making it __fairly__ simple as the first step, to get it
merged easily: we don't assume the applications will call any vhost
library functions any more, so we don't need the callback, and we don't
need rte_eth_vhost_portid2vdev() either. Again, just let it be a fairly
normal (nothing special) PMD. (UNLESS there is a real must, which I
don't see so far.)

Tetsuya, what do you think of that then?

> 
> Hi Xie,
> 
> Do you know the use cases of rte_vhost_enable_guest_notification()?

Setting the arg to 0 avoids the guest kicking the virtqueue, which
is good for performance, and we should keep it.
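
For example (a sketch; the handler and queue ids are assumptions, with
the signature as in rte_virtio_net.h of this era):

#include <rte_virtio_net.h>

static int
new_device(struct virtio_net *dev)
{
	/* enable = 0: the guest stops kicking these virtqueues. */
	rte_vhost_enable_guest_notification(dev, VIRTIO_RXQ, 0);
	rte_vhost_enable_guest_notification(dev, VIRTIO_TXQ, 0);
	return 0;
}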

	--yliu
  
Tetsuya Mukawa Dec. 18, 2015, 4:28 a.m. UTC | #5
On 2015/12/18 13:15, Yuanhan Liu wrote:
> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>> library APIs cannot be mapped to ethdev library APIs.
>>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>>> for a port created by the vhost PMD.
>>>>
>>>> Currently, when virtio device is created and destroyed, vhost library
>>>> will call one of callback handlers. The vhost PMD need to use this
>>>> pair of callback handlers to know which virtio devices are connected
>>>> actually.
>>>> Because we can register only one pair of callbacks to vhost library, if
>>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>>
>>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>>> it, this patch adds one more pair of callbacks to vhost library especially
>>>> for the vhost PMD.
>>>> With the patch, legacy applications can use the vhost PMD even if they need
>>>> additional specific handling for virtio device creation and destruction.
>>>>
>>>> For example, legacy application can call
>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>> for one event is a bit messy, and therefore error prone.
>> I agree with you.
>>
>>> I have been thinking this occasionally last few weeks, and have came
>>> up something that we may introduce another layer callback based on
>>> the vhost pmd itself, by a new API:
>>>
>>> 	rte_eth_vhost_register_callback().
>>>
>>> And we then call those new callback inside the vhost pmd new_device()
>>> and vhost pmd destroy_device() implementations.
>>>
>>> And we could have same callbacks like vhost have, but I'm thinking
>>> that new_device() and destroy_device() doesn't sound like a good name
>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>
>>> What do you think of that?
>> Yes,  "link_state_changed" will be good.
>>
>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>> vhost library APIs directly.
>> But probably you may feel strangeness about it. Is this correct?
> Unluckily, that's true :)
>
>> If so, how about implementing legacy status interrupt mechanism to vhost
>> PMD?
>> For example, an DPDK app can register callback handler like
>> "examples/link_status_interrupt".
>>
>> Also, if the app doesn't call vhost library APIs directly,
>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>> need to handle virtio device structure anymore.
>>
>>>
>>> On the other hand, I'm still thinking is that really necessary to let
>>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>>> with the vhost PMD driver?
>> Basic concept of my patch is that vhost PMD will provides the features
>> that vhost library provides.
> I don't think that's necessary. Let's just treat it as a normal pmd
> driver, having nothing to do with vhost library.
>
>> How about removing rte_vhost_enable_guest_notification() from "vhost
>> library"?
>> (I also not sure what are use cases)
>> If we can do this, vhost PMD also doesn't need to take care of it.
>> Or if rte_vhost_enable_guest_notification() will be removed in the
>> future, vhost PMD is able to ignore it.
> You could either call it in vhost-pmd (which you already have done that),
> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>
>> Please let me correct up my thinking about your questions.
>>  - Change concept of patch not to call vhost library APIs directly.
>> These should be wrapped by ethdev APIs.
>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>>  - Implement legacy status changed interrupt to vhost PMD instead of
>> using own callback mechanism.
>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>> vhost library.
> So, how about making it __fare__ simple as the first step, to get merged
> easily, that we don't assume the applications will call any vhost library
> functions any more, so that we don't need the callback, and we don't need
> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
> normal (nothing special) pmd driver.  (UNLESS, there is a real must, which
> I don't see so far).
>
> Tetsuya, what do you think of that then?

I agree with you, but I will wait a few days, because if someone wants
to use it from the vhost PMD, they will probably provide use cases.
If there are no use cases, let's do as above.

Thanks,
Tetsuya
  
Huawei Xie Dec. 18, 2015, 10:03 a.m. UTC | #6
On 12/18/2015 12:15 PM, Yuanhan Liu wrote:
> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>> library APIs cannot be mapped to ethdev library APIs.
>>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>>> for a port created by the vhost PMD.
>>>>
>>>> Currently, when virtio device is created and destroyed, vhost library
>>>> will call one of callback handlers. The vhost PMD need to use this
>>>> pair of callback handlers to know which virtio devices are connected
>>>> actually.
>>>> Because we can register only one pair of callbacks to vhost library, if
>>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>>
>>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>>> it, this patch adds one more pair of callbacks to vhost library especially
>>>> for the vhost PMD.
>>>> With the patch, legacy applications can use the vhost PMD even if they need
>>>> additional specific handling for virtio device creation and destruction.
>>>>
>>>> For example, legacy application can call
>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>> for one event is a bit messy, and therefore error prone.
>> I agree with you.
>>
>>> I have been thinking this occasionally last few weeks, and have came
>>> up something that we may introduce another layer callback based on
>>> the vhost pmd itself, by a new API:
>>>
>>> 	rte_eth_vhost_register_callback().
>>>
>>> And we then call those new callback inside the vhost pmd new_device()
>>> and vhost pmd destroy_device() implementations.
>>>
>>> And we could have same callbacks like vhost have, but I'm thinking
>>> that new_device() and destroy_device() doesn't sound like a good name
>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>
>>> What do you think of that?
>> Yes,  "link_state_changed" will be good.
>>
>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>> vhost library APIs directly.
>> But probably you may feel strangeness about it. Is this correct?
> Unluckily, that's true :)
>
>> If so, how about implementing legacy status interrupt mechanism to vhost
>> PMD?
>> For example, an DPDK app can register callback handler like
>> "examples/link_status_interrupt".
>>
>> Also, if the app doesn't call vhost library APIs directly,
>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>> need to handle virtio device structure anymore.
>>
>>>
>>> On the other hand, I'm still thinking is that really necessary to let
>>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>>> with the vhost PMD driver?
>> Basic concept of my patch is that vhost PMD will provides the features
>> that vhost library provides.
> I don't think that's necessary. Let's just treat it as a normal pmd
> driver, having nothing to do with vhost library.
>
>> How about removing rte_vhost_enable_guest_notification() from "vhost
>> library"?
>> (I also not sure what are use cases)
>> If we can do this, vhost PMD also doesn't need to take care of it.
>> Or if rte_vhost_enable_guest_notification() will be removed in the
>> future, vhost PMD is able to ignore it.
> You could either call it in vhost-pmd (which you already have done that),
> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>
>> Please let me correct up my thinking about your questions.
>>  - Change concept of patch not to call vhost library APIs directly.
>> These should be wrapped by ethdev APIs.
>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>>  - Implement legacy status changed interrupt to vhost PMD instead of
>> using own callback mechanism.
>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>> vhost library.
> So, how about making it __fare__ simple as the first step, to get merged
> easily, that we don't assume the applications will call any vhost library
> functions any more, so that we don't need the callback, and we don't need
> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
> normal (nothing special) pmd driver.  (UNLESS, there is a real must, which
> I don't see so far).
>
> Tetsuya, what do you think of that then?
>
>> Hi Xie,
>>
>> Do you know the use cases of rte_vhost_enable_guest_notification()?
If vhost runs in loop (polling) mode, it doesn't need to be notified.
You have wrapped vhost as the PMD, which is nice for OVS integration.
If we require that all PMDs can be polled by select/poll, then we could
use this API for the vhost PMD and wait on the kickfd. For physical
NICs, we could wait on the fd for the user space interrupt.
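
A sketch of that wait, assuming the public vhost_virtqueue layout of
this release (kickfd is the eventfd the guest kicks):

#include <poll.h>
#include <rte_virtio_net.h>

static int
wait_for_kick(struct virtio_net *dev, uint32_t queue_id, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = dev->virtqueue[queue_id]->kickfd,
		.events = POLLIN,
	};

	/* Returns > 0 once the guest has kicked the queue. */
	return poll(&pfd, 1, timeout_ms);
}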
> Setting the arg to 0 avoids the guest kicking the virtqueue, which
> is good for performance, and we should keep it.
>
> 	--yliu
>
  
Rich Lane Dec. 18, 2015, 6:01 p.m. UTC | #7
I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
few ways:

1. new_device/destroy_device: Link state change (will be covered by the
link status interrupt).
2. new_device: Add first queue to datapath.
3. vring_state_changed: Add/remove queue to datapath.
4. destroy_device: Remove all queues (vring_state_changed is not called
when qemu is killed).
5. new_device and struct virtio_net: Determine NUMA node of the VM.

The vring_state_changed callback is necessary because the VM might not be
using the maximum number of RX queues. If I boot Linux in the VM it will
start out using one RX queue, which can be changed with ethtool. The DPDK
app in the host needs to be notified that it can start sending traffic to
the new queue.

The vring_state_changed callback is also useful for guest TX queues to
avoid reading from an inactive queue.
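
For reference, a sketch of usage 3 above (the callback signature
follows rte_virtio_net.h; the datapath helpers are hypothetical):

#include <stdint.h>
#include <rte_virtio_net.h>

/* Hypothetical datapath hooks of the application. */
void datapath_add_queue(struct virtio_net *dev, uint16_t queue_id);
void datapath_remove_queue(struct virtio_net *dev, uint16_t queue_id);

static int
vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
{
	/* queue_id alternates RX/TX per queue pair (VIRTIO_RXQ/VIRTIO_TXQ). */
	if (enable)
		datapath_add_queue(dev, queue_id);
	else
		datapath_remove_queue(dev, queue_id);
	return 0;
}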

API I'd like to have:

1. Link status interrupt.
2. New queue_state_changed callback. Unlike vring_state_changed this should
cover the first queue at new_device and removal of all queues at
destroy_device.
3. Per-queue or per-device NUMA node info.

On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> On 2015/12/18 13:15, Yuanhan Liu wrote:
> > On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/12/17 20:42, Yuanhan Liu wrote:
> >>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> >>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
> >>>> library APIs cannot be mapped to ethdev library APIs.
> >>>> Becasue of this, in some cases, we still need to use vhost library
> APIs
> >>>> for a port created by the vhost PMD.
> >>>>
> >>>> Currently, when virtio device is created and destroyed, vhost library
> >>>> will call one of callback handlers. The vhost PMD need to use this
> >>>> pair of callback handlers to know which virtio devices are connected
> >>>> actually.
> >>>> Because we can register only one pair of callbacks to vhost library,
> if
> >>>> the PMD use it, DPDK applications cannot have a way to know the
> events.
> >>>>
> >>>> This may break legacy DPDK applications that uses vhost library. To
> prevent
> >>>> it, this patch adds one more pair of callbacks to vhost library
> especially
> >>>> for the vhost PMD.
> >>>> With the patch, legacy applications can use the vhost PMD even if
> they need
> >>>> additional specific handling for virtio device creation and
> destruction.
> >>>>
> >>>> For example, legacy application can call
> >>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
> >>> TBH, I never liked it since the beginning. Introducing two callbacks
> >>> for one event is a bit messy, and therefore error prone.
> >> I agree with you.
> >>
> >>> I have been thinking this occasionally last few weeks, and have came
> >>> up something that we may introduce another layer callback based on
> >>> the vhost pmd itself, by a new API:
> >>>
> >>>     rte_eth_vhost_register_callback().
> >>>
> >>> And we then call those new callback inside the vhost pmd new_device()
> >>> and vhost pmd destroy_device() implementations.
> >>>
> >>> And we could have same callbacks like vhost have, but I'm thinking
> >>> that new_device() and destroy_device() doesn't sound like a good name
> >>> to a PMD driver. Maybe a name like "link_state_changed" is better?
> >>>
> >>> What do you think of that?
> >> Yes,  "link_state_changed" will be good.
> >>
> >> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> >> vhost library APIs directly.
> >> But probably you may feel strangeness about it. Is this correct?
> > Unluckily, that's true :)
> >
> >> If so, how about implementing legacy status interrupt mechanism to vhost
> >> PMD?
> >> For example, an DPDK app can register callback handler like
> >> "examples/link_status_interrupt".
> >>
> >> Also, if the app doesn't call vhost library APIs directly,
> >> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> >> need to handle virtio device structure anymore.
> >>
> >>>
> >>> On the other hand, I'm still thinking is that really necessary to let
> >>> the application be able to call vhost functions like
> rte_vhost_enable_guest_notification()
> >>> with the vhost PMD driver?
> >> Basic concept of my patch is that vhost PMD will provides the features
> >> that vhost library provides.
> > I don't think that's necessary. Let's just treat it as a normal pmd
> > driver, having nothing to do with vhost library.
> >
> >> How about removing rte_vhost_enable_guest_notification() from "vhost
> >> library"?
> >> (I also not sure what are use cases)
> >> If we can do this, vhost PMD also doesn't need to take care of it.
> >> Or if rte_vhost_enable_guest_notification() will be removed in the
> >> future, vhost PMD is able to ignore it.
> > You could either call it in vhost-pmd (which you already have done that),
> > or ignore it in vhost-pmd, but dont' remove it from vhost library.
> >
> >> Please let me correct up my thinking about your questions.
> >>  - Change concept of patch not to call vhost library APIs directly.
> >> These should be wrapped by ethdev APIs.
> >>  - Remove rte_eth_vhost_portid2vdev(), because of above concept
> changing.
> >>  - Implement legacy status changed interrupt to vhost PMD instead of
> >> using own callback mechanism.
> >>  - Check if we can remove rte_vhost_enable_guest_notification() from
> >> vhost library.
> > So, how about making it __fare__ simple as the first step, to get merged
> > easily, that we don't assume the applications will call any vhost library
> > functions any more, so that we don't need the callback, and we don't need
> > the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
> > normal (nothing special) pmd driver.  (UNLESS, there is a real must,
> which
> > I don't see so far).
> >
> > Tetsuya, what do you think of that then?
>
> I agree with you. But will wait a few days.
> Because if someone wants to use it from vhost PMD, they probably will
> provides use cases.
> And if there are no use cases, let's do like above.
>
> Thanks,
> Tetsuya
>
  
Tetsuya Mukawa Dec. 21, 2015, 2:10 a.m. UTC | #8
On 2015/12/18 19:03, Xie, Huawei wrote:
> On 12/18/2015 12:15 PM, Yuanhan Liu wrote:
>> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>>> library APIs cannot be mapped to ethdev library APIs.
>>>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>>>> for a port created by the vhost PMD.
>>>>>
>>>>> Currently, when virtio device is created and destroyed, vhost library
>>>>> will call one of callback handlers. The vhost PMD need to use this
>>>>> pair of callback handlers to know which virtio devices are connected
>>>>> actually.
>>>>> Because we can register only one pair of callbacks to vhost library, if
>>>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>>>
>>>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>>>> it, this patch adds one more pair of callbacks to vhost library especially
>>>>> for the vhost PMD.
>>>>> With the patch, legacy applications can use the vhost PMD even if they need
>>>>> additional specific handling for virtio device creation and destruction.
>>>>>
>>>>> For example, legacy application can call
>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>>> for one event is a bit messy, and therefore error prone.
>>> I agree with you.
>>>
>>>> I have been thinking this occasionally last few weeks, and have came
>>>> up something that we may introduce another layer callback based on
>>>> the vhost pmd itself, by a new API:
>>>>
>>>> 	rte_eth_vhost_register_callback().
>>>>
>>>> And we then call those new callback inside the vhost pmd new_device()
>>>> and vhost pmd destroy_device() implementations.
>>>>
>>>> And we could have same callbacks like vhost have, but I'm thinking
>>>> that new_device() and destroy_device() doesn't sound like a good name
>>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>>
>>>> What do you think of that?
>>> Yes,  "link_state_changed" will be good.
>>>
>>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>>> vhost library APIs directly.
>>> But probably you may feel strangeness about it. Is this correct?
>> Unluckily, that's true :)
>>
>>> If so, how about implementing legacy status interrupt mechanism to vhost
>>> PMD?
>>> For example, an DPDK app can register callback handler like
>>> "examples/link_status_interrupt".
>>>
>>> Also, if the app doesn't call vhost library APIs directly,
>>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>>> need to handle virtio device structure anymore.
>>>
>>>> On the other hand, I'm still thinking is that really necessary to let
>>>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>>>> with the vhost PMD driver?
>>> Basic concept of my patch is that vhost PMD will provides the features
>>> that vhost library provides.
>> I don't think that's necessary. Let's just treat it as a normal pmd
>> driver, having nothing to do with vhost library.
>>
>>> How about removing rte_vhost_enable_guest_notification() from "vhost
>>> library"?
>>> (I also not sure what are use cases)
>>> If we can do this, vhost PMD also doesn't need to take care of it.
>>> Or if rte_vhost_enable_guest_notification() will be removed in the
>>> future, vhost PMD is able to ignore it.
>> You could either call it in vhost-pmd (which you already have done that),
>> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>>
>>> Please let me correct up my thinking about your questions.
>>>  - Change concept of patch not to call vhost library APIs directly.
>>> These should be wrapped by ethdev APIs.
>>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>>>  - Implement legacy status changed interrupt to vhost PMD instead of
>>> using own callback mechanism.
>>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>>> vhost library.
>> So, how about making it __fare__ simple as the first step, to get merged
>> easily, that we don't assume the applications will call any vhost library
>> functions any more, so that we don't need the callback, and we don't need
>> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
>> normal (nothing special) pmd driver.  (UNLESS, there is a real must, which
>> I don't see so far).
>>
>> Tetsuya, what do you think of that then?
>>
>>> Hi Xie,
>>>
>>> Do you know the use cases of rte_vhost_enable_guest_notification()?
> If vhost runs in loop mode, it doesn't need to be notified. You have
> wrapped vhost as the PMD, which is nice for OVS integration. If we
> require that all PMDs could be polled by select/poll, then we could use
> this API for vhost PMD, and wait on the kickfd. For physical nics, we
> could wait on the fd for user space interrupt.

Thanks for the clarification.
I will ignore the function for the first release of the vhost PMD.

Thanks,
Tetsuya


>> Setting the arg to 0 avoids the guest kicking the virtqueue, which
>> is good for performance, and we should keep it.
>>
>> 	--yliu
>>
  
Tetsuya Mukawa Dec. 21, 2015, 2:10 a.m. UTC | #9
On 2015/12/19 3:01, Rich Lane wrote:
> I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
> few ways:
>
> 1. new_device/destroy_device: Link state change (will be covered by the
> link status interrupt).
> 2. new_device: Add first queue to datapath.
> 3. vring_state_changed: Add/remove queue to datapath.
> 4. destroy_device: Remove all queues (vring_state_changed is not called
> when qemu is killed).
> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>
> The vring_state_changed callback is necessary because the VM might not be
> using the maximum number of RX queues. If I boot Linux in the VM it will
> start out using one RX queue, which can be changed with ethtool. The DPDK
> app in the host needs to be notified that it can start sending traffic to
> the new queue.
>
> The vring_state_changed callback is also useful for guest TX queues to
> avoid reading from an inactive queue.
>
> API I'd like to have:
>
> 1. Link status interrupt.
> 2. New queue_state_changed callback. Unlike vring_state_changed this should
> cover the first queue at new_device and removal of all queues at
> destroy_device.
> 3. Per-queue or per-device NUMA node info.

Hi Rich and Yuanhan,

As Rich described, some users need more information when the interrupt
comes, and the virtio_net structure contains that information.

I guess it's very similar to interrupt handling on normal hardware.
First an interrupt comes, then the interrupt handler checks the status
register of the device to find out what actually happened.
In the vhost PMD case, reading the status register equals reading the
virtio_net structure.

So how about the specification below?

1. The link status interrupt of the vhost PMD occurs when the
new_device, destroy_device and vring_state_changed events happen.
2. The vhost PMD provides a function to let users know the virtio_net
structure of the interrupted port.
   (Probably almost the same as "rte_eth_vhost_portid2vdev" that I
described in "[PATCH v5 3/3] vhost: Add helper function to convert port
id to virtio device pointer".)

I guess what kind of information users need will depend on their
environments, so just providing the virtio_net structure may be good.
What do you think?
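
A sketch of that specification; rte_eth_vhost_portid2vdev() is the
helper from [PATCH v5 3/3], its prototype here is an assumption, and
the handler body is illustrative:

#include <rte_ethdev.h>
#include <rte_virtio_net.h>

struct virtio_net *rte_eth_vhost_portid2vdev(uint8_t port_id);

static void
vhost_lsc_handler(uint8_t port_id, enum rte_eth_event_type type,
		void *param)
{
	struct virtio_net *dev;

	(void)param;
	if (type != RTE_ETH_EVENT_INTR_LSC)
		return;

	/* Like reading a status register on real hardware. */
	dev = rte_eth_vhost_portid2vdev(port_id);
	if (dev == NULL)
		return; /* the device is already gone */

	/* Inspect dev (e.g. which virtqueues are enabled) to see what
	 * actually happened. */
}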

Tetsuya,

>
> On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> On 2015/12/18 13:15, Yuanhan Liu wrote:
>>> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>>>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>>>> library APIs cannot be mapped to ethdev library APIs.
>>>>>> Becasue of this, in some cases, we still need to use vhost library
>> APIs
>>>>>> for a port created by the vhost PMD.
>>>>>>
>>>>>> Currently, when virtio device is created and destroyed, vhost library
>>>>>> will call one of callback handlers. The vhost PMD need to use this
>>>>>> pair of callback handlers to know which virtio devices are connected
>>>>>> actually.
>>>>>> Because we can register only one pair of callbacks to vhost library,
>> if
>>>>>> the PMD use it, DPDK applications cannot have a way to know the
>> events.
>>>>>> This may break legacy DPDK applications that uses vhost library. To
>> prevent
>>>>>> it, this patch adds one more pair of callbacks to vhost library
>> especially
>>>>>> for the vhost PMD.
>>>>>> With the patch, legacy applications can use the vhost PMD even if
>> they need
>>>>>> additional specific handling for virtio device creation and
>> destruction.
>>>>>> For example, legacy application can call
>>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>>>> for one event is a bit messy, and therefore error prone.
>>>> I agree with you.
>>>>
>>>>> I have been thinking this occasionally last few weeks, and have came
>>>>> up something that we may introduce another layer callback based on
>>>>> the vhost pmd itself, by a new API:
>>>>>
>>>>>     rte_eth_vhost_register_callback().
>>>>>
>>>>> And we then call those new callback inside the vhost pmd new_device()
>>>>> and vhost pmd destroy_device() implementations.
>>>>>
>>>>> And we could have same callbacks like vhost have, but I'm thinking
>>>>> that new_device() and destroy_device() doesn't sound like a good name
>>>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>>>
>>>>> What do you think of that?
>>>> Yes,  "link_state_changed" will be good.
>>>>
>>>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>>>> vhost library APIs directly.
>>>> But probably you may feel strangeness about it. Is this correct?
>>> Unluckily, that's true :)
>>>
>>>> If so, how about implementing legacy status interrupt mechanism to vhost
>>>> PMD?
>>>> For example, an DPDK app can register callback handler like
>>>> "examples/link_status_interrupt".
>>>>
>>>> Also, if the app doesn't call vhost library APIs directly,
>>>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>>>> need to handle virtio device structure anymore.
>>>>
>>>>> On the other hand, I'm still thinking is that really necessary to let
>>>>> the application be able to call vhost functions like
>> rte_vhost_enable_guest_notification()
>>>>> with the vhost PMD driver?
>>>> Basic concept of my patch is that vhost PMD will provides the features
>>>> that vhost library provides.
>>> I don't think that's necessary. Let's just treat it as a normal pmd
>>> driver, having nothing to do with vhost library.
>>>
>>>> How about removing rte_vhost_enable_guest_notification() from "vhost
>>>> library"?
>>>> (I also not sure what are use cases)
>>>> If we can do this, vhost PMD also doesn't need to take care of it.
>>>> Or if rte_vhost_enable_guest_notification() will be removed in the
>>>> future, vhost PMD is able to ignore it.
>>> You could either call it in vhost-pmd (which you already have done that),
>>> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>>>
>>>> Please let me correct up my thinking about your questions.
>>>>  - Change concept of patch not to call vhost library APIs directly.
>>>> These should be wrapped by ethdev APIs.
>>>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept
>> changing.
>>>>  - Implement legacy status changed interrupt to vhost PMD instead of
>>>> using own callback mechanism.
>>>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>>>> vhost library.
>>> So, how about making it __fare__ simple as the first step, to get merged
>>> easily, that we don't assume the applications will call any vhost library
>>> functions any more, so that we don't need the callback, and we don't need
>>> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
>>> normal (nothing special) pmd driver.  (UNLESS, there is a real must,
>> which
>>> I don't see so far).
>>>
>>> Tetsuya, what do you think of that then?
>> I agree with you. But will wait a few days.
>> Because if someone wants to use it from vhost PMD, they probably will
>> provides use cases.
>> And if there are no use cases, let's do like above.
>>
>> Thanks,
>> Tetsuya
>>
  
Yuanhan Liu Dec. 22, 2015, 3:41 a.m. UTC | #10
On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a few
> ways:

Rich, thanks for the info!

> 
> 1. new_device/destroy_device: Link state change (will be covered by the link
> status interrupt).
> 2. new_device: Add first queue to datapath.

I'm wondering why vring_state_changed() is not used, as it will also be
triggered at the beginning, when the default queue (the first queue) is
enabled.

> 3. vring_state_changed: Add/remove queue to datapath.
> 4. destroy_device: Remove all queues (vring_state_changed is not called when
> qemu is killed).

I had a plan to invoke vring_state_changed() to disable all vrings
when destroy_device() is called.

> 5. new_device and struct virtio_net: Determine NUMA node of the VM.

You can get the 'struct virtio_net' dev from all above callbacks.
 
> 
> The vring_state_changed callback is necessary because the VM might not be using
> the maximum number of RX queues. If I boot Linux in the VM it will start out
> using one RX queue, which can be changed with ethtool. The DPDK app in the host
> needs to be notified that it can start sending traffic to the new queue.
> 
> The vring_state_changed callback is also useful for guest TX queues to avoid
> reading from an inactive queue.
> 
> API I'd like to have:
> 
> 1. Link status interrupt.

To the vhost PMD, new_device()/destroy_device() equals the link status
interrupt, where new_device() is link up and destroy_device() is link
down.


> 2. New queue_state_changed callback. Unlike vring_state_changed this should
> cover the first queue at new_device and removal of all queues at
> destroy_device.

As stated above, vring_state_changed() should be able to do that, except
the one on destroy_device(), which is not done yet.

> 3. Per-queue or per-device NUMA node info.

You can query the NUMA node info implicitly by get_mempolicy(); check
numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
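
A sketch of that implicit query, mirroring what numa_realloc() does
(the wrapper name is mine):

#include <numaif.h>
#include <rte_virtio_net.h>

static int
vm_numa_node(struct virtio_net *dev)
{
	int node = -1;

	/* MPOL_F_NODE | MPOL_F_ADDR: return the node backing 'dev'. */
	if (get_mempolicy(&node, NULL, 0, dev,
			MPOL_F_NODE | MPOL_F_ADDR) < 0)
		return -1;
	return node;
}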

	--yliu
> 
> On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
> 
>     On 2015/12/18 13:15, Yuanhan Liu wrote:
>     > On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>     >> On 2015/12/17 20:42, Yuanhan Liu wrote:
>     >>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>     >>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>     >>>> library APIs cannot be mapped to ethdev library APIs.
>     >>>> Becasue of this, in some cases, we still need to use vhost library
>     APIs
>     >>>> for a port created by the vhost PMD.
>     >>>>
>     >>>> Currently, when virtio device is created and destroyed, vhost library
>     >>>> will call one of callback handlers. The vhost PMD need to use this
>     >>>> pair of callback handlers to know which virtio devices are connected
>     >>>> actually.
>     >>>> Because we can register only one pair of callbacks to vhost library,
>     if
>     >>>> the PMD use it, DPDK applications cannot have a way to know the
>     events.
>     >>>>
>     >>>> This may break legacy DPDK applications that uses vhost library. To
>     prevent
>     >>>> it, this patch adds one more pair of callbacks to vhost library
>     especially
>     >>>> for the vhost PMD.
>     >>>> With the patch, legacy applications can use the vhost PMD even if they
>     need
>     >>>> additional specific handling for virtio device creation and
>     destruction.
>     >>>>
>     >>>> For example, legacy application can call
>     >>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>     >>> TBH, I never liked it since the beginning. Introducing two callbacks
>     >>> for one event is a bit messy, and therefore error prone.
>     >> I agree with you.
>     >>
>     >>> I have been thinking this occasionally last few weeks, and have came
>     >>> up something that we may introduce another layer callback based on
>     >>> the vhost pmd itself, by a new API:
>     >>>
>     >>>     rte_eth_vhost_register_callback().
>     >>>
>     >>> And we then call those new callback inside the vhost pmd new_device()
>     >>> and vhost pmd destroy_device() implementations.
>     >>>
>     >>> And we could have same callbacks like vhost have, but I'm thinking
>     >>> that new_device() and destroy_device() doesn't sound like a good name
>     >>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>     >>>
>     >>> What do you think of that?
>     >> Yes,  "link_state_changed" will be good.
>     >>
>     >> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>     >> vhost library APIs directly.
>     >> But probably you may feel strangeness about it. Is this correct?
>     > Unluckily, that's true :)
>     >
>     >> If so, how about implementing legacy status interrupt mechanism to vhost
>     >> PMD?
>     >> For example, an DPDK app can register callback handler like
>     >> "examples/link_status_interrupt".
>     >>
>     >> Also, if the app doesn't call vhost library APIs directly,
>     >> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>     >> need to handle virtio device structure anymore.
>     >>
>     >>>
>     >>> On the other hand, I'm still thinking is that really necessary to let
>     >>> the application be able to call vhost functions like
>     rte_vhost_enable_guest_notification()
>     >>> with the vhost PMD driver?
>     >> Basic concept of my patch is that vhost PMD will provides the features
>     >> that vhost library provides.
>     > I don't think that's necessary. Let's just treat it as a normal pmd
>     > driver, having nothing to do with vhost library.
>     >
>     >> How about removing rte_vhost_enable_guest_notification() from "vhost
>     >> library"?
>     >> (I also not sure what are use cases)
>     >> If we can do this, vhost PMD also doesn't need to take care of it.
>     >> Or if rte_vhost_enable_guest_notification() will be removed in the
>     >> future, vhost PMD is able to ignore it.
>     > You could either call it in vhost-pmd (which you already have done that),
>     > or ignore it in vhost-pmd, but dont' remove it from vhost library.
>     >
>     >> Please let me correct up my thinking about your questions.
>     >>  - Change concept of patch not to call vhost library APIs directly.
>     >> These should be wrapped by ethdev APIs.
>     >>  - Remove rte_eth_vhost_portid2vdev(), because of above concept
>     changing.
>     >>  - Implement legacy status changed interrupt to vhost PMD instead of
>     >> using own callback mechanism.
>     >>  - Check if we can remove rte_vhost_enable_guest_notification() from
>     >> vhost library.
>     > So, how about making it __fare__ simple as the first step, to get merged
>     > easily, that we don't assume the applications will call any vhost library
>     > functions any more, so that we don't need the callback, and we don't need
>     > the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
>     > normal (nothing special) pmd driver.  (UNLESS, there is a real must,
>     which
>     > I don't see so far).
>     >
>     > Tetsuya, what do you think of that then?
> 
>     I agree with you. But will wait a few days.
>     Because if someone wants to use it from vhost PMD, they probably will
>     provides use cases.
>     And if there are no use cases, let's do like above.
> 
>     Thanks,
>     Tetsuya
> 
>
  
Yuanhan Liu Dec. 22, 2015, 4:36 a.m. UTC | #11
On Mon, Dec 21, 2015 at 11:10:10AM +0900, Tetsuya Mukawa wrote:
> 
> On 2015/12/19 3:01, Rich Lane wrote:
> > I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
> > few ways:
> >
> > 1. new_device/destroy_device: Link state change (will be covered by the
> > link status interrupt).
> > 2. new_device: Add first queue to datapath.
> > 3. vring_state_changed: Add/remove queue to datapath.
> > 4. destroy_device: Remove all queues (vring_state_changed is not called
> > when qemu is killed).
> > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >
> > The vring_state_changed callback is necessary because the VM might not be
> > using the maximum number of RX queues. If I boot Linux in the VM it will
> > start out using one RX queue, which can be changed with ethtool. The DPDK
> > app in the host needs to be notified that it can start sending traffic to
> > the new queue.
> >
> > The vring_state_changed callback is also useful for guest TX queues to
> > avoid reading from an inactive queue.
> >
> > API I'd like to have:
> >
> > 1. Link status interrupt.
> > 2. New queue_state_changed callback. Unlike vring_state_changed this should
> > cover the first queue at new_device and removal of all queues at
> > destroy_device.
> > 3. Per-queue or per-device NUMA node info.
> 
> Hi Rich and Yuanhan,
> 
> As Rich described, some users needs more information when the interrupts
> comes.
> And the virtio_net structure contains the information.
> 
> I guess it's very similar to interrupt handling of normal hardware.
> First, a interrupt comes, then an interrupt handler checks status
> register of the device to know actually what was happened.
> In vhost PMD case, reading status register equals reading virtio_net
> structure.
> 
> So how about below specification?
> 
> 1. The link status interrupt of vhost PMD will occurs when new_device,
> destroy_device and vring_state_changed events are happened.
> 2. Vhost PMD provides a function to let the users know virtio_net
> structure of the interrupted port.
>    (Probably almost same as "rte_eth_vhost_portid2vdev" that I described
> in "[PATCH v5 3/3] vhost: Add helper function to convert port id to
> virtio device pointer")

That is one option: wrap everything into the link status interrupt
handler, and let it query the virtio_net structure to know what exactly
happened, and then take the proper action.

With that, we could totally get rid of the "two sets of vhost callbacks".
The interface is also clean. However, there is a drawback: it's not
that extensible. What if vhost introduces a new callback that does not
literally belong to a link status interrupt event?

The other option is to introduce a new set of callbacks (not based on
vhost, but based on vhost-pmd, as I suggested before). Here we could
rename the callbacks to make them look more reasonable for a PMD:
say, remove new_device()/destroy_device() and combine them into one
callback, link_status_changed (sketched below).

The good thing about that is that it's extensible; we could easily add
a new callback when there is a new one in vhost. However, it's not as
clean as the first option. Besides that, having two sets of callbacks
for vhost has always seemed weird to me.
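
A sketch of this option; every name below is a proposal, not an
existing API:

#include <stdint.h>

struct rte_eth_vhost_ops {
	/* Replaces new_device()/destroy_device(); 'up' tells which. */
	void (*link_status_changed)(uint8_t port_id, int up);
	/* Future vhost callbacks would be appended here, e.g.: */
	void (*vring_state_changed)(uint8_t port_id, uint16_t queue_id,
			int enable);
};

int rte_eth_vhost_register_callback(uint8_t port_id,
		const struct rte_eth_vhost_ops *ops);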

Thoughts, comments?

	--yliu
  
Rich Lane Dec. 22, 2015, 4:47 a.m. UTC | #12
On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> > I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> in a few
> > ways:
>
> Rich, thanks for the info!
>
> >
> > 1. new_device/destroy_device: Link state change (will be covered by the
> link
> > status interrupt).
> > 2. new_device: Add first queue to datapath.
>
> I'm wondering why vring_state_changed() is not used, as it will also be
> triggered at the beginning, when the default queue (the first queue) is
> enabled.
>

Turns out I'd misread the code and it's already using the
vring_state_changed callback for the first queue. Not sure if this is
intentional, but vring_state_changed is called for the first queue
before new_device.


> > 3. vring_state_changed: Add/remove queue to datapath.
> > 4. destroy_device: Remove all queues (vring_state_changed is not called
> when
> > qemu is killed).
>
> I had a plan to invoke vring_state_changed() to disable all vrings
> when destroy_device() is called.
>

That would be good.


> > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>
> You can get the 'struct virtio_net' dev from all above callbacks.



> 1. Link status interrupt.
>
> To vhost pmd, new_device()/destroy_device() equals to the link status
> interrupt, where new_device() is a link up, and destroy_device() is link
> down().
>
>
> > 2. New queue_state_changed callback. Unlike vring_state_changed this
> should
> > cover the first queue at new_device and removal of all queues at
> > destroy_device.
>
> As stated above, vring_state_changed() should be able to do that, except
> the one on destroy_device(), which is not done yet.
>
> > 3. Per-queue or per-device NUMA node info.
>
> You can query the NUMA node info implicitly by get_mempolicy(); check
> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>

Your suggestions are exactly how my application is already working. I was
commenting on the proposed changes to the vhost PMD API. I would prefer to
use RTE_ETH_EVENT_INTR_LSC and rte_eth_dev_socket_id for consistency with
other NIC drivers, instead of these vhost-specific hacks. The queue state
change callback is the one new API that needs to be added, because normal
NICs don't have this behavior.

You could add another rte_eth_event_type for the queue state change
callback, and pass the queue ID, RX/TX direction, and enable bit through
cb_arg. The application would never need to touch struct virtio_net.
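
For illustration only, a sketch of how that could be wired up
(RTE_ETH_EVENT_QUEUE_STATE_CHANGE is a proposed, not-yet-existing event
type; the handler and registration use the existing ethdev callback
signature):

    #include <rte_ethdev.h>

    /* Sketch: handler for the proposed queue state change event. */
    static void
    queue_state_handler(uint8_t port_id, enum rte_eth_event_type event,
                        void *cb_arg)
    {
        (void)port_id;
        (void)event;
        (void)cb_arg;
        /* determine which queue changed, its direction and enable bit,
         * then add it to or remove it from the datapath */
    }

    static int
    register_queue_state_event(uint8_t port_id)
    {
        return rte_eth_dev_callback_register(port_id,
                RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
                queue_state_handler, NULL);
    }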
  
Yuanhan Liu Dec. 22, 2015, 5:47 a.m. UTC | #13
On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>     > I'm using the vhost callbacks and struct virtio_net with the vhost PMD in
>     a few
>     > ways:
> 
>     Rich, thanks for the info!
>    
>     >
>     > 1. new_device/destroy_device: Link state change (will be covered by the
>     link
>     > status interrupt).
>     > 2. new_device: Add first queue to datapath.
> 
>     I'm wondering why vring_state_changed() is not used, as it will also be
>     triggered at the beginning, when the default queue (the first queue) is
>     enabled.
> 
> 
> Turns out I'd misread the code and it's already using the vring_state_changed
> callback for the
> first queue. Not sure if this is intentional but vring_state_changed is called
> for the first queue
> before new_device.

Yeah, you were right, we can't count on this vring_state_changed(), for
it's sent before the vring has been initialized on the vhost side. Maybe
we should invoke the vring_state_changed() callback at new_device() as well.

>  
> 
>     > 3. vring_state_changed: Add/remove queue to datapath.
>     > 4. destroy_device: Remove all queues (vring_state_changed is not called
>     when
>     > qemu is killed).
> 
>     I had a plan to invoke vring_state_changed() to disable all vrings
>     when destroy_device() is called.
> 
> 
> That would be good.
>  
> 
>     > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> 
>     You can get the 'struct virtio_net' dev from all above callbacks.
> 
>      
> 
>     > 1. Link status interrupt.
> 
>     To vhost pmd, new_device()/destroy_device() equals to the link status
>     interrupt, where new_device() is a link up, and destroy_device() is link
>     down().
>    
> 
>     > 2. New queue_state_changed callback. Unlike vring_state_changed this
>     should
>     > cover the first queue at new_device and removal of all queues at
>     > destroy_device.
> 
>     As stated above, vring_state_changed() should be able to do that, except
>     the one on destroy_device(), which is not done yet.
>    
>     > 3. Per-queue or per-device NUMA node info.
> 
>     You can query the NUMA node info implicitly by get_mempolicy(); check
>     numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> 
> 
> Your suggestions are exactly how my application is already working. I was
> commenting on the
> proposed changes to the vhost PMD API. I would prefer to
> use RTE_ETH_EVENT_INTR_LSC
> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead of
> these vhost-specific
> hacks.

That's a good suggestion.

> The queue state change callback is the one new API that needs to be
> added because
> normal NICs don't have this behavior.

Again I'd ask, will vring_state_changed() be enough once the above issues
are resolved: vring_state_changed() will be invoked at new_device()/
destroy_device(), and of course, ethtool change?

	--yliu

> 
> You could add another rte_eth_event_type for the queue state change callback,
> and pass the
> queue ID, RX/TX direction, and enable bit through cb_arg. The application would
> never need
> to touch struct virtio_net.
  
Rich Lane Dec. 22, 2015, 9:38 a.m. UTC | #14
On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
>
> Again I'd ask, will vring_state_changed() be enough, when above issues
> are resolved: vring_state_changed() will be invoked at new_device()/
> destroy_device(), and of course, ethtool change?


It would be sufficient. It is not a great API though, because it requires
the application to do the conversion from struct virtio_net to a DPDK port
number, and from a virtqueue index to a DPDK queue id and direction. Also,
the current implementation often makes this callback when the vring state
has not actually changed (enabled -> enabled and disabled -> disabled).

If you're asking about using vring_state_changed() _instead_ of the link
status event and rte_eth_dev_socket_id(), then yes, it still works. I'd
only consider that a stopgap until the real ethdev APIs are implemented.

I'd suggest adding RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
creating another callback registration API.

Perhaps we could merge the basic PMD, which I think is pretty solid, and
then continue the API discussion with patches to it.
  
Yuanhan Liu Dec. 23, 2015, 2:44 a.m. UTC | #15
On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
>     > The queue state change callback is the one new API that needs to be
>     > added because
>     > normal NICs don't have this behavior.
> 
>     Again I'd ask, will vring_state_changed() be enough, when above issues
>     are resolved: vring_state_changed() will be invoked at new_device()/
>     destroy_device(), and of course, ethtool change?
> 
> 
> It would be sufficient. It is not a great API though, because it requires the
> application to do the conversion from struct virtio_net to a DPDK port number,
> and from a virtqueue index to a DPDK queue id and direction. Also, the current
> implementation often makes this callback when the vring state has not actually
> changed (enabled -> enabled and disabled -> disabled).
> 
> If you're asking about using vring_state_changed() _instead_ of the link status
> event and rte_eth_dev_socket_id(),

No, I like the idea of the link status event and rte_eth_dev_socket_id();
I was just wondering why a new API is needed. Both Tetsuya and I
were thinking of leveraging the link status event to represent the
queue state change (triggered by vring_state_changed()) as well,
so that we don't need to introduce another eth event. However, I'd
agree that it's better if we could have a new dedicated event.

Thomas, here is some background for you. For the vhost pmd and linux
virtio-net combo, the queues can be dynamically changed by ethtool;
therefore, the application wishes to have another eth event, say
RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
add/remove the corresponding queue to/from the datapath when that happens.
What do you think of that?

> then yes, it still works. I'd only consider
> that a stopgap until the real ethdev APIs are implemented.
> 
> I'd suggest to add RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
> create another callback registration API.
> 
> Perhaps we could merge the basic PMD which I think is pretty solid and then
> continue the API discussion with patches to it.

Perhaps, but let's first see how the discussion on the new eth event
goes.

	--yliu
  
Thomas Monjalon Dec. 23, 2015, 10 p.m. UTC | #16
2015-12-23 10:44, Yuanhan Liu:
> On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > wrote:
> > 
> >     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> >     > The queue state change callback is the one new API that needs to be
> >     > added because
> >     > normal NICs don't have this behavior.
> > 
> >     Again I'd ask, will vring_state_changed() be enough, when above issues
> >     are resolved: vring_state_changed() will be invoked at new_device()/
> >     destroy_device(), and of course, ethtool change?
> > 
> > 
> > It would be sufficient. It is not a great API though, because it requires the
> > application to do the conversion from struct virtio_net to a DPDK port number,
> > and from a virtqueue index to a DPDK queue id and direction. Also, the current
> > implementation often makes this callback when the vring state has not actually
> > changed (enabled -> enabled and disabled -> disabled).
> > 
> > If you're asking about using vring_state_changed() _instead_ of the link status
> > event and rte_eth_dev_socket_id(),
> 
> No, I like the idea of link status event and rte_eth_dev_socket_id();
> I was just wondering why a new API is needed. Both Tetsuya and I
> were thinking to leverage the link status event to represent the
> queue stats change (triggered by vring_state_changed()) as well,
> so that we don't need to introduce another eth event. However, I'd
> agree that it's better if we could have a new dedicate event.
> 
> Thomas, here is some background for you. For vhost pmd and linux
> virtio-net combo, the queue can be dynamically changed by ethtool,
> therefore, the application wishes to have another eth event, say
> RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
> add/remove corresponding queue to the datapath when that happens.
> What do you think of that?

Yes, it is an event. So I don't understand the question.
What may be better than a specific rte_eth_event_type?
  
Tetsuya Mukawa Dec. 24, 2015, 3:09 a.m. UTC | #17
On 2015/12/22 13:47, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
>
>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>> in a few
>>> ways:
>> Rich, thanks for the info!
>>
>>> 1. new_device/destroy_device: Link state change (will be covered by the
>> link
>>> status interrupt).
>>> 2. new_device: Add first queue to datapath.
>> I'm wondering why vring_state_changed() is not used, as it will also be
>> triggered at the beginning, when the default queue (the first queue) is
>> enabled.
>>
> Turns out I'd misread the code and it's already using the
> vring_state_changed callback for the
> first queue. Not sure if this is intentional but vring_state_changed is
> called for the first queue
> before new_device.
>
>
>>> 3. vring_state_changed: Add/remove queue to datapath.
>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>> when
>>> qemu is killed).
>> I had a plan to invoke vring_state_changed() to disable all vrings
>> when destroy_device() is called.
>>
> That would be good.
>
>
>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>> You can get the 'struct virtio_net' dev from all above callbacks.
>
>
>> 1. Link status interrupt.
>>
>> To vhost pmd, new_device()/destroy_device() equals to the link status
>> interrupt, where new_device() is a link up, and destroy_device() is link
>> down().
>>
>>
>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>> should
>>> cover the first queue at new_device and removal of all queues at
>>> destroy_device.
>> As stated above, vring_state_changed() should be able to do that, except
>> the one on destroy_device(), which is not done yet.
>>
>>> 3. Per-queue or per-device NUMA node info.
>> You can query the NUMA node info implicitly by get_mempolicy(); check
>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>
> Your suggestions are exactly how my application is already working. I was
> commenting on the
> proposed changes to the vhost PMD API. I would prefer to
> use RTE_ETH_EVENT_INTR_LSC
> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> of these vhost-specific
> hacks. The queue state change callback is the one new API that needs to be
> added because
> normal NICs don't have this behavior.
>
> You could add another rte_eth_event_type for the queue state change
> callback, and pass the
> queue ID, RX/TX direction, and enable bit through cb_arg. 

Hi Rich,

So far, EAL provides rte_eth_dev_callback_register() for event handling.
A DPDK app can register a callback handler and a "callback argument",
and EAL will call the callback handler with that argument.
Anyway, the vhost library and PMD cannot change the argument.

I guess the callback handler will need to call ethdev APIs to know what
caused the interrupt.
Probably rte_eth_dev_socket_id() is to know the numa_node, and
rte_eth_dev_info_get() is to know the number of queues.
Is this okay for your case?

Thanks,
Tetsuya
  
Yuanhan Liu Dec. 24, 2015, 3:51 a.m. UTC | #18
On Wed, Dec 23, 2015 at 11:00:15PM +0100, Thomas Monjalon wrote:
> 2015-12-23 10:44, Yuanhan Liu:
> > On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> > > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > > wrote:
> > > 
> > >     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > >     > The queue state change callback is the one new API that needs to be
> > >     > added because
> > >     > normal NICs don't have this behavior.
> > > 
> > >     Again I'd ask, will vring_state_changed() be enough, when above issues
> > >     are resolved: vring_state_changed() will be invoked at new_device()/
> > >     destroy_device(), and of course, ethtool change?
> > > 
> > > 
> > > It would be sufficient. It is not a great API though, because it requires the
> > > application to do the conversion from struct virtio_net to a DPDK port number,
> > > and from a virtqueue index to a DPDK queue id and direction. Also, the current
> > > implementation often makes this callback when the vring state has not actually
> > > changed (enabled -> enabled and disabled -> disabled).
> > > 
> > > If you're asking about using vring_state_changed() _instead_ of the link status
> > > event and rte_eth_dev_socket_id(),
> > 
> > No, I like the idea of link status event and rte_eth_dev_socket_id();
> > I was just wondering why a new API is needed. Both Tetsuya and I
> > were thinking to leverage the link status event to represent the
> > queue stats change (triggered by vring_state_changed()) as well,
> > so that we don't need to introduce another eth event. However, I'd
> > agree that it's better if we could have a new dedicate event.
> > 
> > Thomas, here is some background for you. For vhost pmd and linux
> > virtio-net combo, the queue can be dynamically changed by ethtool,
> > therefore, the application wishes to have another eth event, say
> > RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
> > add/remove corresponding queue to the datapath when that happens.
> > What do you think of that?
> 
> Yes it is an event. So I don't understand the question.
> What may be better than a specific rte_eth_event_type?

The alternative is a new set of callbacks, but we already have a set of
callbacks for the vhost library, and adding a new set to the vhost pmd
doesn't seem elegant to me.

Therefore, I'd prefer a new eth event.

	--yliu
  
Tetsuya Mukawa Dec. 24, 2015, 3:54 a.m. UTC | #19
On 2015/12/24 12:09, Tetsuya Mukawa wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> wrote:
>>
>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>> in a few
>>>> ways:
>>> Rich, thanks for the info!
>>>
>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>> link
>>>> status interrupt).
>>>> 2. new_device: Add first queue to datapath.
>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>> triggered at the beginning, when the default queue (the first queue) is
>>> enabled.
>>>
>> Turns out I'd misread the code and it's already using the
>> vring_state_changed callback for the
>> first queue. Not sure if this is intentional but vring_state_changed is
>> called for the first queue
>> before new_device.
>>
>>
>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>> when
>>>> qemu is killed).
>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>> when destroy_device() is called.
>>>
>> That would be good.
>>
>>
>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>
>>> 1. Link status interrupt.
>>>
>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>> down().
>>>
>>>
>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>> should
>>>> cover the first queue at new_device and removal of all queues at
>>>> destroy_device.
>>> As stated above, vring_state_changed() should be able to do that, except
>>> the one on destroy_device(), which is not done yet.
>>>
>>>> 3. Per-queue or per-device NUMA node info.
>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>
>> Your suggestions are exactly how my application is already working. I was
>> commenting on the
>> proposed changes to the vhost PMD API. I would prefer to
>> use RTE_ETH_EVENT_INTR_LSC
>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>> of these vhost-specific
>> hacks. The queue state change callback is the one new API that needs to be
>> added because
>> normal NICs don't have this behavior.
>>
>> You could add another rte_eth_event_type for the queue state change
>> callback, and pass the
>> queue ID, RX/TX direction, and enable bit through cb_arg. 
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>
> I guess the callback handler will need to call ethdev APIs to know what
> causes the interrupt.
> Probably rte_eth_dev_socket_id() is to know numa_node, and
> rte_eth_dev_info_get() is to know the number of queues.

It seems rte_eth_dev_info_get() is not enough, because the DPDK application
needs not only the number of queues, but at least also which queues are
enabled or disabled.
I guess the current interrupt mechanism and ethdev APIs cannot provide such
information, because it's something special to the vhost case.

Tetsuya

> Is this okay for your case?
>
> Thanks,
> Tetsuya
  
Yuanhan Liu Dec. 24, 2015, 4 a.m. UTC | #20
On Thu, Dec 24, 2015 at 12:09:10PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg. 
> 
> Hi Rich,
> 
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.

Yes, the event callback argument is provided by the application, which
has no way to know the info you mentioned above, like queue ID and RX/TX
direction.

For that, we may need to introduce another structure to hold all the above
info, and embed it in the virtio_net struct.

> I guess the callback handler will need to call ethdev APIs to know what
> causes the interrupt.
> Probably rte_eth_dev_socket_id() is to know numa_node, and
> rte_eth_dev_info_get() is to know the number of queues.
> Is this okay for your case?

I don't think that's enough, and I think Rich has already given you all
the info needed for a queue change interrupt. (That would also answer your
questions in another email.)

	--yliu
  
Tetsuya Mukawa Dec. 24, 2015, 4:07 a.m. UTC | #21
On 2015/12/24 12:51, Yuanhan Liu wrote:
> On Wed, Dec 23, 2015 at 11:00:15PM +0100, Thomas Monjalon wrote:
>> 2015-12-23 10:44, Yuanhan Liu:
>>> On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
>>>> On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>> wrote:
>>>>
>>>>     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
>>>>     > The queue state change callback is the one new API that needs to be
>>>>     > added because
>>>>     > normal NICs don't have this behavior.
>>>>
>>>>     Again I'd ask, will vring_state_changed() be enough, when above issues
>>>>     are resolved: vring_state_changed() will be invoked at new_device()/
>>>>     destroy_device(), and of course, ethtool change?
>>>>
>>>>
>>>> It would be sufficient. It is not a great API though, because it requires the
>>>> application to do the conversion from struct virtio_net to a DPDK port number,
>>>> and from a virtqueue index to a DPDK queue id and direction. Also, the current
>>>> implementation often makes this callback when the vring state has not actually
>>>> changed (enabled -> enabled and disabled -> disabled).
>>>>
>>>> If you're asking about using vring_state_changed() _instead_ of the link status
>>>> event and rte_eth_dev_socket_id(),
>>> No, I like the idea of link status event and rte_eth_dev_socket_id();
>>> I was just wondering why a new API is needed. Both Tetsuya and I
>>> were thinking to leverage the link status event to represent the
>>> queue stats change (triggered by vring_state_changed()) as well,
>>> so that we don't need to introduce another eth event. However, I'd
>>> agree that it's better if we could have a new dedicate event.
>>>
>>> Thomas, here is some background for you. For vhost pmd and linux
>>> virtio-net combo, the queue can be dynamically changed by ethtool,
>>> therefore, the application wishes to have another eth event, say
>>> RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
>>> add/remove corresponding queue to the datapath when that happens.
>>> What do you think of that?
>> Yes it is an event. So I don't understand the question.
>> What may be better than a specific rte_eth_event_type?
> The alternative is a new set of callbacks, but judging that we already
> have a set of callback for vhost library, and adding a new set to vhost
> pmd doesn't seem elegant to me.
>
> Therefore, I'd prefer a new eth event.
>
> 	--yliu

I am OK with having one more event type.

BTW, I have questions about numa_node.
I guess "rte_eth_dev_socket_id()" can only return the numa_node of a
specified port.
If multiple queues are used in one device (port), can we say all queues
are always on the same numa_node?

If the answer is no, I am still not sure we can remove "struct
virtio_net" from the DPDK application's callback handling.
I agree we can add RTE_ETH_EVENT_QUEUE_STATE_CHANGE for the interrupt
notification, but the current ethdev APIs may not be able to hide
vhost-specific properties, so the callback handler would need to handle
"struct virtio_net" directly.

Thanks,
Tetsuya
  
Tetsuya Mukawa Dec. 24, 2015, 4:23 a.m. UTC | #22
On 2015/12/24 13:00, Yuanhan Liu wrote:
> On Thu, Dec 24, 2015 at 12:09:10PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> wrote:
>>>
>>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>>> in a few
>>>>> ways:
>>>> Rich, thanks for the info!
>>>>
>>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>>> link
>>>>> status interrupt).
>>>>> 2. new_device: Add first queue to datapath.
>>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>>> triggered at the beginning, when the default queue (the first queue) is
>>>> enabled.
>>>>
>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
>>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>>> when
>>>>> qemu is killed).
>>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>>> when destroy_device() is called.
>>>>
>>> That would be good.
>>>
>>>
>>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>>
>>>> 1. Link status interrupt.
>>>>
>>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>>> down().
>>>>
>>>>
>>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>>> should
>>>>> cover the first queue at new_device and removal of all queues at
>>>>> destroy_device.
>>>> As stated above, vring_state_changed() should be able to do that, except
>>>> the one on destroy_device(), which is not done yet.
>>>>
>>>>> 3. Per-queue or per-device NUMA node info.
>>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>>
>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to be
>>> added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg. 
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
> Yes, the event callback argument is provided from application, where it
> has no way to know info you mentioned above, like queue ID, RX/TX direction.
>
> For that, we may need introduce another structure to note all above
> info, and embed it to virtio_net struct.
>
>> I guess the callback handler will need to call ethdev APIs to know what
>> causes the interrupt.
>> Probably rte_eth_dev_socket_id() is to know numa_node, and
>> rte_eth_dev_info_get() is to know the number of queues.
>> Is this okay for your case?
> I don't think that's enough, and I think Rich has already given all info
> needed to a queue change interrupt to you. (That would also answer your
> questions in another email)
>
> 	--yliu

Thanks. Yes, I agree we need to provide such information through
"struct virtio_net",
and the callback handler needs to handle the vhost-specific structure
directly.

Probably it's difficult to remove "struct virtio_net" from the callback
handler of a DPDK app.
This is the point I wanted to make.

Tetsuya
  
Rich Lane Dec. 24, 2015, 5:37 a.m. UTC | #23
On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to
> > be added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg.
>
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>

You're right, I'd mistakenly thought that the PMD controlled the void *
passed to the callback.

Here's a thought:

    struct rte_eth_vhost_queue_event {
        uint16_t queue_id;
        bool rx;
        bool enable;
    };

    int rte_eth_vhost_get_queue_event(uint8_t port_id,
                                      struct rte_eth_vhost_queue_event *event);

On receiving the ethdev event, the application could repeatedly call
rte_eth_vhost_get_queue_event to find out what happened.
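
A minimal sketch of how an application might consume that (the
0-while-an-event-was-fetched return convention and the app_* hooks are
assumptions):

    /* Sketch: drain all pending queue events for one vhost port. */
    static void
    drain_queue_events(uint8_t port_id)
    {
        struct rte_eth_vhost_queue_event ev;

        while (rte_eth_vhost_get_queue_event(port_id, &ev) == 0) {
            if (ev.enable)
                app_enable_queue(port_id, ev.queue_id, ev.rx);
            else
                app_disable_queue(port_id, ev.queue_id, ev.rx);
        }
    }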

An issue with having the application dig into struct virtio_net is that it
can only be safely accessed from a callback on the vhost thread. A typical
application running its control plane on lcore 0 would need to copy all the
relevant info from struct virtio_net before sending it over.

As you mentioned, queues for a single vhost port could be located on
different NUMA nodes. I think this is an uncommon scenario, but if needed
you could add an API to retrieve the NUMA node for a given port and queue.
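
If it ever were needed, such a helper could reuse the get_mempolicy()
trick mentioned earlier in the thread (a sketch; 'vq' is assumed to be
the address of the virtqueue being queried, and the program must link
against libnuma):

    #include <numaif.h>

    /* Sketch: ask the kernel which NUMA node backs a virtqueue's memory,
     * the same way numa_realloc() does internally. */
    static int
    vhost_queue_numa_node(void *vq)
    {
        int node;

        if (get_mempolicy(&node, NULL, 0, vq,
                          MPOL_F_NODE | MPOL_F_ADDR) < 0)
            return -1; /* query failed */

        return node;
    }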
  
Tetsuya Mukawa Dec. 24, 2015, 7:58 a.m. UTC | #24
On 2015/12/24 14:37, Rich Lane wrote:
> On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> wrote:
>>>
>>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>>> in a few
>>>>> ways:
>>>> Rich, thanks for the info!
>>>>
>>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>>> link
>>>>> status interrupt).
>>>>> 2. new_device: Add first queue to datapath.
>>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>>> triggered at the beginning, when the default queue (the first queue) is
>>>> enabled.
>>>>
>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
>>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>>> when
>>>>> qemu is killed).
>>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>>> when destroy_device() is called.
>>>>
>>> That would be good.
>>>
>>>
>>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>>
>>>> 1. Link status interrupt.
>>>>
>>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>>> down().
>>>>
>>>>
>>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>>> should
>>>>> cover the first queue at new_device and removal of all queues at
>>>>> destroy_device.
>>>> As stated above, vring_state_changed() should be able to do that, except
>>>> the one on destroy_device(), which is not done yet.
>>>>
>>>>> 3. Per-queue or per-device NUMA node info.
>>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>>
>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to
>>> be added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg.
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
>>
> You're right, I'd mistakenly thought that the PMD controlled the void *
> passed to the callback.
>
> Here's a thought:
>
>     struct rte_eth_vhost_queue_event {
>         uint16_t queue_id;
>         bool rx;
>         bool enable;
>     };
>
>     int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
> rte_eth_vhost_queue_event *event);
>
> On receiving the ethdev event the application could repeatedly call
> rte_eth_vhost_get_queue_event
> to find out what happened.

Hi Rich and Yuanhan,

I guess we have 2 implementations here.

1. rte_eth_vhost_get_queue_event() returns each event.
2. rte_eth_vhost_get_queue_status() returns current status of the queues.

I guess option "2" is more generic manner to handle interrupts from
device driver.
In the case of option "1", if DPDK application doesn't call
rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
This may exhaust memory.

One more example is the current link status interrupt handling.
Actually, the ethdev API just returns the current status of the port.
What do you think?
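
For reference, a minimal sketch of that pattern with the existing API;
the handler re-reads the port status because the event itself carries no
payload:

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Sketch: an LSC handler queries the current status back from the
     * port, much like an interrupt handler reads a status register. */
    static void
    lsc_handler(uint8_t port_id, enum rte_eth_event_type event, void *cb_arg)
    {
        struct rte_eth_link link;

        (void)event;
        (void)cb_arg;
        rte_eth_link_get_nowait(port_id, &link);
        printf("port %d: link %s\n", port_id,
               link.link_status ? "up" : "down");
    }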

>
> An issue with having the application dig into struct virtio_net is that it
> can only be safely accessed from
> a callback on the vhost thread.

Here is an example of how to invoke a callback handler registered by a
DPDK application from the PMD.

  _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);

The above function is called by the interrupt handling thread of the PMDs.

Please check the implementation of the above function.
A callback handler that a DPDK application registers is called in
"interrupt handling context".
(I mean the interrupt handling thread of the PMD also calls the callback
handler of the DPDK application.)
Anyway, I guess the callback handler of the DPDK application can access
"struct virtio_net" safely.

> A typical application running its control
> plane on lcore 0 would need to
> copy all the relevant info from struct virtio_net before sending it over.

Could you please describe it more?
Sorry, probably I don't understand correctly which restriction makes you
copy data.
(As described above, the callback handler registered by the DPDK application
can safely access "struct virtio_net". Does this solve the copy issue?)

> As you mentioned, queues for a single vhost port could be located on
> different NUMA nodes. I think this
> is an uncommon scenario but if needed you could add an API to retrieve the
> NUMA node for a given port
> and queue.
>

I agree this is very specific to vhost, because in the case of a generic
PCI device, all queues of a port are on the same NUMA node.
Anyway, because it's very specific to vhost, I am not sure we should
add an ethdev API to handle this.

If we handle it by a vhost PMD API, we probably have 2 options here as well.

1. Extend "struct rte_eth_vhost_queue_event", and use
rte_eth_vhost_get_queue_event() like you described.
struct rte_eth_vhost_queue_event
{
        uint16_t queue_id;
        bool rx;
        bool enable;
+      int socket_id;
};

2. rte_eth_vhost_get_queue_status() returns the current socket_ids of all
queues.

Tetsuya
  
Rich Lane Dec. 28, 2015, 9:59 p.m. UTC | #25
On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
> Hi Rich and Yuanhan,
>
> I guess we have 2 implementations here.
>
> 1. rte_eth_vhost_get_queue_event() returns each event.
> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
>
> I guess option "2" is more generic manner to handle interrupts from
> device driver.
> In the case of option "1", if DPDK application doesn't call
> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
> This may exhaust memory.
>

Option 1 can be implemented in constant space by only tracking the latest
state of each queue. I pushed a rough implementation to the
vhost-queue-callback branch of https://github.com/rlane/dpdk.
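
The idea is simply to overwrite a per-queue slot instead of queueing
events; roughly like the sketch below (an illustration of the approach,
not the code in that branch, and real code would also need
synchronization between the vhost thread and the reader):

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_QUEUES 128 /* illustrative bound */

    struct queue_state {
        bool enabled; /* latest state from vring_state_changed() */
        bool pending; /* set on change, cleared when the app reads it */
    };

    static struct queue_state rx_queues[MAX_QUEUES];
    static struct queue_state tx_queues[MAX_QUEUES];

    /* Called from the vhost vring_state_changed() callback: a newer
     * event simply overwrites an older, unread one. */
    static void
    record_queue_state(uint16_t queue_id, bool rx, bool enable)
    {
        struct queue_state *qs = rx ? &rx_queues[queue_id]
                                    : &tx_queues[queue_id];

        qs->enabled = enable;
        qs->pending = true;
    }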

> One more example is current link status interrupt handling.
> Actually ethdev API just returns current status of the port.
> What do you think?
>

Option 2 adds a small burden to the application, but I'm fine with this as
long as it's thread-safe (see my comments below).


> > An issue with having the application dig into struct virtio_net is that
> > it can only be safely accessed from a callback on the vhost thread.
>
> Here is one of example how to invoke a callback handler registered by
> DPDK application from the PMD.
>
>   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
>
> Above function is called by interrupt handling thread of the PMDs.
>
> Please check implementation of above function.
> A callback handler that DPDK application registers is called in
> "interrupt handling context".
> (I mean the interrupt handling thread of the PMD calls the callback
> handler of DPDK application also.)
> Anyway, I guess the callback handler of DPDK application can access to
> "struct virtio_net" safely.


> > A typical application running its control
> > plane on lcore 0 would need to
> > copy all the relevant info from struct virtio_net before sending it over.
>
> Could you please describe it more?
> Sorry, probably I don't understand correctly which restriction make you
> copy data.
> (As described above, the callback handler registered by DPDK application
> can safely access "to struct virtio_net". Does this solve the copy issue?)
>

The ethdev event callback can safely access struct virtio_net, yes. The
problem is that a real application will likely want to handle queue state
changes as part of its main event loop running on a separate thread.
Because no other thread can safely access struct virtio_net, the callback
would need to copy the queue states out of struct virtio_net into another
data structure before sending it to the main thread.

Instead of having the callback muck around with struct virtio_net, I would
prefer an API that I could call from any thread to get the current queue
states. This also removes struct virtio_net from the PMD's API, which I
think is a win.


> > As you mentioned, queues for a single vhost port could be located on
> > different NUMA nodes. I think this is an uncommon scenario but if needed
> > you could add an API to retrieve the NUMA node for a given port and queue.
> >
>
> I agree this is very specific for vhost, because in the case of generic
> PCI device, all queues of a port are on same NUMA node.
> Anyway, because it's very specific for vhost, I am not sure we should
> add ethdev API to handle this.
>
> If we handle it by vhost PMD API, we probably have 2 options also here.
>
> 1. Extend "struct rte_eth_vhost_queue_event , and use
> rte_eth_vhost_get_queue_event() like you described.
> struct rte_eth_vhost_queue_event
> {
>         uint16_t queue_id;
>         bool rx;
>         bool enable;
> +      int socket_id;
> };
>
> 2. rte_eth_vhost_get_queue_status() returns current socket_ids of all
> queues.
>

Up to you, but I think we can skip this for the time being because it would
be unusual for a guest to place virtqueues for one PCI device on different
NUMA nodes.
  
Tetsuya Mukawa Jan. 6, 2016, 3:56 a.m. UTC | #26
On 2015/12/29 6:59, Rich Lane wrote:
> On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>> Hi Rich and Yuanhan,
>>
>> I guess we have 2 implementations here.
>>
>> 1. rte_eth_vhost_get_queue_event() returns each event.
>> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
>>
>> I guess option "2" is more generic manner to handle interrupts from
>> device driver.
>> In the case of option "1", if DPDK application doesn't call
>> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
>> This may exhaust memory.
>>
> Option 1 can be implemented in constant space by only tracking the latest
> state of each
> queue. I pushed a rough implementation to https://github.com/rlane/dpdk
> vhost-queue-callback.
>
> One more example is current link status interrupt handling.

Hi Rich,

I appreciate your implementation.
I understand your idea, and agree with it.


Hi Yuanhan,

What do you think of his implementation?

Thanks.
Tetsuya
  
Yuanhan Liu Jan. 6, 2016, 7:38 a.m. UTC | #27
On Wed, Jan 06, 2016 at 12:56:58PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/29 6:59, Rich Lane wrote:
> > On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
> >> Hi Rich and Yuanhan,
> >>
> >> I guess we have 2 implementations here.
> >>
> >> 1. rte_eth_vhost_get_queue_event() returns each event.
> >> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
> >>
> >> I guess option "2" is more generic manner to handle interrupts from
> >> device driver.
> >> In the case of option "1", if DPDK application doesn't call
> >> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
> >> This may exhaust memory.
> >>
> > Option 1 can be implemented in constant space by only tracking the latest
> > state of each
> > queue. I pushed a rough implementation to https://github.com/rlane/dpdk
> > vhost-queue-callback.
> >
> > One more example is current link status interrupt handling.
> 
> Hi Rich,
> 
> I appreciate your implementation.
> I can understand what's your idea, and agree with it.
> 
> 
> Hi Yuanhan,
> 
> What do you think his implementation?

At a quick glance, it looks good to me.

	--yliu
  
Tetsuya Mukawa Feb. 2, 2016, 11:18 a.m. UTC | #28
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when a new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing events from the driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)



Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 920 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  73 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   8 +-
 10 files changed, 1088 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
  
Rich Lane Feb. 2, 2016, 7:52 p.m. UTC | #29
Looks good. I tested the queue state change code in particular and didn't
find any problems.

Reviewed-by: Rich Lane <rlane@bigswitch.com>
Tested-by: Rich Lane <rlane@bigswitch.com>
  
Tetsuya Mukawa Feb. 4, 2016, 7:26 a.m. UTC | #30
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

PATCH v7 changes:
 - Remove needless parentheses.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when a new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing events from the driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)



Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   3 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 898 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 10 files changed, 1102 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
  

Patch

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@  DPDK_2.1 {
 	rte_vhost_driver_unregister;
 
 } DPDK_2.0;
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@  struct virtio_net {
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
@@ -224,6 +225,8 @@  int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@  user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev->mem) {
 		free_mem_region(dev);
@@ -272,7 +272,7 @@  user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -288,7 +288,7 @@  user_get_vring_base(struct vhost_device_ctx ctx,
 		return -1;
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@  user_set_vring_enable(struct vhost_device_ctx ctx,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed) {
-		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
-						enable);
-	}
+	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);
 
 	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
 	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@  user_destroy_device(struct vhost_device_ctx ctx)
 	struct virtio_net *dev = get_device(ctx);
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev && dev->mem) {
 		free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 8364938..dc977b7 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@  struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -81,6 +83,45 @@  static struct virtio_net_config_ll *ll_root;
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+		int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+		return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+	return 0;
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -393,7 +434,7 @@  destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -451,7 +492,7 @@  reset_owner(struct vhost_device_ctx ctx)
 		return -1;
 
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	cleanup_device(dev, 0);
 	reset_device(dev);
@@ -809,12 +850,12 @@  set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
 		if (file->fd == VIRTIO_DEV_STOPPED)
-			notify_ops->destroy_device(dev);
+			notify_destroy_device(dev);
 	return 0;
 }
 
@@ -898,3 +939,14 @@  rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops of the vhost PMD so that we can add/remove a device to/from the data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
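
Note the dispatch order implemented by the notify_*() wrappers above: the PMD
callback always runs first, and a non-zero return from the PMD's new_device()
or vring_state_changed() short-circuits the call so the application callback
is skipped; destroy_device() is delivered to both unconditionally. A legacy
application keeps using the pre-existing API unchanged; a minimal sketch
(handler body is a placeholder):

  #include <rte_virtio_net.h>

  static int
  app_new_device(struct virtio_net *dev)
  {
  	/* Application-specific setup; runs only after the PMD's
  	 * new_device() handler has accepted the device. */
  	(void)dev;
  	return 0;
  }

  static struct virtio_net_device_ops const app_ops = {
  	.new_device = app_new_device,
  };

  /* Registered through the existing, unchanged entry point: */
  void
  app_init(void)
  {
  	rte_vhost_driver_callback_register(&app_ops);
  }
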
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@ 
 #include "vhost-net.h"
 #include "rte_virtio_net.h"
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
 #endif