[3/4] vhost: avoid deadlock on async register

Message ID 1615985773-406787-4-git-send-email-jiayu.hu@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Maxime Coquelin
Series Refactor async vhost control path

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Hu, Jiayu March 17, 2021, 12:56 p.m. UTC
Users register the async copy device from the vring_state_changed()
callback when a vhost queue is enabled. However, if
VHOST_USER_F_PROTOCOL_FEATURES is not negotiated, a deadlock occurs
inside rte_vhost_async_channel_register(), as vhost_user_msg_handler()
already holds vq->access_lock while processing the
VHOST_USER_SET_VRING_KICK message.

This patch removes the call to vring_state_changed() in
vhost_user_set_vring_kick() to avoid the deadlock on async register.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 lib/librte_vhost/vhost_user.c | 3 ---
 1 file changed, 3 deletions(-)
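The deadlock in the commit message follows a common pattern: the message handler holds vq->access_lock while invoking an application callback, and the callback calls back into a vhost API that takes the same lock. The sketch below models that pattern with hypothetical names (`model_vq`, `handle_set_vring_kick`, `async_channel_register_model`); it is not the DPDK code. It assumes a POSIX error-checking mutex so that the second lock attempt from the same thread returns `EDEADLK` instead of hanging, which makes the self-deadlock observable.

```c
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct vhost_virtqueue. */
struct model_vq {
	pthread_mutex_t access_lock;
	int enabled;
	int async_registered;
};

/* Error-checking mutex: relocking from the owning thread returns
 * EDEADLK instead of blocking forever. */
static void model_vq_init(struct model_vq *vq)
{
	pthread_mutexattr_t attr;
	int rc;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	rc = pthread_mutex_init(&vq->access_lock, &attr);
	assert(rc == 0);
	(void)rc; /* unused when NDEBUG is defined */
	pthread_mutexattr_destroy(&attr);
	vq->enabled = 0;
	vq->async_registered = 0;
}

/* Models rte_vhost_async_channel_register(): it takes the vq lock. */
static int async_channel_register_model(struct model_vq *vq)
{
	int rc = pthread_mutex_lock(&vq->access_lock);

	if (rc != 0)
		return rc; /* EDEADLK: this thread already holds the lock */
	vq->async_registered = 1;
	pthread_mutex_unlock(&vq->access_lock);
	return 0;
}

/* Models an application's vring_state_changed() callback that
 * registers the async channel when the queue is enabled. */
static int app_vring_state_changed(struct model_vq *vq, int enable)
{
	return enable ? async_channel_register_model(vq) : 0;
}

/* Models SET_VRING_KICK handling without protocol features:
 * the callback runs while access_lock is held. */
static int handle_set_vring_kick(struct model_vq *vq)
{
	int rc;

	pthread_mutex_lock(&vq->access_lock);
	vq->enabled = 1;
	rc = app_vring_state_changed(vq, 1);
	pthread_mutex_unlock(&vq->access_lock);
	return rc;
}
```

Calling handle_set_vring_kick() on a queue prepared with model_vq_init() returns EDEADLK and leaves async_registered at 0. With a spinlock such as the one vhost uses for access_lock, the second acquisition would instead spin forever.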
  

Comments

Maxime Coquelin March 29, 2021, 3:19 p.m. UTC | #1
On 3/17/21 1:56 PM, Jiayu Hu wrote:
> Users register async copy device when vhost queue is enabled.
> However, if VHOST_USER_F_PROTOCOL_FEATURES is not supported,
> a deadlock occurs inside rte_vhost_async_channel_register(),
> as vhost_user_msg_handler() already takes vq->access_lock
> before processing VHOST_USER_SET_VRING_KICK message.
> 
> This patch removes calling vring_state_changed() in
> vhost_user_set_vring_kick() to avoid deadlock on async register.
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>  lib/librte_vhost/vhost_user.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 399675c..a319c1c 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1919,9 +1919,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
>  	 */
>  	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
>  		vq->enabled = 1;
> -		if (dev->notify_ops->vring_state_changed)
> -			dev->notify_ops->vring_state_changed(
> -				dev->vid, file.index, 1);

That looks very wrong, as:
1. The apps want to receive this notification. It looks like this breaks
existing apps in order to support the experimental async datapath. E.g.
OVS needs it to start polling the queues when protocol features are not
negotiated.

2. The fix in your case seems to indicate that your app's
vring_state_changed callback called rte_vhost_async_channel_register.
And your fix consists in no more calling the callback, and so no more
calling rte_vhost_async_channel_register?

>  	}
>  
>  	if (vq->ready) {
>
  
Hu, Jiayu March 30, 2021, 1:20 a.m. UTC | #2
Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Monday, March 29, 2021 11:19 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> Cc: Xia, Chenbo <chenbo.xia@intel.com>; Wang, Yinan
> <yinan.wang@intel.com>; Jiang, Cheng1 <cheng1.jiang@intel.com>; Pai G,
> Sunil <sunil.pai.g@intel.com>
> Subject: Re: [PATCH 3/4] vhost: avoid deadlock on async register
> 
> 
> 
> On 3/17/21 1:56 PM, Jiayu Hu wrote:
> > Users register async copy device when vhost queue is enabled.
> > However, if VHOST_USER_F_PROTOCOL_FEATURES is not supported,
> > a deadlock occurs inside rte_vhost_async_channel_register(),
> > as vhost_user_msg_handler() already takes vq->access_lock
> > before processing VHOST_USER_SET_VRING_KICK message.
> >
> > This patch removes calling vring_state_changed() in
> > vhost_user_set_vring_kick() to avoid deadlock on async register.
> >
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >  lib/librte_vhost/vhost_user.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 399675c..a319c1c 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -1919,9 +1919,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
> >  	 */
> >  	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
> >  		vq->enabled = 1;
> > -		if (dev->notify_ops->vring_state_changed)
> > -			dev->notify_ops->vring_state_changed(
> > -				dev->vid, file.index, 1);
> 
> That looks very wrong, as:
> 1. The apps want to receive this notification. It looks like breaking
> existing apps in order to support the experimental async datapath. E.g.
> OVS needs it to start polling the queues when protocol features is not
> negotiated.

IMHO, if the protocol features are not negotiated, vring_state_changed will
also be called in vhost_user_msg_handler. In the case you mentioned,
vq->enabled is set to true in set_vring_kick, and in vhost_user_msg_handler
"cur_ready != (vq && vq->ready)" is true, as vq->ready is false at init. So
vhost_user_msg_handler will call vhost_user_notify_queue_state, which
calls vring_state_changed inside.
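The argument above can be checked with a small model of the handler's post-message readiness check. All names are hypothetical, and the readiness rule is deliberately simplified to "kick fd set and queue enabled"; the real vq_is_ready() checks more state, so in practice the transition may happen on a later message. The model counts how often the application callback fires for a SET_VRING_KICK without protocol features, with and without the call inside set_vring_kick that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified virtqueue state. */
struct ready_model_vq {
	bool kick_set;
	bool enabled;
	bool ready; /* last readiness reported to the application */
};

static int notifications; /* times the app callback has fired */

/* Stands in for vhost_user_notify_queue_state() -> vring_state_changed(). */
static void notify_queue_state_model(bool enable)
{
	(void)enable;
	notifications++;
}

/* Simplified vq_is_ready(): the real check covers more fields. */
static bool vq_is_ready_model(const struct ready_model_vq *vq)
{
	return vq->kick_set && vq->enabled;
}

/* SET_VRING_KICK without protocol features. extra_notify selects the
 * pre-patch behaviour that also invokes the callback here. */
static void set_vring_kick_model(struct ready_model_vq *vq, bool extra_notify)
{
	vq->kick_set = true;
	vq->enabled = true;
	if (extra_notify)
		notify_queue_state_model(true);
}

/* After each message, the handler notifies only on a readiness change. */
static void msg_handler_model(struct ready_model_vq *vq, bool extra_notify)
{
	bool cur_ready;

	assert(vq != NULL);
	set_vring_kick_model(vq, extra_notify);
	cur_ready = vq_is_ready_model(vq);
	if (cur_ready != vq->ready) {
		vq->ready = cur_ready;
		notify_queue_state_model(cur_ready);
	}
}
```

In this model the pre-patch path notifies twice (once from set_vring_kick, once from the handler) and the patched path notifies once, so the application still learns that the queue became ready. The model does not cover the other call sites that run with the lock held.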

In addition, calling vring_state_changed in set_vring_kick is protected by
the lock, but calling it in vhost_user_msg_handler is not. This looks
confusing to me. Is there any special reason for this design?

> 
> 2. The fix in your case seems to indicate that your app's
> vring_state_changed callback called rte_vhost_async_channel_register.
> And your fix consists in no more calling the callback, and so no more
> calling rte_vhost_async_channel_register?

It is recommended to call rte_vhost_async_channel_register in
vring_state_changed, and vring_state_changed is called
by vhost_user_msg_handler.

Thanks,
Jiayu
> 
> >  	}
> >
> >  	if (vq->ready) {
> >
  
Maxime Coquelin April 13, 2021, 9:37 a.m. UTC | #3
On 3/30/21 3:20 AM, Hu, Jiayu wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Monday, March 29, 2021 11:19 PM
>> To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
>> Cc: Xia, Chenbo <chenbo.xia@intel.com>; Wang, Yinan
>> <yinan.wang@intel.com>; Jiang, Cheng1 <cheng1.jiang@intel.com>; Pai G,
>> Sunil <sunil.pai.g@intel.com>
>> Subject: Re: [PATCH 3/4] vhost: avoid deadlock on async register
>>
>>
>>
>> On 3/17/21 1:56 PM, Jiayu Hu wrote:
>>> Users register async copy device when vhost queue is enabled.
>>> However, if VHOST_USER_F_PROTOCOL_FEATURES is not supported,
>>> a deadlock occurs inside rte_vhost_async_channel_register(),
>>> as vhost_user_msg_handler() already takes vq->access_lock
>>> before processing VHOST_USER_SET_VRING_KICK message.
>>>
>>> This patch removes calling vring_state_changed() in
>>> vhost_user_set_vring_kick() to avoid deadlock on async register.
>>>
>>> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
>>> ---
>>>  lib/librte_vhost/vhost_user.c | 3 ---
>>>  1 file changed, 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 399675c..a319c1c 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -1919,9 +1919,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
>>>  	 */
>>>  	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>  		vq->enabled = 1;
>>> -		if (dev->notify_ops->vring_state_changed)
>>> -			dev->notify_ops->vring_state_changed(
>>> -				dev->vid, file.index, 1);
>>
>> That looks very wrong, as:
>> 1. The apps want to receive this notification. It looks like breaking
>> existing apps in order to support the experimental async datapath. E.g.
>> OVS needs it to start polling the queues when protocol features is not
>> negotiated.
> 
> IMHO, if protocol feature is not negotiated, vring_state_chaned will also
> be called in vhost_user_msg_handler. In the case you mentioned,
> vq->enabled is set to true in set_vring_kick, and in vhost_user_msg_handler,
> "cur_ready != (vq && vq->ready)" is true, as vq->ready is false when init. So
> vhost_user_msg_handler will call vhost_user_notify_queue_state, which
> calls set_vring_kick inside.

OK, I agree, we can drop this one.
But it is not enough, as vhost_user_notify_queue_state() is called in
several places with the lock taken.

> In addition, calling vring_state_changed in set_vring_kick is protected by lock,
> but it's not in in vhost_user_msg_handler. It looks confusing to me. Is there
> any special reason for this design?

I think we need the lock held every time the callback is called, to
avoid the case where an application calls a Vhost API that would modify
the vq struct. We could get undefined behavior if that happened.

> 
>>
>> 2. The fix in your case seems to indicate that your app's
>> vring_state_changed callback called rte_vhost_async_channel_register.
>> And your fix consists in no more calling the callback, and so no more
>> calling rte_vhost_async_channel_register?
> 
> rte_vhost_async_channel_register is recommended to call in
> vring_state_changed, and vring_state_changed will be called
> by vhost_user_msg_handler.

You might want to schedule a thread to call the channel registration.
Maybe using rte_eal_alarm_set()?
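One way to follow this suggestion is for the callback to only schedule the registration, so that it runs on another thread after the handler has released the lock. DPDK's EAL offers rte_eal_alarm_set() for running a callback asynchronously; to keep the sketch self-contained and runnable without DPDK, a plain POSIX thread plays that role here. All names are hypothetical, and the final join exists only to make the example deterministic.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical virtqueue with the lock the vhost library would hold. */
struct deferred_vq {
	pthread_mutex_t access_lock;
	int async_registered;
	int register_rc;
};

static void deferred_vq_init(struct deferred_vq *vq)
{
	int rc = pthread_mutex_init(&vq->access_lock, NULL);

	assert(rc == 0);
	(void)rc; /* unused when NDEBUG is defined */
	vq->async_registered = 0;
	vq->register_rc = -1;
}

/* Takes the vq lock, like rte_vhost_async_channel_register(). */
static int register_model(struct deferred_vq *vq)
{
	pthread_mutex_lock(&vq->access_lock);
	vq->async_registered = 1;
	pthread_mutex_unlock(&vq->access_lock);
	return 0;
}

/* Worker: blocks until the handler drops the lock, then registers. */
static void *deferred_register(void *arg)
{
	struct deferred_vq *vq = arg;

	vq->register_rc = register_model(vq);
	return NULL;
}

/* App callback: only schedules the work and never takes the lock. */
static int app_vring_state_changed_deferred(struct deferred_vq *vq, int enable,
					    pthread_t *worker, int *scheduled)
{
	*scheduled = 0;
	if (!enable)
		return 0;
	if (pthread_create(worker, NULL, deferred_register, vq) != 0)
		return -1;
	*scheduled = 1;
	return 0;
}

/* Handler: invokes the callback with the lock held, as vhost does. */
static int handle_set_vring_kick_deferred(struct deferred_vq *vq)
{
	pthread_t worker;
	int scheduled;
	int rc;

	pthread_mutex_lock(&vq->access_lock);
	rc = app_vring_state_changed_deferred(vq, 1, &worker, &scheduled);
	pthread_mutex_unlock(&vq->access_lock);
	if (rc != 0 || !scheduled)
		return rc;
	/* Joined only so the sketch is deterministic. */
	return pthread_join(worker, NULL);
}
```

The worker's pthread_mutex_lock() simply waits until the handler unlocks, so registration completes without the same thread re-entering the lock, while the callback itself still runs with the lock held.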

Regards,
Maxime

> 
> Thanks,
> Jiayu
>>
>>>  	}
>>>
>>>  	if (vq->ready) {
>>>
>
  

Patch

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 399675c..a319c1c 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1919,9 +1919,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	 */
 	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
 		vq->enabled = 1;
-		if (dev->notify_ops->vring_state_changed)
-			dev->notify_ops->vring_state_changed(
-				dev->vid, file.index, 1);
 	}
 
 	if (vq->ready) {