vhost: fix deadlock on port deletion

Message ID 20200114185357.25819-1-maxime.coquelin@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Maxime Coquelin
Series vhost: fix deadlock on port deletion

Checks

Context                        Check    Description
ci/checkpatch                  success  coding style OK
ci/Intel-compilation           fail     Compilation issues
ci/iol-intel-Performance       success  Performance Testing PASS
ci/iol-testing                 success  Testing PASS
ci/iol-mellanox-Performance    success  Performance Testing PASS
ci/iol-nxp-Performance         success  Performance Testing PASS
ci/travis-robot                warning  Travis build: failed

Commit Message

Maxime Coquelin Jan. 14, 2020, 6:53 p.m. UTC
If the vhost-user application (e.g. OVS) deletes the vhost-user
port while QEMU sends a vhost-user request, a deadlock can
happen if the request handler tries to acquire vhost-user's
global mutex, which is also held by the vhost-user port
deletion API (rte_vhost_driver_unregister()).

This patch prevents the deadlock by making
rte_vhost_driver_unregister() release the mutex and try
again if a request is being handled, giving the request
handler a chance to complete.

Fixes: 8b4b949144b8 ("vhost: fix dead lock on closing in server mode")
Fixes: 5fbb3941da9f ("vhost: introduce driver features related APIs")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/socket.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
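
Note: the back-off described above can be sketched as a standalone
pthread example. This is only an illustration under stated assumptions,
not the DPDK code: global_mutex stands in for vhost_user.mutex,
handler_busy for the "callback is executing" state tracked by the fdset,
and every function name below is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in DPDK. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool handler_busy;	/* true while a request is in flight */
static atomic_int handled;		/* requests completed so far */

/* Request handler: needs the global mutex before it can finish. */
static void *request_handler(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&global_mutex);
	atomic_fetch_add(&handled, 1);
	pthread_mutex_unlock(&global_mutex);
	atomic_store(&handler_busy, false);
	return NULL;
}

int demo_requests_handled(void)
{
	return atomic_load(&handled);
}

/*
 * Port deletion with back-off. Returns the number of retries that were
 * needed (possibly 0), or -1 if the handler thread could not be started.
 */
int delete_port_with_backoff(void)
{
	pthread_t tid;
	int retries = 0;
	int rc;

	/* Simulate a request that is already in flight when deletion starts. */
	atomic_store(&handler_busy, true);
	if (pthread_create(&tid, NULL, request_handler, NULL) != 0)
		return -1;

again:
	pthread_mutex_lock(&global_mutex);
	if (atomic_load(&handler_busy)) {
		/*
		 * Waiting here with the mutex held is the deadlock: the
		 * handler cannot finish without it. Drop it and retry.
		 */
		pthread_mutex_unlock(&global_mutex);
		retries++;
		sched_yield();
		goto again;
	}
	/* Handler is idle: tear-down would happen here, under the lock. */
	pthread_mutex_unlock(&global_mutex);

	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	(void)rc;
	return retries;
}
```

The retry count depends on scheduling, so only completion is guaranteed:
calling delete_port_with_backoff() from main() returns once the handler
has run, whereas a variant that kept global_mutex locked while waiting
for handler_busy to clear would never return.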
  

Comments

Tiwei Bie Jan. 15, 2020, 4:51 a.m. UTC | #1
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
  
Eelco Chaudron Jan. 15, 2020, 8:25 a.m. UTC | #2
Acked-by: Eelco Chaudron <echaudro@redhat.com>
  
Maxime Coquelin Jan. 15, 2020, 11:17 a.m. UTC | #3
Applied to dpdk-next-virtio/master

Thanks to the reviewers,
Maxime
  

Patch

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 633c2cbc27..c57a0c7cdd 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -1052,9 +1052,10 @@  rte_vhost_driver_unregister(const char *path)
 				next = TAILQ_NEXT(conn, next);
 
 				/*
-				 * If r/wcb is executing, release the
-				 * conn_mutex lock, and try again since
-				 * the r/wcb may use the conn_mutex lock.
+				 * If r/wcb is executing, release vsocket's
+				 * conn_mutex and vhost_user's mutex locks, and
+				 * try again since the r/wcb may use the
+				 * conn_mutex and mutex locks.
 				 */
 				if (fdset_try_del(&vhost_user.fdset,
 						  conn->connfd) == -1) {
@@ -1075,8 +1076,17 @@  rte_vhost_driver_unregister(const char *path)
 			pthread_mutex_unlock(&vsocket->conn_mutex);
 
 			if (vsocket->is_server) {
-				fdset_del(&vhost_user.fdset,
-						vsocket->socket_fd);
+				/*
+				 * If r/wcb is executing, release vhost_user's
+				 * mutex lock, and try again since the r/wcb
+				 * may use the mutex lock.
+				 */
+				if (fdset_try_del(&vhost_user.fdset,
+						vsocket->socket_fd) == -1) {
+					pthread_mutex_unlock(&vhost_user.mutex);
+					goto again;
+				}
+
 				close(vsocket->socket_fd);
 				unlink(path);
 			} else if (vsocket->reconnect) {
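
Both hunks rely on a non-blocking delete that reports -1 while the
entry's callback is executing, so the caller can drop its locks and
retry. A simplified model of that contract is sketched below. It is an
assumption-laden toy, not DPDK's fd_man implementation: the toy_* names,
the fixed-size table and the return value of 0 for "removed or not
found" are all illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

struct toy_entry {
	int fd;		/* -1 marks a free slot */
	bool busy;	/* true while the entry's callback is executing */
};

struct toy_fdset {
	pthread_mutex_t lock;
	struct toy_entry entry[TOY_MAX_FDS];
};

void toy_fdset_init(struct toy_fdset *set)
{
	pthread_mutex_init(&set->lock, NULL);
	for (size_t i = 0; i < TOY_MAX_FDS; i++) {
		set->entry[i].fd = -1;
		set->entry[i].busy = false;
	}
}

/* Returns 0 on success, -1 if the table is full. */
int toy_fdset_add(struct toy_fdset *set, int fd)
{
	int ret = -1;

	assert(fd >= 0);
	pthread_mutex_lock(&set->lock);
	for (size_t i = 0; i < TOY_MAX_FDS; i++) {
		if (set->entry[i].fd == -1) {
			set->entry[i].fd = fd;
			set->entry[i].busy = false;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&set->lock);
	return ret;
}

/* What a dispatcher would do around invoking the entry's callback. */
void toy_fdset_set_busy(struct toy_fdset *set, int fd, bool busy)
{
	assert(fd >= 0);
	pthread_mutex_lock(&set->lock);
	for (size_t i = 0; i < TOY_MAX_FDS; i++) {
		if (set->entry[i].fd == fd)
			set->entry[i].busy = busy;
	}
	pthread_mutex_unlock(&set->lock);
}

/*
 * Non-blocking delete: returns -1 without waiting if the callback is
 * running, 0 if the entry was removed or was not present.
 */
int toy_fdset_try_del(struct toy_fdset *set, int fd)
{
	int ret = 0;

	assert(fd >= 0);
	pthread_mutex_lock(&set->lock);
	for (size_t i = 0; i < TOY_MAX_FDS; i++) {
		if (set->entry[i].fd != fd)
			continue;
		if (set->entry[i].busy)
			ret = -1;		/* caller must back off */
		else
			set->entry[i].fd = -1;	/* slot is free again */
		break;
	}
	pthread_mutex_unlock(&set->lock);
	return ret;
}
```

The point of the design is that the delete never waits while the caller
still holds an outer lock: a blocking delete would sit there until the
callback finished, which cannot happen if the callback needs that same
outer lock.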