From patchwork Wed Jun 19 15:14:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54955 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id BCCAE1C38B; Wed, 19 Jun 2019 17:15:39 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 270F82BEA for ; Wed, 19 Jun 2019 17:15:36 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id C0AFF182005; Wed, 19 Jun 2019 18:15:35 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 36DAE32C; Wed, 19 Jun 2019 18:15:35 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:26 +0300 Message-Id: <1560957293-17294-2-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This is the first of a series of patches, whose purpose is to add support for the virtio-vhost-user transport. This is a vhost-user transport implementation that is different from the default AF_UNIX transport. It uses the virtio-vhost-user PCI device in order to tunnel vhost-user protocol messages over virtio. 
This lets guests act as vhost device backends for other guests. File descriptor passing is specific to the AF_UNIX vhost-user protocol transport. In order to add support for additional transports, it is necessary to extract transport-specific code from the main vhost-user code. This patch introduces struct vhost_transport_ops and associates each device with a transport. Core vhost-user code calls into vhost_transport_ops to perform transport-specific operations. Notifying callfd is a transport-specific operation, so it belongs to trans_af_unix.c. Several more patches follow this one to complete the task of moving AF_UNIX transport code out of core vhost-user code. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/Makefile | 2 +- lib/librte_vhost/trans_af_unix.c | 20 ++++++++++++++++++++ lib/librte_vhost/vhost.c | 1 + lib/librte_vhost/vhost.h | 34 +++++++++++++++++++++++++++++----- 4 files changed, 51 insertions(+), 6 deletions(-) create mode 100644 lib/librte_vhost/trans_af_unix.c diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 8623e91..5ff5fb2 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -23,7 +23,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net # all source are stored in SRCS-y SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \ - vhost_user.c virtio_net.c vdpa.c + vhost_user.c virtio_net.c vdpa.c trans_af_unix.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c new file mode 100644 index 0000000..3f0c308 --- /dev/null +++ b/lib/librte_vhost/trans_af_unix.c @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2018 Intel Corporation + * Copyright(c) 2017 Red Hat, Inc. + * Copyright(c) 2019 Arrikto Inc. 
+ */ + +#include "vhost.h" + +static int +af_unix_vring_call(struct virtio_net *dev __rte_unused, + struct vhost_virtqueue *vq) +{ + if (vq->callfd >= 0) + eventfd_write(vq->callfd, (eventfd_t)1); + return 0; +} + +const struct vhost_transport_ops af_unix_trans_ops = { + .vring_call = af_unix_vring_call, +}; diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 981837b..a36bc01 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -507,6 +507,7 @@ vhost_new_device(void) dev->vid = i; dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET; dev->slave_req_fd = -1; + dev->trans_ops = &af_unix_trans_ops; dev->vdpa_dev_id = -1; dev->postcopy_ufd = -1; rte_spinlock_init(&dev->slave_req_lock); diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 884befa..077f213 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -286,6 +286,30 @@ struct guest_page { uint64_t size; }; +struct virtio_net; + +/** + * A structure containing function pointers for transport-specific operations. + */ +struct vhost_transport_ops { + /** + * Notify the guest that used descriptors have been added to the vring. + * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked + * so this function just needs to perform the notification. + * + * @param dev + * vhost device + * @param vq + * vhost virtqueue + * @return + * 0 on success, -1 on failure + */ + int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq); +}; + +/** The traditional AF_UNIX vhost-user protocol transport. */ +extern const struct vhost_transport_ops af_unix_trans_ops; + /** * Device structure contains all configuration information relating * to the device. 
@@ -312,6 +336,7 @@ struct virtio_net { uint16_t mtu; struct vhost_device_ops const *notify_ops; + struct vhost_transport_ops const *trans_ops; uint32_t nr_guest_pages; uint32_t max_guest_pages; @@ -544,12 +569,11 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq) if ((vhost_need_event(vhost_used_event(vq), new, old) && (vq->callfd >= 0)) || unlikely(!signalled_used_valid)) - eventfd_write(vq->callfd, (eventfd_t) 1); + dev->trans_ops->vring_call(dev, vq); } else { /* Kick the guest if necessary. */ - if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT) - && (vq->callfd >= 0)) - eventfd_write(vq->callfd, (eventfd_t)1); + if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) + dev->trans_ops->vring_call(dev, vq); } } @@ -601,7 +625,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq) kick = true; kick: if (kick) - eventfd_write(vq->callfd, (eventfd_t)1); + dev->trans_ops->vring_call(dev, vq); } static __rte_always_inline void From patchwork Wed Jun 19 15:14:27 2019 X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54956 X-Patchwork-Delegate: maxime.coquelin@redhat.com From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:27 +0300 Message-Id: <1560957293-17294-3-git-send-email-ndragazis@arrikto.com> In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 02/28] vhost: move socket management code The socket.c file serves two purposes: 1. librte_vhost public API entry points, e.g. rte_vhost_driver_register(). 2. AF_UNIX socket management. Move AF_UNIX socket code into trans_af_unix.c so that socket.c only handles the librte_vhost public API entry points. This will make it possible to support other transports besides AF_UNIX. This patch is a preparatory step that simply moves code from socket.c to trans_af_unix.c unmodified, besides dropping 'static' qualifiers where necessary because socket.c now calls into trans_af_unix.c. A lot of socket.c state is exposed in vhost.h but this is a temporary measure and will be cleaned up in later patches. By simply moving code unmodified in this patch it will be easier to review the actual refactoring that follows.
Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 549 +-------------------------------------- lib/librte_vhost/trans_af_unix.c | 485 ++++++++++++++++++++++++++++++++++ lib/librte_vhost/vhost.h | 76 ++++++ 3 files changed, 562 insertions(+), 548 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 274988c..a993b67 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -4,16 +4,10 @@ #include #include -#include #include #include #include -#include -#include -#include #include -#include -#include #include #include @@ -22,71 +16,7 @@ #include "vhost.h" #include "vhost_user.h" - -TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); - -/* - * Every time rte_vhost_driver_register() is invoked, an associated - * vhost_user_socket struct will be created. - */ -struct vhost_user_socket { - struct vhost_user_connection_list conn_list; - pthread_mutex_t conn_mutex; - char *path; - int socket_fd; - struct sockaddr_un un; - bool is_server; - bool reconnect; - bool dequeue_zero_copy; - bool iommu_support; - bool use_builtin_virtio_net; - - /* - * The "supported_features" indicates the feature bits the - * vhost driver supports. The "features" indicates the feature - * bits after the rte_vhost_driver_features_disable/enable(). - * It is also the final feature bits used for vhost-user - * features negotiation. - */ - uint64_t supported_features; - uint64_t features; - - uint64_t protocol_features; - - /* - * Device id to identify a specific backend device. - * It's set to -1 for the default software implementation. - * If valid, one socket can have 1 connection only. 
- */ - int vdpa_dev_id; - - struct vhost_device_ops const *notify_ops; -}; - -struct vhost_user_connection { - struct vhost_user_socket *vsocket; - int connfd; - int vid; - - TAILQ_ENTRY(vhost_user_connection) next; -}; - -#define MAX_VHOST_SOCKET 1024 -struct vhost_user { - struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET]; - struct fdset fdset; - int vsocket_cnt; - pthread_mutex_t mutex; -}; - -#define MAX_VIRTIO_BACKLOG 128 - -static void vhost_user_server_new_connection(int fd, void *data, int *remove); -static void vhost_user_read_cb(int fd, void *dat, int *remove); -static int create_unix_socket(struct vhost_user_socket *vsocket); -static int vhost_user_start_client(struct vhost_user_socket *vsocket); - -static struct vhost_user vhost_user = { +struct vhost_user vhost_user = { .fdset = { .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} }, .fd_mutex = PTHREAD_MUTEX_INITIALIZER, @@ -97,459 +27,6 @@ static struct vhost_user vhost_user = { .mutex = PTHREAD_MUTEX_INITIALIZER, }; -/* - * return bytes# of read on success or negative val on failure. Update fdnum - * with number of fds read. 
- */ -int -read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, - int *fd_num) -{ - struct iovec iov; - struct msghdr msgh; - char control[CMSG_SPACE(max_fds * sizeof(int))]; - struct cmsghdr *cmsg; - int got_fds = 0; - int ret; - - *fd_num = 0; - - memset(&msgh, 0, sizeof(msgh)); - iov.iov_base = buf; - iov.iov_len = buflen; - - msgh.msg_iov = &iov; - msgh.msg_iovlen = 1; - msgh.msg_control = control; - msgh.msg_controllen = sizeof(control); - - ret = recvmsg(sockfd, &msgh, 0); - if (ret <= 0) { - RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n"); - return ret; - } - - if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) { - RTE_LOG(ERR, VHOST_CONFIG, "truncted msg\n"); - return -1; - } - - for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL; - cmsg = CMSG_NXTHDR(&msgh, cmsg)) { - if ((cmsg->cmsg_level == SOL_SOCKET) && - (cmsg->cmsg_type == SCM_RIGHTS)) { - got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int); - *fd_num = got_fds; - memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int)); - break; - } - } - - /* Clear out unused file descriptors */ - while (got_fds < max_fds) - fds[got_fds++] = -1; - - return ret; -} - -int -send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) -{ - - struct iovec iov; - struct msghdr msgh; - size_t fdsize = fd_num * sizeof(int); - char control[CMSG_SPACE(fdsize)]; - struct cmsghdr *cmsg; - int ret; - - memset(&msgh, 0, sizeof(msgh)); - iov.iov_base = buf; - iov.iov_len = buflen; - - msgh.msg_iov = &iov; - msgh.msg_iovlen = 1; - - if (fds && fd_num > 0) { - msgh.msg_control = control; - msgh.msg_controllen = sizeof(control); - cmsg = CMSG_FIRSTHDR(&msgh); - if (cmsg == NULL) { - RTE_LOG(ERR, VHOST_CONFIG, "cmsg == NULL\n"); - errno = EINVAL; - return -1; - } - cmsg->cmsg_len = CMSG_LEN(fdsize); - cmsg->cmsg_level = SOL_SOCKET; - cmsg->cmsg_type = SCM_RIGHTS; - memcpy(CMSG_DATA(cmsg), fds, fdsize); - } else { - msgh.msg_control = NULL; - msgh.msg_controllen = 0; - } - - do { - ret = sendmsg(sockfd, &msgh, 
MSG_NOSIGNAL); - } while (ret < 0 && errno == EINTR); - - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, "sendmsg error\n"); - return ret; - } - - return ret; -} - -static void -vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) -{ - int vid; - size_t size; - struct vhost_user_connection *conn; - int ret; - - if (vsocket == NULL) - return; - - conn = malloc(sizeof(*conn)); - if (conn == NULL) { - close(fd); - return; - } - - vid = vhost_new_device(); - if (vid == -1) { - goto err; - } - - size = strnlen(vsocket->path, PATH_MAX); - vhost_set_ifname(vid, vsocket->path, size); - - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); - - vhost_attach_vdpa_device(vid, vsocket->vdpa_dev_id); - - if (vsocket->dequeue_zero_copy) - vhost_enable_dequeue_zero_copy(vid); - - RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid); - - if (vsocket->notify_ops->new_connection) { - ret = vsocket->notify_ops->new_connection(vid); - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to add vhost user connection with fd %d\n", - fd); - goto err_cleanup; - } - } - - conn->connfd = fd; - conn->vsocket = vsocket; - conn->vid = vid; - ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb, - NULL, conn); - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to add fd %d into vhost server fdset\n", - fd); - - if (vsocket->notify_ops->destroy_connection) - vsocket->notify_ops->destroy_connection(conn->vid); - - goto err_cleanup; - } - - pthread_mutex_lock(&vsocket->conn_mutex); - TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next); - pthread_mutex_unlock(&vsocket->conn_mutex); - - fdset_pipe_notify(&vhost_user.fdset); - return; - -err_cleanup: - vhost_destroy_device(vid); -err: - free(conn); - close(fd); -} - -/* call back when there is new vhost-user connection from client */ -static void -vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused) -{ - struct vhost_user_socket *vsocket = dat; - - fd = accept(fd, NULL, NULL); 
- if (fd < 0) - return; - - RTE_LOG(INFO, VHOST_CONFIG, "new vhost user connection is %d\n", fd); - vhost_user_add_connection(fd, vsocket); -} - -static void -vhost_user_read_cb(int connfd, void *dat, int *remove) -{ - struct vhost_user_connection *conn = dat; - struct vhost_user_socket *vsocket = conn->vsocket; - int ret; - - ret = vhost_user_msg_handler(conn->vid, connfd); - if (ret < 0) { - struct virtio_net *dev = get_device(conn->vid); - - close(connfd); - *remove = 1; - - if (dev) - vhost_destroy_device_notify(dev); - - if (vsocket->notify_ops->destroy_connection) - vsocket->notify_ops->destroy_connection(conn->vid); - - vhost_destroy_device(conn->vid); - - pthread_mutex_lock(&vsocket->conn_mutex); - TAILQ_REMOVE(&vsocket->conn_list, conn, next); - pthread_mutex_unlock(&vsocket->conn_mutex); - - free(conn); - - if (vsocket->reconnect) { - create_unix_socket(vsocket); - vhost_user_start_client(vsocket); - } - } -} - -static int -create_unix_socket(struct vhost_user_socket *vsocket) -{ - int fd; - struct sockaddr_un *un = &vsocket->un; - - fd = socket(AF_UNIX, SOCK_STREAM, 0); - if (fd < 0) - return -1; - RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n", - vsocket->is_server ? "server" : "client", fd); - - if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) { - RTE_LOG(ERR, VHOST_CONFIG, - "vhost-user: can't set nonblocking mode for socket, fd: " - "%d (%s)\n", fd, strerror(errno)); - close(fd); - return -1; - } - - memset(un, 0, sizeof(*un)); - un->sun_family = AF_UNIX; - strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path)); - un->sun_path[sizeof(un->sun_path) - 1] = '\0'; - - vsocket->socket_fd = fd; - return 0; -} - -static int -vhost_user_start_server(struct vhost_user_socket *vsocket) -{ - int ret; - int fd = vsocket->socket_fd; - const char *path = vsocket->path; - - /* - * bind () may fail if the socket file with the same name already - * exists. 
But the library obviously should not delete the file - * provided by the user, since we can not be sure that it is not - * being used by other applications. Moreover, many applications form - * socket names based on user input, which is prone to errors. - * - * The user must ensure that the socket does not exist before - * registering the vhost driver in server mode. - */ - ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un)); - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to bind to %s: %s; remove it and try again\n", - path, strerror(errno)); - goto err; - } - RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path); - - ret = listen(fd, MAX_VIRTIO_BACKLOG); - if (ret < 0) - goto err; - - ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection, - NULL, vsocket); - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to add listen fd %d to vhost server fdset\n", - fd); - goto err; - } - - return 0; - -err: - close(fd); - return -1; -} - -struct vhost_user_reconnect { - struct sockaddr_un un; - int fd; - struct vhost_user_socket *vsocket; - - TAILQ_ENTRY(vhost_user_reconnect) next; -}; - -TAILQ_HEAD(vhost_user_reconnect_tailq_list, vhost_user_reconnect); -struct vhost_user_reconnect_list { - struct vhost_user_reconnect_tailq_list head; - pthread_mutex_t mutex; -}; - -static struct vhost_user_reconnect_list reconn_list; -static pthread_t reconn_tid; - -static int -vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz) -{ - int ret, flags; - - ret = connect(fd, un, sz); - if (ret < 0 && errno != EISCONN) - return -1; - - flags = fcntl(fd, F_GETFL, 0); - if (flags < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "can't get flags for connfd %d\n", fd); - return -2; - } - if ((flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)) { - RTE_LOG(ERR, VHOST_CONFIG, - "can't disable nonblocking on fd %d\n", fd); - return -2; - } - return 0; -} - -static void * -vhost_user_client_reconnect(void *arg __rte_unused) -{ - int ret; - 
struct vhost_user_reconnect *reconn, *next; - - while (1) { - pthread_mutex_lock(&reconn_list.mutex); - - /* - * An equal implementation of TAILQ_FOREACH_SAFE, - * which does not exist on all platforms. - */ - for (reconn = TAILQ_FIRST(&reconn_list.head); - reconn != NULL; reconn = next) { - next = TAILQ_NEXT(reconn, next); - - ret = vhost_user_connect_nonblock(reconn->fd, - (struct sockaddr *)&reconn->un, - sizeof(reconn->un)); - if (ret == -2) { - close(reconn->fd); - RTE_LOG(ERR, VHOST_CONFIG, - "reconnection for fd %d failed\n", - reconn->fd); - goto remove_fd; - } - if (ret == -1) - continue; - - RTE_LOG(INFO, VHOST_CONFIG, - "%s: connected\n", reconn->vsocket->path); - vhost_user_add_connection(reconn->fd, reconn->vsocket); -remove_fd: - TAILQ_REMOVE(&reconn_list.head, reconn, next); - free(reconn); - } - - pthread_mutex_unlock(&reconn_list.mutex); - sleep(1); - } - - return NULL; -} - -static int -vhost_user_reconnect_init(void) -{ - int ret; - - ret = pthread_mutex_init(&reconn_list.mutex, NULL); - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, "failed to initialize mutex"); - return ret; - } - TAILQ_INIT(&reconn_list.head); - - ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn", NULL, - vhost_user_client_reconnect, NULL); - if (ret != 0) { - RTE_LOG(ERR, VHOST_CONFIG, "failed to create reconnect thread"); - if (pthread_mutex_destroy(&reconn_list.mutex)) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to destroy reconnect mutex"); - } - } - - return ret; -} - -static int -vhost_user_start_client(struct vhost_user_socket *vsocket) -{ - int ret; - int fd = vsocket->socket_fd; - const char *path = vsocket->path; - struct vhost_user_reconnect *reconn; - - ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un, - sizeof(vsocket->un)); - if (ret == 0) { - vhost_user_add_connection(fd, vsocket); - return 0; - } - - RTE_LOG(WARNING, VHOST_CONFIG, - "failed to connect to %s: %s\n", - path, strerror(errno)); - - if (ret == -2 || !vsocket->reconnect) { - 
close(fd); - return -1; - } - - RTE_LOG(INFO, VHOST_CONFIG, "%s: reconnecting...\n", path); - reconn = malloc(sizeof(*reconn)); - if (reconn == NULL) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to allocate memory for reconnect\n"); - close(fd); - return -1; - } - reconn->un = vsocket->un; - reconn->fd = fd; - reconn->vsocket = vsocket; - pthread_mutex_lock(&reconn_list.mutex); - TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next); - pthread_mutex_unlock(&reconn_list.mutex); - - return 0; -} - static struct vhost_user_socket * find_vhost_user_socket(const char *path) { @@ -952,30 +429,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags) return ret; } -static bool -vhost_user_remove_reconnect(struct vhost_user_socket *vsocket) -{ - int found = false; - struct vhost_user_reconnect *reconn, *next; - - pthread_mutex_lock(&reconn_list.mutex); - - for (reconn = TAILQ_FIRST(&reconn_list.head); - reconn != NULL; reconn = next) { - next = TAILQ_NEXT(reconn, next); - - if (reconn->vsocket == vsocket) { - TAILQ_REMOVE(&reconn_list.head, reconn, next); - close(reconn->fd); - free(reconn); - found = true; - break; - } - } - pthread_mutex_unlock(&reconn_list.mutex); - return found; -} - /** * Unregister the specified vhost socket */ diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 3f0c308..89a5b7d 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -4,7 +4,492 @@ * Copyright(c) 2019 Arrikto Inc. */ +#include + +#include + #include "vhost.h" +#include "vhost_user.h" + +#define MAX_VIRTIO_BACKLOG 128 + +static void vhost_user_read_cb(int connfd, void *dat, int *remove); + +/* + * return bytes# of read on success or negative val on failure. Update fdnum + * with number of fds read. 
+ */ +int +read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, + int *fd_num) +{ + struct iovec iov; + struct msghdr msgh; + char control[CMSG_SPACE(max_fds * sizeof(int))]; + struct cmsghdr *cmsg; + int got_fds = 0; + int ret; + + *fd_num = 0; + + memset(&msgh, 0, sizeof(msgh)); + iov.iov_base = buf; + iov.iov_len = buflen; + + msgh.msg_iov = &iov; + msgh.msg_iovlen = 1; + msgh.msg_control = control; + msgh.msg_controllen = sizeof(control); + + ret = recvmsg(sockfd, &msgh, 0); + if (ret <= 0) { + RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n"); + return ret; + } + + if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) { + RTE_LOG(ERR, VHOST_CONFIG, "truncted msg\n"); + return -1; + } + + for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL; + cmsg = CMSG_NXTHDR(&msgh, cmsg)) { + if ((cmsg->cmsg_level == SOL_SOCKET) && + (cmsg->cmsg_type == SCM_RIGHTS)) { + got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int); + *fd_num = got_fds; + memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int)); + break; + } + } + + /* Clear out unused file descriptors */ + while (got_fds < max_fds) + fds[got_fds++] = -1; + + return ret; +} + +int +send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) +{ + struct iovec iov; + struct msghdr msgh; + size_t fdsize = fd_num * sizeof(int); + char control[CMSG_SPACE(fdsize)]; + struct cmsghdr *cmsg; + int ret; + + memset(&msgh, 0, sizeof(msgh)); + iov.iov_base = buf; + iov.iov_len = buflen; + + msgh.msg_iov = &iov; + msgh.msg_iovlen = 1; + + if (fds && fd_num > 0) { + msgh.msg_control = control; + msgh.msg_controllen = sizeof(control); + cmsg = CMSG_FIRSTHDR(&msgh); + if (cmsg == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, "cmsg == NULL\n"); + errno = EINVAL; + return -1; + } + cmsg->cmsg_len = CMSG_LEN(fdsize); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + memcpy(CMSG_DATA(cmsg), fds, fdsize); + } else { + msgh.msg_control = NULL; + msgh.msg_controllen = 0; + } + + do { + ret = sendmsg(sockfd, &msgh, 
MSG_NOSIGNAL); + } while (ret < 0 && errno == EINTR); + + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, "sendmsg error\n"); + return ret; + } + + return ret; +} + +static void +vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) +{ + int vid; + size_t size; + struct vhost_user_connection *conn; + int ret; + + if (vsocket == NULL) + return; + + conn = malloc(sizeof(*conn)); + if (conn == NULL) { + close(fd); + return; + } + + vid = vhost_new_device(); + if (vid == -1) { + goto err; + } + + size = strnlen(vsocket->path, PATH_MAX); + vhost_set_ifname(vid, vsocket->path, size); + + vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); + + vhost_attach_vdpa_device(vid, vsocket->vdpa_dev_id); + + if (vsocket->dequeue_zero_copy) + vhost_enable_dequeue_zero_copy(vid); + + RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid); + + if (vsocket->notify_ops->new_connection) { + ret = vsocket->notify_ops->new_connection(vid); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to add vhost user connection with fd %d\n", + fd); + goto err_cleanup; + } + } + + conn->connfd = fd; + conn->vsocket = vsocket; + conn->vid = vid; + ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb, + NULL, conn); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to add fd %d into vhost server fdset\n", + fd); + + if (vsocket->notify_ops->destroy_connection) + vsocket->notify_ops->destroy_connection(conn->vid); + + goto err_cleanup; + } + + pthread_mutex_lock(&vsocket->conn_mutex); + TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next); + pthread_mutex_unlock(&vsocket->conn_mutex); + + fdset_pipe_notify(&vhost_user.fdset); + return; + +err_cleanup: + vhost_destroy_device(vid); +err: + free(conn); + close(fd); +} + +/* call back when there is new vhost-user connection from client */ +static void +vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused) +{ + struct vhost_user_socket *vsocket = dat; + + fd = accept(fd, NULL, NULL); 
+ if (fd < 0) + return; + + RTE_LOG(INFO, VHOST_CONFIG, "new vhost user connection is %d\n", fd); + vhost_user_add_connection(fd, vsocket); +} + +static void +vhost_user_read_cb(int connfd, void *dat, int *remove) +{ + struct vhost_user_connection *conn = dat; + struct vhost_user_socket *vsocket = conn->vsocket; + int ret; + + ret = vhost_user_msg_handler(conn->vid, connfd); + if (ret < 0) { + struct virtio_net *dev = get_device(conn->vid); + + close(connfd); + *remove = 1; + + if (dev) + vhost_destroy_device_notify(dev); + + if (vsocket->notify_ops->destroy_connection) + vsocket->notify_ops->destroy_connection(conn->vid); + + vhost_destroy_device(conn->vid); + + pthread_mutex_lock(&vsocket->conn_mutex); + TAILQ_REMOVE(&vsocket->conn_list, conn, next); + pthread_mutex_unlock(&vsocket->conn_mutex); + + free(conn); + + if (vsocket->reconnect) { + create_unix_socket(vsocket); + vhost_user_start_client(vsocket); + } + } +} + +int +create_unix_socket(struct vhost_user_socket *vsocket) +{ + int fd; + struct sockaddr_un *un = &vsocket->un; + + fd = socket(AF_UNIX, SOCK_STREAM, 0); + if (fd < 0) + return -1; + RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n", + vsocket->is_server ? "server" : "client", fd); + + if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) { + RTE_LOG(ERR, VHOST_CONFIG, + "vhost-user: can't set nonblocking mode for socket, fd: " + "%d (%s)\n", fd, strerror(errno)); + close(fd); + return -1; + } + + memset(un, 0, sizeof(*un)); + un->sun_family = AF_UNIX; + strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path)); + un->sun_path[sizeof(un->sun_path) - 1] = '\0'; + + vsocket->socket_fd = fd; + return 0; +} + +int +vhost_user_start_server(struct vhost_user_socket *vsocket) +{ + int ret; + int fd = vsocket->socket_fd; + const char *path = vsocket->path; + + /* + * bind () may fail if the socket file with the same name already + * exists. 
But the library obviously should not delete the file + * provided by the user, since we can not be sure that it is not + * being used by other applications. Moreover, many applications form + * socket names based on user input, which is prone to errors. + * + * The user must ensure that the socket does not exist before + * registering the vhost driver in server mode. + */ + ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un)); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to bind to %s: %s; remove it and try again\n", + path, strerror(errno)); + goto err; + } + RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path); + + ret = listen(fd, MAX_VIRTIO_BACKLOG); + if (ret < 0) + goto err; + + ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection, + NULL, vsocket); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to add listen fd %d to vhost server fdset\n", + fd); + goto err; + } + + return 0; + +err: + close(fd); + return -1; +} + +struct vhost_user_reconnect { + struct sockaddr_un un; + int fd; + struct vhost_user_socket *vsocket; + + TAILQ_ENTRY(vhost_user_reconnect) next; +}; + +TAILQ_HEAD(vhost_user_reconnect_tailq_list, vhost_user_reconnect); +struct vhost_user_reconnect_list { + struct vhost_user_reconnect_tailq_list head; + pthread_mutex_t mutex; +}; + +static struct vhost_user_reconnect_list reconn_list; +pthread_t reconn_tid; + +static int +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz) +{ + int ret, flags; + + ret = connect(fd, un, sz); + if (ret < 0 && errno != EISCONN) + return -1; + + flags = fcntl(fd, F_GETFL, 0); + if (flags < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "can't get flags for connfd %d\n", fd); + return -2; + } + if ((flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)) { + RTE_LOG(ERR, VHOST_CONFIG, + "can't disable nonblocking on fd %d\n", fd); + return -2; + } + return 0; +} + +static void * +vhost_user_client_reconnect(void *arg __rte_unused) +{ + int ret; + struct 
vhost_user_reconnect *reconn, *next; + + while (1) { + pthread_mutex_lock(&reconn_list.mutex); + + /* + * An equal implementation of TAILQ_FOREACH_SAFE, + * which does not exist on all platforms. + */ + for (reconn = TAILQ_FIRST(&reconn_list.head); + reconn != NULL; reconn = next) { + next = TAILQ_NEXT(reconn, next); + + ret = vhost_user_connect_nonblock(reconn->fd, + (struct sockaddr *)&reconn->un, + sizeof(reconn->un)); + if (ret == -2) { + close(reconn->fd); + RTE_LOG(ERR, VHOST_CONFIG, + "reconnection for fd %d failed\n", + reconn->fd); + goto remove_fd; + } + if (ret == -1) + continue; + + RTE_LOG(INFO, VHOST_CONFIG, + "%s: connected\n", reconn->vsocket->path); + vhost_user_add_connection(reconn->fd, reconn->vsocket); +remove_fd: + TAILQ_REMOVE(&reconn_list.head, reconn, next); + free(reconn); + } + + pthread_mutex_unlock(&reconn_list.mutex); + sleep(1); + } + + return NULL; +} + +int +vhost_user_reconnect_init(void) +{ + int ret; + + ret = pthread_mutex_init(&reconn_list.mutex, NULL); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, "failed to initialize mutex"); + return ret; + } + TAILQ_INIT(&reconn_list.head); + + ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn", NULL, + vhost_user_client_reconnect, NULL); + if (ret != 0) { + RTE_LOG(ERR, VHOST_CONFIG, "failed to create reconnect thread"); + if (pthread_mutex_destroy(&reconn_list.mutex)) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to destroy reconnect mutex"); + } + } + + return ret; +} + +int +vhost_user_start_client(struct vhost_user_socket *vsocket) +{ + int ret; + int fd = vsocket->socket_fd; + const char *path = vsocket->path; + struct vhost_user_reconnect *reconn; + + ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un, + sizeof(vsocket->un)); + if (ret == 0) { + vhost_user_add_connection(fd, vsocket); + return 0; + } + + RTE_LOG(WARNING, VHOST_CONFIG, + "failed to connect to %s: %s\n", + path, strerror(errno)); + + if (ret == -2 || !vsocket->reconnect) { + close(fd); + return -1; 
+ } + + RTE_LOG(INFO, VHOST_CONFIG, "%s: reconnecting...\n", path); + reconn = malloc(sizeof(*reconn)); + if (reconn == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to allocate memory for reconnect\n"); + close(fd); + return -1; + } + reconn->un = vsocket->un; + reconn->fd = fd; + reconn->vsocket = vsocket; + pthread_mutex_lock(&reconn_list.mutex); + TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next); + pthread_mutex_unlock(&reconn_list.mutex); + + return 0; +} + +bool +vhost_user_remove_reconnect(struct vhost_user_socket *vsocket) +{ + int found = false; + struct vhost_user_reconnect *reconn, *next; + + pthread_mutex_lock(&reconn_list.mutex); + + for (reconn = TAILQ_FIRST(&reconn_list.head); + reconn != NULL; reconn = next) { + next = TAILQ_NEXT(reconn, next); + + if (reconn->vsocket == vsocket) { + TAILQ_REMOVE(&reconn_list.head, reconn, next); + close(reconn->fd); + free(reconn); + found = true; + break; + } + } + pthread_mutex_unlock(&reconn_list.mutex); + return found; +} static int af_unix_vring_call(struct virtio_net *dev __rte_unused, diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 077f213..c363369 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -5,6 +5,7 @@ #ifndef _VHOST_NET_CDEV_H_ #define _VHOST_NET_CDEV_H_ #include +#include #include #include #include @@ -13,13 +14,16 @@ #include #include #include +#include /* TODO remove when trans_af_unix.c refactoring is done */ #include +#include #include #include #include #include +#include "fd_man.h" #include "rte_vhost.h" #include "rte_vdpa.h" @@ -360,6 +364,78 @@ struct virtio_net { struct rte_vhost_user_extern_ops extern_ops; } __rte_cache_aligned; +/* The vhost_user, vhost_user_socket, vhost_user_connection, and reconnect + * declarations are temporary measures for moving AF_UNIX code into + * trans_af_unix.c. They will be cleaned up as socket.c is untangled from + * trans_af_unix.c. 
+ */ +TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); + +/* + * Every time rte_vhost_driver_register() is invoked, an associated + * vhost_user_socket struct will be created. + */ +struct vhost_user_socket { + struct vhost_user_connection_list conn_list; + pthread_mutex_t conn_mutex; + char *path; + int socket_fd; + struct sockaddr_un un; + bool is_server; + bool reconnect; + bool dequeue_zero_copy; + bool iommu_support; + bool use_builtin_virtio_net; + + /* + * The "supported_features" indicates the feature bits the + * vhost driver supports. The "features" indicates the feature + * bits after the rte_vhost_driver_features_disable/enable(). + * It is also the final feature bits used for vhost-user + * features negotiation. + */ + uint64_t supported_features; + uint64_t features; + + uint64_t protocol_features; + + /* + * Device id to identify a specific backend device. + * It's set to -1 for the default software implementation. + * If valid, one socket can have 1 connection only. 
+ */ + int vdpa_dev_id; + + struct vhost_device_ops const *notify_ops; +}; + +struct vhost_user_connection { + struct vhost_user_socket *vsocket; + int connfd; + int vid; + + TAILQ_ENTRY(vhost_user_connection) next; +}; + +#define MAX_VHOST_SOCKET 1024 +struct vhost_user { + struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET]; + struct fdset fdset; + int vsocket_cnt; + pthread_mutex_t mutex; +}; + +extern struct vhost_user vhost_user; + +int create_unix_socket(struct vhost_user_socket *vsocket); +int vhost_user_start_server(struct vhost_user_socket *vsocket); +int vhost_user_start_client(struct vhost_user_socket *vsocket); + +extern pthread_t reconn_tid; + +int vhost_user_reconnect_init(void); +bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket); + static __rte_always_inline bool vq_is_packed(struct virtio_net *dev) { From patchwork Wed Jun 19 15:14:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54957 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 143141C39B; Wed, 19 Jun 2019 17:15:45 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 9E8871C387 for ; Wed, 19 Jun 2019 17:15:37 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 56EE8182007; Wed, 19 Jun 2019 18:15:37 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id D5A9D2B2; Wed, 19 Jun 2019 18:15:36 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:28 +0300 Message-Id: 
<1560957293-17294-4-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 03/28] vhost: allocate per-socket transport state X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Stefan Hajnoczi vhost-user transports have per-socket state (like file descriptors). Make it possible for transports to keep state beyond what is included in struct vhost_user_socket. This patch makes it possible to move AF_UNIX-specific fields from struct vhost_user_socket into trans_af_unix.c in later patches. Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 6 ++++-- lib/librte_vhost/trans_af_unix.c | 5 +++++ lib/librte_vhost/vhost.h | 9 +++++++++ 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index a993b67..60d3546 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -316,6 +316,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) { int ret = -1; struct vhost_user_socket *vsocket; + const struct vhost_transport_ops *trans_ops = &af_unix_trans_ops; if (!path) return -1; @@ -328,10 +329,11 @@ rte_vhost_driver_register(const char *path, uint64_t flags) goto out; } - vsocket = malloc(sizeof(struct vhost_user_socket)); + vsocket = malloc(trans_ops->socket_size); if (!vsocket) goto out; - memset(vsocket, 0, sizeof(struct vhost_user_socket)); + memset(vsocket, 0, trans_ops->socket_size); + vsocket->trans_ops = trans_ops; vsocket->path = strdup(path); if (vsocket->path == NULL) { RTE_LOG(ERR, VHOST_CONFIG, diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 89a5b7d..4de2579 100644 --- 
a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -13,6 +13,10 @@ #define MAX_VIRTIO_BACKLOG 128 +struct af_unix_socket { + struct vhost_user_socket socket; /* must be the first field! */ +}; + static void vhost_user_read_cb(int connfd, void *dat, int *remove); /* @@ -501,5 +505,6 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused, } const struct vhost_transport_ops af_unix_trans_ops = { + .socket_size = sizeof(struct af_unix_socket), .vring_call = af_unix_vring_call, }; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index c363369..9615392 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -296,6 +296,9 @@ struct virtio_net; * A structure containing function pointers for transport-specific operations. */ struct vhost_transport_ops { + /** Size of struct vhost_user_socket-derived per-socket state */ + size_t socket_size; + /** * Notify the guest that used descriptors have been added to the vring. * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked @@ -374,6 +377,11 @@ TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); /* * Every time rte_vhost_driver_register() is invoked, an associated * vhost_user_socket struct will be created. + * + * Transport-specific per-socket state can be kept by embedding this struct at + * the beginning of a transport-specific struct. Set + * vhost_transport_ops->socket_size to the size of the transport-specific + * struct. 
*/ struct vhost_user_socket { struct vhost_user_connection_list conn_list; @@ -407,6 +415,7 @@ struct vhost_user_socket { int vdpa_dev_id; struct vhost_device_ops const *notify_ops; + struct vhost_transport_ops const *trans_ops; }; struct vhost_user_connection { From patchwork Wed Jun 19 15:14:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54958 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DDA111C3A3; Wed, 19 Jun 2019 17:15:47 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id ECC6E1C388 for ; Wed, 19 Jun 2019 17:15:37 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id BC560182008; Wed, 19 Jun 2019 18:15:37 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 2350B32C; Wed, 19 Jun 2019 18:15:37 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:29 +0300 Message-Id: <1560957293-17294-5-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 04/28] vhost: move socket fd and un sockaddr X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The socket file descriptor and AF_UNIX sockaddr are specific to the AF_UNIX 
transport, so move them into trans_af_unix.c. In order to do this, we need to begin defining the vhost_transport_ops interface that will allow librte_vhost to support multiple transports. This patch adds socket_init() and socket_cleanup() to vhost_transport_ops. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 11 ++------ lib/librte_vhost/trans_af_unix.c | 55 ++++++++++++++++++++++++++++++++-------- lib/librte_vhost/vhost.h | 30 ++++++++++++++++++---- 3 files changed, 72 insertions(+), 24 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 60d3546..3b5608c 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -408,7 +408,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) } else { vsocket->is_server = true; } - ret = create_unix_socket(vsocket); + ret = trans_ops->socket_init(vsocket, flags); if (ret < 0) { goto out_mutex; } @@ -480,14 +480,7 @@ rte_vhost_driver_unregister(const char *path) } pthread_mutex_unlock(&vsocket->conn_mutex); - if (vsocket->is_server) { - fdset_del(&vhost_user.fdset, - vsocket->socket_fd); - close(vsocket->socket_fd); - unlink(path); - } else if (vsocket->reconnect) { - vhost_user_remove_reconnect(vsocket); - } + vsocket->trans_ops->socket_cleanup(vsocket); pthread_mutex_destroy(&vsocket->conn_mutex); vhost_user_socket_mem_free(vsocket); diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 4de2579..f23bb9c 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -4,6 +4,8 @@ * Copyright(c) 2019 Arrikto Inc. */ +#include +#include #include #include @@ -15,8 +17,11 @@ struct af_unix_socket { struct vhost_user_socket socket; /* must be the first field! 
*/ + int socket_fd; + struct sockaddr_un un; }; +static int create_unix_socket(struct vhost_user_socket *vsocket); static void vhost_user_read_cb(int connfd, void *dat, int *remove); /* @@ -244,11 +249,13 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) } } -int +static int create_unix_socket(struct vhost_user_socket *vsocket) { + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); int fd; - struct sockaddr_un *un = &vsocket->un; + struct sockaddr_un *un = &af_vsocket->un; fd = socket(AF_UNIX, SOCK_STREAM, 0); if (fd < 0) @@ -269,15 +276,17 @@ create_unix_socket(struct vhost_user_socket *vsocket) strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path)); un->sun_path[sizeof(un->sun_path) - 1] = '\0'; - vsocket->socket_fd = fd; + af_vsocket->socket_fd = fd; return 0; } int vhost_user_start_server(struct vhost_user_socket *vsocket) { + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); int ret; - int fd = vsocket->socket_fd; + int fd = af_vsocket->socket_fd; const char *path = vsocket->path; /* @@ -290,7 +299,7 @@ vhost_user_start_server(struct vhost_user_socket *vsocket) * The user must ensure that the socket does not exist before * registering the vhost driver in server mode. 
*/ - ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un)); + ret = bind(fd, (struct sockaddr *)&af_vsocket->un, sizeof(af_vsocket->un)); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "failed to bind to %s: %s; remove it and try again\n", @@ -432,13 +441,15 @@ vhost_user_reconnect_init(void) int vhost_user_start_client(struct vhost_user_socket *vsocket) { + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); int ret; - int fd = vsocket->socket_fd; + int fd = af_vsocket->socket_fd; const char *path = vsocket->path; struct vhost_user_reconnect *reconn; - ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un, - sizeof(vsocket->un)); + ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&af_vsocket->un, + sizeof(af_vsocket->un)); if (ret == 0) { vhost_user_add_connection(fd, vsocket); return 0; @@ -461,7 +472,7 @@ vhost_user_start_client(struct vhost_user_socket *vsocket) close(fd); return -1; } - reconn->un = vsocket->un; + reconn->un = af_vsocket->un; reconn->fd = fd; reconn->vsocket = vsocket; pthread_mutex_lock(&reconn_list.mutex); @@ -471,7 +482,7 @@ vhost_user_start_client(struct vhost_user_socket *vsocket) return 0; } -bool +static bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket) { int found = false; @@ -496,6 +507,28 @@ vhost_user_remove_reconnect(struct vhost_user_socket *vsocket) } static int +af_unix_socket_init(struct vhost_user_socket *vsocket, + uint64_t flags __rte_unused) +{ + return create_unix_socket(vsocket); +} + +static void +af_unix_socket_cleanup(struct vhost_user_socket *vsocket) +{ + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); + + if (vsocket->is_server) { + fdset_del(&vhost_user.fdset, af_vsocket->socket_fd); + close(af_vsocket->socket_fd); + unlink(vsocket->path); + } else if (vsocket->reconnect) { + vhost_user_remove_reconnect(vsocket); + } +} + +static int af_unix_vring_call(struct virtio_net *dev 
__rte_unused, struct vhost_virtqueue *vq) { @@ -506,5 +539,7 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused, const struct vhost_transport_ops af_unix_trans_ops = { .socket_size = sizeof(struct af_unix_socket), + .socket_init = af_unix_socket_init, + .socket_cleanup = af_unix_socket_cleanup, .vring_call = af_unix_vring_call, }; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 9615392..40b5c25 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -14,7 +14,6 @@ #include #include #include -#include /* TODO remove when trans_af_unix.c refactoring is done */ #include #include @@ -291,6 +290,7 @@ struct guest_page { }; struct virtio_net; +struct vhost_user_socket; /** * A structure containing function pointers for transport-specific operations. @@ -300,6 +300,30 @@ struct vhost_transport_ops { size_t socket_size; /** + * Initialize a vhost-user socket that is being created by + * rte_vhost_driver_register(). This function checks that the flags + * are valid but does not establish a vhost-user connection. + * + * @param vsocket + * new socket + * @param flags + * flags argument from rte_vhost_driver_register() + * @return + * 0 on success, -1 on failure + */ + int (*socket_init)(struct vhost_user_socket *vsocket, uint64_t flags); + + /** + * Free resources associated with a socket, including any established + * connections. This function calls vhost_destroy_device() to destroy + * established connections for this socket. + * + * @param vsocket + * vhost socket + */ + void (*socket_cleanup)(struct vhost_user_socket *vsocket); + + /** * Notify the guest that used descriptors have been added to the vring. * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked * so this function just needs to perform the notification. 
@@ -387,8 +411,6 @@ struct vhost_user_socket { struct vhost_user_connection_list conn_list; pthread_mutex_t conn_mutex; char *path; - int socket_fd; - struct sockaddr_un un; bool is_server; bool reconnect; bool dequeue_zero_copy; @@ -436,14 +458,12 @@ struct vhost_user { extern struct vhost_user vhost_user; -int create_unix_socket(struct vhost_user_socket *vsocket); int vhost_user_start_server(struct vhost_user_socket *vsocket); int vhost_user_start_client(struct vhost_user_socket *vsocket); extern pthread_t reconn_tid; int vhost_user_reconnect_init(void); -bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket); static __rte_always_inline bool vq_is_packed(struct virtio_net *dev) From patchwork Wed Jun 19 15:14:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54959 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 922201C3A9; Wed, 19 Jun 2019 17:15:50 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 3FB581C386 for ; Wed, 19 Jun 2019 17:15:38 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 0D37B182009; Wed, 19 Jun 2019 18:15:38 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 8A62C2B2; Wed, 19 Jun 2019 18:15:37 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:30 +0300 Message-Id: <1560957293-17294-6-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: 
<1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 05/28] vhost: move start server/client calls X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Stefan Hajnoczi Introduce a vhost_transport_ops->socket_start() interface so the transport can begin establishing vhost-user connections. This is part of the AF_UNIX transport refactoring and removes AF_UNIX code from vhost.h and socket.c. Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 5 +---- lib/librte_vhost/trans_af_unix.c | 16 ++++++++++++++-- lib/librte_vhost/vhost.h | 16 +++++++++++++--- 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 3b5608c..df6d707 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -564,8 +564,5 @@ rte_vhost_driver_start(const char *path) } } - if (vsocket->is_server) - return vhost_user_start_server(vsocket); - else - return vhost_user_start_client(vsocket); + return vsocket->trans_ops->socket_start(vsocket); } diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index f23bb9c..93d11f7 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -22,6 +22,8 @@ struct af_unix_socket { }; static int create_unix_socket(struct vhost_user_socket *vsocket); +static int vhost_user_start_server(struct vhost_user_socket *vsocket); +static int vhost_user_start_client(struct vhost_user_socket *vsocket); static void vhost_user_read_cb(int connfd, void *dat, int *remove); /* @@ -280,7 +282,7 @@ create_unix_socket(struct vhost_user_socket *vsocket) return 0; } -int +static int vhost_user_start_server(struct vhost_user_socket *vsocket) { struct af_unix_socket *af_vsocket = @@ -438,7 +440,7 @@ vhost_user_reconnect_init(void) return 
ret; } -int +static int vhost_user_start_client(struct vhost_user_socket *vsocket) { struct af_unix_socket *af_vsocket = @@ -529,6 +531,15 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) } static int +af_unix_socket_start(struct vhost_user_socket *vsocket) +{ + if (vsocket->is_server) + return vhost_user_start_server(vsocket); + else + return vhost_user_start_client(vsocket); +} + +static int af_unix_vring_call(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq) { @@ -541,5 +552,6 @@ const struct vhost_transport_ops af_unix_trans_ops = { .socket_size = sizeof(struct af_unix_socket), .socket_init = af_unix_socket_init, .socket_cleanup = af_unix_socket_cleanup, + .socket_start = af_unix_socket_start, .vring_call = af_unix_vring_call, }; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 40b5c25..c74753b 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -324,6 +324,19 @@ struct vhost_transport_ops { void (*socket_cleanup)(struct vhost_user_socket *vsocket); /** + * Start establishing vhost-user connections. This function is + * asynchronous and connections may be established after it has + * returned. Call vhost_user_add_connection() to register new + * connections. + * + * @param vsocket + * vhost socket + * @return + * 0 on success, -1 on failure + */ + int (*socket_start)(struct vhost_user_socket *vsocket); + + /** * Notify the guest that used descriptors have been added to the vring. * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked * so this function just needs to perform the notification. 
@@ -458,9 +471,6 @@ struct vhost_user { extern struct vhost_user vhost_user; -int vhost_user_start_server(struct vhost_user_socket *vsocket); -int vhost_user_start_client(struct vhost_user_socket *vsocket); - extern pthread_t reconn_tid; int vhost_user_reconnect_init(void); From patchwork Wed Jun 19 15:14:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54960 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 705251C3B3; Wed, 19 Jun 2019 17:15:53 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id AF8E21C388 for ; Wed, 19 Jun 2019 17:15:38 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 80BB418200A; Wed, 19 Jun 2019 18:15:38 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id D0C29394; Wed, 19 Jun 2019 18:15:37 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:31 +0300 Message-Id: <1560957293-17294-7-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 06/28] vhost: move vhost-user connection X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The AF_UNIX transport can accept multiple client connections on a server 
socket. Each connection instantiates a separate vhost-user device, which is stored as a vhost_user_connection. This behavior is specific to AF_UNIX and other transports may not support N connections per socket endpoint. Move struct vhost_user_connection to trans_af_unix.c and conn_list/conn_mutex into struct af_unix_socket. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 54 +++--------------------------- lib/librte_vhost/trans_af_unix.c | 72 ++++++++++++++++++++++++++++++++++++---- lib/librte_vhost/vhost.h | 19 ++--------- 3 files changed, 74 insertions(+), 71 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index df6d707..976343c 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -341,13 +341,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vhost_user_socket_mem_free(vsocket); goto out; } - TAILQ_INIT(&vsocket->conn_list); - ret = pthread_mutex_init(&vsocket->conn_mutex, NULL); - if (ret) { - RTE_LOG(ERR, VHOST_CONFIG, - "error: failed to init connection mutex\n"); - goto out_free; - } vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; /* @@ -395,7 +388,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) RTE_LOG(ERR, VHOST_CONFIG, "Postcopy requested but not compiled\n"); ret = -1; - goto out_mutex; + goto out_free; #endif } @@ -403,14 +396,14 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT); if (vsocket->reconnect && reconn_tid == 0) { if (vhost_user_reconnect_init() != 0) - goto out_mutex; + goto out_free; } } else { vsocket->is_server = true; } ret = trans_ops->socket_init(vsocket, flags); if (ret < 0) { - goto out_mutex; + goto out_free; } vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket; @@ -418,11 +411,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags) pthread_mutex_unlock(&vhost_user.mutex); return ret; -out_mutex: - 
if (pthread_mutex_destroy(&vsocket->conn_mutex)) { - RTE_LOG(ERR, VHOST_CONFIG, - "error: failed to destroy connection mutex\n"); - } out_free: vhost_user_socket_mem_free(vsocket); out: @@ -439,51 +427,19 @@ rte_vhost_driver_unregister(const char *path) { int i; int count; - struct vhost_user_connection *conn, *next; if (path == NULL) return -1; -again: pthread_mutex_lock(&vhost_user.mutex); for (i = 0; i < vhost_user.vsocket_cnt; i++) { struct vhost_user_socket *vsocket = vhost_user.vsockets[i]; if (!strcmp(vsocket->path, path)) { - pthread_mutex_lock(&vsocket->conn_mutex); - for (conn = TAILQ_FIRST(&vsocket->conn_list); - conn != NULL; - conn = next) { - next = TAILQ_NEXT(conn, next); - - /* - * If r/wcb is executing, release the - * conn_mutex lock, and try again since - * the r/wcb may use the conn_mutex lock. - */ - if (fdset_try_del(&vhost_user.fdset, - conn->connfd) == -1) { - pthread_mutex_unlock( - &vsocket->conn_mutex); - pthread_mutex_unlock(&vhost_user.mutex); - goto again; - } - - RTE_LOG(INFO, VHOST_CONFIG, - "free connfd = %d for device '%s'\n", - conn->connfd, path); - close(conn->connfd); - vhost_destroy_device(conn->vid); - TAILQ_REMOVE(&vsocket->conn_list, conn, next); - free(conn); - } - pthread_mutex_unlock(&vsocket->conn_mutex); - vsocket->trans_ops->socket_cleanup(vsocket); - - pthread_mutex_destroy(&vsocket->conn_mutex); - vhost_user_socket_mem_free(vsocket); + free(vsocket->path); + free(vsocket); count = --vhost_user.vsocket_cnt; vhost_user.vsockets[i] = vhost_user.vsockets[count]; diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 93d11f7..58fc9e2 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -15,8 +15,20 @@ #define MAX_VIRTIO_BACKLOG 128 +TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); + +struct vhost_user_connection { + struct vhost_user_socket *vsocket; + int connfd; + int vid; + + TAILQ_ENTRY(vhost_user_connection) next; +}; + struct 
af_unix_socket { struct vhost_user_socket socket; /* must be the first field! */ + struct vhost_user_connection_list conn_list; + pthread_mutex_t conn_mutex; int socket_fd; struct sockaddr_un un; }; @@ -131,6 +143,8 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) static void vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) { + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); int vid; size_t size; struct vhost_user_connection *conn; @@ -188,9 +202,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) goto err_cleanup; } - pthread_mutex_lock(&vsocket->conn_mutex); - TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next); - pthread_mutex_unlock(&vsocket->conn_mutex); + pthread_mutex_lock(&af_vsocket->conn_mutex); + TAILQ_INSERT_TAIL(&af_vsocket->conn_list, conn, next); + pthread_mutex_unlock(&af_vsocket->conn_mutex); fdset_pipe_notify(&vhost_user.fdset); return; @@ -221,6 +235,8 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) { struct vhost_user_connection *conn = dat; struct vhost_user_socket *vsocket = conn->vsocket; + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); int ret; ret = vhost_user_msg_handler(conn->vid, connfd); @@ -238,9 +254,9 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) vhost_destroy_device(conn->vid); - pthread_mutex_lock(&vsocket->conn_mutex); - TAILQ_REMOVE(&vsocket->conn_list, conn, next); - pthread_mutex_unlock(&vsocket->conn_mutex); + pthread_mutex_lock(&af_vsocket->conn_mutex); + TAILQ_REMOVE(&af_vsocket->conn_list, conn, next); + pthread_mutex_unlock(&af_vsocket->conn_mutex); free(conn); @@ -512,6 +528,18 @@ static int af_unix_socket_init(struct vhost_user_socket *vsocket, uint64_t flags __rte_unused) { + struct af_unix_socket *af_vsocket = + container_of(vsocket, struct af_unix_socket, socket); + int ret; + + TAILQ_INIT(&af_vsocket->conn_list); + ret = 
pthread_mutex_init(&af_vsocket->conn_mutex, NULL); + if (ret) { + RTE_LOG(ERR, VHOST_CONFIG, + "error: failed to init connection mutex\n"); + return -1; + } + return create_unix_socket(vsocket); } @@ -520,6 +548,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) { struct af_unix_socket *af_vsocket = container_of(vsocket, struct af_unix_socket, socket); + struct vhost_user_connection *conn, *next; if (vsocket->is_server) { fdset_del(&vhost_user.fdset, af_vsocket->socket_fd); @@ -528,6 +557,37 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) } else if (vsocket->reconnect) { vhost_user_remove_reconnect(vsocket); } + +again: + pthread_mutex_lock(&af_vsocket->conn_mutex); + for (conn = TAILQ_FIRST(&af_vsocket->conn_list); + conn != NULL; + conn = next) { + next = TAILQ_NEXT(conn, next); + + /* + * If r/wcb is executing, release the + * conn_mutex lock, and try again since + * the r/wcb may use the conn_mutex lock. + */ + if (fdset_try_del(&vhost_user.fdset, + conn->connfd) == -1) { + pthread_mutex_unlock( + &af_vsocket->conn_mutex); + goto again; + } + + RTE_LOG(INFO, VHOST_CONFIG, + "free connfd = %d for device '%s'\n", + conn->connfd, vsocket->path); + close(conn->connfd); + vhost_destroy_device(conn->vid); + TAILQ_REMOVE(&af_vsocket->conn_list, conn, next); + free(conn); + } + pthread_mutex_unlock(&af_vsocket->conn_mutex); + + pthread_mutex_destroy(&af_vsocket->conn_mutex); } static int diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index c74753b..5c3987d 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -404,13 +404,10 @@ struct virtio_net { struct rte_vhost_user_extern_ops extern_ops; } __rte_cache_aligned; -/* The vhost_user, vhost_user_socket, vhost_user_connection, and reconnect - * declarations are temporary measures for moving AF_UNIX code into - * trans_af_unix.c. They will be cleaned up as socket.c is untangled from - * trans_af_unix.c. 
+/* The vhost_user, vhost_user_socket, and reconnect declarations are temporary + * measures for moving AF_UNIX code into trans_af_unix.c. They will be cleaned + * up as socket.c is untangled from trans_af_unix.c. */ -TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); - /* * Every time rte_vhost_driver_register() is invoked, an associated * vhost_user_socket struct will be created. @@ -421,8 +418,6 @@ TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); * struct. */ struct vhost_user_socket { - struct vhost_user_connection_list conn_list; - pthread_mutex_t conn_mutex; char *path; bool is_server; bool reconnect; @@ -453,14 +448,6 @@ struct vhost_user_socket { struct vhost_transport_ops const *trans_ops; }; -struct vhost_user_connection { - struct vhost_user_socket *vsocket; - int connfd; - int vid; - - TAILQ_ENTRY(vhost_user_connection) next; -}; - #define MAX_VHOST_SOCKET 1024 struct vhost_user { struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET]; From patchwork Wed Jun 19 15:14:32 2019 X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54961 From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang ,
Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:32 +0300 Message-Id: <1560957293-17294-8-git-send-email-ndragazis@arrikto.com> In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 07/28] vhost: move vhost-user reconnection The socket reconnection code is highly specific to AF_UNIX, so move the remaining pieces of it into trans_af_unix.c. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 4 ---- lib/librte_vhost/trans_af_unix.c | 9 +++++++-- lib/librte_vhost/vhost.h | 10 +++------- 3 files changed, 10 insertions(+), 13 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 976343c..373c01d 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -394,10 +394,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags) if ((flags & RTE_VHOST_USER_CLIENT) != 0) { vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT); - if (vsocket->reconnect && reconn_tid == 0) { - if (vhost_user_reconnect_init() != 0) - goto out_free; - } } else { vsocket->is_server = true; } diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 58fc9e2..00d5366 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -361,7 +361,7 @@ struct vhost_user_reconnect_list { }; static struct vhost_user_reconnect_list reconn_list; -pthread_t reconn_tid; +static pthread_t reconn_tid; static int vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz) @@ -431,7 +431,7 @@ vhost_user_client_reconnect(void *arg __rte_unused)
return NULL; } -int +static int vhost_user_reconnect_init(void) { int ret; @@ -532,6 +532,11 @@ af_unix_socket_init(struct vhost_user_socket *vsocket, container_of(vsocket, struct af_unix_socket, socket); int ret; + if (vsocket->reconnect && reconn_tid == 0) { + if (vhost_user_reconnect_init() != 0) + return -1; + } + TAILQ_INIT(&af_vsocket->conn_list); ret = pthread_mutex_init(&af_vsocket->conn_mutex, NULL); if (ret) { diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5c3987d..d8b5ec2 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -404,9 +404,9 @@ struct virtio_net { struct rte_vhost_user_extern_ops extern_ops; } __rte_cache_aligned; -/* The vhost_user, vhost_user_socket, and reconnect declarations are temporary - * measures for moving AF_UNIX code into trans_af_unix.c. They will be cleaned - * up as socket.c is untangled from trans_af_unix.c. +/* The vhost_user and vhost_user_socket declarations are temporary measures for + * moving AF_UNIX code into trans_af_unix.c. They will be cleaned up as + * socket.c is untangled from trans_af_unix.c. 
*/ /* * Every time rte_vhost_driver_register() is invoked, an associated @@ -458,10 +458,6 @@ struct vhost_user { extern struct vhost_user vhost_user; -extern pthread_t reconn_tid; - -int vhost_user_reconnect_init(void); - static __rte_always_inline bool vq_is_packed(struct virtio_net *dev) { From patchwork Wed Jun 19 15:14:33 2019 X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54963 From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:33 +0300 Message-Id: <1560957293-17294-9-git-send-email-ndragazis@arrikto.com> In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 08/28] vhost: move vhost-user fdset The fdset is used by the AF_UNIX transport code but other
transports may not need it. Move it to trans_af_unix.c and then make struct vhost_user private again since nothing outside socket.c needs it. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/socket.c | 37 +++++++--------------------------- lib/librte_vhost/trans_af_unix.c | 43 +++++++++++++++++++++++++++++++++++----- lib/librte_vhost/vhost.h | 15 -------------- 3 files changed, 45 insertions(+), 50 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 373c01d..fc78b63 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -16,13 +16,14 @@ #include "vhost.h" #include "vhost_user.h" +#define MAX_VHOST_SOCKET 1024 +struct vhost_user { + struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET]; + int vsocket_cnt; + pthread_mutex_t mutex; +}; + struct vhost_user vhost_user = { - .fdset = { - .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} }, - .fd_mutex = PTHREAD_MUTEX_INITIALIZER, - .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER, - .num = 0 - }, .vsocket_cnt = 0, .mutex = PTHREAD_MUTEX_INITIALIZER, }; @@ -484,7 +485,6 @@ int rte_vhost_driver_start(const char *path) { struct vhost_user_socket *vsocket; - static pthread_t fdset_tid; pthread_mutex_lock(&vhost_user.mutex); vsocket = find_vhost_user_socket(path); @@ -493,28 +493,5 @@ rte_vhost_driver_start(const char *path) if (!vsocket) return -1; - if (fdset_tid == 0) { - /** - * create a pipe which will be waited by poll and notified to - * rebuild the wait list of poll. 
- */ - if (fdset_pipe_init(&vhost_user.fdset) < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to create pipe for vhost fdset\n"); - return -1; - } - - int ret = rte_ctrl_thread_create(&fdset_tid, - "vhost-events", NULL, fdset_event_dispatch, - &vhost_user.fdset); - if (ret != 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "failed to create fdset handling thread"); - - fdset_pipe_uninit(&vhost_user.fdset); - return -1; - } - } - return vsocket->trans_ops->socket_start(vsocket); } diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 00d5366..e8a4ef2 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -10,11 +10,19 @@ #include +#include "fd_man.h" #include "vhost.h" #include "vhost_user.h" #define MAX_VIRTIO_BACKLOG 128 +static struct fdset af_unix_fdset = { + .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} }, + .fd_mutex = PTHREAD_MUTEX_INITIALIZER, + .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER, + .num = 0 +}; + TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); struct vhost_user_connection { @@ -189,7 +197,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) conn->connfd = fd; conn->vsocket = vsocket; conn->vid = vid; - ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb, + ret = fdset_add(&af_unix_fdset, fd, vhost_user_read_cb, NULL, conn); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, @@ -206,7 +214,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) TAILQ_INSERT_TAIL(&af_vsocket->conn_list, conn, next); pthread_mutex_unlock(&af_vsocket->conn_mutex); - fdset_pipe_notify(&vhost_user.fdset); + fdset_pipe_notify(&af_unix_fdset); return; err_cleanup: @@ -330,7 +338,7 @@ vhost_user_start_server(struct vhost_user_socket *vsocket) if (ret < 0) goto err; - ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection, + ret = fdset_add(&af_unix_fdset, fd, vhost_user_server_new_connection, NULL, vsocket); if (ret < 0) { RTE_LOG(ERR, 
VHOST_CONFIG, @@ -556,7 +564,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) struct vhost_user_connection *conn, *next; if (vsocket->is_server) { - fdset_del(&vhost_user.fdset, af_vsocket->socket_fd); + fdset_del(&af_unix_fdset, af_vsocket->socket_fd); close(af_vsocket->socket_fd); unlink(vsocket->path); } else if (vsocket->reconnect) { @@ -575,7 +583,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) * conn_mutex lock, and try again since * the r/wcb may use the conn_mutex lock. */ - if (fdset_try_del(&vhost_user.fdset, + if (fdset_try_del(&af_unix_fdset, conn->connfd) == -1) { pthread_mutex_unlock( &af_vsocket->conn_mutex); @@ -598,6 +606,31 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) static int af_unix_socket_start(struct vhost_user_socket *vsocket) { + static pthread_t fdset_tid; + + if (fdset_tid == 0) { + /** + * create a pipe which will be waited by poll and notified to + * rebuild the wait list of poll. + */ + if (fdset_pipe_init(&af_unix_fdset) < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to create pipe for vhost fdset\n"); + return -1; + } + + int ret = rte_ctrl_thread_create(&fdset_tid, + "vhost-events", NULL, fdset_event_dispatch, + &af_unix_fdset); + if (ret != 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to create fdset handling thread"); + + fdset_pipe_uninit(&af_unix_fdset); + return -1; + } + } + if (vsocket->is_server) return vhost_user_start_server(vsocket); else diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index d8b5ec2..64b7f77 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -22,7 +22,6 @@ #include #include -#include "fd_man.h" #include "rte_vhost.h" #include "rte_vdpa.h" @@ -404,10 +403,6 @@ struct virtio_net { struct rte_vhost_user_extern_ops extern_ops; } __rte_cache_aligned; -/* The vhost_user and vhost_user_socket declarations are temporary measures for - * moving AF_UNIX code into trans_af_unix.c. 
They will be cleaned up as - * socket.c is untangled from trans_af_unix.c. - */ /* * Every time rte_vhost_driver_register() is invoked, an associated * vhost_user_socket struct will be created. @@ -448,16 +443,6 @@ struct vhost_user_socket { struct vhost_transport_ops const *trans_ops; }; -#define MAX_VHOST_SOCKET 1024 -struct vhost_user { - struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET]; - struct fdset fdset; - int vsocket_cnt; - pthread_mutex_t mutex; -}; - -extern struct vhost_user vhost_user; - static __rte_always_inline bool vq_is_packed(struct virtio_net *dev) { From patchwork Wed Jun 19 15:14:34 2019 X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54962 From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:34 +0300 Message-Id: <1560957293-17294-10-git-send-email-ndragazis@arrikto.com> In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 09/28] vhost: propagate vhost
transport operations This patch propagates struct vhost_user_socket's vhost_transport_ops into the newly created vhost device. This patch completes the initial refactoring of socket.c, with the AF_UNIX-specific code now in trans_af_unix.c and the librte_vhost API entrypoints in socket.c. Now it is time to turn towards vhost_user.c and its mixture of vhost-user protocol processing and socket I/O. The socket I/O will be moved into trans_af_unix.c so that other transports can be added that don't use file descriptors. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/trans_af_unix.c | 2 +- lib/librte_vhost/vhost.c | 4 ++-- lib/librte_vhost/vhost.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index e8a4ef2..865d862 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -167,7 +167,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) return; } - vid = vhost_new_device(); + vid = vhost_new_device(vsocket->trans_ops); if (vid == -1) { goto err; } diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index a36bc01..a72edf3 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -480,7 +480,7 @@ reset_device(struct virtio_net *dev) * there is a new virtio device being attached).
*/ int -vhost_new_device(void) +vhost_new_device(const struct vhost_transport_ops *trans_ops) { struct virtio_net *dev; int i; @@ -507,7 +507,7 @@ vhost_new_device(void) dev->vid = i; dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET; dev->slave_req_fd = -1; - dev->trans_ops = &af_unix_trans_ops; + dev->trans_ops = trans_ops; dev->vdpa_dev_id = -1; dev->postcopy_ufd = -1; rte_spinlock_init(&dev->slave_req_lock); diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 64b7f77..0831b27 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -568,7 +568,7 @@ get_device(int vid) return dev; } -int vhost_new_device(void); +int vhost_new_device(const struct vhost_transport_ops *trans_ops); void cleanup_device(struct virtio_net *dev, int destroy); void reset_device(struct virtio_net *dev); void vhost_destroy_device(int); From patchwork Wed Jun 19 15:14:35 2019 X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54964 From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:35 +0300 Message-Id:
<1560957293-17294-11-git-send-email-ndragazis@arrikto.com> In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 10/28] vhost: use a single structure for the device state There is a 1:1 relationship between struct virtio_net and struct vhost_user_connection. They share the same lifetime. struct virtio_net is the per-device state that is part of the vhost.h API. struct vhost_user_connection is the AF_UNIX-specific per-device state and is private to trans_af_unix.c. It will be necessary to go between these two structs. This patch embeds struct virtio_net within struct vhost_user_connection so that AF_UNIX transport code can convert a struct virtio_net pointer into a struct vhost_user_connection pointer. There is now just a single malloc/free for both of these structs together. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/trans_af_unix.c | 60 +++++++++++++++------------------------- lib/librte_vhost/vhost.c | 12 ++++---- lib/librte_vhost/vhost.h | 11 +++++++- 3 files changed, 40 insertions(+), 43 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 865d862..7e119b4 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -26,9 +26,9 @@ static struct fdset af_unix_fdset = { TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection); struct vhost_user_connection { + struct virtio_net device; /* must be the first field!
*/ struct vhost_user_socket *vsocket; int connfd; - int vid; TAILQ_ENTRY(vhost_user_connection) next; }; @@ -153,7 +153,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) { struct af_unix_socket *af_vsocket = container_of(vsocket, struct af_unix_socket, socket); - int vid; + struct virtio_net *dev; size_t size; struct vhost_user_connection *conn; int ret; @@ -161,42 +161,37 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) if (vsocket == NULL) return; - conn = malloc(sizeof(*conn)); - if (conn == NULL) { - close(fd); + dev = vhost_new_device(vsocket->trans_ops); + if (!dev) { return; } - vid = vhost_new_device(vsocket->trans_ops); - if (vid == -1) { - goto err; - } + conn = container_of(dev, struct vhost_user_connection, device); + conn->connfd = fd; + conn->vsocket = vsocket; size = strnlen(vsocket->path, PATH_MAX); - vhost_set_ifname(vid, vsocket->path, size); + vhost_set_ifname(dev->vid, vsocket->path, size); - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); + vhost_set_builtin_virtio_net(dev->vid, vsocket->use_builtin_virtio_net); - vhost_attach_vdpa_device(vid, vsocket->vdpa_dev_id); + vhost_attach_vdpa_device(dev->vid, vsocket->vdpa_dev_id); if (vsocket->dequeue_zero_copy) - vhost_enable_dequeue_zero_copy(vid); + vhost_enable_dequeue_zero_copy(dev->vid); - RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid); + RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", dev->vid); if (vsocket->notify_ops->new_connection) { - ret = vsocket->notify_ops->new_connection(vid); + ret = vsocket->notify_ops->new_connection(dev->vid); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "failed to add vhost user connection with fd %d\n", fd); - goto err_cleanup; + goto err; } } - conn->connfd = fd; - conn->vsocket = vsocket; - conn->vid = vid; ret = fdset_add(&af_unix_fdset, fd, vhost_user_read_cb, NULL, conn); if (ret < 0) { @@ -205,9 +200,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket 
*vsocket) fd); if (vsocket->notify_ops->destroy_connection) - vsocket->notify_ops->destroy_connection(conn->vid); + vsocket->notify_ops->destroy_connection(dev->vid); - goto err_cleanup; + goto err; } pthread_mutex_lock(&af_vsocket->conn_mutex); @@ -217,11 +212,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) fdset_pipe_notify(&af_unix_fdset); return; -err_cleanup: - vhost_destroy_device(vid); err: - free(conn); - close(fd); + close(conn->connfd); + vhost_destroy_device(dev->vid); } /* call back when there is new vhost-user connection from client */ @@ -247,26 +240,19 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) container_of(vsocket, struct af_unix_socket, socket); int ret; - ret = vhost_user_msg_handler(conn->vid, connfd); + ret = vhost_user_msg_handler(conn->device.vid, connfd); if (ret < 0) { - struct virtio_net *dev = get_device(conn->vid); - close(connfd); *remove = 1; - if (dev) - vhost_destroy_device_notify(dev); - if (vsocket->notify_ops->destroy_connection) - vsocket->notify_ops->destroy_connection(conn->vid); - - vhost_destroy_device(conn->vid); + vsocket->notify_ops->destroy_connection(conn->device.vid); pthread_mutex_lock(&af_vsocket->conn_mutex); TAILQ_REMOVE(&af_vsocket->conn_list, conn, next); pthread_mutex_unlock(&af_vsocket->conn_mutex); - free(conn); + vhost_destroy_device(conn->device.vid); if (vsocket->reconnect) { create_unix_socket(vsocket); @@ -594,9 +580,8 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket) "free connfd = %d for device '%s'\n", conn->connfd, vsocket->path); close(conn->connfd); - vhost_destroy_device(conn->vid); TAILQ_REMOVE(&af_vsocket->conn_list, conn, next); - free(conn); + vhost_destroy_device(conn->device.vid); } pthread_mutex_unlock(&af_vsocket->conn_mutex); @@ -648,6 +633,7 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused, const struct vhost_transport_ops af_unix_trans_ops = { .socket_size = sizeof(struct af_unix_socket), + .device_size = sizeof(struct 
vhost_user_connection), .socket_init = af_unix_socket_init, .socket_cleanup = af_unix_socket_cleanup, .socket_start = af_unix_socket_start, diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index a72edf3..0fdc54f 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -7,6 +7,7 @@ #include #include #include +#include #ifdef RTE_LIBRTE_VHOST_NUMA #include #include @@ -479,7 +480,7 @@ reset_device(struct virtio_net *dev) * Invoked when there is a new vhost-user connection established (when * there is a new virtio device being attached). */ -int +struct virtio_net * vhost_new_device(const struct vhost_transport_ops *trans_ops) { struct virtio_net *dev; @@ -493,14 +494,15 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops) if (i == MAX_VHOST_DEVICE) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to find a free slot for new device.\n"); - return -1; + return NULL; } - dev = rte_zmalloc(NULL, sizeof(struct virtio_net), 0); + assert(trans_ops->device_size >= sizeof(struct virtio_net)); + dev = rte_zmalloc(NULL, trans_ops->device_size, 0); if (dev == NULL) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to allocate memory for new dev.\n"); - return -1; + return NULL; } vhost_devices[i] = dev; @@ -512,7 +514,7 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops) dev->postcopy_ufd = -1; rte_spinlock_init(&dev->slave_req_lock); - return i; + return dev; } void diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 0831b27..b9e4df1 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -298,6 +298,9 @@ struct vhost_transport_ops { /** Size of struct vhost_user_socket-derived per-socket state */ size_t socket_size; + /** Size of struct virtio_net-derived per-device state */ + size_t device_size; + /** * Initialize a vhost-user socket that is being created by * rte_vhost_driver_register(). 
This function checks that the flags @@ -356,6 +359,11 @@ extern const struct vhost_transport_ops af_unix_trans_ops; /** * Device structure contains all configuration information relating * to the device. + * + * Transport-specific per-device state can be kept by embedding this struct at + * the beginning of a transport-specific struct. Set + * vhost_transport_ops->device_size to the size of the transport-specific + * struct. */ struct virtio_net { /* Frontend (QEMU) memory and memory region information */ @@ -568,7 +576,8 @@ get_device(int vid) return dev; } -int vhost_new_device(const struct vhost_transport_ops *trans_ops); +struct virtio_net * +vhost_new_device(const struct vhost_transport_ops *trans_ops); void cleanup_device(struct virtio_net *dev, int destroy); void reset_device(struct virtio_net *dev); void vhost_destroy_device(int); From patchwork Wed Jun 19 15:14:36 2019 X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54965 From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:36 +0300 Message-Id:
<1560957293-17294-12-git-send-email-ndragazis@arrikto.com> In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 11/28] vhost: extract socket I/O into transport The core vhost-user protocol code should not do socket I/O, because the details are transport-specific. Move code to send and receive vhost-user messages into trans_af_unix.c. The connection fd is a transport-specific feature. Therefore, it should and eventually will be removed from the core vhost-user code. That is, it will be removed from the vhost_user_msg_handler() and the message handlers. We keep it for now, because vhost_user_set_mem_table() needs it. In a later commit, we will refactor the map/unmap functionality and after that we will be able to remove the connection fds from the core vhost-user code. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/trans_af_unix.c | 70 +++++++++++++++++++++++++++++++++--- lib/librte_vhost/vhost.h | 26 ++++++++++++++ lib/librte_vhost/vhost_user.c | 78 ++++++++-------------------------------- lib/librte_vhost/vhost_user.h | 7 +--- 4 files changed, 108 insertions(+), 73 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 7e119b4..c0ba8df 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -50,7 +50,7 @@ static void vhost_user_read_cb(int connfd, void *dat, int *remove); * return bytes# of read on success or negative val on failure. Update fdnum * with number of fds read.
*/ -int +static int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, int *fd_num) { @@ -101,8 +101,8 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, return ret; } -int -send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) +static int +send_fd_message(int sockfd, void *buf, int buflen, int *fds, int fd_num) { struct iovec iov; struct msghdr msgh; @@ -148,6 +148,23 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) return ret; } +static int +af_unix_send_reply(struct virtio_net *dev, struct VhostUserMsg *msg) +{ + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + + return send_fd_message(conn->connfd, msg, + VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num); +} + +static int +af_unix_send_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg) +{ + return send_fd_message(dev->slave_req_fd, msg, + VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num); +} + static void vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) { @@ -231,6 +248,36 @@ vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused) vhost_user_add_connection(fd, vsocket); } +/* return bytes# of read on success or negative val on failure. 
*/ +int +read_vhost_message(int sockfd, struct VhostUserMsg *msg) +{ + int ret; + + ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE, + msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num); + if (ret <= 0) + return ret; + + if (msg->size) { + if (msg->size > sizeof(msg->payload)) { + RTE_LOG(ERR, VHOST_CONFIG, + "invalid msg size: %d\n", msg->size); + return -1; + } + ret = read(sockfd, &msg->payload, msg->size); + if (ret <= 0) + return ret; + if (ret != (int)msg->size) { + RTE_LOG(ERR, VHOST_CONFIG, + "read control message failed\n"); + return -1; + } + } + + return ret; +} + static void vhost_user_read_cb(int connfd, void *dat, int *remove) { @@ -238,10 +285,23 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) struct vhost_user_socket *vsocket = conn->vsocket; struct af_unix_socket *af_vsocket = container_of(vsocket, struct af_unix_socket, socket); + struct VhostUserMsg msg; int ret; - ret = vhost_user_msg_handler(conn->device.vid, connfd); + ret = read_vhost_message(connfd, &msg); + if (ret <= 0) { + if (ret < 0) + RTE_LOG(ERR, VHOST_CONFIG, + "vhost read message failed\n"); + else if (ret == 0) + RTE_LOG(INFO, VHOST_CONFIG, + "vhost peer closed\n"); + goto err; + } + + ret = vhost_user_msg_handler(conn->device.vid, connfd, &msg); if (ret < 0) { +err: close(connfd); *remove = 1; @@ -638,4 +698,6 @@ const struct vhost_transport_ops af_unix_trans_ops = { .socket_cleanup = af_unix_socket_cleanup, .socket_start = af_unix_socket_start, .vring_call = af_unix_vring_call, + .send_reply = af_unix_send_reply, + .send_slave_req = af_unix_send_slave_req, }; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index b9e4df1..b20773c 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -290,6 +290,7 @@ struct guest_page { struct virtio_net; struct vhost_user_socket; +struct VhostUserMsg; /** * A structure containing function pointers for transport-specific operations. 
@@ -351,6 +352,31 @@ struct vhost_transport_ops { * 0 on success, -1 on failure */ int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq); + + /** + * Send a reply to the master. + * + * @param dev + * vhost device + * @param reply + * reply message + * @return + * 0 on success, -1 on failure + */ + int (*send_reply)(struct virtio_net *dev, struct VhostUserMsg *reply); + + /** + * Send a slave request to the master. + * + * @param dev + * vhost device + * @param req + * request message + * @return + * 0 on success, -1 on failure + */ + int (*send_slave_req)(struct virtio_net *dev, + struct VhostUserMsg *req); }; /** The traditional AF_UNIX vhost-user protocol transport. */ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index c9e29ec..5c12435 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -80,8 +80,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { [VHOST_USER_POSTCOPY_END] = "VHOST_USER_POSTCOPY_END", }; -static int send_vhost_reply(int sockfd, struct VhostUserMsg *msg); -static int read_vhost_message(int sockfd, struct VhostUserMsg *msg); +static int send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg); +int read_vhost_message(int sockfd, struct VhostUserMsg *msg); static uint64_t get_blk_size(int fd) @@ -1042,7 +1042,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, if (dev->postcopy_listening) { /* Send the addresses back to qemu */ msg->fd_num = 0; - send_vhost_reply(main_fd, msg); + send_vhost_reply(dev, msg); /* Wait for qemu to acknolwedge it's got the addresses * we've got to wait before we're allowed to generate faults. @@ -1764,49 +1764,8 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = { [VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end, }; - -/* return bytes# of read on success or negative val on failure. 
*/ static int -read_vhost_message(int sockfd, struct VhostUserMsg *msg) -{ - int ret; - - ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE, - msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num); - if (ret <= 0) - return ret; - - if (msg->size) { - if (msg->size > sizeof(msg->payload)) { - RTE_LOG(ERR, VHOST_CONFIG, - "invalid msg size: %d\n", msg->size); - return -1; - } - ret = read(sockfd, &msg->payload, msg->size); - if (ret <= 0) - return ret; - if (ret != (int)msg->size) { - RTE_LOG(ERR, VHOST_CONFIG, - "read control message failed\n"); - return -1; - } - } - - return ret; -} - -static int -send_vhost_message(int sockfd, struct VhostUserMsg *msg) -{ - if (!msg) - return 0; - - return send_fd_message(sockfd, (char *)msg, - VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num); -} - -static int -send_vhost_reply(int sockfd, struct VhostUserMsg *msg) +send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg) { if (!msg) return 0; @@ -1816,7 +1775,7 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg) msg->flags |= VHOST_USER_VERSION; msg->flags |= VHOST_USER_REPLY_MASK; - return send_vhost_message(sockfd, msg); + return dev->trans_ops->send_reply(dev, msg); } static int @@ -1827,7 +1786,7 @@ send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg) if (msg->flags & VHOST_USER_NEED_REPLY) rte_spinlock_lock(&dev->slave_req_lock); - ret = send_vhost_message(dev->slave_req_fd, msg); + ret = dev->trans_ops->send_slave_req(dev, msg); if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY)) rte_spinlock_unlock(&dev->slave_req_lock); @@ -1908,10 +1867,10 @@ vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) } int -vhost_user_msg_handler(int vid, int fd) +vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_) { + struct VhostUserMsg msg = *msg_; /* copy so we can build the reply */ struct virtio_net *dev; - struct VhostUserMsg msg; struct rte_vdpa_device *vdpa_dev; int did = -1; int ret; @@ -1933,15 
+1892,8 @@ vhost_user_msg_handler(int vid, int fd) } } - ret = read_vhost_message(fd, &msg); - if (ret <= 0) { - if (ret < 0) - RTE_LOG(ERR, VHOST_CONFIG, - "vhost read message failed\n"); - else - RTE_LOG(INFO, VHOST_CONFIG, - "vhost peer closed\n"); - + if (msg.request.master >= VHOST_USER_MAX) { + RTE_LOG(ERR, VHOST_CONFIG, "vhost read incorrect message\n"); return -1; } @@ -2004,7 +1956,7 @@ vhost_user_msg_handler(int vid, int fd) (void *)&msg); switch (ret) { case RTE_VHOST_MSG_RESULT_REPLY: - send_vhost_reply(fd, &msg); + send_vhost_reply(dev, &msg); /* Fall-through */ case RTE_VHOST_MSG_RESULT_ERR: case RTE_VHOST_MSG_RESULT_OK: @@ -2038,7 +1990,7 @@ vhost_user_msg_handler(int vid, int fd) RTE_LOG(DEBUG, VHOST_CONFIG, "Processing %s succeeded and needs reply.\n", vhost_message_str[request]); - send_vhost_reply(fd, &msg); + send_vhost_reply(dev, &msg); handled = true; break; default: @@ -2053,7 +2005,7 @@ vhost_user_msg_handler(int vid, int fd) (void *)&msg); switch (ret) { case RTE_VHOST_MSG_RESULT_REPLY: - send_vhost_reply(fd, &msg); + send_vhost_reply(dev, &msg); /* Fall-through */ case RTE_VHOST_MSG_RESULT_ERR: case RTE_VHOST_MSG_RESULT_OK: @@ -2083,7 +2035,7 @@ vhost_user_msg_handler(int vid, int fd) msg.payload.u64 = ret == RTE_VHOST_MSG_RESULT_ERR; msg.size = sizeof(msg.payload.u64); msg.fd_num = 0; - send_vhost_reply(fd, &msg); + send_vhost_reply(dev, &msg); } else if (ret == RTE_VHOST_MSG_RESULT_ERR) { RTE_LOG(ERR, VHOST_CONFIG, "vhost message handling failed.\n"); @@ -2161,7 +2113,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) }, }; - ret = send_vhost_message(dev->slave_req_fd, &msg); + ret = send_vhost_slave_req(dev, &msg); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to send IOTLB miss message (%d)\n", diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index 2a650fe..0169bd2 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -146,12 +146,7 @@ typedef struct 
VhostUserMsg { /* vhost_user.c */ -int vhost_user_msg_handler(int vid, int fd); +int vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg); int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm); -/* socket.c */ -int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, - int *fd_num); -int send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num); - #endif

From patchwork Wed Jun 19 15:14:37 2019
X-Patchwork-Id: 54966
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang, Stojaczyk Dariusz, Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:37 +0300
Message-Id: <1560957293-17294-13-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 12/28] vhost: move slave request fd and lock

The slave request file descriptor is specific to the AF_UNIX transport. Move this field, along with its spinlock, out of struct virtio_net and into struct vhost_user_connection, which is private to trans_af_unix.c. This also requires moving the associated functions send_vhost_slave_message() and process_slave_message_reply() out of vhost_user.c and into trans_af_unix.c, and moving the spinlock initialization out of vhost_new_device() and into trans_af_unix.c. This change will allow future transports to implement the slave request fd without relying on socket I/O.

Signed-off-by: Nikos Dragazis
Signed-off-by: Stefan Hajnoczi
--- lib/librte_vhost/trans_af_unix.c | 87 +++++++++++++++++++++++++++++++++++++++- lib/librte_vhost/vhost.c | 4 +- lib/librte_vhost/vhost.h | 41 +++++++++++++++++-- lib/librte_vhost/vhost_user.c | 67 ++++--------------------------- 4 files changed, 132 insertions(+), 67 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index c0ba8df..5f9ef5a 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -29,6 +29,8 @@ struct vhost_user_connection { struct virtio_net device; /* must be the first field!
*/ struct vhost_user_socket *vsocket; int connfd; + int slave_req_fd; + rte_spinlock_t slave_req_lock; TAILQ_ENTRY(vhost_user_connection) next; }; @@ -41,6 +43,7 @@ struct af_unix_socket { struct sockaddr_un un; }; +int read_vhost_message(int sockfd, struct VhostUserMsg *msg); static int create_unix_socket(struct vhost_user_socket *vsocket); static int vhost_user_start_server(struct vhost_user_socket *vsocket); static int vhost_user_start_client(struct vhost_user_socket *vsocket); @@ -161,8 +164,71 @@ af_unix_send_reply(struct virtio_net *dev, struct VhostUserMsg *msg) static int af_unix_send_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg) { - return send_fd_message(dev->slave_req_fd, msg, - VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num); + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + int ret; + + if (msg->flags & VHOST_USER_NEED_REPLY) + rte_spinlock_lock(&conn->slave_req_lock); + + ret = send_fd_message(conn->slave_req_fd, msg, + VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num); + + if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY)) + rte_spinlock_unlock(&conn->slave_req_lock); + + return ret; +} + +static int +af_unix_process_slave_message_reply(struct virtio_net *dev, + const struct VhostUserMsg *msg) +{ + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + struct VhostUserMsg msg_reply; + int ret; + + if ((msg->flags & VHOST_USER_NEED_REPLY) == 0) + return 0; + + if (read_vhost_message(conn->slave_req_fd, &msg_reply) < 0) { + ret = -1; + goto out; + } + + if (msg_reply.request.slave != msg->request.slave) { + RTE_LOG(ERR, VHOST_CONFIG, + "Received unexpected msg type (%u), expected %u\n", + msg_reply.request.slave, msg->request.slave); + ret = -1; + goto out; + } + + ret = msg_reply.payload.u64 ? 
-1 : 0; + +out: + rte_spinlock_unlock(&conn->slave_req_lock); + return ret; +} + +static int +af_unix_set_slave_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg) +{ + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + int fd = msg->fds[0]; + + if (fd < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Invalid file descriptor for slave channel (%d)\n", + fd); + return -1; + } + + conn->slave_req_fd = fd; + + return 0; } static void @@ -185,7 +251,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) conn = container_of(dev, struct vhost_user_connection, device); conn->connfd = fd; + conn->slave_req_fd = -1; conn->vsocket = vsocket; + rte_spinlock_init(&conn->slave_req_lock); size = strnlen(vsocket->path, PATH_MAX); vhost_set_ifname(dev->vid, vsocket->path, size); @@ -682,6 +750,18 @@ af_unix_socket_start(struct vhost_user_socket *vsocket) return vhost_user_start_client(vsocket); } +static void +af_unix_cleanup_device(struct virtio_net *dev, int destroy __rte_unused) +{ + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + + if (conn->slave_req_fd >= 0) { + close(conn->slave_req_fd); + conn->slave_req_fd = -1; + } +} + static int af_unix_vring_call(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq) @@ -697,7 +777,10 @@ const struct vhost_transport_ops af_unix_trans_ops = { .socket_init = af_unix_socket_init, .socket_cleanup = af_unix_socket_cleanup, .socket_start = af_unix_socket_start, + .cleanup_device = af_unix_cleanup_device, .vring_call = af_unix_vring_call, .send_reply = af_unix_send_reply, .send_slave_req = af_unix_send_slave_req, + .process_slave_message_reply = af_unix_process_slave_message_reply, + .set_slave_req_fd = af_unix_set_slave_req_fd, }; diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 0fdc54f..5b16390 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -256,6 +256,8 @@ 
cleanup_device(struct virtio_net *dev, int destroy) for (i = 0; i < dev->nr_vring; i++) cleanup_vq(dev->virtqueue[i], destroy); + + dev->trans_ops->cleanup_device(dev, destroy); } void @@ -508,11 +510,9 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops) vhost_devices[i] = dev; dev->vid = i; dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET; - dev->slave_req_fd = -1; dev->trans_ops = trans_ops; dev->vdpa_dev_id = -1; dev->postcopy_ufd = -1; - rte_spinlock_init(&dev->slave_req_lock); return dev; } diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index b20773c..2213fbe 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -340,6 +340,16 @@ struct vhost_transport_ops { int (*socket_start)(struct vhost_user_socket *vsocket); /** + * Free resources associated with this device. + * + * @param dev + * vhost device + * @param destroy + * 0 on device reset, 1 on full cleanup. + */ + void (*cleanup_device)(struct virtio_net *dev, int destroy); + + /** * Notify the guest that used descriptors have been added to the vring. * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked * so this function just needs to perform the notification. @@ -377,6 +387,34 @@ struct vhost_transport_ops { */ int (*send_slave_req)(struct virtio_net *dev, struct VhostUserMsg *req); + + /** + * Process the master's reply on a slave request. + * + * @param dev + * vhost device + * @param msg + * slave request message + * @return + * 0 on success, -1 on failure + */ + int (*process_slave_message_reply)(struct virtio_net *dev, + const struct VhostUserMsg *msg); + + /** + * Process VHOST_USER_SET_SLAVE_REQ_FD message. After this function + * succeeds send_slave_req() may be called to submit requests to the + * master. 
+ * + * @param dev + * vhost device + * @param msg + * message + * @return + * 0 on success, -1 on failure + */ + int (*set_slave_req_fd)(struct virtio_net *dev, + struct VhostUserMsg *msg); }; /** The traditional AF_UNIX vhost-user protocol transport. */ @@ -419,9 +457,6 @@ struct virtio_net { uint32_t max_guest_pages; struct guest_page *guest_pages; - int slave_req_fd; - rte_spinlock_t slave_req_lock; - int postcopy_ufd; int postcopy_listening; diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 5c12435..a4dcba1 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -160,11 +160,6 @@ vhost_backend_cleanup(struct virtio_net *dev) dev->log_addr = 0; } - if (dev->slave_req_fd >= 0) { - close(dev->slave_req_fd); - dev->slave_req_fd = -1; - } - if (dev->postcopy_ufd >= 0) { close(dev->postcopy_ufd); dev->postcopy_ufd = -1; @@ -1549,17 +1544,13 @@ static int vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg, int main_fd __rte_unused) { + int ret; struct virtio_net *dev = *pdev; - int fd = msg->fds[0]; - if (fd < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "Invalid file descriptor for slave channel (%d)\n", - fd); - return RTE_VHOST_MSG_RESULT_ERR; - } + ret = dev->trans_ops->set_slave_req_fd(dev, msg); - dev->slave_req_fd = fd; + if (ret < 0) + return RTE_VHOST_MSG_RESULT_ERR; return RTE_VHOST_MSG_RESULT_OK; } @@ -1778,21 +1769,6 @@ send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg) return dev->trans_ops->send_reply(dev, msg); } -static int -send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg) -{ - int ret; - - if (msg->flags & VHOST_USER_NEED_REPLY) - rte_spinlock_lock(&dev->slave_req_lock); - - ret = dev->trans_ops->send_slave_req(dev, msg); - if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY)) - rte_spinlock_unlock(&dev->slave_req_lock); - - return ret; -} - /* * Allocate a queue pair if it hasn't been allocated yet */ @@ -2069,35 +2045,6 @@ 
vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_) return 0; } -static int process_slave_message_reply(struct virtio_net *dev, - const struct VhostUserMsg *msg) -{ - struct VhostUserMsg msg_reply; - int ret; - - if ((msg->flags & VHOST_USER_NEED_REPLY) == 0) - return 0; - - if (read_vhost_message(dev->slave_req_fd, &msg_reply) < 0) { - ret = -1; - goto out; - } - - if (msg_reply.request.slave != msg->request.slave) { - RTE_LOG(ERR, VHOST_CONFIG, - "Received unexpected msg type (%u), expected %u\n", - msg_reply.request.slave, msg->request.slave); - ret = -1; - goto out; - } - - ret = msg_reply.payload.u64 ? -1 : 0; - -out: - rte_spinlock_unlock(&dev->slave_req_lock); - return ret; -} - int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) { @@ -2113,7 +2060,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) }, }; - ret = send_vhost_slave_req(dev, &msg); + ret = dev->trans_ops->send_slave_req(dev, &msg); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to send IOTLB miss message (%d)\n", @@ -2148,14 +2095,14 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev, msg.fd_num = 1; } - ret = send_vhost_slave_message(dev, &msg); + ret = dev->trans_ops->send_slave_req(dev, &msg); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to set host notifier (%d)\n", ret); return ret; } - return process_slave_message_reply(dev, &msg); + return dev->trans_ops->process_slave_message_reply(dev, &msg); } int rte_vhost_host_notifier_ctrl(int vid, bool enable)

From patchwork Wed Jun 19 15:14:38 2019
X-Patchwork-Id: 54967
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang, Stojaczyk Dariusz, Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:38 +0300
Message-Id: <1560957293-17294-14-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 13/28] vhost: move mmap/munmap

Mapping the vhost memory regions is a transport-specific operation, so this patch moves the relevant code into trans_af_unix.c. The new .map_mem_regions()/.unmap_mem_regions() interfaces allow transports to perform the mapping and unmapping. In addition, the function vhost_user_set_mem_table(), which performs the mmapping, contains some code for postcopy live migration. However, postcopy live migration is an AF_UNIX-bound feature, due to the userfaultfd mechanism. The virtio-vhost-user transport, which will be added in later patches, cannot support it. Therefore, we move this code into trans_af_unix.c as well. The vhost_user_set_mem_table() debug logs have also been moved into .map_mem_regions(). Every new .map_mem_regions() implementation has to emit these debug logs.
This is necessary in order to keep the ordering of the log messages in case of postcopy live migration. Last but not least, after refactoring vhost_user_set_mem_table(), read_vhost_message() is no longer being used in vhost_user.c. So, mark it as static in trans_af_unix.c. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/trans_af_unix.c | 185 ++++++++++++++++++++++++++++++++++++++- lib/librte_vhost/vhost.h | 22 +++++ lib/librte_vhost/vhost_user.c | 171 ++++-------------------------------- lib/librte_vhost/vhost_user.h | 3 + 4 files changed, 225 insertions(+), 156 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 5f9ef5a..522823f 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -5,7 +5,14 @@ */ #include +#include +#include #include +#include +#include +#ifdef RTE_LIBRTE_VHOST_POSTCOPY +#include +#endif #include #include @@ -43,7 +50,7 @@ struct af_unix_socket { struct sockaddr_un un; }; -int read_vhost_message(int sockfd, struct VhostUserMsg *msg); +static int read_vhost_message(int sockfd, struct VhostUserMsg *msg); static int create_unix_socket(struct vhost_user_socket *vsocket); static int vhost_user_start_server(struct vhost_user_socket *vsocket); static int vhost_user_start_client(struct vhost_user_socket *vsocket); @@ -317,7 +324,7 @@ vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused) } /* return bytes# of read on success or negative val on failure. */ -int +static int read_vhost_message(int sockfd, struct VhostUserMsg *msg) { int ret; @@ -771,6 +778,178 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused, return 0; } +static uint64_t +get_blk_size(int fd) +{ + struct stat stat; + int ret; + + ret = fstat(fd, &stat); + return ret == -1 ? 
(uint64_t)-1 : (uint64_t)stat.st_blksize; +} + +static int +af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg) +{ + uint32_t i; + struct VhostUserMemory *memory = &msg->payload.memory; + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + + for (i = 0; i < dev->mem->nregions; i++) { + struct rte_vhost_mem_region *reg = &dev->mem->regions[i]; + uint64_t mmap_size = reg->mmap_size; + uint64_t mmap_offset = mmap_size - reg->size; + uint64_t alignment; + void *mmap_addr; + int populate; + + /* mmap() without flag of MAP_ANONYMOUS, should be called + * with length argument aligned with hugepagesz at older + * longterm version Linux, like 2.6.32 and 3.2.72, or + * mmap() will fail with EINVAL. + * + * to avoid failure, make sure in caller to keep length + * aligned. + */ + alignment = get_blk_size(reg->fd); + if (alignment == (uint64_t)-1) { + RTE_LOG(ERR, VHOST_CONFIG, + "couldn't get hugepage size through fstat\n"); + return -1; + } + mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment); + + populate = (dev->dequeue_zero_copy) ? 
MAP_POPULATE : 0; + mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, + MAP_SHARED | populate, reg->fd, 0); + + if (mmap_addr == MAP_FAILED) { + RTE_LOG(ERR, VHOST_CONFIG, + "mmap region %u failed.\n", i); + return -1; + } + + reg->mmap_addr = mmap_addr; + reg->mmap_size = mmap_size; + reg->host_user_addr = (uint64_t)(uintptr_t)reg->mmap_addr + + mmap_offset; + + if (dev->dequeue_zero_copy) + if (add_guest_pages(dev, reg, alignment) < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "adding guest pages to region %u failed.\n", + i); + return -1; + } + + RTE_LOG(INFO, VHOST_CONFIG, + "guest memory region %u, size: 0x%" PRIx64 "\n" + "\t guest physical addr: 0x%" PRIx64 "\n" + "\t guest virtual addr: 0x%" PRIx64 "\n" + "\t host virtual addr: 0x%" PRIx64 "\n" + "\t mmap addr : 0x%" PRIx64 "\n" + "\t mmap size : 0x%" PRIx64 "\n" + "\t mmap align: 0x%" PRIx64 "\n" + "\t mmap off : 0x%" PRIx64 "\n", + i, reg->size, + reg->guest_phys_addr, + reg->guest_user_addr, + reg->host_user_addr, + (uint64_t)(uintptr_t)reg->mmap_addr, + reg->mmap_size, + alignment, + mmap_offset); + + if (dev->postcopy_listening) { + /* + * We haven't a better way right now than sharing + * DPDK's virtual address with Qemu, so that Qemu can + * retrieve the region offset when handling userfaults. + */ + memory->regions[i].userspace_addr = + reg->host_user_addr; + } + } + + if (dev->postcopy_listening) { + /* Send the addresses back to qemu */ + msg->fd_num = 0; + /* Send reply */ + msg->flags &= ~VHOST_USER_VERSION_MASK; + msg->flags &= ~VHOST_USER_NEED_REPLY; + msg->flags |= VHOST_USER_VERSION; + msg->flags |= VHOST_USER_REPLY_MASK; + af_unix_send_reply(dev, msg); + + /* Wait for qemu to acknolwedge it's got the addresses + * we've got to wait before we're allowed to generate faults. 
+ */ + VhostUserMsg ack_msg; + if (read_vhost_message(conn->connfd, &ack_msg) <= 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to read qemu ack on postcopy set-mem-table\n"); + return -1; + } + if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) { + RTE_LOG(ERR, VHOST_CONFIG, + "Bad qemu ack on postcopy set-mem-table (%d)\n", + ack_msg.request.master); + return -1; + } + + /* Now userfault register and we can use the memory */ + for (i = 0; i < memory->nregions; i++) { +#ifdef RTE_LIBRTE_VHOST_POSTCOPY + struct rte_vhost_mem_region *reg = &dev->mem->regions[i]; + struct uffdio_register reg_struct; + + /* + * Let's register all the mmap'ed area to ensure + * alignment on page boundary. + */ + reg_struct.range.start = + (uint64_t)(uintptr_t)reg->mmap_addr; + reg_struct.range.len = reg->mmap_size; + reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING; + + if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, + ®_struct)) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to register ufd for region %d: (ufd = %d) %s\n", + i, dev->postcopy_ufd, + strerror(errno)); + return -1; + } + RTE_LOG(INFO, VHOST_CONFIG, + "\t userfaultfd registered for range : %llx - %llx\n", + reg_struct.range.start, + reg_struct.range.start + + reg_struct.range.len - 1); +#else + return -1; +#endif + } + } + + return 0; +} + +static void +af_unix_unmap_mem_regions(struct virtio_net *dev) +{ + uint32_t i; + struct rte_vhost_mem_region *reg; + + for (i = 0; i < dev->mem->nregions; i++) { + reg = &dev->mem->regions[i]; + if (reg->host_user_addr) { + munmap(reg->mmap_addr, reg->mmap_size); + close(reg->fd); + } + } +} + const struct vhost_transport_ops af_unix_trans_ops = { .socket_size = sizeof(struct af_unix_socket), .device_size = sizeof(struct vhost_user_connection), @@ -783,4 +962,6 @@ const struct vhost_transport_ops af_unix_trans_ops = { .send_slave_req = af_unix_send_slave_req, .process_slave_message_reply = af_unix_process_slave_message_reply, .set_slave_req_fd = af_unix_set_slave_req_fd, + .map_mem_regions = 
af_unix_map_mem_regions, + .unmap_mem_regions = af_unix_unmap_mem_regions, }; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 2213fbe..28038c6 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -415,6 +415,28 @@ struct vhost_transport_ops { */ int (*set_slave_req_fd)(struct virtio_net *dev, struct VhostUserMsg *msg); + + /** + * Map memory table regions in dev->mem->regions[]. + * + * @param dev + * vhost device + * @param msg + * message + * @return + * 0 on success, -1 on failure + */ + int (*map_mem_regions)(struct virtio_net *dev, + struct VhostUserMsg *msg); + + /** + * Unmap memory table regions in dev->mem->regions[] and free any + * resources, such as file descriptors. + * + * @param dev + * vhost device + */ + void (*unmap_mem_regions)(struct virtio_net *dev); }; /** The traditional AF_UNIX vhost-user protocol transport. */ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index a4dcba1..ed8dbd8 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -81,17 +81,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { }; static int send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg); -int read_vhost_message(int sockfd, struct VhostUserMsg *msg); - -static uint64_t -get_blk_size(int fd) -{ - struct stat stat; - int ret; - - ret = fstat(fd, &stat); - return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize; -} /* * Reclaim all the outstanding zmbufs for a virtqueue. 
@@ -120,7 +109,6 @@ static void free_mem_region(struct virtio_net *dev) { uint32_t i; - struct rte_vhost_mem_region *reg; struct vhost_virtqueue *vq; if (!dev || !dev->mem) @@ -134,13 +122,7 @@ free_mem_region(struct virtio_net *dev) } } - for (i = 0; i < dev->mem->nregions; i++) { - reg = &dev->mem->regions[i]; - if (reg->host_user_addr) { - munmap(reg->mmap_addr, reg->mmap_size); - close(reg->fd); - } - } + dev->trans_ops->unmap_mem_regions(dev); } void @@ -792,7 +774,7 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr, return 0; } -static int +int add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg, uint64_t page_size) { @@ -881,18 +863,13 @@ vhost_memory_changed(struct VhostUserMemory *new, static int vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd) + int main_fd __rte_unused) { struct virtio_net *dev = *pdev; struct VhostUserMemory *memory = &msg->payload.memory; struct rte_vhost_mem_region *reg; - void *mmap_addr; - uint64_t mmap_size; uint64_t mmap_offset; - uint64_t alignment; uint32_t i; - int populate; - int fd; if (memory->nregions > VHOST_MEMORY_MAX_NREGIONS) { RTE_LOG(ERR, VHOST_CONFIG, @@ -904,8 +881,11 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, RTE_LOG(INFO, VHOST_CONFIG, "(%d) memory regions not changed\n", dev->vid); - for (i = 0; i < memory->nregions; i++) - close(msg->fds[i]); + for (i = 0; i < memory->nregions; i++) { + if (msg->fds[i] >= 0) { + close(msg->fds[i]); + } + } return RTE_VHOST_MSG_RESULT_OK; } @@ -946,13 +926,15 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, dev->mem->nregions = memory->nregions; for (i = 0; i < memory->nregions; i++) { - fd = msg->fds[i]; reg = &dev->mem->regions[i]; reg->guest_phys_addr = memory->regions[i].guest_phys_addr; reg->guest_user_addr = memory->regions[i].userspace_addr; reg->size = memory->regions[i].memory_size; - reg->fd = fd; + reg->mmap_size = 
reg->size + memory->regions[i].mmap_offset; + reg->mmap_addr = NULL; + reg->host_user_addr = 0; + reg->fd = msg->fds[i]; mmap_offset = memory->regions[i].mmap_offset; @@ -962,132 +944,13 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, "mmap_offset (%#"PRIx64") and memory_size " "(%#"PRIx64") overflow\n", mmap_offset, reg->size); - goto err_mmap; - } - - mmap_size = reg->size + mmap_offset; - - /* mmap() without flag of MAP_ANONYMOUS, should be called - * with length argument aligned with hugepagesz at older - * longterm version Linux, like 2.6.32 and 3.2.72, or - * mmap() will fail with EINVAL. - * - * to avoid failure, make sure in caller to keep length - * aligned. - */ - alignment = get_blk_size(fd); - if (alignment == (uint64_t)-1) { - RTE_LOG(ERR, VHOST_CONFIG, - "couldn't get hugepage size through fstat\n"); - goto err_mmap; - } - mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment); - - populate = (dev->dequeue_zero_copy) ? MAP_POPULATE : 0; - mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED | populate, fd, 0); - - if (mmap_addr == MAP_FAILED) { - RTE_LOG(ERR, VHOST_CONFIG, - "mmap region %u failed.\n", i); - goto err_mmap; + goto err; } - reg->mmap_addr = mmap_addr; - reg->mmap_size = mmap_size; - reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + - mmap_offset; - - if (dev->dequeue_zero_copy) - if (add_guest_pages(dev, reg, alignment) < 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "adding guest pages to region %u failed.\n", - i); - goto err_mmap; - } - - RTE_LOG(INFO, VHOST_CONFIG, - "guest memory region %u, size: 0x%" PRIx64 "\n" - "\t guest physical addr: 0x%" PRIx64 "\n" - "\t guest virtual addr: 0x%" PRIx64 "\n" - "\t host virtual addr: 0x%" PRIx64 "\n" - "\t mmap addr : 0x%" PRIx64 "\n" - "\t mmap size : 0x%" PRIx64 "\n" - "\t mmap align: 0x%" PRIx64 "\n" - "\t mmap off : 0x%" PRIx64 "\n", - i, reg->size, - reg->guest_phys_addr, - reg->guest_user_addr, - reg->host_user_addr, - 
(uint64_t)(uintptr_t)mmap_addr, - mmap_size, - alignment, - mmap_offset); - - if (dev->postcopy_listening) { - /* - * We haven't a better way right now than sharing - * DPDK's virtual address with Qemu, so that Qemu can - * retrieve the region offset when handling userfaults. - */ - memory->regions[i].userspace_addr = - reg->host_user_addr; - } } - if (dev->postcopy_listening) { - /* Send the addresses back to qemu */ - msg->fd_num = 0; - send_vhost_reply(dev, msg); - - /* Wait for qemu to acknolwedge it's got the addresses - * we've got to wait before we're allowed to generate faults. - */ - VhostUserMsg ack_msg; - if (read_vhost_message(main_fd, &ack_msg) <= 0) { - RTE_LOG(ERR, VHOST_CONFIG, - "Failed to read qemu ack on postcopy set-mem-table\n"); - goto err_mmap; - } - if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) { - RTE_LOG(ERR, VHOST_CONFIG, - "Bad qemu ack on postcopy set-mem-table (%d)\n", - ack_msg.request.master); - goto err_mmap; - } - - /* Now userfault register and we can use the memory */ - for (i = 0; i < memory->nregions; i++) { -#ifdef RTE_LIBRTE_VHOST_POSTCOPY - reg = &dev->mem->regions[i]; - struct uffdio_register reg_struct; - /* - * Let's register all the mmap'ed area to ensure - * alignment on page boundary. 
- */ - reg_struct.range.start = - (uint64_t)(uintptr_t)reg->mmap_addr; - reg_struct.range.len = reg->mmap_size; - reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING; - - if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, - &reg_struct)) { - RTE_LOG(ERR, VHOST_CONFIG, - "Failed to register ufd for region %d: (ufd = %d) %s\n", - i, dev->postcopy_ufd, - strerror(errno)); - goto err_mmap; - } - RTE_LOG(INFO, VHOST_CONFIG, - "\t userfaultfd registered for range : %llx - %llx\n", - reg_struct.range.start, - reg_struct.range.start + - reg_struct.range.len - 1); -#else - goto err_mmap; -#endif - } - } + if (dev->trans_ops->map_mem_regions(dev, msg) < 0) + goto err; for (i = 0; i < dev->nr_vring; i++) { struct vhost_virtqueue *vq = dev->virtqueue[i]; @@ -1103,7 +966,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, dev = translate_ring_addresses(dev, i); if (!dev) { dev = *pdev; - goto err_mmap; + goto err; } *pdev = dev; @@ -1114,7 +977,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, return RTE_VHOST_MSG_RESULT_OK; -err_mmap: +err: free_mem_region(dev); rte_free(dev->mem); dev->mem = NULL; diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index 0169bd2..200e47b 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -147,6 +147,9 @@ typedef struct VhostUserMsg { /* vhost_user.c */ int vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg); +int add_guest_pages(struct virtio_net *dev, + struct rte_vhost_mem_region *reg, + uint64_t page_size); int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm); #endif From patchwork Wed Jun 19 15:14:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54968 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received:
from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0940F1C3FD; Wed, 19 Jun 2019 17:16:22 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 2E5ED1C393 for ; Wed, 19 Jun 2019 17:15:42 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id F3E8C182012; Wed, 19 Jun 2019 18:15:41 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 75D6B2B2; Wed, 19 Jun 2019 18:15:41 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:39 +0300 Message-Id: <1560957293-17294-15-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 14/28] vhost: move setup of the log memory region X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Setting up the log memory region involves mapping/unmapping guest memory. This is a transport-specific operation. Other transports may use other means of accessing the guest memory log. Therefore, the mmap/unmap operations related to the memory log are moved to trans_af_unix.c. A new set_log_base() transport operation is introduced.
Signed-off-by: Nikos Dragazis --- lib/librte_vhost/trans_af_unix.c | 41 ++++++++++++++++++++++++++++++++++++++++ lib/librte_vhost/vhost.h | 13 +++++++++++++ lib/librte_vhost/vhost_user.c | 27 +------------------------- 3 files changed, 55 insertions(+), 26 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 522823f..35b1c45 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -763,6 +763,11 @@ af_unix_cleanup_device(struct virtio_net *dev, int destroy __rte_unused) struct vhost_user_connection *conn = container_of(dev, struct vhost_user_connection, device); + if (dev->log_addr) { + munmap((void *)(uintptr_t)dev->log_addr, dev->log_size); + dev->log_addr = 0; + } + if (conn->slave_req_fd >= 0) { close(conn->slave_req_fd); conn->slave_req_fd = -1; @@ -950,6 +955,41 @@ af_unix_unmap_mem_regions(struct virtio_net *dev) } } +static int +af_unix_set_log_base(struct virtio_net *dev, const struct VhostUserMsg *msg) +{ + int fd = msg->fds[0]; + uint64_t size, off; + void *addr; + + size = msg->payload.log.mmap_size; + off = msg->payload.log.mmap_offset; + + /* + * mmap from 0 to workaround a hugepage mmap bug: mmap will + * fail when offset is not page size aligned. + */ + addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + close(fd); + if (addr == MAP_FAILED) { + RTE_LOG(ERR, VHOST_CONFIG, "mmap log base failed!\n"); + return -1; + } + + /* + * Free previously mapped log memory on occasionally + * multiple VHOST_USER_SET_LOG_BASE. 
+ */ + if (dev->log_addr) { + munmap((void *)(uintptr_t)dev->log_addr, dev->log_size); + } + dev->log_addr = (uint64_t)(uintptr_t)addr; + dev->log_base = dev->log_addr + off; + dev->log_size = size; + + return 0; +} + const struct vhost_transport_ops af_unix_trans_ops = { .socket_size = sizeof(struct af_unix_socket), .device_size = sizeof(struct vhost_user_connection), @@ -964,4 +1004,5 @@ const struct vhost_transport_ops af_unix_trans_ops = { .set_slave_req_fd = af_unix_set_slave_req_fd, .map_mem_regions = af_unix_map_mem_regions, .unmap_mem_regions = af_unix_unmap_mem_regions, + .set_log_base = af_unix_set_log_base, }; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 28038c6..b15d223 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -437,6 +437,19 @@ struct vhost_transport_ops { * vhost device */ void (*unmap_mem_regions)(struct virtio_net *dev); + + /** + * Setup the log memory region. + * + * @param dev + * vhost device + * @param msg + * message + * @return + * 0 on success, -1 on failure + */ + int (*set_log_base)(struct virtio_net *dev, + const struct VhostUserMsg *msg); }; /** The traditional AF_UNIX vhost-user protocol transport. 
*/ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index ed8dbd8..acb1135 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -137,11 +137,6 @@ vhost_backend_cleanup(struct virtio_net *dev) free(dev->guest_pages); dev->guest_pages = NULL; - if (dev->log_addr) { - munmap((void *)(uintptr_t)dev->log_addr, dev->log_size); - dev->log_addr = 0; - } - if (dev->postcopy_ufd >= 0) { close(dev->postcopy_ufd); dev->postcopy_ufd = -1; @@ -1275,7 +1270,6 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg, struct virtio_net *dev = *pdev; int fd = msg->fds[0]; uint64_t size, off; - void *addr; if (fd < 0) { RTE_LOG(ERR, VHOST_CONFIG, "invalid log fd: %d\n", fd); @@ -1304,27 +1298,8 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg, "log mmap size: %"PRId64", offset: %"PRId64"\n", size, off); - /* - * mmap from 0 to workaround a hugepage mmap bug: mmap will - * fail when offset is not page size aligned. - */ - addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); - close(fd); - if (addr == MAP_FAILED) { - RTE_LOG(ERR, VHOST_CONFIG, "mmap log base failed!\n"); + if (dev->trans_ops->set_log_base(dev, msg) < 0) return RTE_VHOST_MSG_RESULT_ERR; - } - - /* - * Free previously mapped log memory on occasionally - * multiple VHOST_USER_SET_LOG_BASE. 
- */ - if (dev->log_addr) { - munmap((void *)(uintptr_t)dev->log_addr, dev->log_size); - } - dev->log_addr = (uint64_t)(uintptr_t)addr; - dev->log_base = dev->log_addr + off; - dev->log_size = size; /* * The spec is not clear about it (yet), but QEMU doesn't expect From patchwork Wed Jun 19 15:14:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54969 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id ACD351C402; Wed, 19 Jun 2019 17:16:26 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id C99E41C393 for ; Wed, 19 Jun 2019 17:15:42 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 884F0182013; Wed, 19 Jun 2019 18:15:42 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id C68A1394; Wed, 19 Jun 2019 18:15:41 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:40 +0300 Message-Id: <1560957293-17294-16-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 15/28] vhost: remove main fd parameter from msg handlers X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" After refactoring the socket I/O and the vhost-user map/unmap 
operations in previous patches, there is no need for the connection fds in the core vhost-user code. This patch removes the connection fds from the core vhost-user code. Connection fds are used for socket I/O between master and slave. However, this mechanism is transport-specific. Other transports may use other mechanisms for the master/slave communication. Therefore, the connection fds are moved into the AF_UNIX transport code. Signed-off-by: Nikos Dragazis --- lib/librte_vhost/trans_af_unix.c | 2 +- lib/librte_vhost/vhost_user.c | 82 ++++++++++++++-------------------------- lib/librte_vhost/vhost_user.h | 2 +- 3 files changed, 30 insertions(+), 56 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index 35b1c45..a451880 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -374,7 +374,7 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) goto err; } - ret = vhost_user_msg_handler(conn->device.vid, connfd, &msg); + ret = vhost_user_msg_handler(conn->device.vid, &msg); if (ret < 0) { err: close(connfd); diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index acb1135..d3c9c5f 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -151,16 +151,14 @@ vhost_backend_cleanup(struct virtio_net *dev) */ static int vhost_user_set_owner(struct virtio_net **pdev __rte_unused, - struct VhostUserMsg *msg __rte_unused, - int main_fd __rte_unused) + struct VhostUserMsg *msg __rte_unused) { return RTE_VHOST_MSG_RESULT_OK; } static int vhost_user_reset_owner(struct virtio_net **pdev, - struct VhostUserMsg *msg __rte_unused, - int main_fd __rte_unused) + struct VhostUserMsg *msg __rte_unused) { struct virtio_net *dev = *pdev; vhost_destroy_device_notify(dev); @@ -174,8 +172,7 @@ vhost_user_reset_owner(struct virtio_net **pdev, * The features that we support are requested. 
*/ static int -vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; uint64_t features = 0; @@ -193,8 +190,7 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg, * The queue number that we support are requested. */ static int -vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; uint32_t queue_num = 0; @@ -212,8 +208,7 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg, * We receive the negotiated features supported by us and the virtio device. */ static int -vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; uint64_t features = msg->payload.u64; @@ -295,8 +290,7 @@ vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg, */ static int vhost_user_set_vring_num(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index]; @@ -665,8 +659,7 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index) * This function then converts these to our address space. 
*/ static int -vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq; @@ -703,8 +696,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg, */ static int vhost_user_set_vring_base(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index]; @@ -857,8 +849,7 @@ vhost_memory_changed(struct VhostUserMemory *new, } static int -vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct VhostUserMemory *memory = &msg->payload.memory; @@ -1019,8 +1010,7 @@ virtio_is_ready(struct virtio_net *dev) } static int -vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_vring_file file; @@ -1044,8 +1034,7 @@ vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg, } static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { if (!(msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)) close(msg->fds[0]); @@ -1055,8 +1044,7 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused, } static int -vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_vring_file file; @@ 
-1111,8 +1099,7 @@ free_zmbufs(struct vhost_virtqueue *vq) */ static int vhost_user_get_vring_base(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index]; @@ -1182,8 +1169,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev, */ static int vhost_user_set_vring_enable(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; int enable = (int)msg->payload.state.num; @@ -1215,8 +1201,7 @@ vhost_user_set_vring_enable(struct virtio_net **pdev, static int vhost_user_get_protocol_features(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; uint64_t features, protocol_features; @@ -1242,8 +1227,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev, static int vhost_user_set_protocol_features(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; uint64_t protocol_features = msg->payload.u64; @@ -1264,8 +1248,7 @@ vhost_user_set_protocol_features(struct virtio_net **pdev, } static int -vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; int fd = msg->fds[0]; @@ -1312,8 +1295,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg, } static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { close(msg->fds[0]); RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n"); @@ -1330,8 +1312,7 @@ static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused, * a flag 
'broadcast_rarp' to let rte_vhost_dequeue_burst() inject it. */ static int -vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; uint8_t *mac = (uint8_t *)&msg->payload.u64; @@ -1361,8 +1342,7 @@ vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg, } static int -vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; if (msg->payload.u64 < VIRTIO_MIN_MTU || @@ -1379,8 +1359,7 @@ vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg, } static int -vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg) { int ret; struct virtio_net *dev = *pdev; @@ -1443,8 +1422,7 @@ is_vring_iotlb_invalidate(struct vhost_virtqueue *vq, } static int -vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; struct vhost_iotlb_msg *imsg = &msg->payload.iotlb; @@ -1490,8 +1468,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg, static int vhost_user_set_postcopy_advise(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd __rte_unused) + struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; #ifdef RTE_LIBRTE_VHOST_POSTCOPY @@ -1527,8 +1504,7 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev, static int vhost_user_set_postcopy_listen(struct virtio_net **pdev, - struct VhostUserMsg *msg __rte_unused, - int main_fd __rte_unused) + struct VhostUserMsg *msg __rte_unused) { struct virtio_net *dev = *pdev; @@ -1543,8 +1519,7 
@@ vhost_user_set_postcopy_listen(struct virtio_net **pdev, } static int -vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg, - int main_fd __rte_unused) +vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; @@ -1562,8 +1537,7 @@ vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg, } typedef int (*vhost_message_handler_t)(struct virtio_net **pdev, - struct VhostUserMsg *msg, - int main_fd); + struct VhostUserMsg *msg); static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = { [VHOST_USER_NONE] = NULL, [VHOST_USER_GET_FEATURES] = vhost_user_get_features, @@ -1681,7 +1655,7 @@ vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) } int -vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_) +vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg_) { struct VhostUserMsg msg = *msg_; /* copy so we can build the reply */ struct virtio_net *dev; @@ -1785,7 +1759,7 @@ vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_) if (request > VHOST_USER_NONE && request < VHOST_USER_MAX) { if (!vhost_message_handlers[request]) goto skip_to_post_handle; - ret = vhost_message_handlers[request](&dev, &msg, fd); + ret = vhost_message_handlers[request](&dev, &msg); switch (ret) { case RTE_VHOST_MSG_RESULT_ERR: diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index 200e47b..4cc912d 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -146,7 +146,7 @@ typedef struct VhostUserMsg { /* vhost_user.c */ -int vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg); +int vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg); int add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg, uint64_t page_size); From patchwork Wed Jun 19 15:14:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 
7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54970 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C537E1C407; Wed, 19 Jun 2019 17:16:29 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 40E591C393 for ; Wed, 19 Jun 2019 17:15:43 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 129AE182014; Wed, 19 Jun 2019 18:15:43 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 5689F2B2; Wed, 19 Jun 2019 18:15:42 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:41 +0300 Message-Id: <1560957293-17294-17-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 16/28] vhost: move postcopy live migration code X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Postcopy live migration is an AF_UNIX-bound feature due to the userfaultfd mechanism. Therefore, this patch moves the relevant code from vhost_user.c to trans_af_unix.c and exposes this functionality via transport-specific functions. Any other vhost-user transport could potentially implement this feature by implementing these transport-specific functions. 
Signed-off-by: Nikos Dragazis --- lib/librte_vhost/trans_af_unix.c | 94 ++++++++++++++++++++++++++++++++++++++-- lib/librte_vhost/vhost.c | 1 - lib/librte_vhost/vhost.h | 41 ++++++++++++++++-- lib/librte_vhost/vhost_user.c | 61 ++------------------------ 4 files changed, 131 insertions(+), 66 deletions(-) diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c index a451880..4ccf9a7 100644 --- a/lib/librte_vhost/trans_af_unix.c +++ b/lib/librte_vhost/trans_af_unix.c @@ -10,6 +10,7 @@ #include #include #include +#include #ifdef RTE_LIBRTE_VHOST_POSTCOPY #include #endif @@ -39,6 +40,9 @@ struct vhost_user_connection { int slave_req_fd; rte_spinlock_t slave_req_lock; + int postcopy_ufd; + int postcopy_listening; + TAILQ_ENTRY(vhost_user_connection) next; }; @@ -261,6 +265,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) conn->slave_req_fd = -1; conn->vsocket = vsocket; rte_spinlock_init(&conn->slave_req_lock); + conn->postcopy_ufd = -1; size = strnlen(vsocket->path, PATH_MAX); vhost_set_ifname(dev->vid, vsocket->path, size); @@ -772,6 +777,13 @@ af_unix_cleanup_device(struct virtio_net *dev, int destroy __rte_unused) close(conn->slave_req_fd); conn->slave_req_fd = -1; } + + if (conn->postcopy_ufd >= 0) { + close(conn->postcopy_ufd); + conn->postcopy_ufd = -1; + } + + conn->postcopy_listening = 0; } static int @@ -866,7 +878,7 @@ af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg) alignment, mmap_offset); - if (dev->postcopy_listening) { + if (conn->postcopy_listening) { /* * We haven't a better way right now than sharing * DPDK's virtual address with Qemu, so that Qemu can @@ -877,7 +889,7 @@ af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg) } } - if (dev->postcopy_listening) { + if (conn->postcopy_listening) { /* Send the addresses back to qemu */ msg->fd_num = 0; /* Send reply */ @@ -918,11 +930,11 @@ af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg 
*msg) reg_struct.range.len = reg->mmap_size; reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING; - if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, + if (ioctl(conn->postcopy_ufd, UFFDIO_REGISTER, ®_struct)) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to register ufd for region %d: (ufd = %d) %s\n", - i, dev->postcopy_ufd, + i, conn->postcopy_ufd, strerror(errno)); return -1; } @@ -990,6 +1002,77 @@ af_unix_set_log_base(struct virtio_net *dev, const struct VhostUserMsg *msg) return 0; } +static int +af_unix_set_postcopy_advise(struct virtio_net *dev, struct VhostUserMsg *msg) +{ + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); +#ifdef RTE_LIBRTE_VHOST_POSTCOPY + struct uffdio_api api_struct; + + conn->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + + if (conn->postcopy_ufd == -1) { + RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd not available: %s\n", + strerror(errno)); + return RTE_VHOST_MSG_RESULT_ERR; + } + api_struct.api = UFFD_API; + api_struct.features = 0; + if (ioctl(conn->postcopy_ufd, UFFDIO_API, &api_struct)) { + RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n", + strerror(errno)); + close(conn->postcopy_ufd); + conn->postcopy_ufd = -1; + return RTE_VHOST_MSG_RESULT_ERR; + } + msg->fds[0] = conn->postcopy_ufd; + msg->fd_num = 1; + + return RTE_VHOST_MSG_RESULT_REPLY; +#else + conn->postcopy_ufd = -1; + msg->fd_num = 0; + + return RTE_VHOST_MSG_RESULT_ERR; +#endif +} + +static int +af_unix_set_postcopy_listen(struct virtio_net *dev) +{ + struct vhost_user_connection *conn = + container_of(dev, struct vhost_user_connection, device); + + if (dev->mem && dev->mem->nregions) { + RTE_LOG(ERR, VHOST_CONFIG, + "Regions already registered at postcopy-listen\n"); + return RTE_VHOST_MSG_RESULT_ERR; + } + conn->postcopy_listening = 1; + + return RTE_VHOST_MSG_RESULT_OK; +} + +static int +af_unix_set_postcopy_end(struct virtio_net *dev, struct VhostUserMsg *msg) +{ + struct vhost_user_connection *conn = + 
container_of(dev, struct vhost_user_connection, device); + + conn->postcopy_listening = 0; + if (conn->postcopy_ufd >= 0) { + close(conn->postcopy_ufd); + conn->postcopy_ufd = -1; + } + + msg->payload.u64 = 0; + msg->size = sizeof(msg->payload.u64); + msg->fd_num = 0; + + return RTE_VHOST_MSG_RESULT_REPLY; +} + const struct vhost_transport_ops af_unix_trans_ops = { .socket_size = sizeof(struct af_unix_socket), .device_size = sizeof(struct vhost_user_connection), @@ -1005,4 +1088,7 @@ const struct vhost_transport_ops af_unix_trans_ops = { .map_mem_regions = af_unix_map_mem_regions, .unmap_mem_regions = af_unix_unmap_mem_regions, .set_log_base = af_unix_set_log_base, + .set_postcopy_advise = af_unix_set_postcopy_advise, + .set_postcopy_listen = af_unix_set_postcopy_listen, + .set_postcopy_end = af_unix_set_postcopy_end, }; diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 5b16390..91a286d 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -512,7 +512,6 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops) dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET; dev->trans_ops = trans_ops; dev->vdpa_dev_id = -1; - dev->postcopy_ufd = -1; return dev; } diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index b15d223..f5d6dc8 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -450,6 +450,44 @@ struct vhost_transport_ops { */ int (*set_log_base)(struct virtio_net *dev, const struct VhostUserMsg *msg); + + /** + * Register a userfault fd and send it to master. + * + * @param dev + * vhost device + * @param msg + * message + * @return + * RTE_VHOST_MSG_RESULT_REPLY on success, + * RTE_VHOST_MSG_RESULT_ERR on failure + */ + int (*set_postcopy_advise)(struct virtio_net *dev, + struct VhostUserMsg *msg); + + /** + * Change live migration mode (entering postcopy mode). 
+ * + * @param dev + * vhost device + * @return + * RTE_VHOST_MSG_RESULT_OK on success, + * RTE_VHOST_MSG_RESULT_ERR on failure + */ + int (*set_postcopy_listen)(struct virtio_net *dev); + + /** + * Register completion of postcopy live migration. + * + * @param dev + * vhost device + * @param msg + * message + * @return + * RTE_VHOST_MSG_RESULT_REPLY + */ + int (*set_postcopy_end)(struct virtio_net *dev, + struct VhostUserMsg *msg); }; /** The traditional AF_UNIX vhost-user protocol transport. */ @@ -492,9 +530,6 @@ struct virtio_net { uint32_t max_guest_pages; struct guest_page *guest_pages; - int postcopy_ufd; - int postcopy_listening; - /* * Device id to identify a specific backend device. * It's set to -1 for the default software implementation. diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index d3c9c5f..29c99e7 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -29,14 +29,10 @@ #include #include #include -#include #include #ifdef RTE_LIBRTE_VHOST_NUMA #include #endif -#ifdef RTE_LIBRTE_VHOST_POSTCOPY -#include -#endif #include #include @@ -136,13 +132,6 @@ vhost_backend_cleanup(struct virtio_net *dev) free(dev->guest_pages); dev->guest_pages = NULL; - - if (dev->postcopy_ufd >= 0) { - close(dev->postcopy_ufd); - dev->postcopy_ufd = -1; - } - - dev->postcopy_listening = 0; } /* @@ -1471,35 +1460,8 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; -#ifdef RTE_LIBRTE_VHOST_POSTCOPY - struct uffdio_api api_struct; - - dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); - - if (dev->postcopy_ufd == -1) { - RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd not available: %s\n", - strerror(errno)); - return RTE_VHOST_MSG_RESULT_ERR; - } - api_struct.api = UFFD_API; - api_struct.features = 0; - if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) { - RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n", - 
strerror(errno)); - close(dev->postcopy_ufd); - dev->postcopy_ufd = -1; - return RTE_VHOST_MSG_RESULT_ERR; - } - msg->fds[0] = dev->postcopy_ufd; - msg->fd_num = 1; - - return RTE_VHOST_MSG_RESULT_REPLY; -#else - dev->postcopy_ufd = -1; - msg->fd_num = 0; - return RTE_VHOST_MSG_RESULT_ERR; -#endif + return dev->trans_ops->set_postcopy_advise(dev, msg); } static int @@ -1508,14 +1470,7 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev, { struct virtio_net *dev = *pdev; - if (dev->mem && dev->mem->nregions) { - RTE_LOG(ERR, VHOST_CONFIG, - "Regions already registered at postcopy-listen\n"); - return RTE_VHOST_MSG_RESULT_ERR; - } - dev->postcopy_listening = 1; - - return RTE_VHOST_MSG_RESULT_OK; + return dev->trans_ops->set_postcopy_listen(dev); } static int @@ -1523,17 +1478,7 @@ vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg) { struct virtio_net *dev = *pdev; - dev->postcopy_listening = 0; - if (dev->postcopy_ufd >= 0) { - close(dev->postcopy_ufd); - dev->postcopy_ufd = -1; - } - - msg->payload.u64 = 0; - msg->size = sizeof(msg->payload.u64); - msg->fd_num = 0; - - return RTE_VHOST_MSG_RESULT_REPLY; + return dev->trans_ops->set_postcopy_end(dev, msg); } typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,

From patchwork Wed Jun 19 15:14:42 2019
X-Patchwork-Submitter: Nikos Dragazis
X-Patchwork-Id: 54971
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:42 +0300
Message-Id: <1560957293-17294-18-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 17/28] vhost: support registering additional vhost-user transports

This patch introduces a global transport map which will hold pointers to the transport-specific operations of all the available transports. The AF_UNIX transport is supported by default. More transports can be hooked up by implementing struct vhost_transport_ops and registering this structure in the global transport map. A new transport can be registered with rte_vhost_register_transport(), which is part of the librte_vhost public API. This patch also exports vhost.h, vhost_user.h and some private functions as part of the librte_vhost public API. This allows implementing vhost-user transports outside of lib/librte_vhost/.
Signed-off-by: Nikos Dragazis --- lib/librte_vhost/Makefile | 2 +- lib/librte_vhost/rte_vhost_version.map | 11 +++++++++++ lib/librte_vhost/socket.c | 26 +++++++++++++++++++++++++- lib/librte_vhost/vhost.h | 22 ++++++++++++++++++++++ 4 files changed, 59 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 5ff5fb2..4f867ec 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -26,7 +26,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \ vhost_user.c virtio_net.c vdpa.c trans_af_unix.c # install includes -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += vhost.h vhost_user.h rte_vhost.h rte_vdpa.h # only compile vhost crypto when cryptodev is enabled ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y) diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 5f1d4a7..9eda81f 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -60,6 +60,17 @@ DPDK_18.02 { } DPDK_17.08; +DPDK_19.05 { + global: + + rte_vhost_register_transport; + vhost_destroy_device; + vhost_new_device; + vhost_set_ifname; + vhost_user_msg_handler; + +} DPDK_18.02; + EXPERIMENTAL { global: diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index fc78b63..fe1c78d 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -317,7 +317,17 @@ rte_vhost_driver_register(const char *path, uint64_t flags) { int ret = -1; struct vhost_user_socket *vsocket; - const struct vhost_transport_ops *trans_ops = &af_unix_trans_ops; + const struct vhost_transport_ops *trans_ops; + + /* Register the AF_UNIX vhost-user transport in the transport map. + * The AF_UNIX transport is supported by default. 
+ */ + if (g_transport_map[VHOST_TRANSPORT_UNIX] == NULL) { + if (rte_vhost_register_transport(VHOST_TRANSPORT_UNIX, &af_unix_trans_ops) < 0) + goto out; + } + + trans_ops = g_transport_map[VHOST_TRANSPORT_UNIX]; if (!path) return -1; @@ -495,3 +505,17 @@ rte_vhost_driver_start(const char *path) return vsocket->trans_ops->socket_start(vsocket); } + +int +rte_vhost_register_transport(VhostUserTransport trans, + const struct vhost_transport_ops *trans_ops) +{ + if (trans >= VHOST_TRANSPORT_MAX) { + RTE_LOG(ERR, VHOST_CONFIG, + "Invalid vhost-user transport %d\n", trans); + return -1; + } + + g_transport_map[trans] = trans_ops; + return 0; +} diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index f5d6dc8..aba8d9b 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -493,6 +493,28 @@ struct vhost_transport_ops { /** The traditional AF_UNIX vhost-user protocol transport. */ extern const struct vhost_transport_ops af_unix_trans_ops; +typedef enum VhostUserTransport { + VHOST_TRANSPORT_UNIX = 0, + VHOST_TRANSPORT_MAX = 1 +} VhostUserTransport; + +/* A list with all the available vhost-user transports. */ +const struct vhost_transport_ops *g_transport_map[VHOST_TRANSPORT_MAX]; + +/** + * Register a new vhost-user transport in the transport map. + * + * @param trans + * the transport that is going to be registered + * @param trans_ops + * the transport operations supported by this transport + * @return + * 0 on success, -1 on failure + */ +int +rte_vhost_register_transport(VhostUserTransport trans, + const struct vhost_transport_ops *trans_ops); + /** * Device structure contains all configuration information relating * to the device.
From patchwork Wed Jun 19 15:14:43 2019
X-Patchwork-Submitter: Nikos Dragazis
X-Patchwork-Id: 54972
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:43 +0300
Message-Id: <1560957293-17294-19-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework

The virtio-vhost-user transport requires a driver for the virtio-vhost-user PCI device, hence it needs a virtio-pci driver. There is currently no librte_virtio API that we can use. This commit is a hack that duplicates the virtio pci code from drivers/net/ into drivers/virtio_vhost_user/.
A better solution would be to extract the code cleanly from drivers/net/ and share it. Or perhaps we could backport SPDK's lib/virtio/. drivers/virtio_vhost_user/ will host the virtio-vhost-user transport implementation in the upcoming patches. Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- drivers/virtio_vhost_user/virtio_pci.c | 504 +++++++++++++++++++++++++++++++++ drivers/virtio_vhost_user/virtio_pci.h | 270 ++++++++++++++++++ drivers/virtio_vhost_user/virtqueue.h | 181 ++++++++++++ 3 files changed, 955 insertions(+) create mode 100644 drivers/virtio_vhost_user/virtio_pci.c create mode 100644 drivers/virtio_vhost_user/virtio_pci.h create mode 100644 drivers/virtio_vhost_user/virtqueue.h diff --git a/drivers/virtio_vhost_user/virtio_pci.c b/drivers/virtio_vhost_user/virtio_pci.c new file mode 100644 index 0000000..9c2c981 --- /dev/null +++ b/drivers/virtio_vhost_user/virtio_pci.c @@ -0,0 +1,504 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2014 Intel Corporation + */ +#include <stdint.h> + +/* XXX This file is based on drivers/net/virtio/virtio_pci.c. It would be + * better to create a shared rte_virtio library instead of duplicating this + * code. + */ + +#ifdef RTE_EXEC_ENV_LINUX + #include <dirent.h> + #include <fcntl.h> +#endif + +#include <rte_io.h> +#include <rte_bus.h> + +#include "virtio_pci.h" +#include "virtqueue.h" + +/* + * Following macros are derived from linux/pci_regs.h, however, + * we can't simply include that header here, as there is no such + * file for non-Linux platforms. + */ +#define PCI_CAPABILITY_LIST 0x34 +#define PCI_CAP_ID_VNDR 0x09 +#define PCI_CAP_ID_MSIX 0x11 + +/* + * The remaining space is defined by each driver as the per-driver + * configuration space. + */ +#define VIRTIO_PCI_CONFIG(hw) \ + (((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20) + +static inline int +check_vq_phys_addr_ok(struct virtqueue *vq) +{ + /* Virtio PCI device VIRTIO_PCI_QUEUE_PFN register is 32bit, + * and only accepts 32 bit page frame number.
+ * Check if the allocated physical memory exceeds 16TB. + */ + if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >> + (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "vring address shouldn't be above 16TB!\n"); + return 0; + } + + return 1; +} + +static inline void +io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi) +{ + rte_write32(val & ((1ULL << 32) - 1), lo); + rte_write32(val >> 32, hi); +} + +static void +modern_read_dev_config(struct virtio_hw *hw, size_t offset, + void *dst, int length) +{ + int i; + uint8_t *p; + uint8_t old_gen, new_gen; + + do { + old_gen = rte_read8(&hw->common_cfg->config_generation); + + p = dst; + for (i = 0; i < length; i++) + *p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i); + + new_gen = rte_read8(&hw->common_cfg->config_generation); + } while (old_gen != new_gen); +} + +static void +modern_write_dev_config(struct virtio_hw *hw, size_t offset, + const void *src, int length) +{ + int i; + const uint8_t *p = src; + + for (i = 0; i < length; i++) + rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i)); +} + +static uint64_t +modern_get_features(struct virtio_hw *hw) +{ + uint32_t features_lo, features_hi; + + rte_write32(0, &hw->common_cfg->device_feature_select); + features_lo = rte_read32(&hw->common_cfg->device_feature); + + rte_write32(1, &hw->common_cfg->device_feature_select); + features_hi = rte_read32(&hw->common_cfg->device_feature); + + return ((uint64_t)features_hi << 32) | features_lo; +} + +static void +modern_set_features(struct virtio_hw *hw, uint64_t features) +{ + rte_write32(0, &hw->common_cfg->guest_feature_select); + rte_write32(features & ((1ULL << 32) - 1), + &hw->common_cfg->guest_feature); + + rte_write32(1, &hw->common_cfg->guest_feature_select); + rte_write32(features >> 32, + &hw->common_cfg->guest_feature); +} + +static uint8_t +modern_get_status(struct virtio_hw *hw) +{ + return rte_read8(&hw->common_cfg->device_status); +} + +static void +modern_set_status(struct 
virtio_hw *hw, uint8_t status) +{ + rte_write8(status, &hw->common_cfg->device_status); +} + +static void +modern_reset(struct virtio_hw *hw) +{ + modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET); + modern_get_status(hw); +} + +static uint8_t +modern_get_isr(struct virtio_hw *hw) +{ + return rte_read8(hw->isr); +} + +static uint16_t +modern_set_config_irq(struct virtio_hw *hw, uint16_t vec) +{ + rte_write16(vec, &hw->common_cfg->msix_config); + return rte_read16(&hw->common_cfg->msix_config); +} + +static uint16_t +modern_set_queue_irq(struct virtio_hw *hw, struct virtqueue *vq, uint16_t vec) +{ + rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select); + rte_write16(vec, &hw->common_cfg->queue_msix_vector); + return rte_read16(&hw->common_cfg->queue_msix_vector); +} + +static uint16_t +modern_get_queue_num(struct virtio_hw *hw, uint16_t queue_id) +{ + rte_write16(queue_id, &hw->common_cfg->queue_select); + return rte_read16(&hw->common_cfg->queue_size); +} + +static int +modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq) +{ + uint64_t desc_addr, avail_addr, used_addr; + uint16_t notify_off; + + if (!check_vq_phys_addr_ok(vq)) + return -1; + + desc_addr = vq->vq_ring_mem; + avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc); + used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail, + ring[vq->vq_nentries]), + VIRTIO_PCI_VRING_ALIGN); + + rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select); + + io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo, + &hw->common_cfg->queue_desc_hi); + io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo, + &hw->common_cfg->queue_avail_hi); + io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo, + &hw->common_cfg->queue_used_hi); + + notify_off = rte_read16(&hw->common_cfg->queue_notify_off); + vq->notify_addr = (void *)((uint8_t *)hw->notify_base + + notify_off * hw->notify_off_multiplier); + + rte_write16(1, &hw->common_cfg->queue_enable); + + 
RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "queue %u addresses:\n", vq->vq_queue_index); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t desc_addr: %" PRIx64 "\n", desc_addr); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t avail_addr: %" PRIx64 "\n", avail_addr); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t used_addr: %" PRIx64 "\n", used_addr); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t notify addr: %p (notify offset: %u)\n", + vq->notify_addr, notify_off); + + return 0; +} + +static void +modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq) +{ + rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select); + + io_write64_twopart(0, &hw->common_cfg->queue_desc_lo, + &hw->common_cfg->queue_desc_hi); + io_write64_twopart(0, &hw->common_cfg->queue_avail_lo, + &hw->common_cfg->queue_avail_hi); + io_write64_twopart(0, &hw->common_cfg->queue_used_lo, + &hw->common_cfg->queue_used_hi); + + rte_write16(0, &hw->common_cfg->queue_enable); +} + +static void +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq) +{ + rte_write16(vq->vq_queue_index, vq->notify_addr); +} + +const struct virtio_pci_ops virtio_pci_modern_ops = { + .read_dev_cfg = modern_read_dev_config, + .write_dev_cfg = modern_write_dev_config, + .reset = modern_reset, + .get_status = modern_get_status, + .set_status = modern_set_status, + .get_features = modern_get_features, + .set_features = modern_set_features, + .get_isr = modern_get_isr, + .set_config_irq = modern_set_config_irq, + .set_queue_irq = modern_set_queue_irq, + .get_queue_num = modern_get_queue_num, + .setup_queue = modern_setup_queue, + .del_queue = modern_del_queue, + .notify_queue = modern_notify_queue, +}; + + +void +virtio_pci_read_dev_config(struct virtio_hw *hw, size_t offset, + void *dst, int length) +{ + VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length); +} + +void +virtio_pci_write_dev_config(struct virtio_hw *hw, size_t offset, + const void *src, int length) +{ + VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length); +} + +uint64_t
+virtio_pci_negotiate_features(struct virtio_hw *hw, uint64_t host_features) +{ + uint64_t features; + + /* + * Limit negotiated features to what the driver, virtqueue, and + * host all support. + */ + features = host_features & hw->guest_features; + VTPCI_OPS(hw)->set_features(hw, features); + + return features; +} + +void +virtio_pci_reset(struct virtio_hw *hw) +{ + VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET); + /* flush status write */ + VTPCI_OPS(hw)->get_status(hw); +} + +void +virtio_pci_reinit_complete(struct virtio_hw *hw) +{ + virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK); +} + +void +virtio_pci_set_status(struct virtio_hw *hw, uint8_t status) +{ + if (status != VIRTIO_CONFIG_STATUS_RESET) + status |= VTPCI_OPS(hw)->get_status(hw); + + VTPCI_OPS(hw)->set_status(hw, status); +} + +uint8_t +virtio_pci_get_status(struct virtio_hw *hw) +{ + return VTPCI_OPS(hw)->get_status(hw); +} + +uint8_t +virtio_pci_isr(struct virtio_hw *hw) +{ + return VTPCI_OPS(hw)->get_isr(hw); +} + +static void * +get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap) +{ + uint8_t bar = cap->bar; + uint32_t length = cap->length; + uint32_t offset = cap->offset; + uint8_t *base; + + if (bar >= PCI_MAX_RESOURCE) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "invalid bar: %u\n", bar); + return NULL; + } + + if (offset + length < offset) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "offset(%u) + length(%u) overflows\n", + offset, length); + return NULL; + } + + if (offset + length > dev->mem_resource[bar].len) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, + "invalid cap: overflows bar space: %u > %" PRIu64 "\n", + offset + length, dev->mem_resource[bar].len); + return NULL; + } + + base = dev->mem_resource[bar].addr; + if (base == NULL) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "bar %u base addr is NULL\n", bar); + return NULL; + } + + return base + offset; +} + +#define PCI_MSIX_ENABLE 0x8000 + +static int +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) +{ + 
uint8_t pos; + struct virtio_pci_cap cap; + int ret; + + if (rte_pci_map_device(dev)) { + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to map pci device!\n"); + return -1; + } + + ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST); + if (ret < 0) { + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to read pci capability list\n"); + return -1; + } + + while (pos) { + ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos); + if (ret < 0) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, + "failed to read pci cap at pos: %x\n", pos); + break; + } + + if (cap.cap_vndr == PCI_CAP_ID_MSIX) { + /* Transitional devices would also have this capability, + * that's why we also check if msix is enabled. + * 1st byte is cap ID; 2nd byte is the position of next + * cap; next two bytes are the flags. + */ + uint16_t flags = ((uint16_t *)&cap)[1]; + + if (flags & PCI_MSIX_ENABLE) + hw->use_msix = VIRTIO_MSIX_ENABLED; + else + hw->use_msix = VIRTIO_MSIX_DISABLED; + } + + if (cap.cap_vndr != PCI_CAP_ID_VNDR) { + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, + "[%2x] skipping non VNDR cap id: %02x\n", + pos, cap.cap_vndr); + goto next; + } + + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, + "[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u\n", + pos, cap.cfg_type, cap.bar, cap.offset, cap.length); + + switch (cap.cfg_type) { + case VIRTIO_PCI_CAP_COMMON_CFG: + hw->common_cfg = get_cfg_addr(dev, &cap); + break; + case VIRTIO_PCI_CAP_NOTIFY_CFG: + rte_pci_read_config(dev, &hw->notify_off_multiplier, + 4, pos + sizeof(cap)); + hw->notify_base = get_cfg_addr(dev, &cap); + break; + case VIRTIO_PCI_CAP_DEVICE_CFG: + hw->dev_cfg = get_cfg_addr(dev, &cap); + break; + case VIRTIO_PCI_CAP_ISR_CFG: + hw->isr = get_cfg_addr(dev, &cap); + break; + } + +next: + pos = cap.cap_next; + } + + if (hw->common_cfg == NULL || hw->notify_base == NULL || + hw->dev_cfg == NULL || hw->isr == NULL) { + RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "no modern virtio pci device found.\n"); + return -1; + } + + RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "found modern 
virtio pci device.\n"); + + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "common cfg mapped at: %p\n", hw->common_cfg); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "device cfg mapped at: %p\n", hw->dev_cfg); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "isr cfg mapped at: %p\n", hw->isr); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "notify base: %p, notify off multiplier: %u\n", + hw->notify_base, hw->notify_off_multiplier); + + return 0; +} + +struct virtio_hw_internal virtio_pci_hw_internal[8]; + +/* + * Return -1: + * if there is error mapping with VFIO/UIO. + * if port map error when driver type is KDRV_NONE. + * if whitelisted but driver type is KDRV_UNKNOWN. + * Return 1 if kernel driver is managing the device. + * Return 0 on success. + */ +int +virtio_pci_init(struct rte_pci_device *dev, struct virtio_hw *hw) +{ + static size_t internal_id; + + if (internal_id >= + sizeof(virtio_pci_hw_internal) / sizeof(*virtio_pci_hw_internal)) { + RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "too many virtio pci devices.\n"); + return -1; + } + + /* + * Try if we can succeed reading virtio pci caps, which exists + * only on modern pci device. 
+ */ + if (virtio_read_caps(dev, hw) != 0) { + RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "legacy virtio pci is not supported.\n"); + return -1; + } + + RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "modern virtio pci detected.\n"); + hw->internal_id = internal_id++; + virtio_pci_hw_internal[hw->internal_id].vtpci_ops = + &virtio_pci_modern_ops; + return 0; +} + +enum virtio_msix_status +virtio_pci_msix_detect(struct rte_pci_device *dev) +{ + uint8_t pos; + struct virtio_pci_cap cap; + int ret; + + ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST); + if (ret < 0) { + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to read pci capability list\n"); + return VIRTIO_MSIX_NONE; + } + + while (pos) { + ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos); + if (ret < 0) { + RTE_LOG(ERR, VIRTIO_PCI_CONFIG, + "failed to read pci cap at pos: %x\n", pos); + break; + } + + if (cap.cap_vndr == PCI_CAP_ID_MSIX) { + uint16_t flags = ((uint16_t *)&cap)[1]; + + if (flags & PCI_MSIX_ENABLE) + return VIRTIO_MSIX_ENABLED; + else + return VIRTIO_MSIX_DISABLED; + } + + pos = cap.cap_next; + } + + return VIRTIO_MSIX_NONE; +} diff --git a/drivers/virtio_vhost_user/virtio_pci.h b/drivers/virtio_vhost_user/virtio_pci.h new file mode 100644 index 0000000..018e0b7 --- /dev/null +++ b/drivers/virtio_vhost_user/virtio_pci.h @@ -0,0 +1,270 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2014 Intel Corporation + */ + +/* XXX This file is based on drivers/net/virtio/virtio_pci.h. It would be + * better to create a shared rte_virtio library instead of duplicating this + * code. + */ + +#ifndef _VIRTIO_PCI_H_ +#define _VIRTIO_PCI_H_ + +#include + +#include +#include +#include +#include + +/* Macros for printing using RTE_LOG */ +#define RTE_LOGTYPE_VIRTIO_PCI_CONFIG RTE_LOGTYPE_USER2 + +struct virtqueue; + +/* VirtIO PCI vendor/device ID. 
*/ +#define VIRTIO_PCI_VENDORID 0x1AF4 +#define VIRTIO_PCI_LEGACY_DEVICEID_VHOST_USER 0x1017 +#define VIRTIO_PCI_MODERN_DEVICEID_VHOST_USER 0x1058 + +/* VirtIO ABI version, this must match exactly. */ +#define VIRTIO_PCI_ABI_VERSION 0 + +/* + * VirtIO Header, located in BAR 0. + */ +#define VIRTIO_PCI_HOST_FEATURES 0 /* host's supported features (32bit, RO)*/ +#define VIRTIO_PCI_GUEST_FEATURES 4 /* guest's supported features (32, RW) */ +#define VIRTIO_PCI_QUEUE_PFN 8 /* physical address of VQ (32, RW) */ +#define VIRTIO_PCI_QUEUE_NUM 12 /* number of ring entries (16, RO) */ +#define VIRTIO_PCI_QUEUE_SEL 14 /* current VQ selection (16, RW) */ +#define VIRTIO_PCI_QUEUE_NOTIFY 16 /* notify host regarding VQ (16, RW) */ +#define VIRTIO_PCI_STATUS 18 /* device status register (8, RW) */ +#define VIRTIO_PCI_ISR 19 /* interrupt status register, reading + * also clears the register (8, RO) */ +/* Only if MSIX is enabled: */ +#define VIRTIO_MSI_CONFIG_VECTOR 20 /* configuration change vector (16, RW) */ +#define VIRTIO_MSI_QUEUE_VECTOR 22 /* vector for selected VQ notifications + (16, RW) */ + +/* The bit of the ISR which indicates a device has an interrupt. */ +#define VIRTIO_PCI_ISR_INTR 0x1 +/* The bit of the ISR which indicates a device configuration change. */ +#define VIRTIO_PCI_ISR_CONFIG 0x2 +/* Vector value used to disable MSI for queue. */ +#define VIRTIO_MSI_NO_VECTOR 0xFFFF + +/* VirtIO device IDs. */ +#define VIRTIO_ID_VHOST_USER 0x18 + +/* Status byte for guest to report progress. */ +#define VIRTIO_CONFIG_STATUS_RESET 0x00 +#define VIRTIO_CONFIG_STATUS_ACK 0x01 +#define VIRTIO_CONFIG_STATUS_DRIVER 0x02 +#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04 +#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08 +#define VIRTIO_CONFIG_STATUS_FAILED 0x80 + +/* + * Each virtqueue indirect descriptor list must be physically contiguous. + * To allow us to malloc(9) each list individually, limit the number + * supported to what will fit in one page. 
With 4KB pages, this is a limit + * of 256 descriptors. If there is ever a need for more, we can switch to + * contigmalloc(9) for the larger allocations, similar to what + * bus_dmamem_alloc(9) does. + * + * Note the sizeof(struct vring_desc) is 16 bytes. + */ +#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16)) + +/* Do we get callbacks when the ring is completely used, even if we've + * suppressed them? */ +#define VIRTIO_F_NOTIFY_ON_EMPTY 24 + +/* Can the device handle any descriptor layout? */ +#define VIRTIO_F_ANY_LAYOUT 27 + +/* We support indirect buffer descriptors */ +#define VIRTIO_RING_F_INDIRECT_DESC 28 + +#define VIRTIO_F_VERSION_1 32 +#define VIRTIO_F_IOMMU_PLATFORM 33 + +/* + * Some VirtIO feature bits (currently bits 28 through 31) are + * reserved for the transport being used (eg. virtio_ring), the + * rest are per-device feature bits. + */ +#define VIRTIO_TRANSPORT_F_START 28 + +#ifndef VIRTIO_TRANSPORT_F_END +#define VIRTIO_TRANSPORT_F_END 34 +#endif + +/* The Guest publishes the used index for which it expects an interrupt + * at the end of the avail ring. Host should ignore the avail->flags field. */ +/* The Host publishes the avail index for which it expects a kick + * at the end of the used ring. Guest should ignore the used->flags field. */ +#define VIRTIO_RING_F_EVENT_IDX 29 + +/* Common configuration */ +#define VIRTIO_PCI_CAP_COMMON_CFG 1 +/* Notifications */ +#define VIRTIO_PCI_CAP_NOTIFY_CFG 2 +/* ISR Status */ +#define VIRTIO_PCI_CAP_ISR_CFG 3 +/* Device specific configuration */ +#define VIRTIO_PCI_CAP_DEVICE_CFG 4 +/* PCI configuration access */ +#define VIRTIO_PCI_CAP_PCI_CFG 5 + +/* This is the PCI capability header: */ +struct virtio_pci_cap { + uint8_t cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */ + uint8_t cap_next; /* Generic PCI field: next ptr. */ + uint8_t cap_len; /* Generic PCI field: capability length */ + uint8_t cfg_type; /* Identifies the structure. */ + uint8_t bar; /* Where to find it. 
*/ + uint8_t padding[3]; /* Pad to full dword. */ + uint32_t offset; /* Offset within bar. */ + uint32_t length; /* Length of the structure, in bytes. */ +}; + +struct virtio_pci_notify_cap { + struct virtio_pci_cap cap; + uint32_t notify_off_multiplier; /* Multiplier for queue_notify_off. */ +}; + +/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */ +struct virtio_pci_common_cfg { + /* About the whole device. */ + uint32_t device_feature_select; /* read-write */ + uint32_t device_feature; /* read-only */ + uint32_t guest_feature_select; /* read-write */ + uint32_t guest_feature; /* read-write */ + uint16_t msix_config; /* read-write */ + uint16_t num_queues; /* read-only */ + uint8_t device_status; /* read-write */ + uint8_t config_generation; /* read-only */ + + /* About a specific virtqueue. */ + uint16_t queue_select; /* read-write */ + uint16_t queue_size; /* read-write, power of 2. */ + uint16_t queue_msix_vector; /* read-write */ + uint16_t queue_enable; /* read-write */ + uint16_t queue_notify_off; /* read-only */ + uint32_t queue_desc_lo; /* read-write */ + uint32_t queue_desc_hi; /* read-write */ + uint32_t queue_avail_lo; /* read-write */ + uint32_t queue_avail_hi; /* read-write */ + uint32_t queue_used_lo; /* read-write */ + uint32_t queue_used_hi; /* read-write */ +}; + +struct virtio_hw; + +struct virtio_pci_ops { + void (*read_dev_cfg)(struct virtio_hw *hw, size_t offset, + void *dst, int len); + void (*write_dev_cfg)(struct virtio_hw *hw, size_t offset, + const void *src, int len); + void (*reset)(struct virtio_hw *hw); + + uint8_t (*get_status)(struct virtio_hw *hw); + void (*set_status)(struct virtio_hw *hw, uint8_t status); + + uint64_t (*get_features)(struct virtio_hw *hw); + void (*set_features)(struct virtio_hw *hw, uint64_t features); + + uint8_t (*get_isr)(struct virtio_hw *hw); + + uint16_t (*set_config_irq)(struct virtio_hw *hw, uint16_t vec); + + uint16_t (*set_queue_irq)(struct virtio_hw *hw, struct virtqueue *vq, + uint16_t vec); + + uint16_t 
(*get_queue_num)(struct virtio_hw *hw, uint16_t queue_id); + int (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq); + void (*del_queue)(struct virtio_hw *hw, struct virtqueue *vq); + void (*notify_queue)(struct virtio_hw *hw, struct virtqueue *vq); +}; + +struct virtio_hw { + uint64_t guest_features; + uint32_t max_queue_pairs; + uint16_t started; + uint8_t use_msix; + uint16_t internal_id; + uint32_t notify_off_multiplier; + uint8_t *isr; + uint16_t *notify_base; + struct virtio_pci_common_cfg *common_cfg; + void *dev_cfg; + /* + * App management thread and virtio interrupt handler thread + * both can change device state, this lock is meant to avoid + * such a contention. + */ + rte_spinlock_t state_lock; + + struct virtqueue **vqs; +}; + +/* + * While virtio_hw is stored in shared memory, this structure stores + * some infos that may vary in the multiple process model locally. + * For example, the vtpci_ops pointer. + */ +struct virtio_hw_internal { + const struct virtio_pci_ops *vtpci_ops; +}; + +#define VTPCI_OPS(hw) (virtio_pci_hw_internal[(hw)->internal_id].vtpci_ops) + +extern struct virtio_hw_internal virtio_pci_hw_internal[8]; + +/* + * How many bits to shift physical queue address written to QUEUE_PFN. + * 12 is historical, and due to x86 page size. + */ +#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12 + +/* The alignment to use between consumer and producer parts of vring. 
*/ +#define VIRTIO_PCI_VRING_ALIGN 4096 + +enum virtio_msix_status { + VIRTIO_MSIX_NONE = 0, + VIRTIO_MSIX_DISABLED = 1, + VIRTIO_MSIX_ENABLED = 2 +}; + +static inline int +virtio_pci_with_feature(struct virtio_hw *hw, uint64_t bit) +{ + return (hw->guest_features & (1ULL << bit)) != 0; +} + +/* + * Function declaration from virtio_pci.c + */ +int virtio_pci_init(struct rte_pci_device *dev, struct virtio_hw *hw); +void virtio_pci_reset(struct virtio_hw *); + +void virtio_pci_reinit_complete(struct virtio_hw *); + +uint8_t virtio_pci_get_status(struct virtio_hw *); +void virtio_pci_set_status(struct virtio_hw *, uint8_t); + +uint64_t virtio_pci_negotiate_features(struct virtio_hw *, uint64_t); + +void virtio_pci_write_dev_config(struct virtio_hw *, size_t, const void *, int); + +void virtio_pci_read_dev_config(struct virtio_hw *, size_t, void *, int); + +uint8_t virtio_pci_isr(struct virtio_hw *); + +enum virtio_msix_status virtio_pci_msix_detect(struct rte_pci_device *dev); + +extern const struct virtio_pci_ops virtio_pci_modern_ops; + +#endif /* _VIRTIO_PCI_H_ */ diff --git a/drivers/virtio_vhost_user/virtqueue.h b/drivers/virtio_vhost_user/virtqueue.h new file mode 100644 index 0000000..e2ac78e --- /dev/null +++ b/drivers/virtio_vhost_user/virtqueue.h @@ -0,0 +1,181 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2014 Intel Corporation + */ + +/* XXX This file is based on drivers/net/virtio/virtqueue.h. It would be + * better to create a shared rte_virtio library instead of duplicating this + * code. + */ + +#ifndef _VIRTQUEUE_H_ +#define _VIRTQUEUE_H_ + +#include +#include + +#include +#include +#include + +#include "virtio_pci.h" + +/* + * Per virtio_config.h in Linux. + * For virtio_pci on SMP, we don't need to order with respect to MMIO + * accesses through relaxed memory I/O windows, so smp_mb() et al are + * sufficient. 
+ * + */ +#define virtio_mb() rte_smp_mb() +#define virtio_rmb() rte_smp_rmb() +#define virtio_wmb() rte_smp_wmb() + +#define VIRTQUEUE_MAX_NAME_SZ 32 + +/** + * The maximum virtqueue size is 2^15. Use that value as the end of + * descriptor chain terminator since it will never be a valid index + * in the descriptor table. This is used to verify we are correctly + * handling vq_free_cnt. + */ +#define VQ_RING_DESC_CHAIN_END 32768 + +struct vq_desc_extra { + void *cookie; + uint16_t ndescs; +}; + +struct virtqueue { + struct virtio_hw *hw; /**< virtio_hw structure pointer. */ + struct vring vq_ring; /**< vring keeping desc, used and avail */ + /** + * Last consumed descriptor in the used table, + * trails vq_ring.used->idx. + */ + uint16_t vq_used_cons_idx; + uint16_t vq_nentries; /**< vring desc numbers */ + uint16_t vq_free_cnt; /**< num of desc available */ + uint16_t vq_avail_idx; /**< sync until needed */ + uint16_t vq_free_thresh; /**< free threshold */ + + void *vq_ring_virt_mem; /**< linear address of vring*/ + unsigned int vq_ring_size; + + rte_iova_t vq_ring_mem; /**< physical address of vring */ + + const struct rte_memzone *mz; /**< memzone backing vring */ + + /** + * Head of the free chain in the descriptor table. If + * there are no free descriptors, this will be set to + * VQ_RING_DESC_CHAIN_END. + */ + uint16_t vq_desc_head_idx; + uint16_t vq_desc_tail_idx; + uint16_t vq_queue_index; /**< PCI queue index */ + uint16_t *notify_addr; + struct vq_desc_extra vq_descx[0]; +}; + +/* Chain all the descriptors in the ring with an END */ +static inline void +vring_desc_init(struct vring_desc *dp, uint16_t n) +{ + uint16_t i; + + for (i = 0; i < n - 1; i++) + dp[i].next = (uint16_t)(i + 1); + dp[i].next = VQ_RING_DESC_CHAIN_END; +} + +/** + * Tell the backend not to interrupt us. + */ +static inline void +virtqueue_disable_intr(struct virtqueue *vq) +{ + vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT; +} + +/** + * Tell the backend to interrupt us. 
+ */ +static inline void +virtqueue_enable_intr(struct virtqueue *vq) +{ + vq->vq_ring.avail->flags &= (~VRING_AVAIL_F_NO_INTERRUPT); +} + +/** + * Dump virtqueue internal structures, for debug purpose only. + */ +void virtqueue_dump(struct virtqueue *vq); + +static inline int +virtqueue_full(const struct virtqueue *vq) +{ + return vq->vq_free_cnt == 0; +} + +#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx)) + +static inline void +vq_update_avail_idx(struct virtqueue *vq) +{ + virtio_wmb(); + vq->vq_ring.avail->idx = vq->vq_avail_idx; +} + +static inline void +vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx) +{ + uint16_t avail_idx; + /* + * Place the head of the descriptor chain into the next slot and make + * it usable to the host. The chain is made available now rather than + * deferring to virtqueue_notify() in the hopes that if the host is + * currently running on another CPU, we can keep it processing the new + * descriptor. + */ + avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1)); + if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx)) + vq->vq_ring.avail->ring[avail_idx] = desc_idx; + vq->vq_avail_idx++; +} + +static inline int +virtqueue_kick_prepare(struct virtqueue *vq) +{ + return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY); +} + +static inline void +virtqueue_notify(struct virtqueue *vq) +{ + /* + * Ensure updated avail->idx is visible to host. + * For virtio on IA, the notification is through an io port operation + * which is a serializing instruction itself.
+ */ + VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq); +} + +#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP +#define VIRTQUEUE_DUMP(vq) do { \ + uint16_t used_idx, nused; \ + used_idx = (vq)->vq_ring.used->idx; \ + nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \ + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, \ + "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \ + " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \ + " avail.flags=0x%x; used.flags=0x%x\n", \ + (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \ + (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \ + (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \ + (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \ +} while (0) +#else +#define VIRTQUEUE_DUMP(vq) do { } while (0) +#endif + +#endif /* _VIRTQUEUE_H_ */ From patchwork Wed Jun 19 15:14:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54973 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B8CD21C420; Wed, 19 Jun 2019 17:16:36 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id DB6361C392 for ; Wed, 19 Jun 2019 17:15:44 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id BCC94182017; Wed, 19 Jun 2019 18:15:44 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 5AF4E2B2; Wed, 19 Jun 2019 18:15:44 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:44 +0300 Message-Id: <1560957293-17294-20-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: 
<1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 19/28] vhost: add index field in vhost virtqueues X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Stefan Hajnoczi Currently, the only way of determining a struct vhost_virtqueue's index is to search struct virtio_net->virtqueue[] for its address. Stash the index in struct vhost_virtqueue so we won't have to search the array. This new field will be used by virtio-vhost-user. Signed-off-by: Stefan Hajnoczi --- lib/librte_vhost/vhost.c | 2 ++ lib/librte_vhost/vhost.h | 1 + 2 files changed, 3 insertions(+) diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 91a286d..d083d7e 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -407,6 +407,8 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx) memset(vq, 0, sizeof(struct vhost_virtqueue)); + vq->vring_idx = vring_idx; + vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD; vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index aba8d9b..2e7eabe 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -107,6 +107,7 @@ struct vhost_virtqueue { struct vring_packed_desc_event *device_event; }; uint32_t size; + uint32_t vring_idx; uint16_t last_avail_idx; uint16_t last_used_idx; From patchwork Wed Jun 19 15:14:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54986 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 
C93C31C427; Wed, 19 Jun 2019 17:16:38 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 52BEA1C392 for ; Wed, 19 Jun 2019 17:15:46 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 13875182018; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 8C61B32C; Wed, 19 Jun 2019 18:15:44 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:45 +0300 Message-Id: <1560957293-17294-21-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 20/28] drivers: add virtio-vhost-user transport X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch introduces the virtio-vhost-user transport. This transport is based on the virtio-vhost-user device. This device replaces the AF_UNIX socket used by the vhost-user protocol with a virtio device that tunnels vhost-user protocol messages. This allows a guest to act as a vhost device backend for other guests. For more information on virtio-vhost-user, see https://wiki.qemu.org/Features/VirtioVhostUser. 
Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- drivers/Makefile | 2 + drivers/virtio_vhost_user/Makefile | 27 + .../rte_virtio_vhost_user_version.map | 4 + .../virtio_vhost_user/trans_virtio_vhost_user.c | 1067 ++++++++++++++++++++ drivers/virtio_vhost_user/virtio_vhost_user.h | 18 + 5 files changed, 1118 insertions(+) create mode 100644 drivers/virtio_vhost_user/Makefile create mode 100644 drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map create mode 100644 drivers/virtio_vhost_user/trans_virtio_vhost_user.c create mode 100644 drivers/virtio_vhost_user/virtio_vhost_user.h diff --git a/drivers/Makefile b/drivers/Makefile index 7d5da5d..72e2579 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -22,5 +22,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event DEPDIRS-event := common bus mempool net DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += raw DEPDIRS-raw := common bus mempool net event +DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio_vhost_user +DEPDIRS-virtio_vhost_user := bus include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/virtio_vhost_user/Makefile b/drivers/virtio_vhost_user/Makefile new file mode 100644 index 0000000..61a77b6 --- /dev/null +++ b/drivers/virtio_vhost_user/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2019 Arrikto Inc. 
+ +include $(RTE_SDK)/mk/rte.vars.mk + +# library name +LIB = librte_virtio_vhost_user.a + +EXPORT_MAP := rte_virtio_vhost_user_version.map + +LIBABIVER := 1 + +CFLAGS += -DALLOW_EXPERIMENTAL_API +CFLAGS += $(WERROR_FLAGS) -O3 +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs +LDLIBS += -lrte_bus_pci + +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) +LDLIBS += -lrte_vhost +endif + +# all source are stored in SRCS-y +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := trans_virtio_vhost_user.c \ + virtio_pci.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map b/drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map new file mode 100644 index 0000000..4b2e621 --- /dev/null +++ b/drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map @@ -0,0 +1,4 @@ +DPDK_19.05 { + + local: *; +}; diff --git a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c new file mode 100644 index 0000000..72018a4 --- /dev/null +++ b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c @@ -0,0 +1,1067 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Red Hat, Inc. + * Copyright(c) 2019 Arrikto, Inc. + */ + +/* + * @file + * virtio-vhost-user PCI transport driver + * + * This vhost-user transport communicates with the vhost-user master process + * over the virtio-vhost-user PCI device. + * + * Interrupts are used since this is the control path, not the data path. This + * way the vhost-user command processing doesn't interfere with packet + * processing. This is similar to the AF_UNIX transport's fdman thread that + * processes socket I/O separately. + * + * This transport replaces the usual vhost-user file descriptor passing with a + * PCI BAR that contains doorbell registers for callfd and logfd, and shared + * memory for the memory table regions.
+ * + * VIRTIO device specification: + * https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007 + */ + +#include +#include +#include +#include + +#include "vhost.h" +#include "virtio_pci.h" +#include "virtqueue.h" +#include "virtio_vhost_user.h" +#include "vhost_user.h" + +/* + * Data structures: + * + * Successfully probed virtio-vhost-user PCI adapters are added to + * vvu_pci_device_list as struct vvu_pci_device elements. + * + * When rte_vhost_driver_register() is called, a struct vvu_socket is created + * as the endpoint for future vhost-user connections. The struct vvu_socket is + * associated with the struct vvu_pci_device that will be used for + * communication. + * + * When a vhost-user protocol connection is established, a struct + * vvu_connection is created and the application's new_device(int vid) callback + * is invoked. + */ + +/** Probed PCI devices for lookup by rte_vhost_driver_register() */ +TAILQ_HEAD(, vvu_pci_device) vvu_pci_device_list = + TAILQ_HEAD_INITIALIZER(vvu_pci_device_list); + +struct vvu_socket; +struct vvu_connection; + +/** A virtio-vhost-user PCI adapter */ +struct vvu_pci_device { + struct virtio_hw hw; + struct rte_pci_device *pci_dev; + struct vvu_socket *vvu_socket; + TAILQ_ENTRY(vvu_pci_device) next; +}; + +/** A vhost-user endpoint (aka per-path state) */ +struct vvu_socket { + struct vhost_user_socket socket; /* must be first field! */ + struct vvu_pci_device *pdev; + struct vvu_connection *conn; + + /** Doorbell registers */ + uint16_t *doorbells; + + /** This struct virtio_vhost_user_config field determines the number of + * doorbells available so we keep it saved. + */ + uint32_t max_vhost_queues; + + /** Receive buffers */ + const struct rte_memzone *rxbuf_mz; + + /** Transmit buffers. It is assumed that the device completes them + * in-order so a single wrapping index can be used to select the next + * free buffer. 
+ */ + const struct rte_memzone *txbuf_mz; + unsigned int txbuf_idx; +}; + +/** A vhost-user protocol session (aka per-vid state) */ +struct vvu_connection { + struct virtio_net device; /* must be first field! */ + struct vvu_socket *vvu_socket; +}; + +/** Virtio feature bits that we support */ +#define VVU_VIRTIO_FEATURES ((1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \ + (1ULL << VIRTIO_F_ANY_LAYOUT) | \ + (1ULL << VIRTIO_F_VERSION_1) | \ + (1ULL << VIRTIO_F_IOMMU_PLATFORM)) + +/** Virtqueue indices */ +enum { + VVU_VQ_RX, + VVU_VQ_TX, + VVU_VQ_MAX, +}; + +enum { + /** Receive buffer size, in bytes */ + VVU_RXBUF_SIZE = 1024, + + /** Transmit buffer size, in bytes */ + VVU_TXBUF_SIZE = 1024, +}; + +/** Look up a struct vvu_pci_device from a DomBDF string */ +static struct vvu_pci_device * +vvu_pci_by_name(const char *name) +{ + struct vvu_pci_device *pdev; + + TAILQ_FOREACH(pdev, &vvu_pci_device_list, next) { + if (!strcmp(pdev->pci_dev->device.name, name)) + return pdev; + } + return NULL; +} + +/** Start connection establishment */ +static void +vvu_connect(struct vvu_socket *vvu_socket) +{ + struct virtio_hw *hw = &vvu_socket->pdev->hw; + uint32_t status; + + virtio_pci_read_dev_config(hw, + offsetof(struct virtio_vhost_user_config, status), + &status, sizeof(status)); + status |= RTE_LE32(1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP); + virtio_pci_write_dev_config(hw, + offsetof(struct virtio_vhost_user_config, status), + &status, sizeof(status)); +} + +static void +vvu_disconnect(struct vvu_socket *vvu_socket) +{ + struct vhost_user_socket *vsocket = &vvu_socket->socket; + struct vvu_connection *conn = vvu_socket->conn; + struct virtio_hw *hw = &vvu_socket->pdev->hw; + uint32_t status; + + if (conn) { + if (vsocket->notify_ops->destroy_connection) + vsocket->notify_ops->destroy_connection(conn->device.vid); + + vhost_destroy_device(conn->device.vid); + } + + /* Make sure we're disconnected */ + virtio_pci_read_dev_config(hw, + offsetof(struct virtio_vhost_user_config, 
status), + &status, sizeof(status)); + status &= ~RTE_LE32(1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP); + virtio_pci_write_dev_config(hw, + offsetof(struct virtio_vhost_user_config, status), + &status, sizeof(status)); +} + +static void +vvu_reconnect(struct vvu_socket *vvu_socket) +{ + vvu_disconnect(vvu_socket); + vvu_connect(vvu_socket); +} + +static void vvu_process_rxq(struct vvu_socket *vvu_socket); + +static void +vvu_cleanup_device(struct virtio_net *dev, int destroy __rte_unused) +{ + struct vvu_connection *conn = + container_of(dev, struct vvu_connection, device); + struct vvu_socket *vvu_socket = conn->vvu_socket; + + vvu_socket->conn = NULL; + vvu_process_rxq(vvu_socket); /* discard old replies from master */ + vvu_reconnect(vvu_socket); +} + +static int +vvu_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq) +{ + struct vvu_connection *conn = + container_of(dev, struct vvu_connection, device); + struct vvu_socket *vvu_socket = conn->vvu_socket; + uint16_t vq_idx = vq->vring_idx; + + RTE_LOG(DEBUG, VHOST_CONFIG, "%s vq_idx %u\n", __func__, vq_idx); + + rte_write16(rte_cpu_to_le_16(vq_idx), &vvu_socket->doorbells[vq_idx]); + return 0; +} + +static int +vvu_send_reply(struct virtio_net *dev, struct VhostUserMsg *reply) +{ + struct vvu_connection *conn = + container_of(dev, struct vvu_connection, device); + struct vvu_socket *vvu_socket = conn->vvu_socket; + struct virtqueue *vq = vvu_socket->pdev->hw.vqs[VVU_VQ_TX]; + struct vring_desc *desc; + struct vq_desc_extra *descx; + unsigned int i; + void *buf; + size_t len; + + RTE_LOG(DEBUG, VHOST_CONFIG, + "%s request %u flags %#x size %u\n", + __func__, reply->request.master, + reply->flags, reply->size); + + /* TODO convert reply to little-endian */ + + if (virtqueue_full(vq)) { + RTE_LOG(ERR, VHOST_CONFIG, "Out of tx buffers\n"); + return -1; + } + + i = vvu_socket->txbuf_idx; + len = VHOST_USER_HDR_SIZE + reply->size; + buf = (uint8_t *)vvu_socket->txbuf_mz->addr + i * VVU_TXBUF_SIZE; + + 
memcpy(buf, reply, len); + + desc = &vq->vq_ring.desc[i]; + descx = &vq->vq_descx[i]; + + desc->addr = rte_cpu_to_le_64(vvu_socket->txbuf_mz->iova + i * VVU_TXBUF_SIZE); + desc->len = rte_cpu_to_le_32(len); + desc->flags = 0; + + descx->cookie = buf; + descx->ndescs = 1; + + vq->vq_free_cnt--; + vvu_socket->txbuf_idx = (vvu_socket->txbuf_idx + 1) & (vq->vq_nentries - 1); + + vq_update_avail_ring(vq, i); + vq_update_avail_idx(vq); + + if (virtqueue_kick_prepare(vq)) + virtqueue_notify(vq); + + return 0; +} + +static int +vvu_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg __rte_unused) +{ + struct vvu_connection *conn = + container_of(dev, struct vvu_connection, device); + struct vvu_socket *vvu_socket = conn->vvu_socket; + struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev; + uint8_t *mmap_addr; + uint32_t i; + + /* Memory regions start after the doorbell registers */ + mmap_addr = (uint8_t *)pci_dev->mem_resource[2].addr + + RTE_ALIGN_CEIL((vvu_socket->max_vhost_queues + 1 /* log fd */) * + sizeof(uint16_t), 4096); + + for (i = 0; i < dev->mem->nregions; i++) { + struct rte_vhost_mem_region *reg = &dev->mem->regions[i]; + + reg->mmap_addr = mmap_addr; + reg->host_user_addr = (uint64_t)(uintptr_t)reg->mmap_addr + + reg->mmap_size - reg->size; + + mmap_addr += reg->mmap_size; + + RTE_LOG(INFO, VHOST_CONFIG, + "guest memory region %u, size: 0x%" PRIx64 "\n" + "\t guest physical addr: 0x%" PRIx64 "\n" + "\t guest virtual addr: 0x%" PRIx64 "\n" + "\t host virtual addr: 0x%" PRIx64 "\n" + "\t mmap addr : 0x%" PRIx64 "\n" + "\t mmap size : 0x%" PRIx64 "\n" + "\t mmap off : 0x%" PRIx64 "\n", + i, reg->size, + reg->guest_phys_addr, + reg->guest_user_addr, + reg->host_user_addr, + (uint64_t)(uintptr_t)reg->mmap_addr, + reg->mmap_size, + reg->mmap_size - reg->size); + } + + return 0; +} + +static void +vvu_unmap_mem_regions(struct virtio_net *dev) +{ + uint32_t i; + + for (i = 0; i < dev->mem->nregions; i++) { + struct rte_vhost_mem_region *reg = 
&dev->mem->regions[i]; + + /* Just clear the pointers, the PCI BAR stays there */ + reg->mmap_addr = NULL; + reg->host_user_addr = 0; + } +} + +static void vvu_process_new_connection(struct vvu_socket *vvu_socket) +{ + struct vhost_user_socket *vsocket = &vvu_socket->socket; + struct vvu_connection *conn; + struct virtio_net *dev; + size_t size; + + dev = vhost_new_device(vsocket->trans_ops); + if (!dev) { + vvu_reconnect(vvu_socket); + return; + } + + conn = container_of(dev, struct vvu_connection, device); + conn->vvu_socket = vvu_socket; + + size = strnlen(vsocket->path, PATH_MAX); + vhost_set_ifname(dev->vid, vsocket->path, size); + + RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", dev->vid); + + if (vsocket->notify_ops->new_connection) { + int ret = vsocket->notify_ops->new_connection(dev->vid); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to add vhost user connection\n"); + vhost_destroy_device(dev->vid); + vvu_reconnect(vvu_socket); + return; + } + } + + vvu_socket->conn = conn; + return; +} + +static void vvu_process_status_change(struct vvu_socket *vvu_socket, bool slave_up, + bool master_up) +{ + RTE_LOG(DEBUG, VHOST_CONFIG, "%s slave_up %d master_up %d\n", + __func__, slave_up, master_up); + + /* Disconnected from the master, try reconnecting */ + if (!slave_up) { + vvu_reconnect(vvu_socket); + return; + } + + if (master_up && !vvu_socket->conn) { + vvu_process_new_connection(vvu_socket); + return; + } +} + +static void +vvu_process_txq(struct vvu_socket *vvu_socket) +{ + struct virtio_hw *hw = &vvu_socket->pdev->hw; + struct virtqueue *vq = hw->vqs[VVU_VQ_TX]; + uint16_t n = VIRTQUEUE_NUSED(vq); + + virtio_rmb(); + + /* Just mark the buffers complete */ + vq->vq_used_cons_idx += n; + vq->vq_free_cnt += n; +} + +static void +vvu_process_rxq(struct vvu_socket *vvu_socket) +{ + struct virtio_hw *hw = &vvu_socket->pdev->hw; + struct virtqueue *vq = hw->vqs[VVU_VQ_RX]; + bool refilled = false; + + while (VIRTQUEUE_NUSED(vq)) { + struct 
vring_used_elem *uep; + VhostUserMsg *msg; + uint32_t len; + uint32_t desc_idx; + uint16_t used_idx; + size_t i; + + virtio_rmb(); + + used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1)); + uep = &vq->vq_ring.used->ring[used_idx]; + desc_idx = rte_le_to_cpu_32(uep->id); + + msg = vq->vq_descx[desc_idx].cookie; + len = rte_le_to_cpu_32(uep->len); + + if (msg->size > sizeof(VhostUserMsg) || + len != VHOST_USER_HDR_SIZE + msg->size) { + RTE_LOG(ERR, VHOST_CONFIG, + "Invalid vhost-user message size %u, got %u bytes\n", + msg->size, len); + /* TODO reconnect */ + abort(); + } + + RTE_LOG(DEBUG, VHOST_CONFIG, + "%s request %u flags %#x size %u\n", + __func__, msg->request.master, + msg->flags, msg->size); + + /* Mark file descriptors invalid */ + for (i = 0; i < RTE_DIM(msg->fds); i++) + msg->fds[i] = VIRTIO_INVALID_EVENTFD; + + /* Only process messages while connected */ + if (vvu_socket->conn) { + if (vhost_user_msg_handler(vvu_socket->conn->device.vid, + msg) < 0) { + /* TODO reconnect */ + abort(); + } + } + + vq->vq_used_cons_idx++; + + /* Refill rxq */ + vq_update_avail_ring(vq, desc_idx); + vq_update_avail_idx(vq); + refilled = true; + } + + if (!refilled) + return; + if (virtqueue_kick_prepare(vq)) + virtqueue_notify(vq); +} + +/* TODO Audit thread safety. There are 3 threads involved: + * 1. The main process thread that calls librte_vhost APIs during startup. + * 2. The interrupt thread that calls vvu_interrupt_handler(). + * 3. Packet processing threads (lcores) calling librte_vhost APIs. + * + * It may be necessary to use locks if any of these code paths can race. The + * librte_vhost API entry points already do some locking but this needs to be + * checked. 
+ */ +static void +vvu_interrupt_handler(void *cb_arg) +{ + struct vvu_socket *vvu_socket = cb_arg; + struct virtio_hw *hw = &vvu_socket->pdev->hw; + struct rte_intr_handle *intr_handle = &vvu_socket->pdev->pci_dev->intr_handle; + uint8_t isr; + + /* Read Interrupt Status Register (which also clears it) */ + isr = VTPCI_OPS(hw)->get_isr(hw); + + if (isr & VIRTIO_PCI_ISR_CONFIG) { + uint32_t status; + bool slave_up; + bool master_up; + + virtio_pci_read_dev_config(hw, + offsetof(struct virtio_vhost_user_config, status), + &status, sizeof(status)); + status = rte_le_to_cpu_32(status); + + RTE_LOG(DEBUG, VHOST_CONFIG, "%s isr %#x status %#x\n", __func__, isr, status); + + slave_up = status & (1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP); + master_up = status & (1u << VIRTIO_VHOST_USER_STATUS_MASTER_UP); + vvu_process_status_change(vvu_socket, slave_up, master_up); + } else + RTE_LOG(DEBUG, VHOST_CONFIG, "%s isr %#x\n", __func__, isr); + + /* Re-arm before processing virtqueues so no interrupts are lost */ + rte_intr_enable(intr_handle); + + vvu_process_txq(vvu_socket); + vvu_process_rxq(vvu_socket); +} + +static int +vvu_virtio_pci_init_rxq(struct vvu_socket *vvu_socket) +{ + char name[sizeof("0000:00:00.00 vq 0 rxbufs")]; + struct virtqueue *vq; + size_t size; + size_t align; + int i; + + vq = vvu_socket->pdev->hw.vqs[VVU_VQ_RX]; + + snprintf(name, sizeof(name), "%s vq %u rxbufs", + vvu_socket->pdev->pci_dev->device.name, VVU_VQ_RX); + + /* Allocate more than sizeof(VhostUserMsg) so there is room to grow */ + size = vq->vq_nentries * VVU_RXBUF_SIZE; + align = 1024; + vvu_socket->rxbuf_mz = rte_memzone_reserve_aligned(name, size, SOCKET_ID_ANY, + 0, align); + if (!vvu_socket->rxbuf_mz) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to allocate rxbuf memzone\n"); + return -1; + } + + for (i = 0; i < vq->vq_nentries; i++) { + struct vring_desc *desc = &vq->vq_ring.desc[i]; + struct vq_desc_extra *descx = &vq->vq_descx[i]; + + desc->addr = 
rte_cpu_to_le_64(vvu_socket->rxbuf_mz->iova + + i * VVU_RXBUF_SIZE); + desc->len = RTE_LE32(VVU_RXBUF_SIZE); + desc->flags = RTE_LE16(VRING_DESC_F_WRITE); + + descx->cookie = (uint8_t *)vvu_socket->rxbuf_mz->addr + i * VVU_RXBUF_SIZE; + descx->ndescs = 1; + + vq_update_avail_ring(vq, i); + vq->vq_free_cnt--; + } + + vq_update_avail_idx(vq); + virtqueue_notify(vq); + return 0; +} + +static int +vvu_virtio_pci_init_txq(struct vvu_socket *vvu_socket) +{ + char name[sizeof("0000:00:00.00 vq 0 txbufs")]; + struct virtqueue *vq; + size_t size; + size_t align; + + vq = vvu_socket->pdev->hw.vqs[VVU_VQ_TX]; + + snprintf(name, sizeof(name), "%s vq %u txbufs", + vvu_socket->pdev->pci_dev->device.name, VVU_VQ_TX); + + /* Allocate more than sizeof(VhostUserMsg) so there is room to grow */ + size = vq->vq_nentries * VVU_TXBUF_SIZE; + align = 1024; + vvu_socket->txbuf_mz = rte_memzone_reserve_aligned(name, size, SOCKET_ID_ANY, + 0, align); + if (!vvu_socket->txbuf_mz) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to allocate txbuf memzone\n"); + return -1; + } + + vvu_socket->txbuf_idx = 0; + return 0; +} + +static void +virtio_init_vring(struct virtqueue *vq) +{ + int size = vq->vq_nentries; + struct vring *vr = &vq->vq_ring; + uint8_t *ring_mem = vq->vq_ring_virt_mem; + + memset(ring_mem, 0, vq->vq_ring_size); + vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN); + vq->vq_used_cons_idx = 0; + vq->vq_desc_head_idx = 0; + vq->vq_avail_idx = 0; + vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); + vq->vq_free_cnt = vq->vq_nentries; + memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); + + vring_desc_init(vr->desc, size); + virtqueue_enable_intr(vq); +} + +static int +vvu_virtio_pci_init_vq(struct vvu_socket *vvu_socket, int vq_idx) +{ + char vq_name[sizeof("0000:00:00.00 vq 0")]; + struct virtio_hw *hw = &vvu_socket->pdev->hw; + const struct rte_memzone *mz; + struct virtqueue *vq; + uint16_t q_num; + size_t size; + + q_num = 
VTPCI_OPS(hw)->get_queue_num(hw, vq_idx); + RTE_LOG(DEBUG, VHOST_CONFIG, "vq %d q_num: %u\n", vq_idx, q_num); + if (q_num == 0) { + RTE_LOG(ERR, VHOST_CONFIG, "virtqueue %d does not exist\n", + vq_idx); + return -1; + } + + if (!rte_is_power_of_2(q_num)) { + RTE_LOG(ERR, VHOST_CONFIG, + "virtqueue %d has non-power of 2 size (%u)\n", + vq_idx, q_num); + return -1; + } + + snprintf(vq_name, sizeof(vq_name), "%s vq %u", + vvu_socket->pdev->pci_dev->device.name, vq_idx); + + size = RTE_ALIGN_CEIL(sizeof(*vq) + + q_num * sizeof(struct vq_desc_extra), + RTE_CACHE_LINE_SIZE); + vq = rte_zmalloc(vq_name, size, RTE_CACHE_LINE_SIZE); + if (!vq) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to allocate virtqueue %d\n", vq_idx); + return -1; + } + hw->vqs[vq_idx] = vq; + + vq->hw = hw; + vq->vq_queue_index = vq_idx; + vq->vq_nentries = q_num; + + size = vring_size(q_num, VIRTIO_PCI_VRING_ALIGN); + vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN); + + mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size, + SOCKET_ID_ANY, 0, + VIRTIO_PCI_VRING_ALIGN); + if (mz == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to reserve memzone for virtqueue %d\n", + vq_idx); + goto err_vq; + } + + memset(mz->addr, 0, mz->len); + + vq->mz = mz; + vq->vq_ring_mem = mz->iova; + vq->vq_ring_virt_mem = mz->addr; + virtio_init_vring(vq); + + if (VTPCI_OPS(hw)->setup_queue(hw, vq) < 0) + goto err_mz; + + return 0; + +err_mz: + rte_memzone_free(mz); + +err_vq: + hw->vqs[vq_idx] = NULL; + rte_free(vq); + return -1; +} + +static void +vvu_virtio_pci_free_virtqueues(struct vvu_socket *vvu_socket) +{ + struct virtio_hw *hw = &vvu_socket->pdev->hw; + int i; + + if (vvu_socket->rxbuf_mz) { + rte_memzone_free(vvu_socket->rxbuf_mz); + vvu_socket->rxbuf_mz = NULL; + } + if (vvu_socket->txbuf_mz) { + rte_memzone_free(vvu_socket->txbuf_mz); + vvu_socket->txbuf_mz = NULL; + } + + for (i = 0; i < VVU_VQ_MAX; i++) { + struct virtqueue *vq = hw->vqs[i]; + + if (!vq) + continue; + + 
rte_memzone_free(vq->mz); + rte_free(vq); + hw->vqs[i] = NULL; + } + + rte_free(hw->vqs); + hw->vqs = NULL; +} + +static void +vvu_virtio_pci_intr_cleanup(struct vvu_socket *vvu_socket) +{ + struct virtio_hw *hw = &vvu_socket->pdev->hw; + struct rte_intr_handle *intr_handle = &vvu_socket->pdev->pci_dev->intr_handle; + int i; + + for (i = 0; i < VVU_VQ_MAX; i++) + VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i], + VIRTIO_MSI_NO_VECTOR); + VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR); + rte_intr_disable(intr_handle); + rte_intr_callback_unregister(intr_handle, vvu_interrupt_handler, vvu_socket); + rte_intr_efd_disable(intr_handle); +} + +static int +vvu_virtio_pci_init_intr(struct vvu_socket *vvu_socket) +{ + struct virtio_hw *hw = &vvu_socket->pdev->hw; + struct rte_intr_handle *intr_handle = &vvu_socket->pdev->pci_dev->intr_handle; + int i; + + if (!rte_intr_cap_multiple(intr_handle)) { + RTE_LOG(ERR, VHOST_CONFIG, + "Multiple intr vector not supported\n"); + return -1; + } + + if (rte_intr_efd_enable(intr_handle, VVU_VQ_MAX) < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to create eventfds\n"); + return -1; + } + + if (rte_intr_callback_register(intr_handle, vvu_interrupt_handler, vvu_socket) < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to register interrupt callback\n"); + goto err_efd; + } + + if (rte_intr_enable(intr_handle) < 0) + goto err_callback; + + if (VTPCI_OPS(hw)->set_config_irq(hw, 0) == VIRTIO_MSI_NO_VECTOR) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to set config MSI-X vector\n"); + goto err_enable; + } + + /* TODO use separate vectors and interrupt handler functions. It seems + * doesn't allow efds to have interrupt_handler + * functions and it just clears efds when they are raised. As a + * workaround we use the configuration change interrupt for virtqueue + * interrupts! 
+ */ + for (i = 0; i < VVU_VQ_MAX; i++) { + if (VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i], 0) == + VIRTIO_MSI_NO_VECTOR) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to set virtqueue MSI-X vector\n"); + goto err_vq; + } + } + + return 0; + +err_vq: + for (i = 0; i < VVU_VQ_MAX; i++) + VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i], + VIRTIO_MSI_NO_VECTOR); + VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR); +err_enable: + rte_intr_disable(intr_handle); +err_callback: + rte_intr_callback_unregister(intr_handle, vvu_interrupt_handler, vvu_socket); +err_efd: + rte_intr_efd_disable(intr_handle); + return -1; +} + +static int +vvu_virtio_pci_init_bar(struct vvu_socket *vvu_socket) +{ + struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev; + struct virtio_net *dev = NULL; /* just for sizeof() */ + + vvu_socket->doorbells = pci_dev->mem_resource[2].addr; + if (!vvu_socket->doorbells) { + RTE_LOG(ERR, VHOST_CONFIG, "BAR 2 not available\n"); + return -1; + } + + /* The number of doorbells is max_vhost_queues + 1 */ + virtio_pci_read_dev_config(&vvu_socket->pdev->hw, + offsetof(struct virtio_vhost_user_config, + max_vhost_queues), + &vvu_socket->max_vhost_queues, + sizeof(vvu_socket->max_vhost_queues)); + vvu_socket->max_vhost_queues = rte_le_to_cpu_32(vvu_socket->max_vhost_queues); + if (vvu_socket->max_vhost_queues < RTE_DIM(dev->virtqueue)) { + /* We could support devices with a smaller max number of + * virtqueues than dev->virtqueue[] in the future. Fail early + * for now since the current assumption is that all of + * dev->virtqueue[] can be used. 
+ */ + RTE_LOG(ERR, VHOST_CONFIG, + "Device supports fewer virtqueues than driver!\n"); + return -1; + } + + return 0; +} + +static int +vvu_virtio_pci_init(struct vvu_socket *vvu_socket) +{ + uint64_t host_features; + struct virtio_hw *hw = &vvu_socket->pdev->hw; + int i; + + virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK); + virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER); + + hw->guest_features = VVU_VIRTIO_FEATURES; + host_features = VTPCI_OPS(hw)->get_features(hw); + hw->guest_features = virtio_pci_negotiate_features(hw, host_features); + + if (!virtio_pci_with_feature(hw, VIRTIO_F_VERSION_1)) { + RTE_LOG(ERR, VHOST_CONFIG, "Missing VIRTIO 1 feature bit\n"); + goto err; + } + + virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK); + if (!(virtio_pci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) { + RTE_LOG(ERR, VHOST_CONFIG, "Failed to set FEATURES_OK\n"); + goto err; + } + + if (vvu_virtio_pci_init_bar(vvu_socket) < 0) + goto err; + + hw->vqs = rte_zmalloc(NULL, sizeof(struct virtqueue *) * VVU_VQ_MAX, 0); + if (!hw->vqs) + goto err; + + for (i = 0; i < VVU_VQ_MAX; i++) { + if (vvu_virtio_pci_init_vq(vvu_socket, i) < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "virtqueue %u init failed\n", i); + goto err_init_vq; + } + } + + if (vvu_virtio_pci_init_rxq(vvu_socket) < 0) + goto err_init_vq; + + if (vvu_virtio_pci_init_txq(vvu_socket) < 0) + goto err_init_vq; + + if (vvu_virtio_pci_init_intr(vvu_socket) < 0) + goto err_init_vq; + + virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK); + + return 0; + +err_init_vq: + vvu_virtio_pci_free_virtqueues(vvu_socket); + +err: + virtio_pci_reset(hw); + RTE_LOG(DEBUG, VHOST_CONFIG, "%s failed\n", __func__); + return -1; +} + +static int +vvu_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, + struct rte_pci_device *pci_dev) +{ + struct vvu_pci_device *pdev; + + /* TODO support multi-process applications */ + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + RTE_LOG(ERR, VHOST_CONFIG, + 
"virtio-vhost-user does not support multi-process " + "applications\n"); + return -1; + } + + pdev = rte_zmalloc_socket(pci_dev->device.name, sizeof(*pdev), + RTE_CACHE_LINE_SIZE, + pci_dev->device.numa_node); + if (!pdev) + return -1; + + pdev->pci_dev = pci_dev; + + if (virtio_pci_init(pci_dev, &pdev->hw) != 0) { + rte_free(pdev); + return -1; + } + + /* Reset the device now, the rest is done in vvu_socket_init() */ + virtio_pci_reset(&pdev->hw); + + if (pdev->hw.use_msix == VIRTIO_MSIX_NONE) { + RTE_LOG(ERR, VHOST_CONFIG, + "MSI-X is required for PCI device at %s\n", + pci_dev->device.name); + rte_free(pdev); + rte_pci_unmap_device(pci_dev); + return -1; + } + + TAILQ_INSERT_TAIL(&vvu_pci_device_list, pdev, next); + + RTE_LOG(INFO, VHOST_CONFIG, + "Added virtio-vhost-user device at %s\n", + pci_dev->device.name); + + return 0; +} + +static int +vvu_pci_remove(struct rte_pci_device *pci_dev) +{ + struct vvu_pci_device *pdev; + + TAILQ_FOREACH(pdev, &vvu_pci_device_list, next) + if (pdev->pci_dev == pci_dev) + break; + if (!pdev) + return -1; + + if (pdev->vvu_socket) { + RTE_LOG(ERR, VHOST_CONFIG, + "Cannot remove PCI device at %s with vhost still attached\n", + pci_dev->device.name); + return -1; + } + + TAILQ_REMOVE(&vvu_pci_device_list, pdev, next); + rte_free(pdev); + rte_pci_unmap_device(pci_dev); + return 0; +} + +static const struct rte_pci_id pci_id_vvu_map[] = { + { RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, + VIRTIO_PCI_LEGACY_DEVICEID_VHOST_USER) }, + { RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, + VIRTIO_PCI_MODERN_DEVICEID_VHOST_USER) }, + { .vendor_id = 0, /* sentinel */ }, +}; + +static struct rte_pci_driver vvu_pci_driver = { + .driver = { + .name = "virtio_vhost_user", + }, + .id_table = pci_id_vvu_map, + .drv_flags = 0, + .probe = vvu_pci_probe, + .remove = vvu_pci_remove, +}; + +RTE_INIT(vvu_pci_init); +static void +vvu_pci_init(void) +{ + if (rte_eal_iopl_init() != 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "IOPL call failed - cannot use virtio-vhost-user\n"); + 
return; + } + + rte_pci_register(&vvu_pci_driver); +} + +static int +vvu_socket_init(struct vhost_user_socket *vsocket, uint64_t flags) +{ + struct vvu_socket *vvu_socket = + container_of(vsocket, struct vvu_socket, socket); + struct vvu_pci_device *pdev; + + if (flags & RTE_VHOST_USER_NO_RECONNECT) { + RTE_LOG(ERR, VHOST_CONFIG, + "error: reconnect cannot be disabled for virtio-vhost-user\n"); + return -1; + } + if (flags & RTE_VHOST_USER_CLIENT) { + RTE_LOG(ERR, VHOST_CONFIG, + "error: virtio-vhost-user does not support client mode\n"); + return -1; + } + if (flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY) { + RTE_LOG(ERR, VHOST_CONFIG, + "error: virtio-vhost-user does not support dequeue-zero-copy\n"); + return -1; + } + + pdev = vvu_pci_by_name(vsocket->path); + if (!pdev) { + RTE_LOG(ERR, VHOST_CONFIG, + "Cannot find virtio-vhost-user PCI device at %s\n", + vsocket->path); + return -1; + } + + if (pdev->vvu_socket) { + RTE_LOG(ERR, VHOST_CONFIG, + "Device at %s is already in use\n", + vsocket->path); + return -1; + } + + vvu_socket->pdev = pdev; + pdev->vvu_socket = vvu_socket; + + if (vvu_virtio_pci_init(vvu_socket) < 0) { + vvu_socket->pdev = NULL; + pdev->vvu_socket = NULL; + return -1; + } + + RTE_LOG(INFO, VHOST_CONFIG, "%s at %s\n", __func__, vsocket->path); + return 0; +} + +static void +vvu_socket_cleanup(struct vhost_user_socket *vsocket) +{ + struct vvu_socket *vvu_socket = + container_of(vsocket, struct vvu_socket, socket); + + if (vvu_socket->conn) + vhost_destroy_device(vvu_socket->conn->device.vid); + + vvu_virtio_pci_intr_cleanup(vvu_socket); + virtio_pci_reset(&vvu_socket->pdev->hw); + vvu_virtio_pci_free_virtqueues(vvu_socket); + + vvu_socket->pdev->vvu_socket = NULL; + vvu_socket->pdev = NULL; +} + +static int +vvu_socket_start(struct vhost_user_socket *vsocket) +{ + struct vvu_socket *vvu_socket = + container_of(vsocket, struct vvu_socket, socket); + + vvu_connect(vvu_socket); + return 0; +} + +const struct vhost_transport_ops 
virtio_vhost_user_trans_ops = { + .socket_size = sizeof(struct vvu_socket), + .device_size = sizeof(struct vvu_connection), + .socket_init = vvu_socket_init, + .socket_cleanup = vvu_socket_cleanup, + .socket_start = vvu_socket_start, + .cleanup_device = vvu_cleanup_device, + .vring_call = vvu_vring_call, + .send_reply = vvu_send_reply, + .map_mem_regions = vvu_map_mem_regions, + .unmap_mem_regions = vvu_unmap_mem_regions, +}; diff --git a/drivers/virtio_vhost_user/virtio_vhost_user.h b/drivers/virtio_vhost_user/virtio_vhost_user.h new file mode 100644 index 0000000..baeaa74 --- /dev/null +++ b/drivers/virtio_vhost_user/virtio_vhost_user.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (C) 2018 Red Hat, Inc. + */ + +#ifndef _LINUX_VIRTIO_VHOST_USER_H +#define _LINUX_VIRTIO_VHOST_USER_H + +#include + +struct virtio_vhost_user_config { + uint32_t status; +#define VIRTIO_VHOST_USER_STATUS_SLAVE_UP 0 +#define VIRTIO_VHOST_USER_STATUS_MASTER_UP 1 + uint32_t max_vhost_queues; + uint8_t uuid[16]; +}; + +#endif /* _LINUX_VIRTIO_VHOST_USER_H */ From patchwork Wed Jun 19 15:14:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54987 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4BC6B1C42E; Wed, 19 Jun 2019 17:16:40 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 9ACD51C3A0 for ; Wed, 19 Jun 2019 17:15:46 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 69350182019; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id D58742B2; Wed, 19 Jun 2019 18:15:45 +0300 (EEST) 
From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:46 +0300 Message-Id: <1560957293-17294-22-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 21/28] drivers/virtio_vhost_user: use additional device resources X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Enhance the virtio-vhost-user device driver so that it utilizes the device's additional resource capabilities. Specifically, this patch adds support for the doorbells and shared_memory capabilities. The former is used to find the location of the device doorbells. The latter is used to find the location of the vhost memory regions in the device's memory address space. Also, support has been added for the notification capability, though this configuration structure is not currently being used by the virtio-vhost-user driver due to DPDK's poll-mode nature. 
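The doorbell lookup this patch introduces in vvu_vring_call() amounts to scaling the virtqueue index by the multiplier read from the VIRTIO_PCI_CAP_DOORBELL_CFG capability. The following is a minimal standalone sketch of that address computation; the struct and function names are illustrative, not the driver's:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the two fields the driver reads from the
 * doorbell capability: the mapped base address and the byte stride
 * between per-virtqueue doorbell registers. */
struct doorbell_cap {
	uint16_t *base;          /* corresponds to hw->doorbell_base */
	uint32_t off_multiplier; /* corresponds to hw->doorbell_off_multiplier */
};

/* Compute the MMIO address of the doorbell for virtqueue vq_idx.
 * The offset is in bytes, hence the cast through uint8_t *. */
static inline uint16_t *
doorbell_addr(const struct doorbell_cap *cap, uint16_t vq_idx)
{
	return (uint16_t *)((uint8_t *)cap->base +
			    (size_t)vq_idx * cap->off_multiplier);
}
```

In the patch itself, the result of this computation is what gets passed to rte_write16() to kick the device.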
Signed-off-by: Nikos Dragazis --- .../virtio_vhost_user/trans_virtio_vhost_user.c | 22 ++++++++++++++-------- drivers/virtio_vhost_user/virtio_pci.c | 16 ++++++++++++++++ drivers/virtio_vhost_user/virtio_pci.h | 19 +++++++++++++++++++ 3 files changed, 49 insertions(+), 8 deletions(-) diff --git a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c index 72018a4..45863bd 100644 --- a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c +++ b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c @@ -198,11 +198,14 @@ vvu_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq) struct vvu_connection *conn = container_of(dev, struct vvu_connection, device); struct vvu_socket *vvu_socket = conn->vvu_socket; + struct virtio_hw *hw = &vvu_socket->pdev->hw; uint16_t vq_idx = vq->vring_idx; + uint16_t *notify_addr = (void *)((uint8_t *)vvu_socket->doorbells + + vq_idx * hw->doorbell_off_multiplier); RTE_LOG(DEBUG, VHOST_CONFIG, "%s vq_idx %u\n", __func__, vq_idx); - rte_write16(rte_cpu_to_le_16(vq_idx), &vvu_socket->doorbells[vq_idx]); + rte_write16(rte_cpu_to_le_16(vq_idx), notify_addr); return 0; } @@ -265,14 +268,14 @@ vvu_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg __rte_unuse struct vvu_connection *conn = container_of(dev, struct vvu_connection, device); struct vvu_socket *vvu_socket = conn->vvu_socket; - struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev; + struct virtio_hw *hw = &vvu_socket->pdev->hw; uint8_t *mmap_addr; uint32_t i; - /* Memory regions start after the doorbell registers */ - mmap_addr = (uint8_t *)pci_dev->mem_resource[2].addr + - RTE_ALIGN_CEIL((vvu_socket->max_vhost_queues + 1 /* log fd */) * - sizeof(uint16_t), 4096); + /* Get the starting address of vhost memory regions from + * the shared memory virtio PCI capability + */ + mmap_addr = hw->shared_memory_cfg; for (i = 0; i < dev->mem->nregions; i++) { struct rte_vhost_mem_region *reg = &dev->mem->regions[i]; @@ 
-780,10 +783,13 @@ vvu_virtio_pci_init_intr(struct vvu_socket *vvu_socket) static int vvu_virtio_pci_init_bar(struct vvu_socket *vvu_socket) { - struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev; + struct virtio_hw *hw = &vvu_socket->pdev->hw; struct virtio_net *dev = NULL; /* just for sizeof() */ - vvu_socket->doorbells = pci_dev->mem_resource[2].addr; + /* Get the starting address of the doorbells from + * the doorbell virtio PCI capability + */ + vvu_socket->doorbells = hw->doorbell_base; if (!vvu_socket->doorbells) { RTE_LOG(ERR, VHOST_CONFIG, "BAR 2 not available\n"); return -1; diff --git a/drivers/virtio_vhost_user/virtio_pci.c b/drivers/virtio_vhost_user/virtio_pci.c index 9c2c981..7996729 100644 --- a/drivers/virtio_vhost_user/virtio_pci.c +++ b/drivers/virtio_vhost_user/virtio_pci.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arrikto Inc. */ #include @@ -407,6 +408,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) case VIRTIO_PCI_CAP_ISR_CFG: hw->isr = get_cfg_addr(dev, &cap); break; + case VIRTIO_PCI_CAP_DOORBELL_CFG: + rte_pci_read_config(dev, &hw->doorbell_off_multiplier, + 4, pos + sizeof(cap)); + hw->doorbell_base = get_cfg_addr(dev, &cap); + rte_pci_read_config(dev, &hw->doorbell_length, 4, pos + 10); + break; + case VIRTIO_PCI_CAP_NOTIFICATION_CFG: + hw->notify_cfg = get_cfg_addr(dev, &cap); + break; + case VIRTIO_PCI_CAP_SHARED_MEMORY_CFG: + hw->shared_memory_cfg = get_cfg_addr(dev, &cap); + break; } next: @@ -426,6 +439,9 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "isr cfg mapped at: %p\n", hw->isr); RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "notify base: %p, notify off multiplier: %u\n", hw->notify_base, hw->notify_off_multiplier); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "doorbell base: %p, doorbell off multiplier: %u\n", hw->doorbell_base, hw->doorbell_off_multiplier); + RTE_LOG(DEBUG, 
VIRTIO_PCI_CONFIG, "notification cfg mapped at: %p\n", hw->notify_cfg); + RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "shared memory region mapped at: %p\n", hw->shared_memory_cfg); return 0; } diff --git a/drivers/virtio_vhost_user/virtio_pci.h b/drivers/virtio_vhost_user/virtio_pci.h index 018e0b7..12373d1 100644 --- a/drivers/virtio_vhost_user/virtio_pci.h +++ b/drivers/virtio_vhost_user/virtio_pci.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arrikto Inc. */ /* XXX This file is based on drivers/net/virtio/virtio_pci.h. It would be @@ -117,6 +118,10 @@ struct virtqueue; #define VIRTIO_PCI_CAP_DEVICE_CFG 4 /* PCI configuration access */ #define VIRTIO_PCI_CAP_PCI_CFG 5 +/* Additional capabilities for the virtio-vhost-user device */ +#define VIRTIO_PCI_CAP_DOORBELL_CFG 6 +#define VIRTIO_PCI_CAP_NOTIFICATION_CFG 7 +#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8 /* This is the PCI capability header: */ struct virtio_pci_cap { @@ -161,6 +166,12 @@ struct virtio_pci_common_cfg { uint32_t queue_used_hi; /* read-write */ }; +/* Fields in VIRTIO_PCI_CAP_NOTIFICATION_CFG */ +struct virtio_pci_notification_cfg { + uint16_t notification_select; /* read-write */ + uint16_t notification_msix_vector; /* read-write */ +}; + struct virtio_hw; struct virtio_pci_ops { @@ -200,6 +211,14 @@ struct virtio_hw { uint16_t *notify_base; struct virtio_pci_common_cfg *common_cfg; void *dev_cfg; + /* virtio-vhost-user additional device resource capabilities + * https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007 + */ + uint32_t doorbell_off_multiplier; + uint16_t *doorbell_base; + uint32_t doorbell_length; + struct virtio_pci_notification_cfg *notify_cfg; + uint8_t *shared_memory_cfg; /* * App management thread and virtio interrupt handler thread * both can change device state, this lock is meant to avoid From patchwork Wed Jun 19 15:14:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54988 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CA6411C433; Wed, 19 Jun 2019 17:16:41 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id E6A831C3A1 for ; Wed, 19 Jun 2019 17:15:46 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id B640818201A; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 38033394; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:47 +0300 Message-Id: <1560957293-17294-23-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 22/28] vhost: add flag for choosing vhost-user transport X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Extend the API to support the virtio-vhost-user transport as an alternative to the AF_UNIX transport. 
The caller provides a PCI DomBDF address: rte_vhost_driver_register("0000:00:04.0", RTE_VHOST_USER_VIRTIO_TRANSPORT); Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- drivers/virtio_vhost_user/trans_virtio_vhost_user.c | 4 ++++ lib/librte_vhost/rte_vhost.h | 1 + lib/librte_vhost/socket.c | 19 ++++++++++++++++++- lib/librte_vhost/vhost.h | 6 +++++- 4 files changed, 28 insertions(+), 2 deletions(-) diff --git a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c index 45863bd..04dbbb1 100644 --- a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c +++ b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c @@ -979,6 +979,10 @@ vvu_pci_init(void) } rte_pci_register(&vvu_pci_driver); + if (rte_vhost_register_transport(VHOST_TRANSPORT_VVU, &virtio_vhost_user_trans_ops) < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Registration of vhost-user transport (%d) failed\n", VHOST_TRANSPORT_VVU); + } } static int diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index 0226b3e..0573238 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -29,6 +29,7 @@ extern "C" { #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2) #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3) #define RTE_VHOST_USER_POSTCOPY_SUPPORT (1ULL << 4) +#define RTE_VHOST_USER_VIRTIO_TRANSPORT (1ULL << 5) /** Protocol features. 
*/ #ifndef VHOST_USER_PROTOCOL_F_MQ diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index fe1c78d..1295fdd 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -327,7 +327,16 @@ rte_vhost_driver_register(const char *path, uint64_t flags) goto out; } - trans_ops = g_transport_map[VHOST_TRANSPORT_UNIX]; + if (flags & RTE_VHOST_USER_VIRTIO_TRANSPORT) { + trans_ops = g_transport_map[VHOST_TRANSPORT_VVU]; + if (trans_ops == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, + "virtio-vhost-user transport is not supported\n"); + goto out; + } + } else { + trans_ops = g_transport_map[VHOST_TRANSPORT_UNIX]; + } if (!path) return -1; @@ -400,6 +409,14 @@ rte_vhost_driver_register(const char *path, uint64_t flags) "Postcopy requested but not compiled\n"); ret = -1; goto out_free; +#else + if (flags & RTE_VHOST_USER_VIRTIO_TRANSPORT) { + RTE_LOG(ERR, VHOST_CONFIG, + "Postcopy requested but not supported " + "by the virtio-vhost-user transport\n"); + ret = -1; + goto out_free; + } #endif } diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 2e7eabe..a6131da 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -494,9 +494,13 @@ struct vhost_transport_ops { /** The traditional AF_UNIX vhost-user protocol transport. */ extern const struct vhost_transport_ops af_unix_trans_ops; +/** The virtio-vhost-user PCI vhost-user protocol transport. */ +extern const struct vhost_transport_ops virtio_vhost_user_trans_ops; + typedef enum VhostUserTransport { VHOST_TRANSPORT_UNIX = 0, - VHOST_TRANSPORT_MAX = 1 + VHOST_TRANSPORT_VVU = 1, + VHOST_TRANSPORT_MAX = 2 } VhostUserTransport; /* A list with all the available vhost-user transports. 
*/ From patchwork Wed Jun 19 15:14:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54989 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 5F7AB1C438; Wed, 19 Jun 2019 17:16:43 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 180A61C392 for ; Wed, 19 Jun 2019 17:15:47 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id E9EE018201B; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 8410532C; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:48 +0300 Message-Id: <1560957293-17294-24-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 23/28] net/vhost: add virtio-vhost-user support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The new virtio-transport=0|1 argument enables virtio-vhost-user support: testpmd ... 
--pci-whitelist 0000:00:04.0 \ --vdev vhost,iface=0000:00:04.0,virtio-transport=1 Signed-off-by: Nikos Dragazis Signed-off-by: Stefan Hajnoczi --- drivers/net/vhost/rte_eth_vhost.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index 0b61e37..c0d087f 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -31,6 +31,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; #define ETH_VHOST_DEQUEUE_ZERO_COPY "dequeue-zero-copy" #define ETH_VHOST_IOMMU_SUPPORT "iommu-support" #define ETH_VHOST_POSTCOPY_SUPPORT "postcopy-support" +#define ETH_VHOST_VIRTIO_TRANSPORT "virtio-transport" #define VHOST_MAX_PKT_BURST 32 static const char *valid_arguments[] = { @@ -40,6 +41,7 @@ static const char *valid_arguments[] = { ETH_VHOST_DEQUEUE_ZERO_COPY, ETH_VHOST_IOMMU_SUPPORT, ETH_VHOST_POSTCOPY_SUPPORT, + ETH_VHOST_VIRTIO_TRANSPORT, NULL }; @@ -1341,6 +1343,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) int dequeue_zero_copy = 0; int iommu_support = 0; int postcopy_support = 0; + uint16_t virtio_transport = 0; struct rte_eth_dev *eth_dev; const char *name = rte_vdev_device_name(dev); @@ -1422,6 +1425,16 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT; } + if (rte_kvargs_count(kvlist, ETH_VHOST_VIRTIO_TRANSPORT) == 1) { + ret = rte_kvargs_process(kvlist, ETH_VHOST_VIRTIO_TRANSPORT, + &open_int, &virtio_transport); + if (ret < 0) + goto out_free; + + if (virtio_transport) + flags |= RTE_VHOST_USER_VIRTIO_TRANSPORT; + } + if (dev->device.numa_node == SOCKET_ID_ANY) dev->device.numa_node = rte_socket_id(); From patchwork Wed Jun 19 15:14:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54990 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org 
Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CBD861C43E; Wed, 19 Jun 2019 17:16:44 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id 5154C1C3A0 for ; Wed, 19 Jun 2019 17:15:47 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP id 21CD018201C; Wed, 19 Jun 2019 18:15:47 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id BD9052B2; Wed, 19 Jun 2019 18:15:46 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:49 +0300 Message-Id: <1560957293-17294-25-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 24/28] examples/vhost_scsi: add --socket-file argument X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Stefan Hajnoczi The default filename built into examples/vhost_scsi may not be convenient. Allow the user to specify the full UNIX domain socket path on the command-line. 
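The command-line handling added by this patch boils down to a getopt_long() loop with a single long option. Below is a simplified, self-contained sketch of that --socket-file parsing; it mirrors the shape of the patch but is not the patch code itself (the buffer size and function name are assumptions):

```c
#include <getopt.h>
#include <stdio.h>
#include <string.h>

/* Assumed fixed size instead of PATH_MAX, to keep the sketch portable. */
static char dev_pathname[4096];

/* Parse "--socket-file PATH" into dev_pathname.
 * Returns 0 on success, -1 on an unrecognized option. */
static int
parse_socket_file(int argc, char **argv)
{
	static const struct option long_option[] = {
		{ "socket-file", required_argument, NULL, 0 },
		{ NULL, 0, NULL, 0 },
	};
	int opt, option_index;

	optind = 1; /* reset getopt state for repeated calls */
	while ((opt = getopt_long(argc, argv, "", long_option,
				  &option_index)) != -1) {
		if (opt == 0 &&
		    !strcmp(long_option[option_index].name, "socket-file"))
			snprintf(dev_pathname, sizeof(dev_pathname),
				 "%s", optarg);
		else
			return -1;
	}
	return 0;
}
```

The real example additionally falls back to "<cwd>/vhost.socket" when no --socket-file argument is given, and unlinks a stale socket file before registering the vhost driver.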
Signed-off-by: Stefan Hajnoczi --- examples/vhost_scsi/vhost_scsi.c | 93 ++++++++++++++++++++++++++++++++-------- 1 file changed, 75 insertions(+), 18 deletions(-) diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c index 513af0c..d2d02fd 100644 --- a/examples/vhost_scsi/vhost_scsi.c +++ b/examples/vhost_scsi/vhost_scsi.c @@ -2,6 +2,7 @@ * Copyright(c) 2010-2017 Intel Corporation */ +#include #include #include #include @@ -402,26 +403,10 @@ static const struct vhost_device_ops vhost_scsi_device_ops = { }; static struct vhost_scsi_ctrlr * -vhost_scsi_ctrlr_construct(const char *ctrlr_name) +vhost_scsi_ctrlr_construct(void) { int ret; struct vhost_scsi_ctrlr *ctrlr; - char *path; - char cwd[PATH_MAX]; - - /* always use current directory */ - path = getcwd(cwd, PATH_MAX); - if (!path) { - fprintf(stderr, "Cannot get current working directory\n"); - return NULL; - } - snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name); - - if (access(dev_pathname, F_OK) != -1) { - if (unlink(dev_pathname) != 0) - rte_exit(EXIT_FAILURE, "Cannot remove %s.\n", - dev_pathname); - } if (rte_vhost_driver_register(dev_pathname, 0) != 0) { fprintf(stderr, "socket %s already exists\n", dev_pathname); @@ -455,6 +440,71 @@ signal_handler(__rte_unused int signum) exit(0); } +static void +set_dev_pathname(const char *path) +{ + if (dev_pathname[0]) + rte_exit(EXIT_FAILURE, "--socket-file can only be given once.\n"); + + snprintf(dev_pathname, sizeof(dev_pathname), "%s", path); +} + +static void +vhost_scsi_usage(const char *prgname) +{ + fprintf(stderr, "%s [EAL options] --\n" + " --socket-file PATH: The path of the UNIX domain socket\n", + prgname); +} + +static void +vhost_scsi_parse_args(int argc, char **argv) +{ + int opt; + int option_index; + const char *prgname = argv[0]; + static struct option long_option[] = { + {"socket-file", required_argument, NULL, 0}, + {NULL, 0, 0, 0}, + }; + + while ((opt = getopt_long(argc, argv, "", long_option, + 
&option_index)) != -1) { + switch (opt) { + case 0: + if (!strcmp(long_option[option_index].name, + "socket-file")) { + set_dev_pathname(optarg); + } + break; + default: + vhost_scsi_usage(prgname); + rte_exit(EXIT_FAILURE, "Invalid argument\n"); + } + } +} + +static void +vhost_scsi_set_default_dev_pathname(void) +{ + char *path; + char cwd[PATH_MAX]; + + /* always use current directory */ + path = getcwd(cwd, PATH_MAX); + if (!path) { + rte_exit(EXIT_FAILURE, + "Cannot get current working directory\n"); + } + snprintf(dev_pathname, sizeof(dev_pathname), "%s/vhost.socket", path); + + if (access(dev_pathname, F_OK) != -1) { + if (unlink(dev_pathname) != 0) + rte_exit(EXIT_FAILURE, "Cannot remove %s.\n", + dev_pathname); + } +} + int main(int argc, char *argv[]) { int ret; @@ -465,8 +515,15 @@ int main(int argc, char *argv[]) ret = rte_eal_init(argc, argv); if (ret < 0) rte_exit(EXIT_FAILURE, "Error with EAL initialization\n"); + argc -= ret; + argv += ret; + + vhost_scsi_parse_args(argc, argv); + + if (!dev_pathname[0]) + vhost_scsi_set_default_dev_pathname(); - g_vhost_ctrlr = vhost_scsi_ctrlr_construct("vhost.socket"); + g_vhost_ctrlr = vhost_scsi_ctrlr_construct(); if (g_vhost_ctrlr == NULL) { fprintf(stderr, "Construct vhost scsi controller failed\n"); return 0; From patchwork Wed Jun 19 15:14:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Dragazis X-Patchwork-Id: 54992 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E759B1C44A; Wed, 19 Jun 2019 17:16:47 +0200 (CEST) Received: from mx0.arrikto.com (mx0.arrikto.com [212.71.252.59]) by dpdk.org (Postfix) with ESMTP id B09FC1C3A1 for ; Wed, 19 Jun 2019 17:15:47 +0200 (CEST) Received: from troi.prod.arr (mail.arr [10.99.0.5]) by mx0.arrikto.com (Postfix) with ESMTP 
id 8FF3118201D; Wed, 19 Jun 2019 18:15:47 +0300 (EEST) Received: from localhost.localdomain (unknown [10.89.50.133]) by troi.prod.arr (Postfix) with ESMTPSA id 1219C32C; Wed, 19 Jun 2019 18:15:47 +0300 (EEST) From: Nikos Dragazis To: dev@dpdk.org Cc: Maxime Coquelin , Tiwei Bie , Zhihong Wang , Stefan Hajnoczi , Wei Wang , Stojaczyk Dariusz , Vangelis Koukis Date: Wed, 19 Jun 2019 18:14:50 +0300 Message-Id: <1560957293-17294-26-git-send-email-ndragazis@arrikto.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com> Subject: [dpdk-dev] [PATCH 25/28] examples/vhost_scsi: add virtio-vhost-user support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Stefan Hajnoczi The new --virtio-vhost-user-pci command-line argument uses virtio-vhost-user instead of the default AF_UNIX transport. Signed-off-by: Stefan Hajnoczi --- examples/vhost_scsi/vhost_scsi.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c index d2d02fd..020f4a0 100644 --- a/examples/vhost_scsi/vhost_scsi.c +++ b/examples/vhost_scsi/vhost_scsi.c @@ -27,6 +27,7 @@ /* Path to folder where character device will be created. Can be set by user. 
 */
 static char dev_pathname[PATH_MAX] = "";
+static uint64_t dev_flags; /* for rte_vhost_driver_register() */
 
 static struct vhost_scsi_ctrlr *g_vhost_ctrlr;
 static int g_should_stop;
@@ -408,7 +409,7 @@ vhost_scsi_ctrlr_construct(void)
 	int ret;
 	struct vhost_scsi_ctrlr *ctrlr;
 
-	if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+	if (rte_vhost_driver_register(dev_pathname, dev_flags) != 0) {
 		fprintf(stderr, "socket %s already exists\n", dev_pathname);
 		return NULL;
 	}
@@ -444,7 +445,8 @@ static void
 set_dev_pathname(const char *path)
 {
 	if (dev_pathname[0])
-		rte_exit(EXIT_FAILURE, "--socket-file can only be given once.\n");
+		rte_exit(EXIT_FAILURE, "Only one of --socket-file or "
+			 "--virtio-vhost-user-pci can be given.\n");
 
 	snprintf(dev_pathname, sizeof(dev_pathname), "%s", path);
 }
@@ -453,7 +455,8 @@ static void
 vhost_scsi_usage(const char *prgname)
 {
 	fprintf(stderr, "%s [EAL options] --\n"
-		"    --socket-file PATH: The path of the UNIX domain socket\n",
+		"    --socket-file PATH: The path of the UNIX domain socket\n"
+		"    --virtio-vhost-user-pci DomBDF: PCI adapter address\n",
 		prgname);
 }
@@ -465,6 +468,7 @@ vhost_scsi_parse_args(int argc, char **argv)
 	const char *prgname = argv[0];
 	static struct option long_option[] = {
 		{"socket-file", required_argument, NULL, 0},
+		{"virtio-vhost-user-pci", required_argument, NULL, 0},
 		{NULL, 0, 0, 0},
 	};
@@ -475,6 +479,10 @@ vhost_scsi_parse_args(int argc, char **argv)
 			if (!strcmp(long_option[option_index].name,
 				    "socket-file")) {
 				set_dev_pathname(optarg);
+			} else if (!strcmp(long_option[option_index].name,
+					   "virtio-vhost-user-pci")) {
+				set_dev_pathname(optarg);
+				dev_flags = RTE_VHOST_USER_VIRTIO_TRANSPORT;
 			}
 			break;
 		default:

From patchwork Wed Jun 19 15:14:51 2019
X-Patchwork-Submitter: Nikos Dragazis
X-Patchwork-Id: 54991
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
 Stojaczyk Dariusz, Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:51 +0300
Message-Id: <1560957293-17294-27-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 26/28] mk: link apps with virtio-vhost-user driver

Export the virtio-vhost-user transport to all the apps. Support using the
virtio-vhost-user transport with shared libraries by unconditionally linking
librte_virtio_vhost_user.so with the apps.
Signed-off-by: Nikos Dragazis
---
 mk/rte.app.mk | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b5..77e02d1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -132,6 +132,12 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += -lrte_bus_fslmc
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+_LDLIBS-y += --no-as-needed
+_LDLIBS-y += -lrte_virtio_vhost_user
+_LDLIBS-y += --as-needed
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)

From patchwork Wed Jun 19 15:14:52 2019
X-Patchwork-Submitter: Nikos Dragazis
X-Patchwork-Id: 54993
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
 Stojaczyk Dariusz, Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:52 +0300
Message-Id: <1560957293-17294-28-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 27/28] config: add option for the virtio-vhost-user
 transport

Add a configuration option for compiling and linking with the
virtio-vhost-user library.

Signed-off-by: Nikos Dragazis
---
 config/common_base  | 6 ++++++
 config/common_linux | 1 +
 drivers/Makefile    | 5 ++++-
 mk/rte.app.mk       | 2 +-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/config/common_base b/config/common_base
index 6f19ad5..2559d69 100644
--- a/config/common_base
+++ b/config/common_base
@@ -963,6 +963,12 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile virtio-vhost-user library
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER=n
+
+#
 # Compile IFC driver
 # To compile, CONFIG_RTE_LIBRTE_VHOST and CONFIG_RTE_EAL_VFIO
 # should be enabled.

diff --git a/config/common_linux b/config/common_linux
index 7533427..7e4279f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -17,6 +17,7 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
+CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y

diff --git a/drivers/Makefile b/drivers/Makefile
index 72e2579..971dc6c 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -22,7 +22,10 @@ DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event
 DEPDIRS-event := common bus mempool net
 DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += raw
 DEPDIRS-raw := common bus mempool net event
-DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio_vhost_user
+
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST)$(CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER),yy)
+DIRS-y += virtio_vhost_user
 DEPDIRS-virtio_vhost_user := bus
+endif
 
 include $(RTE_SDK)/mk/rte.subdir.mk

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 77e02d1..8dd2922 100644
---
a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -132,7 +132,7 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += -lrte_bus_fslmc
 endif
 
-ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST)$(CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER),yy)
 _LDLIBS-y += --no-as-needed
 _LDLIBS-y += -lrte_virtio_vhost_user
 _LDLIBS-y += --as-needed

From patchwork Wed Jun 19 15:14:53 2019
X-Patchwork-Submitter: Nikos Dragazis
X-Patchwork-Id: 54994
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Nikos Dragazis
To: dev@dpdk.org
Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
 Stojaczyk Dariusz, Vangelis Koukis
Date: Wed, 19 Jun 2019 18:14:53 +0300
Message-Id: <1560957293-17294-29-git-send-email-ndragazis@arrikto.com>
In-Reply-To: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
References: <1560957293-17294-1-git-send-email-ndragazis@arrikto.com>
Subject: [dpdk-dev] [PATCH 28/28] usertools: add virtio-vhost-user devices to
 dpdk-devbind.py

The virtio-vhost-user PCI adapter is not detected in any existing group of
devices supported by dpdk-devbind.py. Add a new "Others" group for
miscellaneous devices like this one.

Signed-off-by: Nikos Dragazis
Signed-off-by: Stefan Hajnoczi
---
 usertools/dpdk-devbind.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 9e79f0d..642b182 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -30,6 +30,8 @@
             'SVendor': None, 'SDevice': None}
 avp_vnic = {'Class': '05', 'Vendor': '1af4', 'Device': '1110',
             'SVendor': None, 'SDevice': None}
+virtio_vhost_user = {'Class': '00', 'Vendor': '1af4', 'Device': '1017,1058',
+                     'SVendor': None, 'SDevice': None}
 
 octeontx2_sso = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f9,a0fa',
                  'SVendor': None, 'SDevice': None}
@@ -41,6 +43,7 @@
 eventdev_devices = [cavium_sso, cavium_tim, octeontx2_sso]
 mempool_devices = [cavium_fpa, octeontx2_npa]
 compress_devices = [cavium_zip]
+other_devices = [virtio_vhost_user]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
@@ -595,6 +598,8 @@ def show_status():
     if status_dev == "compress" or status_dev == "all":
         show_device_status(compress_devices , "Compress")
 
+    if status_dev == 'other' or status_dev == 'all':
+        show_device_status(other_devices, "Other")
 
 def parse_args():
     '''Parses the command-line arguments given by the user and takes the
@@ -670,6 +675,7 @@ def do_arg_actions():
         get_device_details(eventdev_devices)
         get_device_details(mempool_devices)
         get_device_details(compress_devices)
+        get_device_details(other_devices)
         show_status()
@@ -690,6 +696,7 @@ def main():
     get_device_details(eventdev_devices)
     get_device_details(mempool_devices)
     get_device_details(compress_devices)
+    get_device_details(other_devices)
     do_arg_actions()

 if __name__ == "__main__":
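For illustration, the way a dpdk-devbind.py-style device group classifies PCI
IDs can be sketched as follows. This is a simplified, hypothetical
re-implementation — matches_group() is not a helper from the actual script —
but the virtio_vhost_user dictionary is copied verbatim from the patch above:

```python
# Device group entry from the patch: the virtio-vhost-user PCI adapter
# uses the virtio vendor ID 1af4 with device ID 1017 or 1058.
virtio_vhost_user = {'Class': '00', 'Vendor': '1af4', 'Device': '1017,1058',
                     'SVendor': None, 'SDevice': None}
other_devices = [virtio_vhost_user]


def matches_group(dev, groups):
    """Return True if a PCI device (fields as printed by 'lspci -n')
    belongs to any of the given device groups.

    Hypothetical helper for illustration only; dpdk-devbind.py
    structures this logic differently internally."""
    for group in groups:
        # Class and Vendor must match the group exactly.
        if dev['Class'] != group['Class']:
            continue
        if dev['Vendor'].lower() != group['Vendor']:
            continue
        # 'Device' may list several comma-separated device IDs.
        if dev['Device'].lower() in group['Device'].split(','):
            return True
    return False


# A virtio-vhost-user function as lspci would report it falls into
# the new "Others" group; an unrelated NIC does not.
print(matches_group({'Class': '00', 'Vendor': '1af4', 'Device': '1017'},
                    other_devices))
print(matches_group({'Class': '02', 'Vendor': '8086', 'Device': '100e'},
                    other_devices))
```

With such a grouping in place, `dpdk-devbind.py --status-dev other` can list
these adapters under their own "Other" heading instead of silently skipping
them.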